POST Compute Request

Congratulations 🥳 Now that you have received your txnId, you can start enabling and configuring the required models.

Different configurable models can be invoked on the unique txnId using the Speech Analytics APIs. Each model can be independently enabled or disabled using enableModels, depending on your requirements, and configured using modelConfig.

For more information on how to configure each model refer to the Configuring Models section of the documentation.

Prerequisite Information

Metadata can be computed on any Conversation with a Transaction ID txnId.

  • A Transaction ID txnId must be obtained before metadata can be computed.

  • Depending on the type of Conversation (audio or video), some models will not be available (e.g., the Screengrabs model is only available for video-based Conversations).

  • Some models depend on the output of other models, and those dependent models must also be enabled. These dependencies are described in the Flow of Data and Stacking Models section.

  • Each Compute Request generates a requestId. The progress of a model and its output metadata can be fetched using this requestId. More information can be found in the What is a requestId? section.

Enable & Configure Models using enableModels and modelConfig

Using the txnId, you can now select models with enableModels, configure them with modelConfig, and submit a request using POST Compute Request.

Each enabled model will be given a unique requestId.
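
As a quick orientation, the request body has the following overall shape (the model name and configuration values here are illustrative; complete, working examples appear later in this section):

{
    "txnId": "<your txnId>",
    "enableModels": [
        {
            "modelType": "speech_to_text",
            "modelConfig": {
                "speaker_seperation": {
                    "num_speakers": 2
                }
            }
        }
    ]
}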

The following models can be enabled using the txnId. Click on each model to learn how to enable and configure it.

Models

Speech-to-Text: Marsview Automatic Speech Recognition (ASR) technology accurately converts speech into text in live or batch mode. The API can be deployed in the cloud or on-premise. Get superior accuracy, speaker separation, punctuation, casing, word-level time markers, and more. (Supported language: English)

Speaker Separation: Automatically detects the number of speakers in your audio file and associates each word in the transcription text with its speaker.

Topic Detection: Extracts the most relevant topics, concepts, and discussion points from the conversation, generated for each paragraph spoken (Topics by Sentence).

Tone Analysis: Suggests speaker emotion using only audio cues. A speaker may show emotion in the tone of a response, and capturing this is important for the overall sentiment/mood of the conversation, which cannot be extracted through conventional lexical emotion analysis.

Marsview can detect the following tones in an audio file:

  • Calm

  • Happy

  • Sad

  • Angry

  • Fearful

  • Disgust

  • Surprised

Emotion Analysis: The Emotion Analysis model helps you understand and interpret speaker emotions in a conversation or text. It is designed to understand human conversation in the form of free or spoken text, and is modeled after the emotion wheel.

Marsview can detect the following emotions in an audio file:

  • Anger

  • Anticipation

  • Disgust

  • Fear

  • Joy

  • Love

  • Optimism

  • Pessimism

  • Sadness

  • Surprise

  • Trust

Sentiment Analysis: Helps you interpret and quantify whether the conversation in the audio or text is Positive, Negative, or Neutral.

Speech Type Analysis: The Speech Type model helps you understand the type of conversation at any given time. Every phone call and online or offline conversation can be broadly classified into four categories: Statement, Command, Action Item, or Question.

Action Items: The Action Item API detects an event, task, activity, or action that needs to take place in the future (after the conversation). Action items can be of high priority with a definite assignee and due date, or of lower priority with a non-definite due date.

All action items are generated with action phrases, assignees, and due dates, making the output immediately consumable by your CRM or project management tools.

Question & Response: Automatically identifies and detects questions or requests posed during the conversation, along with the appropriate responses, in a consumable form.

Extractive Summary: Extractive summarization identifies the salient information in the conversation, which is then extracted and grouped together to form a concise summary.

Screengrabs: Captures keyframes and slides from videos and from screen sharing in an online web conference.

Screen Activity: Identifies and analyzes the visual aspects of the meeting, along with the corresponding timestamps, using the Screen Activity API.

Marsview detects the following Screen Activity:

  • Screen Share

  • Interaction

  • Whiteboard

  • Presentation

To learn more about how to configure models, see the Configuring Models page.

Flow of Data and Stacking Models

In the diagram above, the arrows show the flow of data from one model to another. Some models depend on the output of previous models (for example, the Sentiment Analysis model depends on the outputs of the Speech-to-Text and Diarization models).

Therefore, when Sentiment Analysis has to be enabled, the Compute Request must include the Speech-to-Text, Diarization, and Sentiment Analysis models in enableModels. Shown below is a sample configuration for this.

"enableModels":[
    {
        "modelType":"speech_to_text",
        "modelConfig": {
            "speaker_seperation":{
                "num_speakers":2
            }
        }
    },
    {
        "modelType":"sentiment_analysis",
    }
]
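
Note that in this sample, diarization is requested through the speaker_seperation block of the speech_to_text model's modelConfig rather than as a separate enableModels entry.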

Stacking of requests

Requests for models can be stacked in the same API call, or they can be sent in separate API calls. Examples of both approaches are shown below.

API Call #1: all models stacked in a single request

"enableModels":[
    {
        "type":"speech_to_text",
    },
    {
        "type":"sentiment_analysis",
    },
    {
        "type":"emotion_analysis",
    }
]

Note that when you send separate API calls on the same txnId, requests from previous API calls must be in the "completed" or "error" state.
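
For comparison, here is a minimal sketch of the same models requested through separate API calls on the same txnId (request bodies only; each is sent as its own POST Compute Request):

API Call #2: first request

"enableModels":[
    {
        "modelType":"speech_to_text"
    }
]

API Call #3: sent once the earlier requests reach the "completed" or "error" state

"enableModels":[
    {
        "modelType":"sentiment_analysis"
    },
    {
        "modelType":"emotion_analysis"
    }
]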

Compute Request

POST https://api.marsview.ai/cb/v1/conversation/compute

Headers

  • authorization (string): The JWT token used for authentication.

  • application-type (string): application/json

Request Body

  • txnId (string): Transaction ID generated by the file upload request.

  • enableModels (array): A list of models/analyses to be run on the input video/audio. Each entry contains:

      • modelType (string): A string specifying the model/analysis to be run on the input video/audio. A request can have multiple modelTypes.

      • modelConfig (object): An object specifying model-specific configurations.
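
A successful request returns a unique requestId for each enabled model, for example: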

{
    "status": true,
    "data": {
        "requestId": [
            {
                "type": "speech_to_text",
                "requestId": "req-1c6q6fnvkq6vp1gl-1624295656100"
            },
            {
                "type": "emotion_analysis",
                "requestId": "req-1c6q6fnvkq6vp1gm-1624295656100"
            },
            {
                "type": "sentiment_analysis",
                "requestId": "req-1c6q6fnvkq6vp1gn-1624295656100"
            },
            {
                "type": "speech_type_analysis",
                "requestId": "req-1c6q6fnvkq6vp1go-1624295656100"
            },
            {
                "type": "action_items",
                "requestId": "req-1c6q6fnvkq6vp1gp-1624295656100"
            },
            {
                "type": "question_response",
                "requestId": "req-1c6q6fnvkq6vp1gq-1624295656100"
            },
            {
                "type": "extractive_summary",
                "requestId": "req-1c6q6fnvkq6vp1gr-1624295656100"
            },
            {
                "type": "meeting_topics",
                "requestId": "req-1c6q6fnvkq6vp1gs-1624295656100"
            },
            {
                "type": "screengrabs",
                "requestId": "req-1c6q6fnvkq6vp1gt-1624295656100"
            },
            {
                "type": "screen_activity",
                "requestId": "req-1c6q6fnvkq6vp1gu-1624295656100"
            }
        ]
    }
}

What is a requestId?

For each Transaction ID txnId, multiple models can be requested to be computed, and for each one of these requests, a unique requestId is created.

Using the requestId, the model's progress and its output metadata can be obtained.
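
As an illustration, the snippet below polls for a request's progress using its requestId. The endpoint path and query parameter here are placeholders, not the documented API; substitute the actual metadata-fetch request from the corresponding section of this documentation:

curl --location --request GET '<metadata-fetch-endpoint>?requestId=req-1c6q6fnvkq6vp1gl-1624295656100' \
--header 'authorization: <Your access token>'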

States of a Request ID

  • Uploaded: The model request has been uploaded/queued by the system. In this state, metadata generation by the model is either in progress or queued.

  • Processed: The model request has finished processing.

  • Error: There was an error processing the request.

Example: How to compute only STT and Diarization on a transaction ID?

Step 1: Get the authentication token.

Using your apiKey and apiSecret, you can generate the token as shown below.

curl --location --request POST 'https://api.marsview.ai/cb/v1/auth/create_access_token' \
--header 'Content-Type: application/json' \
--data-raw '{
    "apiKey":    "{{Insert API Key}}",
    "apiSecret": "{{Insert API Secret}}",
	  "userId":    "demo@marsview.ai"
}'
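
The response contains the access token that must be passed in the authorization header of subsequent requests. As a convenience, assuming the token is returned under data.token (verify the exact field in your actual response), it can be captured into a shell variable:

TOKEN=$(curl --location --request POST 'https://api.marsview.ai/cb/v1/auth/create_access_token' \
--header 'Content-Type: application/json' \
--data-raw '{"apiKey": "{{Insert API Key}}", "apiSecret": "{{Insert API Secret}}", "userId": "demo@marsview.ai"}' \
| jq -r '.data.token')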

Step 2: Send a Compute Request

curl --location --request POST 'https://api.marsview.ai/cb/v1/conversation/compute' \
--header 'Content-Type: application/json' \
--header 'authorization: <Your access token>' \
--data-raw '{
    "txnId": "your txn id",
    "enableModels":[
        {
        "modelType":"speech_to_text",
        "modelConfig":{
            "custom_vocabulary":["Marsview", "Communication"],
            "speaker_seperation":{
                "num_speakers":2
            },
            "topics":true
            }
        }
    ]
}'

Example Response for POST Compute Request

{
    "status": true,
    "data": {
        "requestId": [
            {
                "type": "speech_to_text",
                "requestId": "req-1c6q6f7dkq1y6lm0-1623997503719"
            },
            {
                "type": "emotion_analysis",
                "requestId": "req-1c6q6f7dkq1y6lm1-1623997503719"
            },
            {
                "type": "sentiment_analysis",
                "requestId": "req-1c6q6f7dkq1y6lm2-1623997503719"
            },
            {
                "type": "speech_type_analysis",
                "requestId": "req-1c6q6f7dkq1y6lm3-1623997503719"
            },
            {
                "type": "action_items",
                "requestId": "req-1c6q6f7dkq1y6lm4-1623997503719"
            },
            {
                "type": "question_response",
                "requestId": "req-1c6q6f7dkq1y6lm5-1623997503719"
            },
            {
                "type": "extractive_summary",
                "requestId": "req-1c6q6f7dkq1y6lm6-1623997503719"
            },
            {
                "type": "meeting_topics",
                "requestId": "req-1c6q6f7dkq1y6lm7-1623997503719"
            },
            {
                "type": "screengrabs",
                "requestId": "req-1c6q6f7dkq1y6lm8-1623997503719"
            },
            {
                "type": "screen_activity",
                "requestId": "req-1c6q6f7dkq1y6lm9-1623997503719"
            }
        ]
    }
}
