POST Compute Request

Congratulations 🥳 Now that you have received your txnId, you can start to enable and configure the required models.
Different configurable models can be invoked on the unique txnId using the Speech Analytics APIs. Each model can be independently enabled or disabled using enableModels, and each model can be configured using modelConfig.
For more information on how to configure each model, refer to the Configuring Models section of the documentation.

Prerequisite Information

Metadata can be computed on any Conversation that has a Transaction ID (txnId).
  • A Transaction ID txnId must be obtained before metadata can be computed.
  • Depending on the type of Conversation (audio or video), some models may not be available. (E.g., the Screengrabs model is only available for video-based Conversations.)
  • Some models depend on the output of other models, and those dependencies must also be enabled. This is explained in the Flow of Data and Stacking Models section.
  • Each Compute Request will generate a requestId. The progress of the model and its output metadata can be fetched using this requestId. More information can be found in the What is a requestId? section.

Enable & Configure Models using enableModels and modelConfig

Using the txnId, you can now select models using enableModels, configure them using modelConfig, and submit a request using POST Compute Request.
Each enabled model will be given a unique requestId.
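In general, a Compute Request body has the following shape (a minimal sketch assembled from the examples later on this page; the model names and configuration keys available for each model are listed below):

{
    "txnId": "<your txnId>",
    "enableModels": [
        {
            "modelType": "speech_to_text",
            "modelConfig": {
                "speaker_seperation": {
                    "num_speakers": 2
                }
            }
        }
    ]
}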
The following are the models that can be enabled using the txnId. Click on each model to learn how to enable and configure it.
Models

Speech-to-Text
Marsview Automatic Speech Recognition (ASR) technology accurately converts speech into text in live or batch mode. The API can be deployed in the cloud or on-premise. Get superior accuracy, speaker separation, punctuation, casing, word-level time markers, and more. (Supported language: English)

Speaker Separation
Automatically detects the number of speakers in your audio file, so that each word in the transcription text can be associated with its speaker.

Topics
Extracts the most relevant topics, concepts, and discussion points from the conversation, generated for each paragraph spoken (Topics by Sentence).

Tone Analysis
Tone Analysis suggests speaker emotion using only audio cues. A speaker may show emotion in the tone of a response, and capturing this is important for gauging the overall sentiment/mood of the conversation, which cannot be extracted from conventional lexical emotion analysis.
Marsview is capable of detecting the following tones in an audio file:
  • Calm
  • Happy
  • Sad
  • Angry
  • Fearful
  • Disgust
  • Surprised

Emotion Analysis
The Emotion Analysis model helps you understand and interpret speaker emotions in a conversation or text. It is designed to understand human conversation in the form of free text or spoken text, and is modeled after the emotion wheel.
Marsview is capable of detecting the following emotions in an audio file:
  • Anger
  • Anticipation
  • Disgust
  • Fear
  • Joy
  • Love
  • Optimism
  • Pessimism
  • Sadness
  • Surprise
  • Trust

Sentiment Analysis
Sentiment Analysis helps you interpret and quantify whether the conversation in the audio or text is Positive, Negative, or Neutral.

Speech Type
The Speech Type model helps you understand the type of conversation at any given time. Every phone call and online or offline conversation can be broadly classified into four categories: Statement, Command, Action Item, or Question.

Action Items
The Action Item API detects an event, task, activity, or action that needs to take place in the future (after the conversation). Action items can be of high priority with a definite assignee and due date, or of lower priority with a non-definite due date.
All action items are generated with action phrases, assignees, and due dates to make the output immediately consumable by your CRM or project management tools.

Question & Response
Automatically identifies questions or requests posed during the conversation, along with the corresponding responses, in a consumable form. The API detects both the question and the speaker's response.

Extractive Summary
Extractive summarization identifies the salient information in the conversation, which is then extracted and grouped together to form a concise summary.

Screengrabs
Captures keyframes and slides from videos and screen sharing from an online web conference.

Screen Activity
Identifies and analyzes the visual aspects of the meeting, along with the corresponding timestamps, with the Screen Activity API.
Marsview detects the following screen activities:
  • Screen Share
  • Interaction
  • Whiteboard
  • Presentation
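Video-only models are enabled in the same way as audio models. A minimal sketch enabling the Screengrabs model with its OCR option (configuration keys taken from the full Python example later on this page):

"enableModels": [
    {
        "modelType": "screengrabs",
        "modelConfig": {
            "ocr": {
                "enable": true
            }
        }
    }
]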
To learn more about how to configure each model, refer to the Configuring Models section.

Flow of Data and Stacking Models

[Diagram: flow of data between the models, left to right]
In the diagram, the arrows show the flow of data from one model to another. Some models depend on the output of previous models. (For example, the Sentiment Analysis model depends on the outputs of the Speech-to-Text and Diarization models.)
Therefore, when Sentiment Analysis has to be enabled, the Compute Request API has to be configured to enable the Speech-to-Text, Diarization, and Sentiment Analysis models. Shown below is a sample configuration:
"enableModels":[
{
"modelType":"speech_to_text",
"modelConfig": {
"speaker_seperation":{
"num_speakers":2
}
}
},
{
"modelType":"sentiment_analysis",
}
]
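Note that Diarization is enabled here through the speaker_seperation block inside the speech_to_text model's modelConfig, rather than as a separate entry in enableModels.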

Stacking of Requests

Requests for models can be stacked in the same API call, or they can be sent in separate API calls. Shown below are examples of both approaches.
Stacked Requests

API CALL #1
"enableModels": [
    {
        "modelType": "speech_to_text"
    },
    {
        "modelType": "sentiment_analysis"
    },
    {
        "modelType": "emotion_analysis"
    }
]

Separate Requests

API CALL #1
"enableModels": [
    {
        "modelType": "speech_to_text"
    }
]

API CALL #2
"enableModels": [
    {
        "modelType": "sentiment_analysis"
    },
    {
        "modelType": "emotion_analysis"
    }
]
Note that when you are sending separate API calls on the same txnId, requests from previous API calls must be in the "completed" or "error" state.
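A minimal polling sketch for that pattern is shown below. The status endpoint, its parameters, and the response fields used here are assumptions for illustration; check the Marsview API reference for the actual route and schema.

import time
import requests

auth_token = "Bearer <API TOKEN>"

# Hypothetical status endpoint -- verify against the API reference.
STATUS_URL = "https://api.marsview.ai/cb/v1/conversation/status"

def wait_for_requests(txn_id, request_ids, interval=10):
    # Poll until every requestId reaches a terminal state
    # ("completed" or "error", as described above).
    pending = set(request_ids)
    while pending:
        for request_id in list(pending):
            resp = requests.get(
                STATUS_URL,
                headers={"authorization": auth_token},
                params={"txnId": txn_id, "requestId": request_id},  # assumed params
            )
            state = resp.json().get("data", {}).get("status")  # assumed field
            if state in ("completed", "error"):
                pending.discard(request_id)
        if pending:
            time.sleep(interval)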
Compute Request
POST https://api.marsview.ai/cb/v1/conversation/compute

What is a requestId?

For each Transaction ID txnId, multiple models can be requested to be computed, and for each of these requests a unique requestId is created.
Using the requestId, the model's progress and its output metadata can be obtained.

States of a Request ID

  • Uploaded: The model request has been uploaded or queued by the system. In this state, metadata generation by the model is either in progress or queued.
  • Processed: The model request has finished processing.
  • Error: There was an error processing the request.
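A small sketch of how a client might branch on these states (the state strings and their casing are taken from the list above; verify the exact values returned by the API):

def handle_state(state):
    # Decide the next action for a model request based on its state.
    if state == "Uploaded":
        return "poll again later"    # still queued or in progress
    if state == "Processed":
        return "fetch the metadata"  # output is ready
    if state == "Error":
        return "inspect the error"   # request failed
    raise ValueError(f"unknown state: {state}")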

Example: How to compute only STT and Diarization on a transaction ID?

Step 1: Get the authentication token.

Using your apiKey and apiSecret, you can generate the token as shown below.
Curl
Python
curl --location --request POST 'https://api.marsview.ai/cb/v1/auth/create_access_token' \
--header 'Content-Type: application/json' \
--data-raw '{
    "apiKey": "{{Insert API Key}}",
    "apiSecret": "{{Insert API Secret}}",
    "userId": "[email protected]"
}'
import requests

userId = 'Paste the user ID here'
apiKey = 'Paste the API key here'
apiSecret = 'Paste the API secret here'

def get_token():
    # Exchange the API key/secret pair for an access token.
    url = "https://api.marsview.ai/cb/v1/auth/get_access_token"
    payload = {
        "apiKey": apiKey,
        "apiSecret": apiSecret,
        "userId": userId
    }
    headers = {
        'Content-Type': 'application/json'
    }
    response = requests.request("POST", url, headers=headers, json=payload)
    print(response.text)

if __name__ == "__main__":
    get_token()
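The access token returned by this call is passed in the authorization header of the Compute Request in Step 2. Where exactly the token sits in the response body depends on the auth response schema; the snippet below assumes a hypothetical data.token field.

# Inside get_token(), after the POST call
# (assumed response shape -- adjust to the actual auth response schema):
token = response.json()["data"]["token"]
auth_token = "Bearer " + token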
Step 2: Send a Compute Request
Curl
Python
curl --location --request POST 'https://api.marsview.ai/cb/v1/conversation/compute' \
--header 'Content-Type: application/json' \
--header 'authorization: <Your access token>' \
--data-raw '{
    "txnId": "your txn id",
    "enableModels": [
        {
            "modelType": "speech_to_text",
            "modelConfig": {
                "custom_vocabulary": ["Marsview", "Communication"],
                "speaker_seperation": {
                    "num_speakers": 2
                },
                "topics": true
            }
        }
    ]
}'
import requests

auth_token = "Bearer <API TOKEN>"
txnId = "Your transaction ID"
userId = "Your user ID"

def compute_request():
    # This example enables the full suite of models on the transaction.
    url = "https://api.marsview.ai/cb/v1/conversation/{userId}/compute"
    payload = {
        "txnId": txnId,
        "enableModels": [
            {
                "modelType": "speech_to_text",
                "modelConfig": {
                    "automatic_punctuation": True,
                    "custom_vocabulary": ["Marsview", "Communication"],
                    "speaker_seperation": {
                        "num_speakers": 2
                    },
                    "enableKeywords": True
                }
            },
            {
                "modelType": "emotion_analysis"
            },
            {
                "modelType": "sentiment_analysis"
            },
            {
                "modelType": "speech_type_analysis"
            },
            {
                "modelType": "action_items",
                "modelConfig": {
                    "priority": 1
                }
            },
            {
                "modelType": "question_response",
                "modelConfig": {
                    "quality": 1
                }
            },
            {
                "modelType": "extractive_summary"
            },
            {
                "modelType": "meeting_topics"
            },
            {
                "modelType": "screengrabs",
                "modelConfig": {
                    "ocr": {
                        "enable": True
                    }
                }
            },
            {
                "modelType": "screen_activity"
            }
        ]
    }
    headers = {
        'Content-Type': 'application/json',
        'authorization': auth_token
    }
    print(url.format(userId=userId))
    response = requests.request("POST", url.format(userId=userId), headers=headers, json=payload)
    print(response.text)

if __name__ == "__main__":
    compute_request()

Example Response for POST Compute Request

{
    "status": true,
    "data": {
        "requestId": [
            {
                "type": "speech_to_text",
                "requestId": "req-1c6q6f7dkq1y6lm0-1623997503719"
            },
            {
                "type": "emotion_analysis",
                "requestId": "req-1c6q6f7dkq1y6lm1-1623997503719"
            },
            {
                "type": "sentiment_analysis",
                "requestId": "req-1c6q6f7dkq1y6lm2-1623997503719"
            },
            {
                "type": "speech_type_analysis",
                "requestId": "req-1c6q6f7dkq1y6lm3-1623997503719"
            },
            {
                "type": "action_items",
                "requestId": "req-1c6q6f7dkq1y6lm4-1623997503719"
            },
            {
                "type": "question_response",
                "requestId": "req-1c6q6f7dkq1y6lm5-1623997503719"
            },
            {
                "type": "extractive_summary",
                "requestId": "req-1c6q6f7dkq1y6lm6-1623997503719"
            },
            {
                "type": "meeting_topics",
                "requestId": "req-1c6q6f7dkq1y6lm7-1623997503719"
            },
            {
                "type": "screengrabs",
                "requestId": "req-1c6q6f7dkq1y6lm8-1623997503719"
            },
            {
                "type": "screen_activity",
                "requestId": "req-1c6q6f7dkq1y6lm9-1623997503719"
            }
        ]
    }
}
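The per-model requestIds can be pulled out of this response for later status checks. A short sketch using the response shape exactly as shown above (response_text stands for the raw JSON body returned by the POST Compute Request):

import json

request_ids = {
    item["type"]: item["requestId"]
    for item in json.loads(response_text)["data"]["requestId"]
}
# request_ids["speech_to_text"] -> "req-1c6q6f7dkq1y6lm0-1623997503719"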