Speech Insights (BETA)

Enable this model configuration to get conversational insights that can measure, or help measure, many of your KPIs.
This API is in BETA and is provided on request. Please contact [email protected] to have it enabled for your account.

Overview

Perform in-depth analysis of conversational data and visualize trends in topics, sentiments, keywords, and behaviors to achieve better outcomes.
Marsview captures the engagement level of speakers in real time. Additionally, you can track user sentiment and emotions alongside the engagement data.

Insights

For each conversation/file uploaded, the following insights are returned:

Talk-to-listen Ratio: Each speaker's talk and listen time and their ratio.
Speech Insights: Per-speaker insights such as the longest monologue, filler words used, speech clarity, etc.
Call Sentiment Score: An overall assessment of the conversation sentiment based on the sentiments, emotions, and tone used in the conversation.
Call Engagement Score: An overall assessment of the conversation engagement based on talk time, dead air, and other factors.
Call Score: Scores the call based on different quantitative and qualitative measurements of the conversation. This can be further customized to the business need.
Avg. Speech Speed: Each speaker's speech speed in WPM (words per minute).
Sentiment vs Time: Variations in sentiment over the course of the call, for each speaker individually and combined.
Phrase Cloud (by Topic Type): Salient topics found or spoken in the conversation.
Topic Sentiment over Time: Variations in sentiment over the course of the call, for each speaker individually and combined, along with the corresponding topics mentioned.
Speaker Emotions over Time: Variations in emotions over the course of the call, for each speaker individually and combined.
Dead Air: Timestamps of dead air (silence) found during the conversation.

modelTypeConfiguration

modelType: data_insights
modelConfig: Model Configuration object for data_insights

modelConfig Parameters

dead_air.threshold: The time threshold, in milliseconds, beyond which silence in a meeting is considered dead air. Default: 3000
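
For reference, the data_insights entry inside enableModels looks like the sketch below, with dead_air.threshold shown at its default of 3000 ms (this mirrors the full example request that follows):

{
  "modelType": "data_insights",
  "modelConfig": {
    "dead_air": {
      "threshold": 3000
    }
  }
}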

Example Request

Curl
curl --location --request POST 'https://api.marsview.ai/cb/v1/conversation/{{userId}}/compute' \
--header 'Content-Type: application/json' \
--header "Authorization:{{Insert Auth Token}}" \
--data-raw '{
  "userId": "{{Insert User ID}}",
  "txnId": "{{Insert txn ID}}",
  "enableModels": [
    {
      "modelType": "speech_to_text",
      "modelConfig": {
        "automatic_punctuation": true,
        "custom_vocabulary": ["Marsview", "Communication"],
        "speaker_seperation": {
          "num_speakers": 2
        },
        "enableKeywords": true,
        "enableTopics": true
      }
    },
    {
      "modelType": "emotion_analysis"
    },
    {
      "modelType": "sentiment_analysis"
    },
    {
      "modelType": "data_insights",
      "modelConfig": {
        "dead_air": {
          "threshold": 3000
        }
      }
    }
  ]
}'

Python

import requests

user_id = "[email protected]"
auth_token = "Replace this with your auth token"
txn_id = "Replace this with your transaction ID"

# Note: the speech_to_text model does not depend on any other models, hence
# it can be used independently.
def get_speech_insights():
    url = "https://api.marsview.ai/cb/v1/conversation/{user_id}/compute"
    payload = {
        "userId": user_id,
        "txnId": txn_id,
        "enableModels": [
            {
                "modelType": "speech_to_text",
                "modelConfig": {
                    "automatic_punctuation": True,
                    "custom_vocabulary": ["Marsview", "Communication"],
                    "speaker_seperation": {
                        "num_speakers": 2
                    },
                    "enableKeywords": True,
                    "enableTopics": False
                }
            },
            {
                "modelType": "emotion_analysis"
            },
            {
                "modelType": "sentiment_analysis"
            },
            {
                "modelType": "data_insights",
                "modelConfig": {
                    "dead_air": {
                        "threshold": 3000
                    }
                }
            }
        ]
    }
    headers = {"authorization": auth_token}
    response = requests.request("POST", url.format(user_id=user_id), headers=headers, json=payload)
    print(response.text)
    if response.status_code == 200 and response.json()["status"] == "true":
        return response.json()["data"]["enableModels"]["state"]["status"]
    else:
        raise Exception("Failed to enable Speech Insights: {}".format(response.text))

if __name__ == "__main__":
    get_speech_insights()

Example Metadata Response

"data":{
"dataInsights": {
"meetingInsights": {
"meetingSentiment": [
{
"sentiment": "Very Positive",
"value": 0.17777777777777778
},
{
"sentiment": "Mostly Positive",
"value": 0.15555555555555556
},
{
"sentiment": "Neutral",
"value": 0.6666666666666666
},
{
"sentiment": "Mostly Negative",
"value": 0
},
{
"sentiment": "Very Negative",
"value": 0
}
],
"meetingEmotion": [
{
"emotion": "joy",
"value": 0.007523583540714161
},
{
"emotion": "optimism",
"value": 0.3097979975693039
},
{
"emotion": "anticipation",
"value": 0.2518085595231206
},
{
"emotion": "Misc",
"value": 0.2654667631228659
},
{
"emotion": "anger",
"value": 0.07488859308987789
},
{
"emotion": "fear",
"value": 0.08391689912610677
},
{
"emotion": "sadness",
"value": 0.00659760402801088
}
],
"conversationStartTime": 1390,
"engagementRatio": 0.9813153112221717,
"keywords": [
{
"keyword": "will",
"frequency": 2
},
],
"deadAir": 0.015454716272078131
},
"speakerInsights": {
"speakers": [
"-1"
],
"speakersTalktimePc": {
"-1": 0.007523583540714161
},
"speakersTalktime": {
"-1": 1300
},
"speakersMonologue": {
"-1": 13750
},
"speakersEmotion": {
"-1": [
{
"emotion": "joy",
"value": 0.007523583540714161
},
{
"emotion": "optimism",
"value": 0.3097979975693039
},
{
"emotion": "anticipation",
"value": 0.2518085595231206
},
{
"emotion": "Misc",
"value": 0.2654667631228659
},
{
"emotion": "anger",
"value": 0.07488859308987789
},
{
"emotion": "fear",
"value": 0.08391689912610677
},
{
"emotion": "sadness",
"value": 0.00659760402801088
}
]
},
"speakersSentiment": {
"-1": [
{
"sentiment": "Very Positive",
"value": 0.17777777777777778
},
{
"sentiment": "Mostly Positive",
"value": 0.15555555555555556
},
{
"sentiment": "Neutral",
"value": 0.6666666666666666
},
{
"sentiment": "Mostly Negative",
"value": 0
},
{
"sentiment": "Very Negative",
"value": 0
}
]
},
"speakerAvgWpm": {
"-1": 163.55519043739972
}
},
"transcriptInsights": [
{
"sentence": "I am currently attending Nicholas State University to complete my degree in secondary education with a focus on social studies.",
"startTime": 56180,
"endTime": 64779.999,
"speaker": "-1",
"topics": [
{
"tiers": [
{
"tierName": "Education",
"type": 1
}
],
"name": "Secondary Education"
},
],
"keywords": [
"state",
"education",
"focus",
"Nicholas State University"
],
"speechType": "statement",
"speechTypeConfidence": 0.9999955892562866,
"sentiment": "Neutral",
"polarity": -0.041666666666666664,
"subjectivity": 0.2916666666666667,
"tone": "angry",
"toneConfidence": 0.7229840755462646,
"emotion": "optimism",
"emotionConfidence": 0.640371561050415,
"wordsPerMinute": 132.57355506454238
},
]
},
},
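
A minimal sketch of how the response above can be unpacked in Python. The function and file name (metadata_response.json) are illustrative, and the field access assumes the structure shown in the example fragment:

import json

def summarize_data_insights(response_json):
    """Split the dataInsights block into its three parts and print a few headline numbers.

    response_json is assumed to be the full metadata response parsed into a dict,
    whose "data" field matches the example fragment above.
    """
    insights = response_json["data"]["dataInsights"]
    meeting = insights["meetingInsights"]        # meeting-level sentiment, emotion, keywords, dead air
    speakers = insights["speakerInsights"]       # per-speaker talk time, emotions, WPM
    transcript = insights["transcriptInsights"]  # sentence-level insights
    print("Engagement ratio:", meeting["engagementRatio"])
    print("Speakers:", speakers["speakers"])
    print("Sentences analysed:", len(transcript))

if __name__ == "__main__":
    # Hypothetical file name: a saved copy of the metadata response.
    with open("metadata_response.json") as f:
        summarize_data_insights(json.load(f))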

Response Objects

dataInsights: Data insights object containing all the insights for the given audio/video.
transcriptInsights: List of transcript insight objects, one for each sentence identified by the model.
meetingInsights: Object containing all the insights for the meeting.
speakerInsights: Object containing all the insights for each speaker in the meeting.

transcriptInsights List<Objects>

sentence: Sentence identified in the given time frame.
startTime: Start time of the sentence in the input video/audio, in milliseconds.
endTime: End time of the sentence in the input video/audio, in milliseconds.
speaker: ID of the speaker whose voice is identified in the given time frame.
topics: List of topic objects identified in the given time frame.
keywords: List of keywords found in the given sentence.
speechType: The type of speech that best represents the sentence in the given time frame, e.g. statement, question.
speechTypeConfidence: The model's confidence in the predicted speechType.
sentiment: Sentiment of the speaker during the given time frame.
polarity: Numeric representation of the speaker's sentiment, ranging from -1 (very negative) to 1 (very positive).
subjectivity: A measure of how much the sentence is based on facts and figures. A high subjectivity indicates that the information given by the speaker is not based on facts and is highly subjective.
tone: Tone of the speaker in the given time frame.
toneConfidence: The model's confidence in the predicted tone value.
emotion: Emotion of the speaker in the given time frame.
emotionConfidence: The model's confidence in the predicted emotion value.
wordsPerMinute: Average words per minute spoken by the speaker in the given time frame.
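
As a sketch of how these sentence-level fields might be consumed (assuming a parsed response dict shaped like the example above, and assuming speechType values are lowercased as in the sample, e.g. "question"), the helper below lists questions and strongly negative sentences:

def flag_sentences(response_json, polarity_cutoff=-0.5):
    """Print questions and strongly negative sentences with their timestamps."""
    for item in response_json["data"]["dataInsights"]["transcriptInsights"]:
        start_s = item["startTime"] / 1000.0  # startTime/endTime are given in milliseconds
        if item["speechType"] == "question":  # lowercased value assumed, as in the sample ("statement")
            print("Question at {:.1f}s (speaker {}): {}".format(start_s, item["speaker"], item["sentence"]))
        if item["polarity"] <= polarity_cutoff:  # polarity ranges from -1 (very negative) to 1 (very positive)
            print("Negative ({:+.2f}) at {:.1f}s: {}".format(item["polarity"], start_s, item["sentence"]))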

meetingInsights Object

meetingSentiment: List of meeting sentiment objects.
meetingSentiment.sentiment: A specific sentiment identified in the meeting.
meetingSentiment.value: Value specifying the presence of the given sentiment in the meeting, ranging from 0 (not present) to 1 (the only sentiment present). Multiplying this by 100 gives a percentage representation.
meetingEmotion: List of meeting emotion objects.
meetingEmotion.emotion: A specific emotion identified in the meeting.
meetingEmotion.value: Value specifying the presence of the given emotion in the meeting, ranging from 0 (not present) to 1 (the only emotion present). Multiplying this by 100 gives a percentage representation.
conversationStartTime: Point in time at which the first conversation was initiated in the meeting, in milliseconds.
engagementRatio: Value indicating how active the meeting was, ranging from 0 (no activity at all) to 1 (active throughout).
keywords: List of keyword objects identified in the meeting.
keywords.keyword: A specific keyword identified in the meeting.
keywords.frequency: Frequency of the given keyword in the meeting.
deadAir: The calculated inactive time in the meeting. This will vary depending on the dead air threshold given.
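
A small worked example of the 0-to-1 values above (assuming a parsed response dict shaped like the example earlier): converting meetingSentiment to percentages and picking the dominant emotion.

def meeting_breakdown(response_json):
    """Report meeting-level sentiment shares and the dominant emotion."""
    meeting = response_json["data"]["dataInsights"]["meetingInsights"]
    for entry in meeting["meetingSentiment"]:
        # Each value is a 0-1 share of the meeting; multiply by 100 for a percentage.
        print("{}: {:.1f}%".format(entry["sentiment"], entry["value"] * 100))
    dominant = max(meeting["meetingEmotion"], key=lambda e: e["value"])
    print("Dominant emotion: {} ({:.1f}%)".format(dominant["emotion"], dominant["value"] * 100))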

speakerInsights Object

speakers: List of speakers present in the meeting.
speakersTalktimePc: Object representing the talk time ratio of each speaker in the meeting.
speakersTalktime: Object representing the talk time, in milliseconds, of each speaker in the meeting.
speakersMonologue: Object representing the longest monologue, in milliseconds, of each speaker in the meeting.
speakersEmotion: Different emotions and their ratios for all speakers in the meeting. This can help identify the emotion of a specific speaker during the meeting.
speakersEmotion.userId[index].emotion: A specific emotion of a specific speaker during the meeting.
speakersEmotion.userId[index].value: Value specifying the presence of the given emotion for a specific speaker in the meeting, ranging from 0 (not present) to 1 (the only emotion present). Multiplying this by 100 gives a percentage representation.
speakerAvgWpm: The average words per minute spoken by each speaker.
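
And a per-speaker summary sketch under the same assumptions; treating speakersMonologue as milliseconds is an assumption based on the other timing fields:

def speaker_report(response_json):
    """Print talk time, longest monologue, and average WPM for each speaker."""
    speakers = response_json["data"]["dataInsights"]["speakerInsights"]
    for spk in speakers["speakers"]:
        talk_pc = speakers["speakersTalktimePc"][spk] * 100        # share of total talk time
        talk_s = speakers["speakersTalktime"][spk] / 1000.0        # talk time in seconds
        monologue_s = speakers["speakersMonologue"][spk] / 1000.0  # longest monologue (assumed milliseconds)
        wpm = speakers["speakerAvgWpm"][spk]
        print("Speaker {}: {:.1f}% talk time ({:.1f}s), longest monologue {:.1f}s, {:.0f} WPM".format(
            spk, talk_pc, talk_s, monologue_s, wpm))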