Speech Insights (BETA)
Enable this model configuration to get conversational insights that can measure, or help measure, many of your KPIs.
This API is in BETA and is provided on request. Please contact [email protected] to enable this API.
Perform in-depth analysis of conversational data to visualize trends in topics, sentiments, keywords, and behaviors, and achieve better outcomes.
Marsview provides a way to capture the engagement level of speakers in real time. Additionally, you can track user sentiment and emotions along with engagement data.
For each conversation/file uploaded, the API returns the following insights:
| Insight | Description |
| --- | --- |
| Talk-to-listen Ratio | Each speaker's talk and listen ratio and time. |
| Speech Insights | Speaker-level insights such as longest monologue, filler words used, speech clarity, etc. |
| Call Sentiment Score | An overall assessment of the conversation sentiment based on the sentiments, emotions, and tone used in the conversation. |
| Call Engagement Score | An overall assessment of the conversation engagement based on talk time, dead air, and other factors. |
| Call Score | Scores the call based on different quantitative and qualitative measurements of the conversation. This can be further customized to the business need. |
| Avg. Speech Speed | Speech speed per speaker in WPM (words per minute). |
| Sentiment vs Time | Variations in sentiment over the course of the call, for each speaker individually and combined. |
| Phrase Cloud (by Topics Type) | |
| Topic Sentiment over Time | Variations in sentiment over the course of the call, for each speaker individually and combined, along with the corresponding topics mentioned. |
| Speaker Emotions over Time | Variations in emotions over the course of the call, for each speaker individually and combined. |
| Dead Air | Timestamps of dead air (silence) found during the conversation. |
| Key | Value |
| --- | --- |
| modelType | data_insights |
| modelConfig | Model Configuration object for data_insights |
| modelConfig | Description | Defaults |
| --- | --- | --- |
| dead_air.threshold | The time threshold (in milliseconds) beyond which silence in a meeting is considered dead air. | 3000 |
Curl
Python
curl --location --request POST 'https://api.marsview.ai/cb/v1/conversation/{{userId}}/compute' \
--header 'Content-Type: application/json' \
--header "Authorization:{{Insert Auth Token}}" \
--data-raw '{
"userId":"{{Insert User ID}}",
"txnId": "{{Insert txn ID}}",
"enableModels":[
{
"modelType":"speech_to_text",
"modelConfig":{
"automatic_punctuation" : true,
"custom_vocabulary":["Marsview", "Communication"],
"speaker_seperation":{
"num_speakers":2
},
"enableKeywords":true,
"enableTopics":true
}
},
{
"modelType":"emotion_analysis"
},
{
"modelType":"sentiment_analysis"
},
{
"modelType":"data_insights",
"modelConfig": {
"dead_air": {
"threshold": 3000
}
}
}
]
}'
import requests

user_id = "[email protected]"
auth_token = "Replace this with your auth token"
txn_id = "Replace this with your transaction ID"

# Note: the speech_to_text model does not depend on any other models,
# hence it can be used independently.
def get_speech_insights():
    url = "https://api.marsview.ai/cb/v1/conversation/{user_id}/compute"
    payload = {
        "userId": user_id,
        "txnId": txn_id,
        "enableModels": [
            {
                "modelType": "speech_to_text",
                "modelConfig": {
                    "automatic_punctuation": True,
                    "custom_vocabulary": ["Marsview", "Communication"],
                    "speaker_seperation": {
                        "num_speakers": 2
                    },
                    "enableKeywords": True,
                    "enableTopics": False
                }
            },
            {
                "modelType": "emotion_analysis"
            },
            {
                "modelType": "sentiment_analysis"
            },
            {
                "modelType": "data_insights",
                "modelConfig": {
                    "dead_air": {
                        "threshold": 3000
                    }
                }
            }
        ]
    }
    headers = {"authorization": auth_token}
    response = requests.request("POST", url.format(user_id=user_id), headers=headers, json=payload)
    print(response.text)
    if response.status_code == 200 and response.json()["status"] == "true":
        return response.json()["data"]["enableModels"]["state"]["status"]
    else:
        raise Exception("Speech insights request failed: {}".format(response.text))

if __name__ == "__main__":
    get_speech_insights()
"data":{
"dataInsights": {
"meetingInsights": {
"meetingSentiment": [
{
"sentiment": "Very Positive",
"value": 0.17777777777777778
},
{
"sentiment": "Mostly Positive",
"value": 0.15555555555555556
},
{
"sentiment": "Neutral",
"value": 0.6666666666666666
},
{
"sentiment": "Mostly Negative",
"value": 0
},
{
"sentiment": "Very Negative",
"value": 0
}
],
"meetingEmotion": [
{
"emotion": "joy",
"value": 0.007523583540714161
},
{
"emotion": "optimism",
"value": 0.3097979975693039
},
{
"emotion": "anticipation",
"value": 0.2518085595231206
},
{
"emotion": "Misc",
"value": 0.2654667631228659
},
{
"emotion": "anger",
"value": 0.07488859308987789
},
{
"emotion": "fear",
"value": 0.08391689912610677
},
{
"emotion": "sadness",
"value": 0.00659760402801088
}
],
"conversationStartTime": 1390,
"engagementRatio": 0.9813153112221717,
"keywords": [
{
"keyword": "will",
"frequency": 2
},
],
"deadAir": 0.015454716272078131
},
"speakerInsights": {
"speakers": [
"-1"
],
"speakersTalktimePc": {
"-1": 0.007523583540714161
},
"speakersTalktime": {
"-1": 1300
},
"speakersMonologue": {
"-1": 13750
},
"speakersEmotion": {
"-1": [
{
"emotion": "joy",
"value": 0.007523583540714161
},
{
"emotion": "optimism",
"value": 0.3097979975693039
},
{
"emotion": "anticipation",
"value": 0.2518085595231206
},
{
"emotion": "Misc",
"value": 0.2654667631228659
},
{
"emotion": "anger",
"value": 0.07488859308987789
},
{
"emotion": "fear",
"value": 0.08391689912610677
},
{
"emotion": "sadness",
"value": 0.00659760402801088
}
]
},
"speakersSentiment": {
"-1": [
{
"sentiment": "Very Positive",
"value": 0.17777777777777778
},
{
"sentiment": "Mostly Positive",
"value": 0.15555555555555556
},
{
"sentiment": "Neutral",
"value": 0.6666666666666666
},
{
"sentiment": "Mostly Negative",
"value": 0
},
{
"sentiment": "Very Negative",
"value": 0
}
]
},
"speakerAvgWpm": {
"-1": 163.55519043739972
}
},
"transcriptInsights": [
{
"sentence": "I am currently attending Nicholas State University to complete my degree in secondary education with a focus on social studies.",
"startTime": 56180,
"endTime": 64779.999,
"speaker": "-1",
"topics": [
{
"tiers": [
{
"tierName": "Education",
"type": 1
}
],
"name": "Secondary Education"
},
],
"keywords": [
"state",
"education",
"focus",
"Nicholas State University"
],
"speechType": "statement",
"speechTypeConfidence": 0.9999955892562866,
"sentiment": "Neutral",
"polarity": -0.041666666666666664,
"subjectivity": 0.2916666666666667,
"tone": "angry",
"toneConfidence": 0.7229840755462646,
"emotion": "optimism",
"emotionConfidence": 0.640371561050415,
"wordsPerMinute": 132.57355506454238
},
]
},
},
| Field | Description |
| --- | --- |
| dataInsights | Data insights object containing all the insights for the given audio/video. |
| transcriptInsights | List of transcript insight objects, one for each sentence identified by the model. |
| meetingInsights | Object containing all the insights of the meeting. |
| speakerInsights | Object containing all the insights of the speakers in the meeting. |
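For orientation, here is a minimal sketch of separating these three nested insight objects, assuming the parsed JSON response shown above is held in a Python dict named response_data (a name used here purely for illustration):

```python
# Sketch: split the dataInsights object into its three parts.
# "response_data" is an assumed name for the parsed JSON response shown above.
data_insights = response_data["data"]["dataInsights"]

meeting_insights = data_insights["meetingInsights"]        # meeting-level insights
speaker_insights = data_insights["speakerInsights"]        # per-speaker insights
transcript_insights = data_insights["transcriptInsights"]  # per-sentence insights

print(meeting_insights["engagementRatio"], len(transcript_insights))
```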
| Field | Description |
| --- | --- |
| sentence | Sentence identified in the given time frame. |
| startTime | Start time of the sentence in the input video/audio, in milliseconds. |
| endTime | End time of the sentence in the input video/audio, in milliseconds. |
| speaker | Speaker ID of the voice identified in the given time frame. |
| topics | List of topic objects identified in the given time frame. |
| keywords | List of keywords found in the given sentence. |
| speechType | The type of speech that best represents the sentence identified in the given time frame, e.g. Statement, Question. |
| speechTypeConfidence | The model's confidence in the predicted speechType. |
| sentiment | Sentiment of the speaker during the given time frame. |
| polarity | Numeric representation of the sentiment of the speaker. Values range between -1 and 1, -1 being very negative and 1 being very positive. |
| subjectivity | A scale of how much the sentence is based on facts and figures. A high subjectivity indicates that the information given by the speaker is not based on facts and is highly subjective. |
| tone | Tone of the speaker in the given time frame. |
| toneConfidence | Value indicating the model's confidence in the predicted tone value. |
| emotion | Emotion of the speaker in the given time frame. |
| emotionConfidence | Value indicating the model's confidence in the predicted emotion value. |
| wordsPerMinute | Average words per minute spoken by the speaker in the given time frame. |
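As an example of consuming these per-sentence fields, the sketch below (reusing the transcript_insights list from the earlier sketch) collects the questions asked and the sentences with negative polarity:

```python
# Sketch: scan the per-sentence transcript insights for questions and
# negatively polarized sentences. "transcript_insights" is the
# transcriptInsights list from the earlier sketch.
questions = [
    item["sentence"]
    for item in transcript_insights
    if item["speechType"].lower() == "question"
]

negative_sentences = [
    (item["speaker"], item["sentence"], item["polarity"])
    for item in transcript_insights
    if item["polarity"] < 0
]

print("Questions asked:", questions)
print("Negative sentences:", negative_sentences)
```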
| Key | Description |
| --- | --- |
| meetingSentiment | List of meeting sentiment objects. |
| meetingSentiment.sentiment | A specific sentiment identified in the meeting. |
| meetingSentiment.value | Value specifying the presence of the given sentiment in the meeting. This value ranges from 0 to 1, 0 meaning it wasn't present and 1 meaning only that sentiment was present. Multiplying this by 100 gives a percentage representation of the same. |
| meetingEmotion | List of meeting emotion objects. |
| meetingEmotion.emotion | A specific emotion identified in the meeting. |
| meetingEmotion.value | Value specifying the presence of the given emotion in the meeting. This value ranges from 0 to 1, 0 meaning it wasn't present and 1 meaning only that emotion was present. Multiplying this by 100 gives a percentage representation of the same. |
| conversationStartTime | Point of time at which the first conversation was initiated in the meeting, in milliseconds. |
| engagementRatio | Value indicating how active the meeting was. This value ranges between 0 and 1, 0 being no activity at all and 1 being active throughout. |
| keywords | List of keyword objects identified in the meeting. |
| keywords.keyword | A specific keyword identified in the meeting. |
| keywords.frequency | Frequency of the given keyword in the meeting. |
| deadAir | The calculated inactive time in the meeting. This will vary depending on the dead air threshold given. |
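Since the sentiment and emotion values are 0-1 ratios, a common step is converting them to percentages, as the descriptions above suggest. A minimal sketch, reusing the meeting_insights object from the earlier sketch:

```python
# Sketch: convert the 0-1 sentiment/emotion ratios into percentages.
# "meeting_insights" is the meetingInsights object from the earlier sketch.
sentiment_pct = {
    entry["sentiment"]: round(entry["value"] * 100, 1)
    for entry in meeting_insights["meetingSentiment"]
}
emotion_pct = {
    entry["emotion"]: round(entry["value"] * 100, 1)
    for entry in meeting_insights["meetingEmotion"]
}

print(sentiment_pct)  # e.g. {"Very Positive": 17.8, "Mostly Positive": 15.6, ...}
print(emotion_pct)
```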
| Key | Description |
| --- | --- |
| speakers | List of speakers present in the meeting. |
| speakersTalktimePc | Object representing the talk time ratio of each speaker in the meeting. |
| speakersTalktime | Object representing the talk time, in milliseconds, of each speaker in the meeting. |
| speakersMonologue | Object representing the longest monologue duration, in milliseconds, of each speaker in the meeting. |
| speakersEmotion | Different emotions and their ratios for all speakers in the meeting. This can help identify the emotion of specific speakers during the meeting. |
| speakersEmotion.userId[index].emotion | A specific emotion of a specific speaker during the meeting. |
| speakersEmotion.userId[index].value | Value specifying the presence of the given emotion for a specific speaker in the meeting. This value ranges from 0 to 1, 0 meaning it wasn't present and 1 meaning only that emotion was present. Multiplying this by 100 gives a percentage representation of the same. |
| speakerAvgWpm | The average words per minute spoken by each speaker. |
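To tie these fields together, the sketch below (reusing the speaker_insights object from the earlier sketch) builds a simple per-speaker summary of talk time, average WPM, and dominant emotion:

```python
# Sketch: build a per-speaker summary from the speakerInsights object.
# "speaker_insights" is the speakerInsights object from the earlier sketch.
for speaker_id in speaker_insights["speakers"]:
    emotions = speaker_insights["speakersEmotion"][speaker_id]
    summary = {
        "talk_time_pct": round(speaker_insights["speakersTalktimePc"][speaker_id] * 100, 1),
        "talk_time_ms": speaker_insights["speakersTalktime"][speaker_id],
        "avg_wpm": round(speaker_insights["speakerAvgWpm"][speaker_id], 1),
        "dominant_emotion": max(emotions, key=lambda e: e["value"])["emotion"],
    }
    print(speaker_id, summary)
```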