Emotion & Tone (BETA)

Enable this model configuration to analyze a speaker's tone (acoustic analysis) and emotions based on the spoken text (Lexical Emotion Analysis).

Overview

Emotion Analysis

The Emotion Analysis model helps you understand and interpret speaker emotions in a conversation or text. It is designed to understand human conversation in the form of free text or spoken text, and is modeled on the emotion wheel.
The emotion wheel describes eight basic emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.

Emotion Types

Types of emotions detected when this model configuration is enabled in the Speech Analytics API:
Admiration, Amusement, Anger, Annoyance, Approval, Caring, Confusion, Curiosity, Desire, Disappointment, Disapproval, Disgust, Embarrassment, Excitement, Fear, Gratitude, Grief, Joy, Love, Nervousness, Optimism, Pride, Realization, Relief, Remorse, Sadness, Surprise, Neutral
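
When consuming results, it can be useful to check a returned label against the list above. The sketch below assumes labels may arrive in a different letter case than listed here (the example response on this page shows "JOY"). The names EMOTION_LABELS and normalize_emotion are illustrative and are not part of the Marsview API.

```python
# Labels copied from the list above. The set and the helper are
# hypothetical names used only for this illustration.
EMOTION_LABELS = frozenset({
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
})


def normalize_emotion(value):
    """Return the lower-cased label if it is a documented emotion, else None."""
    label = value.strip().lower()
    return label if label in EMOTION_LABELS else None


print(normalize_emotion("JOY"))
```

Returning None for unknown labels lets calling code decide whether to drop, log, or pass through values that are not in the documented list.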

Tone Analysis

Tone Analysis infers speaker emotion using only audio cues. A speaker may convey emotion through tone of voice, and capturing this is important for gauging the overall sentiment or mood of the conversation, which conventional Lexical Emotion Analysis cannot extract from the words alone.
Marsview's proprietary Tone Analysis AI can detect intonation at the statement level.

Types of Tone

Marsview is capable of detecting the following tones in an audio file:
negative, positive, neutral, slightly-negative

modelTypeConfiguration

| Keys | Value |
| --- | --- |
| modelType | emotion_analysis |
| modelConfig | Model Configuration object for emotion_analysis (no configurations) |
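
As a minimal sketch of how this configuration fits into a request body, the helper below builds the enableModels list in Python. It assumes, as the example request on this page notes, that emotion_analysis relies on the speech_to_text output, so both models are enabled together. The function name build_compute_payload and its num_speakers parameter are hypothetical and are not part of the Marsview API.

```python
import json


def build_compute_payload(txn_id, num_speakers=2):
    """Build a request body that enables speech_to_text and emotion_analysis."""
    return {
        "txnId": txn_id,
        "enableModels": [
            {
                "modelType": "speech_to_text",
                "modelConfig": {
                    "automatic_punctuation": True,
                    # Key spelling follows the example request on this page.
                    "speaker_seperation": {"num_speakers": num_speakers},
                },
            },
            # emotion_analysis takes no configuration, so modelConfig is omitted.
            {"modelType": "emotion_analysis"},
        ],
    }


# "example-txn-id" is a placeholder, not a real transaction id.
payload = build_compute_payload("example-txn-id")
print(json.dumps(payload, indent=2))
```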

Example Request

Curl
curl --location --request POST 'https://api.marsview.ai/cb/v1/conversation/compute' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: {{Insert Auth Token With Type}}' \
  --data-raw '{
    "txnId": "{{Insert txn ID}}",
    "enableModels": [
      {
        "modelType": "speech_to_text",
        "modelConfig": {
          "automatic_punctuation": true,
          "custom_vocabulary": ["Marsview", "Communication"],
          "speaker_seperation": {
            "num_speakers": 2
          },
          "enableKeywords": true,
          "enableTopics": false
        }
      },
      {
        "modelType": "emotion_analysis"
      }
    ]
  }'
Python
import requests

auth_token = "Replace this with your auth token"
txn_id = "Replace this with your txn id"
request_url = "https://api.marsview.ai/cb/v1/conversation/compute"


# Note: emotion analysis depends on the output of the speech-to-text model,
# so both models must be included in the request for this to work.
def get_emotion_and_tone():
    payload = {
        "txnId": txn_id,
        "enableModels": [
            {
                "modelType": "speech_to_text",
                "modelConfig": {
                    "automatic_punctuation": True,
                    "custom_vocabulary": ["Marsview", "Communication"],
                    "speaker_seperation": {
                        "num_speakers": 2
                    },
                    "enableKeywords": True,
                    "enableTopics": False
                }
            },
            {
                "modelType": "emotion_analysis"
            },
        ]
    }
    headers = {"authorization": auth_token}
    # Send the compute request to the endpoint defined in request_url.
    response = requests.request("POST", request_url, headers=headers, json=payload)
    print(response.text)
    if response.status_code == 200 and response.json()["status"] == "true":
        return response.json()["data"]["enableModels"]["state"]["status"]
    else:
        raise Exception("Compute request failed: {}".format(response.text))


if __name__ == "__main__":
    get_emotion_and_tone()

Example Metadata Response

"data": {
  "emotion": [
    {
      "transcript": "Good evening teresa.",
      "startTime": 1390,
      "endTime": 2690,
      "speaker": "1",
      "tone": {
        "value": "calm",
        "confidence": 0.9030694961547852
      },
      "emotion": {
        "confidence": 0.9549336433410645,
        "value": "JOY"
      },
      "wordsPerMinute": 92.3076923076923
    }
  ]
}
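
A response shaped like the one above can be summarized per speaker. The following is a sketch only: sample_data copies the values from the example response, and the helper emotions_by_speaker is a hypothetical name, not part of the Marsview API.

```python
from collections import Counter, defaultdict

# Sample shaped like the "data" object in the example response above.
sample_data = {
    "emotion": [
        {
            "transcript": "Good evening teresa.",
            "startTime": 1390,
            "endTime": 2690,
            "speaker": "1",
            "tone": {"value": "calm", "confidence": 0.9030694961547852},
            "emotion": {"confidence": 0.9549336433410645, "value": "JOY"},
            "wordsPerMinute": 92.3076923076923,
        }
    ]
}


def emotions_by_speaker(data):
    """Count how often each emotion label occurs for each speaker id."""
    counts = defaultdict(Counter)
    for segment in data.get("emotion", []):
        counts[segment["speaker"]][segment["emotion"]["value"]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {speaker: dict(c) for speaker, c in counts.items()}


print(emotions_by_speaker(sample_data))  # {'1': {'JOY': 1}}
```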

Response Object

| Field | Description |
| --- | --- |
| emotion | A list of emotion objects. |
| transcript | The sentence for which emotion is being analyzed. |
| startTime | Start time of the sentence in the input video/audio, in milliseconds. |
| endTime | End time of the sentence in the input video/audio, in milliseconds. |
| speaker | ID of the speaker whose voice is identified in the given time frame. |
| tone | Object that describes the tone of the speaker. |
| tone[value] | Tone of the speaker in the given time frame. |
| tone[confidence] | Value indicating the model's confidence in the predicted tone value. |
| emotion (object) | Object that describes the emotion of the speaker. |
| emotion[confidence] | Value indicating the model's confidence in the predicted emotion value. |
| emotion[value] | Emotion of the speaker in the given time frame. |
| wordsPerMinute | Average words per minute spoken by the speaker. |
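
The fields described above can be combined in client code, for example to compute a segment's duration from the millisecond timestamps or to keep only confident predictions. This is a sketch under assumptions: the 0.8 threshold is arbitrary, confidence values are assumed to lie between 0 and 1 (as in the example response), and both helper names are hypothetical.

```python
def segment_duration_seconds(segment):
    """Convert the millisecond startTime/endTime pair into a duration in seconds."""
    return (segment["endTime"] - segment["startTime"]) / 1000.0


def confident_segments(segments, min_confidence=0.8):
    """Keep segments whose tone and emotion confidence both reach the threshold."""
    return [
        s for s in segments
        if s["tone"]["confidence"] >= min_confidence
        and s["emotion"]["confidence"] >= min_confidence
    ]


# Values copied from the example response on this page.
segment = {
    "transcript": "Good evening teresa.",
    "startTime": 1390,
    "endTime": 2690,
    "speaker": "1",
    "tone": {"value": "calm", "confidence": 0.9030694961547852},
    "emotion": {"confidence": 0.9549336433410645, "value": "JOY"},
    "wordsPerMinute": 92.3076923076923,
}

print(segment_duration_seconds(segment))
print(len(confident_segments([segment])))
```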