Speech Insights (BETA)

Enable this model configuration to get conversational insights that measure, or help you measure, many of your KPIs.

This API is in BETA and will be provided on request. Please contact support@marsview.ai to enable this API.

Overview

Perform in-depth analysis of conversational data to visualize trends on topics, sentiments, keywords, and behaviors to achieve better outcomes.

Marsview provides a way to capture the engagement level of speakers in real time. You can also track user sentiment and emotions alongside the engagement data.

Insights

For each uploaded conversation or file, the API returns the following insights:

| Insight | Description |
| --- | --- |
| Talk-to-listen Ratio | Each speaker's talk and listen ratio and time. |
| Speech Insights | Speaker-level insights such as longest monologue, filler words used, and speech clarity. |
| Call Sentiment Score | An overall assessment of the conversation's sentiment, based on the sentiments, emotions, and tone used in the conversation. |
| Call Engagement Score | An overall assessment of the conversation's engagement, based on talk time, dead air, and other factors. |
| Call Score | Scores the call based on quantitative and qualitative measurements of the conversation. This can be further customized to the business need. |
| Avg. Speech Speed | Speech speed of each speaker in words per minute (WPM). |
| Sentiment vs Time | Variations in sentiment over the course of the call, for each speaker individually and combined. |
| Phrase Cloud (by Topics Type) | Salient topics found or spoken in the conversation. |
| Topic Sentiment over Time | Variations in sentiment over the course of the call, for each speaker individually and combined, along with the corresponding topics mentioned. |
| Speaker Emotions over Time | Variations in emotions over the course of the call, for each speaker individually and combined. |
| Dead Air | Timestamps of dead air (silence) found during the conversation. |

modelTypeConfiguration

| Key | Value |
| --- | --- |
| modelType | data_insights |
| modelConfig | Model configuration object for data_insights |

modelConfig Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| dead_air.threshold | The time threshold (in milliseconds) beyond which silence in a meeting is considered dead air. | 3000 |
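
To illustrate what the threshold means, here is a minimal sketch that flags silent gaps between speech segments. It is not Marsview's implementation: the function name `find_dead_air` and the `(start_ms, end_ms)` segment format are assumptions made for this example.

```python
from typing import List, Tuple


def find_dead_air(
    segments: List[Tuple[float, float]], threshold_ms: float = 3000
) -> List[Tuple[float, float]]:
    """Return (start_ms, end_ms) gaps between speech segments longer than threshold_ms.

    Hypothetical helper: `segments` is a list of (start_ms, end_ms) speech intervals.
    """
    gaps = []
    last_end = None
    for start, end in sorted(segments):
        # A gap counts as dead air only when it exceeds the threshold.
        if last_end is not None and start - last_end > threshold_ms:
            gaps.append((last_end, start))
        # Track the furthest end seen so overlapping segments are handled.
        last_end = end if last_end is None else max(last_end, end)
    return gaps


segments = [(0, 4000), (5000, 9000), (13500, 15000)]
print(find_dead_air(segments))                    # [(9000, 13500)]
print(find_dead_air(segments, threshold_ms=500))  # [(4000, 5000), (9000, 13500)]
```

Lowering the threshold reports shorter pauses as dead air, which is why the `deadAir` value in the response depends on this setting.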

Example Request

curl --location --request POST 'https://api.marsview.ai/cb/v1/conversation/{{userId}}/compute' \
--header 'Content-Type: application/json' \
--header "Authorization:{{Insert Auth Token}}" \
--data-raw '{
        "userId":"{{Insert User ID}}",
        "txnId": "{{Insert txn ID}}",
        "enableModels":[
            {
            "modelType":"speech_to_text",
                "modelConfig":{
                    "automatic_punctuation" : true,
                    "custom_vocabulary":["Marsview", "Communication"],
                    "speaker_seperation":{
                        "num_speakers":2
                    },
                    "enableKeywords":true,
                    "enableTopics":true
                    }  
            },
            {
            "modelType":"emotion_analysis"
            },
            {
            "modelType":"sentiment_analysis"
            },
            {
            "modelType":"data_insights",
            "modelConfig": {
                "dead_air": {
                    "threshold": 3000
                    }
                }
            }
        ]
}'
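
The same request can be assembled in Python with only the standard library. This is a sketch rather than an official client: the helper name `build_compute_request` and the sample IDs and token are placeholders, while the URL, headers, and payload keys are copied from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.marsview.ai/cb/v1/conversation"


def build_compute_request(user_id, txn_id, auth_token, dead_air_threshold_ms=3000):
    """Build (but do not send) the compute request shown in the curl example."""
    payload = {
        "userId": user_id,
        "txnId": txn_id,
        "enableModels": [
            {
                "modelType": "speech_to_text",
                "modelConfig": {
                    "automatic_punctuation": True,
                    "custom_vocabulary": ["Marsview", "Communication"],
                    # Key spelling kept exactly as in the curl example.
                    "speaker_seperation": {"num_speakers": 2},
                    "enableKeywords": True,
                    "enableTopics": True,
                },
            },
            {"modelType": "emotion_analysis"},
            {"modelType": "sentiment_analysis"},
            {
                "modelType": "data_insights",
                "modelConfig": {"dead_air": {"threshold": dead_air_threshold_ms}},
            },
        ],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{user_id}/compute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_token},
        method="POST",
    )


# Placeholder values; substitute your own user ID, txn ID, and auth token.
req = build_compute_request("user-123", "txn-456", "my-token")
print(req.get_method(), req.full_url)
# To send the request: urllib.request.urlopen(req)
```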

Example Metadata Response

"data":{
        "dataInsights": {
            "meetingInsights": {
                "meetingSentiment": [
                    {
                        "sentiment": "Very Positive",
                        "value": 0.17777777777777778
                    },
                    {
                        "sentiment": "Mostly Positive",
                        "value": 0.15555555555555556
                    },
                    {
                        "sentiment": "Neutral",
                        "value": 0.6666666666666666
                    },
                    {
                        "sentiment": "Mostly Negative",
                        "value": 0
                    },
                    {
                        "sentiment": "Very Negative",
                        "value": 0
                    }
                ],
                "meetingEmotion": [
                    {
                        "emotion": "joy",
                        "value": 0.007523583540714161
                    },
                    {
                        "emotion": "optimism",
                        "value": 0.3097979975693039
                    },
                    {
                        "emotion": "anticipation",
                        "value": 0.2518085595231206
                    },
                    {
                        "emotion": "Misc",
                        "value": 0.2654667631228659
                    },
                    {
                        "emotion": "anger",
                        "value": 0.07488859308987789
                    },
                    {
                        "emotion": "fear",
                        "value": 0.08391689912610677
                    },
                    {
                        "emotion": "sadness",
                        "value": 0.00659760402801088
                    }
                ],
                "conversationStartTime": 1390,
                "engagementRatio": 0.9813153112221717,
                "keywords": [
                    {
                        "keyword": "will",
                        "frequency": 2
                    }
                ],
                "deadAir": 0.015454716272078131
            },
            "speakerInsights": {
                "speakers": [
                    "-1"
                ],
                "speakersTalktimePc": {
                    "-1": 0.007523583540714161
                },
                "speakersTalktime": {
                    "-1": 1300
                },
                "speakersMonologue": {
                    "-1": 13750
                },
                "speakersEmotion": {
                    "-1": [
                        {
                            "emotion": "joy",
                            "value": 0.007523583540714161
                        },
                        {
                            "emotion": "optimism",
                            "value": 0.3097979975693039
                        },
                        {
                            "emotion": "anticipation",
                            "value": 0.2518085595231206
                        },
                        {
                            "emotion": "Misc",
                            "value": 0.2654667631228659
                        },
                        {
                            "emotion": "anger",
                            "value": 0.07488859308987789
                        },
                        {
                            "emotion": "fear",
                            "value": 0.08391689912610677
                        },
                        {
                            "emotion": "sadness",
                            "value": 0.00659760402801088
                        }
                    ]
                },
                "speakersSentiment": {
                    "-1": [
                        {
                            "sentiment": "Very Positive",
                            "value": 0.17777777777777778
                        },
                        {
                            "sentiment": "Mostly Positive",
                            "value": 0.15555555555555556
                        },
                        {
                            "sentiment": "Neutral",
                            "value": 0.6666666666666666
                        },
                        {
                            "sentiment": "Mostly Negative",
                            "value": 0
                        },
                        {
                            "sentiment": "Very Negative",
                            "value": 0
                        }
                    ]
                },
                "speakerAvgWpm": {
                    "-1": 163.55519043739972
                }
            },
            "transcriptInsights": [

                {
                    "sentence": "I am currently attending Nicholas State University to complete my degree in secondary education with a focus on social studies.",
                    "startTime": 56180,
                    "endTime": 64779.999,
                    "speaker": "-1",
                    "topics": [
                        {
                            "tiers": [
                                {
                                    "tierName": "Education",
                                    "type": 1
                                }
                            ],
                            "name": "Secondary Education"
                        }
                    ],
                    "keywords": [
                        "state",
                        "education",
                        "focus",
                        "Nicholas State University"
                    ],
                    "speechType": "statement",
                    "speechTypeConfidence": 0.9999955892562866,
                    "sentiment": "Neutral",
                    "polarity": -0.041666666666666664,
                    "subjectivity": 0.2916666666666667,
                    "tone": "angry",
                    "toneConfidence": 0.7229840755462646,
                    "emotion": "optimism",
                    "emotionConfidence": 0.640371561050415,
                    "wordsPerMinute": 132.57355506454238
                }

            ]
        }
},

Response Objects

| Field | Description |
| --- | --- |
| dataInsights | Data insights object containing all the insights for the given audio/video |
| transcriptInsights | List of transcript insight objects, one for each sentence identified by the model |
| meetingInsights | Object containing all the insights of the meeting |
| speakerInsights | Object containing all the insights of the speakers in the meeting |

transcriptInsights List<Objects>

| Field | Description |
| --- | --- |
| sentence | Sentence identified in the given time frame |
| startTime | Start time of the sentence in the input video/audio, in milliseconds |
| endTime | End time of the sentence in the input video/audio, in milliseconds |
| speaker | ID of the speaker whose voice is identified in the given time frame |
| topics | List of topic objects identified in the given time frame |
| keywords | List of keywords found in the given sentence |
| speechType | The type of speech that best represents the sentence identified in the given time frame, for example statement or question |
| speechTypeConfidence | The model's confidence in the predicted speechType |
| sentiment | Sentiment of the speaker during the given time frame |
| polarity | Numeric representation of the sentiment of the speaker. Ranges from -1 to 1, where -1 is very negative and 1 is very positive. |
| subjectivity | A scale of how much the sentence relies on opinion rather than on facts and figures. High subjectivity indicates that the information given by the speaker is not based on facts. |
| tone | Tone of the speaker in the given time frame |
| toneConfidence | The model's confidence in the predicted tone value |
| emotion | Emotion of the speaker in the given time frame |
| emotionConfidence | The model's confidence in the predicted emotion value |
| wordsPerMinute | Average words per minute spoken by the speaker in the given time frame |
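
As a rough sketch of how these fields can be consumed, the snippet below groups `transcriptInsights` entries by speaker and totals speaking time and topics. The function name `summarize_transcript` is an assumption for this example; the field names come from the example response, and the sample entry is trimmed from it.

```python
from collections import defaultdict


def summarize_transcript(transcript_insights):
    """Per speaker: total spoken milliseconds, sentence count, and topic names."""
    summary = defaultdict(lambda: {"spoken_ms": 0.0, "sentences": 0, "topics": set()})
    for item in transcript_insights:
        entry = summary[item["speaker"]]
        # startTime and endTime are both in milliseconds.
        entry["spoken_ms"] += item["endTime"] - item["startTime"]
        entry["sentences"] += 1
        for topic in item.get("topics", []):
            entry["topics"].add(topic["name"])
    return dict(summary)


# Trimmed copy of the sentence object from the example response.
sample = [
    {
        "startTime": 56180,
        "endTime": 64779.999,
        "speaker": "-1",
        "topics": [{"tiers": [{"tierName": "Education", "type": 1}],
                    "name": "Secondary Education"}],
        "speechType": "statement",
    }
]
result = summarize_transcript(sample)
print(round(result["-1"]["spoken_ms"]))  # 8600
print(result["-1"]["topics"])            # {'Secondary Education'}
```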

meetingInsights Object

| Key | Description |
| --- | --- |
| meetingSentiment | List of meeting sentiment objects |
| meetingSentiment.sentiment | A specific sentiment identified in the meeting |
| meetingSentiment.value | Value specifying the presence of the given sentiment in the meeting. Ranges from 0 to 1, where 0 means the sentiment was not present and 1 means only that sentiment was present. Multiply by 100 to get a percentage. |
| meetingEmotion | List of meeting emotion objects |
| meetingEmotion.emotion | A specific emotion identified in the meeting |
| meetingEmotion.value | Value specifying the presence of the given emotion in the meeting. Ranges from 0 to 1, where 0 means the emotion was not present and 1 means only that emotion was present. Multiply by 100 to get a percentage. |
| conversationStartTime | Point in time, in milliseconds, at which the first conversation was initiated in the meeting |
| engagementRatio | Value indicating how active the meeting was. Ranges from 0 to 1, where 0 means no activity at all and 1 means active throughout. |
| keywords | List of keyword objects identified in the meeting |
| keywords.keyword | A specific keyword identified in the meeting |
| keywords.frequency | Frequency of the given keyword in the meeting |
| deadAir | The calculated inactive time in the meeting. This varies with the dead air threshold given. |
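
The 0-to-1 values can be turned into percentages as described above. The following is a small sketch; the helper names `to_percentages` and `dominant` are assumptions for this example, and the sample list is copied from the example response.

```python
def to_percentages(entries, label_key):
    """Map each label to its value expressed as a percentage (one decimal place)."""
    return {e[label_key]: round(e["value"] * 100, 1) for e in entries}


def dominant(entries, label_key):
    """Return the label with the highest value; expects a non-empty list."""
    return max(entries, key=lambda e: e["value"])[label_key]


# meetingSentiment values from the example response.
meeting_sentiment = [
    {"sentiment": "Very Positive", "value": 0.17777777777777778},
    {"sentiment": "Mostly Positive", "value": 0.15555555555555556},
    {"sentiment": "Neutral", "value": 0.6666666666666666},
    {"sentiment": "Mostly Negative", "value": 0},
    {"sentiment": "Very Negative", "value": 0},
]

print(to_percentages(meeting_sentiment, "sentiment"))
print(dominant(meeting_sentiment, "sentiment"))  # Neutral
```

The same two helpers work for `meetingEmotion` by passing `"emotion"` as the label key.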

speakerInsights Object

| Key | Description |
| --- | --- |
| speakers | List of speakers present in the meeting |
| speakersTalktimePc | Object representing the talk time ratio of each user in the meeting |
| speakersTalktime | Object representing the talk time, in milliseconds, of each user in the meeting |
| speakersMonologue | |
| speakersEmotion | Different emotions and their ratios for all users in the meeting. This can help identify the emotion of specific users during the meeting. |
| speakersEmotion.userId[index].emotion | A specific emotion of a specific user during the meeting |
| speakersEmotion.userId[index].value | Value specifying the presence of the given emotion for a specific user in the meeting. Ranges from 0 to 1, where 0 means the emotion was not present and 1 means only that emotion was present. Multiply by 100 to get a percentage. |
| speakerAvgWpm | The average words per minute spoken by the speaker |
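
A per-speaker summary can be assembled from these objects. The sketch below is one possible approach, not part of the API: `speaker_report` is a hypothetical helper, the key names follow the example response, and the sample data is trimmed from it.

```python
def speaker_report(speaker_insights):
    """Collect talk time, average WPM, and strongest emotion for each speaker."""
    report = {}
    for speaker in speaker_insights.get("speakers", []):
        emotions = speaker_insights.get("speakersEmotion", {}).get(speaker, [])
        # Pick the emotion with the highest ratio, if any were returned.
        top_emotion = max(emotions, key=lambda e: e["value"])["emotion"] if emotions else None
        report[speaker] = {
            "talktime_ms": speaker_insights.get("speakersTalktime", {}).get(speaker),
            "avg_wpm": speaker_insights.get("speakerAvgWpm", {}).get(speaker),
            "top_emotion": top_emotion,
        }
    return report


# Trimmed copy of the speakerInsights object from the example response.
speaker_insights = {
    "speakers": ["-1"],
    "speakersTalktime": {"-1": 1300},
    "speakersEmotion": {
        "-1": [
            {"emotion": "joy", "value": 0.007523583540714161},
            {"emotion": "optimism", "value": 0.3097979975693039},
            {"emotion": "anticipation", "value": 0.2518085595231206},
        ]
    },
    "speakerAvgWpm": {"-1": 163.55519043739972},
}

print(speaker_report(speaker_insights)["-1"]["top_emotion"])  # optimism
```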
