Marsview Speech Analytics is a cloud-hosted or containerized API service that helps you accurately transcribe a conversation and discover insights. It bundles models for automatic speech recognition (ASR), tone analysis, and natural language classification to uncover topics, keywords, entities, and sentiment.

The sections below describe the configurable models that are bundled with the Speech Analytics API.

How It Works

Get Started in 4 Steps



Step 1: GET Access Token

Get your accessToken using apiKey and apiSecret

Step 2: POST Audio/Video

Submit an audio/video file or downloadable URL and you will receive a unique txnId

Step 3: POST Compute Request

Select and configure models. Using the txnId, you can now select models (enableModels), configure them (modelConfig), and submit a POST Compute Request. Each enabled model will be given a unique requestId.

Step 4: Visualize JSON Output

Receive an Output JSON using GET Request Output and use Marsview's Visualizer to visualize your output.


Key Concepts



  • apiKey: Unique API key for your userID

  • apiSecret: Unique API secret for your userID

  • email: Your registered email ID

  • accessToken: JWT token generated using apiKey and apiSecret; valid for 3600 seconds from the time of generation

  • txnId: Unique ID Marsview generates for each file/URL submitted

  • enableModels: The AI-powered models you choose to enable for Speech Analytics

  • modelConfig: Configuration of the enabled models to suit your use case

  • requestId: Unique ID Marsview generates for each model enabled in enableModels
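Because the accessToken expires 3600 seconds after generation, it helps to track when a cached token should be refreshed. A minimal sketch in Python (the one-minute safety margin is an arbitrary choice, not part of the API):

```python
import time

TOKEN_TTL_SECONDS = 3600  # documented accessToken lifetime


class TokenCache:
    """Caches an accessToken and reports when it should be refreshed."""

    def __init__(self, safety_margin=60):
        self.token = None
        self.issued_at = 0.0
        self.safety_margin = safety_margin  # refresh a little early

    def store(self, token, issued_at=None):
        self.token = token
        self.issued_at = time.time() if issued_at is None else issued_at

    def needs_refresh(self, now=None):
        now = time.time() if now is None else now
        age = now - self.issued_at
        return self.token is None or age >= TOKEN_TTL_SECONDS - self.safety_margin


cache = TokenCache()
cache.store("example-jwt", issued_at=1000.0)
fresh = cache.needs_refresh(now=1000.0 + 100)   # 100 s old: still valid
stale = cache.needs_refresh(now=1000.0 + 3590)  # inside the safety margin
```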

Step 1: GET Access Token

Please log in to https://app.marsview.ai to get your apiKey and apiSecret

Learn how to get your accessToken
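The token exchange can be sketched as follows using Python's standard library. The token endpoint URL and the "accessToken" response field are assumptions for illustration; take the real endpoint and response schema from the authentication guide linked above.

```python
import json
import urllib.request


def build_token_payload(api_key, api_secret):
    """Request body for the token exchange, using the documented
    apiKey/apiSecret terminology."""
    return {"apiKey": api_key, "apiSecret": api_secret}


def get_access_token(token_url, api_key, api_secret):
    """POST apiKey/apiSecret to the token endpoint and return the JWT.

    Pass the real token endpoint from the authentication guide as
    token_url; the 'accessToken' response field is an assumption.
    """
    body = json.dumps(build_token_payload(api_key, api_secret)).encode("utf-8")
    req = urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["accessToken"]


payload = build_token_payload("my-api-key", "my-api-secret")
```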

Step 2: POST Audio/Video Input

Submit an audio/video file or downloadable URL and you will receive a unique txnId

  1. Input an Audio/Video Stream - (contact [email protected])

  2. Process a Telephony Stream - PSTN/SIP (contact [email protected])

You will receive a unique txnId for your input.
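Submitting a downloadable URL could look like the sketch below. Only txnId itself is documented terminology; the upload endpoint, the Bearer Authorization scheme, the "url" request field, and the response shape are illustrative assumptions.

```python
import json
import urllib.request


def build_upload_request(upload_url, access_token, media_url):
    """Build a POST request submitting a downloadable audio/video URL.

    upload_url, the 'url' field, and the header scheme are assumptions;
    take the real endpoint and schema from the API reference.
    """
    body = json.dumps({"url": media_url}).encode("utf-8")
    return urllib.request.Request(
        upload_url,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_txn_id(response_json):
    """Pull the unique txnId out of the (assumed) response body."""
    return response_json["txnId"]


# Hypothetical response illustrating the documented txnId:
txn_id = extract_txn_id({"txnId": "txn-123"})
```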

Step 3: POST Compute Request: Enable and configure your models using enableModels and modelConfig

Using the txnId, you can now select models using enableModels, configure them using modelConfig, and submit a request using POST Compute Request.

Each enabled model will be given a unique requestId.
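A compute-request body built from the documented enableModels and modelConfig fields might be assembled like this. The model names and the exact nesting shown are illustrative assumptions; the per-model pages below give the real identifiers and configuration options.

```python
def build_compute_payload(txn_id, enable_models, model_config=None):
    """Assemble a POST Compute Request body from the documented
    txnId / enableModels / modelConfig fields (nesting is an assumption)."""
    payload = {
        "txnId": txn_id,
        "enableModels": list(enable_models),
    }
    if model_config:
        payload["modelConfig"] = model_config
    return payload


payload = build_compute_payload(
    "txn-123",
    ["speech_to_text", "sentiment_analysis"],  # illustrative model names
    model_config={"speech_to_text": {"speaker_separation": True}},
)
```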

The following models can be enabled using the txnId. Click on each model to learn how to enable and configure it.




Marsview Automatic Speech Recognition (ASR) technology accurately converts speech into text in live or batch mode. The API can be deployed in the cloud or on-premises. Get superior accuracy, speaker separation, punctuation, casing, word-level time markers, and more. (Supported language: English)

Speaker Separation

Automatically detects the number of speakers in your audio file and associates each word in the transcription text with its speaker.

Keywords & Topics

Extracts the most relevant topics, concepts, and discussion points from the conversation, generated for each paragraph spoken (Topics by Sentence).

Tone Analysis

Tone Analysis infers speaker emotion using only audio cues. A speaker's tone may carry emotion that conventional lexical emotion analysis cannot capture, and capturing it is important for gauging the overall sentiment/mood of the conversation.

Marsview is capable of detecting the following tones in an audio file:

  • Calm

  • Happy

  • Sad

  • Angry

  • Fearful

  • Disgust

  • Surprised

Emotion Analysis

The Emotion Analysis model helps you understand and interpret speaker emotions in a conversation or text. It is built to understand human conversation in the form of free or spoken text and is modeled after the emotion wheel.

Marsview is capable of detecting the following Emotions in an audio file:

  • Anger

  • Anticipation

  • Disgust

  • Fear

  • Joy

  • Love

  • Optimism

  • Pessimism

  • Sadness

  • Surprise

  • Trust

Sentiment Analysis

Sentiment Analysis will help you interpret and quantify if the conversation in the audio or text is Positive, Negative, or Neutral.

Speech/Conversation Type Detection

The Speech Type model helps you understand the type of conversation at any given time. Every phone call and every online or offline conversation can be broadly classified into four categories: Statement, Command, Action Item, or Question.

Action Items Detection

The Action Item API detects an event, task, activity, or action that needs to take place in the future (after the conversation). Action items can be of high priority, with a definite assignee and due date, or of lower priority, with a non-definite due date.

All action items are generated with action phrases, assignees, and due dates to make the output immediately consumable by your CRM or project management tools.
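Feeding those action items into a CRM or project tool could be sketched as below. The JSON field names ("phrase", "assignee", "dueDate") are hypothetical stand-ins for the real output schema, chosen only to mirror the phrase/assignee/due-date structure described above.

```python
def flatten_action_items(items):
    """Turn action-item dicts into (phrase, assignee, due_date) rows,
    tolerating lower-priority items that lack an assignee or a
    definite due date."""
    rows = []
    for item in items:
        rows.append((
            item["phrase"],
            item.get("assignee", "unassigned"),
            item.get("dueDate"),  # None for non-definite due dates
        ))
    return rows


rows = flatten_action_items([
    {"phrase": "Send the contract", "assignee": "Alice", "dueDate": "2021-07-01"},
    {"phrase": "Follow up on pricing"},
])
```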

Questions & Responses Detection

Automatically identifies questions or requests posed during the conversation, along with the corresponding responses, in a consumable form. The API attributes each question and response to its speaker.


Extractive Summary

Extractive summarization identifies the salient information in the conversation, which is then extracted and grouped together to form a concise summary.

Screengrabs (Chapter API)

Captures keyframes and slides from videos and screen shares in an online web conference.

Screen Activity

Identify and analyze the visual aspects of the meeting, along with the corresponding timestamps, with the Screen Activity API.

Marsview detects the following Screen Activity:

  • Screen Share

  • Interaction

  • Whiteboard

  • Presentation

Learn how to submit your Compute Request after configuring the required models.

Step 4: Visualize your Output JSON

Get your JSON output using the GET Request Output API.

Easily visualize your Marsview API-generated JSON output.
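Since models run asynchronously after the compute request, a client typically polls GET Request Output until results are ready. A sketch of that loop; the "status"/"output" fields and the "completed" value are assumptions about the response shape, not the documented schema:

```python
import time


def wait_for_output(fetch_output, request_id, poll_seconds=5, max_attempts=60):
    """Poll GET Request Output until the enabled model finishes.

    fetch_output is any callable that performs the GET request for a
    requestId and returns the parsed response JSON.
    """
    for _ in range(max_attempts):
        result = fetch_output(request_id)
        if result.get("status") == "completed":  # assumed status value
            return result["output"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"request {request_id} did not complete in time")


# Simulated responses standing in for real API calls:
responses = iter([
    {"status": "processing"},
    {"status": "completed", "output": {"ok": True}},
])
output = wait_for_output(lambda _rid: next(responses), "req-1", poll_seconds=0)
```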

Run on Postman

A graphical user interface for running the APIs.

Error Codes & Troubleshoot Guide

A list of error codes and their respective troubleshooting techniques.

Not able to troubleshoot?

Our support team is available to respond to user requests via email at [email protected]

  • 1st Response SLA is less than 24 hours.

  • Users must reach us by filling out the support form available here with their Full Name, Email Address and a brief description of the problem.

  • A support engineer will respond within 2 hours with a support case number.

  • You can also Book a Call with our engineers.


What can Speech Analytics do?

Speech analytics software helps mine and analyze audio data, detecting things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned; and more. Speech analytics tools can also identify if a customer is getting upset or frustrated.

How does Speech Analytics help improve customer experience?

  • Detect things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned

  • Adapt to customer’s sentiments in real time or improve after the fact

  • Identify customers at risk of churning and retain them

  • Gather insights to improve NPS, CSAT and CES scores

  • Use call transcripts for compliance and documentation

  • Listen to your customers - it pays!

What is the Marsview API platform?

The Marsview conversation self-service API platform offers a comprehensive suite of proprietary APIs and developer tools for automatic speech recognition, speaker separation, multi-modal emotion and sentiment recognition, intent recognition, time-sequenced visual recognition, and more. It is designed for demanding call center (CCAI) environments that handle millions of outbound and inbound sales and support calls. Marsview APIs provide end-to-end workflows spanning call listening, recording, insights generation, and Voice of Customer insights. Conversation APIs are also used in one-on-one through many-to-many conversations and meetings to automatically generate rich contextual feedback, key topics, moments, action items, Q&A, and summaries.