Overview
Marsview Speech Analytics is a cloud-hosted or containerized API service that helps you accurately transcribe conversations and discover insights. It bundles models for automatic speech recognition (ASR), tone analysis, and natural language classifiers that uncover topics, keywords, entities, and sentiments.
The steps below walk you through the request workflow; the configurable models bundled with the Speech Analytics API are listed further down.

| Steps | Description |
| --- | --- |
| 1 | Get your accessToken using your apiKey and apiSecret. |
| 2 | Submit an audio/video file or a downloadable URL; you will receive a unique txnId. |
| 3 | Select and configure models. Using the txnId, select models (enableModels), configure them (modelConfig), and submit a POST Compute Request. Each enabled model is given a unique requestId. |
| 4 | Receive the output JSON using GET Request Output and use Marsview's Visualizer to visualize your output. |
| Key Concepts | Description |
| --- | --- |
| apiKey | Unique API key for your userID. |
| apiSecret | Unique API secret for your userID. |
| userID | Your registered email ID. |
| accessToken | JWT token generated using your apiKey and apiSecret. The token is valid for 3600 seconds from the time of generation. |
| txnID | Unique transaction ID returned for each submitted file or URL. |
| enableModels | The AI-powered models you choose to enable for Speech Analytics. |
| modelConfig | Configuration of the enabled models to suit your use case. |
| requestID | Unique ID that Marsview generates for each model enabled in enableModels. |
Learn how to get your accessToken.
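To make this step concrete, here is a minimal sketch of exchanging an apiKey/apiSecret pair for an accessToken with the Python requests library. The base URL, endpoint path, and JSON field names below are assumptions made for illustration, not the documented Marsview API; follow the accessToken guide for the real request shape.

```python
# Hypothetical sketch: exchange an apiKey/apiSecret pair for an accessToken.
# The base URL, endpoint path, and field names are assumptions, not the
# documented Marsview API.
import requests

BASE_URL = "https://api.marsview.ai"  # assumed base URL

def get_access_token(api_key: str, api_secret: str) -> str:
    resp = requests.post(
        f"{BASE_URL}/auth/token",                 # hypothetical endpoint
        json={"apiKey": api_key, "apiSecret": api_secret},
        timeout=30,
    )
    resp.raise_for_status()
    # The returned JWT is valid for 3600 seconds from the time of generation,
    # so cache it and refresh it before it expires.
    return resp.json()["accessToken"]             # assumed response key
```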

Submit your conversation in one of the following ways (a minimal request sketch for the first two options follows this list):

1. Upload an audio/video file
2. Submit a downloadable URL
3. Input an Audio/Video Stream (contact [email protected])
4. Process a Telephony Stream, PSTN/SIP (contact [email protected])
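The sketch below shows what submitting a downloadable URL or uploading a local file might look like. The endpoint path, field names, and response key are assumptions for illustration only; refer to the file and URL submission guides for the documented API.

```python
# Hypothetical sketch: submit a downloadable URL or upload a local file and
# read back the txnId. Endpoint path and field names are assumptions.
import requests

BASE_URL = "https://api.marsview.ai"  # assumed base URL

def submit_media_url(access_token: str, media_url: str) -> str:
    """Register a downloadable audio/video URL and return its txnId."""
    resp = requests.post(
        f"{BASE_URL}/speech/transaction",         # hypothetical endpoint
        headers={"Authorization": f"Bearer {access_token}"},
        json={"url": media_url},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["txnId"]                   # assumed response key

def submit_media_file(access_token: str, path: str) -> str:
    """Upload a local audio/video file and return its txnId."""
    with open(path, "rb") as media:
        resp = requests.post(
            f"{BASE_URL}/speech/transaction",     # hypothetical endpoint
            headers={"Authorization": f"Bearer {access_token}"},
            files={"file": media},
            timeout=600,
        )
    resp.raise_for_status()
    return resp.json()["txnId"]                   # assumed response key
```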
You will receive a unique txnId for your input. Using the txnId, you can now select models with enableModels, configure them with modelConfig, and submit a request using POST Compute Request. Each enabled model will be given a unique requestId.
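A rough sketch of that POST Compute Request is shown below. The endpoint path and payload layout are illustrative assumptions; the individual model pages describe the real enableModels and modelConfig schema.

```python
# Hypothetical sketch: enable and configure models for a txnId via the
# POST Compute Request. Endpoint path and payload layout are assumptions
# made for illustration.
import requests

BASE_URL = "https://api.marsview.ai"  # assumed base URL

def submit_compute_request(access_token: str, txn_id: str,
                           enable_models: list, model_config: dict) -> dict:
    resp = requests.post(
        f"{BASE_URL}/speech/{txn_id}/compute",    # hypothetical endpoint
        headers={"Authorization": f"Bearer {access_token}"},
        json={"enableModels": enable_models, "modelConfig": model_config},
        timeout=60,
    )
    resp.raise_for_status()
    # The response is expected to map each enabled model to its requestId.
    return resp.json()
```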
The following models can be enabled for a given txnId. Click on each model to learn how to enable and configure it (an illustrative enableModels/modelConfig sketch follows the table).

| Models | Description |
| --- | --- |
| Speech-to-Text (ASR) | Marsview Automatic Speech Recognition (ASR) technology accurately converts speech into text in live or batch mode. The API can be deployed in the cloud or on-premise. Get superior accuracy, speaker separation, punctuation, casing, word-level time markers, and more. (Supported language: English) |
| Speaker Separation | Automatically detects the number of speakers in your audio file; each word in the transcription text can be associated with its speaker. |
| Topic Extraction | Extracts the most relevant topics, concepts, and discussion points from the conversation, generated for each paragraph spoken (Topics by Sentence). |
| Tone Analysis | Suggests speaker emotion using only audio cues. A speaker may convey emotion in the tone of a response, which is important to capture for the overall sentiment/mood of the conversation and cannot be extracted by conventional lexical emotion analysis. Marsview detects a set of tones in an audio file. |
| Emotion Analysis | Helps you understand and interpret speaker emotions in a conversation or text. It is designed to understand human conversation in the form of free or spoken text and is modeled after the emotion wheel. Marsview can detect a range of emotions in an audio file. |
| Sentiment Analysis | Helps you interpret and quantify whether the conversation in the audio or text is Positive, Negative, or Neutral. |
| Speech Type | Helps you understand the type of conversation at any given time. Every phone call and online or offline conversation can be broadly classified into four categories: Statement, Command, Action Item, or Question. |
| Action Items | Detects an event, task, activity, or action that needs to take place in the future (after the conversation). Action items can be high priority, with a definite assignee and due date, or lower priority with a non-definite due date. All action items are generated with action phrases, assignees, and due dates to make the output immediately consumable by your CRM or project management tools. |
| Question & Response | Automatically identifies questions or requests posed during the conversation, along with the corresponding response, in a consumable form. The API detects the Question and Response by speaker. |
| Extractive Summary | Identifies the salient information in the conversation, which is extracted and grouped together to form a concise summary. |
| Keyframe Extraction | Captures keyframes and slides from videos and screen sharing in an online web conference. |
| Screen Activity | Identifies and analyzes the visual aspects of the meeting, along with the corresponding timestamps, with the Screen Activity API. |
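To make the enable/configure step concrete, here is one possible shape for the enableModels list and modelConfig map. The identifiers and options below are illustrative stand-ins, not the documented Marsview model keys; use the names given on each model's page.

```python
# Illustrative enableModels / modelConfig values. The identifiers below are
# stand-ins, not the documented Marsview model keys.
enable_models = [
    "speech_to_text",        # assumed key for Speech-to-Text (ASR)
    "speaker_separation",    # assumed key for Speaker Separation
    "sentiment_analysis",    # assumed key for Sentiment Analysis
    "action_items",          # assumed key for Action Items
]

model_config = {
    "speech_to_text": {"language": "en-US"},   # assumed option name
    "action_items": {"priority": "all"},       # assumed option name
}

# With the helpers sketched earlier:
# response = submit_compute_request(token, txn_id, enable_models, model_config)
```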
Learn how to submit your Compute Request after configuring the required models.
Get your JSON Output using the GET Request Output API
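A polling sketch for this step is shown below. The endpoint path, status field, and polling interval are assumptions for illustration; follow the GET Request Output reference for the documented response format.

```python
# Hypothetical sketch: poll GET Request Output until the requested model has
# finished, then return its JSON. Endpoint path and status values are
# assumptions made for illustration.
import time
import requests

BASE_URL = "https://api.marsview.ai"  # assumed base URL

def get_request_output(access_token: str, txn_id: str, request_id: str,
                       poll_seconds: float = 15.0) -> dict:
    while True:
        resp = requests.get(
            f"{BASE_URL}/speech/{txn_id}/result/{request_id}",  # hypothetical
            headers={"Authorization": f"Bearer {access_token}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        if body.get("status") != "processing":    # assumed status convention
            return body                           # final output JSON
        time.sleep(poll_seconds)
```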
Easily visualize your Marsview API generated JSON Output

Graphical user interface to run the APIs.
List of error codes and their respective troubleshooting techniques.
Our support team is available to respond to user requests via email at [email protected].
- 1st Response SLA is less than 24 hours.
- Users must reach us by filling out the support form available here with their Full Name, Email Address, and a brief description of the problem.
- A support engineer will respond within 2 hours with a support case number.
Speech analytics software helps mine and analyze audio data, detecting things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned; and more. Speech analytics tools can also identify if a customer is getting upset or frustrated.
- Detect things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned
- Adapt to customer’s sentiments in real time or improve after the fact
- Identify customers at risk of churning and retain them
- Gather insights to improve NPS, CSAT and CES scores
- Use call transcripts for compliance and documentation
- Listen to your customers - it pays!
The Marsview conversation self-service API platform offers a comprehensive suite of proprietary APIs and developer tools for automatic speech recognition, speaker separation, multi-modal emotion and sentiment recognition, intent recognition, time-sequenced visual recognition, and more. It is designed for demanding call center environments (CCAI) that handle millions of outbound and inbound sales and support calls. Marsview APIs provide end-to-end workflows spanning call listening, recording, insights generation, and Voice of Customer insights. The conversation APIs are also used in one-on-one to many-to-many conversations and meetings to automatically generate rich contextual feedback, key topics, moments, actions, Q&A, and summaries.