Marsview Speech Analytics is a cloud-hosted or containerized API service that helps you accurately transcribe conversations and discover insights. It bundles models for automatic speech recognition (ASR), tone analysis, and natural language classification that uncover topics, keywords, entities, and sentiment.
The configurable models bundled with the Speech Analytics API are described under Step 3 below.

How does it work?

Get Started in 4 Steps

  1. Get your accessToken using your apiKey and apiSecret.
  2. Submit an audio/video file or a downloadable URL; you will receive a unique txnId.
  3. Select and configure models. Using the txnId, select models (enableModels), configure them (modelConfig), and submit a POST Compute Request. Each enabled model is given a unique requestId.
  4. Retrieve the output JSON using GET Request Output, and use Marsview's Visualizer to visualize it.
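The four steps can be sketched as a small orchestration function. This is only an illustration of the order of operations: the `SpeechAnalyticsClient` interface, its method names, the `modelType` key, and the in-memory `FakeClient` are all hypothetical and do not describe Marsview's actual SDK or endpoints.

```python
from typing import Any, Dict, List, Protocol


class SpeechAnalyticsClient(Protocol):
    """Hypothetical client interface; the method names are illustrative only."""

    def get_access_token(self, api_key: str, api_secret: str) -> str: ...

    def submit_input(self, token: str, source_url: str) -> str: ...

    def submit_compute(
        self, token: str, txn_id: str, enable_models: List[Dict[str, Any]]
    ) -> List[str]: ...

    def get_output(self, token: str, txn_id: str, request_id: str) -> Dict[str, Any]: ...


def run_workflow(
    client: SpeechAnalyticsClient,
    api_key: str,
    api_secret: str,
    source_url: str,
    enable_models: List[Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Run the four steps in order and return one output per requestId."""
    token = client.get_access_token(api_key, api_secret)               # Step 1
    txn_id = client.submit_input(token, source_url)                    # Step 2
    request_ids = client.submit_compute(token, txn_id, enable_models)  # Step 3
    return {rid: client.get_output(token, txn_id, rid) for rid in request_ids}  # Step 4


class FakeClient:
    """In-memory stand-in so the sketch runs without network access."""

    def get_access_token(self, api_key: str, api_secret: str) -> str:
        return f"token-for-{api_key}"

    def submit_input(self, token: str, source_url: str) -> str:
        return "txn-1"

    def submit_compute(
        self, token: str, txn_id: str, enable_models: List[Dict[str, Any]]
    ) -> List[str]:
        # One requestId per enabled model, as the steps above describe.
        return [f"{txn_id}-req-{i}" for i, _ in enumerate(enable_models)]

    def get_output(self, token: str, txn_id: str, request_id: str) -> Dict[str, Any]:
        return {"txnId": txn_id, "requestId": request_id, "status": "done"}


if __name__ == "__main__":
    # "modelType" and the model names are placeholders, not real identifiers.
    models = [{"modelType": "speech_to_text"}, {"modelType": "sentiment"}]
    print(run_workflow(FakeClient(), "my-key", "my-secret", "https://example.com/call.mp3", models))
```

Replacing `FakeClient` with a class that performs the real HTTP calls would keep `run_workflow` unchanged.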


Key Concepts
apiKey: Unique API key for your userID.
apiSecret: Unique API secret for your userID.
userID: Your registered email ID.
accessToken: JWT token generated using apiKey and apiSecret. The token is valid for 3600 seconds from the time of generation.
txnId: Unique ID that Marsview generates for each file or URL submitted.
enableModels: The AI-powered models you choose to enable for Speech Analytics.
modelConfig: The configuration of the AI-powered models to suit your use case.
requestId: Unique ID that Marsview generates for each model enabled in enableModels.

Step 1: GET Access Token

Please log in to https://app.marsview.ai to get your apiKey and apiSecret
Learn how to get your accessToken
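Because the accessToken is valid for 3600 seconds from generation, a client typically caches it and requests a new one shortly before it expires. The sketch below shows one way to do that. `AccessTokenCache`, the `fetch_token` callable, and the 60-second safety margin are assumptions made for illustration; the actual token request is deliberately left to the caller.

```python
import time
from typing import Callable, Optional

TOKEN_LIFETIME_SECONDS = 3600  # validity window stated in this documentation


class AccessTokenCache:
    """Caches a token and refreshes it shortly before the lifetime elapses.

    `fetch_token` is a caller-supplied function that performs the real
    token request (hypothetical here). `clock` is injectable for testing.
    """

    def __init__(
        self,
        fetch_token: Callable[[], str],
        clock: Callable[[], float] = time.monotonic,
        safety_margin: float = 60.0,  # assumed margin, not a Marsview value
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin
        self._token: Optional[str] = None
        self._issued_at = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one if it is missing or near expiry."""
        now = self._clock()
        age = now - self._issued_at
        if self._token is None or age >= TOKEN_LIFETIME_SECONDS - self._safety_margin:
            self._token = self._fetch_token()
            self._issued_at = now
        return self._token


if __name__ == "__main__":
    cache = AccessTokenCache(fetch_token=lambda: "example-jwt")
    print(cache.get())  # first call fetches; later calls reuse the cached token
```

Measuring age with a monotonic clock avoids surprises when the system wall clock is adjusted while the process is running.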

Step 2: POST Audio/Video Input

Submit an audio/video file or a downloadable URL. You will receive a unique txnId for your input.
Additional input options:
  • Input an Audio/Video Stream (contact [email protected])
  • Process a Telephony Stream - PSTN/SIP (contact [email protected])

Step 3: POST Compute Request: Enable and configure your models using enableModels and modelConfig

Using the txnId, you can now select models using enableModels, configure them using modelConfig, and submit the request using POST Compute Request.
Each enabled model will be given a unique requestId.
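A minimal sketch of assembling such a request body is shown below. Only the names txnId, enableModels, and modelConfig come from this page; the `modelType` key, the model identifiers, the `threshold` option, and the overall JSON layout are assumptions for illustration, so check the Compute Request reference for the real schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_compute_request(
    txn_id: str, models: Dict[str, Optional[Dict[str, Any]]]
) -> Dict[str, Any]:
    """Build a compute request body from a txnId and the models to enable.

    `models` maps a (hypothetical) model identifier to its optional
    configuration; None or an empty dict means "use defaults".
    """
    if not txn_id:
        raise ValueError("txnId is required")
    if not models:
        raise ValueError("enable at least one model")

    enable_models: List[Dict[str, Any]] = []
    for model_type, config in models.items():
        entry: Dict[str, Any] = {"modelType": model_type}  # assumed key name
        if config:
            entry["modelConfig"] = dict(config)  # copy so the caller's dict is not shared
        enable_models.append(entry)

    return {"txnId": txn_id, "enableModels": enable_models}


if __name__ == "__main__":
    body = build_compute_request(
        "txn-123",
        {"speech_to_text": None, "sentiment": {"threshold": 0.5}},  # placeholder names
    )
    print(json.dumps(body, indent=2))
```

Validating the txnId and the model list before sending keeps obviously malformed requests from reaching the API.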
The following models can be enabled using the txnId. Click on each model to learn how to enable and configure it.
Marsview Automatic Speech Recognition (ASR) technology accurately converts speech into text in live or batch mode. The API can be deployed in the cloud or on-premise. Get superior accuracy, speaker separation, punctuation, casing, word-level time markers, and more. (Supported Language: English)
Speaker Separation automatically detects the number of speakers in your audio file, so that each word in the transcription text can be associated with its speaker.
Extracts the most relevant topics, concepts, and discussion points from the conversation, generated for each paragraph spoken (Topics by Sentence).
Tone Analysis suggests speaker emotion using only audio cues. A speaker may express emotion through tone of voice, and capturing this is important for the overall sentiment or mood of the conversation, which conventional lexical emotion analysis cannot extract.
Marsview is capable of detecting the following tones in an audio file:
  • Calm
  • Happy
  • Sad
  • Angry
  • Fearful
  • Disgust
  • Surprised
The Emotion Analysis model helps you understand and interpret speaker emotions in a conversation or text. It is designed to understand human conversation in the form of free text or spoken text and is modeled on the emotion wheel.
Marsview is capable of detecting the following emotions in an audio file:
  • Admiration
  • Amusement
  • Anger
  • Annoyance
  • Approval
  • Caring
  • Confusion
  • Curiosity
  • Desire
  • Disappointment
  • Disapproval
  • Disgust
  • Embarrassment
  • Excitement
  • Fear
  • Gratitude
  • Grief
  • Joy
  • Love
  • Nervousness
  • Optimism
  • Pride
  • Realization
  • Relief
  • Remorse
  • Sadness
  • Surprise
  • Neutral
Sentiment Analysis will help you interpret and quantify if the conversation in the audio or text is Positive, Negative, or Neutral.
The Speech Type model helps you understand the type of conversation at any given time. Every phone call or online or offline conversation can be broadly classified into four categories: Statement, Command, Action Item, or Question.
The Action Item API detects an event, task, activity, or action that needs to take place in the future (after the conversation). Action items can be of high priority, with a definite assignee and due date, or of lower priority, with a non-definite due date.
All action items are generated with action phrases, assignees, and due dates to make the output immediately consumable by your CRM or project management tools.
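As an illustration of consuming action items in another tool, the sketch below maps one action item to a generic task record. The input keys (`actionPhrase`, `assignee`, `dueDate`), the `Task` shape, and the rule that derives priority from the presence of an assignee and due date are all assumptions; the actual output schema is defined in the Action Item model reference.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Task:
    """Generic task record, standing in for a CRM or project-management entry."""

    title: str
    assignee: Optional[str]
    due_date: Optional[str]
    priority: str


def action_item_to_task(item: Dict[str, Any]) -> Task:
    """Convert one action item (hypothetical field names) into a Task.

    Assumed rule: an item with both an assignee and a due date is treated
    as high priority; anything else is treated as lower priority.
    """
    assignee = item.get("assignee") or None
    due_date = item.get("dueDate") or None
    priority = "high" if assignee and due_date else "low"
    return Task(
        title=item["actionPhrase"],
        assignee=assignee,
        due_date=due_date,
        priority=priority,
    )


if __name__ == "__main__":
    example = {
        "actionPhrase": "Send the revised quote",
        "assignee": "Alex",
        "dueDate": "2024-05-01",
    }
    print(action_item_to_task(example))
```

Normalizing empty strings to `None` keeps missing assignees or due dates from being mistaken for real values downstream.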
Automatically identifies questions or requests posed during the conversation, along with the corresponding responses, in a consumable form. The API detects the question and the response by speaker.
Extractive summarization aims at identifying the salient information that is then extracted and grouped together to form a concise summary.
Captures keyframes and slides from videos and screen sharing from an online web conference.
Identify and analyze the visual aspects of the meeting, along with the corresponding timestamps, using the Screen Activity API.
Marsview detects the following types of screen activity:
  • Screen Share
  • Interaction
  • Whiteboard
  • Presentation
Learn how to submit your Compute Request after configuring the required models.

Step 4: Visualize your Output JSON

Get your JSON Output using the GET Request Output API
Easily visualize your Marsview API generated JSON Output

Run on Postman

A graphical user interface for running the APIs.

Error Codes & Troubleshooting Guide

List of error codes and their respective troubleshooting techniques.

Not able to troubleshoot?

Our support team is available to respond to user requests via email at [email protected].
  • The first-response SLA is less than 24 hours.
  • Users must reach us by filling out the support form available here with their full name, email address, and a brief description of the problem.
  • A support engineer will respond within 2 hours with a support case number.
  • You can also Book a Call with our engineers.


What can Speech Analytics do?

Speech analytics software helps mine and analyze audio data, detecting things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned; and more. Speech analytics tools can also identify if a customer is getting upset or frustrated.

How does Speech Analytics help improve customer experience?

  • Detect things like emotion, tone and stress in a customer's voice; the reason for the call; satisfaction; the products mentioned
  • Adapt to customer sentiment in real time, or improve after the fact
  • Identify customers at risk of churning and retain them
  • Gather insights to improve NPS, CSAT and CES scores
  • Use call transcripts for compliance and documentation
  • Listen to your customers - it pays!

What is the Marsview API platform?

The Marsview self-service conversation API platform offers a comprehensive suite of proprietary APIs and developer tools for automatic speech recognition, speaker separation, multi-modal emotion and sentiment recognition, intent recognition, time-sequenced visual recognition, and more. It is designed for demanding call center environments (CCAI) that handle millions of outbound and inbound sales and support calls. Marsview APIs provide end-to-end workflows covering call listening, recording, insights generation, and Voice of Customer insights. The Conversation APIs are also used in one-on-one to many-to-many conversations and meetings to automatically generate rich contextual feedback, key topics, moments, actions, Q&A, and summaries.