Please contact support@marsview.ai for more details on this API bundle and a free demo.

Marsview Speech-to-Text is an automatic speech recognition (ASR) API service that uses deep learning models to convert audio/video files and live streams into readable text, separated by speaker. It can be integrated into your application with a few lines of code.

Model Features


| Feature | Description |
| --- | --- |
| Speech-to-Text | Accurately converts speech into text in live or batch mode. |
| Automatic Punctuation | Accurately adds punctuation to the transcribed text. |
| Custom Vocabulary | Boosts recognition of domain-specific terminology, proper nouns, and abbreviations when you supply a simple list/taxonomy of words and phrases. |
| Speaker Separation | Automatically detects the number of speakers in your audio file and associates each word in the transcript with its speaker. |
| Sentence-level Keywords | Generates the most relevant topics, concepts, and discussion points from the conversation, based on the overall scope of the discussion. |
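To illustrate what word-level speaker separation makes possible, the following minimal sketch merges consecutive words from the same speaker into readable turns. The `words` list and its `speaker` and `word` keys are hypothetical and chosen only for illustration; they are not the actual Marsview response schema.

```python
from itertools import groupby

# Hypothetical word-level transcript. This is NOT the real Marsview
# response format; the keys are assumptions made for this example.
words = [
    {"speaker": "1", "word": "Hello,"},
    {"speaker": "1", "word": "how"},
    {"speaker": "1", "word": "can"},
    {"speaker": "1", "word": "I"},
    {"speaker": "1", "word": "help?"},
    {"speaker": "2", "word": "My"},
    {"speaker": "2", "word": "order"},
    {"speaker": "2", "word": "is"},
    {"speaker": "2", "word": "late."},
    {"speaker": "1", "word": "Let"},
    {"speaker": "1", "word": "me"},
    {"speaker": "1", "word": "check."},
]


def group_by_speaker(words):
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns = []
    # groupby only groups adjacent items, which preserves the turn order.
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["word"] for w in run)))
    return turns


for speaker, text in group_by_speaker(words):
    print(f"Speaker {speaker}: {text}")
```

Because only adjacent words are merged, a speaker who talks again later in the conversation gets a new turn rather than being folded into an earlier one.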


Where is Speech-to-Text used?

The Speech-to-Text API produces transcripts of calls and other recordings. Text is easier to search and review than an audio file, so transcripts are widely used by contact center managers in sales and support functions, as well as by publishers, students, educators, and medical and legal professionals, to gain insights and take action. From the user's point of view, a speech-to-text system can be categorized by its use: conversational systems, command and control, text dictation, audio document transcription, webinars, interviews, and so on. Each use has specific requirements in terms of latency, memory constraints, vocabulary size, and adaptive features.

How does it work?

The Automatic Speech Recognition (ASR) engine that powers the Speech-to-Text API is built on language, acoustic, and pronunciation models. It is highly optimized for performance, accuracy, low latency, and customization. Further, the models are specialized for the English language, dialect, application domain, type of speech, and communication channel. Note that accuracy depends heavily on the speaker, the style of speech, and the environmental conditions. High accuracy is essential to maximize your ROI, so Marsview offers services to customize, adapt, and refine the ASR models to match your needs. Tailoring the models to your application is the most reliable way to get the best possible results.
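One common way to quantify transcription accuracy on your own recordings is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the number of words in the reference. The sketch below is a generic illustration of that metric, not a description of how Marsview measures accuracy; the function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    # Assumption: lowercase and split on whitespace; real evaluations
    # usually also normalize punctuation and numbers.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of five reference words.
print(word_error_rate("the order shipped on monday",
                      "the order shipped monday"))
```

Comparing WER across different speakers, speaking styles, and recording conditions is a simple way to see how strongly those factors affect results for a given application.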

What is the Marsview API platform?

The Marsview conversation self-service API platform offers a comprehensive suite of proprietary APIs and developer tools for automatic speech recognition, speaker separation, multi-modal emotion and sentiment recognition, intent recognition, time-sequenced visual recognition, and more. It is designed for demanding call center (CCAI) environments that handle millions of outbound and inbound sales and support calls. Marsview APIs provide end-to-end workflows that cover call listening, recording, insights generation, and Voice of Customer insights. The Conversation APIs are also used in conversations and meetings, from one-on-one to many-to-many, to automatically generate rich contextual feedback, key topics, moments, actions, Q&A, and summaries.
