Speaker Separation

Automatically detect the number of speakers in your audio file and associate each word in the transcript with its speaker.

Overview

The speaker separation setting partitions the input audio stream into homogeneous segments according to speaker identity. It works on both multi-channel and mono-channel audio.

Input Type Supported: Audio, Video

Compute Metadata

POST https://api.marsview.ai/v1/conversation/compute
By default, speaker separation is disabled in the Conversation API. To enable it, add the settings below to the Compute route: set the "speaker_separation.enable" flag to true and specify the number of speakers with the "num_speakers" flag. The speech-to-text output will then be speaker separated, with each transcript section labeled with a speaker number. A speaker number (integer) is automatically assigned to each speaker cluster. If the number of speakers in the conversation is unknown, set "num_speakers" to 0 and Marsview's speaker separation model will detect the number of speakers automatically.
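The settings described above can be sketched as a request body. This is a minimal sketch, not the documented schema: the nesting ("settings" containing "speech_to_text" and "speaker_separation", with "num_speakers" placed inside "speaker_separation") is an assumption inferred from the dotted name speaker_separation.enable, and any other fields the Compute route requires are omitted. The function name build_compute_settings is hypothetical.

```python
import json


def build_compute_settings(num_speakers: int = 0) -> dict:
    """Build a hypothetical Compute request body with speaker separation enabled.

    ASSUMPTION: the "settings" -> "speaker_separation" -> "enable"/"num_speakers"
    nesting is inferred from the parameter names on this page, not confirmed.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(num_speakers, int) or isinstance(num_speakers, bool) or num_speakers < 0:
        raise ValueError("num_speakers must be a non-negative integer (0 = auto-detect)")
    return {
        "settings": {
            # Speech to text is a dependency of speaker separation (error CVAPI01).
            "speech_to_text": {"enable": True},
            "speaker_separation": {
                "enable": True,
                # 0 asks the model to detect the number of speakers itself.
                "num_speakers": num_speakers,
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_compute_settings(2), indent=2))
```

Passing a known speaker count rather than 0 is usually worthwhile, since the page notes that diarization is more accurate when num_speakers is specified.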
Request
Headers
appSecret (string, required): <sample-app-secret>
appId (string, required): <sample-app-Id>
Content-Type (string, optional): application/json
Body Parameters
num_speakers (integer, optional): Number of speakers in the conversation. Defaults to 0, which lets the model detect the number of speakers automatically.
speaker_separation.enable (boolean, optional): Flag to enable speaker separation.
Response
200: OK
A transaction ID is returned in the JSON body once the processing job has launched successfully. Use this transaction ID to check the status of the job or to fetch its results once the metadata has been computed.
{
"status":true,
"transaction_id":"32dcef1a-5724-4df8-a4a5-fb43c047716b"
}
400: Bad Request
This usually happens when the settings for computing the metadata are not configured correctly. Check the request object and the dependencies required to compute certain metadata objects (for example, Speech to Text must be enabled for Action Items to be enabled).
{
"status":false,
"error":{
"code":"CVAPI01",
"message":"DependencyError: Speech to text must be enabled for speaker_separation to be enabled"
}
}
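Both response shapes above can be handled by one small parser. This is a sketch that only inspects the JSON body, keyed on the "status", "transaction_id", and "error" fields shown in the samples; the names handle_compute_response and MarsviewAPIError are hypothetical. If you send requests with urllib, a 400 response raises urllib.error.HTTPError, whose body can be read with its read() method and passed to this function.

```python
import json


class MarsviewAPIError(Exception):
    """Hypothetical exception type that keeps the API error code (e.g. "CVAPI01")."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def handle_compute_response(body: str) -> str:
    """Return the transaction_id from a Compute response, or raise on an error body."""
    payload = json.loads(body)
    if payload.get("status") is True:
        # 200: OK case. The transaction ID is used later to poll or fetch results.
        return payload["transaction_id"]
    # 400: Bad Request case. Surface the API's own error code and message.
    error = payload.get("error") or {}
    raise MarsviewAPIError(error.get("code", "UNKNOWN"), error.get("message", ""))
```

Keeping the error code on the exception lets callers branch on specific failures such as CVAPI01 instead of matching message text.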

Note: Speech to Text must be enabled for speaker separation to be enabled; otherwise the request fails with error code CVAPI01.
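The CVAPI01 dependency can also be checked on the client before a request is sent. A minimal sketch, assuming the settings object nests an "enable" flag under "speech_to_text" and "speaker_separation" (this shape is an assumption, not confirmed by this page); check_dependencies is a hypothetical helper.

```python
def check_dependencies(settings: dict) -> None:
    """Raise ValueError if speaker separation is requested without speech to text.

    ASSUMPTION: settings look like {"speech_to_text": {"enable": ...},
    "speaker_separation": {"enable": ...}}; adjust to the real schema.
    """

    def enabled(name: str) -> bool:
        # A missing section or missing "enable" key counts as disabled.
        return bool(settings.get(name, {}).get("enable", False))

    if enabled("speaker_separation") and not enabled("speech_to_text"):
        raise ValueError(
            "Speech to text must be enabled for speaker_separation to be enabled (CVAPI01)"
        )
```

Failing locally saves a round trip, but the server remains the authority: its 400 response should still be handled.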

The diarized output is much more accurate when num_speakers is specified.

Request Metadata

POST https://api.marsview.ai/v1/conversation/fetch
Request
Headers
appSecret (string, optional): <sample-app-secret>
appId (string, optional): <sample-app-Id>
Content-Type (string, optional): application/json
Body Parameters
data.speaker_separation (boolean, optional): Returns the speaker-separated transcript data once it has been computed.
file_id (string, optional): File ID of the audio/video file.
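A fetch request can be assembled with the standard library alone. This sketch assumes that the dotted parameter "data.speaker_separation" maps to a nested {"data": {"speaker_separation": true}} object, which this page does not confirm; build_fetch_request is a hypothetical name. Note that urllib.request.Request normalizes header capitalization (appId is stored as "Appid"), which is harmless because HTTP header names are case-insensitive.

```python
import json
import urllib.request

FETCH_URL = "https://api.marsview.ai/v1/conversation/fetch"


def build_fetch_request(app_id: str, app_secret: str, file_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for speaker-separated results."""
    # ASSUMPTION: nested body shape inferred from "data.speaker_separation".
    body = {"file_id": file_id, "data": {"speaker_separation": True}}
    return urllib.request.Request(
        FETCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "appId": app_id,
            "appSecret": app_secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_fetch_request("<sample-app-Id>", "<sample-app-secret>", "<file-id>")
    # Send with urllib.request.urlopen(req) once real credentials are filled in.
    print(req.get_method(), req.full_url)
```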
Response
200: OK
The response contains two objects: data, which holds the requested metadata once it has been computed, and status, which shows the current state of each requested metadata field. Each status takes one of the values "Queued", "Processing", or "Completed".
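Because each field moves through "Queued", "Processing", and "Completed", a client typically polls until speaker_separation reports "Completed". A minimal sketch: the fetch argument is any hypothetical callable returning the parsed JSON response, and the poll interval and timeout defaults are arbitrary choices, not documented limits. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_speaker_separation(
    fetch: Callable[[], dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status.speaker_separation is "Completed", then return data."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = fetch()
        state = response.get("status", {}).get("speaker_separation")
        if state == "Completed":
            return response["data"]
        if state not in ("Queued", "Processing"):
            # Anything outside the documented states is treated as an error.
            raise RuntimeError(f"Unexpected speaker_separation status: {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("speaker_separation did not complete in time")
        sleep(poll_seconds)
```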
QUEUED STATE
{
"status":{
"speech_to_text":"Queued",
"speaker_separation":"Queued"
},
"data":{
"speech_to_text":{}
}
}
COMPLETED STATE
{
"status":{
"speech_to_text":"Completed",
"speaker_separation":"Completed"
},
"data":{
"speech_to_text":{
"sentences":[
...
{
"sentence" : "Be sure to check out the support document at marsview.ai",
"start_time" : "172200.0",
"end_time" : "175100.0",
"speakers" : [
"2"
]
},
{
"sentence" : "Sure, that's what I was looking for. Thank you!",
"start_time" : "175100.0",
"end_time" : "177300.0",
"speakers" : [
"1"
]
},
...
]
}
}
}
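The completed data object can be turned into a readable, speaker-labeled transcript. A short sketch that reads the "sentence", "start_time" (milliseconds, as a string), and "speakers" fields shown in the sample; format_transcript and the output line format are illustrative choices.

```python
from typing import List


def format_transcript(data: dict) -> List[str]:
    """Render each sentence as "[<start>s] Speaker <n>: <text>"."""
    lines = []
    for sentence in data.get("speech_to_text", {}).get("sentences", []):
        # A chunk may list several speakers; join them, or mark as unknown.
        speakers = ", ".join(str(s) for s in sentence.get("speakers", [])) or "unknown"
        # start_time is given in milliseconds as a string, e.g. "172200.0".
        start_s = float(sentence["start_time"]) / 1000.0
        lines.append(f"[{start_s:.1f}s] Speaker {speakers}: {sentence['sentence']}")
    return lines
```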

Speaker Separation Response Object Fields

start_time: Start time of the chunk, in milliseconds.
end_time: End time of the chunk, in milliseconds.
speakers: Speaker number(s) for the chunk. A speaker number is automatically assigned to each speaker cluster; in the sample response the numbers are returned as strings (for example "2").
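The start_time, end_time, and speakers fields are enough to estimate how long each speaker talked. A sketch under one stated assumption: when a chunk lists several speakers, its duration is split equally among them (a design choice for illustration, not documented behavior). Speaker numbers are cast with int() because the sample returns them as strings; talk_time_ms is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def talk_time_ms(sentences: List[dict]) -> Dict[int, float]:
    """Total speaking time per speaker number, in milliseconds."""
    totals: Dict[int, float] = defaultdict(float)
    for chunk in sentences:
        duration = float(chunk["end_time"]) - float(chunk["start_time"])
        speakers = chunk.get("speakers", [])
        for speaker in speakers:
            # ASSUMPTION: shared chunks are split evenly among listed speakers.
            totals[int(speaker)] += duration / len(speakers)
    return dict(totals)
```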
