Audio Connector lets you send raw audio streams (16-bit linear PCM at 16 kHz) from a live Vonage Video session, through your own servers, to external services such as AWS, GCP, and Azure for further processing and analysis.
Using Audio Connector, you can send audio streams individually or mixed. Sending the streams individually, over separate WebSocket connections, lets you identify each speaker.
Further processing of the audio, in real time or offline, enables capabilities such as captions, transcription, translation, search and indexing, content moderation, media intelligence, electronic health records, and sentiment analysis.
You can also use Audio Connector to publish audio from a WebSocket connection into an OpenTok session.
Audio Connector is enabled by default for all projects and is a usage-based product. Usage is charged based on the number of participant audio streams (stream IDs) sent to the WebSocket server. The Audio Connector feature is only supported in routed sessions (sessions that use the OpenTok Media Router). You can send up to 50 audio streams from a single session at a time.
To start an Audio Connector WebSocket connection, use the OpenTok REST API.
You can also start an Audio Connector WebSocket connection using the OpenTok server SDKs:
OpenTok Java SDK: the OpenTok.connectAudioStream() method
OpenTok Node SDK: the opentok.websocketConnect() method
OpenTok PHP SDK: the OpenTok->connectAudio() method
OpenTok Python SDK: the opentok.connect_audio_to_websocket() method
OpenTok Ruby SDK: the opentok.websocket.connect() method
OpenTok .NET SDK: the OpenTok.StartBroadcast() method
For more details, see the Audio Connector REST API documentation.
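For illustration, the body of the POST request that starts the connection might look like the following sketch. The session ID, token, stream IDs, and WebSocket URI are placeholder values, and the authoritative schema (including how the streams property selects individual versus mixed audio) is defined in the Audio Connector REST API documentation:
{
  "sessionId": "SESSION_ID",
  "token": "TOKEN",
  "websocket": {
    "uri": "wss://your-server.example.com/socket",
    "streams": ["STREAM_ID_1", "STREAM_ID_2"],
    "headers": {
      "CUSTOM-HEADER-1": "value-1",
      "CUSTOM-HEADER-2": "value-2"
    }
  }
}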
The initial message sent on the established WebSocket connection is text-based, containing a JSON payload. The JSON details the audio format in content-type, along with any other metadata that you put in the headers property of the body in the POST request to start the WebSocket connection:
{
"content-type":"audio/l16;rate=16000",
"CUSTOM-HEADER-1": "value-1",
"CUSTOM-HEADER-2": "value-2"
}
Messages that are binary represent the audio of the call. The audio codec supported on the WebSocket interface is Linear PCM 16-bit, with a 16kHz sample rate. Each message includes one 640-byte frame of data (20ms of audio) at 50 frames (messages) per second.
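As a minimal sketch of the receiving side, the following Node server (written in TypeScript with the ws package) appends the binary PCM frames to a raw file and logs the JSON text messages. The port and file name are arbitrary example values:
import { WebSocketServer } from "ws";
import { createWriteStream } from "fs";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  // Raw 16-bit, 16 kHz PCM can be written directly to a file and later wrapped
  // in a WAV header or forwarded to a speech-to-text service.
  const pcmOut = createWriteStream("call-audio.raw");

  socket.on("message", (data, isBinary) => {
    if (isBinary) {
      // Each binary message is one 640-byte frame (20 ms of audio), 50 frames per second.
      pcmOut.write(data as Buffer);
    } else {
      // Text messages carry the JSON metadata described in this section.
      console.log("metadata:", JSON.parse(data.toString()));
    }
  });

  socket.on("close", () => pcmOut.end());
});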
When audio in the streams included in the WebSocket is muted, a text message is sent with the following JSON payload (with active set to false):
{
"content-type":"audio/l16;rate=16000",
"method": "update",
"event": "websocket:media:update",
"active": false,
"CUSTOM-HEADER-1": "value-1",
"CUSTOM-HEADER-2": "value-2"
}
(The CUSTOM-HEADER properties in this example represent metadata that you include in the headers property of the body in the POST request to start the WebSocket connection.)
Audio may be muted because all clients stop publishing audio or as a result of a force mute moderation event.
When audio of one of the streams resumes, a text message is sent with the following JSON payload (with active set to true):
{
"content-type":"audio/l16;rate=16000",
"method": "update",
"event": "websocket:media:update",
"active": true,
"CUSTOM-HEADER-1": "value-1",
"CUSTOM-HEADER-2": "value-2"
}
When the Audio Connector WebSocket stops because of a call to the force disconnect REST method or because the 6-hour time limit is reached (see Stopping a WebSocket connection), a text message is sent with the following JSON payload:
{
"content-type":"audio/l16;rate=16000",
"method": "delete",
"event": "websocket:disconnected",
"CUSTOM-HEADER-1": "value-1",
"CUSTOM-HEADER-2": "value-2"
}
This message marks the termination of the WebSocket connection.
(The CUSTOM-HEADER properties in this example represent metadata that you include in the headers property of the body in the POST request to start the WebSocket connection.)
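Putting the three text message types together, a handler on your WebSocket server might dispatch on the event field. This is only a sketch; the onMuted, onResumed, and onDisconnected callbacks are placeholders for your own logic:
function handleMetadataMessage(
  raw: string,
  handlers: { onMuted: () => void; onResumed: () => void; onDisconnected: () => void }
): void {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "websocket:media:update":
      // active is false when all included streams are muted and true when audio resumes.
      if (msg.active === false) {
        handlers.onMuted();
      } else {
        handlers.onResumed();
      }
      break;
    case "websocket:disconnected":
      // The connection is terminating (force disconnect or the 6-hour limit).
      handlers.onDisconnected();
      break;
    default:
      // The initial message has no event field; it carries the content-type and custom headers.
      break;
  }
}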
When your WebSocket server closes the connection, the OpenTok connection for the call also ends. In each client connected to the session, the OpenTok client-side SDK dispatches events indicating the connection ended (just as it would when other clients disconnect from the session).
You can disconnect the Audio Connector WebSocket connection using the force disconnect REST method. Use the connection ID of the Audio Connector WebSocket connection with this method.
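As a sketch, assuming the general OpenTok force disconnect endpoint (confirm the path against the REST API reference), the call could look like this. The apiKey, sessionId, connectionId, and jwt values are placeholders you supply; the JWT is the same kind of token used for other OpenTok REST calls:
async function disconnectAudioConnector(
  apiKey: string,
  sessionId: string,
  connectionId: string,
  jwt: string
): Promise<void> {
  const url = `https://api.opentok.com/v2/project/${apiKey}/session/${sessionId}/connection/${connectionId}`;
  const res = await fetch(url, {
    method: "DELETE",
    headers: { "X-OPENTOK-AUTH": jwt },
  });
  if (!res.ok) {
    throw new Error(`Force disconnect failed with status ${res.status}`);
  }
}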
As a security measure, the WebSocket will be closed automatically after 6 hours.
Audio Connector will make a few attempts to re-establish a WebSocket connection that closes unexpectedly (for example, if the WebSocket closes for a reason other than a call to the force disconnect REST method).
You can use the Audio Connector WebSocket connection to send audio data from the WebSocket connection to a stream published in an OpenTok session (in addition to having the WebSocket connection receive audio from the session). Set the bidirectional property to true in the data you send with the REST API method to start the Audio Connector.
See Binary audio messages for details on the format of the audio data to send via the WebSocket connection.
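As a sketch of the sending side, the following continues the ws-based server above and pushes one 640-byte frame of 16-bit, 16 kHz PCM every 20 ms over the bidirectional connection. The pcmBuffer argument is a placeholder for audio you have loaded or generated yourself:
import type { WebSocket } from "ws";

// Send raw PCM into the session over a bidirectional connection, one frame every 20 ms.
function streamPcmToSession(socket: WebSocket, pcmBuffer: Buffer): void {
  const FRAME_BYTES = 640; // 20 ms of 16-bit mono audio at 16 kHz

  let offset = 0;
  const timer = setInterval(() => {
    if (offset >= pcmBuffer.length) {
      clearInterval(timer);
      return;
    }
    socket.send(pcmBuffer.subarray(offset, offset + FRAME_BYTES), { binary: true });
    offset += FRAME_BYTES;
  }, 20);
}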
When creating the token used by the Audio Connector, you can add token data to identify the Audio Connector stream. (The OpenTok client libraries include methods for inspecting the connection data for the connection of a stream in a session.)
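For example, using the OpenTok Node SDK you could tag the token when generating it (the API key, secret, session ID, and data string below are arbitrary example values), and client applications could then check each stream's connection data to recognize the Audio Connector's stream:
import OpenTok from "opentok";

// API_KEY, API_SECRET, and SESSION_ID are placeholders for your own project values.
const opentok = new OpenTok("API_KEY", "API_SECRET");
const token = opentok.generateToken("SESSION_ID", {
  role: "publisher",
  data: "source=audio-connector",
});

// Client side (OpenTok.js), a stream whose connection data matches the tag came
// from the Audio Connector:
//   if (stream.connection.data === "source=audio-connector") { /* handle it */ }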
Note: The bidirectional option is only available in the REST API. It is not currently supported in the OpenTok server SDKs.
See the demo-video-node-audio_connector project for a sample Node application that uses Audio Connector.
See this blog post.