SSML (Beta)

Overview

Speech Synthesis Markup Language (SSML) is an XML-based markup language for fine-tuning text-to-speech output attributes such as tone, pronunciation, speed, and volume. Compared with plain text input, SSML lets you manually define the tone and pauses of each sentence at inference time, and insert sound effects or music at any point in the audio.

Use Cases

SSML provides flexible control over voice output through various customizable attributes. It can be used for the following purposes:

Text Structuring: Define the structure of the input text to influence TTS output, including paragraphs, sentences, breaks/pauses, and silence.
Voice and Style Selection: Choose the voice, language, name, style, and role. Multiple voices can be used within a single SSML document. You can also adjust emphasis, speed, pitch, and volume.
Audio Insertion: Insert pre-recorded audio such as sound effects or music.
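
As a rough illustration of the three use cases above, the sketch below builds a small SSML document with Python's standard library. The element and attribute names (speak, p, s, break, emphasis, voice, prosody, audio) come from the W3C SSML 1.1 specification; this page does not list which of them the Beta supports, so treat the tag set as an assumption. The voice name and audio URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace URI that ElementTree serializes with the reserved "xml:" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml() -> str:
    """Build a small SSML document (tag support by the service is assumed)."""
    speak = ET.Element("speak", {"version": "1.1", f"{{{XML_NS}}}lang": "en-US"})

    # Text structuring: one paragraph, two sentences, a 500 ms pause between them.
    p = ET.SubElement(speak, "p")
    s1 = ET.SubElement(p, "s")
    s1.text = "Welcome to the demo."
    ET.SubElement(p, "break", {"time": "500ms"})
    s2 = ET.SubElement(p, "s")
    s2.text = "This word is "
    em = ET.SubElement(s2, "emphasis", {"level": "strong"})
    em.text = "important"
    em.tail = "."

    # Voice and style selection: a second voice with adjusted rate, pitch, volume.
    # "example-voice-b" is a hypothetical voice name.
    voice = ET.SubElement(speak, "voice", {"name": "example-voice-b"})
    prosody = ET.SubElement(
        voice, "prosody", {"rate": "slow", "pitch": "low", "volume": "soft"}
    )
    prosody.text = "This sentence is spoken slowly by another voice."

    # Audio insertion: a pre-recorded clip with fallback text.
    # The URL is a placeholder, not a real asset.
    audio = ET.SubElement(speak, "audio", {"src": "https://example.com/chime.wav"})
    audio.text = "chime"

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml())
```

Building the document with an XML library rather than string concatenation keeps the markup well-formed and escapes special characters in the text automatically.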

Request Method

Both GET and POST requests are supported. To use SSML, pass the SSML document in the request in place of the plain text content. SSML support is currently in development and testing, so only a limited set of features is available.
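
The following sketch shows one way the substitution could look for both methods. It only constructs the requests and does not send them. The endpoint URL, the "text" parameter name, and the JSON body format are all hypothetical, since this page does not specify them; check the service's API reference for the real values.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter name; not taken from this page.
TTS_URL = "https://api.example.com/tts"
TEXT_PARAM = "text"

SSML = '<speak>Hello<break time="300ms"/>world.</speak>'


def build_get_request(ssml: str) -> Request:
    """GET: the SSML is URL-encoded into the query string in place of plain text."""
    query = urlencode({TEXT_PARAM: ssml})
    return Request(f"{TTS_URL}?{query}", method="GET")


def build_post_request(ssml: str) -> Request:
    """POST: the SSML is sent in the body (a JSON body is assumed here)."""
    body = json.dumps({TEXT_PARAM: ssml}).encode("utf-8")
    return Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    for req in (build_get_request(SSML), build_post_request(SSML)):
        print(req.get_method(), req.full_url)
```

Either request could then be sent with urllib.request.urlopen. For longer SSML documents, POST is likely the safer choice, because URL-encoded markup grows quickly and query strings are often subject to length limits.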


