SSML (Beta)
Overview
Speech Synthesis Markup Language (SSML) is an XML-based markup language for fine-tuning text-to-speech (TTS) output attributes such as tone, pronunciation, speed, and volume. Compared with plain-text input, SSML lets you set the tone and pauses of each sentence explicitly at synthesis time and insert sound effects or music anywhere in the generated audio.
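To make this concrete, here is a minimal sketch of generating a small SSML document programmatically. It assumes the service accepts the standard W3C SSML 1.1 elements `<speak>`, `<s>`, `<break>`, and `<prosody>`; because SSML support is in beta, the accepted subset may differ. `build_ssml` is a hypothetical helper built on Python's standard `xml.etree.ElementTree`, which also escapes characters such as `&` in the input text.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.1 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(sentences, pause_ms=500, rate="medium", lang="en-US"):
    """Return an SSML string with one <s> per sentence and a <break> between them.

    Hypothetical helper. Assumption: the service accepts the standard SSML 1.1
    elements <speak>, <s>, <break>, and <prosody>.
    """
    # Root element; xmlns and xml:lang are written out as literal attributes.
    speak = ET.Element("speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang})
    # <prosody rate="..."> applies one speaking rate to everything inside it.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    for i, text in enumerate(sentences):
        if i > 0:
            # Explicit pause between consecutive sentences.
            ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
        sentence = ET.SubElement(prosody, "s")
        sentence.text = text  # ElementTree escapes &, <, > when serializing
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(["Hello there.", "Welcome to the demo."], pause_ms=800, rate="slow"))
```

Running it prints a single-line document in which each sentence sits in its own `<s>` element, with an 800 ms `<break>` between the two sentences.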
Use Cases
SSML provides flexible control over voice output through various customizable attributes. It can be used for the following purposes:
Text Structuring: Define the structure of the input text to influence TTS output, including paragraphs, sentences, breaks/pauses, and silence.
Voice and Style Selection: Choose the voice (by name), language, speaking style, and role. A single SSML document can use multiple voices. You can also adjust emphasis, speed, pitch, and volume.
Audio Insertion: Insert pre-recorded audio such as sound effects or music.
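The three use cases above can be combined in one document. This is a minimal sketch assuming standard SSML 1.1 elements: `<s>` and `<break>` for structure, `<voice>`, `<prosody>`, and `<emphasis>` for voice selection and delivery, and `<audio>` for inserting a recording. The voice names (`voice-a`, `voice-b`), the audio URL, and the `build_dialogue` helper are hypothetical placeholders, not part of the service's API. Style and role have no standard SSML 1.1 element, so they are not shown.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # W3C SSML 1.1 namespace


def build_dialogue(lines, intro_audio=None, pause_ms=300, lang="en-US"):
    """Build SSML for a short multi-voice script (hypothetical helper).

    `lines` is a list of (voice_name, text, options) tuples. `options` may contain
    "emphasis" (an SSML emphasis level such as "strong") and/or the prosody
    attributes "pitch", "rate", and "volume".
    """
    speak = ET.Element("speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang})
    if intro_audio:
        # Audio insertion: the element's text is fallback content, rendered only
        # if the clip cannot be played.
        audio = ET.SubElement(speak, "audio", {"src": intro_audio})
        audio.text = "Intro chime."
    for i, (voice_name, text, options) in enumerate(lines):
        if i > 0:
            # Text structuring: an explicit pause between speakers.
            ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
        # Voice selection: every line can name a different voice.
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        node = ET.SubElement(voice, "s")
        # Pitch, rate, and volume go on a <prosody> wrapper when requested.
        prosody = {k: options[k] for k in ("pitch", "rate", "volume") if k in options}
        if prosody:
            node = ET.SubElement(node, "prosody", prosody)
        if "emphasis" in options:
            node = ET.SubElement(node, "emphasis", {"level": options["emphasis"]})
        node.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_dialogue(
        [
            ("voice-a", "Did you hear that?", {"pitch": "+10%"}),
            ("voice-b", "Yes, it came from outside!", {"emphasis": "strong", "volume": "loud"}),
        ],
        intro_audio="https://example.com/chime.mp3",  # hypothetical URL
    ))
```

Each line gets its own `<voice>` element, which is how a single document switches speakers mid-script.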
Request Methods
Both GET and POST requests are supported. To use SSML, send the SSML document in place of the plain-text content in the request. SSML support is currently in development and testing, so only a limited set of features is available.
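The following sketch shows the two request styles using only Python's standard `urllib`. The endpoint URL, the `text` parameter name, and the form-encoded POST body are all hypothetical assumptions; substitute whatever the service actually expects for plain-text requests. What it illustrates is that the SSML string simply takes the place of the plain text: URL-encoded into the query string for GET, or placed in the request body for POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

# HYPOTHETICAL values: replace with the service's real endpoint and text parameter.
ENDPOINT = "https://tts.example.com/v1/synthesize"
TEXT_PARAM = "text"

SSML = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    'Hello.<break time="500ms"/>Goodbye.</speak>'
)


def ssml_get_request(ssml: str) -> Request:
    """GET: the SSML goes where the plain text would, URL-encoded in the query string."""
    query = urlencode({TEXT_PARAM: ssml})
    return Request(f"{ENDPOINT}?{query}", method="GET")


def ssml_post_request(ssml: str) -> Request:
    """POST: the SSML replaces the plain text in a form-encoded body (assumed encoding)."""
    body = urlencode({TEXT_PARAM: ssml}).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    for req in (ssml_get_request(SSML), ssml_post_request(SSML)):
        # Requests are only built here; urllib.request.urlopen(req) would send them.
        print(req.get_method(), req.full_url)
```

For long SSML documents, POST is usually the safer choice, since very long query strings can exceed URL length limits on servers and proxies.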