SSML Elements
Using Speech Synthesis Markup Language (SSML), you can make responses sound more like natural speech.
To get started, set the type attribute of the Say element to ssml. See https://docs.ytel.com/docs/say for more information.
Break
The Break element controls pausing or other prosodic boundaries between words. Using Break between any pair of tokens is optional. If this element is not present between words, the break is automatically determined based on the linguistic context.
Element Attributes
| Attribute | Description |
|---|---|
| time | Sets the length of the break in seconds or milliseconds (e.g. "3s" or "250ms"). |
| strength | Sets the strength of the output's prosodic break in relative terms. Valid values are: "none", "x-weak", "weak", "medium", "strong", and "x-strong". The value "none" indicates that no prosodic break boundary should be output, which can be used to prevent a prosodic break that the processor would otherwise produce. The other values indicate monotonically non-decreasing (conceptually increasing) break strength between tokens. The stronger boundaries are typically accompanied by pauses. |
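As a rough sketch of the two attributes side by side, the snippet below uses strength="none" to ask the processor not to insert a break it might otherwise add, and time="1s" for an explicit one-second pause. The sentence text is made up for illustration, and how audible the "none" setting is will depend on the synthesizer.

```xml
<response>
<say type='ssml'>
<!-- strength="none" requests that no prosodic break be produced here -->
Press 1 <break strength="none"/> for sales.
<!-- time accepts seconds ("1s") or milliseconds ("250ms") -->
<break time="1s"/>
Press 2 for support.
</say>
</response>
```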
Example
The following example shows how to use the Break element to pause between steps:
<response>
<say type='ssml'>
Step 1, take a deep breath. <break time="200ms"/>
Step 2, exhale.
Step 3, take a deep breath again. <break strength="weak"/>
Step 4, exhale.
</say>
</response>
<say-as>
The <say-as> element lets you specify information about the type of text construct that is contained within the element.
Attributes
- interpret-as: Determines how the value is spoken
- format: Optional formatting for specific interpret-as values
- detail: Optional detail level for specific interpret-as values
Supported Values
- cardinal: Speaks numbers as words
- ordinal: Speaks numbers as ordinal terms
- characters: Spells out words letter by letter
- fraction: Converts fractions to spoken words
- expletive or beep: Censors text
- unit: Converts units to appropriate form
- verbatim or spell-out: Spells out text
- time: Speaks time in a natural format
- date: Speaks dates with configurable detail
Example
<response>
<say type='ssml'>
Cardinal value <say-as interpret-as="cardinal">12345</say-as>
Ordinal value <say-as interpret-as="ordinal">1</say-as>
Characters value <say-as interpret-as="characters">can</say-as>
Fraction value <say-as interpret-as="fraction">5+1/2</say-as>
Expletive or beep value <say-as interpret-as="expletive">censor this</say-as>
Unit value <say-as interpret-as="unit">10 foot</say-as>
Verbatim or spell-out value <say-as interpret-as="verbatim">abcdefg</say-as>
Time value <say-as interpret-as="time" format="hms12">2:30pm</say-as>
Date value with detail 1 <say-as interpret-as="date" format="yyyymmdd" detail="1"> 1960-09-10</say-as>
Date value with detail 2 <say-as interpret-as="date" format="dmy" detail="2"> 10-9-1960 </say-as>
</say>
</response>
<p> and <s>
The <p> and <s> elements let you create paragraphs and sentences.
Use <s>...</s> tags to wrap full sentences, especially if they contain SSML elements that change prosody (that is, <audio>, <break>, <emphasis>, <par>, <prosody>, <say-as>, <seq>, and <sub>).
If a break in speech is intended to be long enough that you can hear it, use <s>...</s> tags and put that break between sentences.
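To illustrate the guidance above, here is a minimal sketch that places an audible pause between two sentences rather than inside one. The wording and the one-second duration are illustrative assumptions, not recommended values.

```xml
<response>
<say type='ssml'>
<p>
<s>Please hold while I look that up.</s>
<!-- The pause sits between the two sentences, not inside either one -->
<break time="1s"/>
<s>Thanks for waiting.</s>
</p>
</say>
</response>
```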
Example
<response>
<say type='ssml'>
<p>
<s>This is sentence one.</s>
<s>This is sentence two.</s>
</p>
</say>
</response>
Additional Notes
- The <s> tag helps define sentence boundaries
- SSML elements within <s> tags can modify how the sentence is spoken
- Proper use of these tags can improve speech synthesis clarity
Prosody
The Prosody element lets you customize the pitch, speaking rate, and volume of text contained by the element. Currently the rate, pitch, and volume attributes are supported.
The rate and volume attributes can be set according to the W3C specification. There are three options for setting the value of the pitch attribute:
| Option | Description |
|---|---|
| Relative | Specify a relative value (e.g. "low", "medium", "high", etc) where "medium" is the default pitch. |
| Semitones | Increase or decrease pitch by "N" semitones using "+Nst" or "-Nst" respectively. Note that "+/-" and "st" are required. |
| Percentage | Increase or decrease pitch by "N" percent by using "+N%" or "-N%" respectively. Note that "%" is required but "+/-" is optional. |
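The three pitch options in the table could be written as follows. This is a sketch with made-up sentence text; the specific values ("high", "+2st", "-10%") are only examples of each form.

```xml
<response>
<say type='ssml'>
<!-- Relative: a named value such as "low", "medium", or "high" -->
<prosody pitch="high">This line uses a relative pitch value.</prosody>
<!-- Semitones: the sign and the "st" suffix are both required -->
<prosody pitch="+2st">This line is raised by two semitones.</prosody>
<!-- Percentage: the "%" is required, the sign is optional -->
<prosody pitch="-10%">This line is lowered by ten percent.</prosody>
</say>
</response>
```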
Example
<response>
<say type='ssml'>
<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
</say>
</response>
Audio
The Audio element supports the insertion of recorded audio files and other audio formats in conjunction with synthesized speech output.
Refer to the W3C specification for detailed information: https://www.w3.org/TR/speech-synthesis/#S3.3.1
Attributes
| Attribute | Required | Default | Description |
|---|---|---|---|
| src | Yes | N/A | URI referring to the audio media source (HTTPS only) |
| clipBegin | No | 0 | Offset from audio source's beginning to start playback |
| clipEnd | No | Infinity | Offset from audio source's beginning to end playback |
| speed | No | 100% | Playback rate relative to normal input rate |
| repeatCount | No | 1 | Number of times to insert the audio |
| repeatDur | No | Infinity | Duration limit for inserted audio |
| soundLevel | No | +0dB | Sound level adjustment in decibels |
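As a sketch of how the optional attributes might be combined, the snippet below plays a four-second slice of a clip twice at reduced volume. The URL is a hypothetical placeholder (the host and path are assumptions, not a real Ytel asset), and the attribute values are illustrative only.

```xml
<response>
<say type='ssml'>
<!-- src is a placeholder HTTPS URL; replace it with your own hosted file -->
<!-- clipBegin/clipEnd select seconds 2 through 6 of the source audio -->
<audio src="https://example.com/audio/cat_purr_close.ogg"
       clipBegin="2s" clipEnd="6s"
       repeatCount="2" soundLevel="-6dB">
<desc>a cat purring</desc>
PURR (sound didn't load)
</audio>
</say>
</response>
```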
Example
<response>
<say type='ssml'>
<audio src="cat_purr_close.ogg">
<desc>a cat purring</desc>
PURR (sound didn't load)
</audio>
</say>
</response>
