SSML-based synthesis effect control

Last Updated: Jun 18, 2020

Speech Synthesis Markup Language (SSML) is an XML-based markup language. Compared with plain text synthesis, SSML-based synthesis enriches the content that can be synthesized, achieving a variety of final synthesis effects. With SSML, you can not only control what the speech synthesis service reads, but also control how the speech synthesis service reads the content you specify. For example, you can specify how to break sentences and words, control the pronunciation, speed, pauses, intonation, and volume, and even add background music (BGM).

The SSML implementation of the Alibaba Cloud speech synthesis service is based on W3C SSML Version 1.0. However, Alibaba Cloud SSML does not support every W3C markup type. It supports as many markup types as possible, prioritized by business needs.

2 Usage

Notes:

  • Currently, Alibaba Cloud SSML can only be used for speech synthesis in Chinese.
  • All text to be synthesized must be enclosed in the <speak></speak> tag. Each speech synthesis task can contain only one <speak></speak> tag.

You can upload tagged text as the value of the text parameter to the speech synthesis service. The following code uses the Java SDK as an example:

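  // client and getSynthesizerListener() are defined elsewhere in your SDK setup code.
  SpeechSynthesizer synthesizer = new SpeechSynthesizer(client, getSynthesizerListener());
  // The tagged text is passed as the text parameter. Inner double quotes are escaped for Java.
  String text = "<speak>Close your eyes and take a rest.<break time=\"500ms\"/>Now open your eyes.</speak>";
  synthesizer.setText(text);
  // ...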

The request sent to the speech synthesis service is as follows:

  {
    "payload": {
      "volume": 50,
      "sample_rate": 16000,
      "format": "wav",
      "text": "<speak>Close your eyes and take a rest.<break time=\"500ms\"/>Now open your eyes.</speak>"
    },
    "context": {
      "sdk": {
        "name": "nls-sdk-java",
        "version": "2.0.4"
      }
    },
    "header": {
      "namespace": "SpeechSynthesizer",
      "name": "StartSynthesis",
      "message_id": "5fdf78c0dd574b6897f3cb204dd080b3",
      "appkey": "fd4******er4",
      "task_id": "6e1be78ef5804c50a2c5a8b92de15cb9"
    }
  }

The following section describes the SSML tags supported by the speech synthesis service, which are used in the same way as the preceding example.

3 Tags

3.1 speak

  1. Description

    <speak> is the root node of all SSML tags supported by the speech synthesis service. All text that needs to call SSML tags must be enclosed in the <speak></speak> tag.

  2. Syntax

    <speak>Text that needs to call SSML tags</speak>
  3. Attributes

    <speak> can use the following attributes. Each one is a proprietary attribute of the Alibaba Cloud speech synthesis service.

    | Name | Type | Value | Required | Description |
    | --- | --- | --- | --- | --- |
    | voice | String | The name of an available online speaker. The value can contain only lowercase letters, for example, siyue. | No | Specifies the speaker used for synthesis. It takes priority over the speaker specified by the voice parameter in the speech synthesis API. Currently, this attribute cannot be set to xiaowei or xiaomeng. |
    | encodeType | String | pcm, wav, or mp3 | No | Specifies the audio file format used for synthesis. It takes priority over the file format specified by the format parameter in the speech synthesis API. |
    | sampleRate | String | 8000 or 16000 | No | Specifies the audio sampling rate used for synthesis. It takes priority over the sampling rate specified by the sample_rate parameter in the speech synthesis API. |
    | rate | String | Any integer from -500 to 500. Default value: 0. A value greater than 0 increases the speech speed, and a value less than 0 reduces it. | No | Specifies the speech speed used for synthesis. It takes priority over the speed specified by the speech_rate parameter in the speech synthesis API. |
    | pitch | String | Any integer from -500 to 500. Default value: 0. A value greater than 0 raises the pitch, and a value less than 0 lowers it. | No | Specifies the audio pitch used for synthesis. It takes priority over the pitch specified by the pitch_rate parameter in the speech synthesis API. |
    | volume | String | Any integer from 0 to 100. Default value: 50. A value greater than 50 increases the volume, and a value less than 50 reduces it. | No | Specifies the audio volume used for synthesis. It takes priority over the volume specified by the volume parameter in the speech synthesis API. |
    | effect | String | robot, lolita, lowpass, echo, eq (equalizer), lpfilter (low-pass filter), or hpfilter (high-pass filter) | No | Applies a sound effect to the synthesized speech. See the notes on effect after this table. |
    | effectValue | String | Depends on the value of effect. See the effectValue details after this table. | No | If you set the effect attribute to eq, lpfilter, or hpfilter, you can specify this attribute to modify the default effect of the specified filter. |
    | bgm | String | The name of an available online BGM. For more information, see the description of the bgm attribute that follows. | No | Specifies the BGM of the synthesized speech. |
    | backgroundMusicVolume | String | Any integer from 0 to 100. Default value: 50. A value greater than 50 increases the volume, and a value less than 50 reduces it. | No | Specifies the volume of the BGM. |

    Notes on effect:
    1. Among the values, eq, lpfilter, and hpfilter are advanced filters. If you set effect to one of them, you can specify the effectValue attribute to customize the filter.
    2. An SSML structure supports only one sound effect. You cannot set this attribute to multiple values.
    3. If you set this attribute, the system latency may increase.

    effectValue details:
    • eq: The system provides eight default bands with the frequencies of 40 Hz, 100 Hz, 200 Hz, 400 Hz, 800 Hz, 1,600 Hz, 4,000 Hz, and 12,000 Hz. All of their bandwidths are 1.0q. The value must specify the gain of each band as a string of eight integers separated with spaces, for example, 1 1 1 1 1 1 1 1. Each gain must be in the range of [-20 dB, 20 dB]. A value of 0 leaves the gain of the corresponding band unadjusted.
    • lpfilter: the frequency of the low-pass filter. The value can be any integer in the range of (0, target sampling rate/2], for example, 800.
    • hpfilter: the frequency of the high-pass filter. The value can be any integer in the range of (0, target sampling rate/2], for example, 1200.
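The effectValue rules can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: the class EffectValueCheck and its method isValid are hypothetical names that are not part of the SDK, and the service may apply additional checks that are not described here.

```java
import java.util.Arrays;

/** Hypothetical client-side check for the effect and effectValue attributes. */
final class EffectValueCheck {
    private static final int EQ_BANDS = 8;

    /** Returns true if effectValue fits the documented range for the given effect. */
    static boolean isValid(String effect, String effectValue, int sampleRate) {
        switch (effect) {
            case "eq":
                // Eight space-separated integer gains, each within [-20, 20] dB.
                String[] gains = effectValue.trim().split("\\s+");
                if (gains.length != EQ_BANDS) {
                    return false;
                }
                return Arrays.stream(gains).allMatch(g -> inRange(g, -20, 20));
            case "lpfilter":
            case "hpfilter":
                // Filter frequency: an integer in (0, sampleRate / 2].
                return inRange(effectValue.trim(), 1, sampleRate / 2);
            default:
                // Assumption of this sketch: other effects do not take an effectValue.
                return false;
        }
    }

    private static boolean inRange(String s, int min, int max) {
        try {
            int v = Integer.parseInt(s);
            return v >= min && v <= max;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("eq", "1 1 1 1 1 1 1 1", 16000));
        System.out.println(isValid("lpfilter", "800", 16000));
        System.out.println(isValid("hpfilter", "9000", 16000));
    }
}
```

With a target sampling rate of 16000, the upper bound for lpfilter and hpfilter is 8000, so a value such as 9000 is rejected by this sketch.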

    The bgm attribute accepts either a built-in BGM URL or a custom BGM URL.

    Built-in BGM URL
      The Alibaba Cloud speech synthesis service provides several built-in BGM streams. You can open the following URLs to listen to them:
      http://nls.alicdn.com/bgm/1.wav
      http://nls.alicdn.com/bgm/2.wav
      http://nls.alicdn.com/bgm/3.wav

    Custom BGM URL
      You can use custom BGM as needed. Store the BGM file in your Alibaba Cloud OSS bucket and grant at least the public-read permission on the bucket. For more information about how to create a bucket, see Create a bucket. For more information about how to generate an object access link, see How to obtain the access URL of an OSS object. Both HTTP and HTTPS are supported.
      You are legally liable for the copyright of the uploaded audio file.
      Audio requirements:
      1. The audio file must be a mono WAV file with a sampling rate of 16 kHz.
      2. The maximum file size is 2 MB.
      3. If the synthesized speech is longer than the BGM, the BGM is played in a loop.
      If your BGM file is not in WAV format, you can convert it with FFmpeg by running the following command: ffmpeg -i <input audio file> -acodec pcm_s16le -ac 1 -ar 16000 <target audio file>.wav

    Note: If the URL in the tag contains special XML characters, escape the characters. The following special characters are commonly used: < > & " '.
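The escaping described in the note can be sketched as follows. This is an illustrative helper, not part of the SDK: the class SsmlEscaper, the method escapeXml, and the bucket URL in main are hypothetical.

```java
/** Hypothetical helper that escapes special XML characters before they are placed in SSML. */
final class SsmlEscaper {
    /** Replaces the five special XML characters with their predefined entities. */
    static String escapeXml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical custom BGM URL with a query string, used only for illustration.
        String url = "https://example-bucket.example.com/bgm.wav?a=1&b=2";
        String ssml = "<speak bgm=\"" + escapeXml(url) + "\">Text</speak>";
        System.out.println(ssml);
    }
}
```

Escaping the value once, at the point where the tag is assembled, avoids double escaping a URL that is reused elsewhere in unescaped form.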

  4. Tag relationship

    <speak> can contain text and the following tags:

    • break
    • s
    • w
    • phoneme
    • say-as
  5. Examples

    • No attributes

      <speak>
        Text that needs to call SSML tags
      </speak>
    • Attribute voice

      <speak voice="xiaogang">
        This is a male voice.
      </speak>
    • Attribute encodeType

      <speak encodeType="mp3">
        I can produce audio in compressed formats.
      </speak>
    • Attribute sampleRate

      <speak sampleRate="8000">
        Check my file size. It is half of the audio at a sample rate of 16,000.
      </speak>
    • Attribute rate

      <speak rate="200">
        I speak faster than normal people.
      </speak>
    • Attribute pitch

      <speak pitch="-100">
        But the pitch of my voice is lower than theirs.
      </speak>
    • Attribute volume

      <speak volume="80">
        I have a voice of high volume too.
      </speak>
    • Combination of attributes, separated with spaces

      <speak rate="200" pitch="-100" volume="80">
        So put together, this is my voice.
      </speak>
    • Attribute effect

      <speak effect="robot">
        Do you like Wall-E the robot?
      </speak>
    • Attribute bgm

      <speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="-500" volume="40">
        <break time="2s"/>
        Shady cliff, old trees, dense clouds and mist;
        <break time="700ms"/>
        Sound of raindrops remains in the bamboo forest.
        <break time="700ms"/>
        Mianjue Books do include plans beneficial to country;
        <break time="700ms"/>
        Mianzhou's landscape and specialty are always worth the journey.
        <break time="2s"/>
      </speak>

3.2 break

  1. Description

    The <break> tag is optional, used to insert a break in the text.

  2. Syntax

    <!-- No attributes -->
    <break/>
    <!-- With the time attribute -->
    <break time="string"/>
  3. Attributes

    | Name | Type | Value | Required | Description |
    | --- | --- | --- | --- | --- |
    | (none) | (none) | (none) | No | If the <break> tag is used without any attribute, the speech has a break of 1 second. |
    | time | String | [number]s or [number]ms | No | The break length, in seconds ([number]s) or milliseconds ([number]ms), for example, 2s or 50ms. If the break is in milliseconds, number is an integer in the range of [50, 10000]. If the break is in seconds, number is an integer in the range of [1, 10]. |
  4. Tag relationship

    <break> is an empty tag and cannot contain any tags. If an SSML structure contains the <s> tag, write the <break> tag in the <s> tag, which indicates that the current paragraph or sentence has a break.

  5. Examples

    <speak>
      Close your eyes and take a rest.<break time="500ms"/>Okay, now open your eyes.
    </speak>
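When break lengths are computed at run time, the ranges in the attribute table can be enforced while the tag is built. The following is a minimal sketch. BreakTag and ofMillis are hypothetical names, and choosing the seconds form for whole seconds is a design choice of this sketch rather than a service requirement.

```java
/** Hypothetical helper that builds a <break> tag within the documented ranges. */
final class BreakTag {
    /** Builds a break of the given length; accepts 50 to 10000 milliseconds. */
    static String ofMillis(int millis) {
        if (millis < 50 || millis > 10000) {
            throw new IllegalArgumentException(
                "break time must be within [50, 10000] ms: " + millis);
        }
        // Whole seconds (1 to 10) can be written in the shorter seconds form.
        if (millis % 1000 == 0) {
            return "<break time=\"" + (millis / 1000) + "s\"/>";
        }
        return "<break time=\"" + millis + "ms\"/>";
    }

    public static void main(String[] args) {
        String ssml = "<speak>Close your eyes and take a rest."
            + ofMillis(500) + "Okay, now open your eyes.</speak>";
        System.out.println(ssml);
    }
}
```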

3.3 s

  1. Description

    The <s> tag is optional, used to represent the sentence structure in the text.

  2. Syntax

    <s>Text</s>
  3. Attributes

    None

  4. Tag relationship

    The <s> tag can contain text and the following tags:

    • break
    • w
    • phoneme
    • say-as
  5. Examples

    <speak><s>This is the first sentence.</s><s>This is the second sentence.</s></speak>

3.4 sub

  1. Description

    The <sub> tag is used to replace the text in a tag with an alias.

  2. Syntax

    <sub alias="string">Text</sub>
  3. Attributes

    | Name | Type | Value | Required | Description |
    | --- | --- | --- | --- | --- |
    | alias | String | The target content. | Yes | The text that is read in place of the text in the tag. |
  4. Tag relationship

    The <sub> tag can only contain text.

  5. Examples

    <speak><sub alias="Network Protocol Standards">W3C</sub></speak>

3.5 w

  1. Description

    The <w> tag is optional, used to represent the word structure in the text.

  2. Syntax

    <w>Text</w>
  3. Attributes

    None

  4. Tag relationship

    The <w> tag can only contain text.

  5. Examples

    <speak>The Mayor of Nanjing <w>Jiang Daqiao</w> gave a speech today.</speak>

3.6 phoneme

  1. Description

    The <phoneme> tag is optional, used to control the pronunciation of the text in a tag.

  2. Syntax

    <phoneme alphabet="string" ph="string">Text</phoneme>
  3. Attributes

    | Name | Type | Value | Required | Description |
    | --- | --- | --- | --- | --- |
    | alphabet | String | py | Yes | The value py indicates pinyin. |
    | ph | String | The pinyin string corresponding to the text in the tag. | Yes | See the pinyin rules after this table. |

    Pinyin rules for ph:
    1. Pinyin syllables are separated with spaces. The number of pinyin syllables must be the same as the number of characters in the text.
    2. Each pinyin syllable is composed of the sound and a tone mark. Tone marks are represented by the tone numbers 1 to 5, in which 5 indicates the neutral tone.
  4. Tag relationship

    The <phoneme> tag can only contain text.

  5. Examples

    <speak>
      Go to a <phoneme alphabet="py" ph="dian3 dang4 hang2">pawnshop</phoneme> and <phoneme alphabet="py" ph="dang4 diao4">pawn</phoneme> this.
    </speak>
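The two pinyin rules for the ph attribute can be checked before the tag is written. The following is a minimal sketch under stated assumptions: PinyinCheck and isValid are hypothetical names, syllables are assumed to be written in lowercase ASCII letters, and the text is assumed to consist only of the Chinese characters to be read. The characters in main are written as Unicode escapes for 典当行, which matches the pinyin dian3 dang4 hang2.

```java
import java.util.regex.Pattern;

/** Hypothetical check of a ph value against the text it annotates. */
final class PinyinCheck {
    // One syllable: lowercase letters followed by a tone number from 1 to 5 (5 = neutral tone).
    private static final Pattern SYLLABLE = Pattern.compile("[a-z]+[1-5]");

    /** Returns true if ph has one well-formed syllable per character of text. */
    static boolean isValid(String text, String ph) {
        String[] syllables = ph.trim().split("\\s+");
        // Count code points so that characters outside the BMP are counted once.
        int characters = text.codePointCount(0, text.length());
        if (syllables.length != characters) {
            return false;
        }
        for (String syllable : syllables) {
            if (!SYLLABLE.matcher(syllable).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Three characters, three syllables.
        System.out.println(isValid("\u5178\u5F53\u884C", "dian3 dang4 hang2"));
    }
}
```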

3.7 soundEvent

  1. Description

    The <soundEvent> tag is used to insert a sound cue at any position in the text during SSML-based synthesis.

  2. Syntax

    <soundEvent src="URI"/>
  3. Attributes

    | Name | Type | Value | Required | Description |
    | --- | --- | --- | --- | --- |
    | src | String | The URI of the sound cue. | Yes | You can use a custom sound cue as needed. See the requirements after this table. |

    Requirements for the sound cue:
    Store the sound cue in your Alibaba Cloud OSS bucket and grant at least the public-read permission on the bucket. For more information about how to create a bucket, see Create a bucket. Both HTTP and HTTPS are supported.
    You are legally liable for the copyright of the uploaded audio file.
    Audio requirements:
    1. The audio file must be a mono WAV file with a sampling rate of 16 kHz.
    2. The maximum file size is 2 MB.
  4. Tag relationship

    <soundEvent> is an empty tag and cannot contain any tags.

  5. Examples

    <speak>
      A horse gets frightened.<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>People scatter in search of shelter.
    </speak>

3.8 say-as

  1. Description

    The <say-as> tag is used to indicate the type of the text in the tag, so that the text can be read based on the default pronunciation method of this type.

  2. Syntax

    <say-as interpret-as="string">Text</say-as>
  3. Attributes

    | Name | Type | Value | Required | Description |
    | --- | --- | --- | --- | --- |
    | interpret-as | String | cardinal, digits, telephone, name, address, id, characters, punctuation, date, time, currency, or measure | Yes | Specifies the type of the text in the tag. |

    Valid values of interpret-as:
    • cardinal: the text is read as an integer or decimal number.
    • digits: the text is read digit by digit.
    • telephone: the text is read as a phone number.
    • name: the text is read as a name.
    • address: the text is read as an address.
    • id: the text is read as an account name or nickname.
    • characters: the text is read character by character.
    • punctuation: the text is read as punctuation marks.
    • date: the text is read as a date.
    • time: the text is read as a time.
    • currency: the text is read as an amount of money.
    • measure: the text is read as a measurement.
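A <say-as> tag can be assembled with a small helper that rejects unknown interpret-as values and escapes the text. The following is a minimal sketch. SayAs and wrap are hypothetical names that are not part of the SDK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that wraps text in a <say-as> tag. */
final class SayAs {
    // The interpret-as values listed in the attribute table.
    private static final Set<String> TYPES = new HashSet<>(Arrays.asList(
        "cardinal", "digits", "telephone", "name", "address", "id",
        "characters", "punctuation", "date", "time", "currency", "measure"));

    static String wrap(String interpretAs, String text) {
        if (!TYPES.contains(interpretAs)) {
            throw new IllegalArgumentException("unsupported interpret-as: " + interpretAs);
        }
        return "<say-as interpret-as=\"" + interpretAs + "\">" + escape(text) + "</say-as>";
    }

    private static String escape(String s) {
        // Replace '&' first so that the entities produced below are not escaped again.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println("<speak>" + wrap("telephone", "010-62552560") + "</speak>");
    }
}
```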
  4. Types supported by the say-as tag

  • cardinal
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Numeric string | 145 | One hundred and forty-five | Integer input range: positive and negative integers with a maximum of 20 digits, that is, -99,999,999,999,999,999,999 to 99,999,999,999,999,999,999. Decimal input range: There is no limit on the number of decimal places. However, we recommend that you retain up to 10 decimal places. |
    | Minus sign + numeric string | -145 | Minus one hundred and forty-five | |
    | Numeric string with every three digits separated with a comma | 10,000 | Ten thousand | |
    | Minus sign + numeric string with every three digits separated with a comma | -10,124 | Minus ten thousand one hundred and twenty-four | |
    | Numeric string + decimal point + two zeros | 10.00 | Ten | |
    | Minus sign + numeric string + decimal point + two zeros | -110.00 | Minus one hundred and ten | |
    | Numeric string + decimal point + numeric string | 79.090 | Seventy-nine point zero nine zero | |
    | Minus sign + numeric string + decimal point + numeric string | -79.001 | Minus seventy-nine point zero zero one | |
  • digits
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Numeric string | 129090909 | One two nine zero nine zero nine zero nine | There is no limit on the length of the numeric string. However, we recommend that you retain up to 20 digits. If the numeric string exceeds 10 digits in length, insert a break between digits. |
  • telephone
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Landline number | 4930286 | Four-nine-three, oh-two-eight-six | Landline numbers can be seven or eight digits. The space and hyphen (-) can be used as delimiters. A seven-digit landline number can be separated in 3-4 mode. An eight-digit landline number can be separated in 4-4 mode. |
    | | 493 0286 | Four-nine-three, oh-two-eight-six | |
    | | 493-0286 | Four-nine-three, oh-two-eight-six | |
    | | 62552560 | Six-two-five-five, two-five-six-oh | |
    | | 6255 2560 | Six-two-five-five, two-five-six-oh | |
    | | 6255-2560 | Six-two-five-five, two-five-six-oh | |
    | Landline number + extension number | 4930286-109 | Four-nine-three, oh-two-eight-six, extension one-oh-nine | Extension numbers can be one to four digits. |
    | | 4930286, x. 109 | Four-nine-three, oh-two-eight-six, extension one-oh-nine | |
    | | 4930286, ex. 109 | Four-nine-three, oh-two-eight-six, extension one-oh-nine | |
    | | 4930286, ext. 109 | Four-nine-three, oh-two-eight-six, extension one-oh-nine | |
    | Area code + landline number | 01062552560 | Oh-one-oh, six-two-five-five, two-five-six-oh | Area codes of 010, 02x, 03xx, 04xx, 05xx, 07xx, 08xx, and 09xx are supported. |
    | | 010 62552560 | Oh-one-oh, six-two-five-five, two-five-six-oh | |
    | | 010 6255 2560 | Oh-one-oh, six-two-five-five, two-five-six-oh | |
    | | 010 6255-2560 | Oh-one-oh, six-two-five-five, two-five-six-oh | |
    | | 010-62552560 | Oh-one-oh, six-two-five-five, two-five-six-oh | |
    | | 010-6255-2560 | Oh-one-oh, six-two-five-five, two-five-six-oh | |
    | | (010)62552560 | Oh-one-oh, six-two-five-five, two-five-six-oh | |
    | | 03198907098 | Oh-three-one-nine, eight-nine-oh, seven-oh-nine-eight | |
    | | 0319-8907098 | Oh-three-one-nine, eight-nine-oh, seven-oh-nine-eight | |
    | Area code + landline number + extension number | 010 62552560-109 | Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine | |
    | | 010-62552560-109 | Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine | |
    | | (010)62552560-109 | Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine | |
    | | (010)62552560, x. 109 | Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine | |
    | | (010)62552560, ex.109 | Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine | |
    | | (010)62552560, ext. 109 | Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine | |
    | Country code + area code + landline number | 86-010-62791627 | Eight-six, oh-one-oh, six-two-seven-nine, one-six-two-seven | Country code formats of 86, (86), +86, (+86), and 0086 are supported, all of which are read as eight-six. |
    | | (86)10-62791627 | Eight-six, one-oh, six-two-seven-nine, one-six-two-seven | |
    | | +86-010-62791627 | Eight-six, oh-one-oh, six-two-seven-nine, one-six-two-seven | |
    | | 0086-10-62791627 | Eight-six, one-oh, six-two-seven-nine, one-six-two-seven | |
    | | (+86)-10-6279 1627 | Eight-six, one-oh, six-two-seven-nine, one-six-two-seven | |
    | Country code + area code + landline number + extension number | (86)21-58118818-207 | Eight-six, two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven | |
    | | (86)021-5811-8818-207 | Eight-six, oh-two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven | |
    | | (86)021-58118818, x. 207 | Eight-six, oh-two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven | |
    | | (86)21-5811-8818, ex. 207 | Eight-six, two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven | |
    | | +86-021-58118818, ext. 207 | Eight-six, oh-two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven | |
    | Mobile number | 151 9099 0987 | One-five-one, nine-oh-nine-nine, oh-nine-eight-seven | Mobile numbers of 11 digits are supported, which can be separated in 3-3-5 and 3-4-4 modes. |
    | | 151-909-90987 | One-five-one, nine-oh-nine, nine-oh-nine-eight-seven | |
    | | 151 909 90987 | One-five-one, nine-oh-nine, nine-oh-nine-eight-seven | |
    | Country code + mobile number | +86-15190990987 | Eight-six, one-five-one, nine-oh-nine-nine, oh-nine-eight-seven | |
    | | (+86)-151-9099-0987 | Eight-six, one-five-one, nine-oh-nine-nine, oh-nine-eight-seven | |
    | | +8615190990987 | Eight-six, one-five-one, nine-oh-nine-nine, oh-nine-eight-seven | |
    | | 0086-151 909 90987 | Eight-six, one-five-one, nine-oh-nine, nine-oh-nine-eight-seven | |
    | Service number | 110 | One-one-oh | 1. Common service numbers such as 110 are supported. 2. Ten-digit service numbers starting with 400 or 800 are supported, which are separated in 3-3-4 mode. 3. Sixteen-digit service numbers starting with 12530, 17951, or 12593 are supported. |
    | | 95566 | Nine-five-five-six-six | |
    | | 4008110510 | Four hundred, eight-one-one, oh-five-one-oh | |
    | | 800-810-8888 | Eight hundred, eight-one-oh, eight-eight-eight-eight | |
    | | 1253013520638377 | One-two-five-three-oh, one-three-five, two-oh-six-three, eight-three-seven-seven | |
    | Others | (86)(21)9899-80800-0909 | Eight-six, two-one, nine-eight-nine-nine, eight-oh-eight-oh-oh, oh-nine-oh-nine | Numeric strings with delimiters are supported. The delimiters can be parentheses and the hyphen (-). |
  • address
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Common address format | 30-9, Jiayuan, Yuanhe Town | Thirty hyphen nine, Jiayuan, Yuanhe Town | Common address formats are supported. The address indicates the standard postal address. |
    | | No. 1107-1108, Lane 388, Shitai Road | No. one-one-zero-seven to one-one-zero-eight, Lane three-eight-eight, Shitai Road | |
    | | 3-1-3205, Jinyunfu, Phase 6, Huarun 24 City | Three hyphen one hyphen three-two-zero-five, Jinyunfu, Phase six, Huarun Twenty-four City | |
    | | Room 2006, Building 2, Shenghua Mingdu Building | Room two thousand six, Building two, Shenghua Mingdu Building | |
    | | Room 201, Unit 4, Building 5, Wuchang Street Courtyard | Room two hundred one, Unit four, Building five, Wuchang Street Courtyard | |
    | | No. 19, Lane 150, Furong River Road | No. nineteen, Lane one hundred fifty, Furong River Road | |
  • id
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | String | dell0101 | D E L L zero one zero one | Uppercase and lowercase English letters, numbers 0 to 9, and underscores (_) are supported. A space in the output indicates that a break is inserted between characters, which are read one by one. |
    | | myid_1998 | M Y I D underscore one nine nine eight | |
    | | AiTest | A I T E S T | |
  • characters
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | String | ISBN 1-001-099098-1 | I S B N one hyphen zero zero one hyphen zero nine nine zero nine eight hyphen one | Chinese characters, uppercase and lowercase English letters, numbers 0 to 9, and some full-width and half-width characters are supported. A space in the output indicates that a break is inserted between characters, which are read one by one. If the text in the tag contains special XML characters, escape them. The common special characters are the angle brackets (< and >), ampersand (&), double quotation mark ("), and apostrophe ('), which are escaped as &lt;, &gt;, &amp;, &quot;, and &apos;. |
    | | x10b2345_u | x one zero b two three four five underscore u | |
    | | v1.0.1 | v one point zero point one | |
    | | Version 2.0 | Version two point zero | |
    | | Su M MA000 | Su M M A zero zero zero | |
    | | Airbus A330 | Airbus A three three zero | |
    | | Models s01, s02, and s03 | Models s zero one s zero two and s zero three | |
    | | αβγ | Alpha beta gamma | |
  • punctuation
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Punctuation mark | …… | Ellipsis | Common Chinese and English punctuation marks are supported. For more information, see the following table about the punctuation marks. A space in the output indicates that a break is inserted between characters, which are read one by one. If the text in the tag contains special XML characters, escape them. The common special characters are the angle brackets (< and >), ampersand (&), double quotation mark ("), and apostrophe ('), which are escaped as &lt;, &gt;, &amp;, &quot;, and &apos;. |
    | | !"#$%& | Exclamation point, double quotation mark, number sign, dollar sign, percent sign, and ampersand | |
    | | '()*+ | Apostrophe, left parenthesis, right parenthesis, asterisk, and plus sign | |
    | | ,-./:; | Comma, hyphen, period, forward slash, colon, and semicolon | |
    | | <=>?@ | Less than, equal sign, greater than, question mark, and at sign | |
    | | [\]^_ | Left square bracket, backslash, right square bracket, caret, and underscore | |
  • date
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Year | 71 | Nineteen seventy-one | Two-digit and four-digit years are supported. Two-digit years range from 60 to 99, 00 to 09, and 10 to 19. Four-digit years range from 1000 to 1999 and 2000 to 2099. |
    | | 04 | Two thousand and four | |
    | | 19 | Two thousand and nineteen | |
    | | 1011 | One thousand and eleven | |
    | | 1998 | Nineteen ninety-eight | |
    | | 2008 | Two thousand and eight | |
    | Year and month | April, 98 | April, nineteen ninety-eight | The months from January to September can be represented by a number with or without a leading zero. For example, in April, 1908, April can be represented by 4 or 04. |
    | | April, 1998 | April, nineteen ninety-eight | |
    | | August, 08 | August, two thousand and eight | |
    | | August, 2008 | August, two thousand and eight | |
    | Year, month, and day | April 23, 98 | April twenty-third, nineteen ninety-eight | The first to ninth days of a month can be represented by a number with or without a leading zero. For example, on April eighth, 1908, April can be represented by 4 or 04, and eighth can be represented by 8 or 08. |
    | | April 23, 1998 | April twenty-third, nineteen ninety-eight | |
    | | August 8, 08 | August eighth, two thousand and eight | |
    | | August 08, 2008 | August eighth, two thousand and eight | |
    | Month and day | March 20 | March twentieth | |
    | | August 07 | August seventh | |
    | Abbreviation of year and month | 2018/08 | August, two thousand and eighteen | The forward slash (/), hyphen (-), and period (.) can be used as the delimiters. |
    | | 2018-08 | August, two thousand and eighteen | |
    | | 2018.08 | August, two thousand and eighteen | |
    | Abbreviation of year, month, and day | 2018/08/08 | August eighth, two thousand and eighteen | |
    | | 2018-8-8 | August eighth, two thousand and eighteen | |
    | | 2018.08.08 | August eighth, two thousand and eighteen | |
    | Date range | September 1~30, 04 | September first to thirtieth, two thousand and four | The tilde (~) and hyphen (-) can be used to indicate the range. |
    | | September 01, 2004-June 08, 2008 | September first, two thousand and four to June eighth, two thousand and eight | |
    | | April, 01~April, 10 | April, two thousand and one to April, two thousand and ten | |
    | | April, 2001~April, 2010 | April, two thousand and one to April, two thousand and ten | |
    | | October 1~October 7 | October first to October seventh | |
    | | October 01~October 07 | October first to October seventh | |
    | | October 1~7 | October first to seventh | |
    | | October 01~07 | October first to seventh | |
    | Abbreviation of date range | 2018/03/03~2019/01/01 | March third, two thousand and eighteen to January first, two thousand and nineteen | The forward slash (/) and period (.) can be used as the delimiters. The tilde (~) and hyphen (-) can be used to indicate the range. |
    | | 1997.9.9~1998.9.9 | September ninth, nineteen ninety-seven to September ninth, nineteen ninety-eight | |
    | | 10/20~10/31 | October twentieth to October thirty-first | |
    | Date range | January~October | January to October | |
    | Abbreviation of month, day, and year | 10/20/2018 | October twentieth, two thousand and eighteen | Only four-digit years are supported, only the forward slash (/) can be used as the delimiter, and only the format of month/day/year is supported. |
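The two-digit year examples (71 read as nineteen seventy-one, 04 as two thousand and four, 19 as two thousand and nineteen) suggest the following mapping. This is a sketch inferred from those examples and the stated ranges, not a description of the service's internal logic. TwoDigitYear and expand are hypothetical names.

```java
/** Hypothetical expansion of a two-digit year, inferred from the documented examples. */
final class TwoDigitYear {
    /** Maps 60-99 to 1960-1999 and 00-19 to 2000-2019; other values are rejected. */
    static int expand(int twoDigitYear) {
        if (twoDigitYear >= 60 && twoDigitYear <= 99) {
            return 1900 + twoDigitYear;
        }
        if (twoDigitYear >= 0 && twoDigitYear <= 19) {
            return 2000 + twoDigitYear;
        }
        // Values from 20 to 59 fall outside the supported two-digit ranges.
        throw new IllegalArgumentException("unsupported two-digit year: " + twoDigitYear);
    }

    public static void main(String[] args) {
        System.out.println(expand(71));
        System.out.println(expand(4));
    }
}
```

Because values from 20 to 59 are outside the supported ranges, writing such years with four digits avoids ambiguity.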
  • time
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Time | 12:00 | Twelve o’clock | Common time and time range formats are supported. |
    | | 12:00:00 | Twelve o’clock | |
    | | 10:20 | Ten twenty | |
    | | 10:20:30 | Ten twenty and thirty seconds | |
    | | 09:18:14 | Nine eighteen and fourteen seconds | |
    | Time range | 11:00~12:00 | Eleven o’clock to twelve o’clock | |
    | | 09:00-14:00 | Nine o’clock to fourteen o’clock | |
    | | 11:00~11:30 | Eleven o’clock to eleven thirty | |
    | | 11:00-12:18 | Eleven o’clock to twelve eighteen | |
    | | 10:30~11:00 | Ten thirty to eleven o’clock | |
    | | 09:28-10:00 | Nine twenty-eight to ten o’clock | |
    | | 10:20~11:20 | Ten twenty to eleven twenty | |
    | | 06:00~08:00 | Six o’clock to eight o’clock | |
    | | 10:20 a.m.~1:30 p.m. | Ten twenty AM to one thirty PM | |
    | Abbreviation of time | 5:00am | Five o’clock AM | |
    | | 5:30am | Five thirty AM | |
    | | 5:20:12am | Five twenty and twelve seconds AM | |
    | | 7:00am | Seven o’clock AM | |
    | | 7:30AM | Seven thirty AM | |
    | | 7:20:12a.m. | Seven twenty and twelve seconds AM | |
    | | 07:08:12A.M. | Seven eight and twelve seconds AM | |
    | | 5:00pm | Five o’clock PM | |
    | | 5:30PM | Five thirty PM | |
    | | 5:20:12p.m. | Five twenty and twelve seconds PM | |
    | | 05:09:12P.M. | Five nine and twelve seconds PM | |
    | | 9:00pm | Nine o’clock PM | |
    | | 9:30pm | Nine thirty PM | |
    | | 9:20:12PM | Nine twenty and twelve seconds PM | |
    | | 9:02:12P.M. | Nine two and twelve seconds PM | |
    | | 12:00pm | Twelve o’clock PM | |
    | | 12:30p.m. | Twelve thirty PM | |
    | | 12:20:12PM | Twelve twenty and twelve seconds PM | |
  • currency
    | Format | Example | Output | Description |
    | --- | --- | --- | --- |
    | Number + currency identifier | 12.00RMB | Twelve yuan | The following currency identifiers are supported: AUD, CAD, HKD, JPY, USD, CHF, NOK, SEK, GBP, RMB, CNY, and EUR. The supported number formats include integers, decimals, and international notation with digit groups separated with commas (,). |
    | | 12.50RMB | Twelve point five yuan | |
    | | 12,000,000RMB | Twelve million yuan | |
    | | 12,000,000.00RMB | Twelve million yuan | |
    | | 12,000.35RMB | Twelve thousand point thirty-five yuan | |
    | Currency sign + number | $12 | Twelve dollars | The following currency signs are supported: Canadian dollar (CAD), dollar sign ($), franc (Fr), Swedish krona (kr), pound sign (£), yen sign (¥), yuan sign (¥), and euro sign (€). The supported number formats include integers, decimals, and international notation with digit groups separated with commas (,). |
    | | $12.00 | Twelve dollars | |
    | | $12.12 | Twelve point twelve dollars | |
    | | $12,000 | Twelve thousand dollars | |
    | | $12,000.00 | Twelve thousand dollars | |
    | | $12,000.99 | Twelve thousand point ninety-nine dollars | |
    | Other default reading methods | 1213 | One thousand two hundred and thirteen | |
    | | 1213KML | One thousand two hundred and thirteen K M L | |
    | | 1213.00KML | One thousand two hundred and thirteen K M L | |
    | | 1213.9KML | One thousand two hundred and thirteen point nine K M L | |
    | | 1,000KML | One thousand K M L | |
    | | 1,000.00KML | One thousand K M L | |
    | | 1,000.98KML | One thousand point ninety-eight K M L | |
    | | 12,000 | Twelve thousand | |

    • measure
      Format Example Output Description
      Number + unit 2 pieces Two pieces Common Chinese units are supported. For more information about the unit abbreviations, see the following unit table.
      120 hectares One hundred and twenty hectares
      Over 100 milligrams Over one hundred milligrams
      About 100 meters About one hundred meters
      Over 100 persons Over one hundred persons
      1 centimeter 20 millimeters One centimeter twenty millimeters
      120.00 square kilometers One hundred and twenty square kilometers
      Number + unit abbreviation 120.56cm² One hundred twenty point fifty-six square centimeters
      120 m² 56 cm² One hundred twenty square meters fifty-six square centimeters
      100m12cm6mm One hundred meters twelve centimeters six millimeters
      Range 10~15kg Ten to fifteen kilograms
      10.24~789.82 Mu Ten point twenty-four to seven hundred eighty-nine point eighty-two Mu
      10 meters~15 meters Ten meters to fifteen meters
      10.24cm~19.08cm Ten point twenty-four centimeters to nineteen point zero eight centimeters
      Number + unit + "/" + unit RMB 10/kg Ten yuan per kilogram
      RMB 199~299/piece One hundred and ninety-nine yuan to two hundred and ninety-nine yuan per piece
      RMB 299.99/g~RMB 399.99/g Two hundred ninety-nine point ninety-nine yuan to three hundred ninety-nine point ninety-nine yuan per gram
      Other default reading methods 12 bunches Twelve bunches
      30rm Thirty reams
      400,000,000 fellows Four hundred million fellows
      12.897 micrograms Twelve point eight nine seven micrograms

    The following table describes the characters supported by the say-as tag.

    Punctuation mark Reading
    ! Exclamation point
    " Double quotation mark
    # Number sign
    $ dollar
    % Percent sign
    & and
    ' Apostrophe
    ( Left parenthesis
    ) Right parenthesis
    * Asterisk
    + Plus sign
    , Comma
    - Hyphen
    . Period
    / Forward slash
    : Colon
    ; Semicolon
    < Less than
    = Equal sign
    > Greater than
    ? Question mark
    @ at
    [ Left square bracket
    \ Backslash
    ] Right square bracket
    ^ Caret
    _ Underscore
    ` Grave accent
    { Left brace
    | Vertical bar
    } Right brace
    ~ Tilde
    ！ Exclamation point
    “ Left double quotation mark
    ” Right double quotation mark
    ‘ Left single quotation mark
    ’ Right single quotation mark
    （ Left parenthesis
    ） Right parenthesis
    ， Comma
    。 Period
    En dash
    ： Colon
    ； Semicolon
    ？ Question mark
    、 Enumeration comma
    … Ellipsis
    …… Ellipsis
    《 Left title mark
    》 Right title mark
    ￥ Yuan sign
    ≥ Greater than or equal to
    ≤ Less than or equal to
    ≠ Not equal to
    ≈ Approximation
    ± Plus-minus sign
    × Multiplication sign
    π Pi
    Α Alpha
    Β Beta
    Γ Gamma
    Δ Delta
    Ε Epsilon
    Ζ Zeta
    Η Eta
    Θ Theta
    Ι Iota
    Κ Kappa
    Λ Lambda
    Μ Mu
    Ν Nu
    Ξ Xi
    Ο Omicron
    Π Pi
    Ρ Rho
    Σ Sigma
    Τ Tau
    Υ Upsilon
    Φ Phi
    Χ Chi
    Ψ Psi
    Ω Omega
    α Alpha
    β Beta
    γ Gamma
    δ Delta
    ε Epsilon
    ζ Zeta
    η Eta
    θ Theta
    ι Iota
    κ Kappa
    λ Lambda
    μ Mu
    ν Nu
    ξ Xi
    ο Omicron
    π Pi
    ρ Rho
    σ Sigma
    τ Tau
    υ Upsilon
    φ Phi
    χ Chi
    ψ Psi
    ω Omega

    The following table describes the measurement units supported by the say-as tag.

    Format Category Example
    Abbreviation Length nm, μm, mm, cm, m, km, ft, and in
    Area cm², m², km², and SqFt
    Volume cm³, m³, km³, mL, L, and gallon
    Weight μg, mg, g, and kg
    Time min, sec, and ms
    Electromagnetism μA, mA, Ω, Hz, kHz, MHz, GHz, V, kV, and kWh
    Sound dB
    Pressure Pa, kPa, and MPa
    Other units Other units are supported, such as meter, second, USD, and milliliters per bottle. Quantifiers are also supported, such as rack, head, piece, and basin.

    5. Tag relationship

    The <say-as> tag can only contain text.
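The text-only rule can be checked on the client before a request is sent. The following is a minimal sketch that uses the standard Java XML parser; the class name SayAsCheck and its method are hypothetical helpers for illustration and are not part of the Alibaba Cloud SDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the SDK.
final class SayAsCheck {

    /**
     * Returns true if no <say-as> element in the SSML has a child element,
     * that is, every <say-as> element contains text only.
     * Throws IllegalArgumentException if the SSML is not well-formed XML.
     */
    static boolean sayAsContainsOnlyText(String ssml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(ssml)));
            NodeList sayAsNodes = doc.getElementsByTagName("say-as");
            for (int i = 0; i < sayAsNodes.getLength(); i++) {
                NodeList children = sayAsNodes.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    // Any nested element (for example <break/>) violates the rule.
                    if (children.item(j).getNodeType() == Node.ELEMENT_NODE) {
                        return false;
                    }
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalArgumentException("SSML is not well-formed XML", e);
        }
    }
}
```

For example, a <say-as> element that wraps only the digits 12345 passes the check, while one that contains a nested <break/> tag does not.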

    6. Examples

    • cardinal
      <speak>
        <say-as interpret-as="cardinal">12345</say-as>
      </speak>
    • digits

      <speak>
        <say-as interpret-as="digits">12345</say-as>
      </speak>
    • telephone

      <speak>
        <say-as interpret-as="telephone">12345</say-as>
      </speak>
    • name

      <speak>
        She once used <say-as interpret-as="name">Zeng Xiaofan</say-as> as her full name.
      </speak>
    • address

      <speak>
        <say-as interpret-as="address">Room 304, Unit 3, Building 1, Fulu International District</say-as>
      </speak>
    • id

      <speak>
        <say-as interpret-as="id">myid_1998</say-as>
      </speak>
    • characters

      <speak>
        <say-as interpret-as="characters">The Greek letters α and β.</say-as>
      </speak>
    • punctuation

      <speak>
        <say-as interpret-as="punctuation"> -./:;</say-as>
      </speak>
    • date

      <speak>
        <say-as interpret-as="date">1000-10-10</say-as>
      </speak>
    • time

      <speak>
        <say-as interpret-as="time">5:00am</say-as>
      </speak>
    • currency

      <speak>
        <say-as interpret-as="currency">13,000,000.00RMB</say-as>
      </speak>
    • measure

      <speak>
        <say-as interpret-as="measure">100m12cm6mm</say-as>
      </speak>
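When the content comes from variables rather than literals, the markup in the examples above can be generated in code. The following is a minimal sketch; SayAsBuilder and its methods are hypothetical names, and the entity escaping assumes that the service accepts standard XML escapes, which this document does not confirm.

```java
// Hypothetical helper, not part of the SDK.
final class SayAsBuilder {

    /** Escapes the five characters that are special in XML text and attributes. */
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps plain text in a <say-as> tag with the given interpret-as type. */
    static String sayAs(String interpretAs, String content) {
        return "<say-as interpret-as=\"" + escapeXml(interpretAs) + "\">"
                + escapeXml(content) + "</say-as>";
    }

    /** Wraps an SSML body in the single <speak> tag that each request requires. */
    static String speak(String body) {
        return "<speak>" + body + "</speak>";
    }
}
```

For example, SayAsBuilder.speak(SayAsBuilder.sayAs("currency", "13,000,000.00RMB")) produces the same text as the currency example, and the result can be passed to synthesizer.setText as shown in the Usage section.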

    4. Comprehensive example

    The following example shows how to use SSML in detail. You can copy the following code to Project Settings of the console to test the effect.

    Notes:

    • The narration in the example uses a custom voice that is not currently available. For more information about how to try the custom voice, contact us at nls_support@service.aliyun.com.
    • The text of each speech synthesis request can contain up to 300 characters and can use the <speak> tag only once. The following example contains multiple <speak> tags, so its text must be synthesized in multiple requests, one for each <speak> tag.
    <speak>
    Legend has it that in the Northern Song Dynasty,
    <say-as interpret-as="date">October 10, 1121</say-as>,
    <say-as interpret-as="address">eager shoppers and vendors</say-as>
    used to gather outside Kaifeng City on the morning of
    <sub alias="the Double 11">11.11</sub>
    Shopping Festival. As a train of mules carrying goods entered the city gate on one of these mornings,
    <soundEvent src="http://nls.alicdn.com/sound-event/bell.wav"/>
    a pretty girl with fair skin
    <phoneme alphabet="py" ph="de5">stopped</phoneme>
    a lad named <say-as interpret-as="name">A Fa</say-as> in front of the crowds.
    </speak>
    <speak voice="xiaomei">
    "Dear, special offer for today:
    <say-as interpret-as="digits">199</say-as>
    100 cash back for spending 199.
    <say-as interpret-as="cardinal">100</say-as>
    Don't miss it."
    </speak>
    <speak voice="sicheng" rate="150">
    "Not today, hurrying to restock. It's 09:59:59 now.
    <say-as interpret-as="time">09:59:59</say-as>
    Any later, and the supply chain would be broken."
    </speak>
    <speak>
    Wiping away his sweat, <say-as interpret-as="name">A Fa</say-as>
    led the train of mules through the busy streets, the shouts of peddlers ringing in his ears.
    </speak>
    <speak voice="ninger" rate="200">
    On-site cloth dyeing with chic color and design. Buy two feet, get one foot free;
    </speak>
    <speak voice="xiaobei">
    Best-seller gauze cap. No-questions-asked returns within 7 days;
    </speak>
    <speak voice="sijia">
    Special treatment for adults and children. Improve the body conditions of men and women and treat incurable and complicated diseases.
    </speak>
    <speak>
    Suddenly, a horse, somehow startled, rushed along the road neighing.
    <soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>
    And then a frightened kid tottered to his parent,
    <break time="50ms"/>shouting:
    </speak>
    <speak voice="sitong" rate="150">
    "Mom, mom!"
    </speak>
    <speak>
    Seeing this,
    <say-as interpret-as="name">A Fa</say-as>
    cursed silently:
    </speak>
    <speak effect="robot" pitch="-100">
    "My poor heart!"
    </speak>
    <speak>
    He pressed his
    <phoneme alphabet="py" ph="he2 bao1">wallet</phoneme> tightly to himself
    and went on delivering goods. The sight of the
    <say-as interpret-as="address">prosperous Kaifeng City</say-as>
    left
    <say-as interpret-as="name">A Fa</say-as>
    a deep impression.
    </speak>
    <speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="-200">
    As the night fell, the busy streets turned quiet. In a fit of joy, he picked up a painting brush and drew a long scroll. He named it Riverside Scene at Qingming Festival.
    </speak>
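Because each request can carry only one <speak> tag, the text above has to be split before it is sent. The following is a minimal sketch of that step; SpeakSplitter, its methods, and the regular expression are hypothetical and are not part of the SDK. The length check counts the whole block including markup, which is a conservative assumption: this document does not state whether tags count toward the 300-character limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SDK.
final class SpeakSplitter {

    /** Per-request limit stated in the notes above. */
    static final int MAX_CHARS = 300;

    // Matches one <speak ...>...</speak> block; DOTALL lets "." span line breaks.
    private static final Pattern SPEAK_BLOCK =
            Pattern.compile("<speak\\b[^>]*>.*?</speak>", Pattern.DOTALL);

    /** Returns each <speak> block of the script as a separate request text. */
    static List<String> splitIntoRequests(String script) {
        List<String> requests = new ArrayList<>();
        Matcher matcher = SPEAK_BLOCK.matcher(script);
        while (matcher.find()) {
            requests.add(matcher.group());
        }
        return requests;
    }

    /** Assumption: the limit applies to the full text, markup included. */
    static boolean withinLimit(String requestText) {
        return requestText.length() <= MAX_CHARS;
    }
}
```

Each returned string can then be passed to synthesizer.setText in its own synthesis task. Blocks for which withinLimit returns false may need to be shortened or divided further before they are sent.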