Usage notes

Last Updated: Sep 30, 2020

This topic describes the features of Speech Synthesis Markup Language (SSML) and provides sample code to guide you through SSML-based synthesis.

Overview

SSML is an XML-based markup language. Compared with plain text synthesis, SSML-based synthesis enriches the content that can be synthesized, achieving a variety of final synthesis effects. By using SSML, you can not only control what the speech synthesis service reads, but also control how the service reads the text you specify. For example, you can specify how to break sentences and words, control the pronunciation, speed, pauses, intonation, and volume, and even add background music (BGM).

Note

Alibaba Cloud SSML is based on W3C SSML Version 1.0 but does not support every W3C markup type. It supports as many markup types as possible, based on business needs.

Usage

Note

  • For more information about the SSML tags that the speech synthesis service supports, see the following Tags section.

  • Alibaba Cloud SSML can only be used to synthesize speech from Chinese text.

  • All input text must be enclosed in the <speak></speak> tag. To synthesize speech from long text, you can use multiple <speak></speak> tags. For example, synthesize speech from the following text:

    How are you? <speak>
    <say-as interpret-as="telephone">114</say-as>Help me find a phone number. 
    <say-as interpret-as="cardinal">123</say-as>Let's get started.
    The sum is <say-as interpret-as="digits">1234</say-as>.
    <say-as interpret-as="name">Bo Bin's</say-as> package.
    <say-as interpret-as="address">Room 304, Unit 3, Building 1, Fulu International District</say-as>
    <say-as interpret-as="nick">Wang Yonghua 6689</say-as>
    </speak>I'm fine. <speak>
    The license plate number is <say-as interpret-as="verbatim">ZheA X88888</say-as>
    </speak>Ha ha ha

You can upload tagged text as the value of the text parameter to the speech synthesis service. The following code uses the SDK for Java as an example:

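// client is an existing SDK client instance; getSynthesizerListener() supplies the result listener.
SpeechSynthesizer synthesizer = new SpeechSynthesizer(client, getSynthesizerListener());
// Pass the SSML-tagged text as the value of the text parameter.
String text = "<speak>Close your eyes and take a rest.<break time=\"500ms\"/>Okay, now open your eyes. </speak>";
synthesizer.setText(text);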

In this example, the following request is sent to the speech synthesis service:

{
    "payload": {
        "volume": 50,
        "sample_rate": 16000,
        "format": "wav",
        "text": "<speak>Close your eyes and take a rest.<break time=\"500ms\"/>Okay, now open your eyes. </speak>"
    },
    "context": {
        "sdk": {
            "name": "nls-sdk-java",
            "version": "2.0.4"
        }
    },
    "header": {
        "namespace": "SpeechSynthesizer",
        "name": "StartSynthesis",
        "message_id": "5fdf78c0dd574b6897f3cb204dd0****",
        "appkey": "f6OslY8nCPOa****",
        "task_id": "6e1be78ef5804c50a2c5a8b92de1****"
    }
}
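
The text value can also be assembled in code. The following is a minimal sketch, not SDK code: SsmlText and its methods are hypothetical names introduced here for illustration. It escapes special XML characters in plain text and wraps the result in the <speak> root tag.

```java
/** Hypothetical helper for assembling SSML text; it is not part of the SDK. */
final class SsmlText {
    private SsmlText() {}

    /** Escapes the five special XML characters so plain text is safe inside SSML. */
    static String escape(String plain) {
        StringBuilder out = new StringBuilder(plain.length());
        for (int i = 0; i < plain.length(); i++) {
            char c = plain.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps SSML content in the required <speak> root tag. */
    static String speak(String ssmlBody) {
        return "<speak>" + ssmlBody + "</speak>";
    }

    public static void main(String[] args) {
        String body = escape("Close your eyes & take a rest.")
                + "<break time=\"500ms\"/>"
                + escape("Okay, now open your eyes.");
        // The result is the string you would pass to synthesizer.setText(...).
        System.out.println(speak(body));
    }
}
```

Only the plain-text parts are escaped. The SSML tags themselves are appended unescaped so that the service can still parse them.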

Tags

<speak>

  • Description

    <speak> is the root node of all SSML tags that the speech synthesis service supports. All text that uses SSML tags must be enclosed in the <speak></speak> tag.

  • Syntax

     <speak>Text that uses SSML tags</speak>
  • Attributes

    <speak> can use the following attributes.

    Name

    Type

    Value

    Required

    Description

    voice

    String

    The name of the online speaker that can be called. The value of the voice attribute can only contain lowercase letters, such as siyue.

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the speaker used for synthesis and takes priority over the voice parameter of the speech synthesis API.

    Notice

    You cannot set this attribute to xiaoyue.

    encodeType

    String

    PCM/WAV/MP3

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the audio file format used for synthesis and takes priority over the format parameter of the speech synthesis API.

    sampleRate

    String

    8000/16000

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the audio sampling rate used for synthesis and takes priority over the sample_rate parameter of the speech synthesis API.

    rate

    String

    Any integer from -500 to 500. Default value: 0.

    • A value greater than 0 indicates that the speech speed is increased.

    • A value less than 0 indicates that the speech speed is reduced.

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the speech speed used for synthesis and takes priority over the speech_rate parameter of the speech synthesis API.

    pitch

    String

    Any integer from -500 to 500. Default value: 0.

    • A value greater than 0 indicates that the pitch rises.

    • A value less than 0 indicates that the pitch falls.

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the pitch used for synthesis and takes priority over the pitch_rate parameter of the speech synthesis API.

    volume

    String

    Any integer from 0 to 100. Default value: 50.

    • A value greater than 50 indicates that the volume is increased.

    • A value less than 50 indicates that the volume is reduced.

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the volume used for synthesis and takes priority over the volume parameter of the speech synthesis API.

    effect

    String

    robot/lolita/lowpass/echo/eq/lpfilter/hpfilter

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It applies a sound effect to the synthesized speech. Valid values:

    • robot

    • lolita

    • lowpass

    • echo

    • eq: equalizer

    • lpfilter: low-pass filter

    • hpfilter: high-pass filter

    Note

    • Among the values, eq, lpfilter, and hpfilter are advanced filters. If you set this attribute to eq, lpfilter, or hpfilter, you can set the effectValue attribute to customize the effect of the specified filter.

    • An SSML structure supports only one sound effect. You cannot set this attribute to multiple values.

    • If you set this attribute, the system latency may increase.

    effectValue

    String

    A value that indicates the effect of the specified filter if you set the effect attribute to eq, lpfilter, or hpfilter.

    No

    • eq: The system provides eight default bands with the frequencies of 40 Hz, 100 Hz, 200 Hz, 400 Hz, 800 Hz, 1,600 Hz, 4,000 Hz, and 12,000 Hz. All of their bandwidths are 1.0q. When you set this attribute, you must enter the corresponding gain of each band. The adjustment range of each gain is from -20 dB to 20 dB. For example, if you set the effect attribute to eq, you can set this attribute to 1 1 1 1 1 1 1 1. The input value is a string consisting of eight integers separated with spaces. If the value is 0, the gain of the corresponding band is not adjusted.

    • lpfilter: the frequency of the low-pass filter. The value can be any integer in the range of (0, target sampling rate/2]. For example, if you set the effect attribute to lpfilter, you can set this attribute to 800.

    • hpfilter: the frequency of the high-pass filter. The value can be any integer in the range of (0, target sampling rate/2]. For example, if you set the effect attribute to hpfilter, you can set this attribute to 1200.

    bgm

    String

    The name of the BGM that can be called online. For more information, see the following description of the bgm attribute.

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the BGM of the synthesized speech.

    backgroundMusicVolume

    String

    Any integer from 0 to 100. Default value: 50.

    • A value greater than 50 indicates that the volume is increased.

    • A value less than 50 indicates that the volume is reduced.

    No

    A proprietary attribute of the Alibaba Cloud speech synthesis service. It specifies the volume of the BGM.
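
The value ranges listed for the <speak> attributes can be checked before a request is sent. The following is a minimal sketch under that assumption. SpeakAttributes and its method names are hypothetical and not part of the SDK, and the service may apply additional rules that are not modeled here.

```java
/** Hypothetical validator for <speak> attribute values; it is not part of the SDK. */
final class SpeakAttributes {
    private SpeakAttributes() {}

    /** rate and pitch accept integers from -500 to 500. */
    static boolean isValidRateOrPitch(int value) {
        return value >= -500 && value <= 500;
    }

    /** volume and backgroundMusicVolume accept integers from 0 to 100. */
    static boolean isValidVolume(int value) {
        return value >= 0 && value <= 100;
    }

    /** eq expects eight space-separated integer gains, each from -20 to 20 dB. */
    static boolean isValidEqValue(String effectValue) {
        String[] gains = effectValue.trim().split("\\s+");
        if (gains.length != 8) {
            return false;
        }
        for (String gain : gains) {
            try {
                int db = Integer.parseInt(gain);
                if (db < -20 || db > 20) {
                    return false;
                }
            } catch (NumberFormatException e) {
                return false; // not an integer
            }
        }
        return true;
    }

    /** lpfilter and hpfilter expect an integer frequency in (0, sampleRate / 2]. */
    static boolean isValidFilterFrequency(int frequencyHz, int sampleRateHz) {
        return frequencyHz > 0 && frequencyHz <= sampleRateHz / 2;
    }

    public static void main(String[] args) {
        System.out.println(isValidEqValue("1 1 1 1 1 1 1 1"));
        System.out.println(isValidFilterFrequency(800, 16000));
    }
}
```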

    The following table describes the bgm attribute.

    Built-in BGM URL

    Custom BGM URL

    The speech synthesis service provides several built-in BGM streams. You can click the following URLs to listen to them:

    You can use custom BGM as needed. Before you specify a custom BGM, store the BGM in your Alibaba Cloud OSS bucket and set the access control list (ACL) of the bucket to public read or public read/write. For more information about how to create a bucket, see Create buckets. You can use the HTTP or HTTPS protocol to generate a URL for an object stored in a bucket. For more information, see the Manage objects section of the Quick start topic.

    Requirements on audio files to be uploaded:

    • The audio file must be a mono WAV file with the sampling rate of 16,000 Hz.

    • The maximum file size is 2 MB.

    Note

    • If the synthesized speech is longer than the BGM, the BGM loops.

    • If your BGM file is not in WAV format, you can use FFmpeg to convert it: ffmpeg -i <input audio file> -acodec pcm_s16le -ac 1 -ar 16000 <target audio file>.wav

    • If the URL in the tag contains special XML characters, escape the characters. The following special characters are commonly used: <, >, &, ", and '.

    Notice

    You are legally liable for the copyright of the uploaded audio file.

  • Tag relationship

    The <speak> tag can contain text and the following tags:

    • <break>

    • <s>

    • <w>

    • <phoneme>

    • <say-as>

    • Examples

      • Empty attribute

        <speak>
          Text that uses SSML tags
        </speak>

        Synthesis result: SSML-speak1.mp3

      • Attribute voice

        <speak voice="xiaogang">
          This is a male voice.
        </speak>

        Synthesis result: SSML-speak2.mp3

      • Attribute encodeType

        <speak encodeType="mp3">
          I can produce audio in compressed formats.
        </speak>

        Synthesis result: SSML-encode.mp3

      • Attribute sampleRate

        <speak sampleRate="8000">
          Check my file size. It is half of the audio at a sample rate of 16,000 Hz.
        </speak>

        Synthesis result: SSML-speak4.mp3

      • Attribute rate

        <speak rate="200">
          I speak faster than most people.
        </speak>

        Synthesis result: SSML-speak5.mp3

      • Attribute pitch

        <speak pitch="-100">
          But the pitch of my voice is lower than theirs.
        </speak>

        Synthesis result: SSML-speak6.mp3

      • Attribute volume

        <speak volume="80">
          I have a voice of high volume too.
        </speak>

        Synthesis result: SSML-speak7.mp3

      • Combination of attributes, separated with spaces

        <speak rate="200" pitch="-100" volume="80">
          So put together, this is my voice.
        </speak>

        Synthesis result: SSML-speak8.mp3

      • Attribute effect

        <speak effect="robot">
          Do you like Wall-E the robot?
        </speak>

        Synthesis result: SSML-speak9.mp3

      • Attribute bgm

        <speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="-500" volume="40">
          <break time="2s"/>
          Shady cliff, old trees, dense clouds and mist;
          <break time="700ms"/>
          Sound of raindrops remains in the bamboo forest.
          <break time="700ms"/>
          Mianjue Books do include plans beneficial to country;
          <break time="700ms"/>
          Mianzhou's landscape and specialty are always worth the journey.
          <break time="2s"/>
        </speak>

        Synthesis result: SSML-speak10.mp3

    <break>

    • Description

      The <break> tag is optional. It inserts a break in the text.

    • Syntax

       # Empty attribute
       <break/>
       # Attribute with time
       <break time="string"/>
    • Attributes

      Note

      If the <break> tag without any attribute is used, the speech has a break of 1 second.

      Name

      Type

      Value

      Required

      Description

      time

      String

      [number]s/[number]ms

      No

      The break length, in seconds or milliseconds. For example, 2 seconds or 50 milliseconds.

      • If the break is in milliseconds, the value of number is an integer in the range of [50, 10000]. In this case, the value is in the format of [number]ms.

      • If the break is in seconds, the value of number is an integer in the range of [1, 10]. In this case, the value is in the format of [number]s.
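
The two time formats and their ranges can be sketched as follows. This is an illustration only: SsmlBreak and its methods are hypothetical names, not SDK classes. Values given in milliseconds use the ms suffix and values given in seconds use the s suffix.

```java
/** Hypothetical helper that renders <break> tags; it is not part of the SDK. */
final class SsmlBreak {
    private SsmlBreak() {}

    /** Returns a <break> tag for a pause in milliseconds, allowed range [50, 10000]. */
    static String ofMillis(int millis) {
        if (millis < 50 || millis > 10000) {
            throw new IllegalArgumentException("millisecond break must be in [50, 10000]: " + millis);
        }
        return "<break time=\"" + millis + "ms\"/>";
    }

    /** Returns a <break> tag for a pause in whole seconds, allowed range [1, 10]. */
    static String ofSeconds(int seconds) {
        if (seconds < 1 || seconds > 10) {
            throw new IllegalArgumentException("second break must be in [1, 10]: " + seconds);
        }
        return "<break time=\"" + seconds + "s\"/>";
    }

    public static void main(String[] args) {
        System.out.println(ofMillis(500));
        System.out.println(ofSeconds(2));
    }
}
```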

    • Tag relationship

      <break> is an empty tag and cannot contain any tags. If an SSML structure contains the <s> tag, write the <break> tag in the <s> tag, which indicates that the current paragraph or sentence has a break.

    • Examples

       <speak>
         Close your eyes and take a rest.<break time="500ms"/>Okay, now open your eyes.
       </speak>

      Synthesis result: SSML-break.mp3

    <s>

    • Description

      The <s> tag is optional. It marks a sentence in the text.

    • Syntax

       <s>Text</s>
    • Attributes

      None.

    • Tag relationship

      The <s> tag can contain text and the following tags:

      • <break>

      • <w>

      • <phoneme>

      • <say-as>

    • Examples

      <speak><s>This is the first sentence.</s><s>This is the second sentence.</s></speak>

      Synthesis result: SSML-s.mp3

    <sub>

    • Description

      The <sub> tag is used to replace the text in a tag with an alias.

    • Syntax

       <sub alias="string">Text</sub>
    • Attributes

      Name

      Type

      Value

      Required

      Description

      alias

      String

      The target content.

      Yes

      The text used to replace the text in a tag.

    • Tag relationship

      The <sub> tag can contain only text.

    • Examples

       <speak><sub alias="网络协议标准">W3C</sub></speak>

      Synthesis result: SSML-sub.mp3

    <w>

    • Description

      The <w> tag is optional. It marks a word in the text.

    • Syntax

       <w>Text</w>
    • Attributes

      None.

    • Tag relationship

      The <w> tag can contain only text.

    • Examples

       <speak>The Mayor of Nanjing <w>Jiang Daqiao</w> gave a speech today. </speak>

      Synthesis result: SSML-w.mp3

    <phoneme>

    • Description

      The <phoneme> tag is optional. It controls the pronunciation of the text in the tag.

    • Syntax

       <phoneme alphabet="string" ph="string">Text</phoneme>
    • Attributes

      Name

      Type

      Value

      Required

      Description

      alphabet

      String

      py

      Yes

      The value of py indicates pinyin.

      ph

      String

      The pinyin string corresponding to the text in the tag.

      Yes

      Assignment specification for pinyin:

      • Pinyin syllables are separated with spaces. The number of pinyin syllables must equal the number of characters in the tagged text.

      • Each pinyin syllable consists of the sound followed by a tone number from 1 to 5, where 5 indicates the neutral tone.
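
The pinyin rules above can be checked with a short validator. The following is a sketch under stated assumptions: PinyinCheck is a hypothetical name, syllables are assumed to use lowercase letters a to z only, and each character of the tagged text is assumed to map to exactly one syllable.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for the ph attribute of <phoneme>; it is not part of the SDK. */
final class PinyinCheck {
    // Assumption: a syllable is lowercase letters followed by one tone digit, for example "dian3".
    private static final Pattern SYLLABLE = Pattern.compile("[a-z]+[1-5]");

    private PinyinCheck() {}

    /** Returns true if ph has one well-formed syllable per character of text. */
    static boolean isValid(String ph, String text) {
        String[] syllables = ph.trim().split("\\s+");
        int characters = text.codePointCount(0, text.length());
        if (syllables.length != characters) {
            return false;
        }
        for (String syllable : syllables) {
            if (!SYLLABLE.matcher(syllable).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Three Chinese characters, written as Unicode escapes, with three syllables.
        System.out.println(isValid("dian3 dang4 hang2", "\u5178\u5F53\u884C"));
    }
}
```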

    • Tag relationship

      The <phoneme> tag can contain only text.

    • Examples

       <speak>
          Go to a <phoneme alphabet="py" ph="dian3 dang4 hang2">pawnshop</phoneme> with this<phoneme alphabet="py" ph="dang4 diao4">.</phoneme>
       </speak>

      Synthesis result: SSML-phoneme.mp3

    <soundEvent>

    • Description

      The <soundEvent> tag is used to insert a sound cue in any position of the text during SSML-based synthesis.

    • Syntax

       <soundEvent src="URL"/>
    • Attributes

      Name

      Type

      Value

      Required

      Description

      src

      String

      The URL of the sound cue.

      Yes

      You can use a custom sound cue as needed. Before you specify a custom sound cue, store the audio file in your Alibaba Cloud OSS bucket and set the ACL of the bucket to public read or public read/write. For more information about how to create a bucket, see Create buckets. You can use the HTTP or HTTPS protocol to generate a URL for an object stored in a bucket. For more information, see the Manage objects section of the Quick start topic.

      Requirements on audio files to be uploaded:

      • The audio file must be a mono WAV file with the sampling rate of 16,000 Hz.

      • The maximum file size is 2 MB.

      Notice

      You are legally liable for the copyright of the uploaded audio file.

    • Tag relationship

      <soundEvent> is an empty tag and cannot contain any tags.

    • Examples

       <speak>
         A horse gets frightened.<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>People scatter in search of shelter.
       </speak>

      Synthesis result: SSML-sound-event.mp3

    <say-as>

    • Description

      The <say-as> tag is used to indicate the type of the text in the tag, so that the text can be read based on the default pronunciation method of this type.

    • Syntax

       <say-as interpret-as="string">Text</say-as>
    • Attributes

      Name

      Type

      Value

      Required

      Description

      interpret-as

      String

      cardinal/digits/telephone/name/address/id/characters/punctuation/date/time/currency/measure

      Yes

      Specifies the type of the text in the tag. Valid values:

      • cardinal: indicates that the text is read as an integer or decimal number.

      • digits: indicates that the text is read by digit.

      • telephone: indicates that the text is read as a phone number.

      • name: indicates that the text is read as a name.

      • address: indicates that the text is read as an address.

      • id: indicates that the text is read as an account name or nickname.

      • characters: indicates that the text in the tag is read by character.

      • punctuation: indicates that the text in the tag is read as a punctuation mark.

      • date: indicates that the text is read as a date.

      • time: indicates that the text is read as a time.

      • currency: indicates that the text is read as an amount.

      • measure: indicates that the text is read as a measurement unit.
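
As an illustration, the sketch below builds a <say-as> tag and rejects any interpret-as value that is not in the list above. SayAs is a hypothetical helper, not an SDK class, and the text passed in is assumed to be already XML-escaped.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical builder for <say-as> tags; it is not part of the SDK. */
final class SayAs {
    // The interpret-as values listed in the attribute description.
    private static final Set<String> TYPES = new HashSet<>(Arrays.asList(
            "cardinal", "digits", "telephone", "name", "address", "id",
            "characters", "punctuation", "date", "time", "currency", "measure"));

    private SayAs() {}

    /** Returns a <say-as> tag around text that is assumed to be XML-escaped already. */
    static String tag(String interpretAs, String escapedText) {
        if (!TYPES.contains(interpretAs)) {
            throw new IllegalArgumentException("unlisted interpret-as value: " + interpretAs);
        }
        return "<say-as interpret-as=\"" + interpretAs + "\">" + escapedText + "</say-as>";
    }

    public static void main(String[] args) {
        System.out.println("<speak>The sum is " + tag("digits", "1234") + ".</speak>");
    }
}
```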

    • Text types that the say-as tag supports

      • cardinal

        Format

        Example

        Output

        Description

        Numeric string

        145

        One hundred and forty-five

        Integer input range: positive and negative integers with a maximum of 20 digits, that is, -99,999,999,999,999,999,999 to 99,999,999,999,999,999,999.

        Decimal input range: No limits are set on the number of decimal places. However, we recommend that you retain up to 10 decimal places.

        Minus sign + numeric string

        -145

        Minus one hundred and forty-five

        Numeric string with each three digits separated with a comma

        10,000

        Ten thousand

        Minus sign + numeric string with each three digits separated with a comma

        -10,124

        Minus ten thousand one hundred and twenty-four

        Numeric string + decimal point + two zeros

        10.00

        Ten

        Minus sign + numeric string + decimal point + two zeros

        -110.00

        Minus one hundred and ten

        Numeric string + decimal point + numeric string

        79.090

        Seventy-nine point zero nine zero

        Minus sign + numeric string + decimal point + numeric string

        -79.001

        Minus seventy-nine point zero zero one
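
The cardinal formats in the table can be approximated with a pattern check before the text is tagged. This is a sketch only: CardinalCheck is a hypothetical name, and the pattern reflects the formats listed above (optional minus sign, optional comma grouping, optional decimal part, at most 20 integer digits) rather than the service's actual parser.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-check for cardinal text; it is not part of the SDK. */
final class CardinalCheck {
    // Optional minus sign, then plain digits or comma-grouped digits, then optional decimals.
    private static final Pattern SHAPE =
            Pattern.compile("-?(\\d+|\\d{1,3}(,\\d{3})+)(\\.\\d+)?");

    private CardinalCheck() {}

    /** Returns true if text matches a listed format and has at most 20 integer digits. */
    static boolean isSupported(String text) {
        if (!SHAPE.matcher(text).matches()) {
            return false;
        }
        String integerPart = text.split("\\.")[0].replace("-", "").replace(",", "");
        return integerPart.length() <= 20;
    }

    public static void main(String[] args) {
        System.out.println(isSupported("-10,124"));
        System.out.println(isSupported("79.090"));
    }
}
```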

      • digits

        Format

        Example

        Output

        Description

        Numeric string

        129090909

        One two nine zero nine zero nine zero nine

        No limits are set on the length of the numeric string.

        However, we recommend that you retain up to 20 digits. If the numeric string exceeds 10 digits in length, insert a break between digits.

      • telephone

        Format

        Example

        Output

        Description

        Landline number

        4930286

        Four-nine-three, oh-two-eight-six

        Landline numbers can be seven or eight digits. The space and hyphen (-) can be used as the delimiter.

        Note the following rules on different formats: A seven-digit landline number can be separated in 3-4 mode. An eight-digit landline number can be separated in 4-4 mode.

        493 0286

        Four-nine-three, oh-two-eight-six

        493-0286

        Four-nine-three, oh-two-eight-six

        62552560

        Six-two-five-five, two-five-six-oh

        6255 2560

        Six-two-five-five, two-five-six-oh

        6255-2560

        Six-two-five-five, two-five-six-oh

        Landline number + extension number

        4930286-109

        Four-nine-three, oh-two-eight-six, extension one-oh-nine

        Extension numbers can be one to four digits.

        4930286, x. 109

        Four-nine-three, oh-two-eight-six, extension one-oh-nine

        4930286, ex. 109

        Four-nine-three, oh-two-eight-six, extension one-oh-nine

        4930286, ext. 109

        Four-nine-three, oh-two-eight-six, extension one-oh-nine

        Area code + landline number

        01062552560

        Oh-one-oh, six-two-five-five, two-five-six-oh

        Area codes of 010, 02x, 03xx, 04xx, 05xx, 07xx, 08xx, and 09xx are supported.

        010 62552560

        Oh-one-oh, six-two-five-five, two-five-six-oh

        010 6255 2560

        Oh-one-oh, six-two-five-five, two-five-six-oh

        010 6255-2560

        Oh-one-oh, six-two-five-five, two-five-six-oh

        010-62552560

        Oh-one-oh, six-two-five-five, two-five-six-oh

        010-6255-2560

        Oh-one-oh, six-two-five-five, two-five-six-oh

        (010)62552560

        Oh-one-oh, six-two-five-five, two-five-six-oh

        03198907098

        Oh-three-one-nine, eight-nine-oh, seven-oh-nine-eight

        0319-8907098

        Oh-three-one-nine, eight-nine-oh, seven-oh-nine-eight

        Area code + landline number + extension number

        010 62552560-109

        Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine

        None.

        010-62552560-109

        Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine

        (010)62552560-109

        Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine

        (010)62552560, x. 109

        Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine

        (010)62552560, ex.109

        Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine

        (010)62552560, ext. 109

        Oh-one-oh, six-two-five-five, two-five-six-oh, extension one-oh-nine

        Country code + area code + landline number

        86-010-62791627

        Eight-six, oh-one-oh, six-two-seven-nine, one-six-two-seven

        Country code formats of 86, (86), +86, (+86), and 0086 are supported, all of which are read as eight-six.

        (86)10-62791627

        Eight-six, one-oh, six-two-seven-nine, one-six-two-seven

        +86-010-62791627

        Eight-six, oh-one-oh, six-two-seven-nine, one-six-two-seven

        0086-10-62791627

        Eight-six, one-oh, six-two-seven-nine, one-six-two-seven

        (+86)-10-6279 1627

        Eight-six, one-oh, six-two-seven-nine, one-six-two-seven

        Country code + area code + landline number + extension number

        (86)21-58118818-207

        Eight-six, two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven

        None.

        (86)021-5811-8818-207

        Eight-six, oh-two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven

        (86)021-58118818, x. 207

        Eight-six, oh-two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven

        (86)21-5811-8818, ex. 207

        Eight-six, two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven

        +86-021-58118818, ext. 207

        Eight-six, oh-two-one, five-eight-one-one, eight-eight-one-eight, extension two-oh-seven

        Mobile number

        151 9099 0987

        One-five-one, nine-oh-nine-nine, oh-nine-eight-seven

        Mobile numbers of 11 digits are supported, which can be separated in 3-3-5 and 3-4-4 modes.

        151-909-90987

        One-five-one, nine-oh-nine, nine-oh-nine-eight-seven

        151 909 90987

        One-five-one, nine-oh-nine, nine-oh-nine-eight-seven

        Country code + mobile number

        +86-15190990987

        Eight-six, one-five-one, nine-oh-nine-nine, oh-nine-eight-seven

        None.

        (+86)-151-9099-0987

        Eight-six, one-five-one, nine-oh-nine-nine, oh-nine-eight-seven

        +8615190990987

        Eight-six, one-five-one, nine-oh-nine-nine, oh-nine-eight-seven

        0086-151 909 90987

        Eight-six, one-five-one, nine-oh-nine, nine-oh-nine-eight-seven

        Service number

        110

        One-one-oh

        • Common service numbers such as 110 are supported.

        • Ten-digit service numbers starting with 400 or 800 are supported, which are separated in 3-3-4 mode.

        • Sixteen-digit service numbers starting with 12530, 17951, or 12593 are supported.

        95566

        Nine-five-five-six-six

        4008110510

        Four hundred, eight-one-one, oh-five-one-oh

        800-810-8888

        Eight hundred, eight-one-oh, eight-eight-eight-eight

        1253013520638377

        One-two-five-three-oh, one-three-five, two-oh-six-three, eight-three-seven-seven

        Others

        (86)(21)9899-80800-0909

        Eight-six, two-one, nine-eight-nine-nine, eight-oh-eight-oh-oh, oh-nine-oh-nine

        The numeric string and delimiters are supported. The delimiters can be parentheses and hyphen (-).
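
The default groupings for numbers written without delimiters can be sketched as follows. PhoneGrouping is a hypothetical name, and the sketch covers only three cases from the table: seven-digit landline numbers (3-4), eight-digit landline numbers (4-4), and 11-digit mobile numbers (3-4-4). The assumption that mobile numbers start with 1 is based on the examples above and is not stated by the service.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the digit groupings in the table; it is not part of the SDK. */
final class PhoneGrouping {
    private PhoneGrouping() {}

    /** Splits an undelimited number into the groups that are read together. */
    static List<String> group(String digits) {
        if (!digits.matches("\\d+")) {
            throw new IllegalArgumentException("digits only: " + digits);
        }
        int[] sizes;
        if (digits.length() == 7) {
            sizes = new int[] {3, 4};          // seven-digit landline
        } else if (digits.length() == 8) {
            sizes = new int[] {4, 4};          // eight-digit landline
        } else if (digits.length() == 11 && digits.charAt(0) == '1') {
            sizes = new int[] {3, 4, 4};       // mobile number (assumed to start with 1)
        } else {
            throw new IllegalArgumentException("format not covered by this sketch: " + digits);
        }
        List<String> groups = new ArrayList<>();
        int start = 0;
        for (int size : sizes) {
            groups.add(digits.substring(start, start + size));
            start += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(group("4930286"));
        System.out.println(group("15190990987"));
    }
}
```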

      • address

        Format

        Example

        Output

        Description

        Common address format

        30-9, Jiayuan, Yuanhe Town

        Thirty hyphen nine, Jiayuan, Yuanhe Town

        Common address formats are supported. The address indicates the standard post address.

        No. 1107-1108, Lane 388, Shitai Road

        No. one-one-zero-seven to one-one-zero-eight, Lane three-eight-eight, Shitai Road

        3-1-3205, Jinyunfu, Phase 6, Huarun 24 City

        Three hyphen one hyphen three-two-zero-five, Jinyunfu, Phase six, Huarun Twenty-four City

        Room 2006, Building 2, Shenghua Mingdu Building

        Room two thousand six, Building two, Shenghua Mingdu Building

        Room 201, Unit 4, Building 5, Wuchang Street Courtyard

        Room two hundred one, Unit four, Building five, Wuchang Street Courtyard

        No. 19, Lane 150, Furong River Road

        No. nineteen, Lane one hundred fifty, Furong River Road

      • id

        Format

        Example

        Output

        Description

        String

        dell0101

        D E L L zero one zero one

        Uppercase and lowercase letters, digits 0 to 9, and underscores (_) are supported.

        Spaces in the output indicate that a break is inserted between characters and that characters are read one by one.

        myid_1998

        M Y I D underscore one nine nine eight

        AiTest

        A I T E S T

      • characters

        Format

        Example

        Output

        Description

        String

        ISBN 1-001-099098-1

        I S B N one hyphen zero zero one hyphen zero nine nine zero nine eight hyphen one

        Chinese characters, uppercase and lowercase letters, digits 0 to 9, and some full-width and half-width characters are supported.

        Spaces in the output indicate that a break is inserted between characters and that characters are read one by one. If the text in the tag contains special XML characters, escape the characters. The following common special XML characters are supported:

        &lt;
        &gt;
        &amp;
        &quot;
        &apos;

        They correspond to the angle brackets (< and >), ampersand (&), double quotation mark ("), and apostrophe ('), respectively.

        x10b2345_u

        x one zero b two three four five underscore u

        v1.0.1

        v one point zero point one

        Version 2.0

        Version two point zero

        Su M MA000

        Su M M A zero zero zero

        Airbus A330

        Airbus A three three zero

        Models s01, s02, and s03

        Models s zero one s zero two and s zero three

        αβγ

        Alpha beta gamma

      • punctuation

        Format

        Example

        Output

        Description

        Punctuation mark

        Ellipsis

        Common Chinese and English punctuation marks are supported. Spaces in the output indicate that a break is inserted between characters and that characters are read one by one.

        If the text in the tag contains special XML characters, escape the characters. The following common special XML characters are supported:

        &lt;
        &gt;
        &amp;
        &quot;
        &apos;

        They correspond to the angle brackets (< and >), ampersand (&), double quotation mark ("), and apostrophe ('), respectively.

        ……

        Ellipsis

        !" #$%&

        Exclamation point, double quotation mark, number sign, dollar sign, percent sign, and ampersand

        '()*+

        Apostrophe, left parenthesis, right parenthesis, asterisk, and plus sign

        ,-. /:;

        Comma, hyphen, period, forward slash, colon, and semicolon

        <=>? @

        Less than, equal sign, greater than, question mark, and at sign

        [\]^_

        Left square bracket, backslash, right square bracket, caret, and underscore

      • date

        Format

        Example

        Output

        Description

        Year

        71

        Nineteen seventy-one

        Two-digit and four-digit years are supported. Note the following rules on different formats:

        • Two-digit years can be 60 to 99 or 00 to 19. As the examples show, 60 to 99 are read as 1960 to 1999, and 00 to 19 are read as 2000 to 2019.

        • Four-digit years range from 1000 to 2099.

        04

        Two thousand and four

        19

        Two thousand and nineteen

        1011

        One thousand and eleven

        1998

        Nineteen ninety-eight

        2008

        Two thousand and eight

        Year and month

        April, 98

        April, nineteen ninety-eight

        The months from January to September can be represented by a number with or without a zero. For example, in April 1908, April can be represented by 4 or 04.

        April 1998

        April, nineteen ninety-eight

        August, 08

        August, two thousand and eight

        August 2008

        August, two thousand and eight

        Year, month, and day

        April 23, 98

        April twenty-third, nineteen ninety-eight

        The days from the first to ninth day in a month can be represented by a number with or without a zero. For example, on April eighth, 1908, April can be represented by 4 or 04, and eighth can be represented by 8 or 08.

        April 23, 1998

        April twenty-third, nineteen ninety-eight

        August 8, 08

        August eighth, two thousand and eight

        August 08, 2008

        August eighth, two thousand and eight

        Month and day

        March 20

        March twentieth

        None.

        August 07

        August seventh

        Abbreviation of year and month

        2018/08

        August, two thousand and eighteen

        The forward slash (/), hyphen (-), and period (.) can be used as the delimiters.

        2018-08

        August, two thousand and eighteen

        2018.08

        August, two thousand and eighteen

        Abbreviation of year, month, and day

        2018/08/08

        August eighth, two thousand and eighteen

        2018-8-8

        August eighth, two thousand and eighteen

        2018.08.08

        August eighth, two thousand and eighteen

        Date range

        September 1~30, 04

        September first to thirtieth, two thousand and four

        The tilde (~) and hyphen (-) can be used to indicate the range.

        September 01, 2004-June 08, 2008

        September first, two thousand and four to June eighth, two thousand and eight

        Date range

        April, 01~April, 10

        April, two thousand and one to April, two thousand and ten

        April 2001~April 2010

        April, two thousand and one to April, two thousand and ten

        Date range

        October 1~October 7

        October first to October seventh

        October 01~October 07

        October first to October seventh

        Date range

        October 1~7

        October first to seventh

        October 01~07

        October first to seventh

        Abbreviation of date range

        2018/03/03~2019/01/01

        March third, two thousand and eighteen to January first, two thousand and nineteen

        The forward slash (/) and period (.) can be used as the delimiters. The tilde (~) and hyphen (-) can be used to indicate the range.

        1997.9.9~1998.9.9

        September ninth, nineteen ninety-seven to September ninth, nineteen ninety-eight

        Abbreviation of date range

        10/20~10/31

        October twentieth to October thirty-first

        Date range

        Jan~Oct

        January to October

        January~October

        January to October

        Abbreviation of month, day, and year

        10/20/2018

        October twentieth, two thousand and eighteen

        Only four-digit years are supported, only the forward slash (/) can be used as the delimiter, and only the format of month/day/year is supported.

      • time

        Format

        Example

        Output

        Description

        Time

        12:00

        Twelve o'clock

        Common time and time range formats are supported.

        12:00:00

        Twelve o'clock

        10:20

        Ten twenty

        10:20:30

        Ten twenty and thirty seconds

        09:18:14

        Nine eighteen and fourteen seconds

        Time range

        11:00~12:00

        Eleven o'clock to twelve o'clock

        09:00-14:00

        Nine o'clock to fourteen o'clock

        11:00~11:30

        Eleven o'clock to eleven thirty

        11:00-12:18

        Eleven o'clock to twelve eighteen

        10:30~11:00

        Ten thirty to eleven o'clock

        09:28-10:00

        Nine twenty-eight to ten o'clock

        10:20~11:20

        Ten twenty to eleven twenty

        06:00~08:00

        Six o'clock to eight o'clock

        10:20 a.m.~1:30 p.m.

        Ten twenty AM to one thirty PM

        Abbreviation of time

        5:00am

        Five o'clock AM

        5:30am

        Five thirty AM

        5:20:12am

        Five twenty and twelve seconds AM

        7:00am

        Seven o'clock AM

        7:30AM

        Seven thirty AM

        7:20:12a.m.

        Seven twenty and twelve seconds AM

        07:08:12A.M.

        Seven eight and twelve seconds AM

        5:00pm

        Five o'clock PM

        5:30PM

        Five thirty PM

        5:20:12p.m.

        Five twenty and twelve seconds PM

        05:09:12P.M.

        Five nine and twelve seconds PM

        9:00pm

        Nine o'clock PM

        9:30pm

        Nine thirty PM

        9:20:12PM

        Nine twenty and twelve seconds PM

        9:02:12P.M.

        Nine two and twelve seconds PM

        12:00pm

        Twelve o'clock PM

        12:30p.m.

        Twelve thirty PM

        12:20:12PM

        Twelve twenty and twelve seconds PM

      • currency

        Format

        Example

        Output

        Description

        Number + currency identifier

        12.00RMB

        Twelve yuan

        The following currency identifiers are supported: AUD, CAD, HKD, JPY, USD, CHF, NOK, SEK, GBP, RMB, CNY, and EUR.

        Integers, decimals, and numbers that use commas (,) as thousands separators are supported.

        12.50RMB

        Twelve point five yuan

        12,000,000RMB

        Twelve million yuan

        12,000,000.00RMB

        Twelve million yuan

        12,000.35RMB

        Twelve thousand point thirty-five yuan

        Currency sign + number

        $12

        Twelve dollars

        The following currency signs are supported: Canadian dollar (CAD), dollar sign ($), franc (Fr), Swedish krona (kr), pound sign (£), yen sign (¥), yuan sign (¥), and euro sign (€).

        Integers, decimals, and numbers that use commas (,) as thousands separators are supported.

        $12.00

        Twelve dollars

        $12.12

        Twelve point twelve dollars

        $12,000

        Twelve thousand dollars

        $12,000.00

        Twelve thousand dollars

        $12,000.99

        Twelve thousand point ninety-nine dollars

        Other default readings

        1213

        One thousand two hundred and thirteen

        None.

        1213KML

        One thousand two hundred and thirteen K M L

        1213.00KML

        One thousand two hundred and thirteen K M L

        1213.9KML

        One thousand two hundred and thirteen point nine K M L

        1,000KML

        One thousand K M L

        1,000.00KML

        One thousand K M L

        1,000.98KML

        One thousand point ninety-eight K M L

        12,000

        Twelve thousand

      • measure

        Format

        Example

        Output

        Description

        Number + unit

        2 pieces

        Two pieces

        Common Chinese units and unit abbreviations are supported.

        120 hectares

        One hundred and twenty hectares

        More than 100 milligrams

        More than one hundred milligrams

        About 100 meters

        About one hundred meters

        More than 100 persons

        More than one hundred persons

        1 centimeter 20 millimeters

        One centimeter twenty millimeters

        120.00 square kilometers

        One hundred and twenty square kilometers

        Number + unit abbreviation

        120.56cm²

        One hundred twenty point fifty-six square centimeters

        120 m² 56 cm²

        One hundred twenty square meters fifty-six square centimeters

        100m12cm6mm

        One hundred meters twelve centimeters six millimeters

        Range

        10~15kg

        Ten to fifteen kilograms

        10.24~789.82 Mu

        Ten point twenty-four to seven hundred eighty-nine point eighty-two Mu

        10 meters~15 meters

        Ten meters to fifteen meters

        10.24cm~19.08cm

        Ten point twenty-four centimeters to nineteen point zero eight centimeters

        Number + unit + "/" + unit

        CNY 10/kg

        Ten yuan per kilogram

        CNY 199~299/piece

        One hundred and ninety-nine yuan to two hundred and ninety-nine yuan per piece

        CNY 299.99/g~CNY 399.99/g

        Two hundred ninety-nine point ninety-nine yuan to three hundred ninety-nine point ninety-nine yuan per gram

        Other default readings

        12 bunches

        Twelve bunches

        30rm

        Thirty reams

        400,000,000 fellows

        Four hundred million fellows

        12.897 micrograms

        Twelve point eight nine seven micrograms
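The two-digit year rule given for the date format can be made concrete with a short Python sketch. It mirrors the mapping that the table examples suggest (71 is read as 1971, 04 as 2004, and 19 as 2019). The function name is hypothetical, and this is an illustration of the documented rule rather than the service's own implementation. Values from 20 to 59 are rejected because the table does not list them.

```python
def expand_two_digit_year(yy: int) -> int:
    """Map a two-digit year to a four-digit year, following the date table.

    Assumption: 60-99 map to 1960-1999 and 00-19 map to 2000-2019, as the
    table examples (71, 04, 19) suggest. Other values are not listed in
    the table, so they are rejected here rather than guessed.
    """
    if 60 <= yy <= 99:
        return 1900 + yy
    if 0 <= yy <= 19:
        return 2000 + yy
    raise ValueError(f"two-digit year {yy:02d} is outside the documented ranges")


print(expand_two_digit_year(71))  # 1971
print(expand_two_digit_year(4))   # 2004
```

A check like this can be run before text is sent for synthesis, so that an unsupported two-digit year is written out as four digits instead.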

      The following table describes the characters that the <say-as> tag supports.

      Punctuation mark

      Reading

      !

      Exclamation point

      "

      Double quotation mark

      #

      Number sign

      $

      dollar

      %

      Percent sign

      &

      and

      '

      Apostrophe

      (

      Left parenthesis

      )

      Right parenthesis

      *

      Asterisk

      +

      Plus sign

      ,

      Comma

      -

      Hyphen

      .

      Period

      /

      Forward slash

      :

      Colon

      ;

      Semicolon

      <

      Less than

      =

      Equal sign

      >

      Greater than

      ?

      Question mark

      @

      at

      [

      Left square bracket

      \

      Backslash

      ]

      Right square bracket

      ^

      Caret

      _

      Underscore

      `

      Grave accent

      {

      Left brace

      |

      Vertical bar

      }

      Right brace

      ~

      Tilde

      !

      Exclamation point

      “

      Left double quotation mark

      ”

      Right double quotation mark

      ‘

      Left single quotation mark

      ’

      Right single quotation mark

      (

      Left parenthesis

      )

      Right parenthesis

      ,

      Comma

      。

      Period

      –

      En dash

      :

      Colon

      ;

      Semicolon

      ?

      Question mark

      、

      Enumeration comma

      …

      Ellipsis

      ……

      Ellipsis

      《

      Left title mark

      》

      Right title mark

      ¥

      Yuan sign

      ≥

      Greater than or equal to

      ≤

      Less than or equal to

      ≠

      Not equal to

      ≈

      Approximation

      ±

      Plus-minus sign

      ×

      Multiplication sign

      π

      Pi

      Α

      Alpha

      Β

      Beta

      Γ

      Gamma

      Δ

      Delta

      Ε

      Epsilon

      Ζ

      Zeta

      Η

      Eta

      Θ

      Theta

      Ι

      Iota

      Κ

      Kappa

      Λ

      Lambda

      Μ

      Mu

      Ν

      Nu

      Ξ

      Xi

      Ο

      Omicron

      Π

      Pi

      Ρ

      Rho

      Σ

      Sigma

      Τ

      Tau

      Υ

      Upsilon

      Φ

      fai

      Χ

      Chi

      Ψ

      Psi

      Ω

      Omega

      α

      Alpha

      β

      Beta

      γ

      Gamma

      δ

      Delta

      ε

      Epsilon

      ζ

      Zeta

      η

      Eta

      θ

      Theta

      ι

      Iota

      κ

      Kappa

      λ

      Lambda

      μ

      Mu

      ν

      Nu

      ξ

      Xi

      ο

      Omicron

      π

      Pi

      ρ

      Rho

      σ

      Sigma

      τ

      Tau

      υ

      Upsilon

      φ

      fai

      χ

      Chi

      ψ

      Psi

      ω

      Omega

      The following table describes the measurement units that the <say-as> tag supports.

      Format

      Category

      Example

      Abbreviation

      Length

      nm, μm, mm, cm, m, km, ft, and in

      Area

      cm², m², km², and SqFt

      Volume

      cm³, m³, km³, mL, L, and gallon

      Weight

      μg, mg, g, and kg

      Time

      min, sec, and ms

      Electromagnetism

      μA, mA, Ω, Hz, kHz, MHz, GHz, V, kV, and kWh

      Sound

      dB

      Pressure

      Pa, kPa, and MPa

      Other units

      Other units are supported, such as meter, second, USD, and milliliters per bottle. Quantifiers are also supported, such as rack, head, piece, and basin.

    • Tag relationship

      The <say-as> tag can contain only text.

    • Examples

      • cardinal

        <speak>
          <say-as interpret-as="cardinal">12345</say-as>
        </speak>

        Synthesis result: SSML-say-as_Cardinal.mp3

      • digits

        <speak>
          <say-as interpret-as="digits">12345</say-as>
        </speak>

        Synthesis result: SSML-say-as_digit.mp3

      • telephone

        <speak>
          <say-as interpret-as="telephone">12345</say-as>
        </speak>

        Synthesis result: SSML-say-as_Telephone.mp3

      • name

        <speak>
          She once used <say-as interpret-as="name">Zeng Xiaofan</say-as> as her full name.
        </speak>

        Synthesis result: SSML-say-as_Name.mp3

      • address

        <speak>
          <say-as interpret-as="address">Room 304, Unit 3, Building 1, Fulu International District</say-as>
        </speak>

        Synthesis result: SSML-say-as_Address.mp3

      • id

        <speak>
          <say-as interpret-as="id">myid_1998</say-as>
        </speak>

        Synthesis result: SSML-say-as_id.mp3

      • characters

        <speak>
          <say-as interpret-as="characters">The Greek letters α and β</say-as>
        </speak>

        Synthesis result: SSML-say-as_characters.mp3

      • punctuation

        <speak>
          <say-as interpret-as="punctuation"> -./:;</say-as>
        </speak>

        Synthesis result: SSML-say-as_punctuation.mp3

      • date

        <speak>
          <say-as interpret-as="date">1000-10-10</say-as>
        </speak>

        Synthesis result: SSML-say-as_date.mp3

      • time

        <speak>
          <say-as interpret-as="time">5:00am</say-as>
        </speak>

        Synthesis result: SSML-say-as_time.mp3

      • currency

        <speak>
          <say-as interpret-as="currency">13,000,000.00RMB</say-as>
        </speak>

        Synthesis result: SSML-say-as_currency.mp3

      • measure

        <speak>
          <say-as interpret-as="measure">100m12cm6mm</say-as>
        </speak>

        Synthesis result: SSML-say-as_measure.mp3
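Because the <say-as> tag can contain only text, reserved XML characters in that text need to be written as entities before the markup is assembled. The following Python sketch shows one way to build a <say-as> element safely with the standard library. The helper names say_as and speak are hypothetical and are not part of any Alibaba Cloud SDK.

```python
from xml.sax.saxutils import escape, quoteattr

# escape() handles &, <, and > by default; quotation marks are added here.
_EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def say_as(interpret_as: str, text: str) -> str:
    """Return a <say-as> element whose text content is XML-escaped."""
    return (
        f"<say-as interpret-as={quoteattr(interpret_as)}>"
        f"{escape(text, _EXTRA_ENTITIES)}"
        "</say-as>"
    )


def speak(*parts: str) -> str:
    """Wrap already-built SSML fragments in a single <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


print(speak(say_as("punctuation", "<=>?")))
# <speak><say-as interpret-as="punctuation">&lt;=&gt;?</say-as></speak>
```

Plain text placed between fragments should be escaped the same way before it is passed to speak.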

    Comprehensive example

    The following example shows how to use SSML in detail. You can copy the following code to the Project Settings page in the Intelligent Speech Interaction console to test the effect. Synthesis result: Comprehensive example.mp3

    <speak>
      Legend has it that in the Northern Song Dynasty, on
      <say-as interpret-as="date">October 10, 1121</say-as>,
      eager shoppers and vendors used to gather outside
      <say-as interpret-as="address">Kaifeng City</say-as>
      on the morning of
      <sub alias="双十一">Double 11</sub>.
      As a train of mules carrying goods entered the city gate on one of these mornings,
      <soundEvent src="http://nls.alicdn.com/sound-event/bell.wav"/>
      a pretty girl with fair skin
      <phoneme alphabet="py" ph="de5">stopped</phoneme>
      a lad named <say-as interpret-as="name">A Fa</say-as> in front of the crowds.
    </speak>
    
    <speak voice="xiaomei">
      "Dear, special offer for today:
      <say-as interpret-as="digits">199</say-as>
      cash back for spending
      <say-as interpret-as="cardinal">199</say-as>.
      Do not miss it."
    </speak>
    
    <speak voice="sicheng" rate="150">
      "Not today. I'm hurrying to restock. It is
      <say-as interpret-as="time">09:59:59</say-as>
      now. Any later, and the supply chain would be broken."
    </speak>
    
    <speak>
      Wiping away his sweat, <say-as interpret-as="name">A Fa</say-as>
      led the train of mules through the busy streets, with the shouts of peddlers ringing in his ears.
    </speak>
    
    <speak voice="ninger" rate="200">
      On-site cloth dyeing with chic color and design. Buy two feet, get one foot free;
    </speak>
    
    <speak voice="xiaobei">
      Best-selling gauze caps. No-questions-asked returns within seven days;
    </speak>
    
    <speak voice="sijia">
      Special treatment for adults and children. Improve the body conditions of men and women and treat incurable and complicated diseases.
    </speak>
    
    <speak>
      Suddenly, a horse, somehow startled, rushed along the road neighing.
      <soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>
      Then, a frightened kid tottered to his parent,
      <break time="50ms"/>shouting:
    </speak>
    
    <speak voice="sitong" rate="150">
      "Mom, mom!"
    </speak>
    
    <speak>
      Seeing this,
      <say-as interpret-as="name">A Fa</say-as>
      silently cursed:
    </speak>
    
    <speak effect="robot" pitch="-100">
      "My poor heart!"
    </speak>
    
    <speak>
      He pressed his
      <phoneme alphabet="py" ph="he2 bao1">wallet</phoneme> tightly to himself
      and went on delivering goods. The sight of
      prosperous <say-as interpret-as="address">Kaifeng City</say-as>
      left
      <say-as interpret-as="name">A Fa</say-as>
      a deep impression.
    </speak>
    
    <speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="-200">
      As the night fell, the busy streets turned quiet. In a fit of joy, he picked up a painting brush and drew a long scroll. He named it Riverside Scene at Qingming Festival.
    </speak>
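A long script such as the one above is easy to break with a missing angle bracket or an unclosed tag. The following Python sketch checks that a script is well-formed XML and counts its top-level <speak> blocks before the script is submitted. The function name count_speak_blocks is hypothetical, and the check covers XML syntax only: it does not verify that the service supports a given tag or attribute. It also assumes that the script has no XML declaration, because the script is wrapped in a temporary root element.

```python
import xml.etree.ElementTree as ET


def count_speak_blocks(ssml: str) -> int:
    """Parse a script made of one or more <speak> blocks.

    The script is wrapped in a temporary root element so that several
    sibling <speak> blocks parse as one document. Malformed markup
    raises xml.etree.ElementTree.ParseError.
    """
    root = ET.fromstring(f"<root>{ssml}</root>")
    return len(root.findall("speak"))


script = (
    '<speak>Seeing this, <say-as interpret-as="name">A Fa</say-as> silently cursed:</speak>\n'
    '<speak effect="robot" pitch="-100">"My poor heart!"</speak>'
)
print(count_speak_blocks(script))  # 2

try:
    # The closing ">" of the opening tag is missing, so parsing fails.
    count_speak_blocks('<speak><say-as interpret-as="name"A Fa</say-as></speak>')
except ET.ParseError as err:
    print("Malformed SSML:", err)
```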