
AI Guardrails: Large model speech moderation

Last Updated: May 01, 2026

This topic introduces the voice moderation service, which uses the Tongyi large model to moderate audio files and live audio streams.

1. Service

Voice Moderation (Large Model Edition) is powered by the Qwen Moderation Large Model, trained specifically to identify content risks in audio. It provides the following voice moderation services:

  • Audio and Video Media Detection_LLM Edition:

    This service moderates audio files. It uses the Tongyi Large Model to deliver enhanced speech recognition and improved risk detection. It supports the following 26 languages: Chinese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish.

  • Social & Entertainment Live Stream Detection_LLM Version:

    This service moderates live audio streams. It uses the Tongyi Large Model to deliver enhanced speech recognition and improved risk detection. It supports the same 26 languages: Chinese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish.

Service selection

Service

Description

Supported regions

Typical scenarios

Audio and Video Media Detection_LLM Edition (audio_detection_byllm_global)

  • Powered by the Tongyi Large Model.

  • Delivers superior performance, making it ideal for scenarios requiring high-accuracy audio file moderation.

Singapore

  • Ideal for various audio and video media, such as general videos and audiobooks.

  • Uses Qwen ASR to better recognize multiple Chinese dialects (such as Cantonese and Sichuanese) and various other languages (such as English, German, Korean, Japanese, Russian, French, Portuguese, and Vietnamese).

  • Uses a text moderation large model to better identify content such as ideological themes, political sentiment, and metaphors.

Social & Entertainment Live Stream Detection_LLM Version (live_detection_byllm_global)

  • Powered by the Tongyi Large Model.

  • Delivers superior performance, making it ideal for scenarios requiring high-accuracy live audio stream moderation.

Singapore

  • Ideal for social live streaming scenarios, which typically feature a single speaker and often have background music.

  • Uses a large model to improve accuracy in complex audio environments, reducing interference from dialects, accents, and background noise.

  • Uses a text moderation large model to focus on identifying risks such as pornographic content, abusive language, innuendo, and political sentiment. It can also detect periods of silence in the live stream.

Billing

The large model services for Voice Moderation V2.0 support two billing methods: pay-as-you-go and resource packages.

Pay-as-you-go

After you activate the Voice Moderation V2.0 service, pay-as-you-go is the default billing method. You are billed daily for your actual usage. You incur no fees if you do not call the service.

Moderation type

Service

Unit price

Voice Moderation Advanced (audio_advanced)

  • Audio and Video Media Detection_LLM Edition: audio_detection_byllm_global

  • Social & Entertainment Live Stream Detection_LLM Version: live_detection_byllm_global

USD 18 per 1,000 minutes

Note

We bill based on the total duration of audio processed. For example, if you use the Audio and Video Media Detection_LLM Edition service to process 100 minutes of audio, we charge you USD 1.80.
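
The arithmetic in this note can be sketched as follows. Only the rate of USD 18 per 1,000 minutes comes from the pricing table. The function name, the use of Decimal, and the rounding to two decimal places are assumptions made for illustration, and actual bills may round differently.

```python
from decimal import Decimal

# Unit price from the pricing table: USD 18 per 1,000 minutes.
PRICE_PER_1000_MINUTES = Decimal("18")


def estimate_cost_usd(minutes: float) -> Decimal:
    """Estimate the pay-as-you-go charge for a total audio duration in minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    cost = Decimal(str(minutes)) * PRICE_PER_1000_MINUTES / Decimal("1000")
    # Rounding to cents is an assumption of this sketch.
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(100))  # 1.80, matching the example in the note
```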

Note

The pay-as-you-go billing frequency for Content Moderation V2.0 is once every 24 hours. In the billing details, moderationType corresponds to the moderation type field. You can view the billing details.

Get started

Step 1: Activate the service

Go to the activation page to activate the Voice Moderation 2.0 service.

After you activate the Voice Moderation 2.0 service, the default billing method is pay-as-you-go. You are billed based on your actual usage, and you incur no fees if you do not use the service. After you integrate and use the API, the system automatically generates bills based on your usage. For more information, see Billing. You can also purchase a resource package. Resource packages offer tiered discounts compared with the pay-as-you-go method and are suitable for users with predictable or high usage.

Step 2: Grant permissions to a RAM user

Before you access the SDK or API, you must grant permissions to a RAM user. To authenticate API calls, use an AccessKey from your Alibaba Cloud account or a RAM user. For more information about how to obtain an AccessKey, see Obtain an AccessKey.

  1. Log in to the RAM console with your Alibaba Cloud account or as a RAM administrator.

  2. Create a RAM user. For more information, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For more information, see Grant permissions to a RAM user. You can then use the RAM user to call Content Moderation APIs.

Step 3: Install and integrate the SDK

See the Voice Moderation 2.0 SDK and Integration Guide for installation and integration instructions. The service is available in the following regions:

| Region | Public endpoint | VPC endpoint | Services |
| --- | --- | --- | --- |
| Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com | audio_detection_byllm_global, live_detection_byllm_global |
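
As a rough sketch, a client could select between the public and VPC endpoints listed above with a small lookup. The host names come from the table. The region ID key `ap-southeast-1` is inferred from those host names, and the function name is hypothetical.

```python
# Endpoints copied from the table above. The region ID key is inferred
# from the host names and is an assumption of this sketch.
ENDPOINTS = {
    "ap-southeast-1": {
        "public": "green-cip.ap-southeast-1.aliyuncs.com",
        "vpc": "green-cip-vpc.ap-southeast-1.aliyuncs.com",
    },
}


def endpoint_for(region_id: str, use_vpc: bool = False) -> str:
    """Return the endpoint host for a region, preferring VPC when requested."""
    try:
        region = ENDPOINTS[region_id]
    except KeyError:
        raise ValueError(f"service not available in region: {region_id}") from None
    return region["vpc" if use_vpc else "public"]
```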

Step 4: Adjust moderation rules (Optional)

In the Content Moderation console, you can adjust detection rules for the Voice Moderation large model, manage text moderation rules, replicate a service, configure a custom dictionary, query detection records, and check usage.

API reference

Usage notes

You can call this API to create audio content moderation tasks. For instructions on how to construct an HTTP request, see Making native HTTP calls. You can also use an SDK to call the API. For more information, see the SDK and integration guide for audio moderation (enhanced edition) V2.0.

  • API operations:

    • Submit a moderation task: VoiceModeration

    • Query a moderation task: VoiceModerationResult

    • Cancel a moderation task: VoiceModerationCancel

  • Billing:

    This is a paid API. You are charged only for requests that return a status code of 200. You are not charged for requests that return any other status code. For more information, see Billing.

  • Service performance:

    | Service performance | Description |
    | --- | --- |
    | Audio file size | The audio moderation (enhanced edition) supports audio files up to 500 MB. |
    | Audio and video file formats | Supported audio file formats: MP3, WAV, AAC, WMA, OGG, M4A, and AMR. Supported video file formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM. |
    | Live audio stream | Supported protocols: RTMP, HLS, HTTP-FLV, and RTSP. |
    | QPS | The QPS limit for task submission is 100. |
    | Concurrent streams | The enhanced edition supports 50 concurrent streams by default. |
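
The file limits above can be checked on the client before a task is submitted. The following is a minimal sketch: the function name is hypothetical, it judges the format by file extension only, and it assumes 1 MB means 1024 × 1024 bytes. The service itself may apply these limits differently.

```python
import os
from typing import List

# Limits taken from the service performance list above.
MAX_FILE_BYTES = 500 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
AUDIO_FORMATS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "amr"}
VIDEO_FORMATS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}


def check_media_file(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in AUDIO_FORMATS | VIDEO_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 500 MB")
    return problems
```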

Submit a moderation task

Request parameters

Parameter

Type

Required

Example value

Description

Service

String

Yes

audio_detection_byllm_global

The moderation service to use. Valid values:

  • audio_detection_byllm_global: Audio and video moderation (LLM-based)

  • live_detection_byllm_global: Live stream moderation for social entertainment (LLM-based)

ServiceParameters

JSONString

Yes

The parameters required by the moderation service, formatted as a JSON string. For a description of each parameter, see ServiceParameters.

Table 1. ServiceParameters

Parameter

Type

Required

Example value

Description

url

String

Yes. You must provide the file by using one of the following three methods:

  • Provide the URL of the file in the url parameter.

  • Provide the details of an OSS object in the ossBucketName, ossObjectName, and ossRegionId parameters.

  • Upload a local audio file. This method does not consume your OSS storage and the file is temporarily stored for 30 minutes. The SDK has a built-in feature for local file uploads. For code examples, see Audio Moderation Enhancement V2.0 SDK and Integration Guide.

http://aliyundoc.com/test.mp3

The URL of the object to moderate. Supports public HTTP and HTTPS URLs.

ossBucketName

String

bucket_01

The name of the authorized OSS bucket.

Note

To use an internal OSS URL, you must first use your Alibaba Cloud account (root account) to grant access on the Cloud Resource Access Authorization page.

ossObjectName

String

20240307/07/28/test.mp3

The name of the object in the authorized OSS bucket.

ossRegionId

String

cn-shanghai

The region where the OSS bucket is located.

callback

String

No

http://aliyundoc.com

The callback URL where moderation results are sent. HTTP and HTTPS are supported. If you omit this parameter, you must use polling to retrieve the results.

The callback endpoint must support the POST method, UTF-8 encoded data, and the form parameters checksum and content.

Content Moderation sets the checksum and content parameters and sends the results to your callback endpoint as follows.

  • checksum: A string that is generated by applying the SHA256 algorithm to the concatenated string of UID + seed + content. The UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud console. To prevent tampering, you can verify the push result by generating a string with the same algorithm and comparing it to the received checksum.

    Note

    The UID must be the ID of your Alibaba Cloud account, not the ID of a RAM user.

  • content: A string in JSON format. You must parse the string to convert it back into a JSON object. For an example of the content result, see the response example for querying detection results.

Note

Your callback endpoint must return an HTTP status code of 200 after receiving a notification from Content Moderation to confirm receipt. Any other status code is considered a failure. If delivery fails, Content Moderation retries sending the notification up to 16 times. If all 16 retries fail, Content Moderation makes no further attempts. We recommend that you check the status of your callback endpoint.

seed

String

No

abc****

A random string used in the signature for the callback request.

It can consist of letters, digits, and underscores (_), with a maximum length of 64 characters. You can customize this value to verify that callback notifications originate from Content Moderation.

Note

This parameter is required when the callback parameter is specified.

cryptType

String

No

SHA256

When you use a callback, this parameter specifies the algorithm used to sign the callback content. Content Moderation calculates a signature over the concatenated string UID + seed + content by using the specified algorithm and sends it to your callback URL. Valid values:

  • SHA256 (default): Uses the SHA256 hash algorithm.

  • SM3: Uses the HMAC-SM3 algorithm. The result is a hexadecimal string that consists of lowercase letters and digits. For example, hashing abc with SM3 returns 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0.

liveId

String

No

liveId1****

The ID of the live audio stream.

This parameter deduplicates tasks so that the same live stream is not moderated more than once. If you provide it, Content Moderation checks for an active moderation task by using the composite key uid+service+liveId. If an active task exists, the service returns its taskId instead of creating a new task.

dataId

String

No

voice20240307***

A custom data ID for the object being moderated.

This ID uniquely identifies your business data. The ID can contain uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.). The maximum length is 64 characters.

referer

String

No

www.aliyun.com

The Referer request header. Used for purposes such as hotlink protection. The maximum length is 256 characters.
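
The constraints in Table 1 can be checked on the client before a request is sent. The sketch below is illustrative only: the function name and messages are hypothetical, "letters" is assumed to mean ASCII letters, and the local file upload method is left out because the SDK handles it.

```python
import re
from typing import Dict, List

# Patterns derived from the parameter descriptions in Table 1.
SEED_RE = re.compile(r"[A-Za-z0-9_]{1,64}")
DATA_ID_RE = re.compile(r"[A-Za-z0-9_.\-]{1,64}")
OSS_KEYS = ("ossBucketName", "ossObjectName", "ossRegionId")


def validate_service_parameters(params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    has_url = bool(params.get("url"))
    has_oss = all(params.get(key) for key in OSS_KEYS)
    if not (has_url or has_oss):
        problems.append("provide url, or all three OSS parameters")
    callback = params.get("callback")
    if callback:
        if not callback.startswith(("http://", "https://")):
            problems.append("callback must be an HTTP or HTTPS URL")
        if not params.get("seed"):
            problems.append("seed is required when callback is set")
    seed = params.get("seed")
    if seed and not SEED_RE.fullmatch(seed):
        problems.append("seed allows letters, digits, underscores; max 64 chars")
    data_id = params.get("dataId")
    if data_id and not DATA_ID_RE.fullmatch(data_id):
        problems.append("dataId allows letters, digits, _ - . ; max 64 chars")
    referer = params.get("referer")
    if referer and len(referer) > 256:
        problems.append("referer exceeds 256 characters")
    return problems
```

Note that the masked example values in this topic, such as abc****, contain asterisks and would not pass these checks. Real values would.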

Response parameters

| Parameter | Type | Example value | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. For more information, see Codes. |
| Data | JSONObject | {"TaskId": "AAAAA-BBBBB","DataId": "voice20240307***"} | The response data. |
| Message | String | OK | The response message. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |

Examples

  • Request example

{
    "Service": "audio_detection_byllm_global",
    "ServiceParameters": {
        "cryptType": "SHA256",
        "seed": "abc***123",
        "callback": "https://aliyun.com/callback",
        "url": "http://aliyundoc.com/test.mp3"
    }
}
  • Successful response example

{
    "Code": 200,
    "Data": {
        "TaskId": "AAAAA-BBBBB",
        "DataId": "voice20240307***"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
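
The request example above shows ServiceParameters as a nested object for readability, while the request parameter table types it as a JSON string. A minimal sketch of building the two top-level parameters under that reading follows. The function name is hypothetical, and how the parameters are signed and sent depends on the SDK or HTTP method you use, which this sketch does not cover.

```python
import json
from typing import Dict


def build_submit_request(service: str, service_parameters: Dict[str, str]) -> Dict[str, str]:
    """Build the top-level parameters, serializing ServiceParameters to a JSON string."""
    return {
        "Service": service,
        "ServiceParameters": json.dumps(service_parameters),
    }


request = build_submit_request(
    "audio_detection_byllm_global",
    {"url": "http://aliyundoc.com/test.mp3", "dataId": "voice20240307"},
)
print(request["ServiceParameters"])
```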

Query task results

When a live stream moderation task is in progress, a query returns the N most recent audio slices. After the task is complete, the query returns all audio slices.

  • To query a moderation task, call the VoiceModerationResult operation.

  • Billing: This API operation is not billed.

  • Query timeout: Query the results 30 seconds after submitting an asynchronous moderation task. The service stores results for up to 24 hours before automatically deleting them.
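
When no callback is configured, results have to be polled. The loop below is a sketch under stated assumptions: `query` is a hypothetical callable that wraps VoiceModerationResult and returns the parsed response, code 280 is treated as "still in progress" as listed in the Codes section, and the 10-second interval and attempt cap are arbitrary choices.

```python
import time
from typing import Any, Callable, Dict

IN_PROGRESS = 280  # "Detection is in progress" in the Codes section


def poll_result(
    query: Callable[[str], Dict[str, Any]],
    task_id: str,
    initial_delay: float = 30.0,
    interval: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Wait, then query until the task leaves the in-progress state."""
    sleep(initial_delay)  # the topic recommends waiting 30 seconds first
    for attempt in range(max_attempts):
        response = query(task_id)
        if response.get("Code") != IN_PROGRESS:
            return response
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} still in progress after {max_attempts} queries")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Keep the polling rate well under the QPS limit for this operation.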

QPS limit

This operation has a QPS limit of 100 for each user. API calls exceeding this limit are throttled, which may affect your business. We recommend calling this operation at a reasonable rate.

Request parameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | audio_detection_byllm_global | The type of moderation service. |
| ServiceParameters | JSONString | Yes | | The required service parameters, provided as a JSON string. For details, see ServiceParameters. |

Table 2. ServiceParameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| taskId | String | Yes | AAAAA-BBBBB | The ID of the task, returned upon task submission. |

Response parameters

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. For more information, see Codes. |
| Data | JSONObject | | The results of the audio content moderation. For more information, see Data. |
| Message | String | OK | The response message. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |

Table 3. Data

Parameter

Type

Example

Description

Url

String

https://aliyundoc.com/text.mp3

The URL of the moderated object.

LiveId

String

liveId1****

The ID of the live audio stream (optional).

DataId

String

voice20240307***

The data ID of the moderated object (optional).

RiskLevel

String

high

The combined risk level of all audio slices. Valid values are:

  • high: high-risk

  • medium: medium-risk

  • low: low-risk

  • none: no risk detected

Note

Address high-risk content immediately and manually review medium-risk content. For standard use cases, low-risk content can be treated as safe. Process low-risk content only if a high recall rate is required.

SliceDetails

JSONArray

An array of objects with detailed results for each audio slice. For more information, see SliceDetails.
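
The handling guidance in the RiskLevel note can be expressed as a small mapping. This is a sketch only: the action names ("block", "manual_review", "pass") and the `high_recall` flag are hypothetical and stand in for whatever your own workflow uses.

```python
def action_for(risk_level: str, high_recall: bool = False) -> str:
    """Map a RiskLevel value to a hypothetical handling action."""
    if risk_level == "high":
        return "block"  # address high-risk content immediately
    if risk_level == "medium":
        return "manual_review"
    if risk_level == "low":
        # Low-risk content is treated as safe unless high recall is required.
        return "manual_review" if high_recall else "pass"
    if risk_level == "none":
        return "pass"
    raise ValueError(f"unexpected RiskLevel: {risk_level!r}")
```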

Table 4. SliceDetails

Parameter

Type

Example

Description

StartTime

Integer

0

The start time of the sentence, in seconds.

EndTime

Integer

4065

The end time of the sentence, in seconds.

StartTimestamp

Integer

1678854649720

The start timestamp of the audio slice, in milliseconds.

EndTimestamp

Integer

1678854649720

The end timestamp of the audio slice, in milliseconds.

Text

String

Disgusting

The text transcribed from the audio.

Url

String

https://aliyundoc.com

If the moderated content is from a live audio stream, this parameter provides a temporary URL to the corresponding audio segment. The URL is valid for 30 minutes. Save the content promptly if needed.

RiskLevel

String

high

The risk level of the audio slice. Valid values are:

  • high: high-risk

  • medium: medium-risk

  • low: low-risk

  • none: no risk detected

Result

JSONArray

An array of objects that provides the moderation results. For more information, see Result.

Table 5. Result

Parameter

Type

Example

Description

Label

String

political_entity

The moderation label assigned to the content. The service may detect multiple labels. This includes labels from the text policy and labels specific to audio:

  • For text policy labels, see Risk Labels.

  • Audio-specific labels include:

    • specified_nontalk: silent audio

    • specified_speaking: specified speaking

    • sexual_sounds: sexual sounds

Note

By default, the detection of specified speaking and sexual sounds is disabled. To enable this feature, contact your account manager.

Description

String

Suspected pornographic content

A description of the Label.

Important

This field provides an explanation of the Label and is subject to change. When processing results, use the Label value instead of this description.

Confidence

Float

81.22

The confidence score, a value from 0 to 100 with up to two decimal places. Some labels do not have a confidence score.

RiskLevel

String

high

The risk level of the current label. Valid values are:

  • high: high-risk

  • medium: medium-risk

  • low: low-risk

  • none: no risk detected

RiskWords

String

AA,BB,CC

Detected risk words, separated by commas. This parameter is not returned for all labels.

CustomizedHit

JSONArray

[{"LibName":"...","Keywords":"..."}]

If a keyword from a custom library is detected, the Label value is customized, and this parameter returns the name of the custom library and the detected keywords. For more information, see CustomizedHit.

RiskPositions

JSONArray

The positions of the detected risk words. For more information, see RiskPositions.

Table 6. CustomizedHit

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| LibName | String | CustomLibrary1 | The name of the custom library. |
| Keywords | String | CustomKeyword1,CustomKeyword2 | Detected custom keywords, separated by commas. |

Table 7. RiskPositions

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| RiskWord | String | AA | The detected risk word. |
| StartPos | Integer | 10 | The start position of the risk word. |
| EndPos | Integer | 12 | The end position of the risk word. |

Examples

Request example

{
    "Service": "audio_detection_byllm_global",
    "ServiceParameters": {
        "taskId": "AAAAA-BBBBB"
    }
}

Successful response example

{
    "Code": 200,
    "Data": {
        "DataId": "voice20240307***",
        "LiveId": "liveId1****",
        "RiskLevel": "high",
        "SliceDetails": [
            {
                "StartTime": 0,
                "EndTime": 4065,
                "RiskLevel": "high",
                "Result": [
                    {
                        "Label": "political_entity",
                        "Description": "Suspected political entity",
                        "Confidence": 100.0,
                        "RiskLevel": "high",
                        "RiskWords": "WordA,WordB",
                        "RiskPositions": [
                            {
                                "EndPos": 16,
                                "RiskWord": "WordA",
                                "StartPos": 14
                            }
                        ]
                    }
                ],
                "Text": "Content Moderation product test case",
                "Url": "https://aliyundoc.com"
            }
        ]
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
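
A response shaped like the example above can be reduced to the slices and labels that need attention. The sketch below assumes the response has already been parsed into a Python dict. The function name is hypothetical, and missing fields are treated as empty rather than as errors.

```python
from typing import Any, Dict, List, Optional, Tuple

Flagged = Tuple[Optional[int], Optional[int], Optional[str], str]


def flagged_labels(response: Dict[str, Any]) -> List[Flagged]:
    """Return (StartTime, EndTime, Label, RiskLevel) for each result that is not "none"."""
    flagged: List[Flagged] = []
    data = response.get("Data") or {}
    for slice_detail in data.get("SliceDetails") or []:
        for result in slice_detail.get("Result") or []:
            level = result.get("RiskLevel", "none")
            if level != "none":
                flagged.append(
                    (
                        slice_detail.get("StartTime"),
                        slice_detail.get("EndTime"),
                        result.get("Label"),
                        level,
                    )
                )
    return flagged
```

Filtering on the Label and RiskLevel values, rather than on Description, follows the guidance that Description is subject to change.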

Cancel a moderation task

You can cancel only live stream moderation tasks. Audio file moderation tasks cannot be canceled.

  • To cancel a moderation task, call the VoiceModerationCancel operation.

  • Billing: This operation is free of charge.

Request parameters

| Parameter | Type | Required | Example value | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | live_detection_byllm_global | The type of moderation service. |
| ServiceParameters | JSONString | Yes | | The parameters for the moderation service, formatted as a JSON string. For details about each field, see ServiceParameters. |

Table 8. ServiceParameters

| Parameter | Type | Required | Example value | Description |
| --- | --- | --- | --- | --- |
| taskId | String | Yes | AAAAA-BBBBB | The ID of the moderation task to cancel. |

Response parameters

| Parameter | Type | Example value | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The response status code. For more information, see Codes. |
| Message | String | OK | The response message. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The unique ID of the request. |

Examples

  • Request example

{
    "Service": "live_detection_byllm_global",
    "ServiceParameters": {
        "taskId": "AAAAA-BBBBB"
    }
}
  • Successful response example

{
    "Code": 200,
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Callback message format

A callback message is a JSON object with the following fields:

Parameter

Type

Description

checksum

String

A checksum generated by applying the SHA-256 algorithm to the concatenated string of uid + seed + content.

The UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud console. To prevent tampering, verify the message integrity upon receipt by generating a checksum with the same algorithm and comparing it with the received checksum value.

Note

The UID must be for your Alibaba Cloud account, not for a RAM user.

taskId

String

The ID of the associated task.

content

String

A JSON-formatted string containing the serialized detection results. You must parse this string to retrieve the JSON object. The format is identical to the response from a task result query. For more information, see Response Parameters.
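
Verifying and decoding a callback message can be sketched as follows. This covers the default SHA-256 case only. It assumes the checksum is a hex-encoded digest of the UTF-8 bytes of uid + seed + content, which this topic does not state explicitly for SHA-256, and the function name is hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def parse_callback(uid: str, seed: str, checksum: str, content: str) -> Dict[str, Any]:
    """Verify the checksum, then parse content into a JSON object."""
    # Assumption: hex-encoded SHA-256 over the UTF-8 bytes of uid + seed + content.
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    received = checksum.strip().lower().encode("utf-8")
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("ascii"), received):
        raise ValueError("checksum mismatch: message may have been tampered with")
    return json.loads(content)
```

Here `uid` is the Alibaba Cloud account ID, not a RAM user ID, and `seed` is the value you supplied when you submitted the task. After a successful check, the endpoint should respond with HTTP status code 200 so that the notification is not retried.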

Codes

This section describes the codes that the API returns. You are only billed for successful requests, which are indicated by a code of 200.

| Code | Description |
| --- | --- |
| 200 | The request succeeded. |
| 280 | Detection is in progress. |
| 400 | A required request parameter is missing or empty. |
| 401 | A request parameter is invalid. Check the parameter value and try again. |
| 402 | A request parameter value exceeds the length limit. Check the parameter and try again. |
| 403 | The request rate exceeds the QPS limit. Reduce your request rate and try again. |
| 404 | An error occurred while downloading the input file. Check the file URL and try again. |
| 405 | The file download timed out. This may be because the file is inaccessible. Check the file URL and your network settings, then try again. |
| 406 | The input file is too large. Use a smaller file and try again. |
| 407 | The input file format is not supported. Use a supported format and try again. |
| 408 | The account is not authorized to call this API. This can occur if the service has not been activated, the account has an overdue payment, or the account lacks the necessary permissions. |
| 480 | The number of concurrent requests exceeds the limit. Reduce your concurrency and try again. |
| 500 | A system error occurred. |
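
One possible way to act on these codes in client code is to group them by the kind of follow-up they call for. The grouping and category names below are assumptions made for illustration, not something this topic prescribes. In particular, treating 403, 480, and 500 as retryable is a judgment call.

```python
# Codes that may succeed if retried later without changing the request
# (rate limit, concurrency limit, system error). This grouping is an assumption.
RETRY_LATER = {403, 480, 500}


def classify_code(code: int) -> str:
    """Group an API code into a hypothetical follow-up category."""
    if code == 200:
        return "success"
    if code == 280:
        return "in_progress"
    if code in RETRY_LATER:
        return "retry_later"
    return "fix_request"


for code in (200, 280, 403, 407):
    print(code, classify_code(code))
```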