
Content Moderation: Voice Moderation 2.0 multilingual service

Last Updated: Dec 05, 2025

Voice Moderation 2.0 features an upgraded voice model that supports voice content in multiple languages, including Chinese, English, and mixed Chinese-English speech. The supported languages vary by region. The service provides moderation policies and a tag system tailored for international business. This topic describes the features and usage of the Voice Moderation 2.0 multilingual service.

Features

Compared to Voice Moderation 1.0, Voice Moderation 2.0 uses a separate policy and tag system to meet the needs of international business. It also offers more features to simplify usage and assist with manual review.

Comparison item

Voice Moderation 2.0

Voice Moderation 1.0

Multilingual capabilities

  • The Singapore region supports 35 languages, including Chinese, English, Arabic, German, Russian, French, Korean, Japanese, Spanish, Italian, Indonesian, Vietnamese, Malay, Thai, Hindi, Turkish, Portuguese, Dutch, Polish, Bengali, Persian, Swedish, Danish, Norwegian, Icelandic, Finnish, Belarusian, Lithuanian, Czech, Slovak, Hungarian, Greek, and Romanian.

  • The US (Virginia) region supports Chinese, English, and a mix of Chinese and English.

Supports only Chinese by default.

Moderation capabilities

  • Uses multiple parallel models with language- and region-specific features for more precise policies.

  • Includes additional models such as a moaning detection model to identify non-semantic features.

  • Uses a single model with language-specific features to balance accuracy and recall.

  • Does not support the moaning detection model by default.

Tag system

Uses an international tag system with tags such as profanity and regional. This system supports multiple risk tags and sub-tags.

Uses a different tag system that supports only a single risk tag.

API features

  • Uses an adjustable segmentation solution where voice is segmented into fixed-length clips. Fixed-length segments improve the efficiency of manual review.

  • Returns all voice segment content and transcribed text. Provides temporary URLs for voice segments to assist with manual review.

  • Uses a semantic sentence-based segmentation solution where voice segments range from a few seconds to tens of seconds.

  • Returns only voice segments with potential violations. Does not provide temporary URLs for voice segments by default.

Internationalized tags

The Voice Moderation 2.0 multilingual service uses an international tag system. If content contains multiple types of risk, the service can return multiple tags at the same time. Tag categories include, but are not limited to, the following:

Tag type

Categorization

Primary tags (labels)

  • violence: Violence

  • contraband: Contraband

  • Sexual Content: Pornography

  • profanity: Profanity

  • pullinTraffic: Ad-driven traffic

  • regional: Regional conflict

  • C_customized: Hit in user-defined library

Sub-tags (riskTips)

Sub-tags are returned in the xxx_yyy format. For example: contraband_Drugs.

Service performance

Voice Moderation 2.0 uses a high-performance core engine that can schedule dozens of models and policies with high concurrency to ensure timely service.

Service performance

Description

File size

Version 2.0 increases the maximum supported voice file size from 200 MB to 500 MB.

File format

Supported voice file formats: MP3, WAV, AAC, WMA, OGG, M4A, and AMR.

Supported video file formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM.

Live voice stream

Supported protocols: RTMP, HLS, HTTP-FLV, and RTSP.

Queries per second (QPS)

Version 2.0 increases the QPS limit for submitting tasks from 50 to 100.

Concurrent streams

Version 2.0 increases the default limit for concurrent streams from 20 to 50.

Note

In Voice Moderation, QPS refers to the number of requests that the API responds to per second. Concurrent streams refers to the number of voice files or voice streams being detected in the system simultaneously.
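To stay under the submission QPS limit instead of reacting to code 403 after the fact, a client can throttle itself. The following is a minimal token-bucket sketch, assuming a single process submits all tasks (if several processes share one account, the budget must be split among them). `TokenBucket` and `try_acquire` are hypothetical names, not part of any Alibaba Cloud SDK, and the clock is injectable so the logic can be tested.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical client-side throttle allowing about `rate` submissions per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second, e.g. 100 for the 2.0 QPS limit
        self.capacity = capacity  # largest burst allowed
        self.tokens = capacity    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; False means the caller should wait and retry."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For the concurrent-stream limit, a `threading.BoundedSemaphore(50)` acquired when a task is submitted and released only after that task's result has been retrieved plays a similar role on the client side.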

Billing information

Voice Moderation 2.0 supports the pay-as-you-go billing method.

Pay-as-you-go

After you activate the Voice Moderation 2.0 service, the default billing method is pay-as-you-go. You are charged daily based on your actual usage. If you do not use the service, you are not charged.

Moderation type

Supported business scenarios (services)

Unit price

Standard Voice Moderation (voice_standard)

  • Multilingual detection for voice and video media: audio_multilingual_global

  • Multilingual detection for social and entertainment live streams: stream_multilingual_global

USD 9.0 per 1,000 minutes
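As a worked example of the unit price above, 250,000 minutes cost 250,000 / 1,000 × USD 9.0 = USD 2,250. The sketch below performs the same arithmetic with `Decimal` to avoid floating-point error. It assumes the input is already the billed number of minutes (how audio durations are rounded into billed minutes is not described in this topic), and rounding the result to cents is an assumption for display only. `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Unit price from the billing table: USD 9.0 per 1,000 minutes (voice_standard).
PRICE_PER_1000_MINUTES_USD = Decimal("9.0")


def estimate_cost_usd(billed_minutes: int) -> Decimal:
    """Estimate the pay-as-you-go charge for a number of billed minutes."""
    if billed_minutes < 0:
        raise ValueError("billed_minutes must be non-negative")
    cost = Decimal(billed_minutes) * PRICE_PER_1000_MINUTES_USD / Decimal(1000)
    # Round to cents for display; actual billing precision may differ.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```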

Access guide

Step 1: Activate the service

Go to Activate Service to activate the Voice Moderation 2.0 service.

Step 2: Grant permissions to a RAM user

Before you use the software development kit (SDK) or API, grant permissions to a RAM user. You can create an AccessKey pair for your Alibaba Cloud account or a RAM user. You must use an AccessKey pair for identity verification when you call Alibaba Cloud API operations. For more information, see Obtain an AccessKey pair.

Procedure

  1. Log on to the RAM console as a RAM administrator.

  2. Create a RAM user.

    For more information, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user.

    For more information, see Grant permissions to a RAM user.

    After completing the preceding operations, you can call the Content Moderation API as the RAM user.
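Once the RAM user has an AccessKey pair, it is safer to keep the pair out of source code. The sketch below reads it from environment variables. The variable names ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET are an assumption based on the convention used in Alibaba Cloud SDK samples; `load_access_key` and `AccessKey` are hypothetical names.

```python
import os
from typing import Mapping, NamedTuple, Optional


class AccessKey(NamedTuple):
    key_id: str
    key_secret: str


def load_access_key(env: Optional[Mapping[str, str]] = None) -> AccessKey:
    """Read the RAM user's AccessKey pair from environment variables (assumed names)."""
    env = os.environ if env is None else env
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID", "")
    key_secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", "")
    if not key_id or not key_secret:
        raise RuntimeError(
            "Set ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET "
            "to the RAM user's AccessKey pair."
        )
    return AccessKey(key_id, key_secret)
```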

Step 3: Install and integrate the SDK

The following regions are supported:

  • Singapore

    • Internet endpoint: https://green-cip.ap-southeast-1.aliyuncs.com

    • VPC endpoint: https://green-cip-vpc.ap-southeast-1.aliyuncs.com

  • US (Virginia)

    • Internet endpoint: https://green-cip.us-east-1.aliyuncs.com

    • VPC endpoint: https://green-cip-vpc.us-east-1.aliyuncs.com
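When configuring a client, it can help to pick the endpoint from a region ID rather than hard-coding a URL. The sketch below copies the endpoints listed above; the region IDs ap-southeast-1 and us-east-1 are taken from the endpoint hostnames. VPC endpoints are typically reachable only from inside an Alibaba Cloud VPC in the same region. `ENDPOINTS` and `endpoint_for` are hypothetical names.

```python
from typing import Dict

# Endpoints copied from the region list in this topic.
ENDPOINTS: Dict[str, Dict[str, str]] = {
    "ap-southeast-1": {  # Singapore
        "internet": "https://green-cip.ap-southeast-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.ap-southeast-1.aliyuncs.com",
    },
    "us-east-1": {  # US (Virginia)
        "internet": "https://green-cip.us-east-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.us-east-1.aliyuncs.com",
    },
}


def endpoint_for(region_id: str, use_vpc: bool = False) -> str:
    """Return the Voice Moderation 2.0 endpoint for a supported region ID."""
    try:
        entry = ENDPOINTS[region_id]
    except KeyError:
        raise ValueError(f"unsupported region: {region_id!r}") from None
    return entry["vpc" if use_vpc else "internet"]
```

Keep in mind that language coverage differs by region (Singapore supports more languages than US (Virginia)), so the region choice also affects which audio can be moderated.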

Note

If you need SDK sample code in other languages, you can use the online debugging tool in OpenAPI Developer Portal to debug the API operation. This tool automatically generates SDK sample code for the API operation.

API

Usage notes

Service endpoint: https://green-cip.{region}.aliyuncs.com, where {region} is the region ID, such as ap-southeast-1 or us-east-1.

You can call these operations to create a voice content moderation task and query its result. You can construct an HTTP request manually or use an SDK. For information about how to construct a raw HTTP request, see Make a raw HTTP call. For more information about SDKs, see Voice Moderation 2.0 SDKs and integration guide.

  • API Operations:

    • Submit moderation task: VoiceModeration

    • Query moderation task result: VoiceModerationResult

  • Billing:

    This is a paid operation. You are charged only for requests that return an HTTP status code of 200. Requests that return other error codes are not charged.

Submit moderation task

Request parameters

Name

Type

Required

Example

Description

Service

String

Yes

audio_multilingual_global

The type of moderation service. Valid values:

  • audio_multilingual_global: Multilingual detection for voice and video media

  • stream_multilingual_global: Multilingual detection for social and entertainment live streams

ServiceParameters

JSONString

Yes

The set of parameters required by the moderation service, as a JSON string. For a description of each field, see Table 1. ServiceParameters.

Table 1. ServiceParameters

Name

Type

Required

Example

Description

url

String

Yes

http://aliyundoc.com/test.flv

The URL of the object to be detected. This must be a public HTTP or HTTPS URL.

callback

String

No

http://aliyundoc.com

The URL to which the moderation result is sent as a callback notification. HTTP and HTTPS are supported. If you leave this field empty, you must periodically poll for the moderation result.

The callback interface must support the POST method, UTF-8 encoded data, and the form parameters checksum and content.

Content Moderation sets the checksum and content parameters and calls your callback interface to return the moderation result according to the following rules and format.

  • checksum: A string generated by concatenating user UID + seed + content and hashing the result with the algorithm specified by cryptType (SHA256 by default). The user UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud Management Console. To detect tampering, compute the same hash when you receive a pushed result and compare it with checksum.

    Note

    The user UID must be the UID of your Alibaba Cloud account, not the UID of a RAM user.

  • content: A JSON string. Parse it into a JSON object. For an example of the content result, see the sample response for querying a moderation result.

Note

After your server's callback interface receives a result pushed by Content Moderation, it must return HTTP status code 200 to confirm receipt. Any other status code is treated as a failure. On failure, Content Moderation retries the push up to 16 times. If the result is still not received after 16 retries, pushing stops. In that case, check the status of your callback interface.

seed

String

No

abc****

A random string used for the signature in the callback notification request.

It can contain letters, digits, and underscores (_), and must not exceed 64 characters. You can customize this value to verify that the callback notification request is initiated by the Alibaba Cloud Content Moderation service.

Note

This field is required when using a callback.

cryptType

String

No

SHA256

When a callback notification (callback) is used, this parameter sets the hash algorithm used to compute checksum. Content Moderation hashes the string concatenated from user UID + seed + content with the specified algorithm and sends the result to your callback URL as checksum. Valid values:

  • SHA256 (default): Uses the SHA256 encryption algorithm.

  • SM3: Uses the SM3 hash algorithm. The result is a hexadecimal string of lowercase letters and digits. For example, hashing `abc` with SM3 returns `66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0`.

liveId

String

No

liveId1****

The ID of the live voice stream.

This parameter is used to deduplicate live voice stream tasks and prevent repeated moderation. If you pass this parameter, the system checks for an ongoing moderation task based on uid+service+liveId. If a task exists, the system returns the existing live moderation taskId instead of starting a new one.

dataId

String

No

voice20240307***

The data ID corresponding to the detected object.

It can consist of uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.), and must not exceed 64 characters. You can use it to uniquely identify your business data.
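The constraints in Table 1 (seed required with a callback, seed limited to letters, digits, and underscores, dataId limited to letters, digits, underscores, hyphens, and periods, both at most 64 characters) can be checked before a task is submitted, which avoids a round trip that ends in code 401 or 402. The following is a minimal sketch; `build_service_parameters` is a hypothetical helper, and it does not check the URL scheme because live streams may use protocols other than HTTP.

```python
import re
from typing import Dict, Optional

# Constraints as stated in Table 1.
SEED_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")        # letters, digits, underscores
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,64}")  # also hyphens and periods
CRYPT_TYPES = ("SHA256", "SM3")


def build_service_parameters(url: str,
                             callback: Optional[str] = None,
                             seed: Optional[str] = None,
                             crypt_type: str = "SHA256",
                             live_id: Optional[str] = None,
                             data_id: Optional[str] = None) -> Dict[str, str]:
    """Collect ServiceParameters fields for VoiceModeration, checking the documented rules."""
    if not url:
        raise ValueError("url is required")
    params: Dict[str, str] = {"url": url}
    if callback:
        # seed is required whenever a callback is used.
        if not seed or not SEED_PATTERN.fullmatch(seed):
            raise ValueError("callback needs a seed of letters, digits, or '_' (max 64 chars)")
        if crypt_type not in CRYPT_TYPES:
            raise ValueError(f"cryptType must be one of {CRYPT_TYPES}")
        params.update(callback=callback, seed=seed, cryptType=crypt_type)
    if live_id:
        params["liveId"] = live_id
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId allows letters, digits, '_', '-', '.' (max 64 chars)")
        params["dataId"] = data_id
    return params
```

The returned dictionary still has to be serialized into a JSON string before it is passed as ServiceParameters.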

Return parameters

Name

Type

Example

Description

Code

Integer

200

The status code. It is the same as the HTTP status code. For more information, see Code description.

Data

JSONObject

{"taskId": "AAAAA-BBBBB"}

The returned data, which contains the ID of the moderation task (taskId).

Message

String

OK

The response message for the request.

RequestId

String

AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****

The request ID.

Example

Request sample

{
  "service":"audio_multilingual_global",
  "serviceParameters":"{\"cryptType\":\"SHA256\",\"seed\":\"abc***123\",\"callback\":\"https://aliyun.com/callback\",\"url\":\"http://aliyundoc.com/test.flv\"}"
}

Sample response

{
  "code":200,
  "data":{
    "taskId":"AAAAA-BBBBB"
  },
  "message":"SUCCESS",
  "requestId":"AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
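Because serviceParameters is a JSON string nested inside the request, escaping its quotes by hand is error-prone. Serializing the inner object with a JSON library and placing the resulting string in the outer object produces correct escaping. The following is a minimal sketch using Python's standard json module; `build_voice_moderation_body` is a hypothetical helper, and the lowercase key names follow the request sample. Whether the outer body is then sent as JSON or as form fields depends on how you call the API (SDK or raw HTTP call); this sketch only shows the nesting.

```python
import json
from typing import Any, Dict

SERVICES = ("audio_multilingual_global", "stream_multilingual_global")


def build_voice_moderation_body(service: str,
                                service_parameters: Dict[str, Any]) -> Dict[str, str]:
    """Return a request body in which serviceParameters is itself a JSON string."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    return {
        "service": service,
        # Serialize the inner object separately; the JSON library escapes the quotes.
        "serviceParameters": json.dumps(service_parameters, separators=(",", ":")),
    }


body = build_voice_moderation_body(
    "audio_multilingual_global",
    {
        "cryptType": "SHA256",
        "seed": "abc123",
        "callback": "https://aliyun.com/callback",
        "url": "http://aliyundoc.com/test.flv",
    },
)
print(json.dumps(body, indent=2))
```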

Query task result

After the moderation task is complete, the query result returns data for all voice segments.

Request parameters

Name

Type

Required

Example

Description

Service

String

Yes

audio_multilingual_global

The type of moderation service.

ServiceParameters

JSONString

Yes

The set of parameters required by the moderation service, as a JSON string. For a description of each field, see Table 2. ServiceParameters.

Table 2. ServiceParameters

Name

Type

Required

Example

Description

taskId

String

Yes

AAAAA-BBBBB

The ID returned when the task was submitted.

Return parameters

Name

Type

Example

Description

Code

Integer

200

The status code. It is the same as the HTTP status code. For more information, see Code description.

Data

JSONObject

{"url":"xxxx","sliceDetails":[...]}

The moderation result data. For more information, see Table 3. Data.

Message

String

OK

The response message for the request.

RequestId

String

AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****

The request ID.

Table 3. Data

Name

Type

Example

Description

url

String

https://aliyundoc.com

The URL of the detected object.

LiveId

String

liveId1****

The ID of the live voice stream (optional).

DataId

String

voice20240307***

The data ID corresponding to the detected object (optional).

RiskLevel

String

high

The risk level of the voice, calculated from the results of all voice segments. Valid values:

  • high: High risk

  • medium: Medium risk

  • low: Low risk

  • none: No risk detected

Note

High-risk content should be handled directly. Medium-risk content should be manually reviewed. Low-risk content should be handled only when high recall is required. Otherwise, treat it the same as content with no risk detected.

sliceDetails

JSONArray

The detailed results for the voice segments. For more information, see Table 4. sliceDetails.
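The note on RiskLevel describes a handling policy per level. The following is a minimal sketch of that policy. The action names are hypothetical, and treating low-risk content as manual review when high recall is required is an assumption, because the note says only that such content should be "handled".

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"    # high risk: handle directly
    REVIEW = "review"  # send to manual review
    PASS = "pass"      # treat as no risk detected


def triage(risk_level: str, high_recall: bool = False) -> Action:
    """Map a RiskLevel value (high/medium/low/none) to a handling action."""
    level = risk_level.lower()
    if level == "high":
        return Action.BLOCK
    if level == "medium":
        return Action.REVIEW
    if level == "low":
        # Assumption: low risk goes to review only when high recall is required.
        return Action.REVIEW if high_recall else Action.PASS
    if level == "none":
        return Action.PASS
    raise ValueError(f"unexpected RiskLevel: {risk_level!r}")
```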

Table 4. sliceDetails

Name

Type

Example

Description

startTime

Integer

0

The start time of the segment, in seconds.

endTime

Integer

4065

The end time of the segment, in seconds.

startTimestamp

Integer

1678854649720

The start timestamp of the segment, in milliseconds.

endTimestamp

Integer

1678854649720

The end timestamp of the segment, in milliseconds.

text

String

disgusting

The text converted from the voice.

url

String

https://aliyundoc.com

A temporary URL for the voice segment. The URL is valid for 30 minutes. If you need to keep the segment, download and store it promptly.

labels

String

pullinTraffic

The tags, separated by commas (,). Valid values include:

  • violence: Violence

  • contraband: Contraband

  • Sexual content: Pornography

  • profanity: Profanity

  • pullinTraffic: Ad-driven traffic

  • regional: Regional conflict

  • C_customized: Hit in user-defined library

RiskLevel

String

high

The risk level of the voice segment. Return values include the following:

  • high: High risk

  • medium: Medium risk

  • low: Low risk

  • none: No risk detected

riskWords

String

AAA,BBB,CCC

The matched risk words, separated by commas (,).

riskTips

String

sexuality_Suggestive

The sub-tags, separated by commas.

extend

String

{\"riskTips\":\"sexuality_Suggestive\",\"riskWords\":\"pxxxxy\"}

A reserved field.
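The labels, riskWords, and riskTips fields are comma-separated strings, and the sample response that follows uses PascalCase keys (StartTime, Labels) while this table uses camelCase. A defensive parser can accept both spellings. The following is a minimal sketch; `SliceResult`, `parse_slice`, and `split_risk_tip` are hypothetical names, and case-insensitive key lookup is an assumption made to cover both spellings. `split_risk_tip` splits a sub-tag in the xxx_yyy format at its first underscore.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


def _split(value: Any) -> List[str]:
    """Split a comma-separated field into a list, dropping empty items."""
    if not value:
        return []
    return [item.strip() for item in str(value).split(",") if item.strip()]


@dataclass
class SliceResult:
    start_time: int
    end_time: int
    text: str
    risk_level: str
    labels: List[str] = field(default_factory=list)
    risk_tips: List[str] = field(default_factory=list)
    risk_words: List[str] = field(default_factory=list)


def parse_slice(raw: Dict[str, Any]) -> SliceResult:
    """Build a SliceResult from one sliceDetails entry, ignoring key case."""
    lower = {key.lower(): value for key, value in raw.items()}
    return SliceResult(
        start_time=int(lower.get("starttime", 0)),
        end_time=int(lower.get("endtime", 0)),
        text=lower.get("text", ""),
        risk_level=lower.get("risklevel", "none"),
        labels=_split(lower.get("labels")),
        risk_tips=_split(lower.get("risktips")),
        risk_words=_split(lower.get("riskwords")),
    )


def split_risk_tip(tip: str) -> Tuple[str, str]:
    """Split an xxx_yyy sub-tag at the first underscore."""
    prefix, _, detail = tip.partition("_")
    return prefix, detail
```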

Example

Request sample

{
  "service":"audio_multilingual_global",
  "serviceParameters":"{\"taskId\":\"AAAAA-BBBBB\"}"
}

Sample response

{
    "Code": 200,
    "Data": {
        "DataId": "voice20240307***",
        "LiveId": "liveId1****",
        "RiskLevel": "high",
        "SliceDetails": [
            {
                "EndTime": 4065,
                "Labels": "political_content,xxxx",
                "RiskLevel": "high",
                "RiskTips": "contraband_ProhibitedGoods",
                "RiskWords": "Risk Word A",
                "StartTime": 0,
                "Text": "Content Moderation product test case",
                "Url": "https://aliyundoc.com"
            }
        ]
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
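If you do not set a callback, you must poll VoiceModerationResult until a file moderation task finishes. The following is a minimal sketch under these assumptions: `fetch_result` is a hypothetical function you supply that calls VoiceModerationResult (for example, through the SDK) and returns the parsed response as a dict; code 280 ("Verifying" in the code description) is treated as "still in progress"; and the interval and timeout values are arbitrary. The sleep function is injectable so the loop can be tested.

```python
import time
from typing import Any, Callable, Dict

IN_PROGRESS = 280  # "Verifying" in the code description; assumed to mean not finished yet


def _get(resp: Dict[str, Any], name: str) -> Any:
    """Read a field whether the response uses PascalCase or camelCase keys."""
    return resp.get(name, resp.get(name[0].lower() + name[1:]))


def poll_result(fetch_result: Callable[[str], Dict[str, Any]], task_id: str,
                interval_s: float = 5.0, timeout_s: float = 600.0,
                sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until Code is no longer 280; return Data on 200, raise otherwise."""
    waited = 0.0
    while True:
        resp = fetch_result(task_id)  # hypothetical wrapper around VoiceModerationResult
        code = _get(resp, "Code")
        if code == 200:
            return _get(resp, "Data") or {}
        if code != IN_PROGRESS:
            raise RuntimeError(f"task {task_id}: code {code}, {_get(resp, 'Message')}")
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} not finished after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s
```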

Callback message format

The callback message data is in JSON format, as shown below:

Field name

Field type

Description

checksum

String

The checksum: a string generated by concatenating user UID + seed + content and hashing it with the algorithm specified by cryptType (SHA256 by default).

The user UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud Management Console. To detect tampering, compute the same hash when you receive a pushed result and compare it with checksum.

Note

The user UID must be the UID of your Alibaba Cloud account, not the UID of a RAM user.

taskId

String

The task ID of the callback message.

content

String

The serialized moderation result. This is a JSON string. Parse it into a JSON object. The format of the content result is the same as the response for querying a task result. For more information, see Return parameters.
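On the callback server, recompute the checksum from your Alibaba Cloud account UID, the seed you submitted, and the received content before trusting a pushed result. The following is a minimal sketch for the default SHA256 cryptType. It assumes that the checksum is a lowercase hexadecimal digest (this topic shows a hex example only for SM3) and that taskId, checksum, and content have already been extracted from the POST request. SM3 is not covered because Python's hashlib does not guarantee SM3 support. Because failed pushes are retried up to 16 times, the sketch acknowledges a repeated (taskId, checksum) pair without processing it again; the in-memory set and list are placeholders for durable storage, and `CallbackReceiver` is a hypothetical name.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List, Set, Tuple


def expected_checksum(uid: str, seed: str, content: str) -> str:
    """SHA256 of uid + seed + content as lowercase hex (hex encoding is an assumption)."""
    return hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()


class CallbackReceiver:
    """Hypothetical receiver for Voice Moderation callback pushes (cryptType SHA256 only)."""

    def __init__(self, account_uid: str, seed: str) -> None:
        self.account_uid = account_uid  # Alibaba Cloud account UID, not a RAM user UID
        self.seed = seed                # the seed sent in ServiceParameters
        self.seen: Set[Tuple[str, str]] = set()  # placeholder for durable storage
        self.results: List[Tuple[str, Dict[str, Any]]] = []

    def handle(self, task_id: str, checksum: str, content: str) -> int:
        """Return the HTTP status to answer with; anything other than 200 triggers a retry."""
        expected = expected_checksum(self.account_uid, self.seed, content)
        if not hmac.compare_digest(expected.encode(), checksum.lower().encode()):
            return 403  # checksum mismatch: do not trust this payload
        key = (task_id, checksum.lower())
        if key in self.seen:
            return 200  # retried delivery of a push that was already processed
        result = json.loads(content)  # same structure as the query response
        self.results.append((task_id, result))  # replace with real processing
        self.seen.add(key)
        return 200
```

If processing raises an exception, most web frameworks answer with an error status, which in turn causes Content Moderation to retry the push.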

Code description

The following list describes the codes returned by the operations. You are charged only for requests that return code 200.

  • 200: The request is successful.

  • 280: Verifying. The moderation task is still in progress.

  • 400: The request parameters are empty.

  • 401: The request parameters are invalid.

  • 402: The length of a request parameter does not meet the requirements. Check and modify the parameter.

  • 403: The request exceeds the QPS limit. Check and adjust your request rate.

  • 404: An error occurred while downloading the specified file. Check the file or retry.

  • 405: The download of the specified file timed out. The file may be inaccessible. Check the file and retry.

  • 406: The specified file exceeds the size limit. Check the file and retry.

  • 407: The format of the specified file is not supported. Check the file and retry.

  • 408: The account does not have permission to call this operation. The service may not be activated for the account, the account may have an overdue payment, or the account may not be authorized to access the service.

  • 480: The number of concurrent streams exceeds the limit. Check and adjust the number of concurrent tasks.

  • 500: A system error occurred.
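The codes above fall roughly into two groups: codes where sending the same request again later may succeed (QPS or concurrency limits, download failures, system errors) and codes that require a change to the request, file, or account first. The following is a minimal sketch of that split with exponential backoff and jitter. The grouping is an editorial reading of the descriptions above, not an official classification, and `call` is a hypothetical function that submits one request and returns its code. Because only requests that return 200 are charged, failed attempts do not add to the bill.

```python
import random
import time
from typing import Callable

SUCCESS = 200
IN_PROGRESS = 280
RETRYABLE = {403, 404, 405, 480, 500}   # limits, download problems, system error
FATAL = {400, 401, 402, 406, 407, 408}  # fix the parameters, file, or account first


def classify(code: int) -> str:
    """Group a returned code into success, in_progress, retry, fix_request, or unknown."""
    if code == SUCCESS:
        return "success"
    if code == IN_PROGRESS:
        return "in_progress"
    if code in RETRYABLE:
        return "retry"
    if code in FATAL:
        return "fix_request"
    return "unknown"


def call_with_retry(call: Callable[[], int], max_attempts: int = 5,
                    base_delay_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Invoke call() and retry retryable codes; return the final code."""
    code = call()
    attempt = 1
    while code in RETRYABLE and attempt < max_attempts:
        # Full jitter: wait a random time up to base * 2^(attempt - 1).
        sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
        code = call()
        attempt += 1
    return code
```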