Voice Moderation 2.0 detects harmful or policy-violating content in audio and video streams across 35 languages. It is designed for international platforms—social entertainment apps, user-generated content (UGC) sites, and online gaming—where content arrives in multiple languages and requires fast, accurate triage.
How it works
Submit an audio or video URL to start an asynchronous moderation task. The service splits the audio into fixed-length segments, runs multiple parallel detection models against each segment, and returns:
The transcribed text for every segment
One or more risk tags (e.g., profanity, contraband) per segment
A per-segment risk level (high, medium, low, or none)
A temporary URL to the audio clip for manual review
Retrieve results by polling the query API or by configuring a callback URL to receive push notifications when the task completes.
What's new in 2.0
| Feature | Voice Moderation 2.0 | Voice Moderation 1.0 |
|---|---|---|
| Languages | 35 languages in Singapore; Chinese, English, and Chinese-English mix in US (Virginia) | Chinese only |
| Detection models | Multiple parallel models with language- and region-specific tuning; includes moaning detection | Single model; no moaning detection |
| Tag system | International tags with multiple simultaneous risk tags and sub-tags | Single risk tag per result |
| Audio segmentation | Fixed-length, adjustable segments; all segments returned | Semantic sentence-based segments (a few seconds to tens of seconds); only flagged segments returned |
| Manual review support | Returns transcribed text and temporary segment URLs for every segment | No temporary URLs by default |
| Max file size | 500 MB | 200 MB |
| QPS | 100 | 50 |
| Concurrent streams | 50 | 20 |
In Voice Moderation, queries per second (QPS) refers to the number of API requests processed per second. Concurrent streams refers to the number of audio files or live streams being analyzed simultaneously.
Supported regions and languages
| Region | Supported languages | Internet endpoint | VPC endpoint |
|---|---|---|---|
| Singapore | 35 languages, including Chinese, English, Arabic, German, Russian, French, Korean, Japanese, Spanish, Italian, Indonesian, Vietnamese, Malay, Thai, Hindi, Turkish, Portuguese, Dutch, Polish, Bengali, Persian, Swedish, Danish, Norwegian, Icelandic, Finnish, Belarusian, Lithuanian, Czech, Slovak, Hungarian, Greek, Romanian | https://green-cip.ap-southeast-1.aliyuncs.com | https://green-cip-vpc.ap-southeast-1.aliyuncs.com |
| US (Virginia) | Chinese, English, Chinese-English mix | https://green-cip.us-east-1.aliyuncs.com | https://green-cip-vpc.us-east-1.aliyuncs.com |
For 35-language detection, use the Singapore region. The US (Virginia) region supports Chinese and English only.
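The region guidance above can be sketched as a small lookup. This is an illustrative helper, not part of the SDK: the function name `pick_endpoint` and its `vpc` and `prefer_us` parameters are hypothetical, and only the endpoint URLs and language coverage come from the table.

```python
# Endpoints copied from the table above; the lookup structure is an assumption.
ENDPOINTS = {
    "ap-southeast-1": {
        "internet": "https://green-cip.ap-southeast-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.ap-southeast-1.aliyuncs.com",
    },
    "us-east-1": {
        "internet": "https://green-cip.us-east-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.us-east-1.aliyuncs.com",
    },
}

# Languages that the US (Virginia) region handles, per the table.
US_LANGUAGES = {"chinese", "english"}


def pick_endpoint(languages, vpc=False, prefer_us=False):
    """Return an endpoint URL that covers every language in `languages`."""
    wanted = {lang.lower() for lang in languages}
    network = "vpc" if vpc else "internet"
    # Use US (Virginia) only when asked to and only for Chinese/English content.
    if prefer_us and wanted <= US_LANGUAGES:
        return ENDPOINTS["us-east-1"][network]
    # Singapore covers the full language list, so it is the fallback.
    return ENDPOINTS["ap-southeast-1"][network]
```

For example, `pick_endpoint(["Thai"])` resolves to the Singapore internet endpoint, while `pick_endpoint(["Chinese", "English"], prefer_us=True)` resolves to the US (Virginia) one.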
Supported formats
Audio files: MP3, WAV, AAC, WMA, OGG, M4A, AMR (up to 500 MB)
Video files: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, RM (up to 500 MB)
Live stream protocols: RTMP, HLS, HTTP-FLV, RTSP
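A client-side pre-check can catch obviously unsupported files before a task is submitted. The sketch below is an assumption-laden convenience, not service behavior: `check_media` is a hypothetical helper, it judges the format by file extension only (the service may inspect the actual container), and it assumes 1 MB = 1024 × 1024 bytes.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions derived from the supported-format lists above.
AUDIO_EXTENSIONS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "amr"}
VIDEO_EXTENSIONS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}
MAX_FILE_BYTES = 500 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_media(url, size_bytes=None):
    """Return (ok, detail): detail is 'audio'/'video' or a reason for rejection."""
    # Take the extension from the URL path so query strings are ignored.
    ext = PurePosixPath(urlparse(url).path).suffix.lstrip(".").lower()
    if ext in AUDIO_EXTENSIONS:
        kind = "audio"
    elif ext in VIDEO_EXTENSIONS:
        kind = "video"
    else:
        return (False, f"unsupported extension: {ext or '(none)'}")
    # The size check is skipped when the caller does not know the size.
    if size_bytes is not None and size_bytes > MAX_FILE_BYTES:
        return (False, "file exceeds 500 MB")
    return (True, kind)
```

A rejected file here would otherwise surface later as error code 406 or 407.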
Risk tags
When content contains multiple risk types, the service returns all matching tags simultaneously. Tags are returned in the labels field; sub-tags in the riskTips field.
Primary tags
| Tag value | Category | What it detects |
|---|---|---|
| violence | Violence | Threats, incitement to physical harm, or terroristic content |
| contraband | Contraband | Discussion or promotion of illegal goods (e.g., drugs, weapons) |
| Sexual Content | Pornography | Sexually explicit or pornographic content |
| profanity | Profanity | Impolite, vulgar, or offensive language |
| pullinTraffic | Ad-driven traffic | Spam or unsolicited promotional content designed to redirect users |
| regional | Regional conflict | Politically sensitive content related to regional disputes |
| C_customized | Custom library match | Content that matches terms in your user-defined library |
Sub-tags
Sub-tags provide a finer-grained classification within a primary tag. They are returned in the xxx_yyy format. For example, contraband_Drugs is a sub-tag of contraband.
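Because both fields arrive as comma-separated strings, a small parser is usually the first step on the client side. The sketch below groups sub-tags under their primary tag. `group_tags` is a hypothetical helper, and matching a sub-tag to the longest primary tag that prefixes it is an assumption made so that tags containing underscores (such as C_customized) are handled sensibly.

```python
def group_tags(labels, risk_tips):
    """Map each primary tag to the list of sub-tags that belong to it."""
    primary = [t.strip() for t in labels.split(",") if t.strip()]
    grouped = {tag: [] for tag in primary}
    for tip in (t.strip() for t in risk_tips.split(",") if t.strip()):
        # Prefer the longest primary tag that prefixes the sub-tag.
        parents = [tag for tag in primary if tip.startswith(tag + "_")]
        # Fallback (assumption): the text before the first underscore.
        parent = max(parents, key=len) if parents else tip.split("_", 1)[0]
        grouped.setdefault(parent, []).append(tip)
    return grouped
```

With `labels="profanity,contraband"` and `riskTips="contraband_Drugs"`, the result maps `profanity` to an empty list and `contraband` to `["contraband_Drugs"]`.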
Risk level handling
| Risk level | Recommended action |
|---|---|
| high | Block or remove content immediately |
| medium | Route to human review queue |
| low | Act only when high recall is required; otherwise treat as clean |
| none | No risk detected |
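The table above translates naturally into a routing function. The sketch below is one possible mapping: the action names (`block`, `review`, `allow`), the `high_recall` switch, and sending unknown levels to review are all assumptions made for illustration.

```python
# Default action per risk level, following the recommendations above.
ACTIONS = {"high": "block", "medium": "review", "low": "allow", "none": "allow"}


def decide_action(risk_level, high_recall=False):
    """Return 'block', 'review', or 'allow' for a risk level string."""
    level = risk_level.lower()
    # In high-recall mode, low-risk content is escalated instead of passed.
    if level == "low" and high_recall:
        return "review"
    # Unknown levels go to review as a cautious default (assumption).
    return ACTIONS.get(level, "review")
```

The same function can be applied to the overall RiskLevel or to each segment's RiskLevel, depending on how granular the enforcement needs to be.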
Billing
Voice Moderation 2.0 uses pay-as-you-go billing. Charges are calculated daily based on actual usage. There is no charge for days when the service is not used.
| Moderation type | Service values | Unit price |
|---|---|---|
| Standard Voice Moderation (voice_standard) | audio_multilingual_global, stream_multilingual_global | USD 9.0 per 1,000 minutes |
Fees apply only to requests that return HTTP 200. Requests that return other status codes are not charged.
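For budgeting, the unit price can be turned into a quick estimate. This is a rough sketch only: `estimate_cost` is a hypothetical helper, and it assumes a simple linear charge with no rounding of durations, which may differ from how usage is actually metered.

```python
from decimal import Decimal

UNIT_PRICE_USD = Decimal("9.0")  # per 1,000 minutes, from the table above
UNIT_MINUTES = Decimal(1000)


def estimate_cost(minutes):
    """Estimate the charge in USD for `minutes` of successfully moderated audio."""
    # Decimal avoids floating-point drift in currency arithmetic.
    return UNIT_PRICE_USD * Decimal(str(minutes)) / UNIT_MINUTES
```

Under these assumptions, 250 minutes of audio comes to 9.0 × 250 / 1,000 = USD 2.25.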
Prerequisites
Before you begin, make sure you have:
An Alibaba Cloud account with Voice Moderation 2.0 activated
A RAM user with the AliyunYundunGreenWebFullAccess policy attached
An AccessKey pair for the RAM user
Activate the service
Go to Activate Service to activate Voice Moderation 2.0.
Set up RAM permissions
Log in to the RAM console as a RAM administrator.
Create a RAM user. For details, see Create a RAM user.
Attach the AliyunYundunGreenWebFullAccess system policy to the RAM user. For details, see Grant permissions to a RAM user.
The RAM user can now call Content Moderation API operations.
Install the SDK
Use the software development kit (SDK) to call the API without constructing raw HTTP requests. For SDK sample code in languages not listed below, use the OpenAPI Developer Portal to auto-generate code for the VoiceModeration operation.
For SDK installation and integration details, see Voice Moderation 2.0 SDKs and integration guide.
API reference
The service endpoint pattern is https://green-cip.{region}.aliyuncs.com.
Two API operations are available:
VoiceModeration — Submit a moderation task
VoiceModerationResult — Query a moderation task result
For information about constructing raw HTTP requests, see Make a raw HTTP call.
Submit a moderation task
Operation: VoiceModeration
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| Service | String | Yes | The moderation service type. Valid values: audio_multilingual_global (audio and video files), stream_multilingual_global (live streams) |
| ServiceParameters | JSON string | Yes | The service parameters. See the table below. |
ServiceParameters
| Parameter | Type | Required | Example | Description |
|---|---|---|---|---|
| url | String | Yes | http://aliyundoc.com/test.flv | Public HTTP or HTTPS URL of the audio or video to analyze |
| callback | String | No | http://aliyundoc.com | URL to receive the moderation result as a push notification. Must support HTTP POST, UTF-8 encoding, and the form fields checksum and content. If omitted, poll for results using VoiceModerationResult. |
| seed | String | No | abc**** | A random string (letters, digits, underscores; max 64 characters) used to generate the callback checksum. Required when using a callback. |
| cryptType | String | No | SHA256 | Hash algorithm used to generate the callback checksum. Valid values: SHA256 (default), SM3 |
| liveId | String | No | liveId1**** | A live stream identifier used to deduplicate tasks. If you pass this, the service checks for an active task matching uid+service+liveId and returns the existing task ID instead of creating a new one. |
| dataId | String | No | voice20240307*** | A custom identifier for the submitted content (alphanumeric, _, -, .; max 64 characters) |
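Note that ServiceParameters is a JSON string, not a nested object, so the parameters must be serialized before they are placed in the request. The sketch below shows one way to assemble the two request fields. `build_submit_request` is a hypothetical helper that only builds the payload; signing and sending the request are left to the SDK or your HTTP client. The field names follow the request parameters table.

```python
import json


def build_submit_request(url, service="audio_multilingual_global",
                         callback=None, seed=None, crypt_type=None,
                         live_id=None, data_id=None):
    """Build the Service/ServiceParameters pair for a VoiceModeration call."""
    params = {"url": url}
    if callback is not None:
        # The parameter table states that seed is required with a callback.
        if not seed:
            raise ValueError("seed is required when a callback is set")
        params["callback"] = callback
        params["seed"] = seed
        if crypt_type is not None:
            params["cryptType"] = crypt_type
    if live_id is not None:
        params["liveId"] = live_id
    if data_id is not None:
        params["dataId"] = data_id
    # ServiceParameters is sent as a serialized JSON string.
    return {"Service": service, "ServiceParameters": json.dumps(params)}
```

Letting `json.dumps` do the serialization avoids the hand-escaping mistakes that are easy to make when writing the nested string manually.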
Response parameters
| Parameter | Type | Example | Description |
|---|---|---|---|
| Code | Integer | 200 | HTTP status code |
| Data | JSON object | {"taskId": "AAAAA-BBBBB"} | Contains the task ID |
| Message | String | OK | Response message |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID |
Example
Request:
{
  "service": "audio_multilingual_global",
  "serviceParameters": "{\"cryptType\":\"SHA256\",\"seed\":\"abc***123\",\"callback\":\"https://aliyun.com/callback\",\"url\":\"http://aliyundoc.com/test.flv\"}"
}
Response:
{
  "code": 200,
  "data": {
    "taskId": "AAAAA-BBBBB"
  },
  "message": "SUCCESS",
  "requestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Query a moderation task result
Operation: VoiceModerationResult
After the task completes, the response includes results for all voice segments—not just flagged ones.
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| Service | String | Yes | The moderation service type (same value used when submitting the task) |
| ServiceParameters | JSON string | Yes | See the table below |
ServiceParameters
| Parameter | Type | Required | Example | Description |
|---|---|---|---|---|
| taskId | String | Yes | AAAAA-BBBBB | The task ID returned by VoiceModeration |
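When no callback is configured, results are retrieved by polling with the task ID. The loop below is a minimal sketch of that pattern. The `query` argument is a caller-supplied function (hypothetical) that sends a VoiceModerationResult request and returns the parsed response as a dict. The sketch assumes that code 280 means the task is still in progress and that any other non-200 code is final. The interval and attempt limit are arbitrary defaults.

```python
import time


def wait_for_result(query, task_id, interval=5.0, max_attempts=60,
                    sleep=time.sleep):
    """Poll `query(task_id)` until the task finishes, then return its Data."""
    for _ in range(max_attempts):
        response = query(task_id)
        code = response.get("Code")
        if code == 200:
            return response["Data"]
        # Assumption: 280 indicates the task has not finished yet.
        if code != 280:
            raise RuntimeError(
                f"moderation failed with code {code}: {response.get('Message')}"
            )
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the function easy to test, and polling at a modest interval helps stay within the QPS limit.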
Response parameters
Top-level fields
| Parameter | Type | Example | Description |
|---|---|---|---|
| Code | Integer | 200 | HTTP status code |
| Data | JSON object | — | See the Data fields table below |
| Message | String | OK | Response message |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID |
Data fields
| Field | Type | Description |
|---|---|---|
| url | String | URL of the analyzed content |
| LiveId | String | Live stream ID (if provided at submission) |
| DataId | String | Custom data ID (if provided at submission) |
| RiskLevel | String | Overall risk level across all segments: high, medium, low, or none |
| sliceDetails | Array | Per-segment results. See the sliceDetails fields table below. |
sliceDetails fields
| Field | Type | Description |
|---|---|---|
| startTime | Integer | Segment start time, in seconds |
| endTime | Integer | Segment end time, in seconds |
| startTimestamp | Integer | Segment start timestamp, in milliseconds |
| endTimestamp | Integer | Segment end timestamp, in milliseconds |
| text | String | Transcribed text for the segment |
| url | String | Temporary URL to the audio clip. Valid for 30 minutes; save it promptly. |
| labels | String | Comma-separated risk tags (e.g., profanity,contraband) |
| RiskLevel | String | Risk level for this segment: high, medium, low, or none |
| riskWords | String | Comma-separated words that triggered the risk tags |
| riskTips | String | Comma-separated sub-tags (e.g., contraband_Drugs) |
| extend | String | Reserved field |
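Since every segment is returned, not only the flagged ones, client code typically filters the list before routing anything to review. The sketch below does that. `flagged_segments` and `field` are hypothetical helpers. Because the field tables and the sample response in this document differ in capitalization (for example, sliceDetails versus SliceDetails), the lookup accepts either form. That tolerance is a defensive assumption, not documented behavior.

```python
def field(obj, name):
    """Read `name` from a dict, accepting a lower or upper first letter."""
    for key in (name, name[:1].upper() + name[1:], name[:1].lower() + name[1:]):
        if key in obj:
            return obj[key]
    return None


def flagged_segments(data, levels=("high", "medium")):
    """Return the segments of a query result whose risk level is in `levels`."""
    flagged = []
    for segment in field(data, "sliceDetails") or []:
        level = (field(segment, "riskLevel") or "none").lower()
        if level not in levels:
            continue
        labels = field(segment, "labels") or ""
        flagged.append({
            "start": field(segment, "startTime"),
            "end": field(segment, "endTime"),
            "level": level,
            "labels": [t for t in labels.split(",") if t],
            "text": field(segment, "text"),
            # The clip URL is temporary, so download it soon after receipt.
            "clip_url": field(segment, "url"),
        })
    return flagged
```

Widening `levels` to include `"low"` is one way to run the same filter in a high-recall configuration.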
Example
Request:
{
  "service": "audio_multilingual_global",
  "serviceParameters": "{\"taskId\":\"AAAAA-BBBBB\"}"
}
Response:
{
  "Code": 200,
  "Data": {
    "DataId": "voice20240307***",
    "LiveId": "liveId1****",
    "RiskLevel": "high",
    "SliceDetails": [
      {
        "StartTime": 0,
        "EndTime": 4065,
        "Labels": "political_content,xxxx",
        "RiskLevel": "high",
        "RiskTips": "contraband_ProhibitedGoods",
        "RiskWords": "Risk Word A",
        "Text": "AI Guardrails product test case",
        "Url": "https://aliyundoc.com"
      }
    ]
  },
  "Message": "OK",
  "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Callback notifications
If you set a callback URL when submitting a task, the service sends a POST request to that URL when the task completes. The request body contains the following form fields:
| Field | Type | Description |
|---|---|---|
| checksum | String | Verification string generated by applying the algorithm set in cryptType (SHA256 by default) to the concatenation of UID + seed + content. The UID is your Alibaba Cloud account ID (not a RAM user UID), found in the Alibaba Cloud Management Console. To verify the request, generate the same string on your server and compare it against checksum. |
| taskId | String | The task ID |
| content | String | JSON string containing the moderation result. Parse it into a JSON object. The structure matches the Data field in the query response. |
Return HTTP 200 to acknowledge receipt. Any other status code is treated as a failure, and the service retries up to 16 times. After 16 failed attempts, the push stops.
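The checksum comparison described above can be sketched as follows for the default SHA256 case. `verify_callback` is a hypothetical helper. The sketch assumes the checksum is the lowercase hexadecimal digest of the UTF-8 encoded concatenation, which this document does not state explicitly, so confirm the encoding against a real callback before relying on it. SM3 is not covered.

```python
import hashlib
import hmac
import json


def verify_callback(uid, seed, checksum, content):
    """Return True if `checksum` matches SHA256(uid + seed + content)."""
    # Assumption: the digest is hex-encoded and the input is UTF-8.
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"),
                               checksum.lower().encode("utf-8"))


def handle_callback(uid, seed, form):
    """Return (http_status, parsed_result) for a callback's form fields."""
    if not verify_callback(uid, seed, form["checksum"], form["content"]):
        return (400, None)
    # content is a JSON string with the same structure as Data.
    return (200, json.loads(form["content"]))
```

Responding with 200 only after the checksum verifies means forged requests are rejected. A non-200 response to a genuine callback triggers the retry behavior described above.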
Error codes
| Code | Description |
|---|---|
| 200 | Success |
| 280 | Verifying |
| 400 | Request parameters are empty |
| 401 | Request parameters are invalid |
| 402 | A parameter value exceeds the allowed length |
| 403 | Request exceeds the QPS limit |
| 404 | File download failed. Check the URL and retry. |
| 405 | File download timed out. The file may be inaccessible; check and retry. |
| 406 | File exceeds the size limit |
| 407 | File format is not supported |
| 408 | Insufficient permissions. The account may not have the service activated, may have an overdue balance, or may not be authorized. |
| 480 | Concurrent streams limit exceeded |
| 500 | System error |
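Client code usually needs to decide which of these codes are worth retrying. The grouping below is a judgment call based on the descriptions in the table, not documented retry guidance. The category names, the `classify` and `backoff_delay` helpers, and the treatment of 280 as "still in progress" are all assumptions.

```python
import random

# Assumed grouping, inferred from the descriptions in the table above.
RETRYABLE = {403, 480, 500}  # QPS limit, stream limit, system error
FIX_REQUEST = {400, 401, 402, 404, 405, 406, 407}  # request or file problem
FIX_ACCOUNT = {408}  # activation, billing, or authorization problem


def classify(code):
    """Map a response code to a coarse handling category."""
    if code == 200:
        return "success"
    if code == 280:
        return "pending"
    if code in RETRYABLE:
        return "retry"
    if code in FIX_REQUEST:
        return "fix_request"
    if code in FIX_ACCOUNT:
        return "fix_account"
    return "unknown"


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with full jitter: a random delay up to the cap."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

For codes in the `retry` group, sleeping for `backoff_delay(attempt)` seconds between attempts spreads retries out instead of hitting the QPS limit again immediately.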