How to use the Voice Moderation 2.0 multilingual service - AI Guardrails

Voice Moderation 2.0 detects harmful or policy-violating content in audio and video streams across 35 languages. It is designed for international platforms—social entertainment apps, user-generated content (UGC) sites, and online gaming—where content arrives in multiple languages and requires fast, accurate triage.

How it works

Submit an audio or video URL to start an asynchronous moderation task. The service splits the audio into fixed-length segments, runs multiple parallel detection models against each segment, and returns:

The transcribed text for every segment
One or more risk tags (e.g., profanity, contraband) per segment
A per-segment risk level (high, medium, low, or none)
A temporary URL to the audio clip for manual review

Retrieve results by polling the query API or by configuring a callback URL to receive push notifications when the task completes.

What's new in 2.0

Feature	Voice Moderation 2.0	Voice Moderation 1.0
Languages	35 languages (Singapore); Chinese, English, and Chinese-English mix (US Virginia)	Chinese only
Detection models	Multiple parallel models with language- and region-specific tuning; includes moaning detection	Single model; no moaning detection
Tag system	International tags with multiple simultaneous risk tags and sub-tags	Single risk tag per result
Audio segmentation	Fixed-length, adjustable segments; all segments returned	Semantic sentence-based segments (a few seconds to tens of seconds); only flagged segments returned
Manual review support	Returns transcribed text and temporary segment URLs for every segment	No temporary URLs by default
Max file size	500 MB	200 MB
QPS	100	50
Concurrent streams	50	20

In Voice Moderation, queries per second (QPS) refers to the number of API requests processed per second. Concurrent streams refers to the number of audio files or live streams being analyzed simultaneously.

Supported regions and languages

Region	Supported languages	Internet endpoint	VPC endpoint
Singapore	35 languages, including Chinese, English, Arabic, German, Russian, French, Korean, Japanese, Spanish, Italian, Indonesian, Vietnamese, Malay, Thai, Hindi, Turkish, Portuguese, Dutch, Polish, Bengali, Persian, Swedish, Danish, Norwegian, Icelandic, Finnish, Belarusian, Lithuanian, Czech, Slovak, Hungarian, Greek, Romanian	`https://green-cip.ap-southeast-1.aliyuncs.com`	`https://green-cip-vpc.ap-southeast-1.aliyuncs.com`
US (Virginia)	Chinese, English, Chinese-English mix	`https://green-cip.us-east-1.aliyuncs.com`	`https://green-cip-vpc.us-east-1.aliyuncs.com`

For 35-language detection, use the Singapore region. The US (Virginia) region supports Chinese and English only.
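The region rule above can be sketched as a small lookup. The endpoints are copied from the table; the dictionary keys, the function name `pick_endpoint`, and the `prefer_us` flag are illustrative assumptions, not part of the service.

```python
# Endpoints copied from the region table; the keys are illustrative.
ENDPOINTS = {
    "singapore": {
        "internet": "https://green-cip.ap-southeast-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.ap-southeast-1.aliyuncs.com",
    },
    "us-virginia": {
        "internet": "https://green-cip.us-east-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.us-east-1.aliyuncs.com",
    },
}

# Languages listed for US (Virginia); anything else needs Singapore.
US_VIRGINIA_LANGUAGES = {"chinese", "english"}


def pick_endpoint(languages, use_vpc=False, prefer_us=False):
    """Return an endpoint whose region covers every requested language."""
    wanted = {lang.strip().lower() for lang in languages}
    region = "singapore"
    if prefer_us and wanted <= US_VIRGINIA_LANGUAGES:
        region = "us-virginia"
    return ENDPOINTS[region]["vpc" if use_vpc else "internet"]
```

Singapore is the default here because it covers the full language list. US (Virginia) is chosen only when requested and when every language is Chinese or English.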

Supported formats

Audio files: MP3, WAV, AAC, WMA, OGG, M4A, AMR (up to 500 MB)

Video files: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, RM (up to 500 MB)

Live stream protocols: RTMP, HLS, HTTP-FLV, RTSP

Risk tags

When content contains multiple risk types, the service returns all matching tags simultaneously. Tags are returned in the labels field; sub-tags in the riskTips field.

Primary tags

Tag value	Category	What it detects
`violence`	Violence	Threats, incitement to physical harm, or terroristic content
`contraband`	Contraband	Discussion or promotion of illegal goods (e.g., drugs, weapons)
`Sexual Content`	Pornography	Sexually explicit or pornographic content
`profanity`	Profanity	Impolite, vulgar, or offensive language
`pullinTraffic`	Ad-driven traffic	Spam or unsolicited promotional content designed to redirect users
`regional`	Regional conflict	Politically sensitive content related to regional disputes
`C_customized`	Custom library match	Content that matches terms in your user-defined library

Sub-tags

Sub-tags provide a finer-grained classification within a primary tag. Each sub-tag has the form xxx_yyy, where xxx is the primary tag and yyy names the sub-category. For example, contraband_Drugs is a sub-tag of contraband.
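As a rough illustration, the comma-separated labels and riskTips strings could be grouped by primary tag as follows. The helper names and the "unmatched" bucket are hypothetical, and the prefix matching is an assumption based on the xxx_yyy format described above.

```python
def split_tags(value):
    """Split a comma-separated tag string, dropping empty entries."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def group_sub_tags(labels, risk_tips):
    """Map each primary tag to the sub-tags that start with '<tag>_'."""
    primary = split_tags(labels)
    grouped = {tag: [] for tag in primary}
    for sub in split_tags(risk_tips):
        # Prefer the longest matching primary tag, because a primary tag
        # such as C_customized already contains an underscore.
        matches = [tag for tag in primary if sub.startswith(tag + "_")]
        if matches:
            grouped[max(matches, key=len)].append(sub)
        else:
            # Assumption: keep sub-tags with no matching primary tag visible.
            grouped.setdefault("unmatched", []).append(sub)
    return grouped
```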

Risk level handling

Risk level	Recommended action
`high`	Block or remove content immediately
`medium`	Route to human review queue
`low`	Act only when high recall is required; otherwise treat as clean
`none`	No risk detected
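One way to encode the table above is a small dispatch function. The action names ("block", "review", "pass") and the `high_recall` flag are illustrative. Sending low-risk content to review in high-recall mode is an assumption about what "act" means for your platform.

```python
def recommended_action(risk_level, high_recall=False):
    """Translate a risk level into an illustrative handling decision."""
    level = (risk_level or "").lower()
    if level == "high":
        return "block"
    if level == "medium":
        return "review"
    if level == "low":
        # The table suggests acting on low risk only when recall matters.
        return "review" if high_recall else "pass"
    if level == "none":
        return "pass"
    # Assumption: unknown levels go to a human rather than passing silently.
    return "review"
```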

Billing

Voice Moderation 2.0 uses pay-as-you-go billing. Charges are calculated daily based on actual usage. There is no charge for days when the service is not used.

Moderation type	Service values	Unit price
Standard Voice Moderation (`voice_standard`)	`audio_multilingual_global`, `stream_multilingual_global`	USD 9.0 per 1,000 minutes

Fees apply only to requests that return HTTP 200. Requests that return other status codes are not charged.
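For budgeting, the unit price above translates into a simple estimate. This sketch assumes that billing is linear in minutes. How partial minutes are rounded is not stated here, so treat the result as an approximation.

```python
from decimal import Decimal

# Unit price from the billing table: USD 9.0 per 1,000 minutes.
USD_PER_1000_MINUTES = Decimal("9.0")


def estimate_cost_usd(total_minutes):
    """Estimate the charge for a number of successfully moderated minutes."""
    minutes = Decimal(str(total_minutes))
    cost = minutes * USD_PER_1000_MINUTES / Decimal(1000)
    return cost.quantize(Decimal("0.01"))
```

For example, `estimate_cost_usd(2500)` corresponds to 2.5 units of 1,000 minutes at USD 9.0 each.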

Prerequisites

Before you begin, make sure you have:

An Alibaba Cloud account with Voice Moderation 2.0 activated
A RAM user with the AliyunYundunGreenWebFullAccess policy attached
An AccessKey pair for the RAM user

Activate the service

Go to Activate Service to activate Voice Moderation 2.0.

Set up RAM permissions

Log in to the RAM console as a RAM administrator.
Create a RAM user. For details, see Create a RAM user.
Attach the AliyunYundunGreenWebFullAccess system policy to the RAM user. For details, see Grant permissions to a RAM user.

The RAM user can now call Content Moderation API operations.

Install the SDK

Use the software development kit (SDK) to call the API without constructing raw HTTP requests. For languages not covered in the SDK guide, use the OpenAPI Developer Portal to auto-generate sample code for the VoiceModeration operation.

For SDK installation and integration details, see Voice Moderation 2.0 SDKs and integration guide.

API reference

The service endpoint pattern is https://green-cip.{region}.aliyuncs.com.

Two API operations are available:

VoiceModeration — Submit a moderation task
VoiceModerationResult — Query a moderation task result

For information about constructing raw HTTP requests, see Make a raw HTTP call.

Submit a moderation task

Operation: VoiceModeration

Request parameters

Parameter	Type	Required	Description
`Service`	String	Yes	The moderation service type. Valid values: `audio_multilingual_global` (audio and video files), `stream_multilingual_global` (live streams)
`ServiceParameters`	JSON string	Yes	The service parameters. See the table below.

ServiceParameters

Parameter	Type	Required	Example	Description
`url`	String	Yes	`http://aliyundoc.com/test.flv`	Public HTTP or HTTPS URL of the audio or video to analyze
`callback`	String	No	`http://aliyundoc.com`	URL to receive the moderation result as a push notification. Must support HTTP POST, UTF-8 encoding, and the form fields `checksum` and `content`. If omitted, poll for results using `VoiceModerationResult`.
`seed`	String	No	`abc****`	A random string (letters, digits, underscores; max 64 characters) used to generate the callback `checksum`. Required when using a callback.
`cryptType`	String	No	`SHA256`	Hash algorithm used to generate the callback checksum. Valid values: `SHA256` (default), `SM3`
`liveId`	String	No	`liveId1****`	A live stream identifier used to deduplicate tasks. If you pass this, the service checks for an active task matching `uid+service+liveId` and returns the existing task ID instead of creating a new one.
`dataId`	String	No	`voice20240307***`	A custom identifier for the submitted content (alphanumeric, `_`, `-`, `.`; max 64 characters)
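The constraints in this table can be checked on the client before a task is submitted. The following is a minimal sketch. The function name is hypothetical, and reading "letters" and "alphanumeric" as ASCII-only is an assumption.

```python
import re

# Patterns derived from the table; ASCII-only matching is an assumption.
SEED_RE = re.compile(r"[A-Za-z0-9_]{1,64}")
DATA_ID_RE = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def validate_service_parameters(params):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not params.get("url", "").startswith(("http://", "https://")):
        problems.append("url must be an HTTP or HTTPS URL")
    if "callback" in params and not params.get("seed"):
        problems.append("seed is required when callback is set")
    seed = params.get("seed")
    if seed and not SEED_RE.fullmatch(seed):
        problems.append("seed must be 1-64 letters, digits, or underscores")
    data_id = params.get("dataId")
    if data_id is not None and not DATA_ID_RE.fullmatch(data_id):
        problems.append("dataId must be 1-64 characters from [A-Za-z0-9_.-]")
    if params.get("cryptType", "SHA256") not in ("SHA256", "SM3"):
        problems.append("cryptType must be SHA256 or SM3")
    return problems
```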

Response parameters

Parameter	Type	Example	Description
`Code`	Integer	`200`	HTTP status code
`Data`	JSON object	`{"taskId": "AAAAA-BBBBB"}`	Contains the task ID
`Message`	String	`OK`	Response message
`RequestId`	String	`AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****`	The request ID

Example

Request:

{
  "service":"audio_multilingual_global",
  "serviceParameters":"{\"cryptType\":\"SHA256\",\"seed\":\"abc***123\",\"callback\":\"https://aliyun.com/callback\",\"url\":\"http://aliyundoc.com/test.flv\"}"
}

Response:

{
  "code": 200,
  "data": {
    "taskId": "AAAAA-BBBBB"
  },
  "message": "SUCCESS",
  "requestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
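Because serviceParameters is a JSON string nested inside the outer JSON body, every inner quote must be escaped. Letting a JSON library serialize the inner object avoids hand-written escapes. In this sketch the function name is hypothetical, and the lowercase key names follow the request example rather than the parameter table.

```python
import json


def build_submit_request(url, service="audio_multilingual_global", **optional):
    """Build a VoiceModeration request body with a serialized inner object."""
    service_parameters = {"url": url}
    # Skip optional fields that were not supplied.
    service_parameters.update(
        {key: value for key, value in optional.items() if value is not None}
    )
    return {
        "service": service,
        # json.dumps produces the inner JSON string; serializing the outer
        # body later adds the backslash escapes automatically.
        "serviceParameters": json.dumps(service_parameters),
    }
```

Passing `callback="https://aliyun.com/callback"` and `seed="abc123"` as keyword arguments yields a body equivalent to the callback request shown above.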

Query a moderation task result

Operation: VoiceModerationResult

After the task completes, the response includes results for all voice segments—not just flagged ones.

Request parameters

Parameter	Type	Required	Description
`Service`	String	Yes	The moderation service type (same value used when submitting the task)
`ServiceParameters`	JSON string	Yes	See the table below

ServiceParameters

Parameter	Type	Required	Example	Description
`taskId`	String	Yes	`AAAAA-BBBBB`	The task ID returned by `VoiceModeration`
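If you poll instead of using a callback, a loop like the one below is one possible shape. `query_fn` is a hypothetical callable that wraps your SDK or HTTP call to VoiceModerationResult and returns the parsed response as a dictionary. Treating code 280 (listed under Error codes) as "not finished yet" is an interpretation, and the interval and attempt limits are arbitrary.

```python
import time


def wait_for_result(query_fn, task_id, interval_seconds=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Poll query_fn(task_id) until the task finishes or attempts run out."""
    for _ in range(max_attempts):
        response = query_fn(task_id)
        # Accept either key casing, since the examples show both.
        code = response.get("Code", response.get("code"))
        if code == 200:
            return response
        if code != 280:
            raise RuntimeError(f"task {task_id} failed with code {code}")
        sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

The `sleep` parameter is injectable so the loop can be tested without waiting.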

Response parameters

Top-level fields

Parameter	Type	Example	Description
`Code`	Integer	`200`	HTTP status code
`Data`	JSON object	—	See the Data fields table below
`Message`	String	`OK`	Response message
`RequestId`	String	`AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****`	The request ID

Data fields

Field	Type	Description
`url`	String	URL of the analyzed content
`LiveId`	String	Live stream ID (if provided at submission)
`DataId`	String	Custom data ID (if provided at submission)
`RiskLevel`	String	Overall risk level across all segments: `high`, `medium`, `low`, or `none`
`sliceDetails`	Array	Per-segment results. See the sliceDetails fields table below.

sliceDetails fields

Field	Type	Description
`startTime`	Integer	Segment start time, in seconds
`endTime`	Integer	Segment end time, in seconds
`startTimestamp`	Integer	Segment start timestamp, in milliseconds
`endTimestamp`	Integer	Segment end timestamp, in milliseconds
`text`	String	Transcribed text for the segment
`url`	String	Temporary URL to the audio clip. Valid for 30 minutes—save it promptly.
`labels`	String	Comma-separated risk tags (e.g., `profanity,contraband`)
`RiskLevel`	String	Risk level for this segment: `high`, `medium`, `low`, or `none`
`riskWords`	String	Comma-separated words that triggered the risk tags
`riskTips`	String	Comma-separated sub-tags (e.g., `contraband_Drugs`)
`extend`	String	Reserved field

Example

Request:

{
  "service":"audio_multilingual_global",
  "serviceParameters":"{\"taskId\":\"AAAAA-BBBBB\"}"
}

Response:

{
  "Code": 200,
  "Data": {
    "DataId": "voice20240307***",
    "LiveId": "liveId1****",
    "RiskLevel": "high",
    "SliceDetails": [
      {
        "StartTime": 0,
        "EndTime": 4065,
        "Labels": "political_content,xxxx",
        "RiskLevel": "high",
        "RiskTips": "contraband_ProhibitedGoods",
        "RiskWords": "Risk Word A",
        "Text": "AI Guardrails product test case",
        "Url": "https://aliyundoc.com"
      }
    ]
  },
  "Message": "OK",
  "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
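Since every segment is returned, most integrations filter for the segments that need attention. The sketch below reads the PascalCase keys used in the example response; the function name and the output dictionary layout are illustrative.

```python
def flagged_segments(response, levels=("high", "medium")):
    """Collect segments whose risk level is in `levels`."""
    data = response.get("Data") or {}
    # Assumption: fall back to the camelCase key used in the field table.
    slices = data.get("SliceDetails") or data.get("sliceDetails") or []
    flagged = []
    for item in slices:
        level = item.get("RiskLevel", "none")
        if level not in levels:
            continue
        flagged.append({
            "start": item.get("StartTime"),
            "end": item.get("EndTime"),
            "level": level,
            "labels": [t for t in item.get("Labels", "").split(",") if t],
            "text": item.get("Text", ""),
            # The clip URL is temporary, so store the audio if you need it later.
            "clip_url": item.get("Url"),
        })
    return flagged
```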

Callback notifications

If you set a callback URL when submitting a task, the service sends a POST request to that URL when the task completes. The request body contains the following form fields:

Field	Type	Description
`checksum`	String	Verification string generated by applying SHA256 (or the algorithm set in `cryptType`) to the concatenation of `user UID + seed + content`. The user UID is your Alibaba Cloud account ID (not a RAM user UID), found in the Alibaba Cloud Management Console. To verify the request, generate the same string on your server and compare it against `checksum`.
`taskId`	String	The task ID
`content`	String	JSON string containing the moderation result. Parse it into a JSON object. The structure matches the `Data` field in the query response.

Return HTTP 200 to acknowledge receipt. Any other status code is treated as a failure, and the service retries up to 16 times. After 16 failed attempts, the push stops.
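The checksum comparison described above can be sketched as follows, covering only the SHA256 case. It assumes the checksum is sent as a hexadecimal digest and that the concatenation is UTF-8 encoded, neither of which is confirmed here. The function names and the choice of 400 as the rejection status are illustrative.

```python
import hashlib
import hmac
import json


def verify_callback(uid, seed, content, checksum):
    """Recompute SHA256(uid + seed + content) and compare with checksum."""
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Constant-time comparison; hex digest encoding is an assumption.
    return hmac.compare_digest(expected.encode("ascii"),
                               checksum.lower().encode("utf-8"))


def handle_callback(form, uid, seed):
    """Return (http_status, parsed_result) for a decoded form body."""
    content = form.get("content", "")
    if not verify_callback(uid, seed, content, form.get("checksum", "")):
        return 400, None
    return 200, json.loads(content)
```

Because any status other than 200 triggers a retry, a rejected request may be delivered again. Log rejections so that repeated mismatches are easy to spot.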

Error codes

Code	Description
`200`	Success
`280`	Detection in progress; the task has not finished yet
`400`	Request parameters are empty
`401`	Request parameters are invalid
`402`	A parameter value exceeds the allowed length
`403`	Request exceeds the QPS limit
`404`	File download failed—check the URL and retry
`405`	File download timed out—the file may be inaccessible; check and retry
`406`	File exceeds the size limit
`407`	File format is not supported
`408`	Insufficient permissions—the account may not have the service activated, may have an overdue balance, or may not be authorized
`480`	Concurrent streams limit exceeded
`500`	System error
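Client code often groups these codes by how it should react. The grouping below is a judgment call based on the descriptions in the table, not an official classification, and the category names are illustrative.

```python
# Codes where waiting and retrying may help: rate or concurrency limits,
# download failures, and system errors.
RETRYABLE_CODES = {403, 404, 405, 480, 500}
# Assumption: 280 means the task is still running.
PENDING_CODES = {280}


def classify_code(code):
    """Return 'success', 'pending', 'retry', or 'fix_request'."""
    if code == 200:
        return "success"
    if code in PENDING_CODES:
        return "pending"
    if code in RETRYABLE_CODES:
        return "retry"
    # Remaining codes point at the request, the file, or account permissions.
    return "fix_request"
```

For 404 and 405, check that the URL is reachable before retrying, as the table advises.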