AI Guardrails: Voice Moderation 2.0 multilingual service

Last Updated: Mar 31, 2026

Voice Moderation 2.0 detects harmful or policy-violating content in audio and video streams across 35 languages. It is designed for international platforms—social entertainment apps, user-generated content (UGC) sites, and online gaming—where content arrives in multiple languages and requires fast, accurate triage.

How it works

Submit an audio or video URL to start an asynchronous moderation task. The service splits the audio into fixed-length segments, runs multiple parallel detection models against each segment, and returns:

  • The transcribed text for every segment

  • One or more risk tags (e.g., profanity, contraband) per segment

  • A per-segment risk level (high, medium, low, or none)

  • A temporary URL to the audio clip for manual review

Retrieve results by polling the query API or by configuring a callback URL to receive push notifications when the task completes.

What's new in 2.0

| Feature | Voice Moderation 2.0 | Voice Moderation 1.0 |
| --- | --- | --- |
| Languages | 35 languages (Singapore); Chinese, English, and Chinese-English mix (US Virginia) | Chinese only |
| Detection models | Multiple parallel models with language- and region-specific tuning; includes moaning detection | Single model; no moaning detection |
| Tag system | International tags with multiple simultaneous risk tags and sub-tags | Single risk tag per result |
| Audio segmentation | Fixed-length, adjustable segments; all segments returned | Semantic sentence-based segments (a few seconds to tens of seconds); only flagged segments returned |
| Manual review support | Returns transcribed text and temporary segment URLs for every segment | No temporary URLs by default |
| Max file size | 500 MB | 200 MB |
| QPS | 100 | 50 |
| Concurrent streams | 50 | 20 |

In Voice Moderation, queries per second (QPS) refers to the number of API requests processed per second. Concurrent streams refers to the number of audio files or live streams being analyzed simultaneously.

Supported regions and languages

| Region | Supported languages | Internet endpoint | VPC endpoint |
| --- | --- | --- | --- |
| Singapore | 35 languages: Chinese, English, Arabic, German, Russian, French, Korean, Japanese, Spanish, Italian, Indonesian, Vietnamese, Malay, Thai, Hindi, Turkish, Portuguese, Dutch, Polish, Bengali, Persian, Swedish, Danish, Norwegian, Icelandic, Finnish, Belarusian, Lithuanian, Czech, Slovak, Hungarian, Greek, Romanian | https://green-cip.ap-southeast-1.aliyuncs.com | https://green-cip-vpc.ap-southeast-1.aliyuncs.com |
| US (Virginia) | Chinese, English, Chinese-English mix | https://green-cip.us-east-1.aliyuncs.com | https://green-cip-vpc.us-east-1.aliyuncs.com |

For 35-language detection, use the Singapore region. The US (Virginia) region supports Chinese and English only.
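
As a rough illustration of the region table, the following Python sketch looks up an endpoint by region ID and network type. The helper name `endpoint_for` and the dictionary layout are illustrative assumptions, not part of any SDK; the URLs are the ones listed above.

```python
# Endpoints copied from the region table; keys are the region IDs embedded in the URLs.
ENDPOINTS = {
    "ap-southeast-1": {  # Singapore: 35 languages
        "internet": "https://green-cip.ap-southeast-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.ap-southeast-1.aliyuncs.com",
    },
    "us-east-1": {  # US (Virginia): Chinese, English, Chinese-English mix
        "internet": "https://green-cip.us-east-1.aliyuncs.com",
        "vpc": "https://green-cip-vpc.us-east-1.aliyuncs.com",
    },
}


def endpoint_for(region: str, vpc: bool = False) -> str:
    """Return the endpoint for a region (hypothetical helper, not an SDK call)."""
    try:
        return ENDPOINTS[region]["vpc" if vpc else "internet"]
    except KeyError:
        raise ValueError(f"No Voice Moderation 2.0 endpoint listed for {region!r}") from None


print(endpoint_for("ap-southeast-1"))
print(endpoint_for("us-east-1", vpc=True))
```

Failing loudly on an unlisted region keeps a typo from silently routing multilingual traffic to a region that supports only Chinese and English.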

Supported formats

Audio files: MP3, WAV, AAC, WMA, OGG, M4A, AMR (up to 500 MB)

Video files: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, RM (up to 500 MB)

Live stream protocols: RTMP, HLS, HTTP-FLV, RTSP
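
A client-side pre-check can catch obviously unsupported inputs before a task is submitted. The sketch below is only an approximation: it inspects the URL scheme and file extension, so it cannot confirm the real container format or the 500 MB size limit, and it cannot tell an HLS or HTTP-FLV live stream from a file served over HTTP. The function name `classify_media` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions and protocols copied from the lists above.
AUDIO_EXTENSIONS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "amr"}
VIDEO_EXTENSIONS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}
STREAM_SCHEMES = {"rtmp", "rtsp"}  # HLS and HTTP-FLV use http(s) and are not detected here


def classify_media(url: str) -> str:
    """Return "stream", "audio", "video", or "unsupported" for a media URL."""
    parsed = urlparse(url)
    if parsed.scheme.lower() in STREAM_SCHEMES:
        return "stream"
    extension = PurePosixPath(parsed.path).suffix.lower().lstrip(".")
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"


print(classify_media("http://aliyundoc.com/test.flv"))
```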

Risk tags

When content contains multiple risk types, the service returns all matching tags simultaneously. Tags are returned in the labels field; sub-tags in the riskTips field.

Primary tags

| Tag value | Category | What it detects |
| --- | --- | --- |
| violence | Violence | Threats, incitement to physical harm, or terroristic content |
| contraband | Contraband | Discussion or promotion of illegal goods (e.g., drugs, weapons) |
| Sexual Content | Pornography | Sexually explicit or pornographic content |
| profanity | Profanity | Impolite, vulgar, or offensive language |
| pullinTraffic | Ad-driven traffic | Spam or unsolicited promotional content designed to redirect users |
| regional | Regional conflict | Politically sensitive content related to regional disputes |
| C_customized | Custom library match | Content that matches terms in your user-defined library |

Sub-tags

Sub-tags provide a finer-grained classification within a primary tag. Each sub-tag is returned as the primary tag value, an underscore, and the sub-tag name. For example, contraband_Drugs is a sub-tag of contraband.
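
To show how the two comma-separated fields relate, here is a minimal Python sketch that groups sub-tags under their primary tags. It assumes that a sub-tag always starts with its primary tag followed by an underscore, as in the contraband_Drugs example; the function name `group_sub_tags` is hypothetical.

```python
def group_sub_tags(labels: str, risk_tips: str) -> dict:
    """Map each primary tag in `labels` to the sub-tags found in `risk_tips`."""
    grouped = {tag.strip(): [] for tag in labels.split(",") if tag.strip()}
    for tip in (t.strip() for t in risk_tips.split(",")):
        if not tip:
            continue
        # Prefer the longest matching primary tag, because a tag such as
        # C_customized contains an underscore itself.
        primary = max(
            (p for p in grouped if tip.startswith(p + "_")), key=len, default=None
        )
        if primary is None:
            # Sub-tag without a listed primary tag: fall back to the text
            # before the first underscore (an assumption of this sketch).
            primary = tip.split("_", 1)[0]
            grouped.setdefault(primary, [])
        grouped[primary].append(tip)
    return grouped


print(group_sub_tags("profanity,contraband", "contraband_Drugs"))
```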

Risk level handling

| Risk level | Recommended action |
| --- | --- |
| high | Block or remove content immediately |
| medium | Route to human review queue |
| low | Act only when high recall is required; otherwise treat as clean |
| none | No risk detected |
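
The recommendations above could be encoded as a small dispatch function. This is a sketch under stated assumptions: the action names ("block", "review", "allow") are illustrative, and sending low-risk content to review when high recall is required is one possible reading of "act", not something the service prescribes.

```python
# Default action for each risk level, following the table above.
DEFAULT_ACTIONS = {
    "high": "block",
    "medium": "review",
    "low": "allow",
    "none": "allow",
}


def action_for(risk_level: str, high_recall: bool = False) -> str:
    """Return an illustrative action name for a risk level."""
    level = risk_level.strip().lower()
    if level not in DEFAULT_ACTIONS:
        raise ValueError(f"Unknown risk level: {risk_level!r}")
    if level == "low" and high_recall:
        return "review"  # assumption: high-recall setups send low-risk content to reviewers
    return DEFAULT_ACTIONS[level]


for level in ("high", "medium", "low", "none"):
    print(level, "->", action_for(level))
```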

Billing

Voice Moderation 2.0 uses pay-as-you-go billing. Charges are calculated daily based on actual usage. There is no charge for days when the service is not used.

| Moderation type | Service values | Unit price |
| --- | --- | --- |
| Standard Voice Moderation (voice_standard) | audio_multilingual_global, stream_multilingual_global | USD 9.0 per 1,000 minutes |

Fees apply only to requests that return HTTP 200. Requests that return other status codes are not charged.
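
For budgeting, the unit price translates into a simple proportional estimate. The Python sketch below assumes that billed duration is not rounded up per request or per day, which this page does not specify, so treat the result as an approximation rather than an invoice amount.

```python
from decimal import Decimal

PRICE_PER_1000_MINUTES = Decimal("9.0")  # USD, from the billing table


def estimate_cost(minutes) -> Decimal:
    """Estimate the charge in USD for a number of moderated minutes."""
    cost = Decimal(str(minutes)) * PRICE_PER_1000_MINUTES / Decimal(1000)
    return cost.quantize(Decimal("0.01"))


# 25,000 minutes in a day: 25,000 / 1,000 * 9.0 = USD 225.00
print(estimate_cost(25000))
```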

Prerequisites

Before you begin, make sure you have:

  • An Alibaba Cloud account with Voice Moderation 2.0 activated

  • A RAM user with the AliyunYundunGreenWebFullAccess policy attached

  • An AccessKey pair for the RAM user

Activate the service

Go to Activate Service to activate Voice Moderation 2.0.

Set up RAM permissions

  1. Log in to the RAM console as a RAM administrator.

  2. Create a RAM user. For details, see Create a RAM user.

  3. Attach the AliyunYundunGreenWebFullAccess system policy to the RAM user. For details, see Grant permissions to a RAM user.

The RAM user can now call Content Moderation API operations.

Install the SDK

Use the software development kit (SDK) to call the API without constructing raw HTTP requests. For sample code in a language without a published SDK example, use the OpenAPI Developer Portal to auto-generate code for the VoiceModeration operation.

For SDK installation and integration details, see Voice Moderation 2.0 SDKs and integration guide.

API reference

The service endpoint pattern is https://green-cip.{region}.aliyuncs.com.

Two API operations are available:

  • VoiceModeration — Submit a moderation task

  • VoiceModerationResult — Query a moderation task result

For information about constructing raw HTTP requests, see Make a raw HTTP call.

Submit a moderation task

Operation: VoiceModeration

Request parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Service | String | Yes | The moderation service type. Valid values: audio_multilingual_global (audio and video files), stream_multilingual_global (live streams) |
| ServiceParameters | JSON string | Yes | The service parameters. See the table below. |

ServiceParameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| url | String | Yes | http://aliyundoc.com/test.flv | Public HTTP or HTTPS URL of the audio or video to analyze |
| callback | String | No | http://aliyundoc.com | URL to receive the moderation result as a push notification. Must support HTTP POST, UTF-8 encoding, and the form fields checksum and content. If omitted, poll for results using VoiceModerationResult. |
| seed | String | No | abc**** | A random string (letters, digits, underscores; max 64 characters) used to generate the callback checksum. Required when using a callback. |
| cryptType | String | No | SHA256 | Hash algorithm for the callback checksum. Valid values: SHA256 (default), SM3 |
| liveId | String | No | liveId1**** | A live stream identifier used to deduplicate tasks. If you pass this, the service checks for an active task matching uid + service + liveId and returns the existing task ID instead of creating a new one. |
| dataId | String | No | voice20240307*** | A custom identifier for the submitted content (alphanumeric, _, -, .; max 64 characters) |

Response parameters

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | HTTP status code |
| Data | JSON object | {"taskId": "AAAAA-BBBBB"} | Contains the task ID |
| Message | String | OK | Response message |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID |

Example

Request:

{
  "service": "audio_multilingual_global",
  "serviceParameters": "{\"cryptType\":\"SHA256\",\"seed\":\"abc***123\",\"callback\":\"https://aliyun.com/callback\",\"url\":\"http://aliyundoc.com/test.flv\"}"
}

Response:

{
  "code": 200,
  "data": {
    "taskId": "AAAAA-BBBBB"
  },
  "message": "SUCCESS",
  "requestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
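
Because ServiceParameters is a JSON string rather than a nested object, it is easy to get the escaping wrong when writing the request by hand. The following Python sketch builds the request body with `json.dumps` so that quoting is handled automatically. The function name `build_submit_request` is hypothetical, the lowercase keys follow the request example above, and sending and signing the request is left to the SDK.

```python
import json


def build_submit_request(url, service="audio_multilingual_global",
                         callback=None, seed=None, crypt_type="SHA256",
                         data_id=None, live_id=None):
    """Build the VoiceModeration request body as a dict (hypothetical helper)."""
    if callback and not seed:
        raise ValueError("seed is required when callback is set")
    params = {"url": url}
    if callback:
        params.update(callback=callback, seed=seed, cryptType=crypt_type)
    if data_id:
        params["dataId"] = data_id
    if live_id:
        params["liveId"] = live_id
    # serviceParameters must be a JSON-encoded string, not a nested object.
    return {"service": service, "serviceParameters": json.dumps(params)}


request = build_submit_request(
    "http://aliyundoc.com/test.flv",
    callback="https://aliyun.com/callback",
    seed="abc123",  # placeholder seed for illustration
)
print(json.dumps(request, indent=2))
```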

Query a moderation task result

Operation: VoiceModerationResult

After the task completes, the response includes results for all voice segments—not just flagged ones.

Request parameters

ParameterTypeRequiredDescription
ServiceStringYesThe moderation service type (same value used when submitting the task)
ServiceParametersJSON stringYesSee the table below

ServiceParameters

ParameterTypeRequiredExampleDescription
taskIdStringYesAAAAA-BBBBBThe task ID returned by VoiceModeration

Response parameters

Top-level fields

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | HTTP status code |
| Data | JSON object | | See the Data fields table below |
| Message | String | OK | Response message |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID |

Data fields

| Field | Type | Description |
| --- | --- | --- |
| url | String | URL of the analyzed content |
| LiveId | String | Live stream ID (if provided at submission) |
| DataId | String | Custom data ID (if provided at submission) |
| RiskLevel | String | Overall risk level across all segments: high, medium, low, or none |
| sliceDetails | Array | Per-segment results. See the sliceDetails fields table below. |

sliceDetails fields

| Field | Type | Description |
| --- | --- | --- |
| startTime | Integer | Segment start time, in seconds |
| endTime | Integer | Segment end time, in seconds |
| startTimestamp | Integer | Segment start timestamp, in milliseconds |
| endTimestamp | Integer | Segment end timestamp, in milliseconds |
| text | String | Transcribed text for the segment |
| url | String | Temporary URL to the audio clip. Valid for 30 minutes; save it promptly. |
| labels | String | Comma-separated risk tags (e.g., profanity,contraband) |
| RiskLevel | String | Risk level for this segment: high, medium, low, or none |
| riskWords | String | Comma-separated words that triggered the risk tags |
| riskTips | String | Comma-separated sub-tags (e.g., contraband_Drugs) |
| extend | String | Reserved field |

Example

Request:

{
  "service": "audio_multilingual_global",
  "serviceParameters": "{\"taskId\":\"AAAAA-BBBBB\"}"
}

Response:

{
  "Code": 200,
  "Data": {
    "DataId": "voice20240307***",
    "LiveId": "liveId1****",
    "RiskLevel": "high",
    "SliceDetails": [
      {
        "StartTime": 0,
        "EndTime": 4065,
        "Labels": "political_content,xxxx",
        "RiskLevel": "high",
        "RiskTips": "contraband_ProhibitedGoods",
        "RiskWords": "Risk Word A",
        "Text": "AI Guardrails product test case",
        "Url": "https://aliyundoc.com"
      }
    ]
  },
  "Message": "OK",
  "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
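
Since the response contains every segment, a consumer usually filters for the ones that need attention. The Python sketch below extracts segments whose risk level is not none. It assumes the capitalized key names used in the response example above (the field tables use a different casing, so adjust the keys to what your client actually receives); the function name `flagged_segments` is hypothetical.

```python
def flagged_segments(response: dict) -> list:
    """Return a summary of every segment whose RiskLevel is not "none"."""
    slices = (response.get("Data") or {}).get("SliceDetails") or []
    flagged = []
    for segment in slices:
        level = segment.get("RiskLevel", "none")
        if level == "none":
            continue
        flagged.append({
            "start": segment.get("StartTime"),
            "end": segment.get("EndTime"),
            "level": level,
            # Labels is a comma-separated string; split it into a list.
            "labels": [t for t in (segment.get("Labels") or "").split(",") if t],
            "text": segment.get("Text", ""),
            "clip_url": segment.get("Url"),  # temporary URL, valid for 30 minutes
        })
    return flagged


example = {
    "Code": 200,
    "Data": {
        "RiskLevel": "high",
        "SliceDetails": [
            {"StartTime": 0, "EndTime": 4065, "Labels": "profanity,contraband",
             "RiskLevel": "high", "Text": "sample text", "Url": "https://aliyundoc.com"},
        ],
    },
}
print(flagged_segments(example))
```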

Callback notifications

If you set a callback URL when submitting a task, the service sends a POST request to that URL when the task completes. The request body contains the following form fields:

| Field | Type | Description |
| --- | --- | --- |
| checksum | String | Verification string generated by hashing the concatenation of user UID + seed + content with the algorithm set in cryptType (SHA256 by default). The user UID is your Alibaba Cloud account ID (not a RAM user UID), found in the Alibaba Cloud Management Console. To verify the request, generate the same string on your server and compare it against checksum. |
| taskId | String | The task ID |
| content | String | JSON string containing the moderation result. Parse it into a JSON object. The structure matches the Data field in the query response. |

Return HTTP 200 to acknowledge receipt. Any other status code is treated as a failure, and the service retries up to 16 times. After 16 failed attempts, the push stops.
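
The checksum comparison described above could look like the following Python sketch. It covers SHA256 only and makes two assumptions this page does not state: that the concatenated string is UTF-8 encoded before hashing, and that the checksum is sent as a hexadecimal digest. The function name `verify_callback` and the sample values are hypothetical.

```python
import hashlib
import hmac


def verify_callback(uid: str, seed: str, content: str, checksum: str) -> bool:
    """Recompute SHA256(uid + seed + content) and compare it with checksum."""
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Constant-time comparison; lowercasing tolerates an upper-case hex digest.
    return hmac.compare_digest(
        expected.encode("ascii"), checksum.strip().lower().encode("utf-8")
    )


# Illustration with made-up values: a checksum computed the same way verifies.
content = '{"RiskLevel": "none"}'
checksum = hashlib.sha256(("1234567890" + "abc123" + content).encode("utf-8")).hexdigest()
print(verify_callback("1234567890", "abc123", content, checksum))
```

A handler would return HTTP 200 only after this check and its own processing succeed, so that a failed delivery is retried by the service.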

Error codes

| Code | Description |
| --- | --- |
| 200 | Success |
| 280 | Verifying |
| 400 | Request parameters are empty |
| 401 | Request parameters are invalid |
| 402 | A parameter value exceeds the allowed length |
| 403 | Request exceeds the QPS limit |
| 404 | File download failed. Check the URL and retry. |
| 405 | File download timed out. The file may be inaccessible; check and retry. |
| 406 | File exceeds the size limit |
| 407 | File format is not supported |
| 408 | Insufficient permissions. The account may not have the service activated, may have an overdue balance, or may not be authorized. |
| 480 | Concurrent streams limit exceeded |
| 500 | System error |
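
When polling VoiceModerationResult, these codes can drive the retry decision. The Python sketch below is one possible policy, not the service's prescribed behavior: it assumes that 280 means the task is still being processed, and it treats 403, 480, and 500 as transient. The `query` argument is a caller-supplied function that wraps the actual SDK call and returns the response as a dict with a Code key; that function and `poll_result` are hypothetical names.

```python
import time

IN_PROGRESS = 280            # assumption: "Verifying" means the task is still running
TRANSIENT = {403, 480, 500}  # assumption: rate limits and system errors are worth retrying


def poll_result(query, task_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call query(task_id) until it returns Code 200, then return the response."""
    for _ in range(max_attempts):
        response = query(task_id)
        code = response.get("Code")
        if code == 200:
            return response
        if code != IN_PROGRESS and code not in TRANSIENT:
            # Codes such as 401, 406, or 407 will not succeed on retry.
            raise RuntimeError(
                f"Task {task_id} failed with code {code}: {response.get('Message')}"
            )
        sleep(interval)
    raise TimeoutError(f"Task {task_id} did not finish after {max_attempts} attempts")


# Usage with a stand-in query function that finishes on the third call.
fake_responses = iter([{"Code": 280}, {"Code": 280}, {"Code": 200, "Data": {"RiskLevel": "none"}}])
print(poll_result(lambda task_id: next(fake_responses), "AAAAA-BBBBB", sleep=lambda s: None))
```

Injecting `query` and `sleep` keeps the loop independent of any particular SDK and makes it testable without network access.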