This topic introduces the voice moderation service, which uses the Tongyi large model to moderate audio files and live audio streams.
1. Service
Voice Moderation (Large Model Edition) is powered by the Qwen Moderation Large Model, trained specifically to identify content risks in audio. It provides the following voice moderation services:
Audio and Video Media Detection_LLM Edition:
This service moderates audio files. It uses the Tongyi Large Model to deliver enhanced speech recognition and improved risk detection. It supports 26 languages, including Chinese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish.
Social & Entertainment Live Stream Detection_LLM Version:
This service moderates live audio streams. It uses the Tongyi Large Model to deliver enhanced speech recognition and improved risk detection. It supports 26 languages, including Chinese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish.
Service selection
Service | Description | Supported regions | Typical scenarios |
Audio and Video Media Detection_LLM Edition (audio_detection_byllm_global) |
| Singapore |
|
Social & Entertainment Live Stream Detection_LLM Version (live_detection_byllm_global) |
| Singapore |
|
Billing
The large model services for Voice Moderation V2.0 support two billing methods: pay-as-you-go and resource packages.
Pay-as-you-go
After you activate the Voice Moderation V2.0 service, pay-as-you-go is the default billing method. You are billed daily for your actual usage. You incur no fees if you do not call the service.
Moderation type | Service | Unit price |
Voice Moderation Advanced (audio_advanced) |
| USD 18 per 1,000 minutes Note We bill based on the total duration of audio processed. For example, if you use the Audio and Video Media Detection_LLM Edition service to process 100 minutes of audio, we charge you USD 1.80. |
The pay-as-you-go billing frequency for Content Moderation V2.0 is once every 24 hours. In the billing details, moderationType corresponds to the moderation type field. You can view the billing details.
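The arithmetic in the pricing note can be sketched in a few lines. This is an illustrative estimate only, assuming the listed price of USD 18 per 1,000 minutes; the function name `estimate_cost_usd` is hypothetical and is not part of any SDK, and the way actual bills are rounded is not specified here.

```python
from decimal import Decimal

# Unit price from the pricing table: USD 18 per 1,000 minutes of audio.
PRICE_PER_1000_MINUTES_USD = Decimal("18")


def estimate_cost_usd(minutes: float) -> Decimal:
    """Estimate the pay-as-you-go charge for a total audio duration in minutes.

    Hypothetical helper: real bills are generated by the service and may be
    rounded differently.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return PRICE_PER_1000_MINUTES_USD * Decimal(str(minutes)) / Decimal("1000")
```

Calling `estimate_cost_usd(100)` reproduces the USD 1.80 figure from the note. `Decimal` is used instead of `float` so that currency values are not affected by binary rounding.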
Get started
Step 1: Activate the service
Go to the activation page to activate the Voice Moderation 2.0 service.
After you activate the Voice Moderation 2.0 service, the default billing method is pay-as-you-go. You are billed based on your actual usage, and you incur no fees if you do not use the service. After you integrate and use the API, the system automatically generates bills based on your usage. For more information, see Billing. You can also purchase a resource package. Resource packages offer tiered discounts compared with the pay-as-you-go method and are suitable for users with predictable or high usage.
Step 2: Grant permissions to a RAM user
Before you access the SDK or API, you must grant permissions to a RAM user. To authenticate API calls, use an AccessKey from your Alibaba Cloud account or a RAM user. For more information about how to obtain an AccessKey, see Obtain an AccessKey.
Log in to the RAM console with your Alibaba Cloud account or as a RAM administrator.
Create a RAM user. For more information, see Create a RAM user.
Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For more information, see Grant permissions to a RAM user. You can then use the RAM user to call Content Moderation APIs.
Step 3: Install and integrate the SDK
See the Voice Moderation 2.0 SDK and Integration Guide for installation and integration instructions. The service is available in the following regions:
Region | Public endpoint | VPC endpoint | Services |
Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com | audio_detection_byllm_global, live_detection_byllm_global |
Step 4: Adjust moderation rules (Optional)
In the Content Moderation console, you can adjust detection rules for the Voice Moderation large model, manage text moderation rules, replicate a service, configure a custom dictionary, query detection records, and check usage.
API reference
Usage notes
You can call this API to create audio content moderation tasks. For instructions on how to construct an HTTP request, see Making native HTTP calls. You can also use an SDK to call the API. For more information, see the SDK and integration guide for audio moderation (enhanced edition) V2.0.
API operations:
Submit a moderation task: VoiceModeration
Query a moderation task: VoiceModerationResult
Cancel a moderation task: VoiceModerationCancel
Billing:
This is a paid API. You are charged only for requests that return an HTTP status code of 200. You are not charged for requests that result in other error codes. For more information, see Billing.
Service performance:
Service performance | Description |
Audio file size | The audio moderation (enhanced edition) supports audio files up to 500 MB. |
Audio and video file formats | Supported audio file formats: MP3, WAV, AAC, WMA, OGG, M4A, and AMR. Supported video file formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM. |
Live audio stream | Supported protocols: RTMP, HLS, HTTP-FLV, and RTSP. |
QPS | The QPS limit for task submission is 100. |
Concurrent streams | The enhanced edition supports 50 concurrent streams by default. |
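The file limits above lend themselves to a client-side check before a task is submitted. The following is a minimal sketch, not part of the service or its SDK: the helper name `check_media` is hypothetical, the format is inferred from the file extension only, 1 MB is assumed to mean 1024 × 1024 bytes, and the 500 MB limit is assumed to apply to video files as well as audio files.

```python
import os
from typing import List

# Limits copied from the service performance table.
MAX_FILE_BYTES = 500 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
AUDIO_FORMATS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "amr"}
VIDEO_FORMATS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}


def check_media(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    # Take the extension without the leading dot, case-insensitively.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in AUDIO_FORMATS | VIDEO_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Rejecting a file locally avoids a round trip that would otherwise end in code 406 (file too large) or 407 (unsupported format).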
Submit a moderation task
Request parameters
Parameter | Type | Required | Example value | Description |
Service | String | Yes | audio_detection_byllm_global | The moderation service to use. Valid values:
|
ServiceParameters | JSONString | Yes | | The parameters required by the moderation service, formatted as a JSON string. For a description of each parameter, see ServiceParameters. |
Table 1. ServiceParameters
Parameter | Type | Required | Example value | Description |
url | String | Yes. You must provide the file by using one of the following three methods:
| http://aliyundoc.com/test.mp3 | The URL of the object to moderate. Supports public HTTP and HTTPS URLs. |
ossBucketName | String | bucket_01 | The name of the authorized OSS Note To use an internal OSS URL, you must first use your Alibaba Cloud account (root account) to grant access on the Cloud Resource Access Authorization page. | |
ossObjectName | String | 20240307/07/28/test.mp3 | The name of the object in the authorized OSS | |
ossRegionId | String | cn-shanghai | The | |
callback | String | No | http://aliyundoc.com | The The
Note Your callback endpoint must return an HTTP |
seed | String | No | abc**** | A random string used in the It can consist of letters, digits, and underscores (_), with a maximum length of 64 characters. You can customize this value to verify that Note This parameter is required when the |
cryptType | String | No | SHA256 | When using a
|
liveId | String | No | liveId1**** | The ID of the live audio stream. This parameter prevents repeated moderation of the same live stream through deduplication. If provided, |
dataId | String | No | voice20240307*** | A custom data ID for the object being moderated. This ID uniquely identifies your business data. The ID can contain uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.). The maximum length is 64 characters. |
referer | String | No | www.aliyun.com | The |
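Because `ServiceParameters` is declared as a JSON string rather than a nested object, a request builder needs to serialize it explicitly. The sketch below assumes only the fields and character rules listed in Table 1; the function name `build_submit_request` is hypothetical, and signing and sending the request are left to the SDK or your HTTP client.

```python
import json
import re
from typing import Dict, Optional

# Character rules taken from the dataId and seed descriptions in Table 1.
_DATA_ID_RE = re.compile(r"[A-Za-z0-9_.\-]{1,64}")
_SEED_RE = re.compile(r"[A-Za-z0-9_]{1,64}")


def build_submit_request(
    service: str,
    url: str,
    data_id: Optional[str] = None,
    callback: Optional[str] = None,
    seed: Optional[str] = None,
    crypt_type: Optional[str] = None,
) -> Dict[str, str]:
    """Build the two top-level request parameters for a submit call."""
    params = {"url": url}
    if data_id is not None:
        if not _DATA_ID_RE.fullmatch(data_id):
            raise ValueError("dataId: letters, digits, '_', '-', '.'; max 64 chars")
        params["dataId"] = data_id
    if seed is not None:
        if not _SEED_RE.fullmatch(seed):
            raise ValueError("seed: letters, digits, '_'; max 64 chars")
        params["seed"] = seed
    if callback is not None:
        params["callback"] = callback
    if crypt_type is not None:
        params["cryptType"] = crypt_type
    # ServiceParameters is sent as a JSON-encoded string, not a nested object.
    return {"Service": service, "ServiceParameters": json.dumps(params)}
```

For example, `build_submit_request("audio_detection_byllm_global", "http://aliyundoc.com/test.mp3", data_id="voice20240307")` returns a dictionary whose `ServiceParameters` value is a string that decodes back to the `url` and `dataId` fields.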
Response parameters
Parameter | Type | Example value | Description |
Code | Integer | 200 | The |
Data | JSONObject | {"TaskId": "AAAAA-BBBBB","DataId": "voice20240307***"} | The response data. |
Message | String | OK | The response message. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The |
Examples
Request example
{
"Service": "audio_detection_byllm_global",
"ServiceParameters": {
"cryptType": "SHA256",
"seed": "abc***123",
"callback": "https://aliyun.com/callback",
"url": "http://aliyundoc.com/test.mp3"
}
}
Successful response example
{
"Code": 200,
"Data": {
"TaskId": "AAAAA-BBBBB",
"DataId": "voice20240307***"
},
"Message": "SUCCESS",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Query task results
When a live stream moderation task is in progress, a query returns the N most recent audio slices. After the task is complete, the query returns all audio slices.
To query a moderation task, call the VoiceModerationResult operation.
Billing: This API operation is not billed.
Query timeout: Query the results 30 seconds after submitting an asynchronous moderation task. The service stores results for up to 24 hours before automatically deleting them.
QPS limit
This operation has a QPS limit of 100 for each user. API calls exceeding this limit are throttled, which may affect your business. We recommend calling this operation at a reasonable rate.
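The timing guidance above (wait about 30 seconds, then query at a reasonable rate) can be expressed as a small polling loop. This is a sketch under stated assumptions: `query` is a hypothetical callable that you supply to wrap the VoiceModerationResult call and return the decoded response, the 5-second interval and 20-attempt cap are arbitrary illustrative defaults, and code 280 is treated as "still in progress" as listed in the Codes section.

```python
import time
from typing import Any, Callable, Dict

IN_PROGRESS = 280  # "Detection is in progress" in the Codes table


def wait_for_result(
    query: Callable[[str], Dict[str, Any]],
    task_id: str,
    initial_delay: float = 30.0,
    interval: float = 5.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task leaves the in-progress state or attempts run out.

    Returns the first response whose Code is not 280; the caller still has to
    check whether that code is 200 or an error.
    """
    sleep(initial_delay)  # results are not expected before about 30 seconds
    for attempt in range(max_attempts):
        response = query(task_id)
        if response.get("Code") != IN_PROGRESS:
            return response
        if attempt < max_attempts - 1:
            sleep(interval)  # space out calls to stay well under the QPS limit
    raise TimeoutError(f"task {task_id} still in progress after {max_attempts} queries")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Because results are stored for only 24 hours, the total polling window should stay well inside that period.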
Request parameters
Parameter | Type | Required | Example | Description |
Service | String | Yes | audio_detection_byllm_global | The type of moderation service. |
ServiceParameters | JSONString | Yes | | The required service parameters, provided as a JSON string. For details, see ServiceParameters. |
Table 2. ServiceParameters
Parameter | Type | Required | Example | Description |
taskId | String | Yes | AAAAA-BBBBB | The ID of the task, returned upon task submission. |
Response parameters
Parameter | Type | Example | Description |
Code | Integer | 200 | The status code. For more information, see Status codes. |
Data | JSONObject | | The results of the audio content moderation. For more information, see Data. |
Message | String | OK | The response message. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |
Table 3. Data
Parameter | Type | Example | Description |
Url | String | https://aliyundoc.com/text.mp3 | The URL of the moderated object. |
LiveId | String | liveId1**** | The ID of the live audio stream (optional). |
DataId | String | voice20240307*** | The data ID of the moderated object (optional). |
RiskLevel | String | high | The combined risk level of all audio slices. Valid values are:
Note Address high-risk content immediately and manually review medium-risk content. For standard use cases, low-risk content can be treated as safe. Process low-risk content only if a high recall rate is required. |
SliceDetails | JSONArray | | An array of objects with detailed results for each audio slice. For more information, see SliceDetails. |
Table 4. SliceDetails
Parameter | Type | Example | Description |
StartTime | Integer | 0 | The start time of the sentence, in seconds. |
EndTime | Integer | 4065 | The end time of the sentence, in seconds. |
StartTimestamp | Integer | 1678854649720 | The start timestamp of the audio slice, in milliseconds. |
EndTimestamp | Integer | 1678854649720 | The end timestamp of the audio slice, in milliseconds. |
Text | String | Disgusting | The text transcribed from the audio. |
Url | String | https://aliyundoc.com | If the moderated content is from a live audio stream, this parameter provides a temporary URL to the corresponding audio segment. The URL is valid for 30 minutes. Save the content promptly if needed. |
RiskLevel | String | high | The risk level of the audio slice. Valid values are:
|
Result | JSONArray | | An array of objects that provides the moderation results. For more information, see Result. |
Table 5. Result
Parameter | Type | Example | Description |
Label | String | political_entity | The moderation label assigned to the content. The service may detect multiple labels. This includes labels from the text policy and labels specific to audio:
Note By default, the detection of specified speaking and sexual sounds is disabled. To enable this feature, contact your account manager. |
Description | String | Suspected pornographic content | A description of the Important This field provides an explanation of the |
Confidence | Float | 81.22 | The confidence score, a value from 0 to 100 with up to two decimal places. Some labels do not have a confidence score. |
RiskLevel | String | high | The risk level of the current label. Valid values are:
|
RiskWords | String | AA,BB,CC | Detected risk words, separated by commas. This parameter is not returned for all labels. |
CustomizedHit | JSONArray | [{"LibName":"...","Keywords":"..."}] | If a keyword from a custom library is detected, the |
RiskPositions | JSONArray | | The positions of the detected risk words. For more information, see RiskPositions. |
Table 6. CustomizedHit
Parameter | Type | Example | Description |
LibName | String | CustomLibrary1 | The name of the custom library. |
Keywords | String | CustomKeyword1,CustomKeyword2 | Detected custom keywords, separated by commas. |
Table 7. RiskPositions
Parameter | Type | Example | Description |
RiskWord | String | AA | The detected risk word. |
StartPos | Integer | 10 | The start position of the risk word. |
EndPos | Integer | 12 | The end position of the risk word. |
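The nesting described in Tables 3 through 7 (Data, then SliceDetails, then Result) can be flattened into a simple summary for downstream handling. The sketch below is illustrative only: the function name `summarize` and the action names are hypothetical, and the strings `"medium"` and `"low"` are assumed risk-level values, since only `"high"` appears in the examples. The mapping follows the note in Table 3: act on high risk immediately, review medium risk manually, and treat low risk as safe by default.

```python
from typing import Any, Dict, List

# Assumed risk-level strings mapped to hypothetical action names.
ACTIONS = {"high": "block", "medium": "manual_review", "low": "pass"}


def summarize(data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the Data object of a query response into an action and a list of hits."""
    hits: List[Dict[str, Any]] = []
    for slice_detail in data.get("SliceDetails") or []:
        for result in slice_detail.get("Result") or []:
            hits.append(
                {
                    "start": slice_detail.get("StartTime"),
                    "end": slice_detail.get("EndTime"),
                    "text": slice_detail.get("Text"),
                    "label": result.get("Label"),
                    "risk": result.get("RiskLevel"),
                    "confidence": result.get("Confidence"),  # may be absent
                }
            )
    # Unrecognized or missing levels are reported as "unknown" so the caller decides.
    action = ACTIONS.get(data.get("RiskLevel"), "unknown")
    return {"action": action, "hits": hits}
```

Using `.get` throughout reflects that several fields are optional, for example `Confidence` is not returned for every label.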
Examples
Request example
{
"Service": "audio_detection_byllm_global",
"ServiceParameters": {
"taskId": "AAAAA-BBBBB"
}
}
Successful response example
{
"Code": 200,
"Data": {
"DataId": "voice20240307***",
"LiveId": "liveId1****",
"RiskLevel": "high",
"SliceDetails": [
{
"StartTime": 0,
"EndTime": 4065,
"RiskLevel": "high",
"Result": [
{
"Label": "political_entity",
"Description": "Suspected political entity",
"Confidence": 100.0,
"RiskLevel": "high",
"RiskWords": "WordA,WordB",
"RiskPositions": [
{
"EndPos": 16,
"RiskWord": "WordA",
"StartPos": 14
}
]
}
],
"Text": "Content Moderation product test case",
"Url": "https://aliyundoc.com"
}
]
},
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Cancel a moderation task
You can only cancel moderation tasks for live streams. Tasks for file types cannot be canceled.
To cancel a moderation task, call the VoiceModerationCancel operation.
Billing: This operation is free of charge.
Request parameters
Parameter | Type | Required | Example value | Description |
Service | String | Yes | live_detection_byllm_global | The type of moderation service. |
ServiceParameters | JSONString | Yes | | The parameters for the moderation service, formatted as a JSON string. For details about each field, see ServiceParameters. |
Table 8. ServiceParameters
Parameter | Type | Required | Example value | Description |
taskId | String | Yes | AAAAA-BBBBB | The ID of the moderation task to cancel. |
Response parameters
Parameter | Type | Example value | Description |
Code | Integer | 200 | The response status code. For more information, see Code description. |
Message | String | OK | The response message. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The unique ID of the request. |
Examples
Request example
{
"Service": "live_detection_byllm_global",
"ServiceParameters": {
"taskId": "AAAAA-BBBBB"
}
}
Successful response example
{
"Code": 200,
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Callback message format
A callback message is a JSON object with the following fields:
Parameter | Type | Description |
checksum | String | A checksum generated by applying the SHA-256 algorithm to the concatenated string of The UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud console. To prevent tampering, verify the message integrity upon receipt by generating a checksum with the same algorithm and comparing it with the received Note The UID must be for your Alibaba Cloud account, not for a RAM user. |
taskId | String | The ID of the associated task. |
content | String | A JSON-formatted string containing the serialized detection results. You must parse this string to retrieve the JSON object. The format is identical to the response from a task result query. For more information, see Response Parameters. |
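Verifying and unpacking a callback message could look like the sketch below. It rests on several assumptions that are not confirmed by the table: the checksum is taken to be a hex-encoded SHA-256 digest of the UTF-8 bytes, and the exact fields and order of the concatenated string are not reproduced here, so the caller supplies that string already assembled. The function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def checksum_matches(concatenated: str, received_checksum: str) -> bool:
    """Compare a locally computed SHA-256 digest with the received checksum.

    Assumption: the checksum is the hex digest of the UTF-8 encoded string.
    """
    expected = hashlib.sha256(concatenated.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        expected.encode("ascii"), received_checksum.lower().encode("utf-8")
    )


def parse_callback(message: Dict[str, Any], concatenated: str) -> Dict[str, Any]:
    """Verify the message, then decode the JSON string held in `content`."""
    if not checksum_matches(concatenated, message["checksum"]):
        raise ValueError("checksum mismatch: message may have been tampered with")
    return {"taskId": message["taskId"], "result": json.loads(message["content"])}
```

The `content` field is itself a JSON string, so it needs a second `json.loads` after the outer message has been decoded.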
Codes
This section describes the codes that the API returns. You are only billed for successful requests, which are indicated by a code of 200.
Code | Description |
200 | The request succeeded. |
280 | Detection is in progress. |
400 | A required request parameter is missing or empty. |
401 | A request parameter is invalid. Check the parameter value and try again. |
402 | A request parameter value exceeds the length limit. Check the parameter and try again. |
403 | The request rate exceeds the QPS limit. Reduce your request rate and try again. |
404 | An error occurred while downloading the input file. Check the file URL and try again. |
405 | The file download timed out. This may be because the file is inaccessible. Check the file URL and your network settings, then try again. |
406 | The input file is too large. Use a smaller file and try again. |
407 | The input file format is not supported. Use a supported format and try again. |
408 | The account is not authorized to call this API. This can occur if the service has not been activated, the account has an overdue payment, or the account lacks the necessary permissions. |
480 | The number of concurrent requests exceeds the limit. Reduce your concurrency and try again. |
500 | A system error occurred. |
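A client typically groups these codes into a few handling paths. The grouping below is one possible design choice, not behavior prescribed by the service: the category names and the function `classify_code` are hypothetical, and treating 405 and 500 as retryable is an assumption based on their descriptions.

```python
def classify_code(code: int) -> str:
    """Map a response code from the table above to a coarse handling category."""
    if code == 200:
        return "success"
    if code == 280:
        return "in_progress"  # query again later
    if code in (403, 480):
        return "throttled"  # slow down or reduce concurrency, then retry
    if code in (405, 500):
        return "retryable"  # assumption: timeouts and system errors may be transient
    if code in (400, 401, 402, 404, 406, 407, 408):
        return "fix_request"  # change the input, file, or account state first
    return "unknown"
```

Keeping "throttled" separate from "retryable" lets a caller apply a longer backoff when the QPS or concurrency limit was the cause.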