Voice Moderation 2.0 features an upgraded voice model that supports voice content in Chinese, English, and a mix of Chinese and English. It provides moderation policies and a tag system tailored for international business. This topic describes the features and usage of the Voice Moderation 2.0 multilingual service.
Features
Compared to Voice Moderation 1.0, Voice Moderation 2.0 uses a separate policy and tag system to meet the needs of international business. It also offers more features to simplify usage and assist with manual review.
Comparison item | Voice Moderation 2.0 | Voice Moderation 1.0 |
Multilingual capabilities |
| Supports only Chinese by default. |
Moderation capabilities |
|
|
Tag system | Uses an international tag system with tags such as profanity and regional. This system supports multiple risk tags and sub-tags. | Uses a tag system that supports only a single risk tag. |
API features |
|
|
Internationalized tags
The Voice Moderation 2.0 multilingual service uses an international tag system. If content contains multiple types of risks, the service can return multiple tags simultaneously. Tag categorizations include but are not limited to the following:
Tag type | Categorization |
Primary tags (labels) |
|
Sub-tags (riskTips) | Sub-tags are returned in the |
Service performance
Voice Moderation 2.0 uses a high-performance core engine that can schedule dozens of models and policies with high concurrency to ensure timely service.
Service performance | Description |
File size | Version 2.0 increases the maximum supported voice file size from 200 MB to 500 MB. |
Voice file format | Supported voice file formats: MP3, WAV, AAC, WMA, OGG, M4A, and AMR. Supported video file formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM. |
Live voice stream | Supported protocols: RTMP, HLS, HTTP-FLV, and RTSP. |
Queries per second (QPS) | Version 2.0 increases the QPS limit for submitting tasks from 50 to 100. |
Concurrent streams | Version 2.0 increases the default limit for concurrent streams from 20 to 50. |
In Voice Moderation, QPS refers to the number of requests that the API responds to per second. Concurrent streams refers to the number of voice files or voice streams being detected in the system simultaneously.
Billing information
The Voice Moderation 2.0 service supports the pay-as-you-go billing method.
Pay-as-you-go
After you activate the Voice Moderation 2.0 service, the default billing method is pay-as-you-go. You are charged daily based on your actual usage. If you do not use the service, you are not charged.
Moderation type | Supported business scenarios (services) | Unit price |
Standard Voice Moderation (voice_standard) |
| USD 9.0 per 1,000 minutes |
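As a rough illustration of the rate above, the following sketch estimates a charge from the number of moderated minutes. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes that charges scale linearly with minutes at the listed rate. Actual bills may round or aggregate usage differently.

```python
from decimal import Decimal

# Listed unit price for voice_standard: USD 9.0 per 1,000 minutes.
PRICE_PER_1000_MINUTES = Decimal("9.0")


def estimate_cost_usd(minutes: int) -> Decimal:
    """Estimate the charge for `minutes` of moderated audio.

    Assumption: pricing is strictly linear in minutes.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return PRICE_PER_1000_MINUTES * Decimal(minutes) / Decimal(1000)


print(estimate_cost_usd(2500))  # 2,500 minutes at USD 9.0 per 1,000 minutes
```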
Access guide
Step 1: Activate the service
Go to Activate Service to activate the Voice Moderation 2.0 service.
Step 2: Grant permissions to a RAM user
Before you use the software development kit (SDK) or API, grant permissions to a RAM user. You can create an AccessKey pair for your Alibaba Cloud account or a RAM user. You must use an AccessKey pair for identity verification when you call Alibaba Cloud API operations. For more information, see Obtain an AccessKey pair.
Procedure
- Log on to the RAM console as a RAM administrator.
- Create a RAM user.
For more information, see Create a RAM user.
- Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user.
For more information, see Grant permissions to a RAM user.
After completing the preceding operations, you can call the Content Moderation API as the RAM user.
Step 3: Install and integrate the SDK
The following regions are supported:
Region | Internet endpoint | VPC endpoint |
Singapore | https://green-cip.ap-southeast-1.aliyuncs.com | https://green-cip-vpc.ap-southeast-1.aliyuncs.com |
US (Virginia) | https://green-cip.us-east-1.aliyuncs.com | https://green-cip-vpc.us-east-1.aliyuncs.com |
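The endpoints in the table follow a regular pattern, so a client can derive them from the region ID. The sketch below is one way to do that. The helper name `build_endpoint` is hypothetical, and the set of regions is limited to the two listed in the table.

```python
# Region IDs taken from the endpoint table above.
SUPPORTED_REGIONS = {"ap-southeast-1", "us-east-1"}


def build_endpoint(region: str, vpc: bool = False) -> str:
    """Return the Internet or VPC endpoint for a supported region."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region}")
    host = "green-cip-vpc" if vpc else "green-cip"
    return f"https://{host}.{region}.aliyuncs.com"


print(build_endpoint("ap-southeast-1"))
print(build_endpoint("us-east-1", vpc=True))
```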
If you need SDK sample code in other languages, you can use the online debugging tool in OpenAPI Developer Portal to debug the API operation. This tool automatically generates SDK sample code for the API operation.
API
Usage notes
Service endpoint: https://green-cip.{region}.aliyuncs.com.
You can call this operation to create a voice content moderation task. You can construct an HTTP request manually or use an SDK. For information about how to construct a manual HTTP request, see Make a raw HTTP call. For more information about SDKs, see Voice Moderation 2.0 SDKs and integration guide.
API Operations:
Submit moderation task: VoiceModeration
Query moderation task result: VoiceModerationResult
Billing:
This is a paid operation. You are charged only for requests that return an HTTP status code of 200. Requests that return other error codes are not charged.
Submit moderation task
Request parameters
Name | Type | Required | Example | Description |
Service | String | Yes | audio_multilingual_global | The type of moderation service. Valid values:
|
ServiceParameters | JSONString | Yes | The parameter set required by the moderation service. This is a JSON string. For a description of each parameter, see ServiceParameters. |
Table 1. ServiceParameters
Name | Type | Required | Example | Description |
url | String | Yes | http://aliyundoc.com/test.flv | The URL of the object to be detected. This must be a public HTTP or HTTPS URL. |
callback | String | No | http://aliyundoc.com | The URL to which the moderation result is sent as a callback notification. HTTP and HTTPS are supported. If you leave this field empty, you must periodically poll for the moderation result. The callback interface must support the POST method, UTF-8 encoded data, and the form parameters checksum and content. Content Moderation sets the checksum and content parameters and calls your callback interface to return the moderation result according to the following rules and format.
Note After your server's callback interface receives the result pushed by Content Moderation, if it returns an HTTP status code of 200, the receipt is successful. Any other HTTP status code is considered a failure. On failure, Content Moderation will retry pushing the result up to 16 times until it is successfully received. If it is still not received after 16 retries, the push is stopped. Check the status of your callback interface. |
seed | String | No | abc**** | A random string used for the signature in the callback notification request. It can contain letters, digits, and underscores (_), and must not exceed 64 characters. You can customize this value to verify that the callback notification request is initiated by the Alibaba Cloud Content Moderation service. Note This field is required when using a callback. |
cryptType | String | No | SHA256 | When using a callback notification (callback), this sets the encryption algorithm for the notification content. Content Moderation encrypts the result (a string concatenated from
|
liveId | String | No | liveId1**** | The ID of the live voice stream. This parameter is used to deduplicate live voice stream tasks and prevent repeated moderation. If you pass this parameter, the system checks for an ongoing moderation task based on |
dataId | String | No | voice20240307*** | The data ID corresponding to the detected object. It can consist of uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.), and must not exceed 64 characters. You can use it to uniquely identify your business data. |
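Because ServiceParameters is a JSON string rather than a nested object, the inner parameters must be serialized before they are placed in the request. The following is a minimal sketch of that step, with the length and character constraints from Table 1 checked up front. The function name `build_submit_request` is hypothetical, the key casing follows the request sample in this topic, and the sketch does not sign or send the request.

```python
import json
import re
from typing import Optional

# Constraints from Table 1: up to 64 characters from the allowed sets.
SEED_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def build_submit_request(url: str,
                         callback: Optional[str] = None,
                         seed: Optional[str] = None,
                         crypt_type: Optional[str] = None,
                         data_id: Optional[str] = None) -> dict:
    """Build the request body for submitting a moderation task."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an HTTP or HTTPS URL")
    params = {"url": url}
    if callback is not None:
        # Table 1 notes that seed is required when a callback is used.
        if seed is None:
            raise ValueError("seed is required when callback is set")
        params["callback"] = callback
    if seed is not None:
        if not SEED_PATTERN.fullmatch(seed):
            raise ValueError("seed has invalid characters or length")
        params["seed"] = seed
    if crypt_type is not None:
        params["cryptType"] = crypt_type
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId has invalid characters or length")
        params["dataId"] = data_id
    # The inner parameters are serialized to a string, not nested as an object.
    return {
        "service": "audio_multilingual_global",
        "serviceParameters": json.dumps(params),
    }


body = build_submit_request(
    "http://aliyundoc.com/test.flv",
    callback="https://aliyun.com/callback",
    seed="abc_123",
    crypt_type="SHA256",
)
print(json.dumps(body, indent=2))
```

Using `json.dumps` for the inner object avoids hand-written escaping errors in the nested string.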
Return parameters
Name | Type | Example | Description |
Code | Integer | 200 | The error code. This is consistent with the HTTP status code. For more information, see Code description. |
Data | JSONObject | {"taskId": "AAAAA-BBBBB"} | The moderation result data. |
Message | String | OK | The response message for the request. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |
Example
Request sample
{
"service":"audio_multilingual_global",
"serviceParameters":"{\"cryptType\":\"SHA256\",\"seed\":\"abc***123\",\"callback\":\"https://aliyun.com/callback\",\"url\":\"http://aliyundoc.com/test.flv\"}"
}
Sample response
{
"code":200,
"data":{
"taskId":"AAAAA-BBBBB"
},
"message":"SUCCESS",
"requestId":"AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Query task result
After the moderation task is complete, the query result returns data for all voice segments.
Request parameters
Name | Type | Required | Example | Description |
Service | String | Yes | audio_multilingual_global | The type of moderation service. |
ServiceParameters | JSONString | Yes | The parameter set required by the moderation service. This is a JSON string. For a description of each parameter, see ServiceParameters. |
Table 2. ServiceParameters
Name | Type | Required | Example | Description |
taskId | String | Yes | AAAAA-BBBBB | The ID returned when the task was submitted. |
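When no callback is configured, the result has to be polled with the task ID. The sketch below shows one possible polling loop. It assumes that code 280 ("Verifying" in the code description) means the task is still in progress, and it takes a hypothetical `fetch` callable that wraps the VoiceModerationResult call and returns the parsed response. The interval and attempt limits are arbitrary defaults, not documented values.

```python
import time
from typing import Callable

PENDING_CODE = 280  # "Verifying" in the code description (assumed to mean in progress)


def poll_result(fetch: Callable[[str], dict],
                task_id: str,
                interval_s: float = 5.0,
                max_attempts: int = 60,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `fetch(task_id)` until the response is no longer pending.

    `fetch` is a hypothetical wrapper around VoiceModerationResult that
    returns the parsed response as a dict with a "Code" key.
    """
    for attempt in range(max_attempts):
        response = fetch(task_id)
        if response.get("Code") != PENDING_CODE:
            return response
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.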
Return parameters
Name | Type | Example | Description |
Code | Integer | 200 | The error code. This is consistent with the HTTP status code. For more information, see Code description. |
Data | JSONObject | {"url":xxxx,"results":xxx} | The returned parameters in JSON format. |
Message | String | OK | The response message for the request. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |
Table 3. Data
Name | Type | Example | Description |
url | String | https://aliyundoc.com | The URL of the detected object. |
LiveId | String | liveId1**** | The ID of the live voice stream (optional). |
DataId | String | voice20240307*** | The data ID corresponding to the detected object (optional). |
RiskLevel | String | high | The risk level of the voice, calculated based on all voice segments. Return values include the following:
Note High-risk content should be handled directly. Medium-risk content should be manually reviewed. Low-risk content should be handled only when high recall is required. Otherwise, treat it the same as content with no risk detected. |
sliceDetails | JSONArray | The detailed results for the voice segments. For more information, see sliceDetails. |
Table 4. sliceDetails
Name | Type | Example | Description |
startTime | Integer | 0 | The start time of the sentence, in seconds. |
endTime | Integer | 4065 | The end time of the sentence, in seconds. |
startTimestamp | Integer | 1678854649720 | The start timestamp of the segment, in milliseconds. |
endTimestamp | Integer | 1678854649720 | The end timestamp of the segment, in milliseconds. |
text | String | disgusting | The text converted from the voice. |
url | String | https://aliyundoc.com | A temporary URL for the voice segment. The URL is valid for 30 minutes. Store it promptly. |
labels | String | pullinTraffic | The tags, separated by commas (,). Includes:
|
RiskLevel | String | high | The risk level of the voice segment. Return values include the following:
|
riskWords | String | AAA,BBB,CCC | The risk words that were hit, separated by commas. |
riskTips | String | sexuality_Suggestive | The sub-tags, separated by commas. |
extend | String | {\"riskTips\":\"sexuality_Suggestive\",\"riskWords\":\"pxxxxy\"} | A reserved field. |
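The handling guidance for risk levels and the comma-separated tag fields can be sketched as follows. The function names and the action names (`handle`, `manual_review`, `pass`) are hypothetical. The sketch assumes lowercase level values `high`, `medium`, and `low`, and it reads the capitalized field names used in the sample response in this topic.

```python
def decide_action(risk_level: str, high_recall: bool = False) -> str:
    """Map a risk level to an action, following the note on RiskLevel."""
    level = risk_level.lower()
    if level == "high":
        return "handle"
    if level == "medium":
        return "manual_review"
    if level == "low" and high_recall:
        return "handle"
    # Low risk without a high-recall requirement is treated like no risk.
    return "pass"


def flagged_slices(data: dict) -> list:
    """Return (start, end, labels) for each slice that carries a tag."""
    flagged = []
    for item in data.get("SliceDetails", []):
        labels = [name for name in item.get("Labels", "").split(",") if name]
        if labels:
            flagged.append((item.get("StartTime"), item.get("EndTime"), labels))
    return flagged


sample = {
    "RiskLevel": "high",
    "SliceDetails": [
        {"StartTime": 0, "EndTime": 4065, "Labels": "pullinTraffic", "RiskLevel": "high"},
        {"StartTime": 4065, "EndTime": 8000, "Labels": "", "RiskLevel": "none"},
    ],
}
print(decide_action(sample["RiskLevel"]))
print(flagged_slices(sample))
```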
Example
Request sample
{
"service":"audio_multilingual_global",
"serviceParameters":"{\"taskId\":\"AAAAA-BBBBB\"}"
}
Sample response
{
"Code": 200,
"Data": {
"DataId": "voice20240307***",
"LiveId": "liveId1****",
"RiskLevel": "high",
"SliceDetails": [
{
"EndTime": 4065,
"Labels": "political_content,xxxx",
"RiskLevel": "high",
"RiskTips": "contraband_ProhibitedGoods",
"RiskWords": "Risk Word A",
"StartTime": 0,
"Text": "Content Moderation product test case",
"Url": "https://aliyundoc.com"
}
]
},
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Callback message format
The callback message data is in JSON format, as shown below:
Field name | Field type | Description |
checksum | String | The checksum, a string generated by concatenating several values, one of which is the user UID. The user UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud Management Console. To guard against tampering, generate a string with the same algorithm when you receive the pushed result and compare it with the checksum. Note The user UID must be the UID of your Alibaba Cloud account, not the UID of a RAM user. |
taskId | String | The task ID of the callback message. |
content | String | The serialized moderation result. This is a JSON string. Parse it into a JSON object. The format of the content result is the same as the response for querying a task result. For more information, see Return parameters. |
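A callback receiver can recompute the checksum before trusting the content. The sketch below is illustrative only: this topic does not reproduce the exact concatenation rule, so the code assumes the checksum is the SHA-256 hex digest of the user UID, the seed, and the content joined in that order. Confirm the rule against the official reference before relying on it. The function name `verify_callback` is hypothetical.

```python
import hashlib
import hmac
import json


def verify_callback(uid: str, seed: str, checksum: str, content: str) -> dict:
    """Check the callback checksum and return the parsed content.

    Assumption (not confirmed by this topic): the checksum is
    SHA256(uid + seed + content) encoded as a hex string.
    """
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), checksum.encode("utf-8")):
        raise ValueError("checksum mismatch: the callback may not be authentic")
    # content is a JSON string with the same format as the query result.
    return json.loads(content)
```

On a successful check the receiver should reply with HTTP status code 200, because any other status code is treated as a failed delivery and triggers a retry.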
Code description
The following are the descriptions of the codes returned by the operation. Fees are incurred only for requests that return a code of 200.
Code | Description |
200 | The request is successful. |
280 | Verifying. |
400 | The request parameters are empty. |
401 | The request parameters are invalid. |
402 | The length of a request parameter does not meet the requirements. Check and modify the parameter. |
403 | The request exceeds the QPS limit. Check and adjust the QPS limit. |
404 | An error occurred while downloading the specified file. Check the file or retry. |
405 | The download of the specified file timed out. The file may be inaccessible. Check the file and retry. |
406 | The specified file exceeds the size limit. Check the file and retry. |
407 | The format of the specified file is not supported. Check the file and retry. |
408 | The account does not have permission to call this operation. The account may not have the service activated, may have an overdue payment, or may not be authorized to access the service. |
480 | The number of concurrent streams exceeds the limit. Check and adjust the concurrency. |
500 | A system error occurred. |
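A client can group these codes by how it should react. The grouping below is one possible reading of the table, not a documented policy: it assumes that 280 means the task is still in progress, that rate and concurrency errors (403, 480) call for backing off, and that download and system errors (404, 405, 500) are worth retrying. The function and category names are hypothetical.

```python
def classify_code(code: int) -> str:
    """Group a returned code into a coarse client-side action."""
    if code == 200:
        return "success"
    if code == 280:
        return "pending"      # still verifying: query again later
    if code in (403, 480):
        return "throttled"    # QPS or concurrency limit: back off
    if code in (404, 405, 500):
        return "retry"        # download or system error: check the file, then retry
    return "fix_request"      # parameter, size, format, or permission problem


for code in (200, 280, 403, 405, 407):
    print(code, classify_code(code))
```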