Large model speech moderation

This topic introduces the voice moderation service, which uses the Tongyi large model to moderate audio files and live audio streams.

Service

Voice Moderation (Large Model Edition) is powered by the Qwen Moderation Large Model, trained specifically to identify content risks in audio. It provides the following voice moderation services:

Audio and Video Media Detection_LLM Edition:
This service moderates audio files. It uses the Tongyi Large Model to deliver enhanced speech recognition and improved risk detection. It supports 26 languages, including Chinese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish.
Social & Entertainment Live Stream Detection_LLM Version:
This service moderates live audio streams. It uses the Tongyi Large Model to deliver enhanced speech recognition and improved risk detection. It supports 26 languages, including Chinese, English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish.

Service selection

Service	Description	Supported regions	Typical scenarios
Audio and Video Media Detection_LLM Edition (audio_detection_byllm_global)	Powered by the Tongyi Large Model. Delivers superior performance, making it ideal for scenarios requiring high-accuracy audio file moderation.	Singapore	Ideal for various audio and video media, such as general videos and audiobooks. Uses Qwen ASR to better recognize multiple Chinese dialects (such as Cantonese and Sichuanese) and various other languages (such as English, German, Korean, Japanese, Russian, French, Portuguese, and Vietnamese). Uses a text moderation large model to better identify content such as ideological themes, political sentiment, and metaphors.
Social & Entertainment Live Stream Detection_LLM Version (live_detection_byllm_global)	Powered by the Tongyi Large Model. Delivers superior performance, making it ideal for scenarios requiring high-accuracy live audio stream moderation.	Singapore	Ideal for social live streaming scenarios, which typically feature a single speaker and often have background music. Uses a large model to improve accuracy in complex audio environments, reducing interference from dialects, accents, and background noise. Uses a text moderation large model to focus on identifying risks such as pornographic content, abusive language, innuendo, and political sentiment. It can also detect periods of silence in the live stream.

Billing

The large model services for Voice Moderation V2.0 support two billing methods: pay-as-you-go and resource packages.

Pay-as-you-go

After you activate the Voice Moderation V2.0 service, pay-as-you-go is the default billing method. You are billed daily for your actual usage. You incur no fees if you do not call the service.

Moderation type	Service	Unit price
Voice Moderation Advanced (audio_advanced)	Audio and Video Media Detection_LLM Edition: audio_detection_byllm_global Social & Entertainment Live Stream Detection_LLM Version: live_detection_byllm_global	USD 18 per 1,000 minutes Note We bill based on the total duration of audio processed. For example, if you use the Audio and Video Media Detection_LLM Edition service to process 100 minutes of audio, we charge you USD 1.80.
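
The arithmetic in the note can be sketched as follows. This is an illustration only: the function name is hypothetical, and rounding the result to two decimal places is an assumption, because this topic does not state how partial minutes or fractions of a cent are handled. The only figure taken from the table is the unit price of USD 18 per 1,000 minutes.

```python
from decimal import Decimal

# Unit price from the table above: USD 18 per 1,000 minutes of audio.
PRICE_PER_1000_MINUTES = Decimal("18")


def estimate_cost_usd(minutes_processed: float) -> Decimal:
    """Estimate the pay-as-you-go charge for a total audio duration in minutes."""
    if minutes_processed < 0:
        raise ValueError("minutes_processed must be non-negative")
    cost = PRICE_PER_1000_MINUTES * Decimal(str(minutes_processed)) / Decimal("1000")
    # Assumption: round to cents; the actual bill may round differently.
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(100))  # prints 1.80, matching the example in the note
```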

Note

Pay-as-you-go usage for Content Moderation V2.0 is billed once every 24 hours. In the billing details, the moderationType field corresponds to the Moderation type column in the preceding table. You can view the billing details.

Get started

Step 1: Activate the service

Go to the activation page to activate the Voice Moderation 2.0 service.

After you activate the Voice Moderation 2.0 service, the default billing method is pay-as-you-go. You are billed based on your actual usage, and you incur no fees if you do not use the service. After you integrate and use the API, the system automatically generates bills based on your usage. For more information, see Billing. You can also purchase a resource package. Resource packages offer tiered discounts compared with the pay-as-you-go method and are suitable for users with predictable or high usage.

Step 2: Grant permissions to a RAM user

Before you access the SDK or API, you must grant permissions to a RAM user. To authenticate API calls, use an AccessKey from your Alibaba Cloud account or a RAM user. For more information about how to obtain an AccessKey, see Obtain an AccessKey.

Log in to the RAM console with your Alibaba Cloud account or as a RAM administrator.
Create a RAM user. For more information, see Create a RAM user.
Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For more information, see Grant permissions to a RAM user. You can then use the RAM user to call Content Moderation APIs.

Step 3: Install and integrate the SDK

See the Voice Moderation 2.0 SDK and Integration Guide for installation and integration instructions. The service is available in the following regions:

Region	Public endpoint	VPC endpoint	Services
Singapore	green-cip.ap-southeast-1.aliyuncs.com	green-cip-vpc.ap-southeast-1.aliyuncs.com	audio_detection_byllm_global, live_detection_byllm_global

Step 4: Adjust moderation rules (Optional)

In the Content Moderation console, you can adjust detection rules for the Voice Moderation large model, manage text moderation rules, replicate a service, configure a custom dictionary, query detection records, and check usage.

API reference

Usage notes

You can call this API to create audio content moderation tasks. For instructions on how to construct an HTTP request, see Making native HTTP calls. You can also use an SDK to call the API. For more information, see the SDK and integration guide for audio moderation (enhanced edition) V2.0.

API operations:
- Submit a moderation task: VoiceModeration
- Query a moderation task: VoiceModerationResult
- Cancel a moderation task: VoiceModerationCancel
Billing:
This is a paid API. You are charged only for requests that return an HTTP status code of 200. You are not charged for requests that result in other error codes. For more information, see Billing.

Service performance:

Service performance	Description
Audio file size	The audio moderation (enhanced edition) supports audio files up to 500 MB.
Audio and video file formats	Supported audio file formats: MP3, WAV, AAC, WMA, OGG, M4A, and AMR. Supported video file formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, RMVB, and RM.
Live audio stream	Supported protocols: RTMP, HLS, HTTP-FLV, and RTSP.
QPS	The QPS limit for task submission is 100.
Concurrent streams	The enhanced edition supports 50 concurrent streams by default.
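
Before you submit a file, a client-side check against the limits in the table can avoid requests that would fail. The following is a minimal sketch, not part of the SDK: the function name is hypothetical, the format is inferred from the file extension only, and 1 MB is assumed to be 1024 * 1024 bytes.

```python
import os
from typing import List

MAX_FILE_BYTES = 500 * 1024 * 1024  # 500 MB, assuming 1 MB = 1024 * 1024 bytes
AUDIO_FORMATS = {"mp3", "wav", "aac", "wma", "ogg", "m4a", "amr"}
VIDEO_FORMATS = {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "rmvb", "rm"}


def precheck_file(file_name: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file passes the checks."""
    problems = []
    # Take the extension without the dot, case-insensitively.
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if extension not in AUDIO_FORMATS | VIDEO_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 500 MB")
    return problems
```

A file that passes this check can still be rejected by the service, for example if its content does not match its extension.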

Submit a moderation task

Request parameters

Parameter	Type	Required	Example value	Description
Service	String	Yes	audio_detection_byllm_global	The moderation service to use. Valid values: audio_detection_byllm_global: Audio and video moderation (LLM-based). live_detection_byllm_global: Live stream moderation for social entertainment (LLM-based).
ServiceParameters	JSONString	Yes		The parameters required by the moderation service, formatted as a JSON string. For a description of each parameter, see ServiceParameters.

Table 1. ServiceParameters

Parameter	Type	Required	Example value	Description
url	String	Yes. You must provide the file by using one of the following three methods: Provide the URL of the file in the `url` parameter. Provide the details of an OSS object in the `ossBucketName`, `ossObjectName`, and `ossRegionId` parameters. Upload a local audio file. This method does not consume your OSS storage and the file is temporarily stored for 30 minutes. The SDK has a built-in feature for local file uploads. For code examples, see Audio Moderation Enhancement V2.0 SDK and Integration Guide.	http://aliyundoc.com/test.mp3	The URL of the object to moderate. Supports public HTTP and HTTPS URLs.
ossBucketName	String		bucket_01	The name of the authorized OSS `bucket`. Note To use an internal OSS URL, you must first use your Alibaba Cloud account (root account) to grant access on the Cloud Resource Access Authorization page.
ossObjectName	String		20240307/07/28/test.mp3	The name of the object in the authorized OSS `bucket`.
ossRegionId	String		cn-shanghai	The `region` where the OSS `bucket` is located.
callback	String	No	http://aliyundoc.com	The `callback` URL where moderation results are sent. HTTP and HTTPS are supported. If you omit this parameter, you must use `polling` to retrieve the results. The `callback` endpoint must support the POST method, UTF-8 encoded data, and the form parameters checksum and content. `Content Security` sets the checksum and content parameters and sends the results to your `callback` endpoint as follows. checksum: A string that is generated by applying the SHA256 algorithm to the concatenated string of `UID + seed + content`. The UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud console. To prevent tampering, you can verify the push result by generating a string with the same algorithm and comparing it to the received checksum. Note The user UID must be the ID of your Alibaba Cloud account, not the ID of a RAM user. content: A string in JSON format. You must parse the string to convert it back into a JSON object. For an example of the content result, see the response example for querying detection results. Note Your callback endpoint must return an HTTP `status code` of 200 after receiving a notification from Content Security to confirm receipt. Any other status code is considered a failure. If receipt fails, Content Security retries sending the notification up to 16 times. If all 16 retries fail, Content Security makes no further attempts. We recommend that you check the status of your `callback` endpoint.
seed	String	No	abc****	A random string used in the `signature` for the `callback` request. It can consist of letters, digits, and underscores (_), with a maximum length of 64 characters. You can customize this value to verify that `callback` notifications originate from `Content Security`. Note This parameter is required when the `callback` parameter is specified.
cryptType	String	No	SHA256	When using a `callback`, this parameter specifies the algorithm for generating the `signature` of the callback content. `Content Security` calculates a `signature` for the result (a concatenated string of user UID + seed + content) by using the specified algorithm and sends it to your `callback` URL. Valid values: SHA256 (default): Uses the SHA256 hash algorithm. SM3: Uses the HMAC-`SM3` algorithm. The result is a hexadecimal string that consists of lowercase letters and digits. For example, hashing `abc` with `SM3` returns `66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0`.
liveId	String	No	liveId1****	The ID of the live audio stream. This parameter prevents repeated moderation of the same live stream through deduplication. If provided, `Content Security` checks for an active moderation task by using the composite key `uid+service+liveId`. If an active task exists, the service returns its `taskId` instead of creating a new task.
dataId	String	No	voice20240307***	A custom data ID for the object being moderated. This ID uniquely identifies your business data. The ID can contain uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.). The maximum length is 64 characters.
referer	String	No	www.aliyun.com	The `Referer` request header. Used for purposes such as `hotlink protection`. The maximum length is 256 characters.
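
Because ServiceParameters is passed as a JSON string, it can help to build and validate it in one place. The following is a minimal sketch that covers only the `url`, `callback`, `seed`, and `dataId` parameters. The function name is hypothetical, and the validation rules are a reading of the descriptions in Table 1, not checks performed by the SDK.

```python
import json
import re
from typing import Optional

# Character and length rules taken from the seed and dataId descriptions in Table 1.
SEED_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def build_service_parameters(
    url: str,
    callback: Optional[str] = None,
    seed: Optional[str] = None,
    data_id: Optional[str] = None,
) -> str:
    """Return the ServiceParameters value as a JSON string."""
    params = {"url": url}
    if callback is not None:
        # The seed parameter is required when callback is specified.
        if seed is None:
            raise ValueError("seed is required when callback is specified")
        params["callback"] = callback
    if seed is not None:
        if not SEED_PATTERN.fullmatch(seed):
            raise ValueError("seed allows letters, digits, and underscores, up to 64 characters")
        params["seed"] = seed
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId allows letters, digits, _ - . and up to 64 characters")
        params["dataId"] = data_id
    return json.dumps(params)
```

If you omit `callback`, the returned string contains only the parameters that you set, and you retrieve results by polling.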

Response parameters

Parameter	Type	Example value	Description
Code	Integer	200	The `status code`. For more information, see Status Codes.
Data	JSONObject	{"TaskId": "AAAAA-BBBBB","DataId": "voice20240307***"}	The response data.
Message	String	OK	The response message.
RequestId	String	AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****	The `request ID`.

Examples

Request example

{
    "Service": "audio_detection_byllm_global",
    "ServiceParameters": {
        "cryptType": "SHA256",
        "seed": "abc***123",
        "callback": "https://aliyun.com/callback",
        "url": "http://aliyundoc.com/test.mp3"
    }
}

Successful response example

{
    "Code": 200,
    "Data": {
        "TaskId": "AAAAA-BBBBB",
        "DataId": "voice20240307***"
    },
    "Message": "SUCCESS",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Query task results

When a live stream moderation task is in progress, a query returns the N most recent audio slices. After the task is complete, the query returns all audio slices.

To query a moderation task, call the VoiceModerationResult operation.
Billing: This API operation is not billed.
Query timeout: Query the results 30 seconds after submitting an asynchronous moderation task. The service stores results for up to 24 hours before automatically deleting them.

QPS limit

This operation has a QPS limit of 100 for each user. API calls exceeding this limit are throttled, which may affect your business. We recommend calling this operation at a reasonable rate.
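
If you do not use a callback, polling can be sketched as follows. This is an illustration under stated assumptions: `query` is a hypothetical callable that wraps your own call to VoiceModerationResult (through the SDK or a native HTTP request) and returns the parsed response as a dictionary. The 10-second interval and the attempt limit are arbitrary choices. The sketch treats code 280 (detection in progress, see Codes) as "not finished yet" and code 200 as the final result.

```python
import time
from typing import Callable, Dict


def wait_for_result(
    query: Callable[[str], Dict],
    task_id: str,
    initial_delay: float = 30.0,
    interval: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a task until it returns code 200, fails, or runs out of attempts."""
    # Wait 30 seconds after submission before the first query, as advised above.
    sleep(initial_delay)
    for _ in range(max_attempts):
        response = query(task_id)
        code = response.get("Code")
        if code == 200:
            return response
        if code != 280:
            raise RuntimeError(f"query failed with code {code}: {response.get('Message')}")
        # Code 280: detection is still in progress, so wait and query again.
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} queries")
```

The `sleep` parameter exists only so that the wait can be replaced in tests. A modest polling interval also keeps the call rate far below the QPS limit.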

Request parameters

Parameter	Type	Required	Example	Description
Service	String	Yes	audio_detection_byllm_global	The type of moderation service.
ServiceParameters	JSONString	Yes		The required service parameters, provided as a JSON string. For details, see ServiceParameters.

Table 2. ServiceParameters

Parameter	Type	Required	Example	Description
taskId	String	Yes	AAAAA-BBBBB	The ID of the task, returned upon task submission.

Response parameters

Parameter	Type	Example	Description
Code	Integer	200	The status code. For more information, see Status codes.
Data	JSONObject		The results of the audio content moderation. For more information, see Data.
Message	String	OK	The response message.
RequestId	String	AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****	The request ID.

Table 3. Data

Parameter	Type	Example	Description
Url	String	https://aliyundoc.com/text.mp3	The URL of the moderated object.
LiveId	String	liveId1****	The ID of the live audio stream (optional).
DataId	String	voice20240307***	The data ID of the moderated object (optional).
RiskLevel	String	high	The combined risk level of all audio slices. Valid values are: high: high-risk medium: medium-risk low: low-risk none: no risk detected Note Address high-risk content immediately and manually review medium-risk content. For standard use cases, low-risk content can be treated as safe. Process low-risk content only if a high recall rate is required.
SliceDetails	JSONArray		An array of objects with detailed results for each audio slice. For more information, see SliceDetails.

Table 4. SliceDetails

Parameter	Type	Example	Description
StartTime	Integer	0	The start time of the sentence, in seconds.
EndTime	Integer	4065	The end time of the sentence, in seconds.
StartTimestamp	Integer	1678854649720	The start timestamp of the audio slice, in milliseconds.
EndTimestamp	Integer	1678854649720	The end timestamp of the audio slice, in milliseconds.
Text	String	Disgusting	The text transcribed from the audio.
Url	String	https://aliyundoc.com	If the moderated content is from a live audio stream, this parameter provides a temporary URL to the corresponding audio segment. The URL is valid for 30 minutes. Save the content promptly if needed.
RiskLevel	String	high	The risk level of the audio slice. Valid values are: high: high-risk medium: medium-risk low: low-risk none: no risk detected
Result	JSONArray		An array of objects that provides the moderation results. For more information, see Result.

Table 5. Result

Parameter	Type	Example	Description
Label	String	political_entity	The moderation label assigned to the content. The service may detect multiple labels. This includes labels from the text policy and labels specific to audio: For text policy labels, see Risk Labels. Audio-specific labels include: specified_nontalk: silent audio specified_speaking: specified speaking sexual_sounds: sexual sounds Note By default, the detection of specified speaking and sexual sounds is disabled. To enable this feature, contact your account manager.
Description	String	Suspected pornographic content	A description of the `Label`. Important This field provides an explanation of the `Label` and is subject to change. When processing results, use the `Label` value instead of this description.
Confidence	Float	81.22	The confidence score, a value from 0 to 100 with up to two decimal places. Some labels do not have a confidence score.
RiskLevel	String	high	The risk level of the current label. Valid values are: high: high-risk medium: medium-risk low: low-risk none: no risk detected
RiskWords	String	AA,BB,CC	Detected risk words, separated by commas. This parameter is not returned for all labels.
CustomizedHit	JSONArray	[{"LibName":"...","Keywords":"..."}]	If a keyword from a custom library is detected, the `Label` is `customized`. This parameter returns the name of the custom library and the detected keywords. For more information, see CustomizedHit.
RiskPositions	JSONArray		The positions of the detected risk words. For more information, see RiskPositions.

Table 6. CustomizedHit

Parameter	Type	Example	Description
LibName	String	CustomLibrary1	The name of the custom library.
Keywords	String	CustomKeyword1,CustomKeyword2	Detected custom keywords, separated by commas.

Table 7. RiskPositions

Parameter	Type	Example	Description
RiskWord	String	AA	The detected risk word.
StartPos	Integer	10	The start position of the risk word.
EndPos	Integer	12	The end position of the risk word.

Examples

Request example

{
    "Service": "audio_detection_byllm_global",
    "ServiceParameters": {
        "taskId": "AAAAA-BBBBB"
    }
}

Successful response example

{
    "Code": 200,
    "Data": {
        "DataId": "voice20240307***",
        "LiveId": "liveId1****",
        "RiskLevel": "high",
        "SliceDetails": [
            {
                "StartTime": 0,
                "EndTime": 4065,
                "RiskLevel": "high",
                "Result": [
                    {
                        "Label": "political_entity",
                        "Description": "Suspected political entity",
                        "Confidence": 100.0,
                        "RiskLevel": "high",
                        "RiskWords": "WordA,WordB",
                        "RiskPositions": [
                            {
                                "EndPos": 16,
                                "RiskWord": "WordA",
                                "StartPos": 14
                            }
                        ]
                    }
                ],
                "Text": "Content Moderation product test case",
                "Url": "https://aliyundoc.com"
            }
        ]
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
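
A response of this shape can be reduced to the slices that need attention. The following is a minimal sketch that assumes the response has already been parsed into a dictionary. The function name and the output keys are hypothetical. Selecting the `high` and `medium` levels by default follows the handling advice in the RiskLevel description.

```python
from typing import Dict, List, Sequence


def risky_slices(response: Dict, levels: Sequence[str] = ("high", "medium")) -> List[Dict]:
    """Return one summary per audio slice whose RiskLevel is in `levels`."""
    data = response.get("Data") or {}
    findings = []
    for slice_detail in data.get("SliceDetails") or []:
        if slice_detail.get("RiskLevel") not in levels:
            continue
        # Collect labels rather than descriptions, because descriptions may change.
        labels = [r["Label"] for r in slice_detail.get("Result") or [] if "Label" in r]
        findings.append(
            {
                "start": slice_detail.get("StartTime"),
                "end": slice_detail.get("EndTime"),
                "risk_level": slice_detail.get("RiskLevel"),
                "labels": labels,
                "text": slice_detail.get("Text"),
            }
        )
    return findings
```

For the response example above, the function returns a single entry with the label `political_entity`.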

Cancel a moderation task

You can cancel moderation tasks only for live streams. Moderation tasks for audio files cannot be canceled.

To cancel a moderation task, call the VoiceModerationCancel operation.
Billing: This operation is free of charge.

Request parameters

Parameter	Type	Required	Example value	Description
Service	String	Yes	live_detection_byllm_global	The type of moderation service.
ServiceParameters	JSONString	Yes		The parameters for the moderation service, formatted as a JSON string. For details about each field, see ServiceParameters.

Table 8. ServiceParameters

Parameter	Type	Required	Example value	Description
taskId	String	Yes	AAAAA-BBBBB	The ID of the moderation task to cancel.

Response parameters

Parameter	Type	Example value	Description
Code	Integer	200	The response status code. For more information, see Codes.
Message	String	OK	The response message.
RequestId	String	AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****	The unique ID of the request.

Examples

Request example

{
    "Service": "live_detection_byllm_global",
    "ServiceParameters": {
        "taskId": "AAAAA-BBBBB"
    }
}

Successful response example

{
    "Code": 200,
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Callback message format

A callback message is a JSON object with the following fields:

Parameter	Type	Description
checksum	String	A checksum generated by applying the SHA-256 algorithm to the concatenated string of `uid + seed + content`. The UID is your Alibaba Cloud account ID, which you can find in the Alibaba Cloud console. To prevent tampering, verify the message integrity upon receipt by generating a checksum with the same algorithm and comparing it with the received `checksum` value. Note The UID must be for your Alibaba Cloud account, not for a RAM user.
taskId	String	The ID of the associated task.
content	String	A JSON-formatted string containing the serialized detection results. You must parse this string to retrieve the JSON object. The format is identical to the response from a task result query. For more information, see Response Parameters.
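
The checksum verification described above can be sketched as follows for the default SHA256 setting. This is an illustration under stated assumptions: the function name is hypothetical, the concatenated string is assumed to be UTF-8 encoded, and the checksum is assumed to be a hexadecimal digest. Confirm these details against a real callback before you rely on them.

```python
import hashlib
import hmac
import json


def verify_callback(uid: str, seed: str, checksum: str, content: str) -> dict:
    """Verify the checksum of a callback message and return the parsed content."""
    # Recompute SHA-256 over uid + seed + content, as described in the table above.
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Compare in constant time; lowercasing tolerates an uppercase hex digest.
    if not hmac.compare_digest(expected.encode("ascii"), checksum.lower().encode("utf-8")):
        raise ValueError("checksum mismatch: the message may have been tampered with")
    # content is a JSON-formatted string, so parse it into a dictionary.
    return json.loads(content)
```

Pass your Alibaba Cloud account ID as `uid`, not the ID of a RAM user, and the same `seed` that you set when you submitted the task.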

Codes

This section describes the codes that the API returns. You are only billed for successful requests, which are indicated by a code of 200.

Code	Description
200	The request succeeded.
280	Detection is in progress.
400	A required request parameter is missing or empty.
401	A request parameter is invalid. Check the parameter value and try again.
402	A request parameter value exceeds the length limit. Check the parameter and try again.
403	The request rate exceeds the QPS limit. Reduce your request rate and try again.
404	An error occurred while downloading the input file. Check the file URL and try again.
405	The file download timed out. This may be because the file is inaccessible. Check the file URL and your network settings, then try again.
406	The input file is too large. Use a smaller file and try again.
407	The input file format is not supported. Use a supported format and try again.
408	The account is not authorized to call this API. This can occur if the service has not been activated, the account has an overdue payment, or the account lacks the necessary permissions.
480	The number of concurrent requests exceeds the limit. Reduce your concurrency and try again.
500	A system error occurred.
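
Client code often groups these codes by how to react to them. The following sketch shows one possible grouping. The category names and the decision to treat codes 403, 480, and 500 as worth retrying later are assumptions for illustration, not behavior defined by the service.

```python
# Assumption: rate limiting (403), concurrency limits (480), and system
# errors (500) may succeed if the same request is sent again later.
RETRY_LATER_CODES = {403, 480, 500}


def classify_code(code: int) -> str:
    """Map a response code to a coarse handling category."""
    if code == 200:
        return "success"
    if code == 280:
        return "in_progress"
    if code in RETRY_LATER_CODES:
        return "retry_later"
    # Remaining codes (400-402, 404-408) point to the request, the input
    # file, or the account, so the request needs to be corrected first.
    return "fix_request"
```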