Video File Moderation Enhanced API

Video File Moderation 2.0 helps you detect risky or prohibited content in video files. This topic describes how to use API operations to perform video file moderation and detect AI-generated content (AIGC).

Integration guide

Register an Alibaba Cloud account and complete the registration by following the on-screen instructions.
Activate the pay-as-you-go billing method for Content Moderation. For more information, see Activate the service. Activation is free. After you use the API operations, you are automatically billed based on your usage. For more information, see Billing.
Create an AccessKey using Resource Access Management (RAM). For more information, see Create an AccessKey. If you use the AccessKey of a RAM user, you must use your Alibaba Cloud account to grant the AliyunYundunGreenWebFullAccess permission to the RAM user. For more information, see RAM authorization.
Develop and integrate the service. We recommend that you use an SDK to call the API operations. For more information, see Video Moderation 2.0 SDK and integration guide.
The video file moderation service includes the following two API operations:
- VideoModeration: submits a video file moderation task.
- VideoModerationResult: retrieves the result of a video file moderation task.

Submit a moderation task

API description

API operation: VideoModeration. This operation supports only asynchronous detection for videos.

Supported regions and endpoints:

Region	Public endpoint	VPC endpoint	Supported services
Singapore	https://green-cip.ap-southeast-1.aliyuncs.com	https://green-cip-vpc.ap-southeast-1.aliyuncs.com	videoDetection_global, videoDetectionByVL_global
US (Virginia)	https://green-cip.us-east-1.aliyuncs.com	https://green-cip-vpc.us-east-1.aliyuncs.com	videoDetection_global
US (Silicon Valley)	https://green-cip.us-west-1.aliyuncs.com	Not available
Germany (Frankfurt)	https://green-cip.eu-central-1.aliyuncs.com	Not available

Billing information:
This API operation is billable. Charges depend on the video frame and audio moderation policies that you configure. You can select multiple services for video frames, and each service is charged at the number of frames multiplied by its unit price. If you also moderate the audio content, an additional fee applies, calculated as the video duration multiplied by the unit price of audio moderation. For more information about billing methods, see Billing.
Detection objects: Video files.
Return values: Asynchronous detection tasks do not return detection results in real time. You must retrieve detection results using callbacks or polling. Detection results are retained for up to 24 hours.
- Retrieve detection results using callbacks: When you submit an asynchronous detection task, include the callback parameter in the request to automatically receive detection results.
- Retrieve detection results using polling: When you submit an asynchronous detection task, you do not need to include the callback parameter. After you submit the task, call the result query API operation to retrieve the detection results.
Video requirements:
- Video file URLs support the HTTP and HTTPS protocols.
- The following video file formats are supported: AVI, FLV, MP4, MPG, ASF, WMV, MOV, WMA, RMVB, RM, FLASH, TS, and M3U8.
- Video size limit: By default, a single video cannot exceed 500 MB. If your video exceeds 500 MB, you can split the video into segments. You can also contact your account manager to increase the size limit.
- The time required for video file detection depends on the download time of the video. Ensure that the storage service where the video file is located is stable and reliable. We recommend that you use Alibaba Cloud Object Storage Service (OSS) to store video files.
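The video requirements above can be checked on the client before you submit a task. The following Python sketch is a hypothetical helper, not part of any Alibaba Cloud SDK. It assumes that the file extension in the URL path reflects the video format (FLASH is left out because its extension is ambiguous) and that 500 MB means 500 × 1024 × 1024 bytes. It also applies the URL rules from Table 1 ServiceParameters: at most 2,048 characters and no Chinese characters.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions assumed to correspond to the supported formats listed above.
SUPPORTED_EXTENSIONS = {
    ".avi", ".flv", ".mp4", ".mpg", ".asf", ".wmv", ".mov",
    ".wma", ".rmvb", ".rm", ".ts", ".m3u8",
}
MAX_BYTES = 500 * 1024 * 1024  # default size limit (assumed to be 500 MiB)
CJK = re.compile(r"[\u4e00-\u9fff]")  # common Chinese characters


def check_video(url: str, size_bytes: int | None = None) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must use HTTP or HTTPS")
    if len(url) > 2048:
        problems.append("URL is longer than 2,048 characters")
    if CJK.search(url):
        problems.append("URL contains Chinese characters")
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unrecognized video extension: {suffix!r}")
    if size_bytes is not None and size_bytes > MAX_BYTES:
        problems.append("video exceeds 500 MB; split it or ask for a higher limit")
    return problems


print(check_video("https://example.com/videos/a.mp4"))  # []
```

A passing check does not guarantee that the download succeeds; codes 404 to 407 in Code description still apply.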
Detection rule configuration:
- Before you call this operation for the first time, we recommend that you configure video moderation rules in the Content Moderation console.
  Note
  In the console, you can configure settings such as the snapshot method, snapshot frequency, image moderation rules, audio moderation rules, and the scope of the results to return. For more information, see Console User Guide.

If you do not configure any settings, the default configurations for the Video Moderation 2.0 API are as follows:

Service	Default configurations
Video file detection (videoDetection_global)	Video snapshot frequency: 1 frame per second; video frame detection service: General baseline detection (baselineCheck_global); video audio detection: enabled; video audio detection service: Multilingual audio and video media detection (audio_multilingual_global); result return method: only results with detected risks are returned.
Video file detection, large model version (videoDetectionByVL_global) Note Currently available only in the Singapore region. The large model version is limited to 10 concurrent tasks.	Video snapshot frequency: 1 frame per second; video frame detection service: Image moderation service with large and small model fusion (postImageCheckByVL_global); video audio detection: enabled; video audio detection service: Multilingual audio and video media detection (audio_multilingual_global); result return method: only results with detected risks are returned.
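As a rough illustration of the billing formula described under Billing information, the number of frames is the video duration multiplied by the snapshot frequency (1 frame per second by default). The following Python sketch estimates the cost of one video. All unit prices are hypothetical placeholders, not Alibaba Cloud rates, and the rounding (frames rounded up, audio charged on the exact duration in minutes) is an assumption; see Billing for the actual pricing units.

```python
import math


def estimate_cost(video_seconds: float,
                  frame_unit_prices: dict,
                  audio_price_per_minute: float = 0.0,
                  frames_per_second: float = 1.0) -> float:
    """Estimate the cost of moderating one video (sketch, hypothetical prices)."""
    # Frames captured = duration x snapshot frequency (rounding up is an assumption).
    frames = math.ceil(video_seconds * frames_per_second)
    # Every selected frame service is charged for every frame.
    frame_cost = sum(frames * price for price in frame_unit_prices.values())
    # Audio moderation is charged on the video duration (per minute here, an assumption).
    audio_cost = video_seconds / 60 * audio_price_per_minute
    return frame_cost + audio_cost


# A 2-minute video, one frame service, audio moderation enabled (made-up prices).
cost = estimate_cost(120, {"baselineCheck_global": 0.001}, audio_price_per_minute=0.01)
print(round(cost, 4))  # 0.14
```

Lowering the snapshot frequency in the console reduces the frame count, and therefore the frame cost, proportionally.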

QPS limits

Each Alibaba Cloud account can call this operation up to 100 times per second (QPS) and can have up to 50 moderation tasks processing at the same time. To raise the concurrent task limit, contact your account manager. Requests that exceed either limit are throttled, which may affect your business, so keep these limits in mind when you call this operation.
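One way to stay within these limits on the client side is to combine a token bucket (for the 100 QPS call rate) with a semaphore (for the 50 concurrent tasks). The following Python sketch is a hypothetical helper, not an SDK feature. It assumes that a task occupies a concurrency slot from submission until its final result is retrieved, so call finish_task() once the result query returns Code 200 or the task fails.

```python
import threading
import time


class SubmitThrottle:
    """Client-side guard for VideoModeration calls (hypothetical helper)."""

    def __init__(self, qps=100, max_concurrent=50, clock=time.monotonic):
        self.qps = qps
        self.clock = clock
        self.tokens = float(qps)  # the token bucket starts full
        self.last = clock()
        self.lock = threading.Lock()
        self.slots = threading.BoundedSemaphore(max_concurrent)

    def try_acquire_call(self) -> bool:
        """Take one call token; False means the QPS budget is used up for now."""
        with self.lock:
            now = self.clock()
            # Refill at `qps` tokens per second, capped at one second's worth.
            self.tokens = min(self.qps, self.tokens + (now - self.last) * self.qps)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def start_task(self) -> bool:
        """Reserve a concurrency slot without blocking."""
        return self.slots.acquire(blocking=False)

    def finish_task(self) -> None:
        """Release the slot once the task has a final result."""
        self.slots.release()
```

Before each submission, call start_task() and then try_acquire_call(). If the slot is granted but the call token is not, release the slot with finish_task() and retry later instead of sending the request.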

Debugging

Before integration, you can use Alibaba Cloud OpenAPI to debug the VideoModeration API online, view sample code and SDK dependency information, and review the API's usage and parameters.

Important

The online debugging feature calls the Content Moderation API using your current account. These calls are included in your billable usage.

Request parameters

Name	Type	Required	Example	Description
Service	String	Yes	videoDetection_global	The moderation service type. Valid values: videoDetection_global: video file detection. videoDetectionByVL_global: video file detection, large model version.
ServiceParameters	JSONString	Yes		The parameters required by the moderation service, as a JSON string. For a description of each parameter, see Table 1 ServiceParameters.

Table 1 ServiceParameters

Name	Type	Required	Example	Description
url	String	Conditional. Provide the video in exactly one of the following ways: (1) By video URL: specify url. (2) By OSS authorization: specify ossBucketName, ossObjectName, and ossRegionId. (3) By local upload: upload a local video through the SDK. Uploaded local videos do not consume your OSS storage and are retained for only 30 minutes. For code examples, see Enhanced Video Moderation 2.0 SDK and access guide.	http://www.aliyundoc.com/a.flv	The URL of the video that you want to moderate. The URL must be accessible over the public network, or be an OSS internal network address in the same region. Note The URL cannot contain Chinese characters and can be up to 2,048 characters in length. Specify only one URL per request.
ossBucketName	String	Conditional. Required when you use OSS authorization.	bucket_01	The name of the authorized OSS bucket. Note Before you moderate a video through an internal OSS address, use your Alibaba Cloud account to grant the required permissions on the Cloud Resource Access Authorization page.
ossObjectName	String	Conditional. Required when you use OSS authorization.	20240307/07/28/test.flv	The name of the object in the authorized OSS bucket.
ossRegionId	String	Conditional. Required when you use OSS authorization.	cn-shanghai	The region in which the OSS bucket is located.
callback	String	No	http://www.aliyundoc.com	The URL that receives the moderation results. HTTP and HTTPS are supported. If you leave this parameter empty, poll for the moderation results instead. The callback endpoint must accept POST requests with UTF-8 encoded form data that contains the checksum and content parameters. Content Moderation sets these parameters as follows and then calls your callback endpoint. checksum: a hash of the concatenated string `user UID + seed + content`, computed with the algorithm specified by cryptType (SHA256 by default). The user UID is the ID of your Alibaba Cloud account, which you can find in the Alibaba Cloud Management Console. To detect tampering, compute the same hash when you receive a result and compare it with checksum. Note The user UID must be the UID of the Alibaba Cloud account, not of a Resource Access Management (RAM) user. content: a JSON string. Parse it into a JSON object. For an example of its content, see the sample responses for querying moderation results. Note Return HTTP status code 200 from your callback endpoint to acknowledge receipt. Any other status code is treated as a failure, and Content Moderation resends the result up to 16 times. If delivery still fails after 16 retries, Content Moderation stops sending the result. Check the status of your callback endpoint.
seed	String	No	abc****	A random string. This value is used for the signature in the callback notification request. The string can contain letters, digits, and underscores (_), and can be up to 64 characters in length. You can customize this value to verify that the callback notification is sent by Content Moderation. Note If you use the callback parameter, you must specify this parameter.
cryptType	String	No	SHA256	The hash algorithm used to sign callback notifications. Content Moderation hashes the string `user UID + seed + content` with this algorithm and sends the result to your callback URL as checksum. Valid values: SHA256 (default): the SHA256 algorithm. SM3: the SM3 algorithm, which returns a hexadecimal string of lowercase letters and digits. For example, hashing abc with SM3 returns 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0.
dataId	String	No	videoId****	The data ID of the object that you want to moderate. The ID can contain uppercase letters, lowercase letters, digits, underscores (_), hyphens (-), and periods (.). The ID can be up to 128 characters in length. You can use this ID to uniquely identify your business data.
offline	String	No	false	Specifies whether to use the offline moderation mode. false (default): The real-time moderation mode. Moderation requests that exceed the concurrency limit are rejected. true: The offline moderation mode. Submitted tasks are not guaranteed to be processed in real time. The tasks are queued for processing and the moderation starts within 24 hours. Important This parameter is of the String type. The offline moderation mode is supported in the China (Beijing), China (Shanghai), and China (Hangzhou) regions.
referer	String	No	www.aliyun.com	The Referer request header. This parameter is used for scenarios such as hotlink protection. The value can be up to 256 characters in length.
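The checksum rule described for the callback parameter can be verified on your callback endpoint as follows. This Python sketch assumes that the request body is application/x-www-form-urlencoded, that cryptType is left at its SHA256 default, and that checksum is sent as a lowercase hexadecimal digest; verify_callback is a hypothetical name. SM3 is not covered because Python's hashlib does not guarantee SM3 support.

```python
import hashlib
import hmac
import json
from urllib.parse import parse_qs, urlencode


def verify_callback(body: str, uid: str, seed: str) -> dict:
    """Check the checksum of a callback request and return the parsed content."""
    form = parse_qs(body, keep_blank_values=True)
    checksum = form.get("checksum", [""])[0]
    content = form.get("content", [""])[0]
    # checksum = SHA256(user UID + seed + content); uid is the Alibaba Cloud account ID.
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), checksum.lower().encode()):
        raise ValueError("checksum mismatch: do not trust this callback")
    return json.loads(content)


# Local self-check with a made-up UID, seed, and content.
uid, seed = "1234567890", "abc123"
content = json.dumps({"taskId": "AAAAA-BBBBB"})
sig = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
print(verify_callback(urlencode({"checksum": sig, "content": content}), uid, seed))
```

After the function returns, respond with HTTP 200 so that Content Moderation does not resend the result.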


Response parameters

Name		Type	Example	Description
Code		Integer	200	The status code. For more information, see Code description.
Data		JSONObject		The moderation result data.
	TaskId	String	AAAAA-BBBBB	The ID of the detection task.
	DataId	String	dataId0307	The data ID.
Message		String	OK	The response message for the request.
RequestId		String	ABCD1234-1234-1234-1234-123****	The request ID.

Examples

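Request example (ServiceParameters is expanded as an object for readability; in an actual request, pass it as a JSON string)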

{
    "Service": "videoDetection_global",
    "ServiceParameters": {
        "url": "http://www.aliyundoc.com/a.flv",
        "dataId": "videoId****"
    }
}
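Because ServiceParameters is a JSONString, serialize the inner object before sending it. The following Python sketch builds the two request parameters shown in the example; build_submit_params is a hypothetical helper, and passing the result to an SDK client or a signed HTTP request is left out. It enforces the rule from Table 1 ServiceParameters that seed is required when callback is set.

```python
import json


def build_submit_params(url, data_id=None, callback=None, seed=None,
                        service="videoDetection_global"):
    """Build Service and ServiceParameters for a VideoModeration request."""
    params = {"url": url}
    if data_id:
        params["dataId"] = data_id
    if callback:
        # seed must accompany callback so that the checksum can be verified.
        if not seed:
            raise ValueError("seed is required when callback is set")
        params["callback"] = callback
        params["seed"] = seed
    # ServiceParameters is sent as a JSON string, not as a nested object.
    return {"Service": service, "ServiceParameters": json.dumps(params)}


print(build_submit_params("http://www.aliyundoc.com/a.flv", data_id="videoId****"))
```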

Sample success responses

{
    "Message": "OK",
    "Code": 200,
    "Data": {
        "TaskId": "AAAAA-BBBBB",
        "DataId": "videoId****"
    },
    "RequestId": "ABCD1234-1234-1234-1234-123****"
}
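Store the returned TaskId, because you need it to query the result. A minimal Python sketch, assuming the response has already been parsed into a dict with the field names shown above; extract_task_id is a hypothetical helper, and treating only Code 200 as a successful submission is an assumption.

```python
def extract_task_id(response: dict) -> str:
    """Return Data.TaskId from a VideoModeration response, or raise on failure."""
    code = response.get("Code")
    if code != 200:
        # See Code description for the meaning of other values.
        raise RuntimeError(f"VideoModeration failed: Code={code}, "
                           f"Message={response.get('Message')}, "
                           f"RequestId={response.get('RequestId')}")
    return response["Data"]["TaskId"]


sample = {"Message": "OK", "Code": 200,
          "Data": {"TaskId": "AAAAA-BBBBB", "DataId": "videoId****"},
          "RequestId": "ABCD1234-1234-1234-1234-123****"}
print(extract_task_id(sample))  # AAAAA-BBBBB
```

Keeping the RequestId in the error message makes it easier to troubleshoot a failed submission.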

Obtain video file moderation task results

API description

API operation: VideoModerationResult. This operation retrieves the results of a video file moderation task.

Billing information: This API operation is not billable.
Query interval: We recommend that you query the result 30 seconds after you submit an asynchronous moderation task, and then at 30-second intervals until the task is complete. Results are deleted automatically after 24 hours.
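The recommended polling pattern can be sketched in Python as follows. query is a hypothetical callable that sends a VideoModerationResult request with a taskId in ServiceParameters and returns the parsed response; how you implement it (SDK or signed HTTP request) is up to you. Treating 280 and 288 as not-yet-finished states and giving up after one hour are assumptions; results expire after 24 hours regardless.

```python
import time
from typing import Callable

PENDING_CODES = {280, 288}  # assumption: in progress, or queued in nearline mode


def poll_result(query: Callable[[str], dict], task_id: str,
                interval_s: float = 30.0, max_wait_s: float = 3600.0,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll VideoModerationResult until the task finishes; return its Data object."""
    waited = 0.0
    while waited < max_wait_s:
        sleep(interval_s)  # first query 30 s after submission, then every 30 s
        waited += interval_s
        response = query(task_id)
        code = response.get("Code")
        if code == 200:
            return response["Data"]
        if code not in PENDING_CODES:
            raise RuntimeError(f"query failed: Code={code}, Message={response.get('Message')}")
    raise TimeoutError(f"task {task_id} not finished after {max_wait_s} s")
```

Injecting sleep makes the loop testable without waiting; in production, keep the default time.sleep.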

QPS limits

The QPS limit for this operation is 100 calls per second per account. If the number of calls per second exceeds this limit, throttling is triggered. This may affect your business. We recommend that you note this limit when you call this operation.

Debugging

Before integration, you can use Alibaba Cloud OpenAPI to debug the VideoModerationResult API online, view sample code and SDK dependency information, and review how to use the API and its parameters.

Request parameters

Name	Type	Required	Example	Description
Service	String	Yes	videoDetection_global	The moderation service type. It must be the same as the service type used to submit the moderation task.
ServiceParameters	JSONString	Yes		The parameters required by the moderation service, as a JSON string. For a description of each parameter, see Table 1 ServiceParameters.

Table 1 ServiceParameters

Name	Type	Required	Example	Description
taskId	String	Yes	abcd****	The ID of the moderation task to query. Specify one taskId per request. Note The taskId is returned in the response when you submit the moderation task.

Response parameters

Name	Type	Example	Description
RequestId	String	ABCD1234-1234-1234-1234-123****	The ID of this request, which is a unique identifier generated by Alibaba Cloud for the request and can be used for troubleshooting and locating issues.
Data	Object		Video content detection results. For more information, see Table 2 Data.
Code	Integer	200	The status code. For more information, see Code description.
Message	String	OK	Response message for this request.

Table 2. Data

Name	Type	Example	Description
DataId	String	videoId****	The data ID of the detected object. Note If the DataId parameter is specified in the request, its value is returned here.
TaskId	String	AAAAA-BBBBB-2024-0307	The ID of the detection task.
RiskLevel	String	high	The risk level of the video, determined by a comprehensive analysis of its frames and audio. Valid values: high: High risk medium: Medium risk low: Low risk none: No risk detected Note Handle high-risk content directly. Manually review medium-risk content. Process low-risk content only when a high recall rate is required. Otherwise, treat it as content with no detected risk. Configure video frame risk scores in the Content Moderation console.
FrameResult	JSONObject		The result of video frame detection, returned when the call succeeds (Code 200). For the structure of this object, see Table 3 FrameResult. Note In video stream moderation, Code 280 indicates that moderation is in progress and Code 200 indicates that it is complete. While moderation is in progress, the result covers the content from the start of moderation to the current time.
AudioResult	JSONObject		The result of video audio detection. For the structure of this object, see Table 8 audioResult.
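The handling advice in the RiskLevel note can be expressed as a small decision function. In this Python sketch the action names (block, review, pass) are hypothetical, sending low-risk content to review when high recall is required is one possible interpretation, and falling back to the FrameResult and AudioResult levels when Data.RiskLevel is absent (as in the first sample response below) is an assumption.

```python
RISK_ORDER = {"none": 0, "low": 1, "medium": 2, "high": 3}


def overall_risk(data: dict) -> str:
    """Data.RiskLevel if present, else the highest of the frame and audio levels."""
    if "RiskLevel" in data:
        return data["RiskLevel"]
    levels = [data.get(part, {}).get("RiskLevel", "none")
              for part in ("FrameResult", "AudioResult")]
    return max(levels, key=lambda level: RISK_ORDER.get(level, 0))


def decide(data: dict, high_recall: bool = False) -> str:
    """Map the risk level to a hypothetical action, following the RiskLevel note."""
    level = overall_risk(data)
    if level == "high":
        return "block"    # handle high-risk content directly
    if level == "medium":
        return "review"   # send medium-risk content to manual review
    if level == "low" and high_recall:
        return "review"   # act on low risk only when high recall is required
    return "pass"         # otherwise treat low risk like no risk
```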

Table 3 FrameResult

Name	Type	Example	Description
FrameNum	Integer	200	The number of video frames returned.
FrameSummarys	JSONArray		A summary of labels for the video frames. For a description of the structure, see Table 4 FrameSummary.
RiskLevel	String	high	The risk level of the video frames. This is calculated based on all video frames. The return value can be one of the following: high: High risk medium: Medium risk low: Low risk none: No risk detected
Frames	JSONArray		Information about video frames that contain hit labels. For a description of the structure, see Table 5 Frame.

Table 4 FrameSummary

Name	Type	Example	Description
Label	String	violent_explosion	Video snapshot label.
Description	String	Suspected to contain firework-type content elements	Description of the Label field. Important This field explains the Label field and is subject to change. Process results based on the Label field, not this field.
LabelSum	Integer	8	Number of times the label appears.

Table 5 Frame

Name	Type	Example	Description
TempUrl	String	http://www.aliyundoc.com/test.jpg	Temporary URL of the video frame. Valid for 30 minutes. Note If video evidence storage is enabled, the OSS URL of the stored video frame is returned.
Offset	Float	50.5	Timestamp of the video frame from the beginning of the video, in seconds.
RiskLevel	String	high	The risk level of the video frame. The value is based on the configured risk score thresholds. Valid values include the following: high: High risk medium: Medium risk low: Low risk none: No risk detected Note Handle high-risk content immediately. Manually review medium-risk content. Process low-risk content only when high recall is required. Otherwise, treat low-risk content the same as content with no detected risk. Configure video frame risk scores in the rule configuration of the Content Moderation console.
Results	JSONArray		The detection results for the video frame, including risk labels, confidence scores, and other parameters. For more information, see Table 6 Results.

Table 6 Results

Name	Type	Example	Description
Service	String	baselineCheck_global	The video frame moderation service that was called.
Result	Array		Results of video snapshot detection, including risk labels, confidence scores, and other parameters. For more information, see Table 7 Result.

Table 7 Result

Name	Type	Example	Description
Label	String	violent_explosion	The label returned for the video snapshot. A single snapshot can return multiple labels with their confidence scores. For the supported labels, see the label list for General baseline detection (baselineCheck_global).
Confidence	Float	81.22	The confidence score. The value ranges from 0 to 100 and is accurate to two decimal places.
Description	String	Suspected to contain firework-related content elements	A description of the Label field. Important This field explains the Label field and is subject to change. Process results based on the Label field, not this field.
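To locate risky moments in a video, you can walk Frames, then Results, then Result, and keep labels whose Confidence meets a threshold. This Python sketch is a hypothetical helper. It relies on the Label field rather than Description, as recommended above, skips the nonLabel placeholder that appears in the sample responses, and uses an arbitrary example threshold of 50.

```python
def frame_hits(frame_result: dict, min_confidence: float = 50.0) -> list:
    """Return (offset_s, service, label, confidence) tuples for matching labels."""
    hits = []
    for frame in frame_result.get("Frames", []):
        for service_result in frame.get("Results", []):
            for item in service_result.get("Result", []):
                label, confidence = item.get("Label"), item.get("Confidence")
                if label == "nonLabel" or confidence is None:
                    continue  # no risk on this frame, or no score to compare
                if confidence >= min_confidence:
                    hits.append((frame.get("Offset"), service_result.get("Service"),
                                 label, confidence))
    return hits


frames = {"Frames": [{"Offset": 2, "Results": [{"Service": "baselineCheck_global",
          "Result": [{"Confidence": 1, "Label": "sexual_cleavage"},
                     {"Confidence": 74.1, "Label": "violent_explosion"}]}]}]}
print(frame_hits(frames))  # [(2, 'baselineCheck_global', 'violent_explosion', 74.1)]
```

The Offset value can be used to jump to the flagged position during manual review.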

Table 8 audioResult

Name	Type	Example	Description
AudioSummarys	JSONArray		A summary of audio labels. For a description of the structure, see Table 9 AudioSummarys.
RiskLevel	String	high	The risk level of the audio. The value is calculated based on all audio segments. The following values can be returned: high: High risk medium: Medium risk low: Low risk none: No risk detected
SliceDetails	JSONArray		An array that contains the details of the text corresponding to the audio. Each element in the array represents a sentence. For a description of the structure, see Table 10 SliceDetails.

Table 9 AudioSummarys

Name	Type	Example	Description
Label	String	profanity	The speech label in the video.
LabelSum	Integer	8	The number of times the label appears.

Table 10 SliceDetails

Name	Type	Example	Description
StartTime	Integer	0	The start time of the sentence, in seconds.
EndTime	Integer	4065	The end time of the sentence, in seconds.
StartTimestamp	Integer	1678854649720	The start timestamp of the segment, in milliseconds.
EndTimestamp	Integer	1678854649720	The end timestamp of the segment, in milliseconds.
Text	String	Disgusting	The transcribed text from the audio.
Url	String	https://aliyundoc.com/test.wav	If the scanned content is an audio stream, this is the temporary URL for the corresponding audio segment. The URL is valid for 30 minutes. Save a copy of the audio before the URL expires.
Labels	String	political_content,xxxx	The matched labels. Multiple labels are separated by commas. Valid values: ad: advertisements; violence: violent and terrorist content; political_content: political content; specified_speaking: specific voices; specified_lyrics: specific songs; sexual_content: pornographic content; sexual_sounds: sexually suggestive sounds; contraband: contraband-related content; profanity: abusive content; religion: religious content; cyberbullying: cyberbullying content; negative_content: inappropriate content; nontalk: silent audio; C_customized: a match in a custom library.
RiskLevel	String	high	The risk level of the audio or video segment. Valid values: high: High risk medium: Medium risk low: Low risk none: No risk detected
RiskWords	String	AAA,BBB,CCC	The matched risk words. Multiple words are separated by commas.
RiskTips	String	pornography_vulgar_words,pornography_description	The detailed labels. Multiple labels are separated by commas.
Extend	String	{"riskTips":"pornography_vulgar_words","riskWords":"sexual_services"}	A reserved field.
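Labels, RiskWords, and RiskTips are comma-separated strings, so split them before use. The following Python sketch (audio_hits and split_csv are hypothetical names) keeps only sentences with at least one label. It also tries to decode Extend as JSON because the sample response below carries JSON there; since Extend is a reserved field, that format is an assumption.

```python
import json


def split_csv(value) -> list:
    """Split a comma-separated field such as Labels or RiskWords; empty gives []."""
    return [part for part in (value or "").split(",") if part]


def audio_hits(audio_result: dict) -> list:
    """Collect sentences that matched at least one audio label."""
    hits = []
    for piece in audio_result.get("SliceDetails", []):
        labels = split_csv(piece.get("Labels"))
        if not labels:
            continue  # sentence with no matched label
        try:
            extend = json.loads(piece.get("Extend") or "{}")
        except json.JSONDecodeError:
            extend = {}  # Extend is reserved; its format may change
        hits.append({
            "start_s": piece.get("StartTime"),
            "end_s": piece.get("EndTime"),
            "labels": labels,
            "risk_words": split_csv(piece.get("RiskWords")),
            "text": piece.get("Text"),
            "extend": extend,
        })
    return hits
```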

Examples

Request example

{
    "Service": "videoDetection_global",
    "ServiceParameters": {
        "taskId": "abcd****"
    }
}

Sample success responses

Video frame detection only

{
    "Code": 200,
    "RequestId": "25106421-XXXX-XXXX-XXXX-15DA5AAAC546",
    "Message": "success finished",
    "Data": {
        "DataId": "ABCDEF-TESTDATAID",
        "TaskId": "AAAAA-BBBBB-2024-0307-0728",
        "FrameResult": {
            "FrameNum": 2,
            "FrameSummarys": [
                {
                    "Label": "violent_explosion",
                    "LabelSum": 8
                },
                {
                    "Label": "sexual_cleavage",
                    "LabelSum": 5
                }
            ],
            "Frames": [
                {
                    "Offset": 1,
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Label": "nonLabel"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test1.jpg"
                },
                {
                    "Offset": 2,
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Confidence": 1,
                                    "Label": "sexual_cleavage"
                                },
                                {
                                    "Confidence": 74.1,
                                    "Label": "violent_explosion"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test2.jpg"
                }
            ]
        }
    }
}

Video frame and audio detection

{
    "Code": 200,
    "RequestId": "25106421-XXXX-XXXX-XXXX-15DA5AAAC546",
    "Message": "success finished",
    "Data": {
        "DataId": "ABCDEF-TESTDATAID",
        "TaskId": "AAAAA-BBBBB-2024-0307-0728",
        "RiskLevel": "medium",
        "AudioResult": {
            "AudioSummarys": [
                {
                    "Label": "sexual_sounds",
                    "LabelSum": 3
                }
            ],
            "RiskLevel": "high",
            "SliceDetails": [
                {
                    "EndTime": 60,
                    "EndTimestamp": 1698912813192,
                    "Labels": "",
                    "RiskLevel": "none",
                    "StartTime": 30,
                    "StartTimestamp": 1698912783192,
                    "Text": "Content Moderation",
                    "Url": "http://abc.oss-ap-southeast-1.aliyuncs.com/test.wav"
                },
                {
                    "EndTime": 30,
                    "EndTimestamp": 1698912813192,
                    "Extend": "{\"customizedWords\":\"service\",\"customizedLibs\":\"test\"}",
                    "Labels": "C_customized",
                    "RiskLevel": "high",
                    "StartTime": 0,
                    "StartTimestamp": 1698912783192,
                    "Text": "Welcome to Alibaba Cloud Content Moderation service",
                    "Url": "http://abc.oss-ap-southeast-1.aliyuncs.com/test.wav"
                }
            ]
        },
        "FrameResult": {
            "FrameNum": 2,
            "FrameSummarys": [
                {
                    "Label": "violent_explosion",
                    "Description": "Suspected to contain firework-type content elements",
                    "LabelSum": 8
                },
                {
                    "Label": "sexual_cleavage",
                    "Description": "Suspected to contain body exposure or sexually suggestive content",
                    "LabelSum": 8
                }
            ],
            "RiskLevel": "medium",
            "Frames": [
                {
                    "Offset": 1,
                    "RiskLevel": "none",
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Label": "nonLabel",
                                    "Description": "No risk detected"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test1.jpg"
                },
                {
                    "Offset": 2,
                    "RiskLevel": "medium",
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Confidence": 1,
                                    "Label": "sexual_cleavage",
                                    "Description": "Suspected to contain body exposure or sexually suggestive content"
                                },
                                {
                                    "Confidence": 74.1,
                                    "Label": "violent_explosion",
                                    "Description": "Suspected to contain firework-type content elements"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test2.jpg"
                }
            ]
        }
    }
}

Code description

The following table describes the meanings of the Code values returned by the Video File Moderation 2.0 API. You are billed only for requests that return a Code value of 200 or 280. Requests that return other Code values are not billed.

Code	Description
200	The request is successful or the detection is complete.
280	Detection is in progress.
288	The task is queued for processing in nearline (offline) mode.
400	The request parameters are empty.
401	The request parameters are invalid.
402	The length of a request parameter does not meet the API requirements. Check and modify the parameter.
403	The number of requests per second (QPS) exceeds the limit. Reduce the number of concurrent requests.
404	An error occurred while downloading the video. Check the video URL and retry the request.
405	The video download timed out, possibly because the video is inaccessible. Check the video URL and retry the request.
406	The video file is too large. Reduce the video size and retry the request.
407	The video format is not supported. Use a supported format and retry the request.
408	The account does not have permission to call this API. This can occur if the service is not activated, the account has an overdue payment, or the account is not authorized.
409	The specified TaskId does not exist. The task result may have expired because it is more than 24 hours old.
480	The number of concurrent moderation tasks exceeds the limit. Reduce the number of concurrent requests.
500	A system error occurred.
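Client code usually branches on these values. The following Python sketch groups them into hypothetical next steps and encodes the billing rule above. Retrying 403, 480, and 500 after a delay is an assumption based on the descriptions in the table, not a documented retry policy.

```python
BILLED_CODES = {200, 280}  # only these Code values are billed


def classify(code: int) -> str:
    """Map a Code value to a suggested next step (hypothetical categories)."""
    if code == 200:
        return "done"
    if code in (280, 288):
        return "pending"        # in progress, or queued in nearline mode
    if code in (403, 480, 500):
        return "retry_later"    # throttling, concurrency limit, or system error
    if code == 409:
        return "resubmit"       # task ID unknown or result older than 24 hours
    if code == 408:
        return "check_account"  # service not activated, overdue, or unauthorized
    return "fix_request"        # 400-407: parameters or video need to change


def is_billed(code: int) -> bool:
    return code in BILLED_CODES
```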