Video File Moderation 2.0 helps you detect risky or prohibited content in video files. This topic describes how to use API operations to perform video file moderation and detect AI-generated content (AIGC).
Integration guide
Register an Alibaba Cloud account and follow the instructions to complete the registration: Register now.
Activate the pay-as-you-go billing method for Content Moderation. For more information, see Activate the service. Activation is free. After you use the API operations, you are automatically billed based on your usage. For more information, see Billing.
Create an AccessKey using Resource Access Management (RAM). For more information, see Create an AccessKey. If you use the AccessKey of a RAM user, you must use your Alibaba Cloud account to grant the AliyunYundunGreenWebFullAccess permission to the RAM user. For more information, see RAM authorization.
Develop and integrate the service. We recommend that you use an SDK to call the API operations. For more information, see Video Moderation 2.0 SDK and integration guide.
The video file moderation service includes the following two API operations:
VideoModeration: submits a video file moderation task.
VideoModerationResult: retrieves the result of a video file moderation task.
Submit a moderation task
API description
API operation: VideoModeration. This operation supports only asynchronous detection for videos.
Supported regions and endpoints:
Region | Public endpoint | VPC endpoint | Supported services |
Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com | videoDetection_global, videoDetectionByVL_global |
US (Virginia) | https://green-cip.us-east-1.aliyuncs.com | https://green-cip-vpc.us-east-1.aliyuncs.com | videoDetection_global |
US (Silicon Valley) | https://green-cip.us-west-1.aliyuncs.com | Not available | videoDetection_global |
Germany (Frankfurt) | green-cip.eu-central-1.aliyuncs.com | Not available | videoDetection_global |
Billing information:
This API operation is billable. You are charged based on the video frame and audio detection policies that you set. For video frames, you can select multiple services. You are charged based on the number of frames multiplied by the unit price of each service. If you also detect violations in the audio content, an additional fee is charged based on the video duration multiplied by the unit price of the audio violation feature. For more information about billing methods, see Billing.
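The billing rule above can be sketched as simple arithmetic. The following is only an illustration: the unit prices are hypothetical placeholders (see Billing for real prices), and the sketch assumes that the frame count equals the video duration multiplied by the snapshot frequency and that the audio unit price is quoted per second.

```python
def estimate_cost(duration_s, frame_prices, audio_price_per_s=None, fps=1.0):
    """Estimate the charge for one video.

    frame_prices: hypothetical unit price per frame, keyed by frame service.
    audio_price_per_s: hypothetical audio unit price per second of video,
        or None when audio detection is disabled.
    fps: snapshot frequency in frames per second (assumed to drive frame count).
    """
    frames = int(duration_s * fps)  # number of snapshots taken from the video
    # Each selected frame service is charged for every frame.
    frame_cost = sum(frames * price for price in frame_prices.values())
    # Audio detection, if enabled, is charged on the video duration.
    audio_cost = duration_s * audio_price_per_s if audio_price_per_s is not None else 0.0
    return frame_cost + audio_cost
```

For example, `estimate_cost(60, {"baselineCheck_global": 0.001}, 0.0005)` adds the cost of 60 frames to the cost of 60 seconds of audio.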
Detection objects: Video files.
Return values: Asynchronous detection tasks do not return detection results in real time. You must retrieve detection results using callbacks or polling. Detection results are retained for up to 24 hours.
Retrieve detection results using callbacks: When you submit an asynchronous detection task, include the callback parameter in the request to automatically receive detection results.
Retrieve detection results using polling: When you submit an asynchronous detection task, you do not need to include the callback parameter. After you submit the task, call the result query API operation to retrieve the detection results.
Video requirements:
Video file URLs support the HTTP and HTTPS protocols.
The following video file formats are supported: AVI, FLV, MP4, MPG, ASF, WMV, MOV, WMA, RMVB, RM, FLASH, TS, and M3U8.
Video size limit: By default, a single video cannot exceed 500 MB. If your video exceeds 500 MB, you can split the video into segments. You can also contact your account manager to increase the size limit.
The time required for video file detection depends on the download time of the video. Ensure that the storage service where the video file is located is stable and reliable. We recommend that you use Alibaba Cloud Object Storage Service (OSS) to store video files.
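As a minimal sketch, the video requirements above could be checked on the client before a task is submitted. The function name is hypothetical, the format is guessed from the file extension in the URL (an assumption, because the service may inspect the file itself), and 500 MB is assumed to mean 500 × 1024 × 1024 bytes.

```python
from urllib.parse import urlparse

SUPPORTED_FORMATS = {
    "avi", "flv", "mp4", "mpg", "asf", "wmv", "mov",
    "wma", "rmvb", "rm", "flash", "ts", "m3u8",
}
MAX_SIZE_BYTES = 500 * 1024 * 1024  # default limit; assumes 1 MB = 1024 * 1024 bytes


def check_video(url, size_bytes):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must use HTTP or HTTPS")
    # Guess the format from the extension of the URL path.
    ext = parsed.path.rsplit(".", 1)[-1].lower() if "." in parsed.path else ""
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {ext or 'unknown'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file exceeds 500 MB")
    return problems
```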
Detection rule configuration:
The first time you call this operation, you must configure video moderation rules in the Content Moderation console.
Note: In the console, you can configure settings such as the snapshot method, snapshot frequency, image moderation rules, audio moderation rules, and the scope of the results to return. For more information, see Console User Guide.
If you do not configure any settings, the default configurations for the Video Moderation 2.0 API are as follows:
Service | Default configurations |
Video file detection (videoDetection_global) | Video snapshot frequency: 1 frame per second. Video frame detection service: General baseline detection (baselineCheck_global). Video audio detection: Enabled. Video audio detection service: Multilingual audio and video media detection (audio_multilingual_global). Result return method: Returns only results with detected threats. |
Video file detection (outside China) (videoDetectionByVL_global). Note: Currently active only in the Singapore region. The large model version is limited to 10 concurrent ingest endpoints. | Video snapshot frequency: 1 frame per second. Video frame detection service: Image moderation service with large and small model fusion (postImageCheckByVL_global). Video audio detection: Enabled. Video audio detection service: Multilingual audio and video media detection (audio_multilingual_global). Result return method: Returns only results with detected threats. |
QPS limits
The queries per second (QPS) limit for this API operation is 100 for a single user, and at most 50 moderation tasks can be processed at the same time. To increase the concurrent task limit, contact your account manager. Requests that exceed either limit are throttled, which may affect your business, so take these limits into account when you call this operation.
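One way to stay under the QPS limit is to throttle on the client side. The class below is a hypothetical sliding-window limiter, not part of any SDK; it only counts calls made by one process, so the 100-call default is an upper bound to tune down when several processes share an account.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any window of `period` seconds."""

    def __init__(self, max_calls=100, period=1.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of calls inside the current window

    def try_acquire(self):
        """Return True and record the call if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

A caller would invoke `try_acquire()` before each VideoModeration request and wait briefly when it returns `False`.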
Debugging
Before integration, you can use Alibaba Cloud OpenAPI to debug the VideoModeration API online, view sample code and SDK dependency information, and review the API's usage and parameters.
The online debugging feature calls the Content Moderation API using your current account. These calls are included in your billable usage.
Request parameters
Name | Type | Required | Example | Description |
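Service | String | Yes | videoDetection_global | The type of moderation service. Valid values: videoDetection_global (video file detection) and videoDetectionByVL_global (video file detection (outside China), currently active only in the Singapore region). |
ServiceParameters | JSONString | Yes | | The parameters required by the moderation service. This is a JSON string. For a description of each parameter, see Table 1 ServiceParameters. |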
Table 1 ServiceParameters
Name | Type | Required | Example | Description |
url | String | Yes. Enhanced Video Moderation supports three methods to provide video files. Choose one of the following methods: | http://www.aliyundoc.com/a.flv | The URL of the object that you want to moderate. Make sure that the URL is accessible over the public network, or provide an OSS internal network address in the same region. Note: The URL cannot contain Chinese characters and can be up to 2,048 characters in length. Provide only one URL per request. |
ossBucketName | String | See url | bucket_01 | The name of the authorized OSS bucket. Note: Before you use an internal OSS address for a video, use your Alibaba Cloud account to go to the Cloud Resource Access Authorization page and grant the required permissions. |
ossObjectName | String | See url | 20240307/07/28/test.flv | The name of the file in the authorized OSS bucket. |
ossRegionId | String | See url | cn-shanghai | The region where the OSS bucket is located. |
callback | String | No | http://www.aliyundoc.com | The URL that receives notifications about the moderation results. The URL can use the HTTP or HTTPS protocol. If you leave this parameter empty, you must periodically poll for the moderation results. The callback endpoint must support the POST method, UTF-8 encoded data, and the form parameters checksum and content. Content Moderation sets the checksum and content parameters based on the following rules and formats, and then calls your callback endpoint to return the moderation results.
Note After your server-side callback endpoint receives the result from Content Moderation, an HTTP status code of 200 indicates that the result is received. Any other HTTP status code indicates a failure. If the receipt fails, Content Moderation retries to send the result up to 16 times. If the receipt still fails after 16 retries, Content Moderation stops sending the result. Check the status of your callback endpoint. |
seed | String | No | abc**** | A random string. This value is used for the signature in the callback notification request. The string can contain letters, digits, and underscores (_), and can be up to 64 characters in length. You can customize this value to verify that the callback notification is sent by Content Moderation. Note If you use the callback parameter, you must specify this parameter. |
cryptType | String | No | SHA256 | If you use callback notifications, set the algorithm to sign the notification content. Content Moderation calculates a signature for the result string (user UID + seed + content) based on the specified encryption algorithm. Then, Content Moderation sends the signature to your callback URL. Valid values:
|
dataId | String | No | videoId**** | The data ID of the object that you want to moderate. The ID can contain uppercase letters, lowercase letters, digits, underscores (_), hyphens (-), and periods (.). The ID can be up to 128 characters in length. You can use this ID to uniquely identify your business data. |
offline | String | No | false | Specifies whether to use the offline moderation mode.
Important This parameter is of the String type. The offline moderation mode is supported in the China (Beijing), China (Shanghai), and China (Hangzhou) regions. |
referer | String | No | www.aliyun.com | The Referer request header. This parameter is used for scenarios such as hotlink protection. The value can be up to 256 characters in length. |
When your server-side callback interface receives results from Content Moderation, an HTTP status code of 200 indicates that the results were successfully accepted. Any other HTTP status code indicates a failure. If acceptance fails, Content Moderation retries sending the detection results up to 16 times. If the results are still not accepted after 16 retries, Content Moderation stops sending them. You must check the status of your callback interface.
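The callback signature check described for the seed and cryptType parameters might look like the sketch below. It assumes cryptType is SHA256, that the three strings are concatenated without separators in the order user UID + seed + content, and that the checksum form parameter carries a hex-encoded digest; the encoding in particular is an assumption, so verify it against a real callback before relying on it.

```python
import hashlib
import hmac


def verify_callback(uid, seed, content, checksum):
    """Return True if checksum matches SHA256(uid + seed + content).

    uid: the Alibaba Cloud account UID (string).
    seed: the seed value sent with the moderation request.
    content: the raw `content` form parameter from the callback.
    checksum: the `checksum` form parameter (assumed hex-encoded).
    """
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, checksum.lower())
```

A callback handler would respond with HTTP status code 200 only after the check passes and the result is stored, because any other status code causes Content Moderation to retry.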
Response parameters
Name | Type | Example | Description |
Code | Integer | 200 | The status code. For more information, see Code description. |
Data | JSONObject | | The moderation result data. Contains the TaskId and DataId fields. |
TaskId | String | AAAAA-BBBBB | The ID of the detection task. |
DataId | String | dataId0307 | The data ID. |
Message | String | OK | The response message for the request. |
RequestId | String | ABCD1234-1234-1234-1234-123**** | The request ID. |
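As a rough sketch, the request parameters in Table 1 and the response fields above can be handled with two small helpers. Both function names are hypothetical and no SDK call is made; the sketch only shows that ServiceParameters is serialized to a JSON string, applies the documented length and character rules, and reads TaskId from a successful response.

```python
import json
import re

DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,128}")  # dataId rule from Table 1
SEED_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")         # seed rule from Table 1


def build_request(service, url, data_id=None, callback=None, seed=None):
    """Build the Service and ServiceParameters fields for VideoModeration."""
    if len(url) > 2048:
        raise ValueError("url can be up to 2,048 characters in length")
    params = {"url": url}
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("invalid dataId")
        params["dataId"] = data_id
    if callback is not None:
        # seed is required whenever callback is used.
        if seed is None or not SEED_PATTERN.fullmatch(seed):
            raise ValueError("a valid seed is required with callback")
        params.update({"callback": callback, "seed": seed, "cryptType": "SHA256"})
    # ServiceParameters is sent as a JSON string, not as a nested object.
    return {"Service": service, "ServiceParameters": json.dumps(params)}


def extract_task_id(response):
    """Return Data.TaskId from a successful VideoModeration response."""
    if response.get("Code") != 200:
        raise RuntimeError(f"submit failed: {response.get('Code')} {response.get('Message')}")
    return response["Data"]["TaskId"]
```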
Examples
Query example
{
"Service": "videoDetection_global",
"ServiceParameters": {
"url": "http://www.aliyundoc.com/a.flv",
"dataId": "videoId****"
}
}
Sample success response
{
"Message": "OK",
"Code": 200,
"Data": {
"TaskId": "AAAAA-BBBBB",
"DataId": "videoId****"
},
"RequestId": "ABCD1234-1234-1234-1234-123****"
}
Obtain video file moderation task results
API description
API operation: VideoModerationResult. This operation retrieves the results of a video file moderation task.
Billing information: This API operation is not billable.
Query timing: The recommended query interval is 30 seconds. Query the results 30 seconds after you submit the asynchronous detection task. The results are automatically deleted after 24 hours.
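A polling loop that follows the 30-second interval could be sketched as follows. The `fetch` argument is a hypothetical callable that wraps VideoModerationResult and returns the parsed response as a dictionary. Treating code 288 as "still pending" alongside 280 is an assumption based on the Code description table, and the attempt limit is arbitrary.

```python
import time

PENDING_CODES = {280, 288}  # in progress, or queued (288 treated as pending by assumption)


def poll_result(fetch, task_id, interval=30, max_attempts=20, sleep=time.sleep):
    """Poll until the task completes and return the Data object.

    fetch: hypothetical callable taking a taskId and returning the response dict.
    sleep: injectable so the loop can be tested without waiting.
    """
    for _ in range(max_attempts):
        sleep(interval)  # wait before every query, including the first one
        response = fetch(task_id)
        code = int(response["Code"])
        if code == 200:
            return response["Data"]
        if code not in PENDING_CODES:
            raise RuntimeError(f"moderation failed: {code} {response.get('Message')}")
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} queries")
```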
QPS limits
The QPS limit for this operation is 100 calls per second per account. If the number of calls per second exceeds this limit, throttling is triggered. This may affect your business. We recommend that you note this limit when you call this operation.
Debugging
Before integration, you can use Alibaba Cloud OpenAPI to debug the VideoModerationResult API online, view sample code and SDK dependency information, and review how to use the API and its parameters.
Request parameters
Name | Type | Required | Example | Description |
Service | String | Yes | videoDetection_global | The moderation service type. This must be the same as the service type used to submit the moderation job. |
ServiceParameters | JSONString | Yes | | The parameters required by the moderation service. The value is a JSON-formatted string. For a description of each parameter, see Table 1 ServiceParameters. |
Table 1 ServiceParameters
Name | Type | Required | Example | Description |
taskId | String | Yes | abcd**** | The ID of the detection task to query. Each request accepts one taskId. Note: After you submit a detection task, you can obtain its taskId from the returned data. |
Response parameters
Name | Type | Example | Description |
RequestId | String | ABCD1234-1234-1234-1234-123**** | The ID of this request, which is a unique identifier generated by Alibaba Cloud for the request and can be used for troubleshooting and locating issues. |
Data | Object | | The video content detection results. For more information, see Table 2 Data. |
Code | Integer | 200 | The status code. For more information, see Code description. |
Message | String | OK | Response message for this request. |
Table 2. Data
Name | Type | Example | Description |
DataId | String | videoId**** | The data ID of the detected object. Note If the DataId parameter is specified in the request, its value is returned here. |
TaskId | String | AAAAA-BBBBB-2024*-0307* | The ID of the detection task. |
RiskLevel | String | high | The risk level of the video, determined by a comprehensive analysis of its frames and audio. Valid values:
Note Handle high-risk content directly. Manually review medium-risk content. Process low-risk content only when a high recall rate is required. Otherwise, treat it as content with no detected risk. Configure video frame risk scores in the Content Moderation console. |
FrameResult | JSONObject | | The result of the video frame detection. If the call is successful (code=200), the response contains a struct. For more information about the struct, see Table 3 FrameResult. Note: In a video stream detection scenario, a return code of 280 indicates that the detection is in progress, and 200 indicates that the detection is complete. If the detection is in progress, the detection result includes the data from the start of the detection to the current time. |
AudioResult | JSONObject | | The result of the video audio detection. The response contains a struct. For more information about the struct, see Table 8 audioResult. |
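The handling advice in the RiskLevel note can be expressed as a small routing function. This is only a sketch: the action names are hypothetical, "process low-risk content" is interpreted here as sending it to manual review, and the value set (high, medium, low, none) is taken from the note and the sample responses.

```python
def decide(risk_level, high_recall=False):
    """Map a RiskLevel value to a hypothetical handling action."""
    if risk_level == "high":
        return "block"    # handle high-risk content directly
    if risk_level == "medium":
        return "review"   # manually review medium-risk content
    if risk_level == "low":
        # Low risk is acted on only when a high recall rate is required.
        return "review" if high_recall else "pass"
    return "pass"         # "none": no risk detected
```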
Table 3 FrameResult
Name | Type | Example | Description |
FrameNum | Integer | 200 | The number of video frames returned. |
FrameSummarys | JSONArray | | A summary of labels for the video frames. For a description of the structure, see Table 4 FrameSummary. |
RiskLevel | String | high | The risk level of the video frames. This is calculated based on all video frames. The return value can be one of the following:
|
Frames | JSONArray | | Information about video frames that contain hit labels. For a description of the structure, see Table 5 Frame. |
Table 4 FrameSummary
Name | Type | Example | Description |
Label | String | violent_armedForces | Video snapshot label. |
Description | String | Suspected to contain firework-type content elements | Description of the Label field. Important This field explains the Label field and is subject to change. Process results based on the Label field, not this field. |
LabelSum | Integer | 8 | Number of times the label appears. |
Table 5 Frame
Name | Type | Example | Description |
TempUrl | String | http://www.aliyundoc.com/test.jpg | Temporary URL of the video frame. Valid for 30 minutes. Note If video evidence storage is enabled, the OSS URL of the stored video frame is returned. |
Offset | Float | 50.5 | Timestamp of the video frame from the beginning of the video, in seconds. |
RiskLevel | String | high | The risk level of the video frame. The value is based on the configured risk score thresholds. Valid values include the following:
Note Handle high-risk content immediately. Manually review medium-risk content. Process low-risk content only when high recall is required. Otherwise, treat low-risk content the same as content with no detected risk. Configure video frame risk scores in the rule configuration of the Content Moderation console. |
Results | JSONArray | | The detection results for the video frame, including risk labels, confidence scores, and other parameters. For more information, see Table 6 Results. |
Table 6 Results
Name | Type | Example | Description |
Service | String | baselineCheck_global | The video frame detection service that was called. |
Result | Array | | The results of video snapshot detection, including risk labels, confidence scores, and other parameters. For more information, see Table 7 Result. |
Table 7 Result
Name | Type | Example | Description |
Label | String | violent_explosion | The label returned after a video snapshot is processed. A single snapshot can return multiple labels and scores. The supported labels are as follows: |
Confidence | Float | 81.22 | The confidence score. The value ranges from 0 to 100 and is accurate to two decimal places. |
Description | String | Suspected to contain firework-related content elements | A description of the Label field. Important This field explains the Label field and is subject to change. Process results based on the Label field, not this field. |
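To show how Tables 3 through 7 nest, the sketch below walks FrameResult.Frames, then each frame's Results, then each service's Result, and collects the labels at or above a confidence threshold. The function name and the threshold default are hypothetical, and skipping the nonLabel value relies on the sample responses, where it marks a frame with no detected risk.

```python
def risky_frames(frame_result, min_confidence=50.0):
    """Return (offset, service, label, confidence) tuples for risky labels."""
    hits = []
    for frame in frame_result.get("Frames", []):                # Table 5 Frame
        for service_result in frame.get("Results", []):         # Table 6 Results
            for item in service_result.get("Result", []):       # Table 7 Result
                label = item.get("Label")
                confidence = item.get("Confidence", 0.0)
                if label != "nonLabel" and confidence >= min_confidence:
                    hits.append((frame.get("Offset"), service_result.get("Service"),
                                 label, confidence))
    return hits
```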
Table 8 audioResult
Name | Type | Example | Description |
AudioSummarys | JSONArray | | A summary of audio labels. For a description of the structure, see Table 9 AudioSummarys. |
RiskLevel | String | high | The risk level of the audio. The value is calculated based on all audio segments. The following values can be returned:
|
SliceDetails | JSONArray | | An array that contains the details of the text corresponding to the audio. Each element in the array represents a sentence. For a description of the structure, see Table 10 SliceDetails. |
Table 9 AudioSummarys
Name | Type | Example | Description |
Label | String | profanity | The speech label in the video. |
LabelSum | Integer | 8 | The number of times the label appears. |
Table 10 SliceDetails
Name | Type | Example | Description |
StartTime | Integer | 0 | The start time of the sentence, in seconds. |
EndTime | Integer | 4065 | The end time of the sentence, in seconds. |
StartTimestamp | Integer | 1678854649720 | The start timestamp of the segment, in milliseconds. |
EndTimestamp | Integer | 1678854649720 | The end timestamp of the segment, in milliseconds. |
Text | String | Disgusting | The transcribed text from the audio. |
Url | String | https://aliyundoc.com/test.wav | If the scanned content is an audio stream, this is the temporary URL for the corresponding audio segment. The URL is valid for 30 minutes. Save a copy of the audio before the URL expires. |
Labels | String | political_content,xxxx | The details of the labels. Multiple labels are separated by commas. The labels include the following:
|
RiskLevel | String | high | The risk level of the audio or video segment. Valid values:
|
RiskWords | String | AAA,BBB,CCC | The matched risk words. Multiple words are separated by commas. |
RiskTips | String | pornography_vulgar_words,pornography_description | The detailed labels. Multiple labels are separated by commas. |
Extend | String | {"riskTips":"pornography_vulgar_words","riskWords":"sexual_services"} | A reserved field. |
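Several SliceDetails fields are packed strings: Labels, RiskWords, and RiskTips are comma-separated, and Extend holds JSON text. A minimal sketch of unpacking one element follows; the function name and the output keys are hypothetical.

```python
import json


def parse_slice(slice_detail):
    """Unpack the comma-separated and JSON-encoded fields of one SliceDetails element."""
    def split(value):
        # An empty or missing string yields an empty list.
        return [part for part in (value or "").split(",") if part]

    extend_raw = slice_detail.get("Extend")
    return {
        "start": slice_detail.get("StartTime"),
        "end": slice_detail.get("EndTime"),
        "risk_level": slice_detail.get("RiskLevel"),
        "text": slice_detail.get("Text"),
        "labels": split(slice_detail.get("Labels")),
        "risk_words": split(slice_detail.get("RiskWords")),
        "risk_tips": split(slice_detail.get("RiskTips")),
        # Extend is a reserved field that carries a JSON string when present.
        "extend": json.loads(extend_raw) if extend_raw else {},
    }
```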
Examples
Query example
{
"Service": "videoDetection_global",
"ServiceParameters": {
"taskId": "abcd****"
}
}
Sample success responses
Video frame detection only
{
"Code": 200,
"RequestId": "25106421-XXXX-XXXX-XXXX-15DA5AAAC546",
"Message": "success finished",
"Data": {
"DataId": "ABCDEF-TESTDATAID",
"TaskId": "AAAAA-BBBBB-2024-0307-0728",
"FrameResult": {
"FrameNum": 2,
"FrameSummarys": [
{
"Label": "violent_explosion",
"LabelSum": 8
},
{
"Label": "sexual_cleavage",
"LabelSum": 5
}
],
"Frames": [
{
"Offset": 1,
"Results": [
{
"Result": [
{
"Label": "nonLabel"
}
],
"Service": "baselineCheck_global"
}
],
"TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test1.jpg"
},
{
"Offset": 2,
"Results": [
{
"Result": [
{
"Confidence": 1,
"Label": "sexual_cleavage"
},
{
"Confidence": 74.1,
"Label": "violent_explosion"
}
],
"Service": "baselineCheck_global"
}
],
"TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test2.jpg"
}
]
}
}
}
Video frame and audio detection
{
"Code": 200,
"RequestId": "25106421-XXXX-XXXX-XXXX-15DA5AAAC546",
"Message": "success finished",
"Data": {
"DataId": "ABCDEF-TESTDATAID",
"TaskId": "AAAAA-BBBBB-2024-0307-0728",
"RiskLevel": "medium",
"AudioResult": {
"AudioSummarys": [
{
"Label": "sexual_sounds",
"LabelSum": 3
}
],
"RiskLevel": "high",
"SliceDetails": [
{
"EndTime": 60,
"EndTimestamp": 1698912813192,
"Labels": "",
"RiskLevel": "none",
"StartTime": 30,
"StartTimestamp": 1698912783192,
"Text": "Content Moderation",
"Url": "http://abc.oss-ap-southeast-1.aliyuncs.com/test.wav"
},
{
"EndTime": 30,
"EndTimestamp": 1698912813192,
"Extend": "{\"customizedWords\":\"service\",\"customizedLibs\":\"test\"}",
"Labels": "C_customized",
"RiskLevel": "high",
"StartTime": 0,
"StartTimestamp": 1698912783192,
"Text": "Welcome to Alibaba Cloud Content Moderation service",
"Url": "http://abc.oss-ap-southeast-1.aliyuncs.com/test.wav"
}
]
},
"FrameResult": {
"FrameNum": 2,
"FrameSummarys": [
{
"Label": "violent_explosion",
"Description": "Suspected to contain firework-type content elements",
"LabelSum": 8
},
{
"Label": "sexual_cleavage",
"Description": "Suspected to contain body exposure or sexually suggestive content",
"LabelSum": 8
}
],
"RiskLevel": "medium",
"Frames": [
{
"Offset": 1,
"RiskLevel": "none",
"Results": [
{
"Result": [
{
"Label": "nonLabel",
"Description": "No risk detected"
}
],
"Service": "baselineCheck_global"
}
],
"TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test1.jpg"
},
{
"Offset": 2,
"RiskLevel": "medium",
"Results": [
{
"Result": [
{
"Confidence": 1,
"Label": "sexual_cleavage",
"Description": "Suspected to contain body exposure or sexually suggestive content"
},
{
"Confidence": 74.1,
"Label": "violent_explosion",
"Description": "Suspected to contain firework-type content elements"
}
],
"Service": "baselineCheck_global"
}
],
"TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test2.jpg"
}
]
}
}
}
Code description
The following table describes the meanings of the Code values returned by the Video File Moderation 2.0 API. You are billed only for requests that return a Code value of 200 or 280. Requests that return other Code values are not billed.
Code | Description |
200 | The request is successful or the detection is complete. |
280 | Detection is in progress. |
288 | Queued for processing in nearline mode. |
400 | The request parameters are empty. |
401 | The request parameters are invalid. |
402 | The length of a request parameter does not meet the API requirements. Check and modify the parameter. |
403 | The number of requests per second (QPS) exceeds the limit. Reduce the number of concurrent requests. |
404 | An error occurred while downloading the video. Check the video URL and retry the request. |
405 | The video download timed out, possibly because the video is inaccessible. Check the video URL and retry the request. |
406 | The video file is too large. Reduce the video size and retry the request. |
407 | The video format is not supported. Use a supported format and retry the request. |
408 | The account does not have permission to call this API. This can occur if the service is not activated, the account has an overdue payment, or the account is not authorized. |
409 | The specified TaskId does not exist. The task result may have expired because it is more than 24 hours old. |
480 | The number of concurrent detection ingest endpoints exceeds the limit. Reduce the number of concurrent requests. |
500 | A system error occurred. |
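A client might group these codes by the action they call for. The grouping and the category names below are an illustrative assumption drawn from the descriptions in the table, not an official classification.

```python
def classify_code(code):
    """Group a returned Code value into a hypothetical handling category."""
    code = int(code)
    if code == 200:
        return "done"
    if code in (280, 288):
        return "pending"         # in progress or queued: query again later
    if code in (403, 480):
        return "throttled"       # QPS or concurrency limit: back off and retry
    if code in (404, 405):
        return "download_error"  # check the video URL, then retry
    if code == 500:
        return "system_error"
    return "client_error"        # fix the parameters, file, permissions, or taskId
```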