The Document Moderation 2.0 API detects risks and violations in common documents. This topic describes the operations that you can call for Document Moderation 2.0.
Access guidelines
Register an Alibaba Cloud account: Register now and complete the account registration following the provided instructions.
Activate the pay-as-you-go billing method for Content Moderation: Make sure that the Content Moderation 2.0 service is activated. For more information, see Activate service. Activation is free of charge. After you call API operations, you are billed automatically based on your usage.
Create an AccessKey pair: Make sure that you have created an AccessKey pair as a Resource Access Management (RAM) user. For more information, see Create AccessKey. If you want to use the AccessKey pair of the RAM user, use your Alibaba Cloud account to grant the AliyunYundunGreenWebFullAccess permission to the RAM user. For more information, see RAM authorization.
Use SDKs: We recommend that you use SDKs to call the API. For more information, see Document Moderation 2.0 SDK and integration guide.
Submit a moderation task
Usage notes
Business operation: FileModeration. Document Moderation provides only an asynchronous moderation operation.
Supported regions and access addresses:
Region | Public network access address | Internal network access address | Supported services |
Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com | document_detection_global |
Billing information: This operation is billed based on the number of pages processed in the document.
Moderation object: Common documents are supported for moderation.
Obtaining results: Asynchronous moderation requests do not return moderation results in real time. To obtain the results, poll for them periodically or enable callback notification. The moderation results are retained for up to 24 hours.
Enable callback notification to obtain moderation results: When you submit an asynchronous moderation task, specify a callback URL for receiving moderation results in the callback parameter of the moderation request.
Poll moderation results: You do not need to set the callback parameter when you submit asynchronous moderation tasks. After you submit the tasks, you can call the result query operation to query moderation results.
Document requirements:
Supported protocols: HTTP and HTTPS.
Supported formats: DOC, DOCX, PPT, PPTX, PPS, PPSX, PDF, XLS, XLSX, XLTX, XLTM, HTML, and TXT (UTF-8 encoding).
Document size limit: Individual documents must not exceed 200 MB. Compress or split documents exceeding this limit.
Moderation time is contingent on document download time. Ensure the storage service for the document is stable and reliable. Alibaba Cloud OSS is recommended for document storage.
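The document requirements above can be checked locally before a task is submitted. The following Python sketch is illustrative only: the function name check_document is hypothetical, the size limit assumes 1 MB = 1024 × 1024 bytes, and the doc_type argument stands in for the docType request parameter used for URLs without a file name extension.

```python
import posixpath
from urllib.parse import urlparse

# Formats copied from the "Supported formats" requirement.
SUPPORTED_FORMATS = {
    "doc", "docx", "ppt", "pptx", "pps", "ppsx", "pdf",
    "xls", "xlsx", "xltx", "xltm", "html", "txt",
}
# Assumption: 200 MB is interpreted as 200 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 200 * 1024 * 1024


def check_document(url, size_bytes, doc_type=None):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("unsupported protocol")
    # Use the URL extension if present, otherwise fall back to doc_type.
    extension = posixpath.splitext(parsed.path)[1].lstrip(".").lower()
    file_format = extension or (doc_type or "").lower()
    if file_format not in SUPPORTED_FORMATS:
        problems.append("unsupported or unknown format")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("larger than 200 MB")
    return problems
```

These checks only mirror the documented limits. The service remains the authority on whether a document is accepted.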
Moderation rule configuration:
Before you make your first call, configure the Document Moderation rules in the Content Moderation console. If you do not configure the rules, the Document Moderation 2.0 API uses the default settings.
QPS limit
You can call this operation up to 100 times per second per account. The system supports a maximum of 20 concurrent moderation tasks. Requests that exceed this limit are dropped and you will experience service interruptions. We recommend that you take note of this limit when you call this operation.
Debugging
Before you integrate this operation, you can debug it online by using Alibaba Cloud OpenAPI. Online debugging lets you view sample code and SDK dependencies and become familiar with the usage and parameters of the operation.
Before you call the Content Moderation API, you must log on to the Content Moderation console by using your Alibaba Cloud account. Therefore, the fees incurred by calling the operations are billed to the account.
Request parameters
Name | Type | Required | Example | Description |
Service | String | Yes | document_detection_global | The type of moderation service. Example: document_detection_global. |
ServiceParameters | JSONString | Yes | | The parameters required by the moderation service. The value is a JSON string. For more information about each field, see ServiceParameters. |
Table 1. ServiceParameters
Name | Type | Required | Example | Description |
url | String | Yes. Document Moderation 2.0 supports three methods to add documents. Choose one. | http://www.aliyundoc.com/a.pdf | The URL of the object to be moderated. Make sure that the URL can be accessed over the public network and that the URL does not exceed 2048 characters in length. Note The URL cannot contain Chinese characters. Specify only one URL in each request. |
ossBucketName | String | | bucket_0307 | The name of the OSS bucket that has been authorized. Note Before you use an OSS internal address, you must use your Alibaba Cloud account (the main account) to grant authorization on the Cloud Resource Access Authorization page. |
ossObjectName | String | 20240307/07/28/test.pdf | The name of the object in the authorized OSS bucket. | |
ossRegionId | String | cn-shanghai | The region of the OSS bucket. | |
docType | String | No | | The document format. Specify this parameter if the URL points to a file without a suffix. Valid values: doc, docx, ppt, pptx, pps, ppsx, xls, xlsx, xltx, xltm, xlsb, xlsm, csv, pdf, html, txt. Note For txt documents, only the text content is moderated. No screenshots are taken for image moderation. We recommend that you extract the text from txt documents and call the Text Moderation 2.0 service directly. |
callback | String | No | http://www.aliyundoc.com | The URL for callback notification of moderation results. It supports addresses using HTTP and HTTPS protocols. If this field is empty, you must poll the moderation results regularly. The callback operation must support the POST method, UTF-8 encoded transmission data, and the form parameters checksum and content. Content Moderation sets checksum and content according to the following rules and formats, and calls your callback operation to return the moderation results.
Note If your server receives a callback notification, it must return the HTTP 200 status code to Content Moderation. Any other HTTP status code indicates that your server failed to receive the notification. In that case, Content Moderation pushes the callback notification again until your server receives it, up to 16 times. After 16 times, Content Moderation stops pushing the callback notification. We recommend that you then check the status of the callback URL. |
seed | String | No | abc**** | A random string that is used to generate a signature for the callback notification request. The string can be up to 64 characters in length and can contain letters, digits, and underscores (_). You can customize this string. It is used to verify the callback notification request when Content Moderation pushes callback notifications to your server. Note This parameter is required if you set the callback parameter. |
cryptType | String | No | SHA256 | The algorithm used to sign the callback notification content when you enable callback notification. Content Moderation signs the returned string by using the algorithm that you specify and sends the signed string to the callback URL. The returned string is in the UID + Seed + Content format. Valid values:
|
dataId | String | No | fileId**** | The ID of the object that you want to moderate. The ID can contain letters, digits, underscores (_), hyphens (-), and periods (.). It can be up to 128 characters in length. This ID uniquely identifies your business data. |
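The callback, seed, and cryptType rows above describe a checksum computed over a string in the UID + Seed + Content format. The following Python sketch shows one way a callback server might verify it. Several details are assumptions that the table does not state: plain string concatenation, UTF-8 encoding, SHA256, and a lowercase hexadecimal digest. The function names are hypothetical.

```python
import hashlib
import hmac


def expected_checksum(uid: str, seed: str, content: str) -> str:
    """Hash UID + Seed + Content with SHA256 (assumed: UTF-8 input, hex output)."""
    raw = (uid + seed + content).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def verify_callback(uid: str, seed: str, content: str, checksum: str) -> bool:
    """Compare the received checksum form parameter with the locally computed one."""
    expected = expected_checksum(uid, seed, content).encode("ascii")
    received = checksum.strip().lower().encode("utf-8")
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, received)
```

A server would call verify_callback with its account UID, the seed sent in the request, and the checksum and content form parameters from the POST body, and respond with HTTP 200 only after the notification has been accepted.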
Response parameters
Name | Type | Example | Description |
Code | Integer | 200 | The status code, consistent with the HTTP status code. For more information, see Code description. |
Data | JSONObject | | The moderation result data. |
TaskId | String | AAAAA-BBBBB | The ID of the moderation task. This field is returned in Data. |
Message | String | OK | The response message of this request. |
RequestId | String | ABCD1234-1234-1234-1234-123**** | The request ID. |
Examples
Sample requests
{
"Service": "document_detection_global",
"ServiceParameters":
{
"url": "http://www.aliyundoc.com/a.pdf",
"dataId": "fileId-2024-0307-0728***"
}
}
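Because ServiceParameters is typed as a JSON string, a client usually serializes the inner object before it sends the request. The following Python sketch builds such a payload. The helper name build_submit_request is hypothetical, and signing and sending the request are left to the SDK.

```python
import json


def build_submit_request(url, data_id=None, callback=None, seed=None):
    """Build the Service / ServiceParameters pair for a FileModeration call."""
    if len(url) > 2048:
        raise ValueError("url must not exceed 2048 characters")
    if callback and not seed:
        # The seed parameter is required when callback is set.
        raise ValueError("seed is required when callback is set")
    params = {"url": url}
    if data_id:
        params["dataId"] = data_id
    if callback:
        params["callback"] = callback
        params["seed"] = seed
    return {
        "Service": "document_detection_global",
        # Serialize the inner object so that the value is a JSON string.
        "ServiceParameters": json.dumps(params),
    }
```

For example, build_submit_request("http://www.aliyundoc.com/a.pdf", data_id="fileId-1") returns a dictionary whose ServiceParameters value is a string that decodes back to the url and dataId fields.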
Sample success responses
{
"Message": "OK",
"Code": 200,
"Data":
{
"TaskId": "AAAAA-BBBBB-CCCCCCCC"
},
"RequestId": "ABCD1234-1234-1234-1234-123****"
}
Obtain Document Moderation task results
Usage notes
Business operation: DescribeFileModerationResult, which retrieves Document Moderation task results.
Billing information: This operation incurs no charges.
Query timeout: We recommend that you query moderation results at least 30 seconds after you send an asynchronous moderation request. Content Moderation retains moderation results for up to 24 hours. After 24 hours, the results are automatically deleted.
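The polling pattern described in these usage notes can be sketched as follows in Python. The query argument is a hypothetical callable that wraps DescribeFileModerationResult and returns the decoded response as a dictionary. The 10-second interval and the attempt limit are assumptions, not documented values.

```python
import time


def poll_result(query, task_id, initial_delay=30.0, interval=10.0,
                max_attempts=30, sleep=time.sleep):
    """Wait, then poll until the task returns code 200; code 280 means in progress."""
    sleep(initial_delay)  # query no earlier than 30 seconds after submission
    for _ in range(max_attempts):
        response = query(task_id)
        code = response.get("Code")
        if code == 200:
            return response["Data"]
        if code != 280:
            raise RuntimeError(f"moderation failed with code {code}")
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} attempts")
```

The sleep function is injectable so that the loop can be tested without waiting. In production the default time.sleep applies.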
QPS limit
You can call this operation up to 100 times per second per account. If the number of calls per second exceeds this limit, throttling will be triggered. This can potentially impact your business operations. Therefore, we recommend that you take note of this limit when making calls to this operation.
Debugging
Before you integrate this operation, you can debug it online by using Alibaba Cloud OpenAPI. Online debugging lets you view sample code and SDK dependencies and become familiar with the usage and parameters of the operation.
Request parameters
Name | Type | Required | Example | Description |
Service | String | Yes | document_detection_global | The type of moderation service. The value must be the same as the moderation service type of the submitted moderation task. |
ServiceParameters | JSONString | Yes | | The parameters required by the moderation service. The value is a JSON string. For more information about each field, see ServiceParameters. |
Table 1. ServiceParameters
Name | Type | Required | Example | Description |
taskId | String | Yes | abcd**** | The ID of the task that you want to query. You can specify one task ID at a time. Note After you submit a moderation task, you can obtain the ID of the task from the response. |
Response parameters
Name | Type | Example | Description |
RequestId | String | ABCD1234-1234-1234-1234-123**** | The request ID, which is used to locate and troubleshoot issues. |
Data | Object | | Document content moderation results. For more information, see Data. |
Code | Integer | 200 | Status code, consistent with the HTTP status code. For more information, see Code description. |
Message | String | OK | The response message of this request. |
Table 2. Data
Name | Type | Example | Description |
DataId | String | fileId**** | The ID of the object that you want to moderate. Note If you specify the DataId parameter in the request, the value of the DataId parameter is returned in the response. |
Url | String | http://www.aliyundoc.com/a.docx | The URL of the moderation object. |
DocType | String | | The format specified for files without a suffix. The valid values are doc, docx, ppt, pptx, pps, ppsx, xls, xlsx, xltx, xltm, xlsb, xlsm, csv, pdf, html, txt. |
PageSummary | Object | | Summary of Document Moderation results. For more information about the structure, see PageSummary. |
RiskLevel | String | high | Risk level, returned based on the comprehensive calculation of images and text. The return values include the following:
Note We recommend the following handling: Handle high-risk content directly. Manually review medium-risk content. Low-risk content can be handled when more risky content is detected. Handle content on which no risk is detected based on your business requirements. You can configure risk scores in the Content Moderation console. |
PageResult | JSONArray | | The moderation results of each document page, returned when the call is successful (Code 200). For more information about the structure, see PageResult. Note The status code 280 indicates that the moderation task is in progress, and the status code 200 indicates that the moderation task is complete. If the moderation task is in progress, the returned moderation results contain all the issues that Content Moderation has detected so far. |
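The handling suggestions in the RiskLevel note can be expressed as a small lookup. The following Python sketch is only an illustration: "high" is the example value in the table, while the strings "medium", "low", and "none" and all action names are assumptions inferred from the note.

```python
# Assumed RiskLevel strings mapped to hypothetical action names.
SUGGESTED_ACTIONS = {
    "high": "block",            # handle high-risk content directly
    "medium": "manual_review",  # review medium-risk content manually
    "low": "monitor",           # handle only when stricter detection is needed
    "none": "allow",            # no risk detected; follow business requirements
}


def suggested_action(risk_level: str) -> str:
    """Map a RiskLevel value to an action; unknown values go to manual review."""
    return SUGGESTED_ACTIONS.get(risk_level.strip().lower(), "manual_review")
```

Routing unknown values to manual review is a conservative default chosen for this sketch, not documented behavior.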
Table 3. PageSummary
Name | Type | Example | Description |
PageSum | Integer | 10 | Total number of pages moderated in the document. |
ImageSummary | Object | | Summary of image moderation results. For more information about the structure, see ImageSummary. Note When the document file is in txt format, no image moderation results exist. |
TextSummary | Object | | Summary of text moderation results. For more information about the structure, see TextSummary. |
Table 4. ImageSummary
Name | Type | Example | Description |
RiskLevel | String | high | The risk level, which is returned based on the configured risk scores. Valid values:
|
ImageLabels | JSONArray | | Summary of image labels. For more information about the structure, see ImageLabels. |
Table 5. ImageLabels
Name | Type | Example | Description |
Label | String | violent_explosion | Image risk label. For more information, see Risk Label Interpretation Table. |
LabelSum | Integer | | The number of times that the label is matched. |
Description | String | Fireworks content | Description of the Label field. Note This field is an explanation of the Label field and may change. We recommend that you process the moderation results based on the Label field instead of this field. |
Table 6. TextSummary
Name | Type | Example | Description |
RiskLevel | String | high | Risk level of document text. The return values include the following:
|
TextLabels | JSONArray | | Summary of text labels. For more information about the structure, see TextLabels. |
Table 7. TextLabels
Name | Type | Example | Description |
Label | String | violent_explosion | Text risk label. |
LabelSum | Integer | | The number of times that the label is matched. |
Table 8. PageResult
Name | Type | Example | Description |
PageNum | Integer | 50 | The page number of the current page in the document. |
ImageUrl | String | http://oss.aliyundoc.com/a.png | URL link of the current page screenshot. |
ImageResult | JSONArray | | Image moderation results of the current page. For more information about the structure, see ImageResult. Note When the document file is in txt format, no image moderation results exist. |
TextResult | JSONArray | | Text moderation results of the current page. For more information about the structure, see TextResult. |
Table 9. ImageResult
Name | Type | Example | Description |
Description | String | Moderation of the image content of the document page | Image description. |
Service | String | baselineCheck | The service called for the images. |
RiskLevel | String | high | The risk level, which is returned based on the configured risk scores. Valid values:
|
Location | JSONObject | {"x":0,"y":0,"w":100,"h":100} | (Reserved) Image part coordinates |
LabelResult | JSONArray | | Labels returned for the images. For more information about the structure, see LabelResult. |
Table 10. LabelResult
Name | Type | Example | Description |
Label | String | violent_explosion | The label returned after the image moderation. Multiple labels and scores may be returned for the same screenshot. For more information, see Risk Label Interpretation Table. |
Confidence | Float | 81.22 | The score of the confidence level. Valid values: 0 to 100. The value is accurate to two decimal places. |
Description | String | Fireworks content | Description of the Label field. Note This field is an explanation of the Label field and may be changed. We recommend that you process the moderation results based on the Label field instead of this field. |
Table 11. TextResult
Name | Type | Example | Description |
Description | String | Moderation of the text content of the document page. | Text part description. |
Service | String | pgc_detection | The service called for the text part. |
Text | String | This is the text part | Text part content. |
Labels | String | ad_compliance,C_customized | Labels returned for the text part. For more information, see Multi-language services provided by Text Moderation 2.0. |
RiskWords | String | Risk word A, Risk word B | Risk words returned for the text part. |
RiskTips | String | Advertising Law_General Prohibition of Extreme Words | Sub-labels returned for the text part. |
RiskLevel | String | high | The risk level, which is returned based on the calculated text risk. Valid values:
|
Examples
Sample requests
{
"Service": "document_detection_global",
"ServiceParameters": {
"taskId": "abcd****"
}
}
Sample success responses
{
"Code": 200,
"Data": {
"DataId": "fileId-2024-0307-0728***",
"PageResult": [
{
"ImageResult": [
{
"Description": "Moderation of the image content of the document page",
"LabelResult": [
{
"Label": "nonLabel"
}
],
"Service": "baselineCheck_global"
}
],
"ImageUrl": "http://oss.aliyundoc.com/a.png",
"PageNum": 1,
"TextResult": [
{
"Description": "Moderation of the text content of the document page",
"Labels": "",
"RiskTips": "",
"RiskWords": "",
"Service": "comment_multilingual_global",
"Text": "Content Moderation product test case a"
}
]
},
...
{
"ImageResult": [
{
"Description": "Moderation of the image content of the document page",
"LabelResult": [
{
"Confidence": 89.01,
"Label": "pornographic_adultContent_tii"
}
],
"Service": "baselineCheck_global"
}
],
"ImageUrl": "http://oss.aliyundoc.com/b.png",
"PageNum": 10,
"TextResult": [
{
"Description": "Moderation of the text content of the document page",
"Labels": "contraband,sexual_content",
"RiskTips": "Prohibited_Prohibited goods, Pornographic_Film resources, Pornographic_Vulgar",
"RiskWords": "Risk word A, Risk word B",
"Service": "comment_multilingual_global",
"Text": "Content Moderation product test case b"
}
]
}
],
"Url": "http://www.aliyundoc.com/a.docx"
},
"Message": "SUCCESS",
"RequestId": "1D0854A7-AAAAA-BBBBBBB-CC8292AE5"
}
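A response like the sample above can be reduced to the pages that carry risk labels. The following Python sketch walks the Data object. The function name flagged_pages is hypothetical, and treating the nonLabel value as "no risk detected" is an assumption based on the sample.

```python
def flagged_pages(data):
    """Return one entry per page that has at least one image or text label."""
    pages = []
    for page in data.get("PageResult", []):
        # Collect image labels, skipping the assumed "no risk" marker.
        image_labels = [
            item["Label"]
            for result in page.get("ImageResult", [])
            for item in result.get("LabelResult", [])
            if item.get("Label") and item["Label"] != "nonLabel"
        ]
        # Labels for text arrive as one comma-separated string.
        text_labels = [
            label.strip()
            for result in page.get("TextResult", [])
            for label in result.get("Labels", "").split(",")
            if label.strip()
        ]
        if image_labels or text_labels:
            pages.append({
                "PageNum": page.get("PageNum"),
                "ImageLabels": image_labels,
                "TextLabels": text_labels,
            })
    return pages
```

Applied to the sample response, page 1 is skipped and page 10 is reported with its image label and both text labels.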
Code description
The following table describes the codes returned by the Document Moderation 2.0 operations. Only requests that return code 200 or 280 are metered and billed. Requests that return other codes are not billed.
Code | Description |
200 | The request is normal or the moderation is complete. |
280 | The moderation is in progress. |
400 | Not all request parameters are configured. |
401 | The request parameters are invalid. |
402 | Invalid request parameters. Check and modify them and try again. |
403 | The QPS of requests exceeds the upper limit. Check and modify the number of requests that are sent at a time. |
404 | The specified file failed to be downloaded. Check the URL of the file or try again. |
405 | The download or conversion of the specified file timed out, possibly because the link is inaccessible. Check and adjust the file and try again. |
406 | The specified file is too large. Check and adjust the file size and try again. |
407 | The format of the specified file is not supported. Check and change the file format and try again. |
408 | You do not have the required permissions. The possible cause is that this account is not activated, has overdue payments, or is not authorized to call this API operation. |
409 | The specified RequestId does not exist. The possible cause is that the moderation results have exceeded the 24-hour validity period. |
480 | The number of concurrent moderation tasks exceeds the upper limit. Check and change the number of concurrent moderation tasks. |
500 | A system exception occurred. |
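A client can group these codes into a few outcomes before it decides what to do next. The following Python sketch is one possible grouping. The table does not say which codes are safe to retry, so treating 403, 480, and 500 as retryable is an assumption, and the outcome names are hypothetical.

```python
# Assumption: throttling (403), concurrency (480) and system (500) errors are transient.
RETRYABLE_CODES = {403, 480, 500}


def classify_code(code: int) -> str:
    """Group a response code into done, in_progress, retry, or fail."""
    if code == 200:
        return "done"
    if code == 280:
        return "in_progress"
    if code in RETRYABLE_CODES:
        return "retry"
    return "fail"


def is_billed(code: int) -> bool:
    """Only requests that return code 200 or 280 are metered and billed."""
    return code in (200, 280)
```

Codes in the fail group, such as 406 or 407, usually require a change to the file or the request before a new attempt makes sense.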