Content Moderation: Document Moderation 2.0 API

Last Updated: Feb 27, 2025

The Document Moderation 2.0 API detects risks and violations in common document formats. This topic describes the operations that you can call for Document Moderation 2.0.

Access guidelines

  1. Register an Alibaba Cloud account: Click Register now and follow the instructions to complete the registration.

  2. Activate the pay-as-you-go billing method for Content Moderation: Make sure that the Content Moderation 2.0 service is activated. For more information, see Activate service. Activation is free of charge. After you call API operations, you are billed automatically based on your usage.

  3. Create an AccessKey pair: Make sure that you have created an AccessKey pair for a Resource Access Management (RAM) user. For more information, see Create AccessKey. To use the AccessKey pair of a RAM user, use your Alibaba Cloud account to grant the AliyunYundunGreenWebFullAccess permission to that RAM user. For more information, see RAM authorization.

  4. Use SDKs: We recommend that you use SDKs to call the API. For more information, see Document Moderation 2.0 SDK and integration guide.

Submit a moderation task

Usage notes

  • Business operation: FileModeration. Document Moderation provides only an asynchronous moderation operation.

  • Supported regions and access addresses.

    Region | Public network access address | Internal network access address | Supported services
    Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com | document_detection_global

  • Billing information:

    This operation is billable. You are charged based on the number of document pages processed.

  • Moderation object: Common documents are supported for moderation.

  • Obtaining results: Asynchronous moderation requests do not return results in real time. To obtain the results, poll for them periodically or enable callback notification. The moderation results are retained for up to 24 hours.

    • Enable callback notification to obtain moderation results: When you submit an asynchronous moderation task, specify a callback URL for receiving moderation results in the callback parameter of the moderation request.

    • Poll moderation results: You do not need to set the callback parameter when you submit asynchronous moderation tasks. After you submit the tasks, you can call the result query operation to query moderation results.

  • Document requirements:

    • Supported protocols: HTTP and HTTPS.

    • Supported formats: DOC, DOCX, PPT, PPTX, PPS, PPSX, PDF, XLS, XLSX, XLTX, XLTM, HTML, and TXT (UTF-8 encoding).

    • Document size limit: Individual documents must not exceed 200 MB. Compress or split documents exceeding this limit.

    • Moderation time is contingent on document download time. Ensure the storage service for the document is stable and reliable. Alibaba Cloud OSS is recommended for document storage.

  • Moderation rule configuration:

    • Before your first call, configure the Document Moderation rules in the Content Moderation console. If you skip this step, the Document Moderation 2.0 API uses the default settings.
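
The document requirements listed above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the official SDK: the function name check_document is hypothetical, the file size is assumed to be known to the caller, and 1 MB is assumed to mean 1024 * 1024 bytes.

```python
import posixpath
from urllib.parse import urlparse

# Values copied from the "Document requirements" list above.
SUPPORTED_FORMATS = {
    "doc", "docx", "ppt", "pptx", "pps", "ppsx", "pdf",
    "xls", "xlsx", "xltx", "xltm", "html", "txt",
}
MAX_SIZE_BYTES = 200 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_document(url, size_bytes):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("unsupported protocol: %r" % parsed.scheme)
    # Take the extension from the URL path so a query string is ignored.
    ext = posixpath.splitext(parsed.path)[1].lstrip(".").lower()
    if not ext:
        problems.append("no file suffix: set docType in the request")
    elif ext not in SUPPORTED_FORMATS:
        problems.append("unsupported format: %s" % ext)
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("document exceeds 200 MB")
    return problems
```

A check like this only avoids obviously invalid submissions. The service still performs its own validation and reports failures through the response code.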

QPS limit

You can call this operation up to 100 times per second per account. The system supports a maximum of 20 concurrent moderation tasks. Requests that exceed these limits are dropped, which can interrupt your service. Keep both limits in mind when you call this operation.
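
One way to stay under the per-second limit is to throttle on the client side before each call. The following sliding-window limiter is an illustrative sketch only: the class name SlidingWindowLimiter is hypothetical, it is not thread-safe, and it does not track the separate limit of 20 concurrent tasks.

```python
import collections
import time


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds (client-side only)."""

    def __init__(self, limit=100, window=1.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self.stamps = collections.deque()

    def try_acquire(self):
        """Return True if a call may be sent now, False if it should wait."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False
```

Call try_acquire() before each request and delay or queue the request when it returns False.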

Debugging

Before you integrate, you can debug the Document Moderation 2.0 operation online in Alibaba Cloud OpenAPI to view sample code and SDK dependencies and to become familiar with the operation and its parameters.

Important

Before you call the Content Moderation API, you must log on to the Content Moderation console by using your Alibaba Cloud account. The fees incurred by calling the operations are billed to that account.

Request parameters

Name

Type

Required

Example

Description

Service

String

Yes

document_detection_global

Type of moderation service. Examples:

  • document_detection_global: General Document Moderation

ServiceParameters

JSONString

Yes

The parameter set required by the moderation service. The value is a JSON string. For more information about each field in the string, see ServiceParameters.

Table 1. ServiceParameters

Name

Type

Required

Example

Description

url

String

Yes. Document Moderation 2.0 supports three methods to provide a document. Choose one:

  • Document URL: Specify url.

  • OSS authorization: Specify ossBucketName, ossObjectName, and ossRegionId together.

  • Local document: Upload a local document file for moderation. The file does not occupy your OSS storage space and is stored for only 30 minutes. The SDK integrates the local document upload function. For code examples, see Document Moderation 2.0 SDK and integration guide.

http://www.aliyundoc.com/a.pdf

The URL of the object to be moderated. Make sure that the URL is accessible over the public network and is no longer than 2,048 characters.

Note

The URL cannot contain Chinese characters. Specify only one URL per request.

ossBucketName

String

bucket_0307

The name of the OSS bucket that has been authorized.

Note

When you use an OSS internal network address, you must first use your Alibaba Cloud account (the main account) to grant authorization on the Cloud Resource Access Authorization page.

ossObjectName

String

20240307/07/28/test.pdf

The name of the object in the authorized OSS bucket.

ossRegionId

String

cn-shanghai

The region of the OSS bucket.

docType

String

No

pdf

If the document provided by the URL is a file without a suffix, you need to specify the document format. The valid values are doc, docx, ppt, pptx, pps, ppsx, xls, xlsx, xltx, xltm, xlsb, xlsm, csv, pdf, html, txt.

Note

For documents in txt format, only the text content is moderated. Image content is not moderated through screenshots. We recommend that you extract the text from txt documents yourself and call the Text Moderation 2.0 service directly.

callback

String

No

http://www.aliyundoc.com

The URL for callback notification of moderation results. It supports addresses using HTTP and HTTPS protocols. If this field is empty, you must poll the moderation results regularly.

The callback operation must support the POST method, UTF-8 encoded transmission data, and the form parameters checksum and content.

Content Moderation sets checksum and content according to the following rules and formats, and calls your callback operation to return the moderation results.

  • checksum: A string generated by applying the SHA-256 algorithm to the concatenation UID + seed + content. The UID is your Alibaba Cloud account ID, which you can query in the Alibaba Cloud Management Console. To detect data tampering, compute the same SHA-256 value on your server when it receives a callback notification and compare it with the received checksum value.

    Note

    UID must be the ID of your Alibaba Cloud account, but not the ID of a RAM user.

  • content: A JSON-formatted string. Parse it to obtain the callback data. For more information about the format of the content parameter, see the sample success responses of each operation that you can call to query asynchronous moderation results.

Note

Your server must return HTTP status code 200 to acknowledge a callback notification. Any other status code is treated as a failed delivery, and Content Moderation pushes the notification again until your server acknowledges it. Content Moderation can push a callback notification repeatedly up to 16 times. After 16 times, it stops pushing the notification. In this case, we recommend that you check the status of the callback URL.

seed

String

No

abc****

A random string that is used to generate a signature for the callback notification request.

The string can be up to 64 characters in length and can contain letters, digits, and underscores (_). You can customize this string. It is used to verify the callback notification request when Content Moderation pushes callback notifications to your server.

Note

This parameter is required if you set the callback parameter.

cryptType

String

No

SHA256

The algorithm used to sign the callback notification content when callback notification is enabled. Content Moderation signs a string in the UID + Seed + Content format by using the algorithm that you specify and sends the signed string to the callback URL. Valid values:

  • SHA256 (default): Use the SHA-256 algorithm.

  • SM3: Use HMAC-SM3, a Chinese national cryptographic algorithm. The result is a hexadecimal string that consists of lowercase letters and digits.

    For example, 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0 is returned after you process abc by using the HMAC-SM3 algorithm.

dataId

String

No

fileId****

The ID of the object that you want to moderate.

The ID can contain letters, digits, underscores (_), hyphens (-), and periods (.). It can be up to 128 characters in length. This ID uniquely identifies your business data.
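
The checksum rule described for the callback parameter can be verified on the receiving server roughly as follows. This is a sketch under stated assumptions: it covers only the default SHA256 value of cryptType, it assumes the checksum is transmitted as a hexadecimal string (the table states this explicitly only for SM3), and the function names are hypothetical.

```python
import hashlib
import hmac


def expected_checksum(uid, seed, content):
    """SHA-256 over uid + seed + content, as a lowercase hex string.

    Assumption: hex is the wire encoding for the default SHA256 algorithm.
    """
    raw = (uid + seed + content).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def verify_callback(uid, seed, content, checksum):
    """Return True if the received checksum matches the locally computed one."""
    expected = expected_checksum(uid, seed, content).encode("ascii")
    received = checksum.lower().encode("utf-8")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, received)
```

Here uid is the Alibaba Cloud account ID (not a RAM user ID), seed is the value sent in the request, and content is the raw form parameter exactly as received, before any JSON parsing.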

Response parameters

Name

Type

Example

Description

Code

Integer

200

Status code, consistent with the HTTP status code. For more information, see Code description.

Data

JSONObject

Moderation result data.

TaskId

String

AAAAA-BBBBB

The task ID of the moderation.

Message

String

OK

The response message of the request.

RequestId

String

ABCD1234-1234-1234-1234-123****

The request ID.

Examples

Sample requests

{
  "Service": "document_detection_global",
  "ServiceParameters":
  {
    "url": "http://www.aliyundoc.com/a.pdf",
    "dataId": "fileId-2024-0307-0728***"
  }
}
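
Because ServiceParameters is typed as a JSON string, a client that builds the request by hand typically serializes the inner object before sending it. The following sketch shows one way to do that. The helper name build_submit_request is hypothetical, and the official SDK may handle this step differently.

```python
import json


def build_submit_request(url, data_id=None, callback=None, seed=None):
    """Return the two top-level request parameters for FileModeration."""
    if callback and not seed:
        # The seed row states that seed is required once callback is set.
        raise ValueError("seed is required when callback is set")
    params = {"url": url}
    if data_id:
        params["dataId"] = data_id
    if callback:
        params["callback"] = callback
        params["seed"] = seed
    return {
        "Service": "document_detection_global",
        # ServiceParameters is typed JSONString, so serialize the object.
        "ServiceParameters": json.dumps(params),
    }
```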

Sample success responses

{
    "Message": "OK",
    "Code": 200,
    "Data":
    {
        "TaskId": "AAAAA-BBBBB-CCCCCCCC"
    },
    "RequestId": "ABCD1234-1234-1234-1234-123****"
}

Obtain Document Moderation task results

Usage notes

  • Business operation: DescribeFileModerationResult, which retrieves Document Moderation task results.

  • Billing information: This operation incurs no charges.

  • Query timeout: We recommend that you query moderation results at least 30 seconds after you send an asynchronous moderation request. Content Moderation retains moderation results for up to 24 hours, after which they are automatically deleted.
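
The polling approach can be sketched as a loop that waits about 30 seconds and then queries until the task is no longer in progress (code 280, see Code description). In this sketch, query is a caller-supplied function that wraps DescribeFileModerationResult and returns the parsed response. The function name poll_result, the 10-second interval, and the attempt limit are illustrative assumptions, and Code is assumed to be an integer as in the sample responses.

```python
import time


def poll_result(query, task_id, first_delay=30, interval=10,
                max_attempts=20, sleep=time.sleep):
    """Call query(task_id) until the task leaves the in-progress state."""
    sleep(first_delay)  # the usage notes suggest waiting about 30 seconds
    for attempt in range(max_attempts):
        response = query(task_id)
        if response.get("Code") != 280:
            # 200 means complete; any other value is an error code.
            return response
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(
        "task %s still in progress after %d attempts" % (task_id, max_attempts)
    )
```

The sleep parameter is injectable so the loop can be tested without waiting.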

QPS limit

You can call this operation up to 100 times per second per account. If the number of calls per second exceeds this limit, throttling will be triggered. This can potentially impact your business operations. Therefore, we recommend that you take note of this limit when making calls to this operation.

Debugging

Before you integrate, you can debug this operation online in Alibaba Cloud OpenAPI to view sample code and SDK dependencies and to become familiar with the operation and its parameters.

Request parameters

Name

Type

Required

Example

Description

Service

String

Yes

document_detection_global

The type of moderation service, which needs to be consistent with the moderation service type of the submitted moderation task.

ServiceParameters

JSONString

Yes

The parameters required by the moderation service. The value is a JSON string. For more information about each field in the string, see ServiceParameters.

Table 1. ServiceParameters

Name

Type

Required

Example

Description

taskId

String

Yes

abcd****

The ID of the task that you want to query. You can specify one task ID at a time.

Note

After you submit a moderation task, you can obtain the ID of the task from the response.

Response parameters

Name

Type

Example

Description

RequestId

String

ABCD1234-1234-1234-1234-123****

The request ID, which is used to locate and troubleshoot issues.

Data

Object

Document content moderation results. For more information, see Data.

Code

Integer

200

Status code, consistent with the HTTP status code. For more information, see Code description.

Message

String

OK

The response message of this request.

Table 2. Data

Name

Type

Example

Description

DataId

String

fileId****

The ID of the object that you want to moderate.

Note

If you specify the DataId parameter in the request, the value of the DataId parameter is returned in the response.

Url

String

http://www.aliyundoc.com/a.docx

The URL of the moderation object.

DocType

String

pdf

The format specified for files without a suffix. The valid values are doc, docx, ppt, pptx, pps, ppsx, xls, xlsx, xltx, xltm, xlsb, xlsm, csv, pdf, html, txt.

PageSummary

Object

Summary of Document Moderation results. For more information about the structure, see PageSummary.

RiskLevel

String

high

Risk level, returned based on the comprehensive calculation of images and text. The return values include the following:

  • high: high risk

  • medium: medium risk

  • low: low risk

  • none: no risk detected

Note

We recommend that you handle high-risk content directly and send medium-risk content for manual review. Low-risk content can be handled when more risky content is detected. Content on which no risk is detected can be handled based on your business requirements. Risk scores can be configured in the Content Moderation console.

PageResult

JSONArray

Document page moderation results. When the call succeeds (Code is 200), the results are returned in the PageResult structure. For more information about the structure, see PageResult.

Note

The status code 280 indicates that the moderation task is in progress, and the status code 200 indicates that the moderation task is complete. While the task is in progress, the returned moderation results contain the issues that Content Moderation has detected so far.
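
The handling suggestions for RiskLevel can be expressed as a small lookup. This is an illustrative sketch: the action names are hypothetical, the treatment of low is one possible reading of the suggestion, and routing unknown values to manual review is a design choice rather than documented behavior.

```python
# Hypothetical action names; adapt them to your own workflow.
ACTIONS = {
    "high": "block",
    "medium": "manual_review",
    "low": "allow_and_monitor",
    "none": "allow",
}


def action_for(risk_level):
    """Map a RiskLevel value to a handling action."""
    # Unknown or missing levels go to manual review instead of passing silently.
    return ACTIONS.get((risk_level or "").lower(), "manual_review")
```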

Table 3. PageSummary

Name

Type

Example

Description

PageSum

Integer

10

Total number of pages moderated in the document.

ImageSummary

Object

Summary of image moderation results. For more information about the structure, see ImageSummary.

Note

When the document file is in txt format, no image moderation results exist.

TextSummary

Object

Summary of text moderation results. For more information about the structure, see TextSummary.

Table 4. ImageSummary

Name

Type

Example

Description

RiskLevel

String

high

The risk level, which is returned based on the configured risk scores. Valid values:

  • high: high risk

  • medium: medium risk

  • low: low risk

  • none: no risk detected

ImageLabels

JSONArray

Summary of image labels. For more information about the structure, see ImageLabels.

Table 5. ImageLabels

Name

Type

Example

Description

Label

String

violent_explosion

Image risk label. For more information, see Risk Label Interpretation Table.

LabelSum

Integer

Number of occurrences of the label

Description

String

Fireworks content

Description of the Label field.

Note

This field is an explanation of the Label field and may change. We recommend that you process the moderation results based on the Label field instead of this field.

Table 6. TextSummary

Name

Type

Example

Description

RiskLevel

String

high

Risk level of document text. The return values include the following:

  • high: High risk

  • medium: Medium risk

  • low: Low risk

  • none: No risk detected

TextLabels

JSONArray

Summary of text labels. For more information about the structure, see TextLabels.

Table 7. TextLabels

Name

Type

Example

Description

Label

String

violent_explosion

Text risk label.

LabelSum

Integer

The number of times that the label is matched.

Table 8. PageResult

Name

Type

Example

Description

PageNum

Integer

50

The page number of the current page in the document.

ImageUrl

String

http://oss.aliyundoc.com/a.png

URL link of the current page screenshot.

ImageResult

JSONArray

Image moderation results of the current page. For more information about the structure, see ImageResult.

Note

When the document file is in txt format, no image moderation results exist.

TextResult

JSONArray

Text moderation results of the current page. For more information about the structure, see TextResult.

Table 9. ImageResult

Name

Type

Example

Description

Description

String

Moderation of the image content of the document page

Image description.

Service

String

baselineCheck

The service called for the images.

RiskLevel

String

high

The risk level, which is returned based on the configured risk scores. Valid values:

  • high: high risk

  • medium: medium risk

  • low: low risk

  • none: no risk detected

Location

JSONObject

{"x":0,"y":0,"w":100,"h":100}

(Reserved) Image part coordinates

LabelResult

JSONArray

Labels returned for the images. For more information about the structure, see LabelResult.

Table 10. LabelResult

Name

Type

Example

Description

Label

String

violent_explosion

The label returned after image moderation. Multiple labels and scores may be returned for the same screenshot. For more information, see Risk Label Interpretation Table.

Confidence

Float

81.22

The score of the confidence level. Valid values: 0 to 100. The value is accurate to two decimal places.

Description

String

Fireworks content

Description of the Label field.

Note

This field is an explanation of the Label field and may be changed. We recommend that you process the moderation results based on the Label field instead of this field.

Table 11. TextResult

Name

Type

Example

Description

Description

String

Moderation of the text content of the document page.

Text part description.

Service

String

pgc_detection

The service called for the text part.

Text

String

This is the text part

Text part content.

Labels

String

ad_compliance,C_customized

Labels returned for the text part. For more information, see Multi-language services provided by Text Moderation 2.0.

RiskWords

String

Risk word A, Risk word B

Risk words returned for the text part.

RiskTips

String

Advertising Law_General Prohibition of Extreme Words

Sub-labels returned for the text part.

RiskLevel

String

high

The risk level, which is returned based on the calculated text risk. Valid values:

  • high: high risk

  • medium: medium risk

  • low: low risk

  • none: no risk detected

Examples

Sample requests

{
    "Service": "document_detection_global",
    "ServiceParameters": {
        "taskId": "abcd****"
    }
}

Sample success responses

{
    "Code": 200,
    "Data": {
        "DataId": "fileId-2024-0307-0728***",
        "PageResult": [
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of the image content of the document page",
                        "LabelResult": [
                            {
                                "Label": "nonLabel"
                            }
                        ],
                        "Service": "baselineCheck_global"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/a.png",
                "PageNum": 1,
                "TextResult": [
                    {
                        "Description": "Moderation of the text content of the document page",
                        "Labels": "",
                        "RiskTips": "",
                        "RiskWords": "",
                        "Service": "comment_multilingual_global",
                        "Text": "Content Moderation product test case a"
                    }
                ]
            },
            ...
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of the image content of the document page",
                        "LabelResult": [
                            {
                                "Confidence": 89.01,
                                "Label": "pornographic_adultContent_tii"
                            }
                        ],
                        "Service": "baselineCheck_global"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/b.png",
                "PageNum": 10,
                "TextResult": [
                    {
                        "Description": "Moderation of the text content of the document page",
                        "Labels": "contraband,sexual_content",
                        "RiskTips": "Prohibited_Prohibited goods, Pornographic_Film resources, Pornographic_Vulgar",
                        "RiskWords": "Risk word A, Risk word B",
                        "Service": "comment_multilingual_global",
                        "Text": "Content Moderation product test case b"
                    }
                ]
            }
        ],
        "Url": "http://www.aliyundoc.com/a.docx"
    },
    "Message": "SUCCESS",
    "RequestId": "1D0854A7-AAAAA-BBBBBBB-CC8292AE5"
}
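
A response like the sample above can be reduced to the pages that carry at least one label. The following sketch assumes the field names shown in the sample and the tables, and it assumes that the value nonLabel and an empty Labels string both mean that nothing was hit. The function name risky_pages is hypothetical.

```python
def risky_pages(response):
    """Return (page number, image labels, text labels) for pages with any label."""
    pages = []
    for page in response.get("Data", {}).get("PageResult", []):
        image_labels = [
            label["Label"]
            for image in page.get("ImageResult", [])
            for label in image.get("LabelResult", [])
            # Assumption: "nonLabel" marks an image with no risk label.
            if label.get("Label") and label["Label"] != "nonLabel"
        ]
        text_labels = [
            name
            for text in page.get("TextResult", [])
            # Labels is a comma-separated string; skip empty entries.
            for name in text.get("Labels", "").split(",")
            if name
        ]
        if image_labels or text_labels:
            pages.append((page.get("PageNum"), image_labels, text_labels))
    return pages
```

Applied to the two pages shown in the sample, this keeps only page 10, together with its image label and its two text labels.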

Code description

The following table describes the codes returned by the Document Moderation 2.0 operations. Only requests that return code 200 or 280 are metered and billed. Requests that return other codes are not billed.

Code | Description
200 | The request is normal or the moderation is complete.
280 | The moderation is in progress.
400 | Not all request parameters are configured.
401 | The request parameters are invalid.
402 | Invalid request parameters. Check and modify them and try again.
403 | The QPS of requests exceeds the upper limit. Check and modify the number of requests that are sent at a time.
404 | The specified file failed to be downloaded. Check the URL of the file or try again.
405 | The download or conversion of the specified file timed out, possibly because the link is inaccessible. Check and adjust the file and try again.
406 | The specified file is too large. Check and adjust the file size and try again.
407 | The format of the specified file is not supported. Check and change the file format and try again.
408 | You do not have the required permissions. The possible cause is that this account is not activated, has overdue payments, or is not authorized to call this API operation.
409 | The specified RequestId does not exist. The possible cause is that the moderation results have exceeded the 24-hour validity period.
480 | The number of concurrent moderation tasks exceeds the upper limit. Check and change the number of concurrent moderation tasks.
500 | A system exception occurred.
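
Client code usually needs only a coarse view of these codes. The following sketch groups them into four states based on the descriptions above. The state names and the function name classify_code are hypothetical, and whether a failed request is worth retrying is left to the caller.

```python
def classify_code(code):
    """Group a response code into a coarse client-side state."""
    if code == 200:
        return "done"
    if code == 280:
        return "in_progress"
    if code in (403, 480):
        # 403: QPS limit exceeded; 480: too many concurrent tasks.
        return "throttled"
    return "failed"  # all other codes in the table describe an error
```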