
Content Moderation: API integration guide

Last Updated: Jan 21, 2026

This document describes how to use the AI Guardrails API to moderate text content.

Important
  • If you have already integrated the enhanced PLUS edition of the Content Moderation service, you only need to upgrade the software development kit (SDK) to call this API operation.

  • If you have not integrated the enhanced PLUS edition of the Content Moderation service, you can integrate the multimodal API operation directly. You can reuse the multimodal API operation if you later need to moderate other content, such as AI-generated images and files. For more information, see Multimodal API integration guide.

Step 1: Activate the service

Go to the AI Guardrails activation page to activate the AI Guardrails service.

Step 2: Grant permissions to a RAM user

Before you integrate the SDK or API, you must grant permissions to a Resource Access Management (RAM) user and create an AccessKey pair for that user. The AccessKey pair is used for identity verification when you call Alibaba Cloud API operations. For more information about how to obtain an AccessKey pair, see Obtain an AccessKey pair.

Procedure

  1. Log on to the RAM console as a RAM administrator.

  2. Create a RAM user.

    For more information, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user.

    For more information, see Grant permissions to a RAM user.

    After completing the preceding operations, you can call the Content Moderation API as the RAM user.

Step 3: Install and integrate the SDK

For more information about the AI Guardrails service SDK, see SDK Reference.

API description

Usage notes

This API operation creates a text content moderation task.

  • Service API operation: TextModerationPlus

  • Supported regions and endpoints:

| Region | Public endpoint | Internal endpoint |
| --- | --- | --- |
| Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com |

  • Billing information: This is a paid API operation. Only requests that return an HTTP status code of 200 are billed. Requests that return any other status code are not billed. For more information about the billing method, see the "Activation and billing" section in Billing overview.

QPS limits

This API operation has a queries per second (QPS) limit of 50 for each user. If you exceed the limit, API calls are throttled. This may affect your business operations. Call the API operation at a reasonable frequency.
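One way to stay under the limit is to pace calls on the client side. The following is a minimal sketch, not part of the SDK: the `QpsLimiter` class and its parameters are hypothetical names introduced here, and it simply spaces the start of each call by `1 / max_qps` seconds.

```python
import threading
import time


class QpsLimiter:
    """Client-side pacing: spaces call starts by 1 / max_qps seconds.

    Hypothetical helper for illustration; it is not part of any SDK.
    """

    def __init__(self, max_qps, clock=time.monotonic, sleep=time.sleep):
        if max_qps <= 0:
            raise ValueError("max_qps must be positive")
        self._interval = 1.0 / max_qps
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()
        self._lock = threading.Lock()

    def acquire(self):
        """Wait for the next free slot and return the seconds waited."""
        with self._lock:
            now = self._clock()
            # Reserve the earliest slot that is not in the past.
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        wait = slot - now
        if wait > 0:
            self._sleep(wait)
        return wait
```

Call `acquire()` immediately before each request. Setting `max_qps` slightly below 50 leaves headroom for clock jitter and for other processes that share the same account.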

Request parameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | query_security_check_intl | The moderation service to call. Valid values: `query_security_check_intl` (AI input content security check) and `response_security_check_intl` (AI-generated content security check). |
| ServiceParameters | JSONString | Yes | | The set of parameters required for the moderation service. This is a JSON string. For a description of each parameter in the string, see ServiceParameters. |

Table 1. ServiceParameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| content | String | At least one item is required. | Text content to moderate | The text content to moderate. Important: A maximum of 2,000 characters can be entered at a time. |
| chatId | String | No | ABC123 | A unique ID for an interaction record that consists of a user input and a Large Language Model (LLM) output. |
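The request parameters above can be assembled with the standard library alone. The sketch below is illustrative only: `split_content` and `build_request` are hypothetical helper names, and it assumes that the 2,000-character limit can be approximated by Python's `len()`, which the service may count differently.

```python
import json
from typing import Dict, List, Optional

MAX_CONTENT_CHARS = 2000  # per-request limit stated for the content parameter


def split_content(text: str, limit: int = MAX_CONTENT_CHARS) -> List[str]:
    """Split text into consecutive pieces of at most `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def build_request(service: str, content: str,
                  chat_id: Optional[str] = None) -> Dict[str, str]:
    """Return the two top-level request parameters for one piece of text."""
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds 2,000 characters; split it first")
    params = {"content": content}
    if chat_id is not None:
        params["chatId"] = chat_id
    # ServiceParameters is typed as JSONString, so serialize the inner object.
    return {"Service": service, "ServiceParameters": json.dumps(params)}
```

Fixed-offset splitting can cut a phrase in half at a chunk boundary, so a production integration might prefer to split on sentence boundaries or to overlap adjacent chunks.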

Return parameters

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. For more information, see Code description. |
| Data | JSONObject | {"Result":[...]} | The data of the moderation result. For more information, see Data. |
| Message | String | OK | The response message for the request. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |

Table 2. Data

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Result | JSONArray | | The results, such as compliance risk labels and confidence scores. For more information, see Result. |
| RiskLevel | String | high | The risk level, determined by the configured high-risk and low-risk score thresholds. Valid values: high (high risk; a custom dictionary hit is rated high by default), medium (medium risk), low (low risk), and none (no risk detected). Note: Handle high-risk content directly. Manually review medium-risk content. Process low-risk content only when high recall is required; otherwise, treat it the same as content with no risk detected. You can configure risk scores in the AI Guardrails console. |
| SensitiveResult | JSONArray | | The results for sensitive content detection, such as risk labels and sensitive samples. For more information, see SensitiveResult. |
| SensitiveLevel | String | S4 | The sensitivity level. Valid values: S0, S1, S2, S3, and S4. S0 indicates that no sensitive content is detected. A higher number indicates a higher sensitivity level. |
| AttackResult | JSONArray | | The results for attack content detection, such as risk labels and confidence scores. For more information, see AttackResult. |
| AttackLevel | String | high | The attack level. Valid values: high (high risk), medium (medium risk), low (low risk), and none (no risk detected). |
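The handling guidance for RiskLevel can be expressed as a small decision function. This is a sketch under stated assumptions: the function name `decide_action` and the action names `block`, `review`, and `pass` are placeholders chosen for illustration, and "handle directly" is interpreted here as blocking the content.

```python
from typing import Any, Dict


def decide_action(data: Dict[str, Any], high_recall: bool = False) -> str:
    """Map the RiskLevel field of Data to a hypothetical action name."""
    level = data.get("RiskLevel", "none")
    if level == "high":
        return "block"      # handle high-risk content directly
    if level == "medium":
        return "review"     # send medium-risk content to manual review
    if level == "low":
        # Low risk is processed only when high recall is required.
        return "review" if high_recall else "pass"
    if level == "none":
        return "pass"
    # Fail loudly on unexpected values instead of silently passing content.
    raise ValueError(f"unexpected RiskLevel: {level!r}")
```

An integration that also needs to act on SensitiveLevel or AttackLevel could apply the same pattern to those fields with its own thresholds.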

Table 3. Result

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | political_xxx | The label returned after text content moderation. Multiple labels and scores may be returned. |
| Confidence | Float | 81.22 | The confidence score. The value ranges from 0 to 100 and is accurate to two decimal places. Some labels do not have a confidence score. |
| RiskWords | String | AA,BB,CC | The detected sensitive words. Multiple words are separated by commas. Some labels do not return sensitive words. |
| CustomizedHit | JSONArray | [{"LibName":"...","KeyWords":"..."}] | If a custom dictionary is hit, the Label is `customized`, and the name of the custom dictionary and the matched custom words are returned. |
| Description | String | Suspected political entity | The description of the Label field. Important: This field explains the Label field and may be subject to change. We recommend that you process the Label field instead of this field when handling results. |

Table 4. CustomizedHit

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| LibName | String | Custom Dictionary 1 | The name of the custom dictionary. |
| KeyWords | String | Custom Word 1,Custom Word 2 | The custom words. Multiple words are separated by commas. |

Table 5. SensitiveResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | 1780 | The label returned after text content moderation. Multiple labels and scores may be returned. |
| SensitiveLevel | String | S4 | The sensitivity level. Valid values: S0, S1, S2, S3, and S4. S0 indicates that no sensitive content is detected. A higher number indicates a higher sensitivity level. |
| SensitiveData | JSONArray | ["6201112223455"] | The detected sensitive samples. Zero to five samples are returned. |
| Description | String | Credit card number | The description of the Label field. Important: This field explains the Label field and may be subject to change. We recommend that you process the Label field instead of this field when handling results. |

Table 6. AttackResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | Indirect Prompt Injection | The label returned after text content moderation. Multiple labels and scores may be returned. |
| AttackLevel | String | high | The attack level. Valid values: high (high risk), medium (medium risk), low (low risk), and none (no risk detected). |
| Confidence | Float | 100.0 | The confidence score. The value ranges from 0 to 100. |
| Description | String | Indirect prompt injection | The description of the Label field. Important: This field explains the Label field and may be subject to change. We recommend that you process the Label field instead of this field when handling results. |

Examples

Sample request

{
    "Service": "query_security_check_intl",
    "ServiceParameters": {
        "content": "testing content",
        "chatId":"ABC123"
    }
}

Sample response:

A system policy is hit:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description":"Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "political_figure",
                "Description":"Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "customized",
                "Description": "Hit custom dictionary",
                "Confidence": 100.0,
                "CustomizedHit": [
                     {
                        "LibName": "Custom Dictionary Name 1",
                        "KeyWords": "Custom Keyword"
                     }
                ]
             }
        ],
         "SensitiveResult": [
            {
                "Label": "1780",
                "SensitiveLevel": "S4",
                "Description":"Credit card number",
                "SensitiveData": ["6201112223455"]
            }
        ],     
         "AttackResult": [
            {
                "Label": "Indirect Prompt Injection",
                "AttackLevel": "high", 
                "Description":"Indirect prompt injection",
                "Confidence": 100.0
            }
        ],   
        "RiskLevel": "high",
        "SensitiveLevel": "S4",
        "AttackLevel": "high"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
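A response of this shape can be reduced to the fields that most integrations act on. The following sketch assumes that the response has already been parsed into a Python dictionary and that field names use the casing shown in the sample response; `split_words` and `summarize` are hypothetical helper names.

```python
from typing import Any, Dict, List


def split_words(value: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty words."""
    return [w.strip() for w in value.split(",") if w.strip()]


def summarize(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect levels, labels, and matched words from one parsed response."""
    if response.get("Code") != 200:
        raise RuntimeError(
            f"moderation failed: {response.get('Code')} {response.get('Message')}"
        )
    data = response.get("Data") or {}
    results = data.get("Result") or []
    risk_words: List[str] = []
    custom_words: List[str] = []
    for item in results:
        # Some labels return no sensitive words, so the field may be absent.
        risk_words += split_words(item.get("RiskWords") or "")
        for hit in item.get("CustomizedHit") or []:
            custom_words += split_words(hit.get("KeyWords") or "")
    return {
        "risk_level": data.get("RiskLevel"),
        "sensitive_level": data.get("SensitiveLevel"),
        "attack_level": data.get("AttackLevel"),
        "labels": [item.get("Label") for item in results],
        "risk_words": sorted(set(risk_words)),
        "custom_words": sorted(set(custom_words)),
        "attack_labels": [a.get("Label") for a in data.get("AttackResult") or []],
    }
```

The summary keys off Label values and levels and ignores the Description field, because the field descriptions recommend processing Label instead of Description.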

Code description

| Code | Status code | Description |
| --- | --- | --- |
| 200 | OK | The request was successful. |
| 400 | BAD_REQUEST | The request is invalid. This may be because the request parameters are incorrect. Check the request parameters. |
| 408 | PERMISSION_DENY | This may be because your account is not authorized, has an overdue payment, has not activated the service, or is banned. |
| 500 | GENERAL_ERROR | An error occurred. This may be a temporary server-side error. We recommend that you retry. If this error code persists, contact us through online support. |
| 581 | TIMEOUT | A timeout occurred. We recommend that you retry. If this error code persists, contact us through online support. |
| 588 | EXCEED_QUOTA | The request frequency exceeds the quota. |
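The retry recommendations for codes 500 and 581 can be wrapped around any function that sends one request. The sketch below is an illustration rather than SDK behavior: `call_with_retry` and its parameters are hypothetical, `send` stands in for whatever function performs the call and returns the parsed response, and backing off on 588 is an assumption, because the table states only that the quota was exceeded.

```python
import time
from typing import Any, Callable, Dict

RETRYABLE = {500, 581}  # GENERAL_ERROR and TIMEOUT: a retry is recommended
THROTTLED = 588         # EXCEED_QUOTA: backing off here is an assumption


def call_with_retry(send: Callable[[], Dict[str, Any]],
                    max_attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], Any] = time.sleep) -> Dict[str, Any]:
    """Call send() until it returns a non-retryable code or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        code = response.get("Code")
        if code not in RETRYABLE and code != THROTTLED:
            # 200, or an error that a retry cannot fix, such as 400 or 408.
            return response
        if attempt < max_attempts:
            # Exponential backoff: base_delay, then 2x, 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
    return response  # last response after exhausting all attempts
```

The caller still needs to inspect the returned Code, because the function returns the last response instead of raising when every attempt fails.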