
Content Moderation: API access guide

Last Updated: Dec 16, 2025

This document describes how to call the AI Guardrails API to moderate text content.

Step 1: Activate the service

Activate the AI Guardrails service on the AI Guardrails service activation page.

Step 2: Grant permissions to a RAM user

Before you can access the software development kit (SDK) or API, you must grant permissions to a Resource Access Management (RAM) user. You can use an AccessKey pair from your Alibaba Cloud account or a RAM user for identity verification when you call an Alibaba Cloud API operation. For more information, see Obtain an AccessKey pair.

Procedure

  1. Log on to the RAM console as a RAM administrator.

  2. Create a RAM user.

    For more information, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user.

    For more information, see Grant permissions to a RAM user.

    After completing the preceding operations, you can call the Content Moderation API as the RAM user.

Step 3: Install and access the SDK

For more information about the AI Guardrails SDK, see SDK Reference.

API reference

Usage notes

You can call this operation to create a text content detection task.

  • API operation: TextModerationPlus

  • Supported regions and endpoints:

    | Region | Public endpoint | Private network endpoint |
    | --- | --- | --- |
    | Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com |

  • Billing information: This is a billable operation. You are charged only for requests that return HTTP status code 200; requests that return other codes are not billed. For more information about billing, see Billing overview.

QPS limit

The queries per second (QPS) limit for this operation is 20 calls per second per user. If you exceed this limit, throttling is triggered, which can affect your business. We recommend that you keep your call rate within the limit, based on your business needs.
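To stay under the 20 QPS limit, you can throttle requests on the client side before they reach the API. The following is a minimal sketch (the `RateLimiter` class is illustrative, not part of any SDK) that enforces a minimum interval between calls:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between calls (1/20 s for 20 QPS)."""

    def __init__(self, max_qps: float):
        self.min_interval = 1.0 / max_qps
        self.last_call = 0.0

    def wait(self):
        # Sleep just long enough to keep the observed rate under max_qps.
        now = time.monotonic()
        delay = self.min_interval - (now - self.last_call)
        if delay > 0:
            time.sleep(delay)
        self.last_call = time.monotonic()

limiter = RateLimiter(max_qps=20)
# Before each TextModerationPlus request:
#     limiter.wait()
#     send_request(...)   # your API call here
```

A shared limiter like this covers one process; if multiple processes or hosts call the API under the same account, divide the budget among them.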

Request parameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | query_security_check_intl | The moderation service to call. Valid values: query_security_check_intl (AI input content security check) and response_security_check_intl (AI-generated content security check). |
| ServiceParameters | JSONString | Yes | | The parameters required for the moderation service. The value is a JSON string. For more information about each parameter, see Table 1. |

Table 1. ServiceParameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| content | String | At least one item is required. | Text content for detection | The text content to moderate. Important: the maximum length is 2,000 characters. |
| chatId | String | No | ABC123 | The unique ID of an interaction record that consists of a user input and a large language model (LLM) output. |
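Putting the two tables together, a request body can be assembled as shown below. This is a sketch (the helper name `build_request` is illustrative); the key points it encodes are that ServiceParameters must be serialized as a JSON string, not a nested object, and that content is capped at 2,000 characters:

```python
import json
from typing import Optional

MAX_CONTENT_LEN = 2000  # maximum length of the content field

def build_request(content: str, chat_id: Optional[str] = None,
                  service: str = "query_security_check_intl") -> dict:
    """Assemble the top-level TextModerationPlus parameters.

    ServiceParameters is serialized with json.dumps because the API
    expects a JSON *string*, not an object.
    """
    if len(content) > MAX_CONTENT_LEN:
        raise ValueError(f"content exceeds {MAX_CONTENT_LEN} characters")
    params = {"content": content}
    if chat_id is not None:
        params["chatId"] = chat_id
    return {
        "Service": service,
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }
```

For example, `build_request("testing content", chat_id="ABC123")` produces a dictionary whose ServiceParameters value is the string `{"content": "testing content", "chatId": "ABC123"}`.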

Response parameters

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. For more information, see Code description. |
| Data | JSONObject | {"Result":[...]} | The data of the moderation result. For more information, see Table 2. |
| Message | String | OK | The response message. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |

Table 2. Data

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Result | JSONArray | | The detection results, such as content compliance risk labels and confidence scores. For more information, see Table 3. |
| RiskLevel | String | high | The risk level, returned based on the configured high and low risk scores. Valid values: high (high risk; if a custom library is hit, the risk level is high by default), medium (medium risk), low (low risk), and none (no risk detected). |
| SensitiveResult | JSONArray | | The detection results, such as sensitive content risk labels and sensitive samples. For more information, see Table 5. |
| SensitiveLevel | String | S4 | The sensitivity level. Valid values: S0 to S4. S0 indicates that no sensitive content is detected; the higher the number, the higher the sensitivity level. |
| AttackResult | JSONArray | | The detection results, such as attack content risk labels and confidence scores. For more information, see Table 6. |
| AttackLevel | String | high | The attack level. Valid values: high (high risk), medium (medium risk), low (low risk), and none (no risk detected). |

Note: Handle high-risk content directly, and manually review medium-risk content. Process low-risk content only when high recall is required; otherwise, treat it the same as content with no detected risk. You can configure risk scores in the AI Guardrails console.
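The recommended handling policy for RiskLevel can be expressed as a small dispatch function. This is an illustrative sketch (the function name and returned action strings are not part of the API):

```python
def handle_risk_level(risk_level: str, high_recall: bool = False) -> str:
    """Map a RiskLevel value to an action per the recommended policy."""
    if risk_level == "high":
        return "block"           # handle high-risk content directly
    if risk_level == "medium":
        return "manual_review"   # medium risk goes to human review
    if risk_level == "low" and high_recall:
        return "manual_review"   # process low risk only for high recall
    return "pass"                # low (by default) and none: allow through
```

For example, `handle_risk_level("low")` returns `"pass"`, while `handle_risk_level("low", high_recall=True)` routes the content to manual review.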

Table 3. Result

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | political_xxx | The label returned after text content detection. Multiple labels and scores may be returned. |
| Confidence | Float | 81.22 | The confidence score. The value ranges from 0 to 100 and is accurate to two decimal places. Some labels do not have confidence scores. |
| RiskWords | String | AA,BB,CC | The detected sensitive words, separated by commas. Some labels do not return sensitive words. |
| CustomizedHit | JSONArray | [{"LibName":"...","Keywords":"..."}] | If a custom library is hit, the value of Label is `customized`, and the name of the custom library and the custom words are returned. For more information, see Table 4. |
| Description | String | Suspected political entity | The description of the Label field. Important: this field explains the Label field and may change. When you process the results, use the Label field instead of this field. |

Table 4. CustomizedHit

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| LibName | String | Custom Library 1 | The name of the custom library. |
| Keywords | String | Custom Word 1,Custom Word 2 | The custom words, separated by commas. |

Table 5. SensitiveResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | 1780 | The label returned after text content detection. Multiple labels and scores may be returned. |
| SensitiveLevel | String | S4 | The sensitivity level. Valid values: S0 to S4. S0 indicates that no sensitive content is detected; the higher the number, the higher the sensitivity level. |
| SensitiveData | JSONArray | ["6201112223455"] | The detected sensitive samples (0 to 5 samples). |
| Description | String | Credit card number | The description of the Label field. Important: this field explains the Label field and may change. When you process the results, use the Label field instead of this field. |

Table 6. AttackResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | Indirect Prompt Injection | The label returned after text content detection. Multiple labels and scores may be returned. |
| AttackLevel | String | high | The attack level. Valid values: high (high risk), medium (medium risk), low (low risk), and none (no risk detected). |
| Confidence | Float | 100.0 | The confidence score. The value ranges from 0 to 100. |
| Description | String | Indirect prompt injection | The description of the Label field. Important: this field explains the Label field and may change. When you process the results, use the Label field instead of this field. |

Examples

Request example:

{
    "Service": "query_security_check_intl",
    "ServiceParameters": {
        "content": "testing content",
        "chatId":"ABC123"
    }
}

Response example:

System policy hit:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description": "Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "political_figure",
                "Description": "Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "customized",
                "Description": "Hit in custom library",
                "Confidence": 100.0,
                "CustomizedHit": [
                    {
                        "LibName": "Custom Library Name 1",
                        "Keywords": "Custom Keyword"
                    }
                ]
            }
        ],
        "SensitiveResult": [
            {
                "Label": "1780",
                "SensitiveLevel": "S4",
                "Description": "Credit card number",
                "SensitiveData": ["6201112223455"]
            }
        ],
        "AttackResult": [
            {
                "Label": "Indirect Prompt Injection",
                "AttackLevel": "high",
                "Description": "Indirect prompt injection",
                "Confidence": 100.0
            }
        ],
        "RiskLevel": "high",
        "SensitiveLevel": "S4",
        "AttackLevel": "high"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
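When processing a response like the one above, rely on the Label fields rather than Description, which may change. A minimal parsing sketch (the `summarize` helper and its output keys are illustrative):

```python
def summarize(response: dict) -> dict:
    """Extract decision-relevant fields from a TextModerationPlus response.

    Only Label values are used; Description is informational and
    may change over time.
    """
    data = response.get("Data", {})
    results = data.get("Result", [])
    return {
        "risk_level": data.get("RiskLevel", "none"),
        "labels": [r["Label"] for r in results],
        # Flatten custom-library hits across all Result entries.
        "custom_hits": [hit for r in results
                        for hit in r.get("CustomizedHit", [])],
        "attack_labels": [a["Label"]
                          for a in data.get("AttackResult", [])],
    }
```

Feeding the sample response into this helper yields a risk level of high, the labels political_entity, political_figure, and customized, one custom-library hit, and the attack label Indirect Prompt Injection.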

Code description

| Code | Status code | Description |
| --- | --- | --- |
| 200 | OK | The request was successful. |
| 400 | BAD_REQUEST | The request is invalid, possibly because of invalid request parameters. Check the request parameters. |
| 408 | PERMISSION_DENY | Your account may lack the required permissions, have an overdue payment, or be disabled, or the service may not be activated for your account. |
| 500 | GENERAL_ERROR | A server error occurred. This may be a temporary error. Try the request again. If the error persists, contact us through online service. |
| 581 | TIMEOUT | The request timed out. Try the request again. If the error persists, contact us through online service. |
| 588 | EXCEED_QUOTA | The request frequency exceeds the quota. |
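Of these codes, 500 and 581 are worth retrying, 588 calls for backing off, and 400 and 408 indicate problems you must fix before retrying. A hedged retry sketch (the function name and backoff parameters are illustrative, not part of the API):

```python
import time

RETRIABLE = {500, 581}   # GENERAL_ERROR, TIMEOUT: safe to retry
THROTTLED = 588          # EXCEED_QUOTA: back off before retrying

def call_with_retry(send, max_attempts: int = 3, backoff: float = 1.0) -> dict:
    """Call send() (a function performing one API request) with retries.

    Retries on 500/581/588 with exponential backoff, and returns
    immediately on 200 or on non-retriable codes such as 400/408.
    """
    resp = {}
    for attempt in range(max_attempts):
        resp = send()
        code = resp.get("Code")
        if code == 200:
            return resp
        if code == THROTTLED or code in RETRIABLE:
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
            continue
        return resp  # 400, 408, etc.: retrying will not help
    return resp
```

In production you would combine this with the QPS throttling described earlier, and cap the backoff so a long outage does not stall callers indefinitely.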