AI Guardrails: API integration guide

Last Updated: Mar 31, 2026

The TextModerationPlus API operation screens text for compliance risks, sensitive data, and prompt injection attacks as a standalone check, without bundling it with model inference.

Important

If you have already integrated the enhanced PLUS edition of the Guardrails service, upgrade the software development kit (SDK) to call this API operation. If you are starting fresh, integrate this API directly. You can reuse it later to moderate AI-generated images and files. For details, see the Multimodal API integration guide.

Prerequisites

Before you begin, decide the following:

  • Which content to check: user inputs (query_security_check_intl), LLM outputs (response_security_check_intl), or both

  • How to handle each risk level: block high-risk content automatically, route medium-risk content to human review, and treat low-risk content as safe unless you need high recall

Then complete the following setup:

Set up a RAM user

Calls to Alibaba Cloud API operations are authenticated with an AccessKey pair. Create one for a RAM user as follows:

  1. Log on to the RAM console with your Alibaba Cloud account.

  2. Create a RAM user. For details, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For details, see Grant permissions to a RAM user.

  4. Create an AccessKey pair for the RAM user. For details, see Obtain an AccessKey pair.
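
Once the AccessKey pair exists, it is usually kept out of source code. The following is a minimal sketch of reading it from environment variables. The helper `load_access_key` and the `AccessKey` type are hypothetical, and the variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` are an assumption; use whatever names your deployment defines.

```python
import os
from typing import Mapping, NamedTuple, Optional


class AccessKey(NamedTuple):
    access_key_id: str
    access_key_secret: str


def load_access_key(env: Optional[Mapping[str, str]] = None) -> AccessKey:
    """Read the RAM user's AccessKey pair from environment variables.

    The variable names below are an assumption, not something this guide
    prescribes. Pass a mapping as `env` to override os.environ in tests.
    """
    env = os.environ if env is None else env
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID", "")
    secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", "")
    if not key_id or not secret:
        raise RuntimeError("AccessKey pair is not configured in the environment")
    return AccessKey(key_id, secret)
```

Failing fast when either value is missing avoids sending unauthenticated requests that would be rejected later.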

Install the SDK

For SDK installation and setup, see the SDK Reference.

API reference

Endpoint

Region | Public endpoint | Internal endpoint
Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com
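
A client typically chooses between the public and internal endpoint depending on where it runs. A small sketch of that lookup follows. The helper `endpoint_for` is hypothetical, and the region key `ap-southeast-1` is taken from the hostnames in the table.

```python
# Endpoints copied from the table above; only Singapore is listed.
ENDPOINTS = {
    "ap-southeast-1": {
        "public": "green-cip.ap-southeast-1.aliyuncs.com",
        "internal": "green-cip-vpc.ap-southeast-1.aliyuncs.com",
    },
}


def endpoint_for(region_id: str, use_internal: bool = False) -> str:
    """Return the public or internal endpoint for a region."""
    try:
        region = ENDPOINTS[region_id]
    except KeyError:
        raise ValueError(f"no endpoint listed for region {region_id!r}") from None
    return region["internal" if use_internal else "public"]
```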

Usage notes

  • QPS limit: 50 requests per second per user. Requests that exceed this limit are throttled.

  • Content limit: 2,000 characters per request.

  • Billing: Only requests that return HTTP status code 200 are billed. See Billing overview for details.
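
Text longer than the content limit has to be split before it is submitted. The following is a minimal sketch of one way to do that. The helper `split_content` is hypothetical, and it assumes the limit is counted the same way Python's `len` counts characters.

```python
from typing import List

MAX_CHARS = 2000  # per-request content limit from the usage notes


def split_content(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters.

    Breaks at the last space inside each window when there is one, so words
    stay intact; otherwise falls back to a hard cut at the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1  # keep the space with the left-hand chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk becomes a separate request, so each one counts toward the QPS limit and is billed separately when it succeeds. A phrase that straddles a chunk boundary may not be detected; overlapping chunks are one possible mitigation, at the cost of extra requests.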

Request parameters

Parameter | Type | Required | Description
Service | String | Yes | The moderation use case. Valid values: query_security_check_intl (AI input check) and response_security_check_intl (AI-generated content check).
ServiceParameters | JSONString | Yes | A JSON string containing the content to moderate. See the table below for fields.

ServiceParameters fields

Field | Type | Required | Description
content | String | At least one field required | The text to moderate. Maximum 2,000 characters.
chatId | String | No | A unique ID for an interaction record, pairing a user input with an LLM output.
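
Because ServiceParameters is typed as a JSON string, the fields above need to be serialized before they are sent. A minimal sketch of assembling the two request parameters follows. The helper `build_request` is hypothetical and only shapes the payload; it does not sign or send anything.

```python
import json
from typing import Dict, Optional

VALID_SERVICES = {"query_security_check_intl", "response_security_check_intl"}
MAX_CHARS = 2000  # content limit per request


def build_request(service: str, content: str,
                  chat_id: Optional[str] = None) -> Dict[str, str]:
    """Build the Service and ServiceParameters request parameters."""
    if service not in VALID_SERVICES:
        raise ValueError(f"unknown Service value: {service!r}")
    if not content:
        raise ValueError("content must not be empty")
    if len(content) > MAX_CHARS:
        raise ValueError(f"content exceeds {MAX_CHARS} characters")
    params = {"content": content}
    if chat_id is not None:
        # Reuse the same chatId for a user input and the matching LLM output.
        params["chatId"] = chat_id
    return {
        "Service": service,
        # Serialized to a string, as the parameter type requires.
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }
```

Validating the service name and content length locally avoids spending a request on input that the API would reject.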

Response parameters

Parameter | Type | Description
Code | Integer | The HTTP status code. See Status codes.
Data | JSONObject | The moderation result. See the table below for fields.
Message | String | The response message.
RequestId | String | The request ID.

Data fields

Field | Type | Description
RiskLevel | String | The overall compliance risk level: high, medium, low, or none. Determined by the configured risk score thresholds. If a custom dictionary is hit, the risk level is high by default. Configure thresholds in the Guardrails console.
Result | JSONArray | Compliance risk labels with confidence scores. See Result fields.
SensitiveLevel | String | The overall sensitive content level: S0 (none detected) through S4 (highest).
SensitiveResult | JSONArray | Sensitive content detection results. See SensitiveResult fields.
AttackLevel | String | The overall attack detection level: high, medium, low, or none.
AttackResult | JSONArray | Prompt injection detection results. See AttackResult fields.

Result fields

Field | Type | Description
Label | String | The compliance risk label (e.g., political_entity, political_figure, customized). Multiple labels may be returned.
Confidence | Float | The confidence score, from 0 to 100 with two decimal places. Not all labels include a score.
RiskWords | String | Detected sensitive words, comma-separated. Not all labels include this field.
CustomizedHit | JSONArray | Populated when Label is customized. Contains the matched custom dictionary name and keywords. See CustomizedHit fields.
Description | String | A human-readable explanation of the label. This field may change, so use Label to drive your business logic, not Description.

CustomizedHit fields

Field | Type | Description
LibName | String | The name of the matched custom dictionary.
KeyWords | String | The matched custom words, comma-separated.

SensitiveResult fields

Field | Type | Description
Label | String | The sensitive content label (e.g., 1780).
SensitiveLevel | String | The sensitivity level: S0 (none) through S3.
SensitiveData | JSONArray | Detected sensitive samples (0–5 items).
Description | String | A human-readable explanation of the label. Use Label to drive your business logic, not Description.

AttackResult fields

Field | Type | Description
Label | String | The attack type (e.g., Indirect Prompt Injection).
AttackLevel | String | The attack level: high, medium, low, or none.
Confidence | Float | The confidence score, from 0 to 100.
Description | String | A human-readable explanation of the label. Use Label to drive your business logic, not Description.
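
The nested arrays described above can be flattened into the labels that business logic should key on. The following is a minimal sketch that assumes `data` is the already-parsed Data object. The helpers `collect_labels` and `custom_dictionary_hits` are hypothetical, and the `KeyWords` casing follows the sample response in this guide.

```python
from typing import Any, Dict, List, Tuple


def collect_labels(data: Dict[str, Any]) -> Dict[str, List[str]]:
    """Gather the Label values from each result array in Data."""
    return {
        key: [item["Label"] for item in (data.get(key) or []) if "Label" in item]
        for key in ("Result", "SensitiveResult", "AttackResult")
    }


def custom_dictionary_hits(data: Dict[str, Any]) -> List[Tuple[str, List[str]]]:
    """Return (dictionary name, matched keywords) pairs for customized labels."""
    hits: List[Tuple[str, List[str]]] = []
    for item in data.get("Result") or []:
        if item.get("Label") != "customized":
            continue
        for hit in item.get("CustomizedHit") or []:
            # Keywords arrive as one comma-separated string.
            keywords = [w for w in (hit.get("KeyWords") or "").split(",") if w]
            hits.append((hit.get("LibName", ""), keywords))
    return hits
```

Both helpers read Label only and ignore Description, since the description text may change.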

Handle moderation results

Use the top-level fields (RiskLevel, SensitiveLevel, AttackLevel) to route content. Drill into Result, SensitiveResult, and AttackResult arrays for the specific labels and confidence scores that explain the decision.

Level | Recommended action
high | Block the content automatically.
medium | Route to human review.
low | Treat as safe unless your application requires high recall.
none | No action required.

A custom dictionary match sets RiskLevel to high by default.
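
The recommended actions above can be expressed as a small routing function. The following is a minimal sketch, assuming `data` is the parsed Data object. The function `decide` and the action names `block`, `review`, and `allow` are hypothetical. SensitiveLevel is left out on purpose because it uses a different scale, and how to map it to actions is an application decision.

```python
from typing import Any, Dict

# Mapping taken from the recommended-action table.
ACTIONS = {"high": "block", "medium": "review", "low": "allow", "none": "allow"}
_SEVERITY = {"allow": 0, "review": 1, "block": 2}


def decide(data: Dict[str, Any], high_recall: bool = False) -> str:
    """Return the strictest action across RiskLevel and AttackLevel."""
    worst = "allow"
    for field in ("RiskLevel", "AttackLevel"):
        # A missing field is treated as "none" in this sketch.
        level = str(data.get(field, "none")).lower()
        # An unrecognized level fails toward human review rather than allow.
        action = ACTIONS.get(level, "review")
        if level == "low" and high_recall:
            action = "review"
        if _SEVERITY[action] > _SEVERITY[worst]:
            worst = action
    return worst
```

For example, `decide({"RiskLevel": "low", "AttackLevel": "medium"})` returns `"review"`, because the stricter of the two levels wins.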

Example

Request

{
    "Service": "query_security_check_intl",
    "ServiceParameters": {
        "content": "testing content",
        "chatId":"ABC123"
    }
}

Response (system policy matched)

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description":"Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "political_figure",
                "Description":"Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "customized",
                "Description": "Hit custom dictionary",
                "Confidence": 100.0,
                "CustomizedHit": [
                    {
                        "LibName": "Custom Dictionary Name 1",
                        "KeyWords": "Custom Keyword"
                    }
                ]
            }
        ],
        "SensitiveResult": [
            {
                "Label": "1780",
                "SensitiveLevel": "S3",
                "Description":"Credit card number",
                "SensitiveData": ["6201112223455"]
            }
        ],
        "AttackResult": [
            {
                "Label": "Indirect Prompt Injection",
                "AttackLevel": "high",
                "Description": "Indirect prompt injection",
                "Confidence": 100.0
            }
        ],
        "RiskLevel": "high",
        "SensitiveLevel": "S3",
        "AttackLevel": "high"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Status codes

Code | Status | Description
200 | OK | The request was successful.
400 | BAD_REQUEST | The request is invalid. Check the request parameters.
408 | PERMISSION_DENY | The account is not authorized, has an overdue payment, has not activated the service, or is banned.
500 | GENERAL_ERROR | A temporary server-side error occurred. Retry the request. If this code persists, contact online support.
581 | TIMEOUT | The request timed out. Retry the request. If this code persists, contact online support.
588 | EXCEED_QUOTA | The request frequency exceeds the quota.
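
The status codes above suggest retrying on 500 and 581. The following is a minimal sketch of a retry wrapper with exponential backoff. The function `call_with_retry` and the `send` callable are hypothetical; `send` stands for whatever performs one request and returns the parsed response. Backing off and retrying on 588 is an assumption of this sketch, not something the table prescribes.

```python
import time
from typing import Any, Callable, Dict

# 500 and 581 are documented as retryable; including 588 is an assumption.
RETRYABLE = {500, 581, 588}


def call_with_retry(send: Callable[[], Dict[str, Any]],
                    max_attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call send() until Code is not retryable or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(1, max_attempts):
        if response.get("Code") not in RETRYABLE:
            break
        # Wait base_delay, then 2x, 4x, ... before each retry.
        sleep(base_delay * 2 ** (attempt - 1))
        response = send()
    return response
```

Codes such as 400 and 408 are returned immediately, since repeating the same request would not change the outcome. The last response is returned even when it still carries a retryable code, so the caller can decide whether to escalate.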

What's next