Screens text content for compliance risks, sensitive data, and prompt injection attacks using the TextModerationPlus API operation — without bundling the check with model inference.
If you have already integrated the enhanced PLUS edition of the Guardrails service, upgrade the software development kit (SDK) to call this API operation. If you are starting fresh, integrate this API directly. You can reuse it later to moderate AI-generated images and files. For details, see the Multimodal API integration guide.
Prerequisites
Before you begin, decide the following:
Which content to check: user inputs (query_security_check_intl), LLM outputs (response_security_check_intl), or both
How to handle each risk level: block high-risk content automatically, route medium-risk content to human review, and treat low-risk content as safe unless you need high recall
Then make sure you have:
An Alibaba Cloud account with the Guardrails service activated
A Resource Access Management (RAM) user with the AliyunYundunGreenWebFullAccess system policy and an AccessKey pair (see Set up a RAM user)
Set up a RAM user
The AccessKey pair is used for identity verification when calling Alibaba Cloud API operations.
Log on to the RAM console with your Alibaba Cloud account.
Create a RAM user. For details, see Create a RAM user.
Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For details, see Grant permissions to a RAM user.
Create an AccessKey pair for the RAM user. For details, see Obtain an AccessKey pair.
Install the SDK
For SDK installation and setup, see the SDK Reference.
API reference
Endpoint
| Region | Public endpoint | Internal endpoint |
|---|---|---|
| Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com |
Usage notes
QPS limit: 50 requests per second per user. Requests that exceed this limit are throttled.
Content limit: 2,000 characters per request.
Billing: Only requests that return HTTP status code 200 are billed. See Billing overview for details.
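Text longer than the content limit has to be split across several requests. The following is a minimal sketch of one way to do that; the helper name `split_content` is hypothetical, and it assumes the service counts characters the same way Python's `len` does (Unicode code points), which this page does not confirm. A fixed-width split can also cut a risky phrase in half at a boundary, so splitting on sentence or paragraph breaks may suit some applications better.

```python
MAX_CONTENT_CHARS = 2000  # per-request content limit from the usage notes


def split_content(text: str, limit: int = MAX_CONTENT_CHARS) -> list[str]:
    """Split text into consecutive pieces of at most `limit` characters.

    Assumption: the service measures length like Python's len().
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


if __name__ == "__main__":
    chunks = split_content("a" * 4500)
    print([len(chunk) for chunk in chunks])  # [2000, 2000, 500]
```

Each piece is then sent as its own request, so a long document counts several times against the 50 QPS limit.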
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| Service | String | Yes | The moderation use case. Valid values: query_security_check_intl (AI input check) and response_security_check_intl (AI-generated content check). |
| ServiceParameters | JSONString | Yes | A JSON string containing the content to moderate. See the table below for fields. |
ServiceParameters fields
| Field | Type | Required | Description |
|---|---|---|---|
| content | String | At least one field required | The text to moderate. Maximum 2,000 characters. |
| chatId | String | No | A unique ID for an interaction record, pairing a user input with an LLM output. |
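Because ServiceParameters is typed as a JSON string rather than a nested object, the fields above need to be serialized before they are attached to the request. The sketch below shows one way to assemble the two request parameters in Python. The function name `build_request` is hypothetical and is not part of the SDK; how the resulting values are passed to the SDK client is covered in the SDK Reference.

```python
import json
from typing import Optional

# Valid Service values listed in the request parameters table.
VALID_SERVICES = {"query_security_check_intl", "response_security_check_intl"}
MAX_CONTENT_CHARS = 2000


def build_request(service: str, content: str, chat_id: Optional[str] = None) -> dict:
    """Return the Service and ServiceParameters values for one request.

    Hypothetical helper: it only prepares the parameters and sends nothing.
    """
    if service not in VALID_SERVICES:
        raise ValueError(f"unsupported service: {service}")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds 2,000 characters")
    params = {"content": content}
    if chat_id is not None:
        params["chatId"] = chat_id
    # ServiceParameters must be a JSON string, not a dict.
    return {
        "Service": service,
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }


if __name__ == "__main__":
    print(build_request("query_security_check_intl", "testing content", "ABC123"))
```

Passing the same chatId for a user input and the matching LLM output pairs the two checks as one interaction record.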
Response parameters
| Parameter | Type | Description |
|---|---|---|
| Code | Integer | The HTTP status code. See Status codes. |
| Data | JSONObject | The moderation result. See the table below for fields. |
| Message | String | The response message. |
| RequestId | String | The request ID. |
Data fields
| Field | Type | Description |
|---|---|---|
| RiskLevel | String | The overall compliance risk level: high, medium, low, or none. Determined by the configured risk score thresholds. If a custom dictionary is hit, the risk level is high by default. Configure thresholds in the Guardrails console. |
| Result | JSONArray | Compliance risk labels with confidence scores. See Result fields. |
| SensitiveLevel | String | The overall sensitive content level: S0 (none detected) through S4 (highest). |
| SensitiveResult | JSONArray | Sensitive content detection results. See SensitiveResult fields. |
| AttackLevel | String | The overall attack detection level: high, medium, low, or none. |
| AttackResult | JSONArray | Prompt injection detection results. See AttackResult fields. |
Result fields
| Field | Type | Description |
|---|---|---|
| Label | String | The compliance risk label (e.g., political_entity, political_figure, customized). Multiple labels may be returned. |
| Confidence | Float | The confidence score, from 0 to 100 with two decimal places. Not all labels include a score. |
| Riskwords | String | Detected sensitive words, comma-separated. Not all labels include this field. |
| CustomizedHit | JSONArray | Populated when Label is customized. Contains the matched custom dictionary name and keywords. See CustomizedHit fields. |
| Description | String | A human-readable explanation of the label. This field may change; use Label to drive your business logic, not Description. |
CustomizedHit fields
| Field | Type | Description |
|---|---|---|
| LibName | String | The name of the matched custom dictionary. |
| Keywords | String | The matched custom words, comma-separated. |
SensitiveResult fields
| Field | Type | Description |
|---|---|---|
| Label | String | The sensitive content label (e.g., 1780). |
| SensitiveLevel | String | The sensitivity level: S0 (none) through S3. |
| SensitiveData | JSONArray | Detected sensitive samples (0–5 items). |
| Description | String | A human-readable explanation of the label. Use Label to drive your business logic, not Description. |
AttackResult fields
| Field | Type | Description |
|---|---|---|
| Label | String | The attack type (e.g., Indirect Prompt Injection). |
| AttackLevel | String | The attack level: high, medium, low, or none. |
| Confidence | Float | The confidence score, from 0 to 100. |
| Description | String | A human-readable explanation of the label. Use Label to drive your business logic, not Description. |
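The nested arrays can be flattened into a short summary for logging or review queues. The sketch below assumes the response has already been parsed into a Python dict and that any of the arrays may be missing or null; the names `summarize` and `_items` are hypothetical. It reads only Label and LibName, in line with the guidance to drive logic from Label rather than Description.

```python
from typing import Any


def _items(data: dict[str, Any], key: str) -> list[dict[str, Any]]:
    """Return the list stored under `key`, or an empty list if absent or null."""
    value = data.get(key)
    return value if isinstance(value, list) else []


def summarize(data: dict[str, Any]) -> dict[str, list[str]]:
    """Collect the labels from the Data object of a moderation response."""
    results = _items(data, "Result")
    return {
        "risk_labels": [r["Label"] for r in results if "Label" in r],
        "custom_libraries": [
            hit["LibName"]
            for r in results
            for hit in _items(r, "CustomizedHit")
            if "LibName" in hit
        ],
        "sensitive_labels": [
            s["Label"] for s in _items(data, "SensitiveResult") if "Label" in s
        ],
        "attack_labels": [
            a["Label"] for a in _items(data, "AttackResult") if "Label" in a
        ],
    }
```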
Handle moderation results
Use the top-level fields (RiskLevel, SensitiveLevel, AttackLevel) to route content. Drill into Result, SensitiveResult, and AttackResult arrays for the specific labels and confidence scores that explain the decision.
| Level | Recommended action |
|---|---|
| high | Block the content automatically. |
| medium | Route to human review. |
| low | Treat as safe unless your application requires high recall. |
| none | No action required. |
A custom dictionary match sets RiskLevel to high by default.
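The recommended actions in the table above can be expressed as a small routing function. This is a sketch under stated assumptions, not a prescribed policy: the function name `route` and the action names `block`, `review`, and `allow` are hypothetical, the worse of RiskLevel and AttackLevel decides the outcome, and an unrecognized level is sent to review as a conservative default. SensitiveLevel uses the separate S0 to S4 scale, which the table does not cover, so it is left out here.

```python
# Ordering of the levels used by RiskLevel and AttackLevel.
SEVERITY = {"none": 0, "low": 1, "medium": 2, "high": 3}


def route(data: dict, high_recall: bool = False) -> str:
    """Return 'block', 'review', or 'allow' for a moderation Data object."""
    levels = [data.get("RiskLevel") or "none", data.get("AttackLevel") or "none"]
    # Assumption: an unknown level is safer to review than to allow.
    if any(level not in SEVERITY for level in levels):
        return "review"
    worst = max(levels, key=SEVERITY.__getitem__)
    if worst == "high":
        return "block"
    if worst == "medium":
        return "review"
    if worst == "low" and high_recall:
        return "review"
    return "allow"


if __name__ == "__main__":
    print(route({"RiskLevel": "low", "AttackLevel": "high"}))  # block
```

Set `high_recall=True` when the application prefers a human to look at low-risk content rather than pass it through.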
Example
Request
{
"Service": "query_security_check_intl",
"ServiceParameters": {
"content": "testing content",
"chatId":"ABC123"
}
}
Response (system policy matched)
{
"Code": 200,
"Data": {
"Result": [
{
"Label": "political_entity",
"Description":"Suspected political entity",
"Confidence": 100.0,
"RiskWords": "Word A,Word B,Word C"
},
{
"Label": "political_figure",
"Description":"Suspected political figure",
"Confidence": 100.0,
"RiskWords": "Word A,Word B,Word C"
},
{
"Label": "customized",
"Description": "Hit custom dictionary",
"Confidence": 100.0,
"CustomizedHit": [
{
"LibName": "Custom Dictionary Name 1",
"KeyWords": "Custom Keyword"
}
]
}
],
"SensitiveResult": [
{
"Label": "1780",
"SensitiveLevel": "S4",
"Description":"Credit card number",
"SensitiveData": ["6201112223455"]
}
],
"AttackResult": [
{
"Label": "Indirect Prompt Injection",
"AttackLevel": "high",
"Description":"Indirect prompt injection",
"Confidence": 100.0
}
],
"RiskLevel": "high",
"SensitiveLevel": "S3",
"AttackLevel": "high"
},
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Status codes
| Code | Status | Description |
|---|---|---|
| 200 | OK | The request was successful. |
| 400 | BAD_REQUEST | The request is invalid. Check the request parameters. |
| 408 | PERMISSION_DENY | The account is not authorized, has an overdue payment, has not activated the service, or is banned. |
| 500 | GENERAL_ERROR | A temporary server-side error occurred. Retry the request. If this code persists, contact online support. |
| 581 | TIMEOUT | The request timed out. Retry the request. If this code persists, contact online support. |
| 588 | EXCEED_QUOTA | The request frequency exceeds the quota. |
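The table recommends retrying on codes 500 and 581. A minimal retry wrapper with exponential backoff might look like the sketch below. The name `call_with_retry` is hypothetical, and `send` stands in for whatever function performs the actual SDK call and returns the parsed response as a dict. Treating 588 as retryable after a pause is an assumption; the table only states that the quota was exceeded.

```python
import time
from typing import Callable

# 500 and 581 are documented as retryable; 588 is included as an assumption.
RETRYABLE_CODES = {500, 581, 588}


def call_with_retry(
    send: Callable[[], dict],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it returns a non-retryable Code or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response: dict = {}
    for attempt in range(max_attempts):
        response = send()
        if response.get("Code") not in RETRYABLE_CODES:
            break
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * (2 ** attempt))
    return response
```

The last response is returned even if it still carries a retryable code, so the caller can log the RequestId before contacting support.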
What's next
Multimodal API integration guide — extend moderation to AI-generated images and files
SDK Reference — SDK installation and usage
Billing overview — understand how requests are billed