This document describes how to call the AI Guardrails API to moderate text content.
Step 1: Activate the service
Activate the AI Guardrails service on the AI Guardrails service activation page.
Step 2: Grant permissions to a RAM user
Before you can access the software development kit (SDK) or API, you must grant permissions to a Resource Access Management (RAM) user. You can use the AccessKey pair of your Alibaba Cloud account or of a RAM user to verify your identity when you call an Alibaba Cloud API operation. For more information, see Obtain an AccessKey pair.
Procedure
- Log on to the RAM console as a RAM administrator.
- Create a RAM user. For more information, see Create a RAM user.
- Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For more information, see Grant permissions to a RAM user.
After completing the preceding operations, you can call the Content Moderation API as the RAM user.
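If you prefer to script Step 2 rather than use the console, the sketch below calls the RAM API from Python. It assumes the alibabacloud_ram20150501 package, and the user name guardrails-caller is a hypothetical example; the console procedure above remains the authoritative path.

```python
# A minimal sketch of Step 2 using the RAM SDK, assuming the
# alibabacloud_ram20150501 Python package is installed. The user name
# "guardrails-caller" is a hypothetical example.
from alibabacloud_ram20150501.client import Client as RamClient
from alibabacloud_ram20150501 import models as ram_models
from alibabacloud_tea_openapi import models as open_api_models

config = open_api_models.Config(
    # Use an administrator's AccessKey pair; do not hard-code it in production.
    access_key_id="<administrator-access-key-id>",
    access_key_secret="<administrator-access-key-secret>",
    endpoint="ram.aliyuncs.com",
)
client = RamClient(config)

# Create the RAM user that will call the moderation API.
client.create_user(ram_models.CreateUserRequest(user_name="guardrails-caller"))

# Attach the system policy named in Step 2 to the new RAM user.
client.attach_policy_to_user(ram_models.AttachPolicyToUserRequest(
    policy_type="System",
    policy_name="AliyunYundunGreenWebFullAccess",
    user_name="guardrails-caller",
))
```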
Step 3: Install and access the SDK
For more information about the AI Guardrails SDK, see SDK Reference.
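The snippet below is a minimal sketch of a full API call through the Python SDK. It assumes the alibabacloud_green20220302 package (for example, installed with pip install alibabacloud_green20220302); see the SDK Reference for the authoritative installation and usage steps.

```python
# A minimal sketch, assuming the alibabacloud_green20220302 Python package.
import json
import os

from alibabacloud_green20220302.client import Client
from alibabacloud_green20220302 import models as green_models
from alibabacloud_tea_openapi import models as open_api_models


def moderate_text(text: str, chat_id: str | None = None) -> dict:
    # Read the RAM user's AccessKey pair from environment variables
    # instead of hard-coding it.
    config = open_api_models.Config(
        access_key_id=os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        access_key_secret=os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        # Singapore public endpoint from the table in the API reference below.
        endpoint="green-cip.ap-southeast-1.aliyuncs.com",
    )
    client = Client(config)

    service_parameters = {"content": text}
    if chat_id is not None:
        service_parameters["chatId"] = chat_id

    request = green_models.TextModerationPlusRequest(
        service="query_security_check_intl",
        # ServiceParameters is passed as a JSON string.
        service_parameters=json.dumps(service_parameters, ensure_ascii=False),
    )
    response = client.text_moderation_plus(request)
    return response.body.to_map()
```

Calling `moderate_text("testing content", chat_id="ABC123")` would send a request equivalent to the one in the Examples section below.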
API reference
Usage notes
You can call this operation to create a text content detection task.
API operation: TextModerationPlus
Supported regions and endpoints:
| Region | Public endpoint | Private network endpoint |
| --- | --- | --- |
| Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com |
Billing information: This is a billable operation. You are charged only for requests that return an HTTP status code of 200; you are not charged for requests that return other codes. For more information about billing, see Billing overview.
QPS limit
The queries per second (QPS) limit for this operation is 20 calls per second for each user. If the number of calls per second exceeds this limit, throttling is triggered, which can affect your business. We recommend that you plan your call rate with this limit in mind.
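Given this limit, a client-side throttle can keep request bursts under 20 QPS before they reach the API. The sketch below is an illustrative, thread-safe pacing helper; the class and method names are our own, not part of the SDK.

```python
# A minimal client-side throttle for the 20 QPS per-user limit described above.
import threading
import time


class QpsLimiter:
    """Spaces calls so that at most `qps` start per second, across threads."""

    def __init__(self, qps: int = 20):
        self.interval = 1.0 / qps
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def acquire(self) -> None:
        # Reserve the next start slot under the lock, then sleep outside it.
        with self.lock:
            now = time.monotonic()
            wait = self.next_slot - now
            self.next_slot = max(self.next_slot, now) + self.interval
        if wait > 0:
            time.sleep(wait)
```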
Request parameters
| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | query_security_check_intl | The moderation service to use. |
| ServiceParameters | JSONString | Yes | | The parameters required for the moderation service. The value is a JSON string. For more information about each parameter, see Table 1. ServiceParameters. |
Table 1. ServiceParameters
| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| content | String | At least one item is required. | Text content for detection | The text content to moderate. Important: The maximum length is 2,000 characters. |
| chatId | String | No | ABC123 | The unique ID of an interaction record that consists of a user input and a large language model (LLM) output. |
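Because ServiceParameters must be serialized as a JSON string and content is capped at 2,000 characters, a caller can validate and serialize the parameters up front. A minimal sketch follows; the helper name is illustrative.

```python
import json

MAX_CONTENT_CHARS = 2000  # limit from Table 1


def build_service_parameters(content: str, chat_id: str | None = None) -> str:
    """Validates the content length and returns the ServiceParameters JSON string."""
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    params = {"content": content}
    if chat_id is not None:
        params["chatId"] = chat_id
    # ServiceParameters is passed to the API as a JSON string, not an object.
    return json.dumps(params, ensure_ascii=False)
```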
Response parameters
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. For more information, see Code description. |
| Data | JSONObject | {"Result":[...]} | The data of the moderation result. For more information, see Table 2. Data. |
| Message | String | OK | The response message. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |
Table 2. Data
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Result | JSONArray | | The detection results, such as content compliance risk labels and confidence scores. For more information, see Table 3. Result. |
| RiskLevel | String | high | The risk level. The value is returned based on the configured high and low risk scores. Valid values: high, medium, low, and none. Note: Handle high-risk content directly. Manually review medium-risk content. Process low-risk content only when high recall is required; otherwise, treat low-risk content the same as content with no detected risk. You can configure risk scores in the AI Guardrails console. |
| SensitiveResult | JSONArray | | The detection results, such as sensitive content risk labels and sensitive samples. For more information, see Table 5. SensitiveResult. |
| SensitiveLevel | String | S4 | The sensitivity level. Valid values: S0, S1, S2, S3, and S4. |
| AttackResult | JSONArray | | The detection results, such as attack content risk labels and confidence scores. For more information, see Table 6. AttackResult. |
| AttackLevel | String | high | The attack level. Valid values: high, medium, low, and none. |
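As one way to apply the Note on RiskLevel above, a caller might route content by the top-level level as follows. The function name and return values are illustrative, not part of the API.

```python
def route_by_risk(data: dict, high_recall: bool = False) -> str:
    """Maps the top-level RiskLevel to an action, per the Note in Table 2."""
    risk = data.get("RiskLevel", "none")
    if risk == "high":
        return "block"          # handle high-risk content directly
    if risk == "medium":
        return "manual_review"  # manually review medium-risk content
    if risk == "low" and high_recall:
        return "manual_review"  # process low-risk only when high recall is needed
    return "pass"               # treat as no detected risk
```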
Table 3. Result
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | political_xxx | The label returned after text content detection. Multiple labels and scores may be returned. |
| Confidence | Float | 81.22 | The confidence score. The value ranges from 0 to 100 and is accurate to two decimal places. Some labels do not have confidence scores. |
| RiskWords | String | AA,BB,CC | The detected sensitive words. Multiple words are separated by commas. Some labels do not return sensitive words. |
| CustomizedHit | JSONArray | [{"LibName":"...","Keywords":"..."}] | If a custom library is hit, the value of Label is `customized`, and the name of the custom library and the custom words are returned. For more information, see Table 4. CustomizedHit. |
| Description | String | Suspected political entity | The description of the Label field. Important: This field explains the Label field and may change. When you process the results, use the Label field instead of this field. |
Table 4. CustomizedHit
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| LibName | String | Custom Library 1 | The name of the custom library. |
| Keywords | String | Custom Word 1,Custom Word 2 | The custom words. Multiple words are separated by commas. |
Table 5. SensitiveResult
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | 1780 | The label returned after text content detection. Multiple labels and scores may be returned. |
| SensitiveLevel | String | S4 | The sensitivity level of this label. Valid values: S0, S1, S2, S3, and S4. |
| SensitiveData | JSONArray | ["6201112223455"] | The detected sensitive samples (0 to 5). |
| Description | String | Credit card number | The description of the Label field. Important: This field explains the Label field and may change. When you process the results, use the Label field instead of this field. |
Table 6. AttackResult
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | Indirect Prompt Injection | The label returned after text content detection. Multiple labels and scores may be returned. |
| AttackLevel | String | high | The attack level of this label. Valid values: high, medium, and low. |
| Confidence | Float | 100.0 | The confidence score. The value ranges from 0 to 100. |
| Description | String | Indirect prompt injection | The description of the Label field. Important: This field explains the Label field and may change. When you process the results, use the Label field instead of this field. |
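Because the Description fields may change, downstream logic should key off Label (and LibName for custom hits). The sketch below illustrates one way to flatten the three result arrays into machine-readable tags; the function name and tag format are our own.

```python
def summarize_hits(data: dict) -> list[str]:
    """Collects machine-readable labels from the Data object of a response."""
    hits = []
    for item in data.get("Result", []):
        label = item["Label"]
        if label == "customized":
            # A custom-library hit carries the library name in CustomizedHit.
            for hit in item.get("CustomizedHit", []):
                hits.append(f"customized:{hit['LibName']}")
        else:
            hits.append(label)
    for item in data.get("SensitiveResult", []):
        hits.append(f"sensitive:{item['Label']}:{item.get('SensitiveLevel', '')}")
    for item in data.get("AttackResult", []):
        hits.append(f"attack:{item['Label']}")
    return hits
```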
Examples
Request example:

```json
{
  "Service": "query_security_check_intl",
  "ServiceParameters": {
    "content": "testing content",
    "chatId": "ABC123"
  }
}
```

Response example:
System policy hit:
```json
{
  "Code": 200,
  "Data": {
    "Result": [
      {
        "Label": "political_entity",
        "Description": "Suspected political entity",
        "Confidence": 100.0,
        "RiskWords": "Word A,Word B,Word C"
      },
      {
        "Label": "political_figure",
        "Description": "Suspected political figure",
        "Confidence": 100.0,
        "RiskWords": "Word A,Word B,Word C"
      },
      {
        "Label": "customized",
        "Description": "Hit in custom library",
        "Confidence": 100.0,
        "CustomizedHit": [
          {
            "LibName": "Custom Library Name 1",
            "Keywords": "Custom Keyword"
          }
        ]
      }
    ],
    "SensitiveResult": [
      {
        "Label": "1780",
        "SensitiveLevel": "S4",
        "Description": "Credit card number",
        "SensitiveData": ["6201112223455"]
      }
    ],
    "AttackResult": [
      {
        "Label": "Indirect Prompt Injection",
        "AttackLevel": "high",
        "Description": "Indirect prompt injection",
        "Confidence": 100.0
      }
    ],
    "RiskLevel": "high",
    "SensitiveLevel": "S3",
    "AttackLevel": "high"
  },
  "Message": "OK",
  "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
```

Code description
| Code | Message | Description |
| --- | --- | --- |
| 200 | OK | The request was successful. |
| 400 | BAD_REQUEST | The request is invalid, typically because one or more request parameters are invalid. Check the request parameters. |
| 408 | PERMISSION_DENY | Your account may lack the required permissions, have an overdue payment, or be disabled, or the service may not be activated for your account. |
| 500 | GENERAL_ERROR | A server error occurred. This may be a temporary error. Try the request again. If the error persists, contact us through online service. |
| 581 | TIMEOUT | The request timed out. Try the request again. If the error persists, contact us through online service. |
| 588 | EXCEED_QUOTA | The request frequency exceeds the quota. |
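Of these codes, 500, 581, and 588 are plausibly transient, so a caller may retry them with backoff while treating 400 and 408 as permanent failures. The sketch below is one hedged retry policy; the helper name and the choice of retryable codes are our own.

```python
import time

# GENERAL_ERROR, TIMEOUT, and EXCEED_QUOTA from the code table above.
RETRYABLE_CODES = {500, 581, 588}


def call_with_retry(call, max_attempts: int = 3, backoff_seconds: float = 1.0) -> dict:
    """Retries a zero-argument callable that returns the parsed response body."""
    for attempt in range(max_attempts):
        body = call()
        code = body.get("Code")
        if code == 200:
            return body
        if code in RETRYABLE_CODES and attempt < max_attempts - 1:
            # Exponential backoff before the next attempt.
            time.sleep(backoff_seconds * (2 ** attempt))
            continue
        raise RuntimeError(
            f"moderation failed: Code={code}, Message={body.get('Message')}"
        )
```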