This document describes how to use the AI Guardrails API to moderate text content.
If you have already integrated the enhanced PLUS edition of the Content Moderation service, you only need to upgrade the software development kit (SDK) to call this API operation.
If you have not integrated the enhanced PLUS edition of the Content Moderation service, you can directly integrate the multimodal API operation. You can reuse this multimodal API operation if you later need to moderate content such as AIGC-generated images and files. For more information, see Multimodal API integration guide.
Step 1: Activate the service
Go to the AI Guardrails activation page to activate the AI Guardrails service.
Step 2: Grant permissions to a RAM user
Before you integrate the SDK or API, you must grant permissions to a Resource Access Management (RAM) user and create an AccessKey pair for that user. The AccessKey pair is used for identity verification when you call Alibaba Cloud API operations. For more information about how to obtain an AccessKey pair, see Obtain an AccessKey pair.
Procedure
- Log on to the RAM console as a RAM administrator.
- Create a RAM user. For more information, see Create a RAM user.
- Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For more information, see Grant permissions to a RAM user.
After completing the preceding operations, you can call the Content Moderation API as the RAM user.
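The examples in this topic read the AccessKey pair of the RAM user from environment variables instead of hard-coding it in source code. This is a minimal sketch; the variable names ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET follow a common Alibaba Cloud SDK convention and are an assumption here, so adjust them to match your own configuration.

```python
import os

# Minimal sketch: read the RAM user's AccessKey pair from environment variables
# instead of hard-coding it. The variable names are a common Alibaba Cloud SDK
# convention, not something mandated by this API.
access_key_id = os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"]
access_key_secret = os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"]
```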
Step 3: Install and integrate the SDK
For more information about the AI Guardrails service SDK, see SDK Reference.
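The following Python sketch shows the overall call shape under these assumptions: the Content Moderation 2.0 Python SDK package alibabacloud_green20220302 together with alibabacloud_tea_openapi, the TextModerationPlusRequest model, and the Singapore public endpoint listed later in this topic. Verify the exact package, class, and method names against the SDK Reference before you use it.

```python
import json
import os

# Assumed packages: alibabacloud_green20220302 (Content Moderation 2.0 SDK) and
# alibabacloud_tea_openapi. Verify the exact names in the SDK Reference.
from alibabacloud_green20220302.client import Client
from alibabacloud_green20220302 import models
from alibabacloud_tea_openapi.models import Config

config = Config(
    access_key_id=os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
    access_key_secret=os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    # Singapore public endpoint from the "Supported regions and endpoints" table.
    endpoint="green-cip.ap-southeast-1.aliyuncs.com",
    region_id="ap-southeast-1",
)
client = Client(config)

request = models.TextModerationPlusRequest(
    # Service value from the "Request parameters" table.
    service="query_security_check_intl",
    # ServiceParameters must be a JSON string, not a nested object.
    service_parameters=json.dumps({"content": "testing content", "chatId": "ABC123"}),
)
response = client.text_moderation_plus(request)
print(response.body)
```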
API description
Usage notes
This API operation creates a text content moderation task.
Service API operation: TextModerationPlus
Supported regions and endpoints:
Region | Public endpoint | Internal endpoint |
Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com |
Billing information: This is a paid API operation. Only requests that return an HTTP status code of 200 are billed. Requests that return other error codes are not billed. For more information about the billing method, see the "Activation and billing" section in Billing overview.
QPS limits
This API operation has a queries per second (QPS) limit of 50 for each user. If you exceed the limit, API calls are throttled. This may affect your business operations. Call the API operation at a reasonable frequency.
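If your traffic can approach the 50 QPS limit, consider smoothing requests on the client side instead of relying on server-side throttling. The following sketch is a minimal single-process rate limiter; the 45 QPS value is an arbitrary safety margin, not a documented threshold.

```python
import threading
import time

class RateLimiter:
    """Spaces out calls so that at most `max_qps` requests start per second (single process)."""

    def __init__(self, max_qps: float):
        self._min_interval = 1.0 / max_qps
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def acquire(self) -> None:
        with self._lock:
            now = time.monotonic()
            wait = self._next_allowed - now
            self._next_allowed = max(now, self._next_allowed) + self._min_interval
        if wait > 0:
            time.sleep(wait)

# Stay safely below the 50 QPS per-user limit; 45 is an arbitrary safety margin.
limiter = RateLimiter(max_qps=45)

# Usage: acquire a slot before each TextModerationPlus request.
limiter.acquire()
# ... send the request here ...
```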
Request parameters
Name | Type | Required | Example | Description |
Service | String | Yes | query_security_check_intl | The type of the moderation service. |
ServiceParameters | JSONString | Yes | | The parameters required by the moderation service, in the JSON string format. For a description of each parameter in the string, see Table 1. ServiceParameters. For an example of how to construct this string, see the sketch that follows Table 1. |
Table 1. ServiceParameters
Name | Type | Required | Example | Description |
content | String | At least one item is required. | Text content to moderate | The text content to moderate. Important: A maximum of 2,000 characters can be entered at a time. |
chatId | String | No | ABC123 | A unique ID for an interaction record that consists of a user input and a Large Language Model (LLM) output. |
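Because ServiceParameters is a JSON string rather than a nested object, it must be serialized before the request is sent, and content is limited to 2,000 characters per request. The following sketch shows one way to build and validate the string; the helper name build_service_parameters and the decision to reject over-long text (instead of splitting it into chunks) are illustrative choices, not part of the API.

```python
import json

MAX_CONTENT_CHARS = 2000  # per-request limit on the content field

def build_service_parameters(content: str, chat_id: str | None = None) -> str:
    """Serialize ServiceParameters into the JSON string that the API expects."""
    if not content:
        raise ValueError("content must not be empty")
    if len(content) > MAX_CONTENT_CHARS:
        # Illustrative choice: reject over-long text. You could instead split it
        # into chunks of at most 2,000 characters and moderate each chunk.
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    params = {"content": content}
    if chat_id is not None:
        params["chatId"] = chat_id
    return json.dumps(params, ensure_ascii=False)

# Example
service_parameters = build_service_parameters("testing content", chat_id="ABC123")
```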
Response parameters
Name | Type | Example | Description |
Code | Integer | 200 | The status code. For more information, see Code description. |
Data | JSONObject | {"Result":[...]} | The data of the moderation result. For more information, see Data. |
Message | String | OK | The response message for the request. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |
Table 2. Data
Name | Type | Example | Description |
Result | JSONArray | | The moderation results, such as compliance risk labels and confidence scores. For more information, see Result. |
RiskLevel | String | high | The risk level, which is determined based on the high and low risk score thresholds that you configure. Valid values: high, medium, low, and none. Note: Handle high-risk content directly. Manually review medium-risk content. Process low-risk content only when high recall is required. Otherwise, treat low-risk content the same as content with no risk detected. You can configure the risk score thresholds in the AI Guardrails console. For a sketch of this handling logic, see the example that follows this table. |
SensitiveResult | JSONArray | | The results of sensitive content detection, such as risk labels and sensitive samples. For more information, see SensitiveResult. |
SensitiveLevel | String | S4 | The sensitivity level. Valid values: S0, S1, S2, S3, and S4. |
AttackResult | JSONArray | | The results of attack content detection, such as risk labels and confidence scores. For more information, see AttackResult. |
AttackLevel | String | high | The attack level. Valid values: high, medium, low, and none. |
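The following sketch maps the returned RiskLevel to the handling suggested in the note above: handle high-risk content directly, review medium-risk content manually, and process low-risk content only when high recall is required. The action names and the high_recall switch are illustrative, not part of the API.

```python
def decide_action(risk_level: str, high_recall: bool = False) -> str:
    """Map the RiskLevel field to a handling action; the action names are illustrative."""
    if risk_level == "high":
        return "block"    # handle high-risk content directly
    if risk_level == "medium":
        return "review"   # manually review medium-risk content
    if risk_level == "low":
        # Process low-risk content only when high recall is required.
        return "review" if high_recall else "pass"
    return "pass"         # no risk detected
```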
Table 3. Result
Name | Type | Example | Description |
Label | String | political_xxx | The label returned after text content moderation. Multiple labels and scores may be returned. |
Confidence | Float | 81.22 | The confidence score. The value ranges from 0 to 100, with two decimal places. Some labels do not have a confidence score. |
RiskWords | String | AA,BB,CC | The detected sensitive words. Multiple words are separated by commas. Some labels do not return sensitive words. |
CustomizedHit | JSONArray | [{"LibName":"...","Keywords":"..."}] | If a custom dictionary is hit, the Label is `customized`. The name of the custom dictionary and the custom words are returned. |
Description | String | Suspected political entity | The description of the Label field. Important This field explains the Label field and may be subject to change. We recommend that you process the Label field instead of this field when handling results. |
Table 4. CustomizedHit
Name | Type | Example | Description |
LibName | String | Custom Dictionary 1 | The name of the custom dictionary. |
Keywords | String | Custom Word 1,Custom Word 2 | The custom words. Multiple words are separated by commas. |
Table 5. SensitiveResult
Name | Type | Example | Description |
Label | String | 1780 | The label returned after text content moderation. Multiple labels and scores may be returned. |
SensitiveLevel | String | S4 | The sensitivity level. Valid values: S0, S1, S2, S3, and S4. |
SensitiveData | JSONArray | ["6201112223455"] | The detected sensitive samples (0 to 5). |
Description | String | Credit card number | The description of the Label field. Important This field explains the Label field and may be subject to change. We recommend that you process the Label field instead of this field when handling results. |
Table 6. AttackResult
Name | Type | Example | Description |
Label | String | Indirect Prompt Injection | The label returned after text content moderation. Multiple labels and scores may be returned. |
AttackLevel | String | high | The attack level of this result. Valid values: high, medium, and low. |
Confidence | Float | 100.0 | The confidence score. The value ranges from 0 to 100. |
Description | String | Indirect prompt injection | The description of the Label field. Important This field explains the Label field and may be subject to change. We recommend that you process the Label field instead of this field when handling results. |
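The following sketch walks the Data object described in Table 2 through Table 6 and collects the labels, sensitive samples, and custom dictionary hits. It assumes the response body has already been deserialized into a Python dict, and it uses .get() because fields such as SensitiveResult and AttackResult may be absent when nothing is detected, which is an assumption to verify against your actual responses.

```python
def summarize_data(data: dict) -> dict:
    """Collect the key findings from the Data object of a TextModerationPlus response."""
    summary = {
        "risk_level": data.get("RiskLevel"),
        "sensitive_level": data.get("SensitiveLevel"),
        "attack_level": data.get("AttackLevel"),
        "labels": [],             # (Label, Confidence) pairs from Result
        "sensitive_samples": [],  # SensitiveData values from SensitiveResult
        "custom_hits": [],        # (LibName, keywords) pairs from CustomizedHit
    }
    for item in data.get("Result", []):
        summary["labels"].append((item.get("Label"), item.get("Confidence")))
        for hit in item.get("CustomizedHit", []):
            # Table 4 documents the field as "Keywords"; the sample response
            # below spells it "KeyWords", so check both.
            keywords = hit.get("Keywords", hit.get("KeyWords"))
            summary["custom_hits"].append((hit.get("LibName"), keywords))
    for item in data.get("SensitiveResult", []):
        summary["sensitive_samples"].extend(item.get("SensitiveData", []))
    return summary
```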
Examples
Sample request
{
    "Service": "query_security_check",
    "ServiceParameters": {
        "content": "testing content",
        "chatId": "ABC123"
    }
}

Sample response
The following response is returned when a system policy is hit:
{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description": "Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "political_figure",
                "Description": "Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "customized",
                "Description": "Hit custom dictionary",
                "Confidence": 100.0,
                "CustomizedHit": [
                    {
                        "LibName": "Custom Dictionary Name 1",
                        "KeyWords": "Custom Keyword"
                    }
                ]
            }
        ],
        "SensitiveResult": [
            {
                "Label": "1780",
                "SensitiveLevel": "S4",
                "Description": "Credit card number",
                "SensitiveData": ["6201112223455"]
            }
        ],
        "AttackResult": [
            {
                "Label": "Indirect Prompt Injection",
                "AttackLevel": "high",
                "Description": "Indirect prompt injection",
                "Confidence": 100.0
            }
        ],
        "RiskLevel": "high",
        "SensitiveLevel": "S3",
        "AttackLevel": "high"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Code description
Code | Message | Description |
200 | OK | The request was successful. |
400 | BAD_REQUEST | The request is invalid. This may be because the request parameters are incorrect. Check the request parameters. |
408 | PERMISSION_DENY | Permission was denied. This may be because your account is not authorized, has an overdue payment, has not activated the service, or is banned. |
500 | GENERAL_ERROR | An error occurred. This may be a temporary server-side error. We recommend that you retry. If this error code persists, contact us through online support. |
581 | TIMEOUT | A timeout occurred. We recommend that you retry. If this error code persists, contact us through online support. |
588 | EXCEED_QUOTA | The request frequency exceeds the quota. |
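According to the table above, 500 and 581 are transient and worth retrying, 588 calls for backing off, and 400 and 408 indicate problems that a retry cannot fix. The following sketch wraps a call with that policy; send_request is a hypothetical placeholder for your own function that issues the TextModerationPlus request and returns the parsed response as a dict.

```python
import time

RETRYABLE_CODES = {500, 581, 588}  # GENERAL_ERROR, TIMEOUT, EXCEED_QUOTA

def call_with_retry(send_request, max_attempts: int = 3, backoff_seconds: float = 1.0) -> dict:
    """Retry transient error codes with exponential backoff; fail fast on the rest."""
    for attempt in range(1, max_attempts + 1):
        response = send_request()  # hypothetical placeholder: issues the API request
        code = response.get("Code")
        if code == 200:
            return response
        if code not in RETRYABLE_CODES or attempt == max_attempts:
            raise RuntimeError(
                f"moderation failed: Code={code}, Message={response.get('Message')}"
            )
        # Back off before the next attempt; this also helps with 588 (EXCEED_QUOTA).
        time.sleep(backoff_seconds * 2 ** (attempt - 1))
```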