The large language model (LLM)-based text moderation service efficiently and accurately identifies non-compliant content. Compared to traditional text moderation solutions, this service provides more powerful language understanding and analysis capabilities. It can accurately identify complex and subtle non-compliant content and overcomes the limitations of traditional models.
This solution is rapidly evolving. If you have any feedback or suggestions, contact your business manager.
1. Service description
The following table describes the service that Content Moderation Enhanced Edition provides for LLM-based text moderation.
Service | Content detection | Use cases |
--- | --- | --- |
UGC Text Moderation (LLM) (service: ugc_moderation_byllm_global) | This LLM-based text moderation service is designed for UGC scenarios. It can efficiently and accurately detect various types of non-compliant text content. For a detailed list of check items, see the Content Moderation console. | This service is recommended for all types of text moderation in UGC scenarios. |
2. Billing
The LLM-based text moderation service supports the pay-as-you-go billing method.
Pay-as-you-go
After you activate the Text Moderation service, the default billing method is pay-as-you-go. Fees are settled daily based on your actual usage, and you are not charged if you do not call the service.
Moderation type | Supported business scenarios (services) | Unit price |
--- | --- | --- |
LLM-based text moderation (text_advanced) | LLM-based text moderation services, such as ugc_moderation_byllm_global and aigc_moderation_byllm_global | USD 0.6 per 1,000 calls. Note: Each call to a service on the left is counted as one billable item. You are charged based on the actual number of calls. For example, if you call the LLM-based text moderation service in AIGC scenarios 100 times, you are charged USD 0.06. |
For the pay-as-you-go Content Moderation V2.0, fees are billed once every 24 hours. In the billing details, the moderationType field corresponds to the Review Type field. You can view the billing details.
3. Risk labels
Label meanings
The LLM-based text moderation service supports over 30 sub-labels across 6 categories, each with a confidence score. If content poses multiple types of risks, the service can return multiple sub-labels. The following table describes the risk label values, their corresponding confidence score ranges, and their meanings.
Label value (label) | Confidence score range (confidence) | Meaning |
--- | --- | --- |
pornographic_adult | 0 to 100. A higher score indicates a higher confidence level. | Suspected pornographic content |
sexual_terms | 0 to 100. A higher score indicates a higher confidence level. | Suspected sexual health content |
sexual_suggestive | 0 to 100. A higher score indicates a higher confidence level. | Suspected vulgar content |
sexual_orientation | 0 to 100. A higher score indicates a higher confidence level. | Suspected content related to sexual orientation |
regional_cn | 0 to 100. A higher score indicates a higher confidence level. | Suspected politically sensitive content related to mainland China |
regional_illegal | 0 to 100. A higher score indicates a higher confidence level. | Suspected illegal political content |
regional_controversial | 0 to 100. A higher score indicates a higher confidence level. | Suspected political controversy |
regional_racism | 0 to 100. A higher score indicates a higher confidence level. | Suspected racism |
violent_extremist | 0 to 100. A higher score indicates a higher confidence level. | Suspected extremist organization |
violent_incidents | 0 to 100. A higher score indicates a higher confidence level. | Suspected extremist content |
violent_weapons | 0 to 100. A higher score indicates a higher confidence level. | Suspected weapons and ammunition |
violence_unscList | 0 to 100. A higher score indicates a higher confidence level. | United Nations sanctions list |
contraband_drug | 0 to 100. A higher score indicates a higher confidence level. | Suspected drug-related content |
contraband_gambling | 0 to 100. A higher score indicates a higher confidence level. | Suspected gambling-related content |
inappropriate_ethics | 0 to 100. A higher score indicates a higher confidence level. | Suspected unethical content |
inappropriate_profanity | 0 to 100. A higher score indicates a higher confidence level. | Suspected offensive or abusive content |
inappropriate_oral | 0 to 100. A higher score indicates a higher confidence level. | Suspected vulgar language |
inappropriate_religion | 0 to 100. A higher score indicates a higher confidence level. | Suspected religious blasphemy |
pt_to_contact | 0 to 100. A higher score indicates a higher confidence level. | Suspected contact information for advertising |
pt_to_sites | 0 to 100. A higher score indicates a higher confidence level. | Suspected redirection to external sites |
customized | 0 to 100. A higher score indicates a higher confidence level. | Hit a custom keyword list |
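Because multiple sub-labels can be returned for one piece of content, a common consumption pattern is to filter the returned results against a confidence threshold. The following Python sketch shows that pattern; the dictionary shape mirrors the Data object documented in the API description below, and the threshold of 80 is an illustrative choice, not a service recommendation.

```python
# Minimal sketch: collect sub-labels whose confidence clears a
# caller-chosen threshold. The input dict mirrors the "Data" object
# documented in the API description below; the threshold of 80 is
# illustrative, not a service recommendation.
def labels_above_threshold(data: dict, threshold: float = 80.0) -> list[dict]:
    hits = []
    for item in data.get("Result", []):
        label = item.get("Label")
        confidence = item.get("Confidence")
        # Some labels carry no confidence score; skip those here.
        if label and confidence is not None and confidence >= threshold:
            hits.append({"label": label, "confidence": confidence})
    return hits

# Example: a response that flagged two sub-labels with different scores.
data = {
    "Result": [
        {"Label": "sexual_suggestive", "Confidence": 91.5},
        {"Label": "inappropriate_oral", "Confidence": 42.0},
    ],
    "RiskLevel": "high",
}
print(labels_above_threshold(data))
# [{'label': 'sexual_suggestive', 'confidence': 91.5}]
```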
Manage labels
You can enable or disable each risk label in the console. For some risk labels, you can configure more granular detection scopes. For more information, see the Content Moderation console.
1. In the navigation pane on the left, choose Automated Moderation V2.0 > Text Moderation > Rule Configuration.
2. On the Rule Management tab, find the target moderation service. This example uses the LLM moderation service (aigc_moderation_byllm_global). In the Actions column, click Manage Detection Rules.
3. Select the detection type that you want to adjust, such as inappropriate content detection.
4. Click Edit to enter edit mode and modify the detection status.
5. Click Save. The new detection scope takes about 2 to 5 minutes to take effect in the production environment.
4. Integration guide
Step 1: Activate the service
Visit Activate Service to activate the Text Moderation V2.0 service.
Step 2: Grant permissions to a RAM user
Before you integrate an SDK or call an API, you must grant permissions to a RAM user. When you call an Alibaba Cloud API, you must use an AccessKey pair to complete identity verification. You can create an AccessKey pair for your Alibaba Cloud account or a RAM user. For more information, see Obtain an AccessKey pair.
Procedure
1. Log on to the RAM console as a RAM administrator.
2. Create a RAM user. For more information, see Create a RAM user.
3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For more information, see Grant permissions to a RAM user.

After you complete the preceding operations, you can call the Content Moderation API as the RAM user.
Step 3: Install and integrate an SDK
For the SDKs for the Text Moderation Enhanced Edition V2.0 PLUS service and the integration guide, see SDKs and integration guide for Text Moderation Enhanced Edition V2.0 PLUS.
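For orientation before you consult the SDK documentation, the following Python sketch shows what a TextModerationPlus call can look like. It assumes the alibabacloud_green20220302 and alibabacloud_tea_openapi packages; verify the exact package, class, and method names against the official integration guide for your SDK version.

```python
# Sketch of a TextModerationPlus call, assuming the
# alibabacloud_green20220302 Python SDK (check the official integration
# guide for the exact package, class, and method names).
import json

from alibabacloud_green20220302.client import Client
from alibabacloud_green20220302 import models
from alibabacloud_tea_openapi import models as open_api_models

config = open_api_models.Config(
    # Load credentials from your own secure store; never hard-code them.
    access_key_id="<your-access-key-id>",
    access_key_secret="<your-access-key-secret>",
    # Singapore public endpoint; see the endpoint table in the API description.
    endpoint="green-cip.ap-southeast-1.aliyuncs.com",
    region_id="ap-southeast-1",
)
client = Client(config)

request = models.TextModerationPlusRequest(
    service="ugc_moderation_byllm_global",
    # ServiceParameters must be a JSON string, not a JSON object.
    service_parameters=json.dumps({
        "content": "text to moderate",
        "dataId": "text0424****",
    }),
)
response = client.text_moderation_plus(request)
print(response.body.code, response.body.data)
```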
5. API description
Usage notes
You can call this operation to create a text content detection task. For more information about how to construct an HTTP request, see Request structure. You can also directly use a pre-constructed HTTP request. For more information, see Integration guide.
You can run this operation in OpenAPI Explorer without calculating signatures. After a successful call, OpenAPI Explorer automatically generates sample SDK code.
API operation: TextModerationPlus
Supported regions and endpoints:
Region | Public endpoint | VPC endpoint |
--- | --- | --- |
Singapore | https://green-cip.ap-southeast-1.aliyuncs.com | https://green-cip-vpc.ap-southeast-1.aliyuncs.com |
Billing information: This is a paid operation. You are charged only for requests that return the HTTP status code 200. Requests that return other status codes are not charged. For more information about billing methods, see Billing.
QPS limits
The queries per second (QPS) limit for a single user is 20 calls/second. If the number of calls exceeds this limit, throttling is triggered. This may affect your business. If you require a higher QPS limit, contact your business manager.
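If your client fans out requests across threads, client-side pacing is a simple way to stay under the limit instead of relying on server-side throttling. The following Python sketch shows one generic pattern; the RateLimiter class is illustrative and not part of any service SDK.

```python
# Generic client-side pacing sketch: allow at most `rate` calls per
# second across threads, to stay under the documented 20 QPS limit.
import threading
import time

class RateLimiter:
    def __init__(self, rate: float = 20.0):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def acquire(self) -> None:
        """Block until the caller may issue the next request."""
        with self.lock:
            now = time.monotonic()
            if self.next_slot < now:
                self.next_slot = now  # no backlog; start from now
            wait = self.next_slot - now
            self.next_slot += self.interval
        if wait > 0:
            time.sleep(wait)

limiter = RateLimiter(rate=20.0)
# Call limiter.acquire() before each moderation request to keep the
# aggregate request rate at or below the documented limit.
```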
Request parameters
Name | Type | Required | Example | Description |
--- | --- | --- | --- | --- |
Service | String | Yes | ugc_moderation_byllm_global | The moderation service to use. For the supported services, see Service description. |
ServiceParameters | JSONString | Yes | {"content":"testing content","dataId":"text0424****"} | The parameter set required by the moderation service. The value is a JSON string. For descriptions of the fields, see Table 1. ServiceParameters. |
Table 1. ServiceParameters
Name | Type | Required | Example | Description |
--- | --- | --- | --- | --- |
content | String | Yes | Moderation content | The text content to be moderated. The content can be up to 2,000 characters in length. |
dataId | String | No | text0424**** | The data ID of the moderation object. The ID can be up to 64 characters in length and can contain uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.). You can use it to uniquely identify your business data. |
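Because ServiceParameters is a JSON string rather than a JSON object, it is easy to pass a dict by mistake. The following Python sketch builds the string and enforces the documented 2,000-character limit on content; the helper name is illustrative.

```python
# Sketch: build the ServiceParameters JSON string and enforce the
# documented 2,000-character limit on the `content` field.
import json
from typing import Optional

MAX_CONTENT_CHARS = 2000  # documented limit for `content`

def build_service_parameters(content: str, data_id: Optional[str] = None) -> str:
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(
            f"content is {len(content)} characters; "
            f"the service accepts at most {MAX_CONTENT_CHARS}"
        )
    params = {"content": content}
    if data_id is not None:
        params["dataId"] = data_id
    # The API expects a JSON string here, not a JSON object.
    return json.dumps(params, ensure_ascii=False)

print(build_service_parameters("testing content", "text0424****"))
# {"content": "testing content", "dataId": "text0424****"}
```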
Response parameters
Name | Type | Example | Description |
--- | --- | --- | --- |
Code | Integer | 200 | The status code. For more information, see Code description. |
Data | JSONObject | {"Result":[...]} | The moderation result data. For more information, see Data. |
Message | String | OK | The response message for the request. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |
Table 2. Data
Name | Type | Example | Description |
--- | --- | --- | --- |
Result | JSONArray | | The results, such as risk labels and confidence scores. For more information, see Table 3. Result. |
RiskLevel | String | high | The risk level, which is returned based on the configured risk scores. Valid values: high, medium, low, and none. Note: We recommend that you handle high-risk content directly, perform a manual review of medium-risk content, and handle low-risk content only when high recall is required. Otherwise, treat low-risk content the same as content with no detected risk. You can configure risk scores in the Content Moderation console. |
DataId | String | text0424**** | The data ID of the moderation object. Note: If you specified the dataId parameter in the request, the same dataId is returned here. |
Table 3. Result
Name | Type | Example | Description |
--- | --- | --- | --- |
Label | String | political_xxx | The label returned after the text content detection. Multiple labels and scores may be returned. For more information about supported labels, see Risk labels. |
Confidence | Float | 81.22 | The confidence score. Valid values: 0 to 100. The value is accurate to two decimal places. Some labels do not have confidence scores. |
RiskWords | String | AA,BB,CC | The detected sensitive words. Multiple words are separated by commas. Some labels do not return sensitive words. |
CustomizedHit | JSONArray | [{"LibName":"...","Keywords":"..."}] | When a custom keyword list is hit, the Label is customized. The name of the custom keyword list and the custom keywords are returned. For more information, see CustomizedHit. |
Description | String | Suspected pornographic content | The description of the Label field. Important This field explains the Label field and may be subject to change. We recommend that you handle moderation results based on the Label field instead of this field. |
Table 4. CustomizedHit
Name | Type | Example | Description |
--- | --- | --- | --- |
LibName | String | Custom library 1 | The name of the custom keyword list. |
Keywords | String | Custom word 1,Custom word 2 | The custom keywords. Multiple keywords are separated by commas. |
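These tables translate naturally into a small routing function. The following Python sketch applies the handling advice from the RiskLevel note in Table 2 (handle high-risk content directly, queue medium-risk content for manual review, and escalate low-risk content only when high recall matters) and extracts custom keyword hits; the function names are illustrative placeholders for your own business logic.

```python
# Sketch: route content by RiskLevel per the note in Table 2 and pull
# out custom keyword hits. Function names are illustrative placeholders.
def route_by_risk_level(data: dict, high_recall: bool = False) -> str:
    risk_level = data.get("RiskLevel", "none")
    if risk_level == "high":
        return "block"           # handle high-risk content directly
    if risk_level == "medium":
        return "manual_review"   # queue for human review
    if risk_level == "low" and high_recall:
        return "manual_review"   # escalate low risk only for high recall
    return "pass"                # treat as no detected risk

def custom_hits(data: dict) -> list[tuple[str, str]]:
    """Collect (LibName, Keywords) pairs for results labeled `customized`."""
    hits = []
    for item in data.get("Result", []):
        if item.get("Label") == "customized":
            for hit in item.get("CustomizedHit", []):
                hits.append((hit.get("LibName"), hit.get("Keywords")))
    return hits
```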
Examples
Request example:
{
"Service": "aigc_moderation_byllm_global",
"ServiceParameters": {
"content": "testing content",
"dataId": "text0424****"
}
}

Response example:
Hit a system policy:
{
"Code": 200,
"Data": {
"Result": [
{
"Label": "political_entity",
"Description":"Suspected political entity",
"Confidence": 100.0,
"RiskWords": "Word A,Word B,Word C"
},
{
"Label": "political_figure",
"Description":"Suspected political figure",
"Confidence": 100.0,
"RiskWords": "Word A,Word B,Word C"
}
],
"RiskLevel": "high",
"DataId": "text0424****"
},
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Hit a custom keyword list:
{
"Code": 200,
"Data": {
"Result": [
{
"Description": "Hit a custom keyword list",
"CustomizedHit": [
{
"LibName": "Custom keyword list name 1",
"KeyWords": "Custom keyword"
}
],
"Confidence": 100,
"Label": "customized"
}
],
"RiskLevel": "high",
"DataId": "text0424****"
},
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Code description
Code | Status code | Description |
--- | --- | --- |
200 | OK | The request is successful. |
400 | BAD_REQUEST | The request is invalid. This may be caused by incorrect request parameters. Check the request parameters carefully. |
408 | PERMISSION_DENY | Access is denied. Your account may not be authorized to call this operation, have an overdue payment, not have the service activated, or be suspended. |
500 | GENERAL_ERROR | An error occurred. A temporary server-side error may have occurred. We recommend that you retry the request. If the error persists, contact us through Online Service. |
581 | TIMEOUT | A timeout occurred. We recommend that you retry the request. If the error persists, contact us through Online Service. |
588 | EXCEED_QUOTA | The request frequency exceeds the quota. |
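Per this table, 500, 581, and 588 indicate transient conditions that are reasonable to retry, while 400 and 408 point to request or account problems that a retry cannot fix. The following Python sketch shows one retry pattern with exponential backoff; call_moderation is a hypothetical placeholder for your own API call.

```python
# Sketch: retry transient errors (500, 581, 588) with exponential
# backoff. `call_moderation` is a hypothetical placeholder that returns
# the parsed response body as a dict.
import time

RETRYABLE_CODES = {500, 581, 588}

def moderate_with_retry(call_moderation, max_attempts: int = 4) -> dict:
    delay = 0.5
    for attempt in range(1, max_attempts + 1):
        result = call_moderation()
        code = result.get("Code")
        if code == 200:
            return result
        if code not in RETRYABLE_CODES or attempt == max_attempts:
            raise RuntimeError(f"moderation failed with code {code}")
        time.sleep(delay)
        delay *= 2  # back off between attempts
    raise RuntimeError("unreachable")
```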