Text Moderation 2.0 features upgraded multilingual model capabilities. It can automatically identify languages and supports a wider range of languages. It provides moderation policies and a tag system tailored for international business scenarios. This topic describes the features and usage of the Text Moderation 2.0 multilingual service.
Features
Compared with the multilingual moderation service of Text Moderation 1.0, Text Moderation 2.0 uses independent policies and a tag system to meet international business requirements. It also provides more features to simplify business operations and assist with manual review.
Comparison item | Text Moderation 2.0 | Text Moderation 1.0 |
Multilingual capabilities | Supports 38 languages. | Supports 18 languages. |
Moderation capabilities | Uses multiple models in parallel. The policies are more precise because they are based on language and region attributes. | Uses a single model. The policies balance accuracy and recall based on language attributes. |
Tag system | Uses an internationalized tag system. It adds internationalized tags such as profanity and regional, and supports multiple risk tags and sub-tags. | Uses the tag system for Chinese scenarios and supports only a single risk tag. |
Detection scope | You can configure all detection scopes in the console and enable or disable them as needed. The configurations directly map to the detection results. | Supports general detection scopes, which do not directly map to the detection results. |
API features | You do not need to specify the input language because the service can automatically identify it. After moderation, the service returns the language type and the translated English content to assist with manual review. | You must specify the input language. The service does not return translated content. |
Supported languages
The Text Moderation 2.0 multilingual service supports 38 languages.
Language type | English name | Language code |
English | English | en |
Simplified Chinese | Chinese | zh |
Traditional Chinese | Traditional Chinese | zh-tw |
Indonesian | Indonesian | id |
Malay | Malay | ms |
Thai | Thai | th |
Vietnamese | Vietnamese | vi |
Tagalog | Tagalog | tl |
Hindi | Hindi | hi |
Arabic | Arabic | ar |
Turkish | Turkish | tr |
French | French | fr |
German | German | de |
Russian | Russian | ru |
Portuguese | Portuguese | pt |
Spanish | Spanish | es |
Italian | Italian | it |
Dutch | Dutch | nl |
Polish | Polish | pl |
Japanese | Japanese | ja |
Korean | Korean | ko |
Urdu | Urdu | ur |
Uighur | Uighur | ug |
Bengali | Bengali | bn |
Persian | Persian | fa |
Swedish | Swedish | sv |
Danish | Danish | da |
Norwegian | Norwegian | no |
Icelandic | Icelandic | is |
Finnish | Finnish | fi |
Belarusian | Belarusian | be |
Lithuanian | Lithuanian | lt |
Czech | Czech | cs |
Slovak | Slovak | sk |
Hungarian | Hungarian | hu |
Modern Greek | Modern Greek | el |
Romanian | Romanian | ro |
Irish | Irish | ga |
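As a minimal sketch, the table above can be mirrored on the client side to check whether a language code, for example the DetectedLanguage value of a response, is one of the 38 listed codes. The names `SUPPORTED_LANGUAGE_CODES` and `is_supported_language` are hypothetical helpers, and the case-insensitive comparison is an assumption, not documented service behavior.

```python
# Language codes copied from the supported-languages table (38 entries).
SUPPORTED_LANGUAGE_CODES = frozenset({
    "en", "zh", "zh-tw", "id", "ms", "th", "vi", "tl", "hi", "ar",
    "tr", "fr", "de", "ru", "pt", "es", "it", "nl", "pl", "ja",
    "ko", "ur", "ug", "bn", "fa", "sv", "da", "no", "is", "fi",
    "be", "lt", "cs", "sk", "hu", "el", "ro", "ga",
})


def is_supported_language(code: str) -> bool:
    """Return True if the code appears in the supported-languages table.

    Assumption: codes are compared case-insensitively after trimming spaces.
    """
    return code.strip().lower() in SUPPORTED_LANGUAGE_CODES
```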
Internationalized tags
The Text Moderation 2.0 Multilingual PLUS service uses an internationalized tag system. If the content contains multiple types of risks, multiple tags can be returned simultaneously. Tag categories include, but are not limited to, the following:
Label | Confidence score range | Description |
pornographic_adult | 0 to 100. The higher the score, the higher the confidence level. | Suspected pornographic content |
sexual_terms | 0 to 100. The higher the score, the higher the confidence level. | Suspected sexual health content |
sexual_suggestive | 0 to 100. The higher the score, the higher the confidence level. | Suspected vulgar content |
sexual_orientation | 0 to 100. The higher the score, the higher the confidence level. | Suspected sexual orientation content |
regional_cn | 0 to 100. The higher the score, the higher the confidence level. | Suspected domestic political content |
regional_illegal | 0 to 100. The higher the score, the higher the confidence level. | Suspected illegal political content |
regional_controversial | 0 to 100. The higher the score, the higher the confidence level. | Suspected political controversy |
regional_racism | 0 to 100. The higher the score, the higher the confidence level. | Suspected racism |
violent_extremist | 0 to 100. The higher the score, the higher the confidence level. | Suspected extremist organizations |
violent_incidents | 0 to 100. The higher the score, the higher the confidence level. | Suspected extremist content |
violent_weapons | 0 to 100. The higher the score, the higher the confidence level. | Suspected weapons and ammunition |
violence_unscList | 0 to 100. The higher the score, the higher the confidence level. | United Nations Security Council Consolidated List |
contraband_drug | 0 to 100. The higher the score, the higher the confidence level. | Suspected drug-related content |
contraband_gambling | 0 to 100. The higher the score, the higher the confidence level. | Suspected gambling-related content |
inappropriate_ethics | 0 to 100. The higher the score, the higher the confidence level. | Suspected content with undesirable values |
inappropriate_profanity | 0 to 100. The higher the score, the higher the confidence level. | Suspected abusive or insulting content |
inappropriate_oral | 0 to 100. The higher the score, the higher the confidence level. | Suspected vulgar oral content |
inappropriate_religion | 0 to 100. The higher the score, the higher the confidence level. | Suspected religious profanity |
pt_to_contact | 0 to 100. The higher the score, the higher the confidence level. | Suspected contact information for advertising |
pt_to_sites | 0 to 100. The higher the score, the higher the confidence level. | Suspected off-site traffic diversion |
customized | 0 to 100. The higher the score, the higher the confidence level. | Hit a custom keyword library |
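Because several tags can be returned for one piece of content, it can help to group them before deciding how to handle it. The sketch below groups labels by the text before the first underscore. Treating that prefix as a category is only an observation about the tag names listed above, not a documented contract, and `group_labels_by_category` is a hypothetical helper.

```python
from collections import defaultdict


def group_labels_by_category(labels):
    """Group tag names by the prefix before the first underscore.

    Assumption: the prefix (for example "regional" or "contraband") is
    used as an informal category. A label without an underscore, such as
    "customized", forms its own group.
    """
    groups = defaultdict(list)
    for label in labels:
        category = label.split("_", 1)[0]
        groups[category].append(label)
    return dict(groups)
```

Note that the prefixes in the table are not fully uniform (for example, `violent_weapons` and `violence_unscList`), so a production mapping would likely be an explicit lookup table instead.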
Billing
The Text Moderation 2.0 service supports the pay-as-you-go billing method.
Pay-as-you-go
After you activate the Text Moderation 2.0 service, the default billing method is pay-as-you-go. You are charged based on your actual usage on a daily basis. If you do not call the service, you are not charged.
Moderation type | Supported business scenario (service) | Unit price |
Basic text moderation (text_standard) | Multilingual detection for international business (professional version): comment_multilingual_pro_global | USD 0.3 per 1,000 calls |
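For a rough budget estimate, the unit price above can be applied linearly to the number of billable calls. This is a sketch only: it assumes no tiering, discounts, or rounding rules, none of which are described in this topic, and `estimate_cost_usd` is a hypothetical helper. Under these assumptions, 250,000 billable calls come to 250,000 / 1,000 × 0.3 = USD 75.

```python
from decimal import Decimal

# Unit price from the billing table: USD 0.3 per 1,000 calls.
UNIT_PRICE_PER_1000_CALLS = Decimal("0.3")


def estimate_cost_usd(billable_calls: int) -> Decimal:
    """Estimate the pay-as-you-go cost for a number of billable calls.

    Assumption: cost scales linearly with the call count. The actual
    bill is determined by the service and may differ.
    """
    if billable_calls < 0:
        raise ValueError("billable_calls must not be negative")
    return UNIT_PRICE_PER_1000_CALLS * billable_calls / Decimal(1000)
```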
Integration guide
Step 1: Activate the service
Visit Activate Service to activate the Text Moderation 2.0 service.
The default billing method is pay-as-you-go. After you integrate with the API, the system automatically generates bills based on your usage. For more information, see Billing.
Step 2: Grant permissions to a RAM user
Before you integrate with the SDK or API, you must grant permissions to a RAM user. You can create an AccessKey pair for an Alibaba Cloud account or a RAM user. You must use an AccessKey pair to complete identity verification when you call an Alibaba Cloud API. For more information about how to obtain an AccessKey pair, see Obtain an AccessKey pair.
Procedure
- Log on to the RAM console as a RAM administrator.
- Create a RAM user. For more information, see Create a RAM user.
- Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. For more information, see Grant permissions to a RAM user.
After completing the preceding operations, you can call the Content Moderation API as the RAM user.
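Once the RAM user has an AccessKey pair, a common practice is to keep it out of source code and read it from the environment. The sketch below assumes the environment variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET`. These names are a convention chosen for this example, not something this topic specifies, and `load_access_key` is a hypothetical helper.

```python
import os


def load_access_key(env=os.environ):
    """Read the AccessKey pair from environment variables.

    Assumption: the variable names below are a convention used for this
    sketch. Adjust them to match your own deployment.
    """
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    key_secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    if not key_id or not key_secret:
        raise RuntimeError("AccessKey pair is not configured in the environment")
    return key_id, key_secret
```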
Step 3: Install and integrate with the SDK
The service is available in the following regions. For more information about the SDK for the Text Moderation 2.0 service, see Integration guide.
Region | Public endpoint | VPC endpoint |
Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com |
UK (London) | https://green-cip.eu-west-1.aliyuncs.com | Not available |
US (Virginia) | https://green-cip.us-east-1.aliyuncs.com | https://green-cip-vpc.us-east-1.aliyuncs.com |
US (Silicon Valley) | https://green-cip.us-west-1.aliyuncs.com | Not available |
Germany (Frankfurt) | green-cip.eu-central-1.aliyuncs.com | Not available |
The configurations for the UK (London) region reuse the console configurations of the Singapore region. The configurations for the US (Silicon Valley) and Germany (Frankfurt) regions reuse the console configurations of the US (Virginia) region.
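The endpoint table can be captured as a small lookup so that client code fails early when a VPC endpoint is requested in a region that does not offer one. In this sketch the region IDs are taken from the endpoint hostnames, the hostnames are stored without a scheme, and `resolve_endpoint` is a hypothetical helper.

```python
# (public endpoint, VPC endpoint or None), keyed by the region ID that
# appears in the hostname. Hostnames are copied from the table above.
ENDPOINTS = {
    "ap-southeast-1": ("green-cip.ap-southeast-1.aliyuncs.com",
                       "green-cip-vpc.ap-southeast-1.aliyuncs.com"),
    "eu-west-1": ("green-cip.eu-west-1.aliyuncs.com", None),
    "us-east-1": ("green-cip.us-east-1.aliyuncs.com",
                  "green-cip-vpc.us-east-1.aliyuncs.com"),
    "us-west-1": ("green-cip.us-west-1.aliyuncs.com", None),
    "eu-central-1": ("green-cip.eu-central-1.aliyuncs.com", None),
}


def resolve_endpoint(region_id: str, use_vpc: bool = False) -> str:
    """Return the public or VPC endpoint hostname for a region."""
    if region_id not in ENDPOINTS:
        raise ValueError(f"service is not listed for region {region_id!r}")
    public, vpc = ENDPOINTS[region_id]
    if not use_vpc:
        return public
    if vpc is None:
        raise ValueError(f"no VPC endpoint is listed for region {region_id!r}")
    return vpc
```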
API
Usage notes
Operation: TextModerationPlus
You can call this operation to create a text content moderation task. For more information about how to construct an HTTP request, see Request structure. You can also use a sample HTTP request. For more information, see Integration guide.
Billing information:
This is a billable operation. You are charged only for requests that return a 200 HTTP status code. You are not charged for requests that return other error codes. For more information about billing methods, see Billing.
QPS limit
The queries per second (QPS) limit for a single user on this API is 100 calls per second. If you exceed this limit, your API calls are throttled, which may affect your business.
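To stay under the limit, a client can pace its own calls. The following is a minimal sketch of a minimum-interval limiter that spaces calls at least 1/100 of a second apart. `MinIntervalLimiter` is a hypothetical class, it is not thread-safe, and it only covers a single process, whereas the limit applies to the user as a whole.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 1/max_qps seconds apart (single-threaded)."""

    def __init__(self, max_qps=100, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_qps
        self._clock = clock
        self._sleep = sleep
        # No call has been made yet, so the first call never waits.
        self._next_allowed = float("-inf")

    def acquire(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        wait = self._next_allowed - now
        if wait > 0:
            self._sleep(wait)
            now += wait
        self._next_allowed = now + self._interval
        return max(wait, 0.0)
```

Call `acquire()` immediately before each request. The `clock` and `sleep` parameters exist only so that the behavior can be tested without real delays.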
Request parameters
Name | Type | Required | Example | Description |
Service | String | Yes | comment_multilingual_pro_global | The type of the moderation service. Valid value: comment_multilingual_pro_global (multilingual detection for international business). |
ServiceParameters | JSONString | Yes | | The parameter set required by the moderation service, passed as a JSON string. For more information about each parameter, see ServiceParameters. |
Table 1. ServiceParameters
Name | Type | Required | Example | Description |
content | String | Yes | Content to be detected | The text content to be moderated. The text cannot exceed 600 characters in length. |
dataId | String | No | text0424**** | The data ID that corresponds to the detection object. It can consist of uppercase letters, lowercase letters, digits, underscores (_), hyphens (-), and periods (.). The ID cannot exceed 64 characters in length and can be used to uniquely identify your business data. |
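The constraints in the two tables above can be checked before a request is sent. The sketch below builds the request parameters and serializes ServiceParameters to a JSON string. `build_request` is a hypothetical helper, and counting the 600-character limit in Python code points is an assumption, because this topic does not say how characters are counted.

```python
import json
import re

MAX_CONTENT_LENGTH = 600
# Letters, digits, underscores, hyphens, and periods; at most 64 characters.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def build_request(content, data_id=None,
                  service="comment_multilingual_pro_global"):
    """Validate the inputs and return the request parameters as a dict."""
    if not content:
        raise ValueError("content is required")
    if len(content) > MAX_CONTENT_LENGTH:
        raise ValueError("content exceeds 600 characters")
    params = {"content": content}
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId contains invalid characters or is too long")
        params["dataId"] = data_id
    return {
        "Service": service,
        # ServiceParameters is documented as a JSON string, not an object.
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }
```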
Response parameters
Name | Type | Example | Description |
Code | Integer | 200 | The status code. For more information, see Code description. |
Data | JSONObject | | The data of the moderation result. For more information, see Data. |
Message | String | OK | The response message for the request. |
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |
Table 2. Data
Name | Type | Example | Description |
Result | JSONArray | | The results such as the detected risk labels and confidence scores. For more information, see Result. |
DataId | String | text0424**** | The data ID that corresponds to the detection object. Note: If you pass dataId in the request parameters, the same dataId is returned here. |
RiskLevel | String | high | The risk level, which is returned based on the configured high and low risk scores. Valid values: high, medium, low, and none (no risk detected). Note: We recommend that you handle high-risk content directly and manually review medium-risk content. Handle low-risk content only when you have high recall requirements. Otherwise, treat low-risk content in the same way as content with no risk detected. You can configure risk scores in the Content Moderation console. |
TranslatedContent | String | Translated text | The translated text content. |
DetectedLanguage | String | en | The detected language. |
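The handling recommendations for RiskLevel can be expressed as a small routing function. This is a sketch under stated assumptions: the action names `block`, `manual_review`, and `pass` are hypothetical, the values `medium` and `low` are inferred from the recommendation text, and any other value is treated as no risk detected.

```python
def choose_action(risk_level: str, high_recall: bool = False) -> str:
    """Map a RiskLevel value to a hypothetical handling action.

    high   -> handle directly (here: "block")
    medium -> send to manual review
    low    -> review only when high recall is required
    other  -> treat as no risk detected
    """
    if risk_level == "high":
        return "block"
    if risk_level == "medium":
        return "manual_review"
    if risk_level == "low" and high_recall:
        return "manual_review"
    return "pass"
```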
Table 3. Result
Name | Type | Example | Description |
Label | String | political_xxx | The label returned after the text content is moderated. Multiple labels and scores may be detected. For more information about the supported labels, see the Internationalized tags section. |
Confidence | Float | 81.22 | The confidence score, which ranges from 0 to 100 and is accurate to two decimal places. Some labels do not have a confidence score. |
RiskWords | String | AA,BB,CC | The detected sensitive words. Multiple words are separated by commas. Some labels do not return sensitive words. |
CustomizedHit | JSONArray | [{"LibName":"...","KeyWords":"..."}] | When a custom library is hit, the Label is customized, and the name of the custom library and the custom words are returned. For more information, see CustomizedHit. |
Description | String | Suspected pornographic content | The description of the Label field. Important: This field only explains the Label field and may change. Base your handling logic on the Label field, not on this field. |
Table 4. CustomizedHit
Name | Type | Example | Description |
LibName | String | Custom library 1 | The name of the custom library. |
KeyWords | String | Custom word 1,Custom word 2 | The custom words. Multiple words are separated by commas. |
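A response can be flattened into a list of hits for downstream handling. The sketch below is one way to do that. `summarize_results` is a hypothetical helper, the field names follow the casing used in the response examples in this topic (`RiskWords`, `KeyWords`), and entries without a Confidence value are kept because some labels do not return a score.

```python
def summarize_results(response: dict, min_confidence: float = 0.0):
    """Return one summary dict per Result entry at or above min_confidence."""
    data = response.get("Data") or {}
    summary = []
    for item in data.get("Result") or []:
        confidence = item.get("Confidence")
        # Entries without a score are kept; scored entries are filtered.
        if confidence is not None and confidence < min_confidence:
            continue
        risk_words = [w for w in (item.get("RiskWords") or "").split(",") if w]
        libraries = [hit.get("LibName") for hit in item.get("CustomizedHit") or []]
        summary.append({
            "label": item.get("Label"),
            "confidence": confidence,
            "risk_words": risk_words,
            "custom_libraries": libraries,
        })
    return summary
```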
Example
Request example
{
  "Service": "comment_multilingual_pro_global",
  "ServiceParameters": {
    "content": "testing content",
    "dataId": "text0424****"
  }
}
Response example:
Hit a system policy:
{
  "Code": 200,
  "Data": {
    "Result": [
      {
        "Label": "political_entity",
        "Description": "Suspected political entity",
        "Confidence": 100.0,
        "RiskWords": "Word A,Word B,Word C"
      },
      {
        "Label": "political_figure",
        "Description": "Suspected political figure",
        "Confidence": 100.0,
        "RiskWords": "Word A,Word B,Word C"
      }
    ],
    "RiskLevel": "high",
    "DetectedLanguage": "en",
    "TranslatedContent": "Translated text content",
    "DataId": "text0424****"
  },
  "Message": "OK",
  "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Hit a custom keyword library:
{
  "Code": 200,
  "Data": {
    "Result": [
      {
        "Description": "Hit a custom library",
        "CustomizedHit": [
          {
            "LibName": "Custom library name 1",
            "KeyWords": "Custom keyword"
          }
        ],
        "Confidence": 100,
        "Label": "customized"
      }
    ],
    "RiskLevel": "high",
    "DataId": "text0424****"
  },
  "Message": "OK",
  "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
Code description
Code | Status code | Description |
200 | OK | The request was successful. |
400 | BAD_REQUEST | The request is invalid. This may be because the request parameters are incorrect. Check the request parameters carefully. |
407 | NOT_SUPPORT | The language type cannot be identified or is not supported. |
408 | PERMISSION_DENY | Permission is denied. This may be because your account is not authorized, has an overdue payment, has not been activated, or has been disabled. |
500 | GENERAL_ERROR | An error occurred. This may be a temporary server-side error. We recommend that you retry. If this error code persists, contact us through online support. |
581 | TIMEOUT | A timeout occurred. We recommend that you retry. If this error code persists, contact us through online support. |
588 | EXCEED_QUOTA | The request frequency exceeds the quota. |
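The retry guidance for codes 500 and 581 can be sketched as a small wrapper with exponential backoff. `call_with_retry` and the `send` callable are hypothetical: `send` stands in for whatever function performs the request and returns the parsed response as a dict. Retrying on 588 after a delay is an assumption of this sketch, because the table only states that the quota was exceeded.

```python
import time

RETRYABLE_CODES = {500, 581}  # the table recommends retrying these
THROTTLED_CODE = 588          # assumption: back off and retry as well


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable code or attempts run out.

    Returns the last response. Delays double after each failed attempt.
    """
    response = None
    for attempt in range(1, max_attempts + 1):
        response = send()
        code = response.get("Code")
        if code not in RETRYABLE_CODES and code != THROTTLED_CODE:
            return response
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return response
```

Codes such as 400, 407, and 408 are returned immediately, because repeating the same request is unlikely to change the outcome.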