LLM text moderation solution - AI Guardrails - Alibaba Cloud Documentation Center

Important

This solution is rapidly evolving. For feedback or suggestions, contact your business manager.

The large language model (LLM)-based text moderation service detects non-compliant content in user-generated text. Built on LLM technology, it handles complex and nuanced language that rule-based systems miss and identifies subtle violations across six content categories.

Service overview

Content Moderation Enhanced Edition provides the following LLM-based text moderation service:

Service name	Service ID	Use case
UGC Text Moderation (LLM)	`ugc_moderation_byllm_global`	All text moderation in user-generated content (UGC) scenarios

This service is for UGC scenarios. It supports 119 languages, including Chinese, English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai, and efficiently and accurately identifies various non-compliant content. For details about the detectable content, see the Content Moderation console.

Recommended for text moderation in UGC scenarios.

Billing

The service uses pay-as-you-go billing, which is the default billing method after activation.

Moderation type	Services	Unit price
LLM-based text moderation (`text_advanced`)	UGC Text Moderation (LLM): `ugc_moderation_byllm_global`	USD 0.6 per 1,000 calls

Note: You are charged once for each call to this service. For example, 100 calls to UGC Text Moderation (LLM) cost USD 0.06.

Billing details:

Fees are calculated daily based on actual usage. No fees are charged if you do not call the service.
Each call to the service counts as one billable item.
The billing field moderationType corresponds to the Review Type field in billing details.
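
The per-call pricing above can be checked with a short calculation. The following is a minimal sketch, assuming the listed unit price of USD 0.6 per 1,000 calls applies uniformly; the function name `estimate_cost_usd` is illustrative and not part of any SDK.

```python
from decimal import Decimal

# Unit price from the billing table: USD 0.6 per 1,000 calls.
PRICE_PER_1000_CALLS = Decimal("0.6")


def estimate_cost_usd(billable_calls: int) -> Decimal:
    """Return the estimated charge in USD for a number of billable calls."""
    if billable_calls < 0:
        raise ValueError("billable_calls must be non-negative")
    return PRICE_PER_1000_CALLS * billable_calls / 1000


print(estimate_cost_usd(100))  # 0.06, matching the example in the billing note
```

`Decimal` is used instead of `float` so that small currency amounts are not affected by binary rounding.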

View your billing details.

Risk labels

Label categories

The service detects over 30 sub-labels across six categories. Multiple labels may be returned for a single piece of content. Each label includes a confidence score from 0 to 100 (higher scores indicate higher confidence, accurate to two decimal places).

Pornographic/Sexual content

Label	Description
`pornographic_adult`	Suspected pornographic content
`sexual_terms`	Suspected sexual health content
`sexual_suggestive`	Suspected vulgar content
`sexual_orientation`	Suspected content related to sexual orientation

Political/Regional content

Label	Description
`regional_cn`	Suspected politically sensitive content related to the Chinese mainland
`regional_illegal`	Suspected illegal political content
`regional_controversial`	Suspected political controversy
`regional_racism`	Suspected racism

Violence/Extremism

Label	Description
`violent_extremist`	Suspected extremist organization
`violent_incidents`	Suspected extremist content
`violent_weapons`	Suspected weapons and ammunition
`violence_unscList`	United Nations sanctions list

Contraband

Label	Description
`contraband_drug`	Suspected drug-related content
`contraband_gambling`	Suspected gambling-related content

Inappropriate content

Label	Description
`inappropriate_ethics`	Suspected unethical content
`inappropriate_profanity`	Suspected offensive or abusive content
`inappropriate_oral`	Suspected vulgar language
`inappropriate_religion`	Suspected religious blasphemy

Spam/Custom

Label	Description
`pt_to_contact`	Suspected contact information for advertising
`pt_to_sites`	Suspected redirection to external sites
`customized`	Hit a custom keyword list

Risk levels

Each moderation response includes a RiskLevel field that summarizes the overall risk of the content:

Risk level	Meaning	Recommended action
`high`	High-risk content detected. Custom keyword list hits are always `high` by default.	Block or remove content immediately.
`medium`	Moderate risk detected.	Route to manual review.
`low`	Low-level risk signals detected.	Block only when high recall is required; otherwise treat as clean.
`none`	No risk detected.	No action required.

Configure the confidence score thresholds that map to each risk level in the Content Moderation console.
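
The recommended actions in the risk-level table can be expressed as a small dispatch function. This is a sketch only: the action names `block`, `review`, and `pass` and the `high_recall` flag are illustrative choices, not values defined by the service.

```python
def action_for(risk_level: str, high_recall: bool = False) -> str:
    """Map a RiskLevel value to an illustrative moderation action."""
    if risk_level == "high":
        return "block"    # block or remove content immediately
    if risk_level == "medium":
        return "review"   # route to manual review
    if risk_level == "low":
        # Block only when high recall is required; otherwise treat as clean.
        return "block" if high_recall else "pass"
    if risk_level == "none":
        return "pass"     # no action required
    raise ValueError(f"unexpected RiskLevel: {risk_level!r}")
```

Raising on an unknown value, rather than silently passing the content, keeps a future change in the set of risk levels from going unnoticed.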

Manage labels

Enable, disable, or configure each risk label from the console.

In the left navigation pane, go to Automated Moderation V2.0 > Text Moderation > Rule Configuration.
On the Rule Management tab, click Manage Detection Rules in the Actions column for the target moderation solution.
Select the detection type to adjust (for example, inappropriate content detection).
Click Edit to enter edit mode, then modify the detection settings.
Click Save. Changes take effect in the production environment within 2 to 5 minutes.

Get started

Prerequisites

Before you begin, make sure you have:

An activated Text Moderation V2.0 service
A RAM user with the AliyunYundunGreenWebFullAccess policy

Step 1: Activate the service

Activate the Text Moderation V2.0 service.

Step 2: Set up RAM permissions

Content Moderation API calls require AccessKey-based authentication. Use a RAM user's AccessKey pair rather than your Alibaba Cloud account credentials.

Log on to the RAM console as a RAM administrator.
Create a RAM user. For details, see Create a RAM user.
Attach the AliyunYundunGreenWebFullAccess system policy to the RAM user. For details, see Grant permissions to a RAM user.
Create an AccessKey pair for the RAM user. For details, see Obtain an AccessKey pair.
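
Once the AccessKey pair exists, a common practice is to read it from environment variables rather than hard-coding it. The sketch below assumes the variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET`; these names are an assumption for illustration, so use whatever names your deployment defines.

```python
import os
from typing import Mapping, Tuple

# Assumed variable names; adjust them to match your environment.
KEY_ID_VAR = "ALIBABA_CLOUD_ACCESS_KEY_ID"
KEY_SECRET_VAR = "ALIBABA_CLOUD_ACCESS_KEY_SECRET"


def load_access_key(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (AccessKey ID, AccessKey secret) read from the environment."""
    key_id = env.get(KEY_ID_VAR)
    key_secret = env.get(KEY_SECRET_VAR)
    if not key_id or not key_secret:
        raise RuntimeError(
            f"set {KEY_ID_VAR} and {KEY_SECRET_VAR} to the RAM user's AccessKey pair"
        )
    return key_id, key_secret
```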

Step 3: Install the SDK

For SDK downloads and integration instructions, see SDKs and integration guide for Text Moderation Enhanced Edition V2.0 PLUS.

API reference

Operation

TextModerationPlus

Submits a text moderation task. For the HTTP request structure, see Request structure. Alternatively, follow the Integration guide.

Run this operation directly in OpenAPI Explorer without calculating signatures. After a successful call, OpenAPI Explorer generates sample SDK code automatically.

Endpoints

Region	Public endpoint	VPC endpoint
Singapore	`https://green-cip.ap-southeast-1.aliyuncs.com`	`https://green-cip-vpc.ap-southeast-1.aliyuncs.com`

Usage notes

Billing: This is a paid operation. Charges apply only to requests that return HTTP 200. Requests returning other status codes are not charged.
QPS limit: 20 calls/second per user. Exceeding this limit triggers throttling. Contact your business manager to request a higher limit.
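
To stay under the QPS limit, a client can throttle itself before sending requests. The following is a minimal single-threaded sketch of a sliding-window limiter, assuming the default limit of 20 calls per second; the class name `QpsLimiter` is hypothetical, and the sketch is not thread-safe.

```python
import time
from collections import deque


class QpsLimiter:
    """Allow at most max_calls calls in any sliding window of period seconds."""

    def __init__(self, max_calls=20, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # send times of calls still inside the window

    def acquire(self):
        """Block until a call may be sent, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
```

Call `acquire()` immediately before each request. The `clock` and `sleep` parameters exist only so that the limiter can be exercised without real waiting.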

Request parameters

Name	Type	Required	Example	Description
`Service`	String	Yes	`ugc_moderation_byllm_global`	The service to call. Valid value: `ugc_moderation_byllm_global` (UGC Text Moderation (LLM)).
`ServiceParameters`	JSONString	Yes	—	The moderation parameters as a JSON string. See the `ServiceParameters` table below.

ServiceParameters

Name	Type	Required	Example	Description
`content`	String	Yes	`testing content`	The text to moderate. Maximum length: 2,000 characters.
`dataId`	String	No	`text0424****`	A unique identifier for your data. Maximum 64 characters. Accepted characters: letters, digits, underscores (`_`), hyphens (`-`), and periods (`.`).
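
The constraints in the two tables above can be checked on the client before a request is sent. This is a sketch under stated assumptions: "letters" is taken to mean ASCII letters, length is counted with Python's `len()` (Unicode code points), and the helper name `build_request` and the sample `dataId` value are illustrative.

```python
import json
import re

SERVICE = "ugc_moderation_byllm_global"
MAX_CONTENT_CHARS = 2000
# Letters, digits, underscores, hyphens, and periods; at most 64 characters.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def build_request(content: str, data_id: str = None) -> dict:
    """Return the Service and ServiceParameters fields for one request."""
    if not content:
        raise ValueError("content is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    params = {"content": content}
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId has invalid characters or length")
        params["dataId"] = data_id
    # ServiceParameters is typed as JSONString, so serialize it here.
    return {
        "Service": SERVICE,
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }


print(build_request("testing content", "text0424-0001"))
```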

Response parameters

Top-level response

Name	Type	Example	Description
`Code`	Integer	`200`	The status code. See Status codes.
`Data`	JSONObject	—	The moderation result. See the `Data` table below.
`Message`	String	`OK`	The response message.
`RequestId`	String	`AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****`	The request ID.

Data

Name	Type	Example	Description
`Result`	JSONArray	—	The list of detected risk labels and scores. See the `Result` table below.
`RiskLevel`	String	`high`	The overall risk level: `high`, `medium`, `low`, or `none`.
`DataId`	String	`text0424****`	The data ID from the request, echoed back in the response.

Result

Name	Type	Example	Description
`Label`	String	`political_entity`	The risk label. See Label categories.
`Confidence`	Float	`81.22`	The confidence score (0–100, two decimal places). Not all labels include a score.
`RiskWords`	String	`AA,BB,CC`	Comma-separated list of detected sensitive words. Not all labels return sensitive words.
`CustomizedHit`	JSONArray	—	Populated when `Label` is `customized`. See `CustomizedHit` below.
`Description`	String	`Suspected pornographic content`	A human-readable label description. This field may change. Base your moderation logic on `Label`, not `Description`.

CustomizedHit

Name	Type	Example	Description
`LibName`	String	`Custom library 1`	The name of the matched custom keyword list.
`KeyWords`	String	`Custom word 1,Custom word 2`	The matched custom keywords, comma-separated.

Examples

Request

{
    "Service": "ugc_moderation_byllm_global",
    "ServiceParameters": {
        "content": "testing content",
        "dataId": "text0424****"
    }
}

Response: system policy hit

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description": "Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "political_figure",
                "Description": "Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Response: custom keyword list hit

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Description": "Hit a custom keyword list",
                "CustomizedHit": [
                    {
                        "LibName": "Custom keyword list name 1",
                        "KeyWords": "Custom keyword"
                    }
                ],
                "Confidence": 100,
                "Label": "customized"
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
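
A response like the samples above can be reduced to the fields that moderation logic typically needs. The sketch below assumes the response has already been parsed into a Python dictionary and uses the field spellings that appear in the sample responses (`RiskWords`, `KeyWords`); the function name `summarize` and the output keys are illustrative.

```python
def summarize(response: dict) -> dict:
    """Extract the risk level and per-label details from a parsed response."""
    data = response.get("Data") or {}
    labels = []
    for item in data.get("Result") or []:
        risk_words = item.get("RiskWords") or ""
        hits = item.get("CustomizedHit") or []
        labels.append({
            "label": item.get("Label"),
            "confidence": item.get("Confidence"),  # not all labels include a score
            "risk_words": [w for w in risk_words.split(",") if w],
            "custom_keywords": [
                k
                for hit in hits
                for k in (hit.get("KeyWords") or "").split(",")
                if k
            ],
        })
    return {
        "risk_level": data.get("RiskLevel"),
        "data_id": data.get("DataId"),
        "labels": labels,
    }
```

The function keys its output on `Label` and ignores `Description`, since the description text may change.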

Status codes

Code	Status	Description
`200`	OK	Request succeeded.
`400`	BAD_REQUEST	Invalid request. Check your request parameters.
`408`	PERMISSION_DENY	Account not authorized, has an overdue payment, is not activated, or is suspended.
`500`	GENERAL_ERROR	Server-side error. Retry the request. If the error persists, contact support via Online Service.
`581`	TIMEOUT	Request timed out. Retry the request. If the error persists, contact support via Online Service.
`588`	EXCEED_QUOTA	Request frequency exceeds the quota.
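
The status codes above suggest a simple client-side handling policy. The following sketch retries the codes whose descriptions say to retry (`500`, `581`). Backing off on `588` is an assumption, since the table states only that the quota was exceeded. The names `classify` and `backoff_delay` and the delay values are illustrative.

```python
def classify(code: int) -> str:
    """Return an illustrative handling decision for a response Code."""
    if code == 200:
        return "ok"
    if code in (500, 581):
        return "retry"    # server error or timeout: retry the request
    if code == 588:
        return "backoff"  # assumed: slow down before sending more requests
    return "fail"         # 400, 408, and anything else: fix the cause first


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential delay in seconds for a zero-based retry attempt, capped."""
    return min(cap, base * (2 ** attempt))
```

For example, `backoff_delay(0)` gives 0.5 seconds and `backoff_delay(2)` gives 2.0 seconds, and the delay never exceeds `cap`.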