AI Guardrails: LLM text moderation service

Last Updated: Mar 31, 2026
Important

This solution is rapidly evolving. For feedback or suggestions, contact your business manager.

The large language model (LLM)-based text moderation service detects non-compliant content in user-generated text. It handles complex and nuanced language that rule-based systems miss and identifies subtle violations across six content categories.

Service overview

Content Moderation Enhanced Edition provides the following LLM-based text moderation service:

Service name | Service ID | Use case
UGC Text Moderation (LLM) | ugc_moderation_byllm_global | All text moderation in user-generated content (UGC) scenarios

This service is for UGC scenarios. It supports 119 languages, including Chinese, English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai, and identifies various types of non-compliant content.

For the full list of detectable content types, see the Content Moderation console.

Billing

The service uses pay-as-you-go billing, which is the default billing method after activation.

Moderation type | Service | Unit price
LLM-based text moderation (text_advanced) | UGC Text Moderation (LLM): ugc_moderation_byllm_global | USD 0.6 per 1,000 calls

Note

You are charged once for each call to this service. For example, if you call UGC Text Moderation (LLM) 100 times, you are charged USD 0.06.

Billing details:

  • Fees are calculated daily based on actual usage. No fees are charged if you do not call the service.

  • Each call to the service counts as one billable item.

  • The billing field moderationType corresponds to the Review Type field in billing details.
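The per-call pricing above can be checked with a short calculation. The following is a minimal sketch, assuming the listed unit price of USD 0.6 per 1,000 calls and no discounts or free quota; the function name estimate_cost_usd is hypothetical and not part of any SDK.

```python
from decimal import Decimal

# Unit price taken from the billing table: USD 0.6 per 1,000 calls.
PRICE_PER_1000_CALLS = Decimal("0.6")


def estimate_cost_usd(billable_calls: int) -> Decimal:
    """Return the estimated pay-as-you-go charge for a number of billable calls."""
    if billable_calls < 0:
        raise ValueError("billable_calls must be non-negative")
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return PRICE_PER_1000_CALLS * billable_calls / 1000


# The example from the note: 100 calls.
print(estimate_cost_usd(100))
```

Decimal is used instead of float so that small amounts such as 0.06 are represented exactly.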

View your billing details.

Risk labels

Label categories

The service detects over 30 sub-labels across six categories. Multiple labels may be returned for a single piece of content. Each label includes a confidence score from 0 to 100 (higher scores indicate higher confidence, accurate to two decimal places).

Pornographic/Sexual content

Label | Description
pornographic_adult | Suspected pornographic content
sexual_terms | Suspected sexual health content
sexual_suggestive | Suspected vulgar content
sexual_orientation | Suspected content related to sexual orientation

Political/Regional content

Label | Description
regional_cn | Suspected politically sensitive content related to the Chinese mainland
regional_illegal | Suspected illegal political content
regional_controversial | Suspected political controversy
regional_racism | Suspected racism

Violence/Extremism

Label | Description
violent_extremist | Suspected extremist organization
violent_incidents | Suspected extremist content
violent_weapons | Suspected weapons and ammunition
violence_unscList | United Nations sanctions list

Contraband

Label | Description
contraband_drug | Suspected drug-related content
contraband_gambling | Suspected gambling-related content

Inappropriate content

Label | Description
inappropriate_ethics | Suspected unethical content
inappropriate_profanity | Suspected offensive or abusive content
inappropriate_oral | Suspected vulgar language
inappropriate_religion | Suspected religious blasphemy

Spam/Custom

Label | Description
pt_to_contact | Suspected contact information for advertising
pt_to_sites | Suspected redirection to external sites
customized | Hit a custom keyword list

Risk levels

Each moderation response includes a RiskLevel field that summarizes the overall risk of the content:

Risk level | Meaning | Recommended action
high | High-risk content detected. Custom keyword list hits are always high by default. | Block or remove content immediately.
medium | Moderate risk detected. | Route to manual review.
low | Low-level risk signals detected. | Block only when high recall is required; otherwise treat as clean.
none | No risk detected. | No action required.

Configure the confidence score thresholds that map to each risk level in the Content Moderation console.
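The recommended actions can be encoded as a small dispatch function in your own application. The sketch below is illustrative only: the Action enum, the action_for function, and the high_recall flag are hypothetical names, and sending unrecognized values to manual review is an assumption, not documented service behavior.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    REVIEW = "manual_review"
    PASS = "pass"


def action_for(risk_level: str, high_recall: bool = False) -> Action:
    """Map a RiskLevel value to the recommended action from the table above."""
    level = risk_level.strip().lower()
    if level == "high":
        return Action.BLOCK
    if level == "medium":
        return Action.REVIEW
    if level == "low":
        # Block low-risk content only when high recall is required.
        return Action.BLOCK if high_recall else Action.PASS
    if level == "none":
        return Action.PASS
    # Assumption: an unexpected value is safer to review than to pass.
    return Action.REVIEW


print(action_for("medium"))
```

Keeping the mapping in one function makes it easy to adjust if you change the thresholds in the console.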

Manage labels

Enable, disable, or configure each risk label from the console.

  1. In the left navigation pane, go to Automated Moderation V2.0 > Text Moderation > Rule Configuration.

  2. On the Rule Management tab, click Manage Detection Rules in the Actions column for the target moderation solution.

  3. Select the detection type to adjust (for example, inappropriate content detection).

  4. Click Edit to enter edit mode, then modify the detection settings.

  5. Click Save. Changes take effect in the production environment within 2 to 5 minutes.

Get started

Prerequisites

Before you begin, complete the following steps.

Step 1: Activate the service

Activate the Text Moderation V2.0 service on the Activate Service page.

Step 2: Set up RAM permissions

Content Moderation API calls require AccessKey-based authentication. Use a RAM user's AccessKey pair rather than your Alibaba Cloud account credentials.

  1. Log on to the RAM console as a RAM administrator.

  2. Create a RAM user. For details, see Create a RAM user.

  3. Attach the AliyunYundunGreenWebFullAccess system policy to the RAM user. For details, see Grant permissions to a RAM user.

  4. Create an AccessKey pair for the RAM user. For details, see Obtain an AccessKey pair.

Step 3: Install the SDK

For SDK downloads and integration instructions, see SDKs and integration guide for Text Moderation Enhanced Edition V2.0 PLUS.

API reference

Operation

TextModerationPlus

Submits a text content detection task. For the HTTP request structure, see Request structure. Alternatively, follow the Integration guide.

Run this operation directly in OpenAPI Explorer without calculating signatures. After a successful call, OpenAPI Explorer generates sample SDK code automatically.

Endpoints

Region | Public endpoint | VPC endpoint
Singapore | https://green-cip.ap-southeast-1.aliyuncs.com | https://green-cip-vpc.ap-southeast-1.aliyuncs.com

Usage notes

  • Billing: This is a paid operation. Charges apply only to requests that return HTTP 200. Requests returning other status codes are not charged.

  • QPS limit: 20 calls/second per user. Exceeding this limit triggers throttling. Contact your business manager to request a higher limit.
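To stay under the QPS limit, a client can throttle itself before sending requests. The following is a minimal single-threaded sketch of a sliding-window limiter, assuming the default limit of 20 calls per second; the class name SlidingWindowLimiter is hypothetical, and the code is not thread-safe as written.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls acquisitions within any window of `period` seconds."""

    def __init__(self, max_calls=20, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # start times of recent calls, oldest first

    def acquire(self) -> None:
        """Block until one more call can start without exceeding the limit."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Wait until the oldest call in the window expires.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)


limiter = SlidingWindowLimiter()
for _ in range(3):
    limiter.acquire()  # call the API after each acquire()
```

The clock and sleep parameters are injectable so the limiter can be exercised in tests without real delays.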

Request parameters

Name | Type | Required | Example | Description
Service | String | Yes | ugc_moderation_byllm_global | The service to call. Valid value: ugc_moderation_byllm_global (UGC Text Moderation (LLM)).
ServiceParameters | JSONString | Yes | | The moderation parameters as a JSON string. See the ServiceParameters table below.

ServiceParameters

Name | Type | Required | Example | Description
content | String | Yes | testing content | The text to moderate. Maximum length: 2,000 characters.
dataId | String | No | text0424**** | A unique identifier for your data. Maximum 64 characters. Accepted characters: letters, digits, underscores (_), hyphens (-), and periods (.).
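The parameter constraints above can be validated on the client before a request is sent. The sketch below builds the two request parameters and serializes ServiceParameters as a JSON string. The function name build_request is hypothetical. The code also assumes that "letters" means ASCII letters and that the 2,000-character limit counts Unicode code points; neither detail is confirmed by this document.

```python
import json
import re
from typing import Dict, Optional

MAX_CONTENT_CHARS = 2000
# Assumption: letters are ASCII only; the length limit is 64 characters.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def build_request(content: str, data_id: Optional[str] = None) -> Dict[str, str]:
    """Validate the inputs and return the Service and ServiceParameters fields."""
    if not content:
        raise ValueError("content is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds 2,000 characters")
    params = {"content": content}
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId has invalid characters or is too long")
        params["dataId"] = data_id
    return {
        "Service": "ugc_moderation_byllm_global",
        # ServiceParameters is passed as a JSON string, not a nested object.
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }


print(build_request("testing content", "text0424_demo"))
```

Text longer than the limit needs to be split into several requests; how to split it is left to the caller.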

Response parameters

Top-level response

Name | Type | Example | Description
Code | Integer | 200 | The status code. See Status codes.
Data | JSONObject | | The moderation result. See the Data table below.
Message | String | OK | The response message.
RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID.

Data

Name | Type | Example | Description
Result | JSONArray | | The list of detected risk labels and scores. See the Result table below.
RiskLevel | String | high | The overall risk level: high, medium, low, or none.
DataId | String | text0424**** | The data ID from the request, echoed back in the response.

Result

Name | Type | Example | Description
Label | String | political_entity | The risk label. See Label categories.
Confidence | Float | 81.22 | The confidence score (0–100, two decimal places). Not all labels include a score.
RiskWords | String | AA,BB,CC | Comma-separated list of detected sensitive words. Not all labels return sensitive words.
CustomizedHit | JSONArray | | Populated when Label is customized. See CustomizedHit below.
Description | String | Suspected pornographic content | A human-readable label description. This field may change. Base your moderation logic on Label, not Description.

CustomizedHit

Name | Type | Example | Description
LibName | String | Custom library 1 | The name of the matched custom keyword list.
KeyWords | String | Custom word 1,Custom word 2 | The matched custom keywords, comma-separated.

Examples

Request

{
    "Service": "ugc_moderation_byllm_global",
    "ServiceParameters": {
        "content": "testing content",
        "dataId": "text0424****"
    }
}

Response: system policy hit

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description": "Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            },
            {
                "Label": "political_figure",
                "Description": "Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "Word A,Word B,Word C"
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Response: custom keyword list hit

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Description": "Hit a custom keyword list",
                "CustomizedHit": [
                    {
                        "LibName": "Custom keyword list name 1",
                        "KeyWords": "Custom keyword"
                    }
                ],
                "Confidence": 100,
                "Label": "customized"
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Status codes

Code | Status | Description
200 | OK | Request succeeded.
400 | BAD_REQUEST | Invalid request. Check your request parameters.
408 | PERMISSION_DENY | Account not authorized, has an overdue payment, is not activated, or is suspended.
500 | GENERAL_ERROR | Server-side error. Retry the request. If the error persists, contact support via Online Service.
581 | TIMEOUT | Request timed out. Retry the request. If the error persists, contact support via Online Service.
588 | EXCEED_QUOTA | Request frequency exceeds the quota.
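The retry guidance in the status code table can be wrapped around whatever function sends the request. The sketch below is one possible approach, not an SDK feature. The name call_with_retry and the send callable are hypothetical. Retrying 588 after a delay is an assumption, since the table recommends retries only for 500 and 581. The exponential backoff values are arbitrary.

```python
import time
from typing import Any, Callable, Dict

# 500 and 581 are documented as retryable; 588 is included as an assumption.
RETRYABLE_CODES = {500, 581, 588}


def call_with_retry(
    send: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call send() until it returns a non-retryable code or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response: Dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.get("Code") not in RETRYABLE_CODES or attempt == max_attempts:
            break
        # Exponential backoff: base_delay, then 2x, 4x, and so on.
        sleep(base_delay * 2 ** (attempt - 1))
    return response


# Usage with a stand-in sender that times out once and then succeeds.
_responses = iter([{"Code": 581}, {"Code": 200, "Message": "OK"}])
result = call_with_retry(lambda: next(_responses), base_delay=0.01)
print(result["Code"])
```

Codes 400 and 408 are returned to the caller immediately, because retrying does not fix an invalid request or an account problem.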