Manage custom text libraries

Content Moderation supports custom text libraries. You can use custom text libraries to ensure that moderation results meet specific business requirements. You can use custom text libraries for text violation detection in images, ad violation detection, text anti-spam, file anti-spam, and audio anti-spam. You can specify the text to be blocked, passed, or reviewed in different custom text libraries to meet specific management requirements.

Background information

Important

We recommend that you follow the instructions in this topic to use custom text libraries. This prevents you from adding improper terms that affect the accuracy of moderation results.

Custom text libraries include feedback-based text libraries and self-managed text libraries:

Feedback-based text libraries are created automatically to hold text that has been reviewed. By default, a feedback-based text library applies to all moderation scenarios of the same type. You can manage the text in a feedback-based text library, but you cannot perform operations on the library itself. For example, you cannot disable or delete a feedback-based text library. For more information about human review, see Review machine-assisted moderation results.
Self-managed text libraries are libraries that you create to moderate text in a specific moderation scenario or a specific type of moderation scenario. You can manage the text in self-managed text libraries and perform operations on the self-managed text libraries.

Note

You can create a maximum of 10 self-managed text libraries.

This topic describes how to manage custom text libraries for the Content Moderation API in the Content Moderation console. You can also manage custom text libraries by calling API operations or using Content Moderation SDKs. For more information, see the following topics:

Text types

You can add terms and text patterns to custom text libraries.

Terms
Terms are designed to moderate words in text. If a sentence or a piece of text contains a specific term, the term is hit. You can add different terms for different business scenarios.
In Content Moderation, you can apply term-based moderation to text violation detection in images and text anti-spam. For more information about relevant parameters, see the parameter description of moderation operations in different scenarios. The relevant parameters in these two scenarios may be slightly different.
You can add the AND (&) and NOT (~) logical operators in Chinese terms. Examples:
- The term "A&B" is added. If a piece of text contains both A and B, the term is hit.
- The term "A~B" is added. If a piece of text contains A but does not contain B, the term is hit.
Note
If you add both logical operators in a term, the AND (&) operator must be added before the NOT (~) operator. For example, you can add "A&B~C" as a term, but cannot add "A~C&B" as a term.
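The operator rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the service's implementation: the function names parse_term and term_hits are hypothetical, matching is treated as plain substring containment, and reading "A&B~C" as "contains A and B but not C" is an assumption.

```python
from typing import List, Tuple


def parse_term(term: str) -> Tuple[List[str], List[str]]:
    """Split a term such as "A&B~C" into required and excluded parts.

    Assumption: everything before the first "~" is joined by AND, and every
    part after a "~" must be absent from the text.
    """
    head, *excluded = term.split("~")
    # The AND (&) operator must come before the NOT (~) operator.
    if any("&" in part for part in excluded):
        raise ValueError("AND (&) must be added before NOT (~)")
    required = head.split("&")
    if not all(required) or not all(excluded):
        raise ValueError("empty operand in term")
    return required, excluded


def term_hits(term: str, text: str) -> bool:
    """Return True if the text hits the term (substring matching only)."""
    required, excluded = parse_term(term)
    return all(r in text for r in required) and not any(e in text for e in excluded)


if __name__ == "__main__":
    print(term_hits("代开&发票", "专业代开各类发票"))   # True: both parts are present
    print(term_hits("代开~发票", "专业代开各类发票"))   # False: the excluded part is present
```

With this sketch, parse_term("A~C&B") raises ValueError, which mirrors the rule that such a term cannot be added.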
Text patterns
Text patterns are designed to compare the similarity between sentences or pieces of text. If two sentences or two pieces of text are partially different but express the same meaning, the two sentences or two pieces of text show a close similarity. Content Moderation can determine whether a piece of text has a close similarity to a text pattern in text pattern libraries. If the similarity reaches a specific degree, the text pattern is hit.
You can apply text pattern libraries to text anti-spam. Content Moderation allows you to customize a blacklist, a whitelist, and a review list for text pattern libraries based on your business requirements. The review list contains the text that needs human review. You can manage text patterns related to your business in text pattern libraries. In this case, the content that hits text patterns can be filtered out in text anti-spam.
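To make the idea of a similarity threshold concrete, the following Python sketch compares a piece of text with a list of text patterns. This topic does not describe how Content Moderation measures similarity, so the character-level ratio from the standard difflib module is only a stand-in, and the threshold value and function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical threshold; the degree of similarity used by the service is not documented here.
SIMILARITY_THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], used as a stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def hit_patterns(text: str, patterns: List[str],
                 threshold: float = SIMILARITY_THRESHOLD) -> List[str]:
    """Return the text patterns whose similarity to the text reaches the threshold."""
    return [p for p in patterns if similarity(text, p) >= threshold]


if __name__ == "__main__":
    patterns = ["加微信领取免费优惠券名额有限先到先得欢迎咨询"]
    # One character differs from the pattern, so the text is still a close match.
    print(hit_patterns("加微信领取免费优惠卷名额有限先到先得欢迎咨询", patterns))
    # Unrelated text shares no characters with the pattern and is not a hit.
    print(hit_patterns("今天天气很好适合出门散步", patterns))
```

A character-level ratio only captures surface overlap. The service compares meaning, so two sentences with different wording may still hit the same text pattern.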

Limits

Type	Item	Limit
Self-managed text library	Quantity	Supports a maximum of 10 self-managed text libraries.
Self-managed text library	Name length	Supports a maximum of 20 characters in length for each library name.
Term	Term type	Chinese terms are supported. Combinations of letters and digits can also be used as terms. Note: During word breaking, each combination of letters and digits is treated as a single word. English words or phrases cannot be used as terms.
Term	Number of terms in a text library	Supports a maximum of 10,000 terms in a text library.
Term	Term length	Supports a maximum of 50 characters in length for each term, including logical operators.
Term	Encoding for Chinese terms	UTF-8.
Term	Term format	The following special characters in full-width and half-width forms are not supported: At signs (@), number signs (#), dollar signs ($), percent signs (%), carets (^), asterisks (*), parentheses (()), angle brackets (<>), forward slashes (/), question marks (?), commas (,), periods (.), semicolons (;), underscores (_), plus signs (+), hyphens (-), equal signs (=), single quotation marks ('), double quotation marks ("), spaces, and tabs.
Text pattern	Text pattern length	Supports 20 to 4,000 characters in length for each text pattern. Note If the text added to the text library is excessively long, invalid matches may occur. We recommend that you set each text pattern to a maximum of 200 characters in length.
Text pattern	Number of text patterns in a text library	Supports a maximum of 10,000 text patterns in a text library.
Text pattern	Encoding	UTF-8.
Text pattern	Text content	Requires clear Chinese semantic characteristics that can be extracted. If few semantic characteristics can be identified from a text pattern, this text pattern is ignored. Note A text pattern that consists of meaningless letters, digits, or emoticons may be ignored.
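Before you import a large batch of entries, it can help to check them locally against the limits in the preceding table. The following Python sketch is one way to do that. The function names are hypothetical, "length" is assumed to mean the number of Unicode characters, and the full-width forms are assumed to be the standard full-width counterparts of the listed ASCII characters plus the ideographic space.

```python
from typing import List

# Half-width special characters that are not supported in terms.
HALF_WIDTH_FORBIDDEN = "@#$%^*()<>/?,.;_+-='\" \t"


def _full_width(ch: str) -> str:
    """Map an ASCII character to its full-width form (assumed mapping)."""
    if ch == " ":
        return "\u3000"                 # ideographic space
    if "!" <= ch <= "~":
        return chr(ord(ch) + 0xFEE0)    # for example, "@" becomes "＠"
    return ch                           # a tab has no full-width form


FORBIDDEN = set(HALF_WIDTH_FORBIDDEN) | {_full_width(c) for c in HALF_WIDTH_FORBIDDEN}

MAX_TERM_LENGTH = 50                    # includes the & and ~ operators
MIN_PATTERN_LENGTH, MAX_PATTERN_LENGTH = 20, 4000


def check_term(term: str) -> List[str]:
    """Return a list of problems found in a term; an empty list means none."""
    problems = []
    if not term:
        problems.append("term is empty")
    if len(term) > MAX_TERM_LENGTH:
        problems.append(f"term is longer than {MAX_TERM_LENGTH} characters")
    bad = sorted({c for c in term if c in FORBIDDEN})
    if bad:
        problems.append(f"unsupported characters: {bad!r}")
    return problems


def check_text_pattern(pattern: str) -> List[str]:
    """Return a list of problems found in a text pattern (length limits only)."""
    if not MIN_PATTERN_LENGTH <= len(pattern) <= MAX_PATTERN_LENGTH:
        return [f"text pattern must be {MIN_PATTERN_LENGTH} to "
                f"{MAX_PATTERN_LENGTH} characters in length"]
    return []


if __name__ == "__main__":
    print(check_term("代开&发票~税务局"))   # []
    print(check_term("a b"))              # reports the unsupported space
```

The sketch cannot check whether a text pattern has clear semantic characteristics. That check is performed by the service.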

Procedure

Log on to the Content Moderation console.
In the left-side navigation pane, choose Machine audit V1.0 > Risk Libraries.
Click Create Text Library.

In the Create Custom Text Library dialog box, set the parameters as described in Table 1. Then, click OK.

Table 1. Parameters for creating a custom text library

Parameter	Description
Name	The name of the custom text library. You can set the same name for multiple text libraries. However, we recommend that you set a unique name for each text library.
Scene	The scenario to which the text library applies. Valid values: Text Anti-spam: applies to text anti-spam requests in which the value of the scene parameter contains antispam. Ad: applies to image moderation requests in which the value of the scene parameter contains ad.
Type	The text type of the text library. Valid values: Keyword: matches text that contains terms in the library. Keyword libraries detect a broader range of risky text. Similar Text: matches text whose similarity to a text pattern in the library reaches a specific degree. Similar Text libraries detect risky text more accurately. Note: You can set this parameter to Similar Text only if the Scene parameter is set to Text Anti-spam.
Match Mode	The match mode applied to the custom text library. This parameter is required if the Type parameter is set to Keyword. Valid values: Precise: matches text that contains a term exactly as it appears in the text library. Check after Preprocess Texts: preprocesses both the terms and the text to be moderated, and then matches the preprocessed text against the preprocessed terms. Preprocessing converts uppercase letters to lowercase letters (for example, the text "bitCoin" hits the term "bitcoin"), converts traditional Chinese characters to simplified Chinese characters, and converts similar words. Note: By default, the Check after Preprocess Texts mode is selected for libraries that consist of text patterns.
List Category	The category of the moderation result that is returned based on the custom text library. If the Type parameter is set to Keyword, valid values are: Block list: If the text to be moderated hits terms in the text library, the suggestion parameter in the machine-assisted moderation result is set to block. Review List: If the text to be moderated hits terms in the text library, the suggestion parameter in the machine-assisted moderation result is set to review. Filter List: Terms in the text library that are hit are excluded from the text, and the remaining text is moderated. If the Type parameter is set to Similar Text, valid values are: Block list: If the text to be moderated hits text patterns in the text library, the suggestion parameter in the machine-assisted moderation result is set to block. Review List: If the text to be moderated hits text patterns in the text library, the suggestion parameter in the machine-assisted moderation result is set to review. Trust list: If the text to be moderated hits text patterns in the text library, the suggestion parameter in the machine-assisted moderation result is set to pass.
bizType	The business scenario to which the custom text library applies. You can specify different text libraries in API requests to meet your business requirements. For example, you can use the bizType parameter to specify the text library to be applied in a specific moderation scenario. The bizType parameter takes effect in the following ways: If the bizType parameter in a moderation request is set to A, the text libraries of which the bizType parameter is set to A are used for moderation. These text libraries can be used only if they are enabled. In other cases, all enabled text libraries are used for moderation.

After the text library is created, you can view it in the text library list.
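The way the Match Mode, List Category, and bizType parameters interact can be sketched as follows. This is a simplified model for illustration, not the service's logic. The KeywordLibrary class and the function names are hypothetical, preprocessing is reduced to lowercasing, Filter List libraries are not modeled, and "in other cases" is read as "the request does not specify bizType".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# suggestion value returned for each list category (Trust list applies to Similar Text only)
SUGGESTION_BY_LIST = {"Block list": "block", "Review List": "review", "Trust list": "pass"}


@dataclass
class KeywordLibrary:
    name: str
    list_category: str                      # "Block list" or "Review List"
    terms: List[str]
    match_mode: str = "Precise"             # or "Check after Preprocess Texts"
    biz_type: Optional[str] = None
    enabled: bool = True


def preprocess(text: str) -> str:
    """Only the lowercase step is modeled; the other steps are omitted."""
    return text.lower()


def select_libraries(libraries: List[KeywordLibrary],
                     request_biz_type: Optional[str] = None) -> List[KeywordLibrary]:
    """Pick the libraries used for a request; disabled libraries are never used."""
    enabled = [lib for lib in libraries if lib.enabled]
    if request_biz_type:
        return [lib for lib in enabled if lib.biz_type == request_biz_type]
    return enabled


def keyword_hits(text: str, libraries: List[KeywordLibrary],
                 request_biz_type: Optional[str] = None) -> List[Tuple[str, str, List[str]]]:
    """Return (library name, suggestion, matched terms) for each library that is hit."""
    hits = []
    for lib in select_libraries(libraries, request_biz_type):
        suggestion = SUGGESTION_BY_LIST.get(lib.list_category)
        if suggestion is None:              # for example, Filter List (not modeled)
            continue
        if lib.match_mode == "Precise":
            matched = [t for t in lib.terms if t in text]
        else:
            matched = [t for t in lib.terms if preprocess(t) in preprocess(text)]
        if matched:
            hits.append((lib.name, suggestion, matched))
    return hits


if __name__ == "__main__":
    libs = [
        KeywordLibrary("coins", "Block list", ["bitcoin"],
                       match_mode="Check after Preprocess Texts", biz_type="A"),
        KeywordLibrary("strict", "Review List", ["bitcoin"]),        # Precise mode
        KeywordLibrary("old", "Block list", ["bitCoin"], enabled=False),
    ]
    print(keyword_hits("buy bitCoin now", libs, request_biz_type="A"))
    # [('coins', 'block', ['bitcoin'])]
```

In the example, the Precise library is not hit because "bitcoin" and "bitCoin" differ in case, and the disabled library is skipped regardless of bizType.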

Manage terms or text patterns in the text library.
The Custom Text Library tab displays all custom text libraries. The libraries marked with System and named in the SCENARIO_FEEDBACK_WHITE or SCENARIO_FEEDBACK_BLACK format are feedback-based text libraries. For example, the ANTISPAM_FEEDBACK_BLACK library is a blacklist that consists of text patterns added by the system and is used for text anti-spam.
1. Find the term library that you want to manage and click Manage in the Actions column.
2. On the Text Libraries page, manage terms in the library.
  The Text Libraries page lists all terms in the library. The Detected in Last Seven Days column shows the number of times each term was hit in the last seven days, excluding the current day.
  Note
  You can add and delete terms. The operations take effect within about 15 minutes.
  - Click Add Keyword or Import and add terms as prompted.
  - Select one or more terms that you no longer need and click Batch Delete to delete the terms. You can also find a specific term and click Delete in the Actions column to delete the term.

Delete, modify, or disable a text library

On the Custom Text Library tab, you can click Delete, Edit, or Disable in the Actions column to delete, modify, or disable a self-managed text library.