By default, Content Moderation uses the global text library of Alibaba Cloud, which meets most moderation needs. To meet specific business needs, Content Moderation also supports custom text libraries. You can manage the text to be blocked, passed, or reviewed in separate custom text libraries, which helps you respond quickly to emergencies.
Background information
- Feedback-based text libraries are automatically created to accommodate the text that is reviewed. For more information, see Review data. By default, you can use feedback-based text libraries to moderate text in all moderation scenarios of the same type. You can manage the text in feedback-based text libraries. However, you cannot perform operations on feedback-based text libraries, such as disabling or deleting a feedback-based text library.
- User-created text libraries are created by you to moderate text in a specific moderation scenario or in a type of moderation scenario. You can manage the text in user-created text libraries and perform operations on the libraries themselves.
When using the Content Moderation API, you can apply custom text libraries to ad violation detection and text anti-spam.
Text types
The text in custom text libraries consists of terms and text patterns.
- Terms
Terms are designed to moderate words in text. If a sentence or a piece of text contains a certain term, the term is hit. You can add different terms for different business scenarios.
In Content Moderation, you can apply term-based moderation to ad violation detection and text anti-spam. For more information about relevant parameters, see the parameter description of moderation operations in different scenarios.
You can add the AND (&) and NOT (~) logical operators in Chinese terms. For example:
- The term "A&B" is added. If a piece of text contains both A and B, the term is hit.
- The term "A~B" is added. If a piece of text contains A but does not contain B, the term is hit.
Note: If you add both logical operators in a term, the AND (&) operator must be added before the NOT (~) operator. For example, you can add "A&B~C" as a term, but cannot add "A~C&B" as a term.
- Text patterns
Text patterns are designed to compare the similarity between sentences or text. If two sentences or two pieces of text are partially different but express the same meaning, the two sentences or two pieces of text show a close similarity. Content Moderation can determine whether a piece of text has a close similarity to a text pattern in text pattern libraries. If the similarity reaches a certain degree, the text pattern is hit.
You can apply text pattern libraries to text anti-spam. Content Moderation allows you to customize a blacklist, a whitelist, and a review list for text pattern libraries based on your business needs. The review list contains the text that needs human review. You can manage text patterns related to your business in text pattern libraries. In this case, the content that hits text patterns can be filtered out in text anti-spam.
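The AND (&) and NOT (~) rules for terms described above can be illustrated with a small local sketch. This is not how Content Moderation is implemented: `parse_term` and `term_hits` are hypothetical helper names, plain substring matching stands in for the service's word-breaking, and treating several `~` parts as multiple exclusions is an assumption.

```python
from typing import List, Tuple


def parse_term(term: str) -> Tuple[List[str], List[str]]:
    """Split a term such as "A&B~C" into required and excluded parts."""
    positive, *excluded = term.split("~")
    # The documented rule: AND (&) must be added before NOT (~).
    if any("&" in part for part in excluded):
        raise ValueError("AND (&) must be added before NOT (~)")
    required = positive.split("&")
    if not all(required) or not all(excluded):
        raise ValueError("every operator needs a non-empty operand")
    return required, excluded


def term_hits(term: str, text: str) -> bool:
    """Return True if the text contains every required part and no excluded part.

    Assumption: substring matching approximates the real matching behavior.
    """
    required, excluded = parse_term(term)
    return all(part in text for part in required) and not any(
        part in text for part in excluded
    )


if __name__ == "__main__":
    print(term_hits("免费&领取~官方", "点击免费领取礼品"))
    print(term_hits("免费&领取~官方", "官方活动免费领取"))
```

In this sketch, the first call reports a hit because both required parts appear and the excluded part does not, while the second does not hit because the excluded part is present. A term such as "A~C&B" is rejected with a `ValueError`, mirroring the ordering rule in the note.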
Limits
Type | Item | Limit |
---|---|---|
User-created text library | Quantity | Supports a maximum of 10 user-created text libraries. |
User-created text library | Name length | Supports a maximum of 20 characters in length for each library name. |
Term library | Term type | Supports Chinese characters and combinations of letters and digits. Currently, English words or phrases cannot be used as terms. Note: Each combination of letters and digits is considered as a word during word-breaking. |
Term library | Number of terms in a text library | Supports a maximum of 10,000 terms in a text library. |
Term library | Term length | Supports a maximum of 50 characters in length for each term, including logical operators. |
Term library | Encoding for Chinese terms | Supports UTF-8 encoding. |
Term library | Term format | Excludes the following full-width and half-width special characters: at signs (@), number signs (#), dollar signs ($), percent signs (%), carets (^), asterisks (*), parentheses (), angle brackets (<>), forward slashes (/), question marks (?), commas (,), periods (.), semicolons (;), underscores (_), plus signs (+), hyphens (-), equal signs (=), single quotation marks ('), double quotation marks ("), spaces, and tabs. |
Text pattern library | Text pattern length | Supports 10 to 4,000 characters in length for each text pattern. Note: If the text added to the text library is too long, it may cause incorrect matches. We recommend that each text pattern be up to 200 characters in length. You can submit a ticket to seek technical support. |
Text pattern library | Number of text patterns in a text library | Supports a maximum of 10,000 text patterns in a text library. |
Text pattern library | Encoding | Supports UTF-8 encoding. |
Text pattern library | Text content | Requires clear Chinese semantic features that can be extracted. If few semantic features can be identified from a text pattern, this text pattern is ignored. Note: A text pattern that consists of meaningless letters, digits, or emoticons may be ignored. |
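Before adding terms to a term library, it may help to check them locally against the limits in the table. The following is a minimal sketch under stated assumptions, not an official validator: `validate_term` is a hypothetical helper, length is counted in Unicode code points, and the full-width special characters are assumed to be the Unicode fullwidth counterparts of the listed half-width characters (plus the ideographic space). It does not check the term type rule.

```python
from typing import List

MAX_TERM_LENGTH = 50  # per the limits table, including logical operators

# Half-width special characters listed in the limits table, plus space and tab.
HALF_WIDTH_FORBIDDEN = "@#$%^*()<>/?,.;_+-='\" \t"


def _full_width(ch: str) -> str:
    """Map an ASCII character to its fullwidth form (assumed equivalent)."""
    if ch == " ":
        return "\u3000"  # ideographic space
    if "!" <= ch <= "~":
        return chr(ord(ch) + 0xFEE0)
    return ch  # a tab has no fullwidth form


FORBIDDEN = set(HALF_WIDTH_FORBIDDEN) | {_full_width(c) for c in HALF_WIDTH_FORBIDDEN}


def validate_term(term: str) -> List[str]:
    """Return a list of problems found in the term; an empty list means none."""
    problems = []
    if not term:
        problems.append("term is empty")
    if len(term) > MAX_TERM_LENGTH:
        problems.append(f"term is longer than {MAX_TERM_LENGTH} characters")
    bad = sorted({c for c in term if c in FORBIDDEN})
    if bad:
        problems.append(f"forbidden characters: {bad!r}")
    first_not = term.find("~")
    if first_not != -1 and "&" in term[first_not:]:
        problems.append("AND (&) must be added before NOT (~)")
    return problems


if __name__ == "__main__":
    for candidate in ["免费&领取~官方", "A~C&B", "50%off"]:
        print(candidate, validate_term(candidate))
```

Returning a list of problems rather than raising on the first one lets a batch import report every issue with a term at once. The service itself remains the authority on what is accepted.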