Content Moderation supports custom text libraries, which you can use to ensure that moderation results meet specific business requirements. Custom text libraries apply to text violation detection in images, ad violation detection, text anti-spam, file anti-spam, and audio anti-spam. You can place the text that you want to block, pass, or send for review in separate custom text libraries to meet your management requirements.
Background information
Custom text libraries are classified into feedback-based text libraries and self-managed text libraries.
- Feedback-based text libraries are created automatically to store the text that you have reviewed. By default, a feedback-based text library applies to all moderation scenarios of the same type. You can manage the text in a feedback-based text library, but you cannot perform operations on the library itself. For example, you cannot disable or delete a feedback-based text library. For more information about human review, see Review machine-assisted moderation results.
- Self-managed text libraries are libraries that you create to moderate text in a specific moderation scenario or a specific type of moderation scenario. You can manage both the text in a self-managed text library and the library itself.
Text types
You can add terms and text patterns to custom text libraries.
- Terms
Terms are designed to moderate words in text. If a sentence or a piece of text contains a specific term, the term is hit. You can add different terms for different business scenarios.
In Content Moderation, you can apply term-based moderation to text violation detection in images and to text anti-spam. For the relevant parameters, see the parameter descriptions of the moderation operation for each scenario. The parameters may differ slightly between these two scenarios.
You can use the AND (&) and NOT (~) logical operators in Chinese terms. Examples:
- The term "A&B" is added. If a piece of text contains both A and B, the term is hit.
- The term "A~B" is added. If a piece of text contains A but does not contain B, the term is hit.
Note: If a term contains both logical operators, the AND (&) operator must come before the NOT (~) operator. For example, "A&B~C" is a valid term, but "A~C&B" is not.
- Text patterns
Text patterns are designed to measure the similarity between sentences or pieces of text. Two sentences or pieces of text that differ in wording but express the same meaning are highly similar. Content Moderation determines whether a piece of text is highly similar to a text pattern in a text pattern library. If the similarity reaches a specific threshold, the text pattern is hit.
You can apply text pattern libraries to text anti-spam. Content Moderation allows you to customize a blacklist, a whitelist, and a review list of text patterns based on your business requirements. The review list contains text that requires human review. By managing the text patterns related to your business in these libraries, you can filter out content that hits the text patterns during text anti-spam.
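The following Python snippet is a minimal sketch of how a term that uses the AND (&) and NOT (~) operators could be evaluated. It assumes plain substring matching, that a term may chain several operands (such as "A&B&C"), and that every "&" must appear before the first "~". The function names `parse_term` and `term_hits` are hypothetical and are not part of any Content Moderation SDK.

```python
def parse_term(term: str) -> tuple[list[str], list[str]]:
    """Split a term such as "A&B~C" into required and excluded words.

    Hypothetical helper. Assumes "&" joins required words, "~" introduces
    excluded words, and every "&" appears before the first "~".
    """
    required_part, sep, excluded_part = term.partition("~")
    if "&" in excluded_part:
        # Mirrors the rule that "A~C&B" is not a valid term.
        raise ValueError(f"'&' must come before '~' in term: {term!r}")
    required = required_part.split("&")
    excluded = excluded_part.split("~") if sep else []
    if not all(required) or not all(excluded):
        raise ValueError(f"empty operand in term: {term!r}")
    return required, excluded


def term_hits(term: str, text: str) -> bool:
    """Return True if text contains every required word and no excluded word."""
    required, excluded = parse_term(term)
    return all(w in text for w in required) and not any(w in text for w in excluded)
```

For example, `term_hits("A&B~C", "AB")` is true, while `term_hits("A&B~C", "ABC")` is false because the excluded word C is present. Substring matching is used because Chinese text has no spaces that mark word boundaries; the service's actual matcher may differ.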
Limits
Type | Item | Limit |
---|---|---|
Self-managed text library | Quantity | A maximum of 10 self-managed text libraries. |
Self-managed text library | Name length | A maximum of 20 characters for each library name. |
Term | Term type | |
Term | Number of terms in a text library | A maximum of 10,000 terms in each text library. |
Term | Term length | A maximum of 50 characters for each term, including logical operators. |
Term | Encoding for Chinese terms | UTF-8. |
Term | Term format | The following special characters are not supported in either full-width or half-width form: at signs (@), number signs (#), dollar signs ($), percent signs (%), carets (^), asterisks (*), parentheses (()), angle brackets (<>), forward slashes (/), question marks (?), commas (,), periods (.), semicolons (;), underscores (_), plus signs (+), hyphens (-), equal signs (=), single quotation marks ('), double quotation marks ("), spaces, and tabs. |
Text pattern | Text pattern length | 20 to 4,000 characters for each text pattern. Note: Excessively long text patterns may cause invalid matches. We recommend that you limit each text pattern to a maximum of 200 characters. |
Text pattern | Number of text patterns in a text library | A maximum of 10,000 text patterns in each text library. |
Text pattern | Encoding | UTF-8. |
Text pattern | Text content | Each text pattern must have clear Chinese semantic features that can be extracted. A text pattern from which few semantic features can be identified is ignored. Note: A text pattern that consists of meaningless letters, digits, or emoticons may be ignored. |
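To catch rejected entries before you upload them, you could pre-check terms and text patterns against the limits in the table. The sketch below is a hypothetical local check, not a console or API feature. It assumes that lengths are counted in Unicode characters, that input has already been decoded from UTF-8, and that the full-width forms of the listed characters are the Unicode range U+FF01 to U+FF5E plus the ideographic space U+3000. It cannot check whether a text pattern has enough semantic features.

```python
MAX_TERM_LENGTH = 50            # includes the & and ~ operators
MIN_PATTERN_LENGTH = 20
MAX_PATTERN_LENGTH = 4000
RECOMMENDED_PATTERN_LENGTH = 200

# Half-width characters listed under "Term format" (space and tab included).
_HALF_WIDTH = set("@#$%^*()<>/?,.;_+-='\" \t")
# Assumed full-width counterparts: printable ASCII (U+0021 to U+007E) maps to
# U+FF01 to U+FF5E by adding 0xFEE0; the full-width space is U+3000.
_FULL_WIDTH = {chr(ord(c) + 0xFEE0) for c in _HALF_WIDTH if "!" <= c <= "~"}
_FULL_WIDTH.add("\u3000")
FORBIDDEN_TERM_CHARS = frozenset(_HALF_WIDTH | _FULL_WIDTH)


def validate_term(term: str) -> list[str]:
    """Return a list of problems; an empty list means the term passes these checks."""
    problems = []
    if not term:
        problems.append("term is empty")
    if len(term) > MAX_TERM_LENGTH:
        problems.append(f"term has {len(term)} characters; the maximum is {MAX_TERM_LENGTH}")
    bad = sorted(set(term) & FORBIDDEN_TERM_CHARS)
    if bad:
        problems.append(f"unsupported characters: {bad!r}")
    return problems


def validate_text_pattern(pattern: str) -> list[str]:
    """Return an error or a warning for a text pattern, based on its length only."""
    n = len(pattern)
    if not MIN_PATTERN_LENGTH <= n <= MAX_PATTERN_LENGTH:
        return [f"pattern has {n} characters; the allowed range is "
                f"{MIN_PATTERN_LENGTH} to {MAX_PATTERN_LENGTH}"]
    if n > RECOMMENDED_PATTERN_LENGTH:
        return [f"warning: pattern has {n} characters; more than "
                f"{RECOMMENDED_PATTERN_LENGTH} may cause invalid matches"]
    return []
```

For example, `validate_term("A&B~C")` returns an empty list because & and ~ are operators rather than forbidden characters, while a term that contains a space is rejected.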
Procedure
- Log on to the Content Moderation console.
- In the left-side navigation pane, choose .
- Click Create Text Library.
- In the Create Custom Text Library dialog box, configure the parameters described in the following table, and then click OK.
Table 1. Parameters for creating a custom text library
Parameter | Description |
---|---|
Name | The name of the custom text library. Multiple text libraries can share the same name, but we recommend that you give each text library a unique name. |
Scene | The scenario to which the text library applies. Valid values: Text Anti-spam (text anti-spam, where the value of the scene parameter in API requests contains antispam) and Ad (image moderation, where the value of the scene parameter in API requests contains ad). |
Type | The text type of the text library. Valid values: Keyword matches text that contains terms in the library, and helps you detect more risky text. Similar Text matches text that is similar to text patterns in the library with a specific probability, and helps you detect risky text more accurately. Note: You can set this parameter to Similar Text only if the Scene parameter is set to Text Anti-spam. |
Match Mode | The match mode of the custom text library. This parameter is required if the Type parameter is set to Keyword. Valid values: Precise matches text that contains terms identical to those in the text library. Check after Preprocess Texts preprocesses both the terms and the text to be moderated, and then matches the preprocessed text against the preprocessed terms. Preprocessing converts uppercase letters to lowercase letters (for example, if the text to be moderated is "bitCoin", the term "bitcoin" is hit), converts traditional Chinese characters to simplified Chinese characters, and converts similar words. Note: By default, the Check after Preprocess Texts mode is used for libraries that consist of text patterns. |
List Category | The category of the moderation result that is returned when text hits the library. If the Type parameter is set to Keyword, valid values are: Block List (the suggestion parameter in the machine-assisted moderation result is block), Review List (the suggestion parameter is review), and Filter List (the parts of the text that hit terms in the library are excluded, and the remaining text is moderated). If the Type parameter is set to Similar Text, valid values are: Block List (the suggestion parameter is block), Review List (the suggestion parameter is review), and Trust List (the suggestion parameter is pass). |
bizType | The business scenario to which the custom text library applies. You can use the bizType parameter to specify which text libraries are applied in a moderation request. If the bizType parameter in a moderation request is set to A, the enabled text libraries whose bizType parameter is set to A are used for moderation. In other cases, all enabled text libraries are used for moderation. |
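The following sketch models how the Match Mode, List Category, and bizType settings of keyword libraries could combine for one piece of text. It is a hypothetical simulation, not the service's implementation: the `KeywordLibrary` class and all function names are invented for illustration, the traditional-to-simplified table holds only two characters, the "convert similar words" step and the Filter List are not modeled, and a Block List hit is assumed to take precedence over a Review List hit. It also reads "in other cases" as including a request whose bizType matches no library, in which case all enabled libraries are used.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny traditional-to-simplified table for the demo.
_TRAD_TO_SIMP = str.maketrans({"幣": "币", "體": "体"})


@dataclass
class KeywordLibrary:
    """Hypothetical model of a self-managed keyword library; not an SDK type."""
    name: str
    terms: list[str]
    list_category: str              # "block" or "review" in this sketch
    match_mode: str = "precise"     # "precise" or "preprocess"
    biz_type: Optional[str] = None
    enabled: bool = True


def preprocess(text: str) -> str:
    """Lowercase, then map the traditional characters in _TRAD_TO_SIMP."""
    return text.lower().translate(_TRAD_TO_SIMP)


def library_hits(lib: KeywordLibrary, text: str) -> bool:
    """Substring matching; both sides are preprocessed in "preprocess" mode."""
    if lib.match_mode == "preprocess":
        text = preprocess(text)
        return any(preprocess(term) in text for term in lib.terms)
    return any(term in text for term in lib.terms)


def select_libraries(libraries: list[KeywordLibrary],
                     request_biz_type: Optional[str] = None) -> list[KeywordLibrary]:
    """Enabled libraries with the request's bizType if any library has it, else all enabled."""
    enabled = [lib for lib in libraries if lib.enabled]
    if request_biz_type is not None and any(lib.biz_type == request_biz_type for lib in libraries):
        return [lib for lib in enabled if lib.biz_type == request_biz_type]
    return enabled


def custom_suggestion(text: str, libraries: list[KeywordLibrary],
                      request_biz_type: Optional[str] = None) -> Optional[str]:
    """Return "block", "review", or None when no selected library is hit."""
    hit = {lib.list_category
           for lib in select_libraries(libraries, request_biz_type)
           if library_hits(lib, text)}
    if "block" in hit:          # assumed precedence: block over review
        return "block"
    if "review" in hit:
        return "review"
    return None
```

With a Block List library `KeywordLibrary("coins", ["bitcoin"], "block", match_mode="preprocess", biz_type="forum")`, the text "Buy bitCoin" yields "block" because preprocessing lowercases it. A Precise-mode library with the same term would not be hit by that text.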
After the text library is created, you can view it in the text library list.
- Manage terms or text patterns in the text library.
The Custom Text Library tab displays all custom text libraries. Libraries that are marked with System and named in the SCENARIO_FEEDBACK_WHITE or SCENARIO_FEEDBACK_BLACK format are feedback-based text libraries. For example, the ANTISPAM_FEEDBACK_BLACK library is a blacklist of text patterns that the system adds for text anti-spam.
Delete, modify, or disable a text library
On the Custom Text Library tab, you can click Delete, Edit, or Disable in the Actions column to delete, modify, or disable a self-managed text library.