Content Moderation supports custom text libraries, which let you tailor moderation results to your business needs. Custom text libraries apply to text violation detection in images, ad violation detection, text anti-spam, file anti-spam, and audio anti-spam. In each custom text library, you specify the text to be blocked, passed, or reviewed.

Background information

Notice We recommend that you follow the instructions in this topic to use custom text libraries. This prevents you from adding improper terms that affect the accuracy of moderation results.
Custom text libraries include feedback-based text libraries and self-managed text libraries:
  • Feedback-based text libraries are automatically created to accommodate the text that is reviewed. By default, you can use feedback-based text libraries to moderate text in all moderation scenarios of the same type. You can manage the text in feedback-based text libraries. However, you cannot perform operations on feedback-based text libraries. For example, you cannot disable or delete a feedback-based text library. For more information about human review, see Review machine-assisted moderation results.
  • Self-managed text libraries are libraries that you create to moderate text in a specific moderation scenario or a specific type of moderation scenario. You can manage the text in self-managed text libraries and perform operations on self-managed text libraries.
Note You can create up to 10 self-managed text libraries.

When you use the Content Moderation API, you can use custom text libraries to detect text violations in images and to implement text anti-spam.

This topic describes how to manage custom text libraries for the Content Moderation API in the Alibaba Cloud Content Moderation console. You can also manage custom text libraries by calling API operations or using Content Moderation SDKs. For more information, see the Content Moderation API and SDK documentation.

Text types

You can add terms and text patterns to custom text libraries.

  • Terms

    Terms are designed to moderate words in text. If a sentence or a piece of text contains a specific term, the term is hit. You can add different terms for different business scenarios.

    In Content Moderation, you can apply term-based moderation to text violation detection in images and text anti-spam. For more information about relevant parameters, see the parameter description of moderation operations in different scenarios. The relevant parameters in these two scenarios may be slightly different.

    You can add the AND (&) and NOT (~) logical operators in Chinese terms. Examples:
    • The term "A&B" is added. If a piece of text contains both A and B, the term is hit.
    • The term "A~B" is added. If a piece of text contains A but does not contain B, the term is hit.
    Note If you add both logical operators in a term, the AND (&) operator must be added before the NOT (~) operator. For example, you can add "A&B~C" as a term, but cannot add "A~C&B" as a term.
  • Text patterns

    Text patterns are designed to compare the similarity between sentences or pieces of text. If two sentences or two pieces of text are partially different but express the same meaning, the two sentences or two pieces of text show a close similarity. Content Moderation can determine whether a piece of text has a close similarity to a text pattern in text pattern libraries. If the similarity reaches a specific degree, the text pattern is hit.

    You can apply text pattern libraries to text anti-spam. Content Moderation allows you to customize a blacklist, a whitelist, and a review list for text pattern libraries based on your business needs. The review list contains the text that needs human review. You can manage text patterns related to your business in text pattern libraries. In this case, the content that hits text patterns can be filtered out in text anti-spam.
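The AND (&) and NOT (~) semantics described for terms above can be illustrated with a minimal Python sketch. This is not the service's implementation. The function names `parse_term` and `term_hits` are hypothetical, matching is assumed to be plain substring containment, and the handling of more than one NOT (~) operator in a term is an assumption that the source does not cover.

```python
def parse_term(term: str) -> tuple[list[str], list[str]]:
    """Split a term such as "A&B~C" into required and excluded parts.

    Assumption: everything before the first "~" is joined by AND (&),
    and every part after a "~" must be absent from the text.
    """
    head, *excluded = term.split("~")
    # The AND operator must come before the NOT operator ("A~C&B" is invalid).
    if any("&" in part for part in excluded):
        raise ValueError(f"AND (&) must be added before NOT (~): {term!r}")
    required = head.split("&")
    if not all(required) or not all(excluded):
        raise ValueError(f"empty operand in term: {term!r}")
    return required, excluded


def term_hits(term: str, text: str) -> bool:
    """Return True if the text contains all required parts and no excluded part."""
    required, excluded = parse_term(term)
    return all(part in text for part in required) and not any(
        part in text for part in excluded
    )
```

With this sketch, `term_hits("A&B~C", "AB")` is true, `term_hits("A&B~C", "ABC")` is false, and `parse_term("A~C&B")` raises `ValueError` because the AND operator follows the NOT operator.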

Limits

Self-managed text library
  • Quantity: A maximum of 10 self-managed text libraries.
  • Name length: A maximum of 20 characters for each library name.

Term
  • Term type: Chinese terms and terms that consist of letters and digits are supported. English words or phrases cannot be used as terms.
    Note Each combination of letters and digits is treated as a single word during word-breaking.
  • Number of terms in a text library: A maximum of 10,000 terms.
  • Term length: A maximum of 50 characters for each term, including logical operators.
  • Encoding for Chinese terms: UTF-8.
  • Term format: The following special characters are not supported in either full-width or half-width form: at signs (@), number signs (#), dollar signs ($), percent signs (%), carets (^), asterisks (*), parentheses (()), angle brackets (<>), forward slashes (/), question marks (?), commas (,), periods (.), semicolons (;), underscores (_), plus signs (+), hyphens (-), equal signs (=), single quotation marks ('), double quotation marks ("), spaces, and tabs.

Text pattern
  • Text pattern length: 10 to 4,000 characters for each text pattern.
    Note Overly long text patterns may cause invalid matches. We recommend that you keep each text pattern within 200 characters.
  • Number of text patterns in a text library: A maximum of 10,000 text patterns.
  • Encoding: UTF-8.
  • Text content: Text patterns must have clear Chinese semantic traits that can be extracted. If few semantic traits can be identified from a text pattern, the text pattern is ignored.
    Note A text pattern that consists of meaningless letters, digits, or emoticons may be ignored.
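Before you import terms or text patterns in bulk, it can help to check them locally against the limits listed above. The following Python sketch is one way to do that. All names in it are hypothetical, lengths are assumed to be counted in Unicode code points, and full-width forms are approximated with NFKC normalization, which may not match the service's own checks exactly. The rule that English words cannot be used as terms is not checked here.

```python
import unicodedata

MAX_LIBRARY_NAME_LEN = 20
MAX_TERM_LEN = 50
MIN_PATTERN_LEN, MAX_PATTERN_LEN = 10, 4000
RECOMMENDED_PATTERN_LEN = 200
# Half-width forms of the unsupported special characters, plus space and tab.
FORBIDDEN_TERM_CHARS = set("@#$%^*()<>/?,.;_+-='\" \t")


def library_name_ok(name: str) -> bool:
    """Check the 20-character limit on library names."""
    return len(name) <= MAX_LIBRARY_NAME_LEN


def term_problems(term: str) -> list[str]:
    """Return a list of limit violations for a term (empty if none found)."""
    problems = []
    if len(term) > MAX_TERM_LEN:
        problems.append(f"longer than {MAX_TERM_LEN} characters")
    # NFKC folds full-width forms such as "＠" to their half-width form "@".
    bad = sorted(
        {ch for ch in term if unicodedata.normalize("NFKC", ch) in FORBIDDEN_TERM_CHARS}
    )
    if bad:
        problems.append(f"unsupported characters: {bad}")
    return problems


def pattern_problems(pattern: str) -> list[str]:
    """Return limit violations and length warnings for a text pattern."""
    problems = []
    if not MIN_PATTERN_LEN <= len(pattern) <= MAX_PATTERN_LEN:
        problems.append(
            f"length must be {MIN_PATTERN_LEN} to {MAX_PATTERN_LEN} characters"
        )
    elif len(pattern) > RECOMMENDED_PATTERN_LEN:
        problems.append(
            f"longer than the recommended {RECOMMENDED_PATTERN_LEN} characters"
        )
    return problems
```

The logical operators & and ~ are deliberately absent from the forbidden set, so a term such as "A&B~C" passes the character check.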

Procedure

  1. Log on to the Content Moderation console.
  2. In the left-side navigation pane, choose Machine audit > Risk Libraries.
  3. Click Create Text Library.
  4. In the Create Custom Text Library dialog box, set the parameters as described in Table 1, and then click OK.
    Table 1. Parameters for creating a custom text library
    Parameter Description
    Name The name of the custom text library. You can set the same name for multiple text libraries. However, we recommend that you set a unique name for each text library.
    Scene The scenario of the text library. Valid values:
    • Text Anti-spam: text anti-spam where the value of the scenes parameter contains antispam in API requests
    • Ad: image moderation where the value of the scenes parameter contains ad in API requests
    Type The text type of the text library. Valid values:
    • Keyword: matches the text to be moderated that contains terms. You can detect more risky text by using terms.
    • Similar Text: matches the text to be moderated that is similar to text patterns at a specific probability. You can detect risky text more accurately by using text patterns.
      Note You can set this parameter to Similar Text only if the Scene parameter is set to Text Anti-spam.
    Match Mode The match mode applied to the custom text library. This parameter is required if the Type parameter is set to Keyword. Valid values:
    • Precise: matches the text to be moderated that contains the same terms in the text library.
    • Fuzzy: preprocesses both the text to be moderated and the terms, and then matches the preprocessed text against the preprocessed terms. Preprocessing includes the following conversions:
      • Convert uppercase letters to lowercase letters. For example, if the text to be moderated is "bitCoin", the term "bitcoin" is hit.
      • Convert traditional Chinese characters to simplified Chinese characters.
      • Convert similar characters. For example, if the text to be moderated is "②", the term "2" is hit.
      Note By default, the fuzzy match mode is selected for libraries that consist of text patterns.
    List Category The category of the moderation result that is returned based on the custom text library.
    • Valid values if the Type parameter is set to Keyword:
      • Blacklist: If the text to be moderated hits terms in the text library, the machine-assisted moderation result contains the suggestion parameter that has a value of block.
      • Review List: If the text to be moderated hits terms in the text library, the machine-assisted moderation result contains the suggestion parameter that has a value of review.
      • Ignore List: The parts of the text that hit terms in the text library are ignored. Only the remaining text is moderated.
    • Valid values if the Type parameter is set to Similar Text:
      • Blacklist: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result contains the suggestion parameter that has a value of block.
      • Review List: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result contains the suggestion parameter that has a value of review.
      • Whitelist: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result contains the suggestion parameter that has a value of pass.
    bizType The business scenario to which the custom text library applies. You can specify different text libraries in API requests to meet your business needs. For example, you can use the bizType parameter to specify the text library to be applied in a specific moderation scenario. The bizType parameter takes effect in the following ways:
    • If the bizType parameter in a moderation request is set to A, the text libraries of which the bizType parameter is set to A are used for moderation. These text libraries can be used only if they are enabled.
    • In other cases, all enabled text libraries are used for moderation.
    After the text library is created, you can view it in the text library list.
  5. Manage terms or text patterns in the text library.
    The Custom Text Library tab displays all custom text libraries. The libraries marked with System and named in the SCENARIO_FEEDBACK_WHITE or SCENARIO_FEEDBACK_BLACK format are feedback-based text libraries. For example, the ANTISPAM_FEEDBACK_BLACK library is a blacklist that consists of text patterns added by the system and is used for text anti-spam.
    1. Find the text library that you want to manage and click Manage in the Actions column.
    2. On the Text Libraries page, manage terms or text patterns in the library. This example manages terms.
      The Text Libraries page displays all terms added to the library. The Detected in Last Seven Days column shows the number of times each term was hit in the last seven days, excluding the current day.
      Note You can add and delete terms. The operations take effect in about 15 minutes.
      • Click Add Keyword or Import and add terms as prompted.
      • Select one or more terms that you no longer need and click Batch Delete to delete the terms. You can also find a specific term and click Delete in the Actions column to delete the term.
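The difference between the Precise and Fuzzy match modes described in Table 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's preprocessing: the function names are hypothetical, NFKC normalization stands in for the similar-character conversion, and the traditional-to-simplified table contains only a few sample characters, whereas a real table would be far larger.

```python
import unicodedata

# Tiny illustrative mapping only; a complete conversion table is assumed to exist.
TRADITIONAL_TO_SIMPLIFIED = str.maketrans({"幣": "币", "發": "发", "貸": "贷"})


def normalize_for_fuzzy_match(text: str) -> str:
    """Apply the three conversions listed for the Fuzzy match mode."""
    # Stand-in for similar-character conversion: NFKC maps "②" to "2".
    text = unicodedata.normalize("NFKC", text)
    # Uppercase letters to lowercase letters.
    text = text.lower()
    # Traditional Chinese characters to simplified Chinese characters.
    return text.translate(TRADITIONAL_TO_SIMPLIFIED)


def precise_hit(term: str, text: str) -> bool:
    """Precise mode: the text must contain the term exactly as stored."""
    return term in text


def fuzzy_hit(term: str, text: str) -> bool:
    """Fuzzy mode: compare the preprocessed term with the preprocessed text."""
    return normalize_for_fuzzy_match(term) in normalize_for_fuzzy_match(text)
```

For the text "Buy bitCoin now" and the term "bitcoin", `precise_hit` returns false and `fuzzy_hit` returns true, which mirrors the "bitCoin" example in the table.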

Delete, modify, or disable a text library

On the Custom Text Library tab, you can click Delete, Edit, or Disable in the Actions column to delete, modify, or disable a self-managed text library.
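Disabling a library removes it from moderation, and the bizType parameter further narrows which enabled libraries apply to a request. The Python sketch below models that selection rule. The `TextLibrary` class and `libraries_for_request` function are hypothetical, and "in other cases" is read here as "the request does not set bizType", which is an assumption about wording that the source leaves open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextLibrary:
    """Hypothetical local model of a custom text library."""
    name: str
    biz_type: Optional[str] = None
    enabled: bool = True


def libraries_for_request(
    libraries: list[TextLibrary], request_biz_type: Optional[str]
) -> list[TextLibrary]:
    """Select the libraries that would apply to a moderation request."""
    # Disabled libraries are never used.
    enabled = [lib for lib in libraries if lib.enabled]
    if request_biz_type is None:
        # Assumed reading of "in other cases": all enabled libraries are used.
        return enabled
    # The request sets bizType: use enabled libraries with the same bizType.
    return [lib for lib in enabled if lib.biz_type == request_biz_type]
```

For example, given an enabled library with bizType A, an enabled library with bizType B, and a disabled library with bizType A, a request with bizType A selects only the first library, and a request without bizType selects the first two.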