By default, Content Moderation uses the global text library of Alibaba Cloud, which meets most moderation needs. To meet specific business needs, Content Moderation also supports custom text libraries. You can manage the text to be blocked, passed, or reviewed in separate custom text libraries, for example, to handle emergencies.

Background information

Notice We recommend that you add terms only when necessary. Unnecessary terms can cause incorrect matches, in which case the accuracy of moderation results cannot be guaranteed. To seek technical support, you can submit a ticket to Alibaba Cloud.
Custom text libraries consist of feedback-based text libraries and user-created text libraries:
  • Feedback-based text libraries are automatically created to accommodate the text that is reviewed. For more information, see Review data. By default, you can use feedback-based text libraries to moderate text in all moderation scenarios of the same type. You can manage the text in feedback-based text libraries. However, you cannot perform operations on feedback-based text libraries, such as disabling or deleting a feedback-based text library.
  • User-created text libraries are created by you to moderate text in a specific moderation scenario or in a type of moderation scenario. You can manage the text in user-created text libraries and perform operations on the libraries themselves, such as modifying, disabling, or deleting a library.
Note You can create up to 10 user-created text libraries.

When using the Content Moderation API, you can apply custom text libraries to ad violation detection and text anti-spam.

This topic describes how to manage custom text libraries for the Content Moderation API in the Alibaba Cloud Content Moderation console. In addition to operations in the console, you can also manage custom text libraries by using the API or SDK. For more information, see the API and SDK documentation of Content Moderation.

Text types

The text in custom text libraries consists of terms and text patterns.

  • Terms

    Terms are designed to moderate words in text. If a sentence or a piece of text contains a certain term, the term is hit. You can add different terms for different business scenarios.

    In Content Moderation, you can apply term-based moderation to ad violation detection and text anti-spam. For more information about relevant parameters, see the parameter description of moderation operations in different scenarios.

    You can add the AND (&) and NOT (~) logical operators in Chinese terms. For example:
    • The term "A&B" is added. If a piece of text contains both A and B, the term is hit.
    • The term "A~B" is added. If a piece of text contains A but does not contain B, the term is hit.
    Note If you add both logical operators in a term, the AND (&) operator must be added before the NOT (~) operator. For example, you can add "A&B~C" as a term, but cannot add "A~C&B" as a term.
  • Text patterns

    Text patterns are designed to compare the similarity between sentences or text. If two sentences or two pieces of text are partially different but express the same meaning, the two sentences or two pieces of text show a close similarity. Content Moderation can determine whether a piece of text has a close similarity to a text pattern in text pattern libraries. If the similarity reaches a certain degree, the text pattern is hit.

    You can apply text pattern libraries to text anti-spam. Content Moderation allows you to customize a blacklist, a whitelist, and a review list for text pattern libraries based on your business needs. The review list contains the text that needs human review. You can manage text patterns related to your business in text pattern libraries. In this case, the content that hits text patterns can be filtered out in text anti-spam.
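
The AND (&) and NOT (~) rules for terms described above can be illustrated with a minimal sketch. The function name term_hits is hypothetical, and the sketch assumes plain substring containment; the actual service performs word-breaking and may match differently.

```python
def term_hits(term: str, text: str) -> bool:
    """Return True if `text` hits `term` under the AND (&) / NOT (~) rules.

    Assumption: containment is checked as a plain substring test. The real
    service applies word-breaking, so its results may differ.
    """
    # The AND part must come before any NOT part, so everything before the
    # first "~" lists the required words and the rest lists excluded words.
    required_part, *excluded = term.split("~")
    required = required_part.split("&")
    return (all(word in text for word in required)
            and not any(word in text for word in excluded))


# "A&B~C" is hit only if the text contains A and B but does not contain C.
print(term_hits("A&B~C", "A and B"))     # True
print(term_hits("A&B~C", "A, B and C"))  # False
```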

Limits

Type | Item | Limit
User-created text library | Quantity | Supports a maximum of 10 user-created text libraries.
User-created text library | Name length | Supports a maximum of 20 characters in length for each library name.
Term library | Term type | Supports Chinese characters and combinations of letters and digits. Currently, English words or phrases cannot be used as terms. Note: Each combination of letters and digits is considered as a word during word-breaking.
Term library | Number of terms in a text library | Supports a maximum of 10,000 terms in a text library.
Term library | Term length | Supports a maximum of 50 characters in length for each term, including logical operators.
Term library | Encoding for Chinese terms | Supports UTF-8 encoding.
Term library | Term format | Excludes the following full-width and half-width special characters: at signs (@), number signs (#), dollar signs ($), percent signs (%), carets (^), asterisks (*), parentheses (), angle brackets (<>), forward slashes (/), question marks (?), commas (,), periods (.), semicolons (;), underscores (_), plus signs (+), hyphens (-), equal signs (=), single quotation marks ('), double quotation marks ("), spaces, and tabs.
Text pattern library | Text pattern length | Supports 10 to 4,000 characters in length for each text pattern. Note: If a text pattern is too long, it may cause incorrect matches. We recommend that each text pattern be up to 200 characters in length. You can submit a ticket to seek technical support.
Text pattern library | Number of text patterns in a text library | Supports a maximum of 10,000 text patterns in a text library.
Text pattern library | Encoding | Supports UTF-8 encoding.
Text pattern library | Text content | Requires clear Chinese semantic features that can be extracted. If few semantic features can be identified from a text pattern, this text pattern is ignored. Note: A text pattern that consists of meaningless letters, digits, or emoticons may be ignored.
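
Before you import terms in bulk, it can help to check them against the term limits listed above. The following is a minimal local sketch, not part of the Content Moderation API or SDK. The name validate_term is hypothetical, and folding full-width characters with Unicode NFKC normalization is an assumption that only approximates the full-width rule. The sketch does not check the rule that English words or phrases cannot be used as terms.

```python
import unicodedata

MAX_TERM_LENGTH = 50  # per term, including logical operators
# Half-width forms of the special characters that a term must not contain.
FORBIDDEN = set("@#$%^*()<>/?,.;_+-='\" \t")


def validate_term(term: str) -> list[str]:
    """Return a list of problems found in `term`; an empty list means none."""
    problems = []
    if not term:
        problems.append("term is empty")
    if len(term) > MAX_TERM_LENGTH:
        problems.append(f"term is longer than {MAX_TERM_LENGTH} characters")
    # Assumption: NFKC folds most full-width characters to their half-width
    # forms, so one lookup covers both variants.
    folded = unicodedata.normalize("NFKC", term)
    bad = sorted({ch for ch in folded if ch in FORBIDDEN})
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(ch) for ch in bad))
    # The AND (&) operator must be added before the NOT (~) operator.
    first_not = term.find("~")
    if first_not != -1 and "&" in term[first_not:]:
        problems.append("AND (&) must come before NOT (~)")
    return problems


for candidate in ["A&B~C", "A~C&B", "a b"]:
    print(candidate, validate_term(candidate))
```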

Manage feedback-based text libraries

Note Feedback-based text libraries contain text patterns rather than terms. You can manage custom text libraries whose text type is text pattern in the same way.
  1. Log on to the Alibaba Cloud Content Moderation console.
  2. In the left-side navigation pane, choose Moderation API > Risk library management. The Text Library tab of the Risk library management page appears.
    The Text Library tab lists all custom text libraries. The libraries marked with System and named in SCENARIO_FEEDBACK_WHITE or SCENARIO_FEEDBACK_BLACK format are feedback-based text libraries. For example, the ANTISPAM_FEEDBACK_BLACK library is a blacklist that consists of text patterns added by the system and is used for text anti-spam.
  3. Find the target feedback-based text library and click Manage in the Operations column.
    The Manage Text Library page appears. This page lists all text patterns added to the library and displays the number of times that each text pattern is hit in the last seven days in the Detected Last 7 Days column, excluding the statistics on the current day.
  4. On the Manage Text Library page, manage text patterns in the library.
    Note You can add and delete text patterns. The operations take effect in 15 minutes.
    • Click Add Text or Import and add text patterns as prompted.
    • Select text patterns that you do not need and click Delete Selected at the bottom of the page. Alternatively, find a text pattern that you do not need and click Delete in the Operations column.
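
If you process an exported list of library names in a script, the SCENARIO_FEEDBACK_WHITE and SCENARIO_FEEDBACK_BLACK naming format described above can be recognized with a short sketch such as the following. The function name is hypothetical, and the name pattern alone is only a heuristic: in the console, feedback-based text libraries are also marked with System.

```python
import re

# SCENARIO_FEEDBACK_WHITE or SCENARIO_FEEDBACK_BLACK, for example
# ANTISPAM_FEEDBACK_BLACK. The allowed scenario characters are an assumption.
_FEEDBACK_NAME = re.compile(
    r"^(?P<scenario>[A-Z0-9_]+)_FEEDBACK_(?P<kind>WHITE|BLACK)$"
)


def parse_feedback_library_name(name: str):
    """Return (scenario, kind) for a feedback-style name, or None otherwise."""
    match = _FEEDBACK_NAME.match(name)
    if match is None:
        return None
    return match.group("scenario"), match.group("kind")


print(parse_feedback_library_name("ANTISPAM_FEEDBACK_BLACK"))  # ('ANTISPAM', 'BLACK')
print(parse_feedback_library_name("my-custom-library"))        # None
```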

Create and manage custom text libraries

  1. Log on to the Alibaba Cloud Content Moderation console.
  2. In the left-side navigation pane, choose Moderation API > Risk library management. The Text Library tab of the Risk library management page appears.
  3. On the Text Library tab, click New.
  4. In the Create Text Library dialog box that appears, set relevant parameters and click OK. The following table describes the parameters.
    Parameter Description
    Name The name of the custom text library. You can set the same name for multiple text libraries. However, we recommend that you set a unique name for each text library.
    Scene The moderation scenario to which the custom text library applies. Valid values:
    • Text Antispam: text anti-spam where the value of the scenes parameter contains antispam in API requests
    • Ad: image moderation where the value of the scenes parameter contains ad in API requests
    Type The type of text in the custom text library. Valid values:
    • Keyword: matches the text to be moderated that contains terms. You can detect more risky text through terms.
    • Similar Text: matches the text to be moderated that is similar to text patterns at a certain probability. You can detect risky text more accurately through text patterns.
      Note You can set this parameter to Similar Text only when the Scene parameter is set to Text Antispam.
    Match Mode The match mode applied to the custom text library. This parameter is required when the Type parameter is set to Keyword. Valid values:
    • Precise: matches the text to be moderated that contains the same terms in the text library.
    • Fuzzy: preprocesses both the text to be moderated and the terms in the text library, and then matches the preprocessed text against the preprocessed terms. The preprocessing works in the following ways:
      • Convert uppercase letters to lowercase letters. For example, if the text to be moderated is "bitCoin", the term "bitcoin" is hit.
      • Convert traditional Chinese characters to simplified Chinese characters.
      • Convert similar words. For example, if the text to be moderated is "(2)", the term "2" is hit.
      Note The fuzzy mode is selected by default for text pattern libraries.
    Category The category of moderation results returned based on the custom text library.
    • If Type is set to Keyword, the valid values are:
      • Black: If the text to be moderated hits terms in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of block.
      • Review: If the text to be moderated hits terms in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of review.
      • Ignore: If the text to be moderated hits terms in the text library, the text is ignored and the machine-assisted moderation result returns the suggestion parameter with a value of pass.
    • If Type is set to Similar Text, the valid values are:
      • Black: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of block.
      • Review: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of review.
      • White: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of pass.
    bizType The business scenario to which the custom text library applies. You can specify different text libraries in API operations to meet business needs. For example, you can use the bizType parameter to specify the text library to be applied in a specific moderation scenario. The bizType parameter takes effect in the following ways:
    • If the bizType parameter in a moderation request is set to A, the text libraries of which the bizType parameter is set to A are used for moderation. These text libraries must be enabled.
    • In other cases, all enabled text libraries are used for moderation.

    Set this parameter as needed. We recommend that you submit a ticket to Alibaba Cloud to seek technical support.

    The custom text library is created. You can view the created text library on the Text Library tab.
  5. Optional: If the text type of the created text library is term, perform the following steps to manage terms. If the text type is text pattern, manage text patterns in the same way as described in Manage feedback-based text libraries.
    1. Find the target text library whose text type is term and click Manage in the Operations column.
      The Manage Text Library page appears. This page lists all terms added to the library and displays the number of times that each term is hit in the last seven days in the Detected Last 7 Days column, excluding the statistics on the current day.
    2. On the Manage Text Library page, manage terms in the library.
      Note You can add and delete terms. The operations take effect in 15 minutes.
      • Click Add Keyword or Import and add terms as prompted.
      • Select terms that you do not need and click Delete Selected at the bottom of the page. Alternatively, find a term that you do not need and click Delete in the Operations column.
  6. Delete, modify, or disable a custom text library. Return to the Text Library tab. Select the target text library and click Delete, Modify, or Disable in the Operations column to perform the corresponding operation.
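
The bizType and Category rules in the parameter table above can be summarized with a minimal sketch. It is a local model for illustration only, not the service implementation. The TextLibrary class and both function names are hypothetical. The sketch also assumes one reading of "in other cases": a request without bizType, or with a bizType that matches no enabled text library, uses all enabled text libraries.

```python
from dataclasses import dataclass

# Category of a text library mapped to the suggestion value that is returned
# when the text to be moderated hits the library.
CATEGORY_TO_SUGGESTION = {
    "Black": "block",
    "Review": "review",
    "Ignore": "pass",  # term libraries
    "White": "pass",   # text pattern libraries
}


@dataclass
class TextLibrary:
    name: str
    category: str
    biz_type: str = ""
    enabled: bool = True


def libraries_for_request(libraries, biz_type=None):
    """Pick the text libraries that apply to a moderation request."""
    enabled = [lib for lib in libraries if lib.enabled]
    if biz_type:
        matching = [lib for lib in enabled if lib.biz_type == biz_type]
        if matching:
            return matching
    # Assumption: every other case uses all enabled text libraries.
    return enabled


def suggestion_for(library):
    """Return the suggestion value for a hit in `library`."""
    return CATEGORY_TO_SUGGESTION[library.category]


libraries = [
    TextLibrary("spam-terms", "Black", biz_type="A"),
    TextLibrary("review-terms", "Review"),
    TextLibrary("old-whitelist", "White", biz_type="A", enabled=False),
]
for lib in libraries_for_request(libraries, biz_type="A"):
    print(lib.name, suggestion_for(lib))  # spam-terms block
```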