By default, Content Moderation bases its moderation service on the global text library of Alibaba Cloud, which meets most moderation needs. To meet specific business needs, Content Moderation also supports custom text libraries. You can maintain separate custom text libraries for text that must be blocked, passed, or sent for review, which lets you respond quickly to emergencies.

Background information

Notice We recommend that you do not add terms unless necessary. Otherwise, the accuracy of moderation results cannot be guaranteed due to invalid matches.
Custom text libraries consist of feedback-based text libraries and user-created text libraries:
  • Feedback-based text libraries are automatically created to accommodate the text that is reviewed. By default, you can use feedback-based text libraries to moderate text in all moderation scenarios of the same type. You can manage the text in feedback-based text libraries. However, you cannot perform operations on feedback-based text libraries, such as disabling or deleting a feedback-based text library. For more information about human review, see Review data.
  • User-created text libraries are created by you to moderate text in a specific moderation scenario or in a type of moderation scenario. You can manage the text in user-created text libraries and perform operations on the libraries themselves.
Note You can create up to 10 user-created text libraries.

When you use the Content Moderation API, you can apply custom text libraries to ad violation detection and text anti-spam.

This topic shows you how to manage custom text libraries for the Content Moderation API in the Alibaba Cloud Content Moderation console. In addition to operations in the console, you can also manage custom text libraries by using the API or SDK. For more information, see the following topics:

Text types

The text in custom text libraries consists of terms and text patterns.

  • Terms

    Terms are designed to moderate words in text. If a sentence or a piece of text contains a specific term, the term is hit. You can add different terms for different business scenarios.

    In Content Moderation, you can apply term-based moderation to ad violation detection and text anti-spam. For more information about the relevant parameters, see the parameter descriptions of the moderation operations for each scenario. The parameters for ad violation detection and text anti-spam may differ slightly.

    You can add the AND (&) and NOT (~) logical operators in Chinese terms. For example:
    • The term "A&B" is added. If a piece of text contains both A and B, the term is hit.
    • The term "A~B" is added. If a piece of text contains A but does not contain B, the term is hit.
    Note If you add both logical operators in a term, the AND (&) operator must be added before the NOT (~) operator. For example, you can add "A&B~C" as a term, but cannot add "A~C&B" as a term.
  • Text patterns

    Text patterns are designed to compare the similarity between sentences or text. If two sentences or two pieces of text are partially different but express the same meaning, the two sentences or two pieces of text show a close similarity. Content Moderation can determine whether a piece of text has a close similarity to a text pattern in text pattern libraries. If the similarity reaches a specific degree, the text pattern is hit.

    You can apply text pattern libraries to text anti-spam. Content Moderation allows you to customize a blacklist, a whitelist, and a review list for text pattern libraries based on your business needs. The review list contains the text that needs human review. You can manage text patterns related to your business in text pattern libraries. In this case, the content that hits text patterns can be filtered out in text anti-spam.
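The AND (&) and NOT (~) operators described for terms can be illustrated with a short Python sketch. The function name term_hits is hypothetical, and plain substring containment stands in for the word-breaking that the service performs, so the sketch shows only the operator logic and is not the actual matching engine. Allowing several NOT operands in one term is also an assumption.

```python
def term_hits(term: str, text: str) -> bool:
    """Return True if `text` hits `term`, which may use & (AND) and ~ (NOT)."""
    # Everything before the first ~ is the AND part; the rest are excluded words.
    head, *excluded = term.split("~")
    if any("&" in word for word in excluded):
        # Mirrors the rule that "A~C&B" cannot be added as a term.
        raise ValueError("AND (&) must come before NOT (~)")
    required = head.split("&")
    if not all(required) or not all(excluded):
        raise ValueError("empty operand in term")
    # Assumption: a word "is contained" when it occurs as a substring.
    contains_all = all(word in text for word in required)
    contains_excluded = any(word in text for word in excluded)
    return contains_all and not contains_excluded


if __name__ == "__main__":
    print(term_hits("A&B", "xxAxxBxx"))   # both A and B are present
    print(term_hits("A~B", "xxAxxBxx"))   # B is present, so the term is not hit
```

A term such as "A&B~C" is hit only by text that contains A and B and does not contain C.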

Limits

Type | Item | Limit
User-created text library | Quantity | A maximum of 10 user-created text libraries.
User-created text library | Name length | A maximum of 20 characters for each library name.
Term library | Term type | Chinese terms are supported. Letters and digits are supported as terms. English words or phrases cannot be used as terms. Note: Each combination of letters and digits is considered a word during word-breaking.
Term library | Number of terms in a text library | A maximum of 10,000 terms in a text library.
Term library | Term length | A maximum of 50 characters for each term, including logical operators.
Term library | Encoding for Chinese terms | UTF-8
Term library | Term format | Terms cannot contain the following full-width or half-width special characters: @ # $ % ^ * ( ) < > / ? , . semicolons (;), underscores (_), plus signs (+), hyphens (-), equal signs (=), single quotation marks ('), double quotation marks ("), spaces, and tabs.
Text pattern library | Text pattern length | 10 to 4,000 characters for each text pattern. Note: Text that is too long may cause invalid matches. We recommend that each text pattern be up to 200 characters in length.
Text pattern library | Number of text patterns in a text library | A maximum of 10,000 text patterns in a text library.
Text pattern library | Encoding | UTF-8
Text pattern library | Text content | Requires clear Chinese semantic features that can be extracted. If few semantic features can be identified from a text pattern, the text pattern is ignored. Note: A text pattern that consists of meaningless letters, digits, or emoticons may be ignored.
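Before you import a large batch of text, it can help to check each entry against the limits listed above. The following Python sketch is one way to do that on the client side. The function names check_term and check_text_pattern are hypothetical, lengths are counted in characters, and NFKC folding is an assumed way to treat full-width characters like their half-width forms. The rule about English words is not modeled, and the service performs its own validation.

```python
import unicodedata

MAX_TERM_LENGTH = 50
MIN_PATTERN_LENGTH = 10
MAX_PATTERN_LENGTH = 4000
RECOMMENDED_PATTERN_LENGTH = 200
# Half-width forms of the special characters that terms cannot contain.
FORBIDDEN_CHARS = set("@#$%^*()<>/?,.;_+-='\" \t")


def check_term(term: str) -> list[str]:
    """Return a list of problems found in a term (empty if none)."""
    problems = []
    if not term:
        problems.append("term is empty")
    if len(term) > MAX_TERM_LENGTH:
        problems.append(f"term is longer than {MAX_TERM_LENGTH} characters")
    # Assumption: NFKC maps a full-width character to its half-width form.
    bad = sorted(
        {ch for ch in term if unicodedata.normalize("NFKC", ch) in FORBIDDEN_CHARS}
    )
    if bad:
        problems.append(f"forbidden characters: {''.join(bad)!r}")
    if "~" in term and "&" in term.split("~", 1)[1]:
        problems.append("AND (&) must come before NOT (~)")
    return problems


def check_text_pattern(pattern: str) -> list[str]:
    """Return a list of problems or warnings for a text pattern."""
    problems = []
    length = len(pattern)
    if length < MIN_PATTERN_LENGTH or length > MAX_PATTERN_LENGTH:
        problems.append(
            f"length must be {MIN_PATTERN_LENGTH} to {MAX_PATTERN_LENGTH} characters"
        )
    elif length > RECOMMENDED_PATTERN_LENGTH:
        problems.append(
            f"longer than the recommended {RECOMMENDED_PATTERN_LENGTH} characters"
        )
    return problems
```

Both functions return an empty list when an entry passes every modeled check, so a batch can be filtered with a simple truthiness test.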

Manage feedback-based text libraries

Note Feedback-based text libraries contain text patterns rather than terms. You can manage custom text libraries whose text type is text pattern in the same way.
  1. Log on to the Alibaba Cloud Content Moderation console.
  2. In the left-side navigation pane, choose Moderation API > Risk Libraries.
  3. On the Risk Libraries page, click the Custom Text Library tab.
    The Custom Text Library tab displays all custom text libraries. The libraries marked with System and named in SCENARIO_FEEDBACK_WHITE or SCENARIO_FEEDBACK_BLACK format are feedback-based text libraries. For example, the ANTISPAM_FEEDBACK_BLACK library is a blacklist that consists of text patterns added by the system and is used for text anti-spam.
  4. Find the feedback-based text library that you want to manage and click Manage in the Actions column.
  5. On the Text Libraries page, manage text patterns in the library.
    The Text Libraries page displays all text patterns added to the library and displays the number of times that each text pattern is hit in the last seven days in the Detected in Last Seven Days column, excluding the statistics on the current day.
    Note You can add and delete text patterns. The operations take effect in 15 minutes.
    • Click Add Keyword or Import and add text patterns as prompted.
    • Select the text patterns that you do not need and click Batch Delete to delete them. You can also find a text pattern that you do not need and click Delete to delete it.
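If you process a list of library names in a script, the SCENARIO_FEEDBACK_WHITE and SCENARIO_FEEDBACK_BLACK naming format can be recognized with a small helper. The following Python sketch is illustrative only. The function name parse_feedback_library is hypothetical, and the name alone is a heuristic, because the console also marks feedback-based text libraries with System.

```python
from typing import Optional


def parse_feedback_library(name: str) -> Optional[tuple[str, str]]:
    """Split a name such as ANTISPAM_FEEDBACK_BLACK into (scenario, list kind).

    Returns None if the name does not follow the feedback-based format.
    """
    suffixes = (
        ("_FEEDBACK_WHITE", "whitelist"),
        ("_FEEDBACK_BLACK", "blacklist"),
    )
    for suffix, kind in suffixes:
        # Require a non-empty scenario part before the suffix.
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], kind
    return None


if __name__ == "__main__":
    print(parse_feedback_library("ANTISPAM_FEEDBACK_BLACK"))
```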

Create and manage custom text libraries

  1. Log on to the Alibaba Cloud Content Moderation console.
  2. In the left-side navigation pane, choose Moderation API > Risk Libraries.
  3. On the Risk Libraries page, click the Custom Text Library tab.
  4. Click Create Text Library.
  5. In the Create Custom Text Library dialog box, set relevant parameters and click OK.
    Parameter Description
    Name The name of the custom text library. You can set the same name for multiple text libraries. However, we recommend that you set a unique name for each text library.
    Scene The scenario of the text library. Valid values:
    • Text Anti-spam: text anti-spam where the value of the scenes parameter contains antispam in API requests
    • Ad: image moderation where the value of the scenes parameter contains ad in API requests
    Type The text type of the text library. Valid values:
    • Keyword: matches the text to be moderated that contains terms. You can detect more risky text by using terms.
    • Similar Text: matches the text to be moderated that is similar to text patterns at a specific probability. You can detect risky text more accurately by using text patterns.
      Note You can set this parameter to Similar Text only when the Scene parameter is set to Text Anti-spam.
    Match Mode The match mode applied to the custom text library. This parameter is required when the Type parameter is set to Keyword. Valid values:
    • Precise: matches the text to be moderated that contains the same terms in the text library.
    • Fuzzy: preprocesses the text to be moderated and terms, and then matches the preprocessed text to be moderated that hits the preprocessed terms. The text to be moderated and terms are preprocessed in the following ways:
      • Convert uppercase letters to lowercase letters. For example, if the text to be moderated is "bitCoin", the term "bitcoin" is hit.
      • Convert traditional Chinese characters to simplified Chinese characters.
      • Convert similar words. For example, if the text to be moderated is "(2)", the term "2" is hit.
      Note The fuzzy mode is selected by default for text pattern libraries.
    List Category The category of moderation results returned based on the custom text library.
    • Select Keyword for Type. Valid values:
      • Blacklist: If the text to be moderated hits terms in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of block.
      • Review List: If the text to be moderated hits terms in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of review.
      • Ignore List: Text that hits terms in the text library is ignored, and only the remaining text is moderated.
    • Select Similar Text for Type. Valid values:
      • Blacklist: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of block.
      • Review List: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of review.
      • Whitelist: If the text to be moderated hits text patterns in the text library, the machine-assisted moderation result returns the suggestion parameter with a value of pass.
    BizType The business scenario to which the custom text library applies. You can specify different text libraries in API operations to meet business needs. For example, you can use the bizType parameter to specify the text library to be applied in a specific moderation scenario. The bizType parameter takes effect in the following ways:
    • If the bizType parameter in a moderation request is set to A, the text libraries of which the bizType parameter is set to A are used for moderation. These text libraries must be enabled.
    • In other cases, all enabled text libraries are used for moderation.
    After the text library is created, you can view the created text library in the text library list.
  6. Optional: Manage text, such as terms and text patterns, in the text library.
    The following steps show you how to manage terms if the text type of the created text library is term. If the text type of the created text library is text pattern, you can refer to the method of managing feedback-based text libraries. For more information, see Manage feedback-based text libraries.
    1. Find the text library that you want to manage and click Manage in the Actions column.
    2. On the Text Libraries page, manage terms in the library.
      The Text Libraries page displays all terms added to the library and displays the number of times that each term is hit in the last seven days in the Detected in Last Seven Days column, excluding the statistics on the current day.
      Note You can add and delete terms. The operations take effect in 15 minutes.
      • Click Add Keyword or Import and add terms as prompted.
      • Select terms that you do not need and click Batch Delete to delete the terms. You can also find the term that you do not need and click Delete to delete the term.
  7. Delete, modify, or disable a text library. Return to the Custom Text Library tab. Select the text library that you want to manage and click Delete, Edit, or Disable in the Actions column to perform the corresponding operation.
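
The way the Match Mode, List Category, and BizType settings from step 5 interact can be sketched in Python for term libraries. Everything in this sketch is illustrative: TermLibrary, normalize, select_libraries, and moderate are hypothetical names, not part of the Content Moderation API or SDK. The fuzzy preprocessing is approximated with Unicode NFKC folding and lowercasing, and conversion from traditional to simplified Chinese is left out. The sketch also assumes that "in other cases" covers requests without a bizType value and requests whose bizType value matches no enabled library. Ignore List libraries are not modeled.

```python
import unicodedata
from dataclasses import dataclass

# List Category -> value of the suggestion parameter, as described in step 5.
SUGGESTION = {"Blacklist": "block", "Review List": "review"}


@dataclass
class TermLibrary:
    name: str
    list_category: str           # "Blacklist" or "Review List"
    terms: list[str]
    match_mode: str = "Precise"  # or "Fuzzy"
    biz_type: str = ""
    enabled: bool = True


def normalize(text: str) -> str:
    # Assumption: NFKC folding plus lowercasing approximates fuzzy preprocessing.
    return unicodedata.normalize("NFKC", text).lower()


def select_libraries(libraries: list[TermLibrary], biz_type: str) -> list[TermLibrary]:
    """Pick enabled libraries for a request, honoring bizType when it matches."""
    enabled = [lib for lib in libraries if lib.enabled]
    matching = [lib for lib in enabled if biz_type and lib.biz_type == biz_type]
    return matching or enabled


def moderate(text: str, libraries: list[TermLibrary], biz_type: str = ""):
    """Return (library name, term, suggestion) for every term that is hit."""
    hits = []
    for lib in select_libraries(libraries, biz_type):
        if lib.list_category not in SUGGESTION:
            continue  # Ignore List libraries are not modeled in this sketch.
        fuzzy = lib.match_mode == "Fuzzy"
        haystack = normalize(text) if fuzzy else text
        for term in lib.terms:
            needle = normalize(term) if fuzzy else term
            if needle in haystack:
                hits.append((lib.name, term, SUGGESTION[lib.list_category]))
    return hits


if __name__ == "__main__":
    libraries = [
        TermLibrary("black-A", "Blacklist", ["bitcoin"], match_mode="Fuzzy", biz_type="A"),
        TermLibrary("review-all", "Review List", ["贷款"]),
    ]
    print(moderate("buy bitCoin now", libraries, biz_type="A"))
```

The sketch returns every hit instead of a single merged result, because this topic does not describe how results from several libraries are combined.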