Content Moderation: Features

Last Updated: Jul 09, 2025

AI Guardrails detects risks related to content compliance, sensitive content, and prompt injection attacks in two scenarios: model input content and model-generated content. In addition, the console provides supporting features such as online testing, risk reports, and result query.

AI Guardrails product feature set

  1. Check item configuration

    • Feature description: Supports setting appropriate check items for different scenarios and configuring switches for fine-grained labels. The following list describes each check item and its service names (see the example sketch after the list):

      • Content compliance detection

        Description: Detects baseline risks such as politically sensitive content, pornography, and violence, as well as violations such as abuse, bias, and harmful values in LLM input or generated content.

        Service names: AI input content security check (query_security_check_intl); AI generated content security check (response_security_check_intl)

      • Sensitive content detection

        Description: Automatically identifies, classifies, and grades personal sensitive information and enterprise sensitive information in LLM generated content.

        Service names: AI input content security check (query_security_check_intl); AI generated content security check (response_security_check_intl)

      • Prompt injection attack detection

        Description: Identifies violating content that an LLM is deliberately induced to generate when security policies are bypassed through prompt manipulation (such as inductive or adversarial prompt construction) or technical means (such as encoding obfuscation or multi-round conversation disguise).

        Service names: AI input content security check (query_security_check_intl); AI generated content security check (response_security_check_intl)

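    • Example: The sketch below shows, at a high level, how an application might submit model input and generated content to the two check services by service name. It is an illustration only: the endpoint URL, authentication, and request/response field names used here are assumptions and are not taken from this document; see the API reference for the actual operation name, parameters, and signature requirements.

      ```python
      import json
      from typing import Optional

      import requests

      # Hypothetical endpoint for illustration only; the real operation name, host,
      # and request signing are defined in the AI Guardrails API reference.
      GUARDRAILS_ENDPOINT = "https://example-guardrails-endpoint/text-check"


      def check_text(service: str, content: str, session_id: Optional[str] = None) -> dict:
          """Submit text to a check service, for example query_security_check_intl
          (model input) or response_security_check_intl (generated content)."""
          payload = {
              "Service": service,                # check service name from the list above
              "ServiceParameters": json.dumps({  # assumed request layout, for illustration
                  "content": content,
                  "sessionId": session_id,       # assumed: groups multi-round conversations
              }),
          }
          resp = requests.post(GUARDRAILS_ENDPOINT, data=payload, timeout=10)
          resp.raise_for_status()
          return resp.json()


      # Check the user's prompt before it reaches the model ...
      input_result = check_text("query_security_check_intl", "user prompt text")
      # ... and check the model's answer before it is returned to the user.
      output_result = check_text("response_security_check_intl", "model generated text")
      ```
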
  2. Word library management and matching

    • Feature description: If you need to customize private moderation rules for content compliance detection, you can create word libraries of prohibited risky keywords, or of keywords to filter out before text detection, and then configure detection rules for keyword matching (see the example below).

    • For more information, see Operation guide.

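    • Example: As a conceptual sketch only (not the service's actual matching logic), the snippet below illustrates the two word-library roles described above: filter words are stripped from the text first, and the cleaned text is then matched against the prohibited-keyword library. The libraries and sample text are made up for illustration; real libraries and rules are configured in the console.

      ```python
      # Hypothetical word libraries; real libraries are configured in the console.
      FILTER_WORDS = ["*", "-", " "]               # noise to strip before detection
      PROHIBITED_KEYWORDS = ["examplebannedterm"]  # keywords that trigger a hit


      def match_prohibited(text: str) -> list:
          # Step 1: remove filter words so they cannot be used to split or mask terms.
          cleaned = text.lower()
          for noise in FILTER_WORDS:
              cleaned = cleaned.replace(noise, "")
          # Step 2: match the cleaned text against the prohibited-keyword library.
          return [kw for kw in PROHIBITED_KEYWORDS if kw in cleaned]


      print(match_prohibited("Example ban*ned term"))  # -> ['examplebannedterm']
      ```
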
  3. Answer library management and settings

    • Feature description: During content compliance detection, you can use this feature to replace blocked content with preset answers from an answer library (see the example below).

    • For more information, see Operation guide.

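    • Example: The sketch below shows how an application could fall back to a preset answer when generated content is blocked. The answer library contents and the result fields ("riskLevel", "riskLabel") are illustrative assumptions, not the documented response schema; in practice, answer replacement is configured in the console.

      ```python
      # Hypothetical answer library; real answer libraries are managed in the console.
      ANSWER_LIBRARY = {
          "politics": "Sorry, I can't discuss this topic.",
          "default": "Sorry, I can't help with that request.",
      }


      def finalize_answer(model_answer: str, check_result: dict) -> str:
          # "riskLevel" and "riskLabel" are assumed field names for illustration only.
          if check_result.get("riskLevel") == "high":
              label = check_result.get("riskLabel", "default")
              return ANSWER_LIBRARY.get(label, ANSWER_LIBRARY["default"])
          return model_answer


      blocked = {"riskLevel": "high", "riskLabel": "politics"}
      print(finalize_answer("<blocked model output>", blocked))
      # -> "Sorry, I can't discuss this topic."
      ```
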
  4. Online testing

    • Feature description: Supports online testing of content compliance detection, sensitive content detection, and prompt injection attack detection covered by AI Guardrails to quickly verify the effectiveness of moderation policies.

    • For more information, see Quick guide to using the online testing feature.

  5. Result query

    • Feature description: Allows users to view the moderation results and returned parameters of checked content through the result query feature, for example to analyze high-frequency risk types (see the example below).

    • For more information, see Operation guide.

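    • Example: If you export query results for offline analysis, a tally like the sketch below can surface high-frequency risk types. The record format (a "labels" field per checked request) is an assumption for illustration, not the documented export schema.

      ```python
      from collections import Counter

      # Hypothetical exported query results; the "labels" field is assumed for illustration.
      exported_results = [
          {"requestId": "r1", "labels": ["politics"]},
          {"requestId": "r2", "labels": ["sensitive_data", "politics"]},
          {"requestId": "r3", "labels": []},
      ]

      label_counts = Counter(
          label for record in exported_results for label in record.get("labels", [])
      )
      print(label_counts.most_common())  # -> [('politics', 2), ('sensitive_data', 1)]
      ```
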
  6. Risk reports

    • Feature description: Allows users to view risk reports to understand call trend statistics and risk distribution for Content Moderation, Sensitive Data Detection, and Prompt Injection Attack Detection.

    • For more information, see Operation guide.