AI Guardrails supports risk detection for content compliance, sensitive content, and prompt injection attacks in two scenarios: model input content and generated content. Additionally, the console provides supporting functions such as online testing, risk reports, and result query.
AI Guardrails product feature set
Check item configuration
Feature description: Lets you set appropriate check items for different scenarios and toggle fine-grained labels on or off. The following table provides the details:
| Function | Description | Service name |
| --- | --- | --- |
| Content compliance detection | Detects baseline risks such as politically sensitive content, pornography, and violence, along with violations such as abuse, bias, and harmful values in LLM input or generated content. | AI input content security check (query_security_check_intl)<br>AI generated content security check (response_security_check_intl) |
| Sensitive content detection | Automatically identifies, classifies, and grades personal and enterprise sensitive information in LLM generated content. | AI input content security check (query_security_check_intl)<br>AI generated content security check (response_security_check_intl) |
| Prompt injection attack detection | Identifies attempts to bypass security policies and induce the LLM to generate violating content through prompt manipulation (such as inductive or adversarial prompt construction) or technical means (such as encoding obfuscation or multi-turn conversation disguise). | AI input content security check (query_security_check_intl)<br>AI generated content security check (response_security_check_intl) |
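The two services are typically called at two points in an LLM pipeline: the input check before a prompt reaches the model, and the generated-content check before a response reaches the user. The sketch below illustrates that flow only; `query_security_check` and `response_security_check` are hypothetical stand-ins with made-up rules, not the real API client.

```python
def query_security_check(text: str) -> bool:
    """Placeholder for the AI input content security check
    (query_security_check_intl). Returns True if the input passes."""
    return "forbidden" not in text.lower()  # illustrative rule only

def response_security_check(text: str) -> bool:
    """Placeholder for the AI generated content security check
    (response_security_check_intl). Returns True if the output passes."""
    return "leak" not in text.lower()  # illustrative rule only

def guarded_llm_call(prompt: str, llm) -> str:
    # 1. Check the user input before it reaches the model.
    if not query_security_check(prompt):
        return "[input blocked]"
    answer = llm(prompt)
    # 2. Check the generated content before it reaches the user.
    if not response_security_check(answer):
        return "[output blocked]"
    return answer

# Usage with a trivial stand-in model:
echo = lambda p: f"echo: {p}"
print(guarded_llm_call("hello", echo))            # echo: hello
print(guarded_llm_call("forbidden topic", echo))  # [input blocked]
```

In a real integration, the two placeholder functions would call the corresponding check services and inspect the returned risk verdict.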
Word library management and matching
Feature description: When performing content compliance detection, you can customize private moderation rules with word libraries: lists of prohibited keywords that are flagged as risks, and lists of keywords to filter out of the text before detection. You can then configure detection rules for keyword matching.
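Conceptually, the two kinds of lists are applied in order: filter keywords are stripped from the text first, then prohibited keywords are matched against what remains. A minimal illustration of that order (the function name and matching logic are illustrative, not the product's implementation):

```python
def apply_word_library(text, prohibited, filtered):
    """Strip filter keywords first, then match prohibited keywords."""
    cleaned = text
    for word in filtered:
        cleaned = cleaned.replace(word, "")
    hits = [w for w in prohibited if w in cleaned]
    return cleaned, hits

# A filter keyword ("[ad]") is removed before the prohibited-word match runs.
cleaned, hits = apply_word_library(
    "buy cheap meds [ad] now",
    prohibited=["cheap meds"],
    filtered=["[ad]"],
)
print(hits)  # ['cheap meds']
```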
For more information, see Operation guide.
Answer library management and settings
Feature description: Use this feature with content compliance detection when you need to replace blocked content with preset answers from an answer library.
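The substitution itself is simple: when a response is blocked, a preset answer is returned in its place. A sketch of that lookup, assuming a hypothetical category-keyed library (the category names and fallback behavior are illustrative):

```python
# Hypothetical answer library keyed by risk category.
ANSWER_LIBRARY = {
    "politics": "Sorry, I can't discuss this topic.",
    "default": "Sorry, I can't help with that request.",
}

def respond(generated_text, risk_category=None):
    """Return the model output, or a preset answer if it was blocked."""
    if risk_category is None:  # no risk detected, pass through
        return generated_text
    return ANSWER_LIBRARY.get(risk_category, ANSWER_LIBRARY["default"])

print(respond("Here is your summary."))          # passes through unchanged
print(respond("...", risk_category="politics"))  # replaced by preset answer
```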
For more information, see Operation guide.
Online testing
Feature description: Supports online testing of content compliance detection, sensitive content detection, and prompt injection attack detection covered by AI Guardrails to quickly verify the effectiveness of moderation policies.
For more information, see Quick guide to using the online testing feature.
Result query
Feature description: Lets you view the moderation results and returned parameters of checked content through the result query function, for example to analyze high-frequency risk types.
For more information, see Operation guide.
Risk reports
Feature description: Risk reports show call trend statistics and risk distribution for content compliance detection, sensitive content detection, and prompt injection attack detection.
For more information, see Operation guide.