AI Guardrails: What is Guardrails

Last Updated: Apr 01, 2026

Guardrails is a security product from Alibaba Cloud designed for artificial intelligence (AI) systems. It provides highly available, precise risk detection that helps AI systems deliver safe, compliant, and reliable services in response to user prompts.

Product features

When developing and operating AI applications and AI Agents, developers and businesses often face security threats such as content compliance risks, data breaches, prompt injection attacks, hallucinations, and jailbreaks. These AI risks can disrupt normal business operations and expose companies to significant compliance and reputational risk.

Guardrails ensures the compliance, security, and stability of AI services. It provides end-to-end protection for various use cases, including those involving pre-trained large models, AI services, and AI Agents. For generative AI input and output, Guardrails offers precise risk detection and proactive defense.

  1. Risk detection capabilities

    Guardrails provides comprehensive risk detection, including content compliance detection, sensitive content detection, prompt injection attack detection, malicious file and URL detection, and digital watermarking.

    • Content compliance detection: Reviews text inputs and outputs for generative AI across multiple compliance dimensions. It covers risk categories such as politically sensitive content, pornography and vulgarity, bias and discrimination, and harmful values to ensure that AI-generated content complies with laws, regulations, and platform policies. Use cases: Chatbots, AI in education, intelligent customer service, and AIGC creation platforms.

    • Sensitive content detection: Inspects AI interactions for potential leaks of private data and other sensitive information. It identifies sensitive content related to personal privacy and corporate secrets, preventing the leakage of both training and conversational data. Use cases: AI in healthcare, AI-powered financial services, and enterprise knowledge base Q&A.

    • Prompt injection attack detection: Provides specialized defense against injection attacks targeting generative AI. It accurately identifies adversarial behaviors like jailbreak commands, role-playing inducements, and system prompt tampering to build an "immune system" for your AI. Use cases: Securing command interactions for an AI Agent, defending against adversarial attacks in open-domain dialogue systems, and managing permissions for third-party plugin calls.

    • Malicious file detection: Analyzes common document formats uploaded by users, such as PDF, PPT, and DOC files. It identifies hidden malicious content, including executable scripts, macro viruses, and nested attack code, to prevent attackers from gaining unauthorized control or exfiltrating data through file injection. Use cases: AI applications that support document uploads, such as intelligent resume parsing, contract Q&A, and enterprise knowledge base construction.

    • Malicious URL detection: Analyzes links received or generated during AI interactions in real time. It identifies high-risk URLs, such as phishing websites, malicious redirects, and links with hidden attack payloads. This prevents the large model from accessing illicit resources or becoming a vector for cyberattacks. Use cases: AI-powered search, web page summarization, RAG-based knowledge retrieval, and automated external operations.

    • Digital watermarking: Automatically embeds visible or invisible watermarks into AI-generated images. This feature supports content identification regulations by ensuring AIGC content is traceable and accountable. It helps prevent the spread of misinformation and copyright disputes. Use cases: AIGC creation platforms, news media, government communications, and educational content generation in compliance-sensitive scenarios.
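    The detection capabilities above sit between the user and the model: inputs are checked before they reach the model, and outputs are checked before they reach the user. The sketch below illustrates that flow in Python. It is a simulation only; the function names, risk labels, and the pattern-matching logic are illustrative assumptions, not the actual Guardrails API, which performs model-based detection on the Alibaba Cloud side.

    ```python
    # Hypothetical sketch of guardrail checks around a model call.
    # The names and the naive marker list below are illustrative only;
    # the real product detects risks with trained models, not substring matching.

    INJECTION_MARKERS = (
        "ignore previous instructions",   # classic jailbreak phrasing
        "reveal your system prompt",      # system prompt tampering probe
    )

    def check_prompt(prompt: str) -> dict:
        """Simulated pre-check of a user prompt before it reaches the model."""
        lowered = prompt.lower()
        if any(marker in lowered for marker in INJECTION_MARKERS):
            return {"risk": "prompt_injection", "action": "block"}
        return {"risk": None, "action": "pass"}

    def guarded_call(prompt: str, model) -> str:
        """Run the input-side check, call the model, and return the reply."""
        verdict = check_prompt(prompt)
        if verdict["action"] == "block":
            return "Request blocked by guardrails."
        reply = model(prompt)
        # An output-side check would also run here (content compliance,
        # sensitive content, malicious URLs) before the reply is returned.
        return reply
    ```

    In this sketch, a blocked prompt never reaches the model at all, which is why input-side detection is useful even when the model itself refuses unsafe requests.
    
    
    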

  2. Custom protection configuration

    Guardrails lets you configure granular risk detection settings. You can log on to the Guardrails console to enable or disable detection rules at any time and create a risk detection template that suits your needs.

    • Custom detection items: Lets you configure the granular tags used for content compliance detection.

    • Custom risk thresholds: Lets you set the hit threshold for each granular tag. Thresholds are based on the model's confidence score, which ranges from 0 to 100, and can be adjusted in increments of 1.

    • Custom filter words: Lets you configure a list of sensitive words to detect and block, such as competitor names. You can manage this list by adding, deleting, or modifying words.
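    Conceptually, a risk detection template combines granular tags, per-tag thresholds on the 0-100 confidence scale, and a filter word list. The sketch below models that combination in Python. The field names and evaluation logic are assumptions for illustration; the real template is configured in the Guardrails console, and detection confidence is produced by the service, not computed locally.

    ```python
    # Hypothetical model of a risk detection template. Field names are
    # illustrative; the actual template is configured in the Guardrails console.

    template = {
        "detection_items": {
            # granular tag -> hit threshold (model confidence, 0-100, step 1)
            "politically_sensitive": 80,
            "pornography_vulgarity": 70,
        },
        # custom filter words, e.g. competitor names
        "filter_words": ["competitor-x", "competitor-y"],
    }

    def is_hit(tag: str, confidence: int, text: str) -> bool:
        """A detection hits when the model's confidence for an enabled tag
        meets its threshold, or the text contains a custom filter word."""
        threshold = template["detection_items"].get(tag)
        if threshold is not None and confidence >= threshold:
            return True
        lowered = text.lower()
        return any(word in lowered for word in template["filter_words"])
    ```

    Raising a tag's threshold makes that detection stricter about what counts as a hit (fewer false positives, more misses); lowering it does the opposite, which is why the console exposes the threshold per granular tag rather than globally.
    
    
    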

To learn more about product features, see the Features documentation.

Use cases

Use Guardrails for risk detection in these business scenarios:

  • Processing user prompts submitted to a generative AI model.

  • Analyzing multimodal content, including text, images, and videos, generated by a generative AI model.

  • Scanning and detoxifying the training corpus for a generative AI model.

  • Detecting risks in the inputs and outputs of an AI Agent.