AI Guardrails: Configure check items

This topic describes how to configure check items in the Guardrails console.

Use the Guardrails console to control which content risks are detected for your AI applications. Enable or disable detection services, configure blocklists and allowlists, and fine-tune risk tag detection scopes to match your application's needs.

Prerequisites

Before you begin, ensure that you have:

Access to the Guardrails console
The permissions required to modify protection configurations

Configure detection policies

Log on to the Guardrails console.
In the navigation pane on the left, choose Protection Configuration > Configuration.
Two policies are available:
Policy	Identifier
AI input content moderation	`query_security_check_intl`
AI-generated content moderation	`response_security_check_intl`
Enable or disable detection services as needed:
- Sensitive content detection
- Prompt injection detection
To review the rule details for a service before enabling it, click Management in the Actions column for the policy, such as AI input content moderation (query_security_check_intl).
Note: Enabling Sensitive content detection or Prompt injection detection triggers a billing notification. These services are billed separately. For details, see Activation and billing overview.
Configure vocabularies: Select a vocabulary to add to the blocklist or allowlist. For details, see Vocabulary management.
Manage risk tags: Enable or disable each risk tag in the Guardrails console. For certain risk tags, configure a more specific detection scope. The following steps use the AI input content moderation (query_security_check_intl) policy as an example:
1. On the Rule Management tab, click Management in the Actions column.
2. Select the detection type to configure, such as Undesirable Content.
3. Click Edit to enter edit mode, then modify the detection status.
4. Click Save.
Note: Changes take effect in about 2 to 5 minutes.
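
The delay noted above matters if a script or test checks a policy right after a change is saved in the console. The following is a minimal Python sketch, not part of any Guardrails SDK: `POLICY_IDENTIFIERS`, `propagation_window`, and `safe_to_verify` are hypothetical names introduced for illustration. Only the two identifier strings and the 2-to-5-minute range come from this topic, and the range is treated as a rough estimate rather than a guarantee.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Identifiers as listed in this topic; the dictionary keys are hypothetical labels.
POLICY_IDENTIFIERS = {
    "input": "query_security_check_intl",      # AI input content moderation
    "output": "response_security_check_intl",  # AI-generated content moderation
}

# "About 2 to 5 minutes" per the note; assumed here to be a rough window.
MIN_DELAY = timedelta(minutes=2)
MAX_DELAY = timedelta(minutes=5)


def propagation_window(saved_at: datetime) -> Tuple[datetime, datetime]:
    """Return the (earliest, latest) time a saved change is expected to be live."""
    return saved_at + MIN_DELAY, saved_at + MAX_DELAY


def safe_to_verify(saved_at: datetime, now: datetime) -> bool:
    """Return True once the upper end of the expected window has passed."""
    return now >= saved_at + MAX_DELAY
```

For example, with a change saved at 12:00, `safe_to_verify` returns False at 12:03 and True from 12:05 onward. Waiting for the upper bound is a conservative choice; a check made between 2 and 5 minutes may or may not see the new configuration.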
