Use the Guardrails console to control which content risks are detected for your AI applications. Enable or disable detection services, configure blocklists and allowlists, and fine-tune risk tag detection scopes to match your application's needs.
Prerequisites
Before you begin, ensure that you have:
Access to the Guardrails console
The permissions required to modify protection configurations
Configure detection policies
Log on to the Guardrails console.
In the navigation pane on the left, choose Protection Configuration > Configuration.
Two policies are available:
Policy                             Identifier
AI input content moderation        query_security_check_intl
AI-generated content moderation    response_security_check_intl
Enable or disable detection services as needed:
Sensitive content detection
Prompt injection detection
To review the rule details for a service before enabling it, click Management in the Actions column for the policy, such as AI input content moderation (query_security_check_intl).
Note: Enabling Sensitive content detection or Prompt injection detection triggers a billing notification. These services are billed separately. For details, see Activation and billing overview.
Configure vocabularies: Select a vocabulary to add to the blocklist or allowlist. For details, see Vocabulary management.
Manage risk tags: Enable or disable each risk tag in the Guardrails console. For certain risk tags, you can configure a more specific detection scope. The following steps use the AI input content moderation (query_security_check_intl) policy as an example:
On the Rule Management tab, click Management in the Actions column.
Select the detection type to configure, such as Undesirable Content.
Click Edit to enter edit mode, then modify the detection status.
Click Save.
Note: Changes take effect in about 2 to 5 minutes.
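The two policy identifiers and the propagation delay described above can be captured in a small client-side helper. The following is a minimal sketch, not part of the Guardrails product: the function names policy_for and safe_to_verify_at and the "input"/"output" keys are hypothetical, and the 5-minute window is taken from the upper end of the "about 2 to 5 minutes" estimate, so treat it as an approximation rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone

# Policy identifiers as listed in the policy table of this document.
# The "input"/"output" keys are hypothetical labels chosen for this sketch.
POLICY_IDENTIFIERS = {
    "input": "query_security_check_intl",      # AI input content moderation
    "output": "response_security_check_intl",  # AI-generated content moderation
}

# Assumption: use the upper end of the "about 2 to 5 minutes" window.
PROPAGATION_WINDOW = timedelta(minutes=5)


def policy_for(direction: str) -> str:
    """Return the policy identifier for 'input' or 'output' content."""
    try:
        return POLICY_IDENTIFIERS[direction]
    except KeyError:
        raise ValueError(f"unknown direction: {direction!r}") from None


def safe_to_verify_at(saved_at: datetime) -> datetime:
    """Estimate the earliest time a saved console change can be assumed active."""
    return saved_at + PROPAGATION_WINDOW


if __name__ == "__main__":
    saved = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(policy_for("input"))
    print(safe_to_verify_at(saved).isoformat())
```

A helper like this can be useful in test scripts that verify a configuration change: record the time you clicked Save, then wait until the estimated time before checking that the new detection behavior applies.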
