Protect AI Apps from Prompt Injection with MSE Nacos

Publishing Nacos configurations or MCP service definitions can expose your applications to prompt injection, malicious URLs, sensitive data leaks, brute-force attacks, and non-compliant content. The content security guardrail in MSE Nacos scans content at publish time and blocks threats based on policies you define.

Typical use cases:

Block prompt injection or jailbreak attempts embedded in MCP tool definitions before they reach your AI models.
Prevent accidental exposure of API keys, tokens, or personally identifiable information (PII) in Nacos configuration values.
Enforce content compliance by catching politically sensitive, violent, or prohibited content before it goes live.

Prerequisites

Before you begin, make sure that you have:

An MSE Nacos Enterprise Edition instance running engine version 3.1.1.0 or later. Developer Edition and Professional Edition do not support this feature.

Detection dimensions and protection levels

The guardrail scans content across four detection dimensions. You set an independent protection level for each dimension to control whether flagged content is logged or blocked.

Detection dimensions

Dimension	What it detects
Malicious URL detection	Malicious links, phishing websites, and other dangerous URLs
Prompt attack detection	Prompt injection, jailbreak attacks, and other malicious prompt patterns
Content compliance detection	Politically sensitive, violent, terrorist, or otherwise prohibited content
Sensitive content detection	Privacy leaks and other sensitive data

Protection levels

Each dimension supports four protection levels. Each level sets the lowest detected risk that is blocked, so a higher level is more tolerant and blocks fewer items.

Protection level	Behavior
Do not block	Detects and logs risks without blocking. Publishing proceeds normally.
Low risk	Blocks content rated low risk or higher. Most restrictive.
Medium risk	Blocks content rated medium risk or higher. Allows low-risk content through.
High risk	Blocks only high-risk content. Most tolerant.

The following table shows the outcome for each detected risk level (rows) under each protection level (columns):

Detected risk level	Do not block	Low risk	Medium risk	High risk
Low	Log only	Blocked	Allowed	Allowed
Medium	Log only	Blocked	Blocked	Allowed
High	Log only	Blocked	Blocked	Blocked
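The blocking rule in the table reduces to one threshold comparison per finding. The Python sketch below models it, assuming that detected risks are strictly ordered low < medium < high. The names `Risk`, `BLOCK_THRESHOLD`, and `outcome` are hypothetical and are not part of any MSE or Nacos API.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Detected risk levels, ordered so that they can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical mapping from each protection level to the lowest detected
# risk it blocks. None stands for "Do not block": findings are only logged.
BLOCK_THRESHOLD = {
    "Do not block": None,
    "Low risk": Risk.LOW,
    "Medium risk": Risk.MEDIUM,
    "High risk": Risk.HIGH,
}


def outcome(protection_level: str, detected: Risk) -> str:
    """Return 'Log only', 'Blocked', or 'Allowed' for a single finding."""
    threshold = BLOCK_THRESHOLD[protection_level]
    if threshold is None:
        return "Log only"
    return "Blocked" if detected >= threshold else "Allowed"


if __name__ == "__main__":
    # Rebuild the table: one row per detected risk, one column per level.
    for risk in Risk:
        row = [outcome(level, risk) for level in BLOCK_THRESHOLD]
        print(risk.name.capitalize(), *row, sep="\t")
```

Running the script prints the same grid as the table, one tab-separated row per detected risk level.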

Protection scope

Choose which operations trigger security scanning:

Scope	Description
Configuration creation and modification	Scans content when Nacos configurations are created or updated
MCP service creation and modification	Scans content when MCP servers are created or updated

Enable the guardrail

Log on to the MSE Management Console.
In the left-side navigation pane, choose Service Registry & Configuration Center > Instances.
Click the name of your instance.
In the left-side navigation pane, choose Security Protection > Content Security Guardrail.
Click Enable.
Note
- After you enable this feature, the system automatically checks content for security and compliance risks, such as privacy leaks, malicious script injection, and non-compliant content, whenever you publish a configuration or an MCP service within the selected protection scope.
- On first use, you are prompted to authorize the AliyunServiceRoleForMSEEngineService service-linked role. Follow the on-screen instructions to grant the required permissions.

Configure mitigation policies

After you enable the guardrail, configure the detection policies and protection scope.

On the Mitigation Policy Settings page, set the Blocking Policy (protection level) for each of the four detection dimensions based on your security requirements. For example, set prompt attack detection to Low risk to block all flagged content, and set malicious URL detection to Medium risk to let URLs flagged as low risk through.
Under Protection Scope, select the operations that trigger security scanning:
- Configuration creation and modification
- MCP service creation and modification
Click Save Changes.
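Before entering a policy in the console, it can help to write the intended settings down as data and check them. The following Python sketch is a hypothetical in-memory model, not the console's storage format or an MSE API: `GuardrailPolicy` verifies that every dimension has exactly one valid protection level and that the scope contains only the two documented operations. The prompt attack and malicious URL levels follow the example above; the levels for the other two dimensions are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical constants mirroring the console labels; not an MSE API.
DIMENSIONS = (
    "Malicious URL detection",
    "Prompt attack detection",
    "Content compliance detection",
    "Sensitive content detection",
)
LEVELS = ("Do not block", "Low risk", "Medium risk", "High risk")
SCOPES = (
    "Configuration creation and modification",
    "MCP service creation and modification",
)


@dataclass
class GuardrailPolicy:
    """Hypothetical model of the Mitigation Policy Settings page."""

    levels: dict[str, str]  # detection dimension -> protection level
    scope: set[str] = field(default_factory=set)  # operations to scan

    def __post_init__(self) -> None:
        # Each of the four dimensions must have a protection level.
        missing = set(DIMENSIONS) - set(self.levels)
        if missing:
            raise ValueError(f"no protection level for: {sorted(missing)}")
        # Reject unknown dimensions or misspelled level names.
        for dimension, level in self.levels.items():
            if dimension not in DIMENSIONS or level not in LEVELS:
                raise ValueError(f"invalid setting: {dimension!r} -> {level!r}")
        # Only the two documented operations can be in scope.
        unknown = self.scope - set(SCOPES)
        if unknown:
            raise ValueError(f"unknown scope: {sorted(unknown)}")

    def should_scan(self, operation: str) -> bool:
        """Return True if publishing through this operation triggers a scan."""
        return operation in self.scope


# Prompt attack and malicious URL levels follow the example above;
# the other two levels are illustrative choices.
policy = GuardrailPolicy(
    levels={
        "Malicious URL detection": "Medium risk",
        "Prompt attack detection": "Low risk",
        "Content compliance detection": "Medium risk",
        "Sensitive content detection": "Do not block",
    },
    scope={"MCP service creation and modification"},
)
print(policy.should_scan("MCP service creation and modification"))
print(policy.should_scan("Configuration creation and modification"))
```

With only MCP service creation and modification in scope, configuration publishes in this model are not scanned at all, regardless of the per-dimension levels.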

Important

Changes take effect immediately. All subsequent publish operations within the selected protection scope are scanned against the updated policies.

What happens when content is blocked

When you publish a configuration or an MCP service within the selected protection scope, the guardrail scans the content against all four detection dimensions:

If the content passes all checks, it is published normally.
If a risk is detected:
- At the Do not block level, the system logs the finding, sends alerts, and allows publishing to proceed.
- At any other protection level, the system sends alerts and blocks publishing when the detected risk meets or exceeds the threshold of that level. Findings below the threshold are allowed through.
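The publish decision can be sketched as a loop over the per-dimension findings. This Python model is hypothetical and is not MSE's implementation. It assumes that publishing is blocked if any single dimension reaches its blocking threshold, and that findings below a threshold pass without an alert; this page does not state either point explicitly. The names `Risk`, `THRESHOLDS`, and `evaluate_publish` are illustrative.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Detected risk levels, ordered so that they can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Lowest detected risk each protection level blocks. None means
# "Do not block" (log and alert only). Hypothetical names.
THRESHOLDS = {
    "Do not block": None,
    "Low risk": Risk.LOW,
    "Medium risk": Risk.MEDIUM,
    "High risk": Risk.HIGH,
}


def evaluate_publish(levels: dict[str, str],
                     findings: dict[str, Risk]) -> tuple[bool, list[str]]:
    """Return (allowed, alerts) for one publish attempt.

    levels:   detection dimension -> configured protection level
    findings: detection dimension -> detected risk (absent means it passed)
    """
    alerts: list[str] = []
    blocked = False
    for dimension, risk in findings.items():
        # Each dimension is judged only against its own protection level.
        threshold = THRESHOLDS[levels[dimension]]
        if threshold is None:
            # Do not block: record the finding but let publishing proceed.
            alerts.append(f"logged: {dimension} ({risk.name})")
        elif risk >= threshold:
            # Assumption: one blocking dimension blocks the whole publish.
            alerts.append(f"blocked: {dimension} ({risk.name})")
            blocked = True
        # Assumption: findings below the threshold pass silently.
    return (not blocked, alerts)


levels = {
    "Malicious URL detection": "Medium risk",
    "Prompt attack detection": "Low risk",
    "Content compliance detection": "High risk",
    "Sensitive content detection": "Do not block",
}

# A low-risk URL passes, and the sensitive-data finding is only logged.
print(evaluate_publish(levels, {
    "Malicious URL detection": Risk.LOW,
    "Sensitive content detection": Risk.HIGH,
}))
```

In this model, adding a medium-risk prompt attack finding to the same publish would block it, because prompt attack detection is set to Low risk.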

To review detection history, go to the Content Security Guardrail page in the console.

Considerations

Each detection dimension operates independently. Adjusting the protection level for one dimension does not affect the others.
The Do not block level still logs detected risks. Use this level to evaluate the guardrail's detection accuracy before enforcing stricter policies.