Protect LLM Apps from Prompt Injection with WAF - Web Application Firewall

AI applications are increasingly exposed to prompt injection attacks, where malicious inputs are crafted to manipulate large language models (LLMs), potentially causing them to disclose sensitive information or execute unintended actions. The prompt injection prevention feature safeguards your AI systems by proactively detecting and blocking such adversarial activity, ensuring ongoing security and reliability.

What is prompt injection?

Prompt injection is an attack where a malicious user manipulates the input to an LLM, embedding hidden instructions designed to override the model's safety mechanisms and ethical constraints. This can cause the model to behave in unintended ways.

Common attack vectors include:

Instruction hijacking: The attacker appends directives like"ignore previous instructions" to the prompt, in an attempt to override system-level presets or guidelines set for the model.
Persona-based prompt injection: The model is coaxed into adopting a persona, such as pretending to be someone with fewer restrictions (e.g., the"Grandma Exploit," in which the model impersonates a grandmother telling stories). This can trick the model into disclosing sensitive or restricted information, such as activation keys or instructions for prohibited actions.
Model jailbreaking: Jailbreaking is the overarching objective of these and other attack vectors—it involves crafting prompts (often over multi-turn conversations or via clever combinations of inputs) that bypass or break through the model's safety and alignment protocols, ultimately causing it to generate content it ordinarily would restrict.

Configure a prompt injection prevention template

To set up a prompt injection prevention template, first enable the AI Application Protection Service and create an asset.

Log on to the Web Application Firewall (WAF) 3.0 console. In the top menu bar, select the appropriate resource group and region (Chinese Mainland or Outside Chinese Mainland) for your WAF instance and click Create Template.
For Template Name, enter a unique name for the template.
In the Policy Configuration section, click Create Rule and configure the following settings before clicking OK. Multiple rules can be added.
- Rule Name: Specify a name for each rule.
- Threat Level: Set the rule's threat level—High, Medium, or Low. To cover all traffic, create individual rules for each threat level.
- Action: Choose the action to apply when a request matches the rule:
  - Monitor: Requests are not blocked but are logged for review. You can analyze logged requests in Log Query to assess rule effectiveness and identify any false positives.
  - Block: Requests matching the rule are blocked by WAF and not forwarded to the backend LLM. You may select a custom response template. Examples include:
    - Example 1: Returns a block page that does not follow the response format of the LLM application.
      - Status Code: 403
      - Header Name: Content-Type
      - Header Value: text/plain; charset=utf-8
      - Response Body: {"error_id":" {::trace_id::}","msg":"Prohibited content detected. Response blocked."}
    - Example 2: Returns a response formatted to match the requirements of your LLM application, ensuring a better user experience. Adjust the configuration as needed to align with your application's response format.
      - Status Code: 200
      - Header Name: Content-Type
      - Header Value: text/event-stream; charset=utf-
      - Response Body:
        data: {"id":"","object":"chat.completion.chunk","created":1747364919,"model":"deepseek-chat","system_fingerprint":"","choices":[{"index":0,"delta":{"content":"Your input contains prohibited content and has been blocked by WAF."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0,"prompt_tokens_details":{"cached_tokens":0},"prompt_cache_hit_tokens":0,"prompt_cache_miss_tokens":0}} data: [DONE]
  - Custom Response: When a request matches a rule, WAF forwards the original request to the backend LLM, but replaces the model's response with your custom content. You only need to specify the response text; it does not need to match the LLM's response format. For example:"The response contains non-compliant content and has been blocked by WAF."
For Protected Assets, select the assets you created in Asset Management to apply the protection template.
Note
A single protection template can be associated with multiple assets, but each asset can only be linked to one protection template at a time.

View and manage prompt injection protection templates

The Prompt Injection Protection page provides a template list where you can manage your protection templates with the following actions:

View the number of assets linked to each template.
Use the Status toggle to enable or disable a template.
Click Create Rule to add new rules to a template.
Edit, Delete, or Copy templates as needed.
To manage individual rules within a template, click the icon next to the template name. This allows you to:
- View details for each rule, such as Rule ID and Detected Threat Level.
- Enable or disable rules with the Status toggle.
- Edit or Delete existing rules.

Next steps

You can query hit records for specific AI application protection templates on the Security Reports page and review mitigation logs in Log Search.

Feature availability

This feature currently supports protected objects in the Hangzhou and Singapore clusters only when added in canonical name (CNAME) record mode, in cloud native mode for Elastic Compute Service (ECS) instances, or as Layer 4 Classic Load Balancer (CLB) instances. Support for additional clusters is being rolled out gradually. If you require support for other clusters at this time, please contact your account manager.