Safeguard your generative AI applications from compliance risks. The Web Application Firewall (WAF) content moderation feature inspects both user prompts and model responses in real time for harmful content, such as sexually explicit, politically sensitive, or other prohibited material. Powered by a multi-modal AI engine, it provides bidirectional risk detection and enforcement to keep your AI services secure and compliant.
Create a template and configure rules
Before you start, make sure you have activated the AI application protection service and added your application to WAF as assets.
Log on to the WAF console and navigate to the AI Application Protection page. In the top menu bar, select the resource group and region (Chinese Mainland or Outside Chinese Mainland) where your WAF instance resides. Click Create Template.
In the dialog box that appears, enter a Template Name.
Under Policy Configuration, click Create Rule. On the rule configuration page, configure the following parameters, and then click OK. You can create multiple rules.
Rule Name
Enter a descriptive name for the rule.
Detection Category
Select the risk types to detect. You can choose multiple categories, such as sexually explicit content, politically sensitive content, terrorist content, non-compliant content, and inappropriate content.
Detected Threat Level
Select the threat level. Options: High, Medium, or Low.
Select Detect Request to inspect user requests, Detect Response to inspect AI model responses, or both.
Detect Request
Select this option to inspect the content that users send to your AI model. Choose one of the following actions:
Monitor: WAF logs any request that matches the rule but does not block it. You can view matching requests in LogSearch to analyze rule effectiveness, for example, by checking for false positives.
Replace Response: WAF blocks the request from reaching the backend model and returns a custom response that you define. The response text does not need to match the format of the large model's response. For example, set the response text to: "Your input contains prohibited content and has been blocked by WAF."
Block: WAF blocks the request and prevents it from reaching your AI model. You can return a custom response template, such as a simple JSON error message.
Example 1: Returns a block page without matching the response format of the large model application.
Status Code: 403
Header Name: Content-Type
Header Value: text/plain; charset=utf-8
Response Body: {"error_id":"{::trace_id::}","msg":"Non-compliant content detected. The response has been blocked."}
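A client can recognize a request blocked by this template from the status code and the JSON body. The following is a minimal, illustrative sketch (the field names error_id and msg are taken from the example above; it is not an official WAF SDK):

```python
import json

def parse_block_response(status_code, body):
    """Return the WAF block message if the response matches the
    custom 403 JSON template above, else None."""
    if status_code != 403:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # not the JSON block template
    return payload.get("msg")
```

For example, a 403 response with the body shown above yields the msg text, while a normal 200 response yields None.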
Example 2: Matches the response format of the large model application for a better user experience. Adjust the body based on your application's actual response format.
Status Code: 200
Header Name: Content-Type
Header Value: text/event-stream; charset=utf-8
Response Body:
data: {"id":"","object":"chat.completion.chunk","created":1747364919,"model":"deepseek-chat","system_fingerprint":"","choices":[{"index":0,"delta":{"content":"Your input contains non-compliant content and has been blocked by WAF"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0,"prompt_tokens_details":{"cached_tokens":0},"prompt_cache_hit_tokens":0,"prompt_cache_miss_tokens":0}}

data: [DONE]
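A streaming client typically consumes such an SSE body line by line. The sketch below is illustrative only: it parses "data:" lines in the OpenAI-compatible chunk format shown above (field names choices, delta, and content are taken from the example) and concatenates the streamed text until the [DONE] marker:

```python
import json

def extract_stream_text(sse_body):
    """Parse 'data:' lines from an SSE body and concatenate the
    delta content of each chat.completion.chunk until [DONE]."""
    parts = []
    for line in sse_body.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Applied to the Example 2 body, this returns the single replaced message that WAF injected.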
Detect Response
Select this option to inspect the content that your AI model sends back to users. The available actions depend on whether your application returns non-streaming or streaming responses.
Non-streaming response action
Monitor: WAF logs any request that matches the rule but does not block it. You can view matching responses in LogSearch to analyze rule effectiveness, for example, by checking for false positives.
Replace Response: WAF forwards the original request to your AI model but intercepts and replaces the model's eventual response with a custom message you define. For example, set the response text to: "The response contains prohibited content and has been blocked by WAF."
Block: WAF blocks the response and returns a custom error page or message to the user. Select a custom response template.
Streaming response action
Monitor: Logs the response chunks without blocking them. View matching responses in LogSearch to analyze rule effectiveness, for example, by checking for false positives.
Block: If a harmful message chunk has already been sent to the client, WAF sends a special event in the Server-Sent Events (SSE) stream. This event instructs your client-side application to withdraw the last message and display a safe alternative. The event format must match your application's logic to work correctly.
In this Dify example, the event field uses message_replace to indicate the revocation of the previous message. The answer field contains the message to display to the user after the revocation.
When the client receives the SSE stream, it monitors the event field in each message. When it receives "event": "message_replace", the client should:
Stop receiving the current streaming output.
Clear or overwrite the rendered unsafe content.
Display the message in the answer field as the final response.
data: {"event": "message_replace", "conversation_id": "", "message_id": "", "created_at": 1755685383, "task_id": "", "id": "", "answer": "The content is illegal. The response has been revoked!", "from_variable_selector": null}
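The client-side steps above can be sketched as follows. This is an illustrative example based on the event format shown, not Dify's actual client code: it accumulates streamed answer chunks and, on a message_replace event, discards the rendered content and keeps only the replacement text.

```python
import json

def handle_sse_stream(lines):
    """Accumulate streamed answer chunks; on a message_replace event,
    withdraw everything rendered so far and show the safe alternative."""
    rendered = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore blank keep-alive lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("event") == "message_replace":
            # Clear the unsafe content and display the replacement message.
            rendered = [event.get("answer", "")]
            break  # stop receiving the current streaming output
        rendered.append(event.get("answer", ""))
    return "".join(rendered)
```

A real client would also clear the already-rendered text in the UI; here the withdrawal is modeled by resetting the accumulated list.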
Replace Response: WAF terminates the original SSE stream from the model and sends a new, non-streaming response to the client containing the custom content you define. For example, set the response text to: "The response contains prohibited content and has been blocked by WAF."
Under Protected Assets, select the assets you created in Asset Management to apply the template. A template can be applied to multiple protected objects, but each protected object can be associated with only one template.
Manage templates
On the Content Moderation page, you can perform the following actions on the protection templates:
View the number of protected assets associated with a template.
Use the Status toggle to enable or disable a template.
Click Create Rule to add a rule to the template.
Edit, Delete, or Copy the protection template.
Click the icon to the left of a template name to view and manage the rules in that template:
View information such as Rule ID and Detected Threat Level.
Use the Status toggle to enable or disable a rule.
Edit or Delete the rule.
Next steps
Go to the Security reports page to view statistics and hit records for your content security rules, or use Query logs to query detailed WAF logs for in-depth analysis.
Limitations
This feature is currently supported only for protected objects that are added through CNAME, ECS instances, or Layer-4 CLB instances in the Hangzhou and Singapore clusters. Support for other clusters is being rolled out gradually. If you need to use this feature in other clusters, contact your account manager.