WAF monitoring metrics: Thresholds and actions

Log Service for WAF collects request-level metrics that you can use to configure alerts and monitor your protected domains. This reference describes each metric, its recommended alert threshold, and what to investigate when an alert fires.

Metrics are grouped into three categories:

Traffic and performance
Security action metrics
Error codes

Traffic and performance

These metrics measure request volume and latency across the WAF pipeline.

Metric	Description	Recommended threshold	When the alert fires
`200`	The server has processed the request and returned the requested data.	Before your workloads go live, set the alert to fire when 200 responses fall below 90% of all requests. Adjust based on observed baseline traffic.	Identify which other status code has increased. A drop in 200 responses typically signals a rise in 4xx or 5xx errors.
`request_time_msec`	End-to-end latency: the time between the client sending a request and receiving the final response. Covers both the client-to-WAF leg and the WAF-to-origin leg.	Set based on your service's baseline response time. Use the P99 or average value, depending on your SLA sensitivity.	Check network connectivity on both legs: client to WAF, and WAF to origin server. Verify the origin server is responding within expected time.
`upstream_response_time`	Back-to-origin latency: the time between WAF forwarding a request to the origin server and receiving the response. This metric isolates the WAF-to-origin leg only.	Set based on your origin server's baseline response time. Use the P99 or average value.	High values indicate slowness on the back-to-origin network or the origin server itself. Check origin server load, database performance, and back-to-origin network quality.
`ssl_handshake_time`	The time required for the TLS/SSL handshake between the client and WAF during HTTPS requests.	Set based on the observed baseline for your client pool.	Long handshake times may indicate client-side issues, certificate validation delays, or network latency on the client-to-WAF leg.
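The latency thresholds above depend on a measured baseline. As a minimal sketch of deriving one, the following computes a nearest-rank P99 from a list of `request_time_msec` samples and adds headroom. The function names, the sample values, and the `headroom` factor of 1.2 are illustrative assumptions, not values recommended by WAF.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def latency_threshold(samples_msec, pct=99, headroom=1.2):
    """Suggest an alert threshold: the chosen percentile plus headroom.

    headroom is a hypothetical safety factor; tune it to your SLA sensitivity.
    """
    return percentile(samples_msec, pct) * headroom


# Hypothetical request_time_msec samples collected during a baseline period.
baseline = [120, 135, 110, 150, 900, 140, 125, 130, 145, 115]
threshold = latency_threshold(baseline)
print(f"Suggested request_time_msec alert threshold: {threshold:.0f} ms")
```

With few samples, a single outlier dominates the P99, so a longer baseline period or the average value may give a steadier threshold. The same approach applies to `upstream_response_time` and `ssl_handshake_time`.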

Security action metrics

These metrics track requests intercepted by WAF protection rules. Monitor them to detect attack traffic and identify false positives.

Metric	Description	Recommended threshold	When the alert fires
`status:302 and block_action:tmd`	WAF returned a 302 redirect to trigger a CAPTCHA challenge. HTTP flood protection is active.	Set to 5%–10% of requests when your workloads first go live. Adjust based on the volume of traffic WAF is blocking.	Check whether the domain is under an HTTP flood attack. If it is, create or tighten custom HTTP flood protection rules. Also check for a spike in 4xx or 5xx codes, which may indicate server-side problems compounding the issue.
`status:200 and block_action:tmd`	WAF returned 200 but HTTP flood protection was triggered without a CAPTCHA challenge.	Set to 5%–10% of requests when your workloads first go live. Adjust based on the volume of traffic WAF is blocking.	Determine whether the domain is under an HTTP flood attack and refine your protection rules accordingly. Also check for a spike in 4xx or 5xx codes.
`status:200 and block_action:antifraud`	The request was blocked by data risk control.	Test the alert rule before applying it. There is no fixed recommended threshold.	If this alert fires frequently, contact the Alibaba Cloud R&D team to review and adjust the threshold.
`status:405`	The request was blocked by web application protection rules or HTTP ACL policy rules.	No fixed threshold. Monitor the rate trend over time.	Use the log analysis feature to examine the blocked request and the rule that matched it. Determine whether this is a false positive before adjusting any rules.
`status:444`	The request was blocked by custom HTTP flood protection rules.	No fixed threshold.	Determine whether the domain is under an HTTP flood attack. If the blocked requests are legitimate API calls rather than attack traffic, adjust the rule threshold or allow API calls on specified servers.
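The percentage thresholds in this table can be read as the share of requests that match a given `status` and `block_action` pair. The sketch below evaluates that share over a batch of log records, assuming each record is a dictionary with an integer `status` and a string `block_action`. The helper names and the record shape are hypothetical and do not represent a Log Service API.

```python
def match_ratio(records, status, block_action):
    """Return the fraction of records matching both status and block_action."""
    total = len(records)
    if total == 0:
        return 0.0
    hits = sum(
        1
        for r in records
        if r.get("status") == status and r.get("block_action") == block_action
    )
    return hits / total


def should_alert(records, status, block_action, threshold=0.05):
    """Return True when the matching share reaches the alert threshold."""
    return match_ratio(records, status, block_action) >= threshold


# Hypothetical batch: 8 CAPTCHA redirects out of 100 requests.
records = (
    [{"status": 302, "block_action": "tmd"}] * 8
    + [{"status": 200, "block_action": ""}] * 92
)
print(should_alert(records, 302, "tmd", threshold=0.05))
print(should_alert(records, 302, "tmd", threshold=0.10))
```

Starting at the lower end of the 5%–10% range produces more alerts while traffic patterns are still unknown. Raising the threshold later reduces noise once a baseline is established.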

Error codes

These status codes indicate problems with origin server availability or connectivity. Use them alongside `upstream_response_time` to distinguish WAF-layer issues from origin-layer issues.

Metric	Description	When the alert fires
`status:404`	The server cannot find the requested resource.	Query the source IP addresses that triggered the alert. A single IP address suggests a path traversal attack. Multiple IP addresses suggest the server may be misconfigured or files may be missing.
`status:499`	The client disconnected after the origin server failed to return data within the client's maximum wait time.	Check for slow responses on the origin server, including slow database queries. Also check whether an attack has exhausted origin server resources.
`status:500`	The origin server returned a 500 Internal Server Error.	Check the origin server's CPU and memory load and database status.
`status:502`	WAF received an invalid response from the origin server (502 Bad Gateway). This occurs when the back-to-origin network is degraded or when the origin server's access control policies block WAF's back-to-origin IP address.	Check the back-to-origin network quality and the origin server's access control policies. Verify that the origin server has not blocked WAF's back-to-origin IP address. Also check origin server load and database status.
`status:503`	The service is unavailable due to origin server overload or maintenance.	Check for exceptions on the origin server.
`status:504`	WAF did not receive a timely response from the upstream server (504 Gateway Timeout).	Possible causes: the origin server is overloaded and cannot respond in time; the origin server discarded the request without resetting the connection; or protocol-level communication failed between WAF and the origin server.
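The `status:404` guidance above depends on whether the errors come from one source IP address or many. The following sketch shows one way to make that call from a batch of log records. The `real_client_ip` field name and the `dominance` cutoff of 0.8 are assumptions for illustration; substitute the client IP field and cutoff that fit your logs.

```python
from collections import Counter


def classify_404_sources(records, dominance=0.8):
    """Classify 404 responses as single-source or distributed.

    dominance is a hypothetical cutoff: the share of 404s one IP address
    must account for before the batch is treated as single-source.
    """
    ips = [r["real_client_ip"] for r in records if r.get("status") == 404]
    if not ips:
        return "no-404s"
    top_ip, top_count = Counter(ips).most_common(1)[0]
    if top_count / len(ips) >= dominance:
        # One client dominates: investigate it as a possible attack source.
        return f"single-source:{top_ip}"
    # Many clients hit missing resources: check server configuration and files.
    return "distributed"


# Hypothetical batch using documentation-reserved IP addresses.
suspicious = (
    [{"status": 404, "real_client_ip": "203.0.113.7"}] * 9
    + [{"status": 404, "real_client_ip": "198.51.100.2"}]
)
print(classify_404_sources(suspicious))
```

A "distributed" result points toward the misconfiguration or missing-file checks described in the table, while a "single-source" result gives you the IP address to examine first.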