HTTP 503 errors in Service Mesh (ASM) typically result from connection lifecycle mismatches, configuration changes, or traffic interception issues. This guide covers each scenario with its root cause, diagnostic signals, and solution.
Identify the root cause
Check the Envoy access logs of the affected pod to find the response flag. The flag indicates why the request failed:
```shell
kubectl logs <pod-name> -c istio-proxy -n <namespace>
```

Match the response flag to a scenario:
| Response flag | Symptom | Scenario |
|---|---|---|
| UC | Intermittent 503 under normal traffic, no config changes | Idle connection timeout mismatch |
| UC | Brief 503 spike immediately after changing custom metrics | Metric customization config change |
| N/A | Persistent 503 after enabling mTLS, health checks fail | Health check failure with mTLS |
| N/A | All requests to a specific service return 503 | Application listening on localhost |
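If the pod serves a lot of traffic, it can help to filter the access log for 503 responses first. The following is only a sketch: it assumes the default text access log format, in which the response flag (for example, UC) appears immediately after the status code. Adjust the pattern if you use a custom access log format.

```shell
# Show only 503 responses from the sidecar access log; with the default text
# access log format the response flag (for example UC) follows the status code.
kubectl logs <pod-name> -c istio-proxy -n <namespace> | grep ' 503 '
```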
Intermittent 503 scenarios
Brief 503 spike after a metric customization config change
A small number of requests return HTTP 503 immediately after you update custom metric configurations.
Root cause
The metric customization feature generates an Envoy filter that updates the istio.stats configuration. This update is delivered through the Listener Discovery Service (LDS), which modifies the Envoy Listener. When the listener configuration changes, existing connections are terminated and any in-transit requests on those connections receive a 503 response.
The 503 is not sent by the upstream server. The client-side sidecar proxy generates it in response to the upstream connection reset.
Why the default retry policy does not help
The default sidecar proxy retry policy covers these conditions:
"retry_policy": {
"retry_on": "connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes",
"num_retries": 2,
"retry_host_predicate": [
{
"name": "envoy.retry_host_predicates.previous_hosts"
}
],
"host_selection_retry_max_attempts": "5",
"retriable_status_codes": [
503
]
}| Condition | Trigger |
|---|---|
connect-failure | Connection failure (connect timeout) |
refused-stream | HTTP/2 REFUSED_STREAM error |
unavailable | gRPC unavailable status |
cancelled | gRPC cancelled status |
retriable-status-codes | Response status code matches a code in retriable_status_codes (503 by default) |
The reset condition -- which covers upstream disconnects and connection resets -- is not included. That is the condition this scenario triggers.
Solution: add reset to the retry policy
Add reset (and optionally 503) to the retryOn field in a VirtualService for the affected service.
The following example configures retries for the Ratings service:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-route
spec:
  hosts:
  - ratings.prod.svc.cluster.local
  http:
  - route:
    - destination:
        host: ratings.prod.svc.cluster.local
        subset: v1
    retries:
      attempts: 2
      retryOn: connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes,reset,503
```

Replace ratings.prod.svc.cluster.local and subset v1 with the host and subset of your target service.
For the full list of Envoy retry conditions, see:
- Router - x-envoy-retry-on (HTTP/2 and HTTP/3 retry conditions)
- x-envoy-retry-grpc-on (gRPC-specific retry conditions)
Intermittent 503 from idle connection timeout mismatch
HTTP 503 errors appear intermittently without any configuration changes, often increasing under higher traffic. The Envoy access log shows response flag UC (Upstream Connection termination). This typically affects inbound sidecar proxy traffic.
Root cause
The sidecar proxy and the application have different idle connection timeout values. The default idle connection timeout for the sidecar proxy is 1 hour.
When the proxy timeout is longer than the application timeout:
The application closes the idle connection first, but the sidecar proxy still considers the connection active. If a new request arrives on that connection, the proxy forwards it to a closed connection and returns HTTP 503 (response_flags=UC).

When the proxy timeout is shorter than the application timeout:
The proxy closes the connection first and creates a new one for the next request. No 503 error occurs in this case.

Solution 1: set idleTimeout in a DestinationRule
Align the idle timeout by setting idleTimeout in a DestinationRule. This setting applies to both inbound and outbound sidecar proxy traffic. It also works when the client does not have a sidecar proxy.
Set idleTimeout to a value slightly shorter than the application's idle timeout. A value that is too short increases the total number of connections.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: <your-service-idle-timeout>
spec:
  host: <your-service-host>
  trafficPolicy:
    connectionPool:
      tcp:
        idleTimeout: 30m
```

Replace <your-service-idle-timeout> and <your-service-host> with your service name and host. Adjust 30m based on your application's idle timeout, keeping it slightly shorter than the application's value.
Solution 2: configure retries in a VirtualService
A retry triggers a new connection, which resolves the stale-connection problem. Follow the same retry configuration as in Brief 503 spike after a metric customization config change.
Retries of non-idempotent requests (such as POST) are risky because they can cause duplicate operations. Evaluate carefully before enabling retries for these request types.
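For illustration, the sketch below limits retries to GET requests so that non-idempotent methods fall through to a route without a retry policy. The host name and resource name are placeholders; adapt them to your service.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-retry-get-only       # placeholder name
spec:
  hosts:
  - reviews.prod.svc.cluster.local   # placeholder host
  http:
  # Retry only idempotent GET requests.
  - match:
    - method:
        exact: GET
    route:
    - destination:
        host: reviews.prod.svc.cluster.local
    retries:
      attempts: 2
      retryOn: connect-failure,refused-stream,unavailable,cancelled,retriable-status-codes,reset,503
  # All other methods use the default route with no retries.
  - route:
    - destination:
        host: reviews.prod.svc.cluster.local
```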
Intermittent 503 during pod restarts due to sidecar lifecycle misconfiguration
HTTP 503 errors appear briefly each time pods restart.
Root cause
The sidecar proxy container lifecycle is misconfigured. The proxy may shut down before the application finishes draining connections, or begin receiving traffic before the application is ready.
Solution
Configure the sidecar proxy container lifecycle to align with your application's startup and shutdown sequence. For details, see Sidecar proxy lifecycle.
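As a rough sketch of what this alignment can look like on Istio-compatible sidecars, the pod annotation below delays application startup until the proxy is ready and lets the proxy keep draining connections during shutdown. The workload name, image, and drain duration are placeholders; check which of these proxy settings your ASM version supports.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                     # placeholder workload name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Start the application container only after the sidecar proxy is
        # ready, and let the proxy drain connections for 30s on shutdown.
        proxy.istio.io/config: |
          holdApplicationUntilProxyStarts: true
          terminationDrainDuration: 30s
    spec:
      containers:
      - name: my-app
        image: my-app:latest       # placeholder image
```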
Persistent 503 scenarios
Application listening on localhost
All requests from other pods to a specific application return HTTP 503.
Root cause
The application binds to localhost (127.0.0.1) instead of 0.0.0.0. The sidecar proxy forwards traffic to the application's port, but the application rejects connections from non-loopback addresses.
Solution
Bind the application to 0.0.0.0 so that the sidecar proxy and other pods can reach it. For details, see Expose a cluster application that listens on localhost to other pods.
Health check failure after enabling mTLS
After sidecar injection, pod health checks (liveness and readiness probes) consistently fail, and an HTTP 503 status code is reported.
Root cause
When mutual TLS (mTLS) is enabled in ASM, the sidecar proxy intercepts all incoming traffic to the pod, including kubelet health check requests. Because kubelet lacks an Istio-issued TLS certificate, it cannot complete the mTLS handshake and every health check fails.
Solution
Exclude the health check port from sidecar traffic interception so that kubelet can reach the application directly. For details, see Why is no valid health check information displayed after sidecar injection?
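On Istio-compatible sidecars, one common way to do this is the traffic.sidecar.istio.io/excludeInboundPorts pod annotation, sketched below with placeholder names and a placeholder probe port; consult the linked article for the ASM-specific procedure.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                     # placeholder workload name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # kubelet probes on port 8080 bypass sidecar interception and reach
        # the application directly, so they are not subject to mTLS.
        traffic.sidecar.istio.io/excludeInboundPorts: "8080"
    spec:
      containers:
      - name: my-app
        image: my-app:latest       # placeholder image
        readinessProbe:
          httpGet:
            path: /healthz         # placeholder probe path
            port: 8080
```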