Fault injection introduces deliberate failures into a service mesh to improve fault tolerance, discover client-side bugs, and identify potential faults. Unlike network-layer chaos testing (dropping packets or killing pods), fault injection works at the application layer through Envoy proxies, targeting specific failures like HTTP delays or error codes.
Service Mesh (ASM) supports fault injection through VirtualService resources. The following example injects a delay fault into the HTTPBin service and verifies the result.
Fault injection types
ASM supports two fault types, both configured in a VirtualService:
| Type | What it simulates | Use case |
|---|---|---|
| Delay | Increased network latency or an overloaded upstream service | Test timeout handling and retry logic |
| Abort | Upstream service failure (HTTP error codes) | Test error handling and fallback behavior |
The following example demonstrates delay fault injection. For the full VirtualService fault injection API, see Manage virtual services.
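For contrast with the delay walkthrough that follows, here is a minimal sketch of the abort type from the table above. It uses the standard Istio VirtualService fields fault.abort.httpStatus and fault.abort.percentage. The resource name httpbin-vs and the namespace mirror the delay example later on this page, and the 503 status code is only an illustrative choice, not a value taken from the ASM documentation.

```yaml
# Sketch only: return HTTP 503 for every request to httpbin instead of forwarding it.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: httpbin-vs          # assumed name, same as the delay example
  namespace: default
spec:
  hosts:
  - httpbin
  http:
  - fault:
      abort:
        httpStatus: 503     # illustrative status code returned to the caller
        percentage:
          value: 100        # abort all requests
    route:
    - destination:
        host: httpbin
```

With a rule like this in place, a request such as curl -I httpbin:8000 from the sleep pod would be expected to receive the configured error status from the proxy rather than a response from HTTPBin.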
Prerequisites
Complete the preparations and deploy the HTTPBin and sleep services. For more information, see Preparations.
Step 1: Verify that the services are running
Before injecting faults, confirm that the HTTPBin service responds normally.
Use kubectl to connect to your Container Service for Kubernetes (ACK) cluster based on the information in the kubeconfig file, and open a shell in the sleep pod:
    kubectl exec -it deploy/sleep -- sh
Send a request to the HTTPBin service:
    curl -I httpbin:8000
Expected output:
    HTTP/1.1 200 OK
    server: envoy
    date: Fri, 11 Aug 2023 09:50:24 GMT
    content-type: text/html; charset=utf-8
    content-length: 9593
    access-control-allow-origin: *
    access-control-allow-credentials: true
    x-envoy-upstream-service-time: 3
A 200 OK response with x-envoy-upstream-service-time: 3 confirms that the service responds in about 3 milliseconds with no artificial delay.
Step 2: Inject a delay fault
Create a VirtualService that adds a 5-second delay to all requests to the HTTPBin service.
Apply the following YAML through the ASM console or kubectl. For detailed steps, see Manage virtual services.
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: httpbin-vs
      namespace: default
    spec:
      hosts:
      - httpbin
      http:
      - fault:
          delay:
            fixedDelay: 5s
            percentage:
              value: 100
        route:
        - destination:
            host: httpbin
Key fields:
| Field | Description |
|---|---|
| fault.delay.fixedDelay | Duration of the injected delay. Set to 5s in this example. |
| fault.delay.percentage.value | Percentage of requests affected. 100 means all requests are delayed. |
| hosts | Target service. Requests to httpbin are matched. |
| route.destination.host | Upstream service that receives the request after the delay. |
Step 3: Verify the delay fault
After applying the VirtualService, confirm that requests to HTTPBin are now delayed by 5 seconds.
Open a shell in the sleep pod:
    kubectl exec -it deploy/sleep -- sh
Send a request and measure the total response time:
    curl -w "Total time: %{time_total} seconds\n" -I httpbin:8000
Expected output:
    HTTP/1.1 200 OK
    server: istio-envoy
    date: Sun, 27 Aug 2023 12:41:05 GMT
    content-type: text/html; charset=utf-8
    content-length: 9593
    access-control-allow-origin: *
    access-control-allow-credentials: true
    x-envoy-upstream-service-time: 3

    Total time: 5.008333 seconds
Interpret the results
Compare the output from Step 1 and Step 3:
| Metric | Before fault injection | After fault injection |
|---|---|---|
| Response time | ~3 ms (x-envoy-upstream-service-time) | ~5 seconds (curl total time) |
| Server header | envoy | istio-envoy |
| HTTP status | 200 OK | 200 OK |
The response still succeeds with 200 OK, but the total time increased from milliseconds to about 5 seconds, matching the fixedDelay: 5s configuration. The server header changed to istio-envoy, indicating that the ASM sidecar proxy is actively processing the request and injecting the delay.
Delay fault injection is especially useful for discovering timeout mismatches across a microservice call chain. If a calling service has a hard-coded timeout shorter than the injected delay, the test exposes that bug before it causes production incidents.
What's next
- Adjust the percentage.value field to inject delays on a subset of traffic (for example, set it to 50 to delay 50% of requests).
- Replace the delay block with an abort block to return HTTP error codes instead of delays. For the API specification, see Manage virtual services.
- Add header-based match conditions to scope fault injection to specific users or traffic subsets, keeping other traffic unaffected.
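As a rough illustration of the partial-percentage and header-based scoping ideas above, the following sketch delays about half of the requests that carry a test header and routes all other traffic normally. The header name x-fault-test and its value are hypothetical and chosen only for this example. The match, fault, and route fields are standard Istio VirtualService fields, and the resource name is assumed to match the earlier example.

```yaml
# Sketch only: scope the delay to requests that carry a hypothetical test header.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: httpbin-vs            # assumed name, same as the earlier example
  namespace: default
spec:
  hosts:
  - httpbin
  http:
  # First rule: applies only to requests with the header x-fault-test: true
  - match:
    - headers:
        x-fault-test:         # hypothetical header name
          exact: "true"
    fault:
      delay:
        fixedDelay: 5s
        percentage:
          value: 50           # delay roughly half of the matching requests
    route:
    - destination:
        host: httpbin
  # Fallback rule: all other traffic is routed without any fault
  - route:
    - destination:
        host: httpbin
```

Under these assumptions, a request sent with curl -H "x-fault-test: true" -I httpbin:8000 would sometimes take about 5 seconds, while requests without the header should return at normal speed. Rules are evaluated in order, so the unconditional route is placed last.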