
Alibaba Cloud Service Mesh: Configure fault injection

Last Updated: Mar 11, 2026

Fault injection deliberately introduces failures into a service mesh so that you can test fault tolerance, discover client-side bugs, and identify potential faults. Unlike infrastructure-level chaos testing (such as dropping packets or killing pods), fault injection works at the application layer through Envoy proxies and targets specific failures such as HTTP delays or error codes.

Service Mesh (ASM) supports fault injection through VirtualService resources. The following example injects a delay fault into the HTTPBin service and verifies the result.

Fault injection types

ASM supports two fault types, both configured in a VirtualService:

| Type | What it simulates | Use case |
| --- | --- | --- |
| Delay | Increased network latency or an overloaded upstream service | Test timeout handling and retry logic |
| Abort | Upstream service failure (HTTP error codes) | Test error handling and fallback behavior |
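Both fault types are set under the `fault` field of an HTTP route in a VirtualService. The following fragment is a sketch of how each type is expressed, not a complete resource; the status code and percentages are illustrative values only. In the Istio VirtualService API, delay and abort faults are evaluated independently, even when both are set on the same route. The walkthrough in this topic uses only the delay block.

```yaml
# Fragment of one entry under spec.http in a VirtualService (not a complete resource).
fault:
  # Delay: hold matching requests for a fixed time before forwarding them upstream.
  delay:
    fixedDelay: 5s
    percentage:
      value: 100
  # Abort: return an HTTP error code to the caller instead of forwarding the request.
  abort:
    httpStatus: 500
    percentage:
      value: 10
```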

The following example demonstrates delay fault injection. For the full VirtualService fault injection API, see Manage virtual services.

Prerequisites

Complete the preparations and deploy the HTTPBin and sleep services. For more information, see Preparations.

Step 1: Verify that the services are running

Before injecting faults, confirm that the HTTPBin service responds normally.

  1. Use kubectl with the cluster's kubeconfig file to connect to your Container Service for Kubernetes (ACK) cluster, and then open a shell in the sleep pod:

       kubectl exec -it deploy/sleep -- sh
  2. Send a request to the HTTPBin service:

       curl -I httpbin:8000

    Expected output:

       HTTP/1.1 200 OK
       server: envoy
       date: Fri, 11 Aug 2023 09:50:24 GMT
       content-type: text/html; charset=utf-8
       content-length: 9593
       access-control-allow-origin: *
       access-control-allow-credentials: true
       x-envoy-upstream-service-time: 3

    A 200 OK response with x-envoy-upstream-service-time: 3 confirms that the service responds in about 3 milliseconds with no artificial delay.

Step 2: Inject a delay fault

Create a VirtualService that adds a 5-second delay to all requests to the HTTPBin service.

Apply the following YAML through the ASM console or kubectl. For detailed steps, see Manage virtual services.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: httpbin-vs
  namespace: default
spec:
  hosts:
    - httpbin
  http:
    - fault:
        delay:
          fixedDelay: 5s
          percentage:
            value: 100
      route:
        - destination:
            host: httpbin

Key fields:

| Field | Description |
| --- | --- |
| fault.delay.fixedDelay | Duration of the injected delay. Set to 5s in this example. |
| fault.delay.percentage.value | Percentage of requests affected. 100 means all requests are delayed. |
| hosts | Target service. Requests addressed to httpbin match this VirtualService. |
| route.destination.host | Upstream service that receives the request after the delay. |
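In the Istio API, `percentage.value` is a decimal number in the range 0.0 to 100.0, so it can also target a very small share of traffic. The following fragment sketches the same delay applied to roughly one request in a thousand; 0.1 is an illustrative value, and the selection is probabilistic, so the actual share varies over small samples.

```yaml
# Fragment of the http route entry in httpbin-vs: delay only about 0.1% of requests.
fault:
  delay:
    fixedDelay: 5s
    percentage:
      value: 0.1   # on average, about 1 in 1,000 requests is delayed
```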

Step 3: Verify the delay fault

After applying the VirtualService, confirm that requests to HTTPBin are now delayed by 5 seconds.

  1. Open a shell in the sleep pod:

       kubectl exec -it deploy/sleep -- sh
  2. Send a request and measure the total response time:

       curl -w "Total time: %{time_total} seconds\n" -I httpbin:8000

    Expected output:

       HTTP/1.1 200 OK
       server: istio-envoy
       date: Sun, 27 Aug 2023 12:41:05 GMT
       content-type: text/html; charset=utf-8
       content-length: 9593
       access-control-allow-origin: *
       access-control-allow-credentials: true
       x-envoy-upstream-service-time: 3
    
       Total time: 5.008333 seconds

Interpret the results

Compare the output from Step 1 and Step 3:

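| Metric | Before fault injection (Step 1) | After fault injection (Step 3) |
| --- | --- | --- |
| HTTP status | 200 OK | 200 OK |
| x-envoy-upstream-service-time | 3 ms | 3 ms |
| Total response time | About 3 ms (not measured directly in Step 1) | About 5 seconds (5.008333 seconds) |
| server header | envoy | istio-envoy |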

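The response still succeeds with 200 OK, but the total time increased to about 5 seconds, matching the fixedDelay: 5s configuration. The x-envoy-upstream-service-time value is 3 ms in both captures, which suggests that HTTPBin itself handles the request as quickly as before and that the extra 5 seconds is added by the sidecar proxy before the request is forwarded. The server header also differs between the two captures (envoy versus istio-envoy). Both values identify an Envoy-based proxy, so the header is not what shows that the fault was injected; the increase in total response time is.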

Delay fault injection is especially useful for discovering timeout mismatches across a microservice call chain. If a calling service has a hard-coded timeout that is shorter than the injected delay, its requests time out during the test, which exposes the bug before it causes production incidents.

What's next

  • Adjust the percentage.value field to inject delays on a subset of traffic (for example, set it to 50 to delay 50% of requests).

  • Replace the delay block with an abort block to return HTTP error codes instead of delays. For the API specification, see Manage virtual services.

  • Add header-based match conditions to scope fault injection to specific users or traffic subsets, keeping other traffic unaffected.
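The three suggestions above can be combined in a single VirtualService. The following sketch, which would replace httpbin-vs, returns HTTP 503 for about 50% of the requests that carry a header named end-user with the value test-user. The header name and value are hypothetical placeholders, not names that ASM defines. Route rules are evaluated in order and the first match wins, so the second rule sends all other traffic to HTTPBin with no fault.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: httpbin-vs
  namespace: default
spec:
  hosts:
    - httpbin
  http:
    # Rule 1: only requests with the hypothetical header "end-user: test-user".
    - match:
        - headers:
            end-user:
              exact: test-user
      fault:
        abort:
          httpStatus: 503
          percentage:
            value: 50   # about half of the matched requests receive 503
      route:
        - destination:
            host: httpbin
    # Rule 2: all other requests are routed normally, with no fault.
    - route:
        - destination:
            host: httpbin
```

To check the result, send several requests from the sleep pod with and without the header (for example, curl -I -H "end-user: test-user" httpbin:8000) and compare the returned status codes.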