Alibaba Cloud Service Mesh: Use graceful shutdown to prevent traffic loss

Last Updated: Dec 05, 2025

When you scale in or perform a rolling restart on a Service Mesh (ASM) gateway, gateway pods are deleted. This can cause a small amount of traffic loss. You can enable the graceful shutdown feature to prevent this loss. When graceful shutdown is enabled, existing connections continue to transfer data for a period of time while the gateway pods are being deleted. This topic describes how to use the graceful shutdown feature.

Scope

Step 1: Enable graceful shutdown

Enable the feature for an existing gateway

In ASM 1.26 and later, changing the graceful shutdown configuration causes the gateway to restart. Perform this operation during off-peak hours.

Console

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose ASM Gateways > Ingress Gateway.

  3. On the Ingress Gateway page, click the name of the target gateway.

  4. On the Gateway overview page, click Advanced Options, and then click the Edit icon next to Graceful Shutdown. Select the Graceful Shutdown checkbox, set Connection Timeout (Seconds), and then click Submit.

YAML configuration (for ASM versions below 1.26)

Add the required annotations to the gateway YAML file under the serviceAnnotations field.

apiVersion: istio.alibabacloud.com/v1
kind: IstioGateway
metadata:
  name: ingressgateway
  namespace: istio-system
spec:
  gatewayType: ingress
  serviceAnnotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: 'on'          # Enable connection draining (graceful shutdown) on the load balancer.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: '10'  # Connection draining timeout, in seconds. Valid values: 10 to 30.
...

YAML configuration (for ASM versions 1.26 and later)

Add the required annotation to the gateway YAML file under the annotations field.

apiVersion: istio.alibabacloud.com/v1
kind: IstioGateway
metadata:
  annotations:
    # For Classic Load Balancer (CLB) and Network Load Balancer (NLB) gateways, the valid values are 10 to 890.
    # For ClusterIP and NodePort gateways, there is no upper limit.
    asm.alibabacloud.com/gateway-drain-timeout-seconds: "30"
  name: ingressgateway
  namespace: istio-system
...
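
The value ranges stated in the two YAML examples can be checked before you edit the gateway YAML. The following is a minimal sketch, not an ASM tool: valid_drain_timeout is a hypothetical helper name, and the lower bound of 1 second for ClusterIP and NodePort gateways is an assumption, because this topic states only that those gateway types have no upper limit.

```shell
# Hypothetical helper (not part of ASM): checks a drain timeout against the
# ranges stated in this topic.
# Usage: valid_drain_timeout <asm_version> <gateway_type> <seconds>
valid_drain_timeout() {
  local version type seconds major rest minor
  version=$1; type=$2; seconds=$3

  # The timeout must be a plain non-negative integer.
  case $seconds in
    ''|*[!0-9]*) return 1 ;;
  esac

  # Split "1.26" or "1.26.3" into major and minor parts.
  case $version in
    *.*) ;;
    *) return 1 ;;
  esac
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  case $major in ''|*[!0-9]*) return 1 ;; esac
  case $minor in ''|*[!0-9]*) return 1 ;; esac

  if [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 26 ]; }; then
    # Below 1.26: the load balancer annotation accepts 10 to 30.
    if [ "$seconds" -ge 10 ] && [ "$seconds" -le 30 ]; then
      return 0
    fi
    return 1
  fi

  case $type in
    CLB|NLB) [ "$seconds" -ge 10 ] && [ "$seconds" -le 890 ] ;;
    # No upper limit is documented; requiring at least 1 is an assumption.
    ClusterIP|NodePort) [ "$seconds" -ge 1 ] ;;
    *) return 1 ;;
  esac
}
```

For example, valid_drain_timeout 1.26 CLB 30 succeeds, while valid_drain_timeout 1.25 CLB 60 fails because versions below 1.26 accept at most 30.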

Enable the feature when you create a gateway

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose ASM Gateways > Ingress Gateway.

  3. On the Ingress Gateway page, click Create.

  4. On the Create page, select a Deployment Cluster, set CLB Type to Public Network Access, select a load balancer specification under New CLB Instance, and set Number Of Gateway Replicas to 10. Keep the default values for the other configuration items.

    For more information about the configuration items, see Create an ingress gateway.

  5. Click Advanced Options, select the Graceful Shutdown checkbox, set Connection Timeout (Seconds), and then click Create.

    Configuration items:

    Graceful Shutdown: If you select this option, the Classic Load Balancer (CLB) instance smoothly drains existing connections during a rolling restart of the gateway pods. This minimizes the impact on your services and better supports scenarios such as configuration changes and gateway upgrades.

    Connection Timeout (Seconds): After the CLB instance removes a gateway pod, it waits for the configured connection timeout period before it disconnects from the pod. This parameter provides a buffer for the gateway pod to process existing connections. The default graceful shutdown time for a gateway pod is 30 seconds. The timeout period that you configure for the CLB instance should not exceed 30 seconds. Starting from version 1.26, you can set the timeout period to a maximum of 890 seconds.

Step 2: Deploy a sample application

  1. Connect to the ACK cluster by using kubectl. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

  2. Create an httpbin.yaml file with the following content.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: httpbin
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: httpbin
      labels:
        app: httpbin
        service: httpbin
    spec:
      ports:
      - name: http
        port: 8000
        targetPort: 80
      selector:
        app: httpbin
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpbin
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpbin
          version: v1
      template:
        metadata:
          labels:
            app: httpbin
            version: v1
        spec:
          serviceAccountName: httpbin
          containers:
          - image: docker.io/kennethreitz/httpbin
            imagePullPolicy: IfNotPresent
            name: httpbin
            ports:
            - containerPort: 80

  3. Deploy the httpbin application.

    kubectl apply -f httpbin.yaml -n default
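
Before you continue, you may want to confirm that the Deployment has finished rolling out. The following sketch wraps kubectl rollout status. The function name wait_for_httpbin is hypothetical, and the 120s default timeout is an arbitrary assumption rather than a recommended value.

```shell
# Hypothetical helper: blocks until the httpbin Deployment has rolled out,
# or fails when the timeout expires.
# Usage: wait_for_httpbin [namespace] [timeout]
wait_for_httpbin() {
  local ns timeout
  ns=${1:-default}       # namespace used by the kubectl apply command above
  timeout=${2:-120s}     # assumed default; adjust as needed
  kubectl rollout status deployment/httpbin -n "$ns" --timeout="$timeout"
}
```

Calling wait_for_httpbin with no arguments checks the default namespace. The function returns the exit status of kubectl, so it can gate later steps in a script.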

Step 3: Create a virtual service and a gateway rule

  1. Create a virtual service.

    1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

    2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > VirtualService. On the page that appears, click Create from YAML.

    3. On the Create page, select a Namespace and a Scenario Template, enter the following YAML configuration, and then click Create.

      apiVersion: networking.istio.io/v1beta1
      kind: VirtualService
      metadata:
        name: httpbin
        namespace: default
      spec:
        gateways:
          - httpbin-gateway
        hosts:
          - '*'
        http:
          - route:
              - destination:
                  host: httpbin
                  port:
                    number: 8000

  2. Create a gateway rule.

    1. On the details page of the ASM instance, choose ASM Gateways > Gateway in the left-side navigation pane. On the page that appears, click Create from YAML.

    2. On the Create page, select a Namespace and a Scenario Template, enter the following YAML configuration, and then click Create.

      apiVersion: networking.istio.io/v1beta1
      kind: Gateway
      metadata:
        name: httpbin-gateway
        namespace: default
      spec:
        selector:
          istio: ingressgateway
        servers:
          - hosts:
              - '*'
            port:
              name: http
              number: 80
              protocol: HTTP

  3. Verify that the routing is configured successfully.

    1. Obtain the ASM gateway address. For more information, see Create an ingress gateway.

    2. In the address bar of your browser, enter http://<ASM gateway address>.

      If the httpbin page appears, the routing is configured successfully.
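
The browser check can also be scripted. The following is a minimal sketch that assumes curl is installed. The function name check_route is hypothetical, and the 5-second limit is an arbitrary choice.

```shell
# Hypothetical helper: returns 0 if the gateway answers GET / with HTTP 200.
# Usage: check_route <ASM gateway address>
check_route() {
  local addr code
  addr=$1
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://${addr}/")
  echo "HTTP status: ${code}"
  [ "$code" = "200" ]
}
```

Pass the ASM gateway address as the only argument. A status other than 200 means that the request did not reach httpbin through the gateway, so the function returns a nonzero exit status.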

Step 4: Verify the effect of graceful shutdown

  1. Download and install a version of the lightweight stress testing tool hey that is compatible with your operating system. For more information, see hey.

  2. Scale in the ASM gateway.

    1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

    2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose ASM Gateways > Ingress Gateway.

    3. On the Ingress Gateway page, click Edit YAML to the right of the target gateway.

    4. In the Edit dialog box, set the value of the replicaCount parameter to 1, and then click OK.

  3. While the gateway is scaling in, run the following command to send 50,000 requests to the httpbin application at a concurrency of 200. Run the test once with graceful shutdown disabled and once with it enabled, and then compare the traffic loss.

    hey -c 200 -n 50000 -disable-keepalive http://<ASM gateway address>/

    Result analysis:

    Graceful shutdown disabled

    The following output is returned:

    Status code distribution:
      [200] 49747 responses
    
    Error distribution:
      [253] Get "http://47.55.2xx.xx": dial tcp 47.55.2xx.xx:80: connect: connection refused

    Of the 50,000 requests, 49,747 return a status code of 200 and the remaining 253 fail with a connection refused error. About 0.5% of the traffic is lost.

    Graceful shutdown enabled

    The following output is returned:

    ............
    Status code distribution:
      [200] 50000 responses

    All 50,000 requests return a status code of 200, so no traffic is lost.
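
To compare the two runs without reading the output by eye, the two distribution sections can be totaled with awk. The following is a minimal sketch that assumes the hey output has the same shape as the samples in this step. The function name summarize_hey is hypothetical.

```shell
# Hypothetical helper: reads hey output on stdin and prints
# "<successful> <failed> <loss percent>".
summarize_hey() {
  awk '
    /^[ \t]*Status code distribution:/ { section = "status"; next }
    /^[ \t]*Error distribution:/       { section = "error";  next }
    # Matches lines such as "[200] 49747 responses" or "[253] Get ...".
    /^[ \t]*\[[0-9]+\]/ {
      n = $0
      sub(/^[ \t]*\[/, "", n)
      sub(/\].*/, "", n)
      if (section == "status") {
        # The bracket holds the status code and field 2 holds the count.
        if (n == "200") ok += $2; else failed += $2
      } else if (section == "error") {
        # The bracket holds the number of requests that hit this error.
        failed += n
      }
    }
    END {
      total = ok + failed
      if (total == 0) { print "no results found"; exit 1 }
      printf "%d %d %.2f\n", ok, failed, failed * 100 / total
    }
  '
}
```

Pipe the load test into the function, for example by appending | summarize_hey to the hey command. For the sample output with graceful shutdown disabled, the function prints 49747 253 0.51.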