Container Service for Kubernetes: Zero-downtime deployments: Rolling updates and graceful shutdown

Last Updated: Apr 02, 2026

To update applications in a Container Service for Kubernetes (ACK) cluster without service interruptions, you can configure a Deployment with a readiness probe, readinessGates, a preStop hook, and Server Load Balancer (SLB) connection draining. This configuration ensures smooth traffic migration and continuous high availability.

How it works

To ensure high availability during service upgrades, you can use the Rolling Update strategy for stateless applications (Deployments). This strategy replaces old Pods with new ones incrementally, so that Pods remain available to serve incoming traffic throughout the update. The core process is divided into the following phases:

  1. Startup phase: First, a new version (v2) of the Pod is created. Kubernetes waits for the new Pod to pass its readiness probe, confirming it can process requests. Until then, the Pod does not receive any traffic from the Service.

  2. Traffic shifting phase: With readinessGates enabled, a new Pod must first pass its readiness probe. Its IP address is then registered with the Endpoints of the associated Service and synchronized to the backend server group of the load balancer (SLB), after which the Pod starts receiving traffic. The system then sends a termination signal to the old version (v1) Pod and removes its IP address from the Endpoints so that it no longer receives new requests.

    For more information, see How readinessGates works.
  3. Graceful shutdown phase: Before an old Pod is deleted, it executes a predefined preStop hook and uses the termination grace period (terminationGracePeriodSeconds) to finish processing established connections, while the SLB performs connection draining for in-flight requests. This process ensures that all in-progress requests are completed, which achieves a zero-downtime rolling update.
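
The timing constraint in the graceful shutdown phase can be checked with simple arithmetic. The following shell sketch is illustrative only and is not part of ACK or kubectl: the function name check_shutdown_budget and the 10-second cleanup estimate are assumptions, while the 60-second grace period, the 30-second preStop delay, and the 30-second drain timeout (valid range: 10 to 900) match the sample application in this topic.

```shell
#!/bin/sh
# Hypothetical helper: checks that a graceful-shutdown timing budget is consistent.
# Arguments (seconds): grace period, preStop duration, estimated app cleanup time, SLB drain timeout.
check_shutdown_budget() {
  grace=$1; prestop=$2; cleanup=$3; drain=$4

  # terminationGracePeriodSeconds must exceed preStop time plus cleanup time;
  # otherwise the container is killed with SIGKILL before it finishes.
  if [ "$grace" -le $((prestop + cleanup)) ]; then
    echo "FAIL: grace period ${grace}s <= preStop ${prestop}s + cleanup ${cleanup}s"
    return 1
  fi

  # The connection drain timeout annotation accepts 10 to 900 seconds.
  if [ "$drain" -lt 10 ] || [ "$drain" -gt 900 ]; then
    echo "FAIL: drain timeout ${drain}s is outside 10-900"
    return 1
  fi

  echo "OK: $((grace - prestop - cleanup))s of headroom"
}

# Values from the sample application, plus an assumed 10s of cleanup time.
check_shutdown_budget 60 30 10 30
```

With these inputs the check prints `OK: 20s of headroom`. If you lengthen the preStop delay, raise terminationGracePeriodSeconds accordingly so that the headroom stays positive.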

Prerequisites

Deploy a sample application

The following example shows how to deploy a stateless NGINX application.

Console

  1. On the ACK Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Deployments.

  2. On the Deployments page, click Create from YAML. Copy the following content to the template editor and click Create.

    Sample application YAML

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-demo
    spec:
      replicas: 1                 # Set to 2 or more for production HA. Set to 1 for demonstration purposes.
      selector:
        matchLabels:
          app: nginx-demo
      # Rolling update strategy: ensures service availability during updates.
      # strategy:
        # type: RollingUpdate     # Default strategy for Deployments.
        # rollingUpdate:
          # maxUnavailable: "25%" # Default. Max 25% of Pods can be unavailable during the update.
          # maxSurge: "25%"       # Default. Max 25% extra Pods can be created above the desired replica count.
      template:
        metadata:
          labels:
            app: nginx-demo 
        spec:
          # Pod-level graceful shutdown limit. Must be greater than the sum of preStop execution and app cleanup time.
          terminationGracePeriodSeconds: 60 
          readinessGates:
          - conditionType: service.readiness.alibabacloud.com/nginx-demo-service # Set the Readiness Gate for the nginx-demo-service Service.
          containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
              limits:
                cpu: 500m
            # --- Health check probes ---
            # startup probe: Ensures the application in the container has started.
            startupProbe:
              httpGet:
                path: / # Accessing the default NGINX root path indicates a successful startup.
                port: 80
              # Allow sufficient time for startup. Total timeout = failureThreshold * periodSeconds.
              # Here: 30 * 10 = 300 seconds.
              failureThreshold: 30
              periodSeconds: 10
            # readiness probe: Determines whether the container is ready to receive traffic.
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5  # Probing starts 5 seconds after the container starts.
              periodSeconds: 5        # Probe every 5 seconds.
              timeoutSeconds: 2       # Probe timeout duration.
              successThreshold: 1     # 1 success marks the Pod as ready.
              failureThreshold: 3     # 3 consecutive failures mark the Pod as not ready.
            # --- Pod graceful shutdown configuration ---
            lifecycle:
              preStop:
                exec:
                  # For reliable graceful shutdown, define a custom hook that handles in-flight requests based on your application logic.
                  # Using sleep alone is not recommended as it does not guarantee a clean exit.
                  command: ["sh", "-c", "sleep 30 && /usr/sbin/nginx -s quit"]
    ---           
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-demo-service
      annotations:
        # Timeout for connection draining. This value should align with the application's preStop logic. Range: 10-900.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30" 
        # Enable connection draining.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    spec:
      type: LoadBalancer
      selector:
        app: nginx-demo 
      ports:
        - protocol: TCP
          port: 80
  3. In the pop-up window, find the target stateless application, click View, and verify that the Pod status is Running.

kubectl

  1. Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

  2. Save the following YAML content to a file named nginx-demo.yaml.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-demo
    spec:
      replicas: 1                 # Set to 2 or more for production HA. Set to 1 for demonstration purposes.
      selector:
        matchLabels:
          app: nginx-demo
      # Rolling update strategy: ensures service availability during updates.
      # strategy:
        # type: RollingUpdate     # Default strategy for Deployments.
        # rollingUpdate:
          # maxUnavailable: "25%" # Default. Max 25% of Pods can be unavailable during the update.
          # maxSurge: "25%"       # Default. Max 25% extra Pods can be created above the desired replica count.
      template:
        metadata:
          labels:
            app: nginx-demo 
        spec:
          # Pod-level graceful shutdown limit. Must be greater than the sum of preStop execution and app cleanup time.
          terminationGracePeriodSeconds: 60 
          readinessGates:
          - conditionType: service.readiness.alibabacloud.com/nginx-demo-service # Set the Readiness Gate for the nginx-demo-service Service.
          containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
              limits:
                cpu: 500m
            # --- Health check probes ---
            # startup probe: Ensures the application in the container has started.
            startupProbe:
              httpGet:
                path: / # Accessing the default NGINX root path indicates a successful startup.
                port: 80
              # Allow sufficient time for startup. Total timeout = failureThreshold * periodSeconds.
              # Here: 30 * 10 = 300 seconds.
              failureThreshold: 30
              periodSeconds: 10
            # readiness probe: Determines whether the container is ready to receive traffic.
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5  # Probing starts 5 seconds after the container starts.
              periodSeconds: 5        # Probe every 5 seconds.
              timeoutSeconds: 2       # Probe timeout duration.
              successThreshold: 1     # 1 success marks the Pod as ready.
              failureThreshold: 3     # 3 consecutive failures mark the Pod as not ready.
            # --- Pod graceful shutdown configuration ---
            lifecycle:
              preStop:
                exec:
                  # For reliable graceful shutdown, define a custom hook that handles in-flight requests based on your application logic.
                  # Using sleep alone is not recommended as it does not guarantee a clean exit.
                  command: ["sh", "-c", "sleep 30 && /usr/sbin/nginx -s quit"]
    ---           
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-demo-service
      annotations:
        # Timeout for connection draining. This value should align with the application's preStop logic. Range: 10-900.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30" 
        # Enable connection draining.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    spec:
      type: LoadBalancer
      selector:
        app: nginx-demo 
      ports:
        - protocol: TCP
          port: 80
  3. Deploy the NGINX application and create the Service.

    kubectl apply -f nginx-demo.yaml
  4. Verify that the target application Pod is Running.

    kubectl get pod | grep nginx-deployment-demo
  • Pod readiness checks

    • startupProbe (startup probe): Checks whether slow-starting applications, such as Java applications, have finished launching. Until the startup probe succeeds, the readiness and liveness probes are not executed. This prevents the kubelet from misjudging a slow start as a failure and restarting the container.

    • readinessProbe (readiness probe): Determines if a container is ready to handle external requests. After the readiness check succeeds, the Pod's IP address is added to the Endpoints of all its associated Services. This indicates that the Pod can accept traffic.

    • readinessGates: In addition to the readinessProbe, a Pod is considered fully ready to accept traffic only after the readinessGates also indicate a ready status.

  • Graceful shutdown

    • Application graceful shutdown

      • preStop: A hook command that runs before a container terminates. Use it to trigger the application's graceful shutdown so that all in-flight requests are processed before the container exits, which keeps the shutdown non-disruptive.

        Define the hook according to your application logic. If the hook only runs the sleep command, the application might not exit cleanly.
      • terminationGracePeriodSeconds: The total time from when a Pod is marked for termination until it is forcibly killed with a SIGKILL signal. The default is 30 seconds. This value must be long enough to cover the combined execution time of the preStop hook and the container's own cleanup time.

    • SLB connection draining

      • service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain annotation: Enables the connection draining feature for Server Load Balancer (SLB).

      • service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: The timeout period for connection draining, in seconds. We recommend that you set this value to be close to the time required to process in-flight requests in the preStop hook.

  • Rolling update strategy

    • strategy: The default update strategy for a Deployment is RollingUpdate. This strategy uses a progressive replacement method. It gradually creates new Pods and deletes the corresponding old Pods after the new ones are ready. This ensures service availability during the update process.

    • maxUnavailable: The maximum number of unavailable Pod replicas during a rolling update. The default value is 25%. You can also specify an absolute number.

    • maxSurge: The maximum number of Pods that can be created beyond the desired number of replicas during a rolling update. A higher value speeds up the update but consumes more resources. The default value is 25%. You can also specify an absolute number.
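
When maxSurge and maxUnavailable are given as percentages, Kubernetes converts them to absolute Pod counts: maxSurge is rounded up and maxUnavailable is rounded down. The following shell sketch reproduces that arithmetic; the function name rolling_limits is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Converts percentage-based rolling update limits into absolute Pod counts.
# Kubernetes rounds maxSurge up and maxUnavailable down.
# Arguments: replica count, maxSurge percentage, maxUnavailable percentage.
rolling_limits() {
  replicas=$1; surge_pct=$2; unavailable_pct=$3
  surge=$(( (replicas * surge_pct + 99) / 100 ))        # ceiling division
  unavailable=$(( replicas * unavailable_pct / 100 ))   # floor division
  echo "replicas=$replicas maxSurge=$surge maxUnavailable=$unavailable"
}

rolling_limits 1 25 25    # the single-replica sample in this topic
rolling_limits 10 25 25
```

This prints `replicas=1 maxSurge=1 maxUnavailable=0` and `replicas=10 maxSurge=3 maxUnavailable=2`. For the single-replica sample, the defaults therefore create one extra Pod first and allow no Pod to be unavailable, so the old Pod is removed only after the new one is ready.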

Verify the zero-downtime rolling deployment

  1. Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

  2. Obtain the access URL of the sample application.

    export NGINX_ENDPOINT=$(kubectl get service nginx-demo-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}{":"}{.spec.ports[0].port}')
    echo $NGINX_ENDPOINT
  3. Install the load testing tool hey. Run a load test with 200 concurrent connections and 50,000 total requests. With the resource configuration in this example, a single replica should complete the test in about one minute.

    hey -c 200 -n 50000 -disable-keepalive http://$NGINX_ENDPOINT

    While the test is running, open a new terminal window and immediately restart the Deployment.

    kubectl rollout restart deployment nginx-deployment-demo
  4. Compare your results with the expected output for each of the following deployment scenarios.

    Without zero-downtime configuration

    Sample YAML without zero-downtime configuration

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-demo
    spec:
      replicas: 1                 # Set to 2 or more for production HA. Set to 1 for demonstration purposes.
      selector:
        matchLabels:
          app: nginx-demo
      template:
        metadata:
          labels:
            app: nginx-demo 
        spec:
          containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
              limits:
                cpu: 500m
    ---           
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-demo-service
    spec:
      type: LoadBalancer
      selector:
        app: nginx-demo 
      ports:
        - protocol: TCP
          port: 80

    Traffic loss is observed.

    Status code distribution:
      [200]	49644 responses
    
    Error distribution:
      [320]	Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: connection refused
      [18]	Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: no route to host
      [18]	Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: operation timed out

    With zero-downtime configuration

    Zero traffic loss is achieved.

    Status code distribution:
      [200]	50000 responses
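
To compare runs without reading the full report, you can total the successful responses and the errors from a saved hey report. The following shell sketch assumes the report layout shown in the outputs above; the function name summarize_hey is hypothetical, and the here-document reuses the sample figures from the scenario without zero-downtime configuration.

```shell
#!/bin/sh
# Hypothetical helper: summarizes a hey report read from stdin.
# It relies on the layout shown above: "[code] N responses" lines under
# "Status code distribution:" and "[count] message" lines under "Error distribution:".
summarize_hey() {
  awk '
    /Status code distribution:/ { section = "status"; next }
    /Error distribution:/       { section = "error";  next }
    section == "status" && /\[[0-9]+\]/ {
      code = $1; gsub(/[^0-9]/, "", code)
      if (code == "200") ok += $2; else bad += $2
    }
    section == "error" && /\[[0-9]+\]/ {
      n = $1; gsub(/[^0-9]/, "", n)
      errors += n
    }
    END { printf "ok=%d non200=%d errors=%d\n", ok, bad, errors }
  '
}

summarize_hey <<'EOF'
Status code distribution:
  [200] 49644 responses

Error distribution:
  [320] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: connection refused
  [18]  Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: no route to host
  [18]  Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: operation timed out
EOF
```

For the sample above this prints `ok=49644 non200=0 errors=356`. To use it on a real run, redirect the hey output to a file of your choice and pass that file on stdin, for example `summarize_hey < hey-report.txt`. A zero-downtime rollout should report `errors=0`.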

FAQ

Pod status: Running but not ready

Cause: This issue is usually caused by a failed startup or readiness probe.

Solution:

  • Readiness probe configuration: On the Edit page of the target workload, verify that the health check request path (for example, /healthz) and port match those that the application actually serves. If the application takes a long time to start, increase the Unhealthy Threshold to avoid premature failures.

    To confirm that the health check endpoint responds correctly, you can temporarily disable the readiness probe, log on to the Pod's terminal or its host, and send a request with a command such as curl.
  • Troubleshoot application issues: Investigate the issue by checking the Pod's Events and Logs. For the logs, select Show the log of the last container exit.
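
The same checks can be run from the command line. The following shell sketch wraps standard kubectl commands in two helper functions; the function names pod_blockers and pod_diagnostics are hypothetical, and the sketch assumes kubectl is already connected to the cluster. A readiness gate condition that has not been set yet does not appear in the Pod status at all, so a missing entry can also indicate a blocker.

```shell
#!/bin/sh
# Hypothetical helper: lists the conditions of a Pod whose status is not "True".
# For a Pod that is Running but not ready, this helps tell apart a failing
# probe (ContainersReady) from a readiness gate condition that is still False.
pod_blockers() {
  kubectl get pod "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}' |
    grep -v '=True$' || echo "all reported conditions are True"
}

# Prints the Pod description (including Events) and the log of the
# previous container instance, if the container has restarted.
pod_diagnostics() {
  kubectl describe pod "$1"
  kubectl logs "$1" --previous
}
```

Call either function with the name of the affected Pod, for example `pod_blockers <pod-name>`. The `kubectl logs --previous` call returns an error if the container has never restarted, which is expected.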
