
Container Service for Kubernetes: Zero-downtime application deployment

Last Updated: Mar 26, 2026

To update applications in an Alibaba Cloud Container Service for Kubernetes (ACK) cluster without service interruptions, configure a Deployment with readiness probes, readinessGates, preStop hooks, and Server Load Balancer (SLB) connection draining. This setup enables smooth traffic migration and maintains high availability during upgrades.

How it works

The RollingUpdate strategy orchestrates upgrades for stateless workloads by incrementally replacing Pods while keeping enough replicas available to handle live traffic. The core process involves three stages:

  1. Startup: wait for the new Pod to pass its readiness probe. Kubernetes creates a new Pod (v2) and waits for its readiness probe to succeed. Until the probe passes, the Pod is isolated from Service traffic.

  2. Traffic switching: synchronize Kubernetes state with the load balancer. After the new Pod passes its internal probes, its IP address is added to the Service's Endpoints. The configured readinessGates then prevent the Pod from being marked fully Ready until the Cloud Controller Manager has registered it with the SLB backend server group, which guarantees that the load balancer knows about the new instance before traffic is routed to it. At the same time, the old Pod is deregistered from the SLB and receives a termination signal. For more information, see How readinessGates works.

  3. Graceful shutdown: drain in-flight requests before termination. When the Pod receives a termination signal, the Kubelet invokes the preStop lifecycle hook, giving the application time to finish in-flight requests within the configured terminationGracePeriodSeconds. In parallel, the SLB performs connection draining: it keeps existing connections open while stopping new ones from being routed to the Pod. This coordinated shutdown helps ensure that no requests are dropped before the Pod is removed.
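
The three stages above can be observed directly while a rollout runs. The following kubectl commands are a sketch for inspection only (run them in separate terminals against your own cluster; the Pod label, Service name, and the `<pod-name>` placeholder refer to the sample application deployed below):

```shell
# Watch Pods being replaced; a new Pod stays 0/1 (not Ready) until its
# probes and the readiness gate both pass.
kubectl get pods -l app=nginx-demo --watch

# Watch the Service's Endpoints; the new Pod's IP appears only after it
# becomes Ready, and the old Pod's IP is removed before it is terminated.
kubectl get endpoints nginx-demo-service --watch

# Inspect the readiness gate condition set by the Cloud Controller Manager.
kubectl get pod <pod-name> -o jsonpath='{.status.conditions}'
```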


Prerequisites

Before you begin, ensure that you have an ACK cluster and a kubectl client that can connect to it.

Deploy a sample application

Use one of the following methods to deploy a stateless NGINX application.

Console

  1. On the ACK Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Deployments.

  2. On the Deployments page, click Create from YAML. Copy the following code into the editor and click Create.

    Sample application YAML

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-demo
    spec:
      replicas: 1                 # Set to 2 or more for production HA. It is set to 1 for demonstration purposes.
      selector:
        matchLabels:
          app: nginx-demo
      # Rolling update strategy: Ensures service availability during updates.
      # strategy:
        # type: RollingUpdate     # Default strategy for Deployments.
        # rollingUpdate:
          # maxUnavailable: "25%" # Default. Max 25% of Pods can be unavailable during the update.
          # maxSurge: "25%"       # Default. Max 25% extra Pods can be created above the desired replica count.
      template:
        metadata:
          labels:
            app: nginx-demo
        spec:
          # Pod-level graceful shutdown limit. Must exceed the sum of preStop execution and app cleanup time.
          terminationGracePeriodSeconds: 60
          readinessGates:
          - conditionType: service.readiness.alibabacloud.com/nginx-demo-service # Configures the Readiness Gate for nginx-demo-service.
          containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
              limits:
                cpu: 500m
            # --- Health check probes ---
            # Startup probe: Verifies the application within the container has started.
            startupProbe:
              httpGet:
                path: / # Success indicates NGINX root path is accessible.
                port: 80
              # Allow sufficient time for startup. Total timeout = failureThreshold * periodSeconds.
              # Here: 30 * 10 = 300 seconds.
              failureThreshold: 30
              periodSeconds: 10
            # Readiness probe: Verifies the container is ready to accept traffic.
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5  # Probing starts 5 seconds after the container starts.
              periodSeconds: 5        # Probes every 5 seconds.
              timeoutSeconds: 2       # Probe timeout.
              successThreshold: 1     # 1 success marks the Pod as ready.
              failureThreshold: 3     # 3 consecutive failures mark the Pod as not ready.
            # --- Service graceful shutdown configuration ---
            lifecycle:
              preStop:
                exec:
                  # Define a custom hook to process in-flight connections before shutdown.
                  # Relying solely on 'sleep' may not ensure a proper graceful exit.
                  command: ["sh", "-c", "sleep 30 && /usr/sbin/nginx -s quit"]
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-demo-service
      annotations:
        # Timeout for SLB connection draining. Should align with the application's preStop logic. Range: 10-900 seconds.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
        # Enable SLB connection draining.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    spec:
      type: LoadBalancer
      selector:
        app: nginx-demo
      ports:
        - protocol: TCP
          port: 80
  3. In the dialog box, locate the deployment and click View. Verify that the Pod status is Running.

kubectl

  1. Connect to the cluster using kubectl. For clusters without public access, click Manage Clusters Using Workbench on the Cluster Information page to connect over the internal network.

  2. Create a file named nginx-demo.yaml with the following content.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-demo
    spec:
      replicas: 1                 # For production environments, set this to 2 or more to ensure high availability. It is set to 1 here for easy verification of the rolling deployment.
      selector:
        matchLabels:
          app: nginx-demo
      # Rolling update strategy: Ensures no service interruptions during updates.
      # strategy:
        # type: RollingUpdate     # The default strategy for a deployment workload is RollingUpdate.
        # rollingUpdate:
          # maxUnavailable: "25%" # Default value. A maximum of 25% of the pods can be unavailable during the update process.
          # maxSurge: "25%"       # Default value. The number of pods can exceed the desired number of replicas by a maximum of 25% during the update process.
      template:
        metadata:
          labels:
            app: nginx-demo
        spec:
          # Pod-level graceful shutdown limit. Must exceed the sum of preStop execution and app cleanup time.
          terminationGracePeriodSeconds: 60
          readinessGates:
          - conditionType: service.readiness.alibabacloud.com/nginx-demo-service # Configures the Readiness Gate for nginx-demo-service.
          containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
              limits:
                cpu: 500m
            # --- Health check probes ---
            # Startup probe: Verifies the application within the container has started.
            startupProbe:
              httpGet:
                path: / # Success indicates NGINX root path is accessible.
                port: 80
              # Allow sufficient time for startup. Total timeout = failureThreshold * periodSeconds.
              # Here: 30 * 10 = 300 seconds.
              failureThreshold: 30
              periodSeconds: 10
            # Readiness probe: Verifies the container is ready to accept traffic.
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5  # Probing starts 5 seconds after the container starts.
              periodSeconds: 5        # Probes every 5 seconds.
              timeoutSeconds: 2       # Probe timeout.
              successThreshold: 1     # 1 success marks the Pod as ready.
              failureThreshold: 3     # 3 consecutive failures mark the Pod as not ready.
            # --- Service graceful shutdown configuration ---
            lifecycle:
              preStop:
                exec:
                  # Define a custom hook to process in-flight connections before shutdown.
                  # Relying solely on 'sleep' may not ensure a proper graceful exit.
                  command: ["sh", "-c", "sleep 30 && /usr/sbin/nginx -s quit"]
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-demo-service
      annotations:
        # Timeout for SLB connection draining. Should align with the application's preStop logic. Range: 10-900 seconds.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
        # Enable SLB connection draining.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    spec:
      type: LoadBalancer
      selector:
        app: nginx-demo
      ports:
        - protocol: TCP
          port: 80
  3. Apply the configuration to deploy the application and Service.

    kubectl apply -f nginx-demo.yaml
  4. Verify that the Pod's status is Running.

    kubectl get pod | grep nginx-deployment-demo

Key configuration parameters

Pod readiness probes

| Parameter | Role | Description |
| --- | --- | --- |
| startupProbe | Confirm container initialization | Blocks liveness and readiness probes until it succeeds, preventing the kubelet from prematurely restarting slow-starting containers (such as Java applications). |
| readinessProbe | Gate traffic admission | On success, adds the Pod's IP address to the Service's Endpoints. On failure, removes the Pod from the Endpoints to stop traffic. |
| readinessGates | Synchronize with the load balancer | Adds an extra readiness condition: the Pod is not marked fully Ready until the Cloud Controller Manager registers it with the SLB backend server group. This prevents traffic from arriving before the load balancer is ready. |

Graceful shutdown

Application graceful shutdown

| Parameter | Description |
| --- | --- |
| preStop | A lifecycle hook executed immediately before container termination. Use this hook to trigger a graceful application shutdown, ensuring in-flight requests are finalized before the process stops. Define a custom hook tailored to your application logic; relying solely on a sleep command is unreliable and may result in an incomplete shutdown. |
| terminationGracePeriodSeconds | How long Kubernetes waits for a Pod to shut down gracefully before forcibly killing it (SIGKILL). Default: 30 seconds. Set this value to exceed the combined duration of the preStop hook and the application's internal cleanup time. For example, if your preStop hook takes 30 seconds and your application needs 20 seconds to finish cleanup, set this to at least 50 seconds. In this sample, preStop runs for 30 seconds, so terminationGracePeriodSeconds is set to 60 seconds as a safe margin. |

Note: The grace period covers the total time for both the preStop hook and container stop. If the sum of preStop execution time and application cleanup time exceeds terminationGracePeriodSeconds, Kubernetes forcibly kills the container before it finishes.
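
As a quick sanity check of the timing budget described above, the following shell sketch verifies that the grace period leaves headroom. The cleanup value is a hypothetical example; the other values match the sample manifest:

```shell
# Sanity-check the shutdown timing budget: terminationGracePeriodSeconds
# must exceed preStop duration plus the application's own cleanup time.
PRESTOP_SECONDS=30        # matches the sample preStop hook's sleep
CLEANUP_SECONDS=20        # assumed application cleanup time (hypothetical)
GRACE_PERIOD_SECONDS=60   # matches terminationGracePeriodSeconds in the sample

BUDGET=$((PRESTOP_SECONDS + CLEANUP_SECONDS))
if [ "$GRACE_PERIOD_SECONDS" -gt "$BUDGET" ]; then
  echo "OK: grace period ${GRACE_PERIOD_SECONDS}s exceeds shutdown budget ${BUDGET}s"
else
  echo "WARNING: raise terminationGracePeriodSeconds above ${BUDGET}s"
fi
```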

SLB connection draining

| Annotation | Description |
| --- | --- |
| service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain | Set to "on" to enable connection draining on the SLB instance. |
| service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout | Draining timeout in seconds. Align this value with the time required to process in-flight requests in the preStop hook. Valid range: 10–900 seconds. |

Rolling update strategy

| Parameter | Default | Description |
| --- | --- | --- |
| strategy.type | RollingUpdate | Incrementally replaces old Pods with new ones. New Pods are created and verified before the corresponding old Pods are terminated. |
| maxUnavailable | 25% | Maximum number (or percentage) of Pods that can be unavailable during the update. Reduce this value to maintain higher availability during rollouts. |
| maxSurge | 25% | Maximum number (or percentage) of extra Pods that can be created above the desired replica count during the update. Higher values accelerate the rollout but increase resource consumption. |
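
For stricter availability guarantees than the defaults provide, you can state the strategy explicitly in the Deployment spec. The following fragment is one illustrative choice, not part of the sample above: it keeps the full replica count serving at all times by surging one extra Pod per step.

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # create at most one extra Pod per update step
```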

Verify the zero-downtime rolling update

  1. Connect to the cluster using kubectl.

  2. Retrieve the external endpoint of the sample application.

    export NGINX_ENDPOINT=$(kubectl get service nginx-demo-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}{":"}{.spec.ports[0].port}')
    echo $NGINX_ENDPOINT
  3. Install the stress testing tool hey (for example, with go install github.com/rakyll/hey@latest or brew install hey). Run a load test with 200 concurrent connections and 50,000 total requests. Based on the sample resource configuration, a single replica typically completes this in about 1 minute.

    hey -c 200 -n 50000 -disable-keepalive http://$NGINX_ENDPOINT
  4. In a new terminal window, trigger a rolling restart of the Deployment.

    kubectl rollout restart deployment nginx-deployment-demo

    To watch the rollout status in real time:

    kubectl rollout status deployment nginx-deployment-demo

    The output is similar to:

    Waiting for deployment "nginx-deployment-demo" rollout to finish: 0 of 1 updated replicas are available...
    deployment "nginx-deployment-demo" successfully rolled out
  5. After the load test completes, compare the results against the expected outputs.

    Without the zero-downtime configuration, the sample output shows traffic loss:

    Status code distribution:
      [200] 49644 responses

    Error distribution:
      [320] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: connection refused
      [18] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: no route to host
      [18] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: operation timed out

    With the zero-downtime configuration, zero downtime is achieved and all 50,000 requests return HTTP 200:

    Status code distribution:
      [200] 50000 responses

FAQ

Why is my Pod stuck in the Running state but not ready?

The startup probe or readiness probe is most likely failing. Check the probe configuration on the Edit page for the target Workload: confirm that the health check path (such as /healthz) and port match your application's actual settings. If your application starts slowly, increase the Unhealthy Threshold to prevent premature failures.

To investigate further, check the Pod's Events and Logs. Select Show the log of the last container exit to review previous crash details. For manual verification, temporarily disable the probe, open a terminal inside the Pod, and use curl to confirm the health check endpoint responds correctly.
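
For the manual checks described above, the following kubectl commands are a starting point. Run them against your own cluster; `<pod-name>` is a placeholder for the affected Pod:

```shell
# Show probe failures and other recent events for the Pod.
kubectl describe pod <pod-name>

# Review logs from the previous container instance after a crash.
kubectl logs <pod-name> --previous

# From inside the Pod, confirm the health check endpoint responds.
kubectl exec -it <pod-name> -- curl -i http://localhost:80/
```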
