
Container Service for Kubernetes: Zero-downtime application deployment

Last Updated: Mar 26, 2026

To update applications in an Alibaba Cloud Container Service for Kubernetes (ACK) cluster without service interruptions, configure a Deployment with readiness probes, readinessGates, preStop hooks, and Server Load Balancer (SLB) connection draining. This setup enables smooth traffic migration and maintains high availability during upgrades.

How it works

The RollingUpdate strategy orchestrates upgrades for stateless workloads by incrementally replacing Pods while keeping enough replicas available to handle live traffic. The core process involves three stages:

  1. Startup: wait for the new Pod to pass its readiness probe. Kubernetes creates a new Pod (v2) and waits for its readiness probe to succeed. Until the probe passes, the Pod is isolated from Service traffic.

  2. Traffic switching: synchronize Kubernetes state with the load balancer. After the new Pod passes its internal probes, its IP address is added to the Service's Endpoints. The configured readinessGates then prevent the Pod from being marked fully Ready until the Cloud Controller Manager has registered it with the SLB backend server group, which guarantees that the load balancer knows about the new instance before traffic is routed to it. At the same time, the old Pod is deregistered from the SLB and receives a termination signal. For more information, see How readinessGates works.

  3. Graceful shutdown: drain in-flight requests before termination. When the Pod receives a termination signal, the Kubelet invokes the preStop lifecycle hook, giving the application time to finish in-flight requests within the configured terminationGracePeriodSeconds. In parallel, the SLB performs connection draining: it keeps existing connections open while stopping new ones from being routed to the Pod. This coordinated shutdown helps ensure that no requests are dropped before the Pod is removed.
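
The three stages above can be observed directly while a rollout runs. The following kubectl commands are a sketch for inspection only (run them in separate terminals against your own cluster; the Pod label, Service name, and the `<pod-name>` placeholder refer to the sample application deployed below):

```shell
# Watch Pods being replaced; a new Pod stays 0/1 (not Ready) until its
# probes and the readiness gate both pass.
kubectl get pods -l app=nginx-demo --watch

# Watch the Service's Endpoints; the new Pod's IP appears only after it
# becomes Ready, and the old Pod's IP is removed before it is terminated.
kubectl get endpoints nginx-demo-service --watch

# Inspect the readiness gate condition set by the Cloud Controller Manager.
kubectl get pod <pod-name> -o jsonpath='{.status.conditions}'
```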


Prerequisites

Before you begin, ensure that you have an ACK cluster and a kubectl client that can connect to it.

Deploy a sample application

Use one of the following methods to deploy a stateless NGINX application.

Console

  1. On the ACK Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Deployments.

  2. On the Deployments page, click Create from YAML. Copy the following code into the editor and click Create.

    Sample application YAML

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-demo
    spec:
      replicas: 1                 # Set to 2 or more for production HA. It is set to 1 for demonstration purposes.
      selector:
        matchLabels:
          app: nginx-demo
      # Rolling update strategy: Ensures service availability during updates.
      # strategy:
        # type: RollingUpdate     # Default strategy for Deployments.
        # rollingUpdate:
          # maxUnavailable: "25%" # Default. Max 25% of Pods can be unavailable during the update.
          # maxSurge: "25%"       # Default. Max 25% extra Pods can be created above the desired replica count.
      template:
        metadata:
          labels:
            app: nginx-demo
        spec:
          # Pod-level graceful shutdown limit. Must exceed the sum of preStop execution and app cleanup time.
          terminationGracePeriodSeconds: 60
          readinessGates:
          - conditionType: service.readiness.alibabacloud.com/nginx-demo-service # Configures the Readiness Gate for nginx-demo-service.
          containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
              limits:
                cpu: 500m
            # --- Health check probes ---
            # Startup probe: Verifies the application within the container has started.
            startupProbe:
              httpGet:
                path: / # Success indicates NGINX root path is accessible.
                port: 80
              # Allow sufficient time for startup. Total timeout = failureThreshold * periodSeconds.
              # Here: 30 * 10 = 300 seconds.
              failureThreshold: 30
              periodSeconds: 10
            # Readiness probe: Verifies the container is ready to accept traffic.
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5  # Probing starts 5 seconds after the container starts.
              periodSeconds: 5        # Probes every 5 seconds.
              timeoutSeconds: 2       # Probe timeout.
              successThreshold: 1     # 1 success marks the Pod as ready.
              failureThreshold: 3     # 3 consecutive failures mark the Pod as not ready.
            # --- Service graceful shutdown configuration ---
            lifecycle:
              preStop:
                exec:
                  # Define a custom hook to process in-flight connections before shutdown.
                  # Relying solely on 'sleep' may not ensure a proper graceful exit.
                  command: ["sh", "-c", "sleep 30 && /usr/sbin/nginx -s quit"]
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-demo-service
      annotations:
        # Timeout for SLB connection draining. Should align with the application's preStop logic. Range: 10-900 seconds.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
        # Enable SLB connection draining.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    spec:
      type: LoadBalancer
      selector:
        app: nginx-demo
      ports:
        - protocol: TCP
          port: 80
  3. In the dialog box, locate the deployment and click View. Verify that the Pod status is Running.

kubectl

  1. Connect to the cluster using kubectl. For clusters without public access, click Manage Clusters Using Workbench on the Cluster Information page to connect over the internal network.

  2. Create a file named nginx-demo.yaml with the following content.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-demo
    spec:
      replicas: 1                 # For production environments, set this to 2 or more to ensure high availability. It is set to 1 here for easy verification of the rolling deployment.
      selector:
        matchLabels:
          app: nginx-demo
      # Rolling update strategy: Ensures no service interruptions during updates.
      # strategy:
        # type: RollingUpdate     # The default strategy for a deployment workload is RollingUpdate.
        # rollingUpdate:
          # maxUnavailable: "25%" # Default value. A maximum of 25% of the pods can be unavailable during the update process.
          # maxSurge: "25%"       # Default value. The number of pods can exceed the desired number of replicas by a maximum of 25% during the update process.
      template:
        metadata:
          labels:
            app: nginx-demo
        spec:
          # Pod-level graceful shutdown limit. Must exceed the sum of preStop execution and app cleanup time.
          terminationGracePeriodSeconds: 60
          readinessGates:
          - conditionType: service.readiness.alibabacloud.com/nginx-demo-service # Configures the Readiness Gate for nginx-demo-service.
          containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 500m
                memory: 1Gi
              limits:
                cpu: 500m
            # --- Health check probes ---
            # Startup probe: Verifies the application within the container has started.
            startupProbe:
              httpGet:
                path: / # Success indicates NGINX root path is accessible.
                port: 80
              # Allow sufficient time for startup. Total timeout = failureThreshold * periodSeconds.
              # Here: 30 * 10 = 300 seconds.
              failureThreshold: 30
              periodSeconds: 10
            # Readiness probe: Verifies the container is ready to accept traffic.
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5  # Probing starts 5 seconds after the container starts.
              periodSeconds: 5        # Probes every 5 seconds.
              timeoutSeconds: 2       # Probe timeout.
              successThreshold: 1     # 1 success marks the Pod as ready.
              failureThreshold: 3     # 3 consecutive failures mark the Pod as not ready.
            # --- Service graceful shutdown configuration ---
            lifecycle:
              preStop:
                exec:
                  # Define a custom hook to process in-flight connections before shutdown.
                  # Relying solely on 'sleep' may not ensure a proper graceful exit.
                  command: ["sh", "-c", "sleep 30 && /usr/sbin/nginx -s quit"]
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-demo-service
      annotations:
        # Timeout for SLB connection draining. Should align with the application's preStop logic. Range: 10-900 seconds.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
        # Enable SLB connection draining.
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    spec:
      type: LoadBalancer
      selector:
        app: nginx-demo
      ports:
        - protocol: TCP
          port: 80
  3. Apply the configuration to deploy the application and Service.

    kubectl apply -f nginx-demo.yaml
  4. Verify that the Pod's status is Running.

    kubectl get pod | grep nginx-deployment-demo

Key configuration parameters

Pod readiness probes

| Parameter | Role | Description |
| --- | --- | --- |
| startupProbe | Confirm container initialization | Blocks liveness and readiness probes until it succeeds, preventing the kubelet from prematurely restarting slow-starting containers (such as Java applications). |
| readinessProbe | Gate traffic admission | On success, adds the Pod's IP address to the Service's Endpoints. On failure, removes the Pod from the Endpoints to stop traffic. |
| readinessGates | Synchronize with the load balancer | Adds an extra readiness condition: the Pod is not marked fully Ready until the Cloud Controller Manager registers it with the SLB backend server group. This prevents traffic from arriving before the load balancer is ready. |

Graceful shutdown

Application graceful shutdown

| Parameter | Description |
| --- | --- |
| preStop | A lifecycle hook executed immediately before container termination. Use this hook to trigger a graceful application shutdown, ensuring in-flight requests are finalized before the process stops. Define a custom hook tailored to your application logic; relying solely on a sleep command is unreliable and may result in an incomplete shutdown. |
| terminationGracePeriodSeconds | How long Kubernetes waits for a Pod to shut down gracefully before forcibly killing it (SIGKILL). Default: 30 seconds. Set this value to exceed the combined duration of the preStop hook and the application's internal cleanup time. For example, if your preStop hook takes 30 seconds and your application needs 20 seconds to finish cleanup, set this to at least 50 seconds. In this sample, preStop runs for 30 seconds, so terminationGracePeriodSeconds is set to 60 seconds as a safe margin. |

Note: The grace period covers the total time for both the preStop hook and container stop. If the sum of preStop execution time and application cleanup time exceeds terminationGracePeriodSeconds, Kubernetes forcibly kills the container before it finishes.
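
As a quick sanity check of the timing budget described above, the following shell sketch verifies that the grace period leaves headroom. The cleanup value is a hypothetical example; the other values match the sample manifest:

```shell
# Sanity-check the shutdown timing budget: terminationGracePeriodSeconds
# must exceed preStop duration plus the application's own cleanup time.
PRESTOP_SECONDS=30        # matches the sample preStop hook's sleep
CLEANUP_SECONDS=20        # assumed application cleanup time (hypothetical)
GRACE_PERIOD_SECONDS=60   # matches terminationGracePeriodSeconds in the sample

BUDGET=$((PRESTOP_SECONDS + CLEANUP_SECONDS))
if [ "$GRACE_PERIOD_SECONDS" -gt "$BUDGET" ]; then
  echo "OK: grace period ${GRACE_PERIOD_SECONDS}s exceeds shutdown budget ${BUDGET}s"
else
  echo "WARNING: raise terminationGracePeriodSeconds above ${BUDGET}s"
fi
```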

SLB connection draining

| Annotation | Description |
| --- | --- |
| service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain | Set to "on" to enable connection draining on the SLB instance. |
| service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout | Draining timeout in seconds. Align this value with the time required to process in-flight requests in the preStop hook. Valid range: 10–900 seconds. |

Rolling update strategy

| Parameter | Default | Description |
| --- | --- | --- |
| strategy.type | RollingUpdate | Incrementally replaces old Pods with new ones. New Pods are created and verified before the corresponding old Pods are terminated. |
| maxUnavailable | 25% | Maximum number (or percentage) of Pods that can be unavailable during the update. Reduce this value to maintain higher availability during rollouts. |
| maxSurge | 25% | Maximum number (or percentage) of extra Pods that can be created above the desired replica count during the update. Higher values accelerate the rollout but increase resource consumption. |
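
For stricter availability guarantees than the defaults provide, you can state the strategy explicitly in the Deployment spec. The following fragment is one illustrative choice, not part of the sample above: it keeps the full replica count serving at all times by surging one extra Pod per step.

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # create at most one extra Pod per update step
```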

Verify the zero-downtime rolling update

  1. Connect to the cluster using kubectl.

  2. Retrieve the external endpoint of the sample application.

    export NGINX_ENDPOINT=$(kubectl get service nginx-demo-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}{":"}{.spec.ports[0].port}')
    echo $NGINX_ENDPOINT
  3. Install the stress testing tool hey (for example, with go install github.com/rakyll/hey@latest or brew install hey). Run a load test with 200 concurrent connections and 50,000 total requests. Based on the sample resource configuration, a single replica typically completes this in about 1 minute.

    hey -c 200 -n 50000 -disable-keepalive http://$NGINX_ENDPOINT
  4. In a new terminal window, trigger a rolling restart of the Deployment.

    kubectl rollout restart deployment nginx-deployment-demo

    To watch the rollout status in real time:

    kubectl rollout status deployment nginx-deployment-demo

    The output is similar to:

    Waiting for deployment "nginx-deployment-demo" rollout to finish: 0 of 1 updated replicas are available...
    deployment "nginx-deployment-demo" successfully rolled out
  5. After the load test completes, compare the results against the expected outputs.

    Without the zero-downtime configuration, the sample output shows traffic loss:

    Status code distribution:
      [200] 49644 responses

    Error distribution:
      [320] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: connection refused
      [18] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: no route to host
      [18] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: operation timed out

    With the zero-downtime configuration, zero downtime is achieved and all 50,000 requests return HTTP 200:

    Status code distribution:
      [200] 50000 responses

FAQ

Why is my Pod stuck in the Running state but not ready?

The startup probe or readiness probe is most likely failing. Check the probe configuration on the Edit page for the target Workload: confirm that the health check path (such as /healthz) and port match your application's actual settings. If your application starts slowly, increase the Unhealthy Threshold to prevent premature failures.

To investigate further, check the Pod's Events and Logs. Select Show the log of the last container exit to review previous crash details. For manual verification, temporarily disable the probe, open a terminal inside the Pod, and use curl to confirm the health check endpoint responds correctly.
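
For the manual checks described above, the following kubectl commands are a starting point. Run them against your own cluster; `<pod-name>` is a placeholder for the affected Pod:

```shell
# Show probe failures and other recent events for the Pod.
kubectl describe pod <pod-name>

# Review logs from the previous container instance after a crash.
kubectl logs <pod-name> --previous

# From inside the Pod, confirm the health check endpoint responds.
kubectl exec -it <pod-name> -- curl -i http://localhost:80/
```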
