To update applications in a Container Service for Kubernetes (ACK) cluster without service interruptions, you can configure a Deployment with a readiness probe, readinessGates, a preStop hook, and Server Load Balancer (SLB) connection draining. This configuration ensures smooth traffic migration and continuous high availability.
How it works
To ensure high availability during service upgrades, you can use the Rolling Update strategy for stateless applications (Deployments). This strategy replaces Pods one by one to ensure continuous Pod availability for incoming traffic. The core process is divided into the following phases:
Startup phase: First, a new version (v2) of the Pod is created. Kubernetes waits for the new Pod to pass its readiness probe, confirming it can process requests. Until then, the Pod does not receive any traffic from the Service.
Traffic shifting phase: After readinessGates is enabled, a new Pod must first pass its readiness check. Its IP is then registered with the Endpoints of the associated Service and synchronized with the backend server group of the load balancer (SLB) to start receiving traffic. Subsequently, the system sends a termination signal to the old version (v1) Pod and removes its IP from the Endpoints so that it no longer receives new requests. For more information, see How readinessGates works.
Graceful shutdown phase: Before an old Pod is deleted, it executes a predefined preStop hook and uses the termination grace period (terminationGracePeriodSeconds) to finish processing established connections, while the SLB performs connection draining for in-flight requests. This process ensures that all in-progress requests are completed, which achieves a zero-downtime rolling update.
Prerequisites
The cluster version must be 1.24 or later. For more information, see Upgrade a cluster.
The cloud-controller-manager component is v2.10.0 or later. For more information, see Cloud Controller Manager.
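The two version requirements above can be checked with a small shell comparison. This is a minimal sketch: version_ge is a hypothetical helper name, it assumes a sort that supports -V (GNU coreutils or BusyBox), and the version strings in the example calls are placeholders rather than values read from a real cluster.

```shell
# version_ge ACTUAL REQUIRED
# Succeeds (exit 0) when ACTUAL >= REQUIRED. A leading "v" is ignored.
# Hypothetical helper; assumes `sort -V` is available.
version_ge() {
  actual=${1#v}
  required=${2#v}
  # The lower of the two versions sorts first; if that is REQUIRED, ACTUAL is new enough.
  lowest=$(printf '%s\n%s\n' "$actual" "$required" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
}

# Placeholder values only; substitute the versions reported for your cluster.
if version_ge "1.26.3" "1.24"; then echo "cluster version OK"; else echo "cluster too old"; fi
if version_ge "v2.9.1" "v2.10.0"; then echo "CCM version OK"; else echo "CCM too old"; fi
```

How the actual versions are obtained (for example, from the console or from kubectl version) is left out of the sketch on purpose.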
Deploy a sample application
The following example shows how to deploy a stateless NGINX application.
Console
On the ACK Clusters page, click the name of your cluster. In the left navigation pane, click .
On the Deployments page, click Create from YAML. Copy the following content to the template editor and click Create.
In the pop-up window, find the target stateless application, click View, and verify that the Pod status is Running.
kubectl
Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Save the following YAML content to a file named nginx-demo.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-demo
spec:
  replicas: 1  # Set to 2 or more for production HA. Set to 1 for demonstration purposes.
  selector:
    matchLabels:
      app: nginx-demo
  # Rolling update strategy: ensures service availability during updates.
  # strategy:
  #   type: RollingUpdate  # Default strategy for Deployments.
  #   rollingUpdate:
  #     maxUnavailable: "25%"  # Default. Max 25% of Pods can be unavailable during the update.
  #     maxSurge: "25%"  # Default. Max 25% extra Pods can be created above the desired replica count.
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      # Pod-level graceful shutdown limit. Must be greater than the sum of preStop execution and app cleanup time.
      terminationGracePeriodSeconds: 60
      readinessGates:
      - conditionType: service.readiness.alibabacloud.com/nginx-demo-service  # Set the Readiness Gate for the nginx-demo-service Service.
      containers:
      - name: nginx
        image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 500m
        # --- Health check probes ---
        # startup probe: Ensures the application in the container has started.
        startupProbe:
          httpGet:
            path: /  # Accessing the default NGINX root path indicates a successful startup.
            port: 80
          # Allow sufficient time for startup. Total timeout = failureThreshold * periodSeconds.
          # Here: 30 * 10 = 300 seconds.
          failureThreshold: 30
          periodSeconds: 10
        # readiness probe: Determines whether the container is ready to receive traffic.
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5  # Probing starts 5 seconds after the container starts.
          periodSeconds: 5  # Probe every 5 seconds.
          timeoutSeconds: 2  # Probe timeout duration.
          successThreshold: 1  # 1 success marks the Pod as ready.
          failureThreshold: 3  # 3 consecutive failures mark the Pod as not ready.
        # --- Pod graceful shutdown configuration ---
        lifecycle:
          preStop:
            exec:
              # For reliable graceful shutdown, define a custom hook that handles in-flight requests based on your application logic.
              # Using sleep alone is not recommended as it does not guarantee a clean exit.
              command: ["sh", "-c", "sleep 30 && /usr/sbin/nginx -s quit"]
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-demo-service
  annotations:
    # Timeout for connection draining. This value should align with the application's preStop logic. Range: 10-900.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
    # Enable connection draining.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
spec:
  type: LoadBalancer
  selector:
    app: nginx-demo
  ports:
  - protocol: TCP
    port: 80
Deploy the NGINX application and create the Service.
kubectl apply -f nginx-demo.yaml
Verify that the target application Pod is Running.
kubectl get pod | grep nginx-deployment-demo
Pod readiness checks
startupProbe (startup probe): Checks if slow-starting applications, such as Java applications, have finished launching. Before the startup probe succeeds, the readiness and liveness probes are not executed. This prevents the kubelet from misjudging a slow start as a failure and restarting the container.
readinessProbe (readiness probe): Determines if a container is ready to handle external requests. After the readiness check succeeds, the Pod's IP address is added to the Endpoints of all its associated Services. This indicates that the Pod can accept traffic.
readinessGates: In addition to the readinessProbe, a Pod is considered fully ready to accept traffic only after the readinessGates also indicate a ready status.
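The probe settings in the sample Deployment translate into time windows through the formula given in its comments (failureThreshold * periodSeconds). The following is a minimal sketch of that arithmetic; probe_window is a hypothetical helper name, and the result is an approximation because timeoutSeconds and initialDelaySeconds can lengthen the real window.

```shell
# probe_window FAILURE_THRESHOLD PERIOD_SECONDS
# Prints failureThreshold * periodSeconds: the approximate time, in seconds,
# that a probe can keep failing before the kubelet acts on the failure.
# Hypothetical helper for illustration only.
probe_window() {
  echo $(( $1 * $2 ))
}

# Values taken from the sample Deployment.
echo "startupProbe budget:   $(probe_window 30 10)s"
echo "readinessProbe window: $(probe_window 3 5)s"
```

With the sample values, the application has up to about 300 seconds to start, and a Pod that stops responding is marked not ready after roughly 15 seconds.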
Graceful shutdown
Application graceful shutdown
preStop: A hook command that runs before a container terminates. Set a command that shuts the application down gracefully so that all in-flight requests are processed before the container exits. Define a hook that matches your application logic. Using the sleep command alone is not recommended because it does not guarantee a clean exit.
terminationGracePeriodSeconds: The total time from when a Pod is marked for termination until it is forcibly killed with a SIGKILL signal. The default is 30 seconds. This value must be long enough to cover the combined execution time of the preStop hook and the container's own cleanup time.
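The timing rule above (the grace period must exceed the preStop time plus the application's cleanup time) can be written as a one-line check. This is a sketch under stated assumptions: grace_budget_ok is a hypothetical helper, and the 20-second cleanup time in the example is an assumed figure, not a measured one.

```shell
# grace_budget_ok GRACE_SECONDS PRESTOP_SECONDS CLEANUP_SECONDS
# Succeeds when terminationGracePeriodSeconds is strictly greater than the
# preStop execution time plus the application's own cleanup time.
# Hypothetical helper for illustration only.
grace_budget_ok() {
  [ "$1" -gt $(( $2 + $3 )) ]
}

# Sample Deployment: 60 s grace period, 30 s sleep in preStop.
# The 20 s cleanup time is an assumption; measure it for your application.
if grace_budget_ok 60 30 20; then
  echo "grace period is sufficient"
else
  echo "grace period is too short: the container may receive SIGKILL mid-shutdown"
fi
```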
SLB connection draining
service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain annotation: Enables the connection draining feature for Server Load Balancer (SLB).
service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout annotation: The timeout period for connection draining, in seconds. We recommend that you set this value to be close to the time required to process in-flight requests in the preStop hook.
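Before applying a drain timeout, it can help to confirm that the value falls in the accepted range, which the sample Service comments give as 10-900 seconds. The check below is a minimal sketch; drain_timeout_ok is a hypothetical helper name and does not call any Alibaba Cloud API.

```shell
# drain_timeout_ok TIMEOUT_SECONDS
# Succeeds when the value is a whole number between 10 and 900 inclusive,
# the range stated in the sample Service comments.
# Hypothetical helper for illustration only.
drain_timeout_ok() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 10 ] && [ "$1" -le 900 ]
}

# The sample Service uses 30, matching the 30-second sleep in the preStop hook.
if drain_timeout_ok 30; then echo "drain timeout accepted"; else echo "drain timeout out of range"; fi
```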
Rolling update strategy
strategy: The default update strategy for a Deployment is RollingUpdate. This strategy uses a progressive replacement method. It gradually creates new Pods and deletes the corresponding old Pods after the new ones are ready. This ensures service availability during the update process.
maxUnavailable: The maximum number of unavailable Pod replicas during a rolling update. The default value is 25%. You can also specify an absolute number.
maxSurge: The maximum number of Pods that can be created beyond the desired number of replicas during a rolling update. A higher value speeds up the update but consumes more resources. The default value is 25%. You can also specify an absolute number.
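Percentage values are converted to Pod counts before the rollout starts: Kubernetes rounds maxSurge up and maxUnavailable down. The sketch below reproduces that arithmetic for a given replica count; rolling_update_bounds is a hypothetical helper name, and it only handles percentage inputs.

```shell
# rolling_update_bounds REPLICAS MAX_SURGE_PERCENT MAX_UNAVAILABLE_PERCENT
# Prints the absolute Pod counts that the percentages resolve to.
# maxSurge is rounded up and maxUnavailable is rounded down.
# Hypothetical helper for illustration only.
rolling_update_bounds() {
  replicas=$1
  surge_pct=$2
  unavailable_pct=$3
  max_surge=$(( (replicas * surge_pct + 99) / 100 ))        # integer ceiling
  max_unavailable=$(( replicas * unavailable_pct / 100 ))   # integer floor
  echo "maxSurge=$max_surge maxUnavailable=$max_unavailable"
}

rolling_update_bounds 1 25 25    # the single-replica sample Deployment
rolling_update_bounds 10 25 25   # a larger Deployment with the same defaults
```

For the single-replica sample this yields one extra Pod and zero unavailable Pods, which is why the old Pod keeps serving until the new one is ready.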
Verify the zero-downtime rolling deployment
Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Obtain the access URL of the sample application.
export NGINX_ENDPOINT=$(kubectl get service nginx-demo-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}{":"}{.spec.ports[0].port}')
echo $NGINX_ENDPOINT
Install the load testing tool hey. Run a load test with 200 concurrent connections and 50,000 total requests. With the resource configuration in this example, a single replica should complete the test in about one minute.
hey -c 200 -n 50000 -disable-keepalive http://$NGINX_ENDPOINT
While the test is running, open a new terminal window and immediately restart the Deployment.
kubectl rollout restart deployment nginx-deployment-demo
The following table describes the expected outputs.
Deployment scenario | Expected output
Without zero-downtime configuration | Traffic loss is observed.
With zero-downtime configuration | Zero traffic loss is achieved.
Sample hey output without zero-downtime configuration:
Status code distribution:
  [200] 49644 responses
Error distribution:
  [320] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: connection refused
  [18] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: no route to host
  [18] Get "http://114.215.XXX.XXX": dial tcp 114.215.XXX.XXX:80: connect: operation timed out
Sample hey output with zero-downtime configuration:
Status code distribution:
  [200] 50000 responses
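To compare runs without reading the full report, the hey summary can be reduced to two counts. The following is a minimal sketch that assumes the report uses the "Status code distribution:" and "Error distribution:" section headers shown in the sample outputs; summarize_hey and the hey.log file name are hypothetical.

```shell
# summarize_hey
# Reads a hey report on stdin and prints the number of HTTP 200 responses
# and the number of failed requests (non-200 responses plus connection errors).
# Hypothetical helper; relies on the section headers shown in the sample outputs.
summarize_hey() {
  awk '
    /Status code distribution:/ { section = "status"; next }
    /Error distribution:/       { section = "error";  next }
    $1 ~ /^\[[0-9]+\]$/ {
      n = $1
      gsub(/[^0-9]/, "", n)               # strip the surrounding brackets
      if (section == "status") {          # [code] count responses
        if (n == "200") ok += $2; else failed += $2
      } else if (section == "error") {    # [count] error message
        failed += n
      }
    }
    END { printf "ok=%d failed=%d\n", ok, failed }
  '
}

# Usage (assumes the report was saved first, for example with: hey ... | tee hey.log):
#   summarize_hey < hey.log
```

A zero-downtime run should report failed=0 and an ok count equal to the total number of requests sent.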
FAQ
Pod status: Running but not ready
Cause: This issue is usually caused by a failed startup or readiness probe.
Solution:
Readiness probe configuration: On the Edit page of the target workload, verify that the health check request path (for example, /healthz) and port match those that the application provides. If the application has a long startup time, increase the Unhealthy Threshold to avoid premature failures.
You can also temporarily disable the readiness probe, log on to the Pod's terminal or its host, and use a command, such as curl, to verify that the health check endpoint responds correctly.
Troubleshoot application issues: Investigate the issue by checking the Pod's Events and Logs. Select Show the log of the last container exit.