Releasing a new version of a microservice risks traffic loss at two stages: a terminating pod drops in-flight requests, and a freshly started pod receives full traffic before its connection pools and caches are ready.
Microservices Engine (MSE) solves both problems:
Graceful shutdown -- Drains in-flight requests before a pod terminates by deregistering the instance, waiting adaptively, and proactively notifying upstream callers. The pod shuts down only after all pending requests complete.
Graceful start (service prefetching) -- Gradually ramps traffic to a new pod over a configurable prefetching duration (default: 120 seconds), giving the application time to warm up before receiving full production load.
This tutorial deploys a demo Spring Cloud application on Container Service for Kubernetes (ACK) and shows how these features prevent traffic loss during pod scaling.
How it works
Graceful shutdown sequence
When a pod scales down, MSE runs the following sequence:
| Step | Action | Purpose |
|---|---|---|
| 1 | Deregister the instance from the Nacos service registry | Upstream callers stop routing new requests to the pod |
| 2 | Wait adaptively for in-flight requests | MSE monitors active request counts and adjusts the wait period, rather than using a fixed timeout |
| 3 | Notify upstream callers proactively | Eliminates the delay caused by registry polling intervals -- callers remove the instance from their local service lists immediately |
| 4 | Allow the pod to terminate | Only after all requests have been processed |
Without graceful shutdown, Kubernetes sends a SIGTERM and the pod begins terminating immediately while upstream callers may still route requests to it.
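The ordering of the four steps can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Pod`, `Caller`, and `graceful_shutdown` are hypothetical names invented for illustration, the registry is modeled as a plain set, and "one request completes per check" stands in for real request handling. It is not MSE's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    in_flight: int = 0  # requests currently being processed


@dataclass
class Caller:
    """Hypothetical upstream caller holding a local copy of the service list."""
    service_list: set = field(default_factory=set)

    def remove_instance(self, name: str) -> None:
        self.service_list.discard(name)


def graceful_shutdown(pod, registry, callers, max_checks=100):
    """Run the four steps in table order; return a log of what happened."""
    log = []

    # Step 1: deregister so the registry stops advertising the pod.
    registry.discard(pod.name)
    log.append("deregistered")

    # Step 2: wait only while requests are still in flight,
    # instead of sleeping for a fixed timeout.
    checks = 0
    while pod.in_flight > 0 and checks < max_checks:
        pod.in_flight -= 1  # simulation only: one request completes per check
        checks += 1
    log.append(f"drained after {checks} checks")

    # Step 3: push the removal to callers rather than waiting for their next poll.
    for caller in callers:
        caller.remove_instance(pod.name)
    log.append("callers notified")

    # Step 4: terminate only when nothing is pending.
    log.append("terminated" if pod.in_flight == 0 else "timed out")
    return log


registry = {"pod-1", "pod-2"}
callers = [Caller({"pod-1", "pod-2"})]
print(graceful_shutdown(Pod("pod-1", in_flight=3), registry, callers))
# ['deregistered', 'drained after 3 checks', 'callers notified', 'terminated']
```

The wait in step 2 ends as soon as the in-flight count reaches zero, which is the difference from a fixed timeout that the table describes.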
Graceful start sequence
When a new pod starts and registers with the service registry, MSE controls the traffic shift:
| Step | Action |
|---|---|
| 1 | The new pod registers with Nacos and passes its readiness probe |
| 2 | MSE assigns a low initial traffic weight to the pod |
| 3 | Over the configured prefetching duration (default: 120 seconds), MSE gradually increases the traffic weight until the pod receives its full share |
This staged ramp-up prevents cold-start failures in applications that initialize connection pools, thread pools, and local caches at startup. Without prefetching, the new pod immediately receives its proportional share of traffic and risks becoming overwhelmed.
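The ramp can be pictured as a function from elapsed time to traffic weight. The sketch below assumes a linear ramp and a hypothetical starting weight of 0.1. The text above gives only the 120-second default duration, not the curve or the initial weight that MSE uses, so `warmup_weight` and `min_weight` are illustrative names and values.

```python
def warmup_weight(elapsed_s: float, duration_s: float = 120.0,
                  min_weight: float = 0.1) -> float:
    """Relative traffic weight of a new pod, between min_weight and 1.0.

    Assumption: the weight grows linearly over the prefetching duration.
    """
    if duration_s <= 0 or elapsed_s >= duration_s:
        return 1.0  # warm-up finished: the pod takes its full share
    progress = max(elapsed_s, 0.0) / duration_s
    return min_weight + (1.0 - min_weight) * progress


for t in (0, 30, 60, 120):
    print(t, round(warmup_weight(t), 3))
```

Under these assumptions the pod carries about half of its eventual weight 60 seconds after registration and the full weight from 120 seconds onward.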
Prerequisites
Before you begin, make sure that you have:
An ACK managed cluster. For more information, see Create an ACK managed cluster.
Microservices Governance activated. For more information, see Activate Microservices Governance.
Demo architecture
The demo deploys the following components:
| Component | Role | Details |
|---|---|---|
| Zuul gateway | API gateway | Routes external traffic to backend services |
| Application A (spring-cloud-a) | Shopping cart | Base version + canary version |
| Application B (spring-cloud-b) | Transaction center | Base version (graceful shutdown disabled) + canary version (graceful shutdown enabled) |
| Application C (spring-cloud-c) | Inventory center | Service prefetching enabled, prefetching duration = 120 seconds |
| Nacos server | Service registry | Handles service registration and discovery for all applications |
Application B is deployed with graceful shutdown disabled on the base version and enabled on the canary version. This creates a side-by-side comparison: during pod scaling, the base version loses traffic while the canary version handles scaling without errors.
Deploy the demo applications
This demo uses Cron Horizontal Pod Autoscaler (CronHPA) to simulate scheduled scaling. Install the ack-kubernetes-cronhpa-controller component in your cluster before proceeding. For instructions, see the "Step 1: Install the CronHPA component" section in Use CronHPA for scheduled horizontal scaling.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the target cluster and click its name. In the left-side pane, choose Workloads > Deployments.
On the Deployments page, click Create from YAML.
Select Custom from the Sample Template drop-down list, paste the following YAML into Template, and then click Create. This YAML (mse-demo.yaml) deploys the Nacos server, the Zuul gateway, Applications A, B, and C (each with base and canary versions where applicable), CronHPA resources for scheduled scaling, and the required Kubernetes Services. Configuration highlights:
| Configuration | Setting | Effect |
|---|---|---|
| Application B base version | micro.service.shutdown.server.enable=false | Explicitly disables graceful shutdown |
| Application B canary version | MSE default settings | Graceful shutdown enabled automatically |
| Application C | Service prefetching enabled | 120-second prefetching duration |
| CronHPA | Scales between 1 and 2 replicas every 5 minutes | Simulates pod scaling events for Applications B and C |
Enable Microservices Governance for the applications. For more information, see Enable Microservices Governance for Java microservice applications in an ACK or ACS cluster.
Verify graceful shutdown
After deployment, CronHPA scales both spring-cloud-b (graceful shutdown disabled) and spring-cloud-b-gray (graceful shutdown enabled) between 1 and 2 replicas every 5 minutes. This creates repeated scale-down events that let you compare traffic loss behavior.
View pod scaling activity
Log on to the MSE console and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance. Click the resource card of the spring-cloud-a application.
On the Application Overview page, click the Pod Scaling tab to view scaling events and request error data.
Compare results
The pod scaling data shows a clear difference between the two versions:
spring-cloud-b (graceful shutdown disabled):

spring-cloud-b-gray (graceful shutdown enabled):

The data reveals the following:
Canary version: no request errors are returned for spring-cloud-a-gray during pod scaling, so no traffic loss occurs.
Base version: graceful shutdown is disabled for spring-cloud-b, and 20 requests sent from spring-cloud-a to spring-cloud-b return errors during pod scaling, so traffic loss occurs.
Enable and verify graceful start
CronHPA also scales spring-cloud-c between 1 and 2 replicas every 5 minutes, leaving one pod available at the 60th second and two pods available at the 70th second. To enable service prefetching for this application:
Log on to the MSE console and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance. Click the resource card of the spring-cloud-c application.
In the left-side navigation pane, click Traffic management. On the Graceful Start/Shutdown tab, turn on Graceful Start.
In the Prompt message dialog, click OK. The default prefetching duration is 120 seconds.
After you enable graceful start, traffic to the newly started pod increases gradually over the 120-second prefetching duration rather than spiking immediately.

This staged ramp-up is useful for applications with slow startup characteristics -- those that initialize connection pools, populate caches, or load large datasets into memory. Gradual traffic distribution prevents the cold-start pod from being overwhelmed and avoids request failures during warm-up.
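To see what a reduced weight means for the split between an already warm pod and a new one, the following sketch computes each pod's share of requests from its weight and picks an instance by weighted random choice. The pod names, the weight values, and the functions `traffic_share` and `pick_instance` are hypothetical. How MSE's callers actually select instances is not described here.

```python
import random


def traffic_share(weights: dict) -> dict:
    """Fraction of requests each instance is expected to receive."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_instance(weights: dict, rng: random.Random) -> str:
    """Choose one instance at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical moment early in warm-up: the new pod has a quarter of full weight.
weights = {"warm-pod": 1.0, "new-pod": 0.25}
print(traffic_share(weights))  # {'warm-pod': 0.8, 'new-pod': 0.2}

rng = random.Random(42)  # seeded so repeated runs pick the same sequence
print([pick_instance(weights, rng) for _ in range(5)])
```

As the new pod's weight approaches 1.0, the two shares converge on an even split, which matches the end state of the ramp described above.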