
Microservices Engine: Implement graceful start and shutdown for microservice applications

Last Updated: Mar 11, 2026

Releasing a new version of a microservice risks traffic loss at two stages: a terminating pod drops in-flight requests, and a freshly started pod receives full traffic before its connection pools and caches are ready.

Microservices Engine (MSE) solves both problems:

  • Graceful shutdown -- Drains in-flight requests before a pod terminates by deregistering the instance, waiting adaptively, and proactively notifying upstream callers. The pod shuts down only after all pending requests complete.

  • Graceful start (service prefetching) -- Gradually ramps traffic to a new pod over a configurable prefetching duration (default: 120 seconds), giving the application time to warm up before receiving full production load.

This tutorial deploys a demo Spring Cloud application on Container Service for Kubernetes (ACK) and shows how these features prevent traffic loss during pod scaling.

How it works

Graceful shutdown sequence

When a pod scales down, MSE runs the following sequence:

Step | Action | Purpose
1 | Deregister the instance from the Nacos service registry | Upstream callers stop routing new requests to the pod
2 | Wait adaptively for in-flight requests | MSE monitors active request counts and adjusts the wait period, rather than using a fixed timeout
3 | Notify upstream callers proactively | Eliminates the delay caused by registry polling intervals -- callers remove the instance from their local service lists immediately
4 | Allow the pod to terminate | Only after all requests have been processed

Without graceful shutdown, Kubernetes sends a SIGTERM and the pod begins terminating immediately while upstream callers may still route requests to it.
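
The adaptive wait in step 2 can only run for as long as Kubernetes lets the container live after SIGTERM. By default, Kubernetes follows SIGTERM with SIGKILL after a 30-second grace period. The fragment below is a minimal sketch, not part of the demo manifest, that shows where this window is set on a Deployment. The file name and the 60-second value are assumptions chosen for illustration; this tutorial does not state how long the MSE wait actually takes.

```yaml
# patch-grace-period.yaml (hypothetical file name)
# Strategic-merge patch fragment for a Deployment such as spring-cloud-b-gray.
spec:
  template:
    spec:
      # Time between SIGTERM and SIGKILL. The Kubernetes default is 30 seconds.
      # 60 is an illustrative assumption, not an MSE recommendation.
      terminationGracePeriodSeconds: 60
```

If a longer window is needed, a fragment like this could be applied with `kubectl patch deployment spring-cloud-b-gray --patch-file patch-grace-period.yaml`. Keeping it as a patch leaves the MSE labels and other settings of the Deployment untouched.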

Graceful start sequence

When a new pod starts and registers with the service registry, MSE controls the traffic shift:

Step | Action
1 | The new pod registers with Nacos and passes its readiness probe
2 | MSE assigns a low initial traffic weight to the pod
3 | Over the configured prefetching duration (default: 120 seconds), MSE gradually increases the traffic weight until the pod receives its full share

This staged ramp-up prevents cold-start failures in applications that initialize connection pools, thread pools, and local caches at startup. Without prefetching, the new pod immediately receives its proportional share of traffic and risks becoming overwhelmed.
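
Step 1 mentions a readiness probe, but the demo manifest later in this tutorial defines only a livenessProbe for each application. The fragment below is a hedged sketch of what a readiness probe for spring-cloud-c might look like. It mirrors the TCP check on port 20003 that the demo's liveness probe uses. The timing values are assumptions, and the fragment is not part of the original demo.

```yaml
# Hypothetical strategic-merge patch for the spring-cloud-c Deployment.
spec:
  template:
    spec:
      containers:
      - name: spring-cloud-c        # merge key: must match the existing container name
        readinessProbe:
          tcpSocket:
            port: 20003             # same port that the demo's livenessProbe checks
          initialDelaySeconds: 10   # assumed value, copied from the livenessProbe
          periodSeconds: 5          # assumed value
```

A TCP check only confirms that the port is open. It does not confirm that connection pools or caches are warm, which is the gap that the prefetching ramp-up is meant to cover.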

Prerequisites

Before you begin, make sure that you have:

Demo architecture

The demo deploys the following components:

Component | Role | Details
Zuul gateway | API gateway | Routes external traffic to backend services
Application A (spring-cloud-a) | Shopping cart | Base version + canary version
Application B (spring-cloud-b) | Transaction center | Base version (graceful shutdown disabled) + canary version (graceful shutdown enabled)
Application C (spring-cloud-c) | Inventory center | Service prefetching enabled, prefetching duration = 120 seconds
Nacos server | Service registry | Handles service registration and discovery for all applications

Application B is deployed with graceful shutdown disabled on the base version and enabled on the canary version. This creates a side-by-side comparison: during pod scaling, the base version loses traffic while the canary version handles scaling without errors.

Deploy the demo applications

Important

This demo uses Cron Horizontal Pod Autoscaler (CronHPA) to simulate scheduled scaling. Install the ack-kubernetes-cronhpa-controller component in your cluster before proceeding. For instructions, see the "Step 1: Install the CronHPA component" section in Use CronHPA for scheduled horizontal scaling.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the target cluster and click its name. In the left-side pane, choose Workloads > Deployments.

  3. On the Deployments page, click Create from YAML.

  4. Select Custom from the Sample Template drop-down list, paste the following YAML into Template, and then click Create. This YAML (mse-demo.yaml) deploys the Nacos server, the Zuul gateway, Applications A, B, and C (each with base and canary versions where applicable), CronHPA resources for scheduled scaling, and the required Kubernetes Services. Configuration highlights:

    Configuration | Setting | Effect
    Application B base version | micro.service.shutdown.server.enable=false | Explicitly disables graceful shutdown
    Application B canary version | MSE default settings | Graceful shutdown enabled automatically
    Application C | Service prefetching enabled | 120-second prefetching duration
    CronHPA | Scales between 1 and 2 replicas every 5 minutes | Simulates pod scaling events for Applications B and C

       # Nacos Server
       ---
       apiVersion: apps/v1
       kind: Deployment
       metadata:
         labels:
           app: nacos-server
         name: nacos-server
       spec:
         replicas: 1
         selector:
           matchLabels:
             app: nacos-server
         template:
           metadata:
             labels:
               app: nacos-server
               msePilotCreateAppName: nacos-server
               msePilotAutoEnable: "on"
           spec:
             containers:
             - env:
               - name: MODE
                 value: standalone
               image: registry.cn-shanghai.aliyuncs.com/yizhan/nacos-server:latest
               imagePullPolicy: Always
               name: nacos-server
               resources:
                 requests:
                   cpu: 250m
                   memory: 512Mi
             dnsPolicy: ClusterFirst
             restartPolicy: Always
    
       # Nacos Server Service
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: nacos-server
       spec:
         ports:
         - port: 8848
           protocol: TCP
           targetPort: 8848
         selector:
           app: nacos-server
         type: ClusterIP
    
       # Zuul Gateway
       ---
       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: spring-cloud-zuul
       spec:
         replicas: 1
         selector:
           matchLabels:
             app: spring-cloud-zuul
         template:
           metadata:
             labels:
               app: spring-cloud-zuul
               msePilotCreateAppName: spring-cloud-zuul
               msePilotAutoEnable: "on"
           spec:
             containers:
               - env:
                   - name: JAVA_HOME
                     value: /usr/lib/jvm/java-1.8-openjdk/jre
                   - name: LANG
                     value: C.UTF-8
                 image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-zuul:1.0.1
                 imagePullPolicy: Always
                 name: spring-cloud-zuul
                 ports:
                   - containerPort: 20000
    
       # Application A - Base version (end-to-end pass-through at the machine level)
       ---
       apiVersion: apps/v1
       kind: Deployment
       metadata:
         labels:
           app: spring-cloud-a
         name: spring-cloud-a
       spec:
         replicas: 2
         selector:
           matchLabels:
             app: spring-cloud-a
         template:
           metadata:
             labels:
               app: spring-cloud-a
               msePilotCreateAppName: spring-cloud-a
               msePilotAutoEnable: "on"
           spec:
             containers:
             - env:
               - name: LANG
                 value: C.UTF-8
               - name: JAVA_HOME
                 value: /usr/lib/jvm/java-1.8-openjdk/jre
               - name: profiler.micro.service.tag.trace.enable
                 value: "true"
               image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
               imagePullPolicy: Always
               name: spring-cloud-a
               ports:
               - containerPort: 20001
                 protocol: TCP
               resources:
                 requests:
                   cpu: 250m
                   memory: 512Mi
               livenessProbe:
                 tcpSocket:
                   port: 20001
                 initialDelaySeconds: 10
                 periodSeconds: 30
    
       # Application A - Canary version (end-to-end pass-through at the machine level)
       ---
       apiVersion: apps/v1
       kind: Deployment
       metadata:
         labels:
           app: spring-cloud-a-gray
         name: spring-cloud-a-gray
       spec:
         replicas: 2
         selector:
           matchLabels:
             app: spring-cloud-a-gray
         template:
           metadata:
             labels:
               alicloud.service.tag: gray
               app: spring-cloud-a-gray
               msePilotCreateAppName: spring-cloud-a
               msePilotAutoEnable: "on"
           spec:
             containers:
             - env:
               - name: LANG
                 value: C.UTF-8
               - name: JAVA_HOME
                 value: /usr/lib/jvm/java-1.8-openjdk/jre
               - name: profiler.micro.service.tag.trace.enable
                 value: "true"
               image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
               imagePullPolicy: Always
               name: spring-cloud-a-gray
               ports:
               - containerPort: 20001
                 protocol: TCP
               resources:
                 requests:
                   cpu: 250m
                   memory: 512Mi
               livenessProbe:
                 tcpSocket:
                   port: 20001
                 initialDelaySeconds: 10
                 periodSeconds: 30
    
       # Application B - Base version (graceful shutdown DISABLED)
       ---
       apiVersion: apps/v1
       kind: Deployment
       metadata:
         labels:
           app: spring-cloud-b
         name: spring-cloud-b
       spec:
         replicas: 2
         selector:
           matchLabels:
             app: spring-cloud-b
         template:
           metadata:
             labels:
               app: spring-cloud-b
               msePilotCreateAppName: spring-cloud-b
               msePilotAutoEnable: "on"
           spec:
             containers:
             - env:
               - name: LANG
                 value: C.UTF-8
               - name: JAVA_HOME
                 value: /usr/lib/jvm/java-1.8-openjdk/jre
               - name: micro.service.shutdown.server.enable
                 value: "false"
               - name: profiler.micro.service.http.server.enable
                 value: "false"
               image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
               imagePullPolicy: Always
               name: spring-cloud-b
               ports:
               - containerPort: 20002
                 protocol: TCP
               resources:
                 requests:
                   cpu: 250m
                   memory: 512Mi
               livenessProbe:
                 tcpSocket:
                   port: 20002
                 initialDelaySeconds: 10
                 periodSeconds: 30
    
       # Application B - Canary version (graceful shutdown ENABLED by default)
       ---
       apiVersion: apps/v1
       kind: Deployment
       metadata:
         labels:
           app: spring-cloud-b-gray
         name: spring-cloud-b-gray
       spec:
         replicas: 2
         selector:
           matchLabels:
             app: spring-cloud-b-gray
         template:
           metadata:
             labels:
               alicloud.service.tag: gray
               app: spring-cloud-b-gray
               msePilotCreateAppName: spring-cloud-b
               msePilotAutoEnable: "on"
           spec:
             containers:
             - env:
               - name: LANG
                 value: C.UTF-8
               - name: JAVA_HOME
                 value: /usr/lib/jvm/java-1.8-openjdk/jre
               image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
               imagePullPolicy: Always
               name: spring-cloud-b-gray
               ports:
               - containerPort: 20002
                 protocol: TCP
               resources:
                 requests:
                   cpu: 250m
                   memory: 512Mi
               livenessProbe:
                 tcpSocket:
                   port: 20002
                 initialDelaySeconds: 10
                 periodSeconds: 30
    
       # Application C
       ---
       apiVersion: apps/v1
       kind: Deployment
       metadata:
         labels:
           app: spring-cloud-c
         name: spring-cloud-c
       spec:
         replicas: 2
         selector:
           matchLabels:
             app: spring-cloud-c
         template:
           metadata:
             labels:
               app: spring-cloud-c
               msePilotCreateAppName: spring-cloud-c
               msePilotAutoEnable: "on"
           spec:
             containers:
             - env:
               - name: LANG
                 value: C.UTF-8
               - name: JAVA_HOME
                 value: /usr/lib/jvm/java-1.8-openjdk/jre
               image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-c:0.1-SNAPSHOT
               imagePullPolicy: Always
               name: spring-cloud-c
               ports:
               - containerPort: 20003
                 protocol: TCP
               resources:
                 requests:
                   cpu: 250m
                   memory: 512Mi
               livenessProbe:
                 tcpSocket:
                   port: 20003
                 initialDelaySeconds: 10
                 periodSeconds: 30
    
       # CronHPA - Application B base version
       ---
       apiVersion: autoscaling.alibabacloud.com/v1beta1
       kind: CronHorizontalPodAutoscaler
       metadata:
         labels:
           controller-tools.k8s.io: "1.0"
         name: spring-cloud-b
       spec:
         scaleTargetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: spring-cloud-b
         jobs:
         - name: "scale-down"
           schedule: "0 0/5 * * * *"
           targetSize: 1
         - name: "scale-up"
           schedule: "10 0/5 * * * *"
           targetSize: 2
    
       # CronHPA - Application B canary version
       ---
       apiVersion: autoscaling.alibabacloud.com/v1beta1
       kind: CronHorizontalPodAutoscaler
       metadata:
         labels:
           controller-tools.k8s.io: "1.0"
         name: spring-cloud-b-gray
       spec:
         scaleTargetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: spring-cloud-b-gray
         jobs:
         - name: "scale-down"
           schedule: "0 0/5 * * * *"
           targetSize: 1
         - name: "scale-up"
           schedule: "10 0/5 * * * *"
           targetSize: 2
    
       # CronHPA - Application C
       ---
       apiVersion: autoscaling.alibabacloud.com/v1beta1
       kind: CronHorizontalPodAutoscaler
       metadata:
         labels:
           controller-tools.k8s.io: "1.0"
         name: spring-cloud-c
       spec:
         scaleTargetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: spring-cloud-c
         jobs:
         - name: "scale-down"
           schedule: "0 2/5 * * * *"
           targetSize: 1
         - name: "scale-up"
           schedule: "10 2/5 * * * *"
           targetSize: 2
    
       # Zuul Gateway Service
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: zuul-slb
       spec:
         ports:
           - port: 80
             protocol: TCP
             targetPort: 20000
         selector:
           app: spring-cloud-zuul
         type: ClusterIP
    
       # Application A - Base version Service
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: spring-cloud-a-base
       spec:
         ports:
           - name: http
             port: 20001
             protocol: TCP
             targetPort: 20001
         selector:
           app: spring-cloud-a
    
       # Application A - Canary version Service
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: spring-cloud-a-gray
       spec:
         ports:
           - name: http
             port: 20001
             protocol: TCP
             targetPort: 20001
         selector:
           app: spring-cloud-a-gray
    
       # Nacos external Service
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: nacos-slb
       spec:
         ports:
         - port: 8848
           protocol: TCP
           targetPort: 8848
         selector:
           app: nacos-server
         type: LoadBalancer
  5. Enable Microservices Governance for the applications. For more information, see Enable Microservices Governance for Java microservice applications in an ACK or ACS cluster.

Verify graceful shutdown

After deployment, CronHPA scales both spring-cloud-b (graceful shutdown disabled) and spring-cloud-b-gray (graceful shutdown enabled) between 1 and 2 replicas every 5 minutes. This creates repeated scale-down events that let you compare traffic loss behavior.
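
The timing of these events comes from the CronHPA jobs in the demo manifest. The excerpt below repeats the jobs for spring-cloud-b with comments added. The comments assume the six-field cron layout with a leading seconds field, which is inferred here from the six fields in each schedule string; check the CronHPA documentation for the authoritative format.

```yaml
# Excerpt from the spring-cloud-b CronHPA in the demo manifest, annotated.
# Assumed field order: second minute hour day-of-month month day-of-week
jobs:
- name: "scale-down"
  schedule: "0 0/5 * * * *"    # second 0 of minutes 0, 5, 10, ...: shrink to 1 replica
  targetSize: 1
- name: "scale-up"
  schedule: "10 0/5 * * * *"   # second 10 of the same minutes: grow back to 2 replicas
  targetSize: 2
```

Read this way, each 5-minute cycle terminates one pod and starts a replacement about 10 seconds later, so every cycle exercises both the shutdown path and the startup path.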

View pod scaling activity

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Governance > Application Governance. Click the resource card of the spring-cloud-a application.

  3. On the Application Overview page, click the Pod Scaling tab to view scaling events and request error data.

Compare results

The pod scaling data shows a clear difference between the two versions:

  • spring-cloud-b (graceful shutdown disabled):

    [Figure: spring-cloud-b pod scaling]

  • spring-cloud-b-gray (graceful shutdown enabled):

    [Figure: spring-cloud-b-gray pod scaling]

The data reveals the following:

  • Canary version (graceful shutdown enabled on spring-cloud-b-gray): no request errors are returned for the spring-cloud-a-gray version during pod scaling. No traffic loss occurs.

  • Base version (graceful shutdown disabled on spring-cloud-b): errors are returned for 20 requests sent from spring-cloud-a to spring-cloud-b during pod scaling. Traffic loss occurs.

Enable and verify graceful start

CronHPA also scales spring-cloud-c between 1 and 2 replicas every 5 minutes. According to the schedule in the demo YAML, each cycle scales the Deployment down to one pod and, 10 seconds later, back up to two pods. To enable service prefetching for this application:

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Governance > Application Governance. Click the resource card of the spring-cloud-c application.

  3. In the left-side navigation pane, click Traffic management. On the Graceful Start/Shutdown tab, turn on Graceful Start.

  4. In the Prompt message dialog, click OK. The default prefetching duration is 120 seconds.

After you enable graceful start, traffic to the newly started pod increases gradually over the 120-second prefetching duration rather than spiking immediately:

[Figure: spring-cloud-c graceful start]

This staged ramp-up is useful for applications with slow startup characteristics -- those that initialize connection pools, populate caches, or load large datasets into memory. Gradual traffic distribution prevents the cold-start pod from being overwhelmed and avoids request failures during warm-up.

What's next