Implement graceful start and shutdown of microservice applications by using MSE to prevent traffic loss

Applications that process a large number of concurrent requests are usually released during off-peak hours. This prevents traffic loss during the release but increases O&M costs, especially when unexpected situations occur. To address these issues, Microservices Engine (MSE) supports the graceful start and shutdown of applications during version releases. For graceful shutdown, MSE provides adaptive waits and proactive notifications. These ensure that an application is shut down only after all in-flight requests are processed, which prevents unexpected service downtime. For graceful start, MSE uses readiness probes and aligns the lifecycle stages of a microservice application with the stages of its release. This ensures that the new version of the application runs stably.

Prerequisites

A Container Service for Kubernetes (ACK) cluster is created. For more information, see Create an ACK managed cluster.
Microservices Governance is activated. For more information, see Activate Microservices Governance.

Demo architecture

In this example, the application architecture consists of a Zuul gateway and backend Spring Cloud applications. Backend service calls involve three applications: a shopping cart (Application A), a transaction center (Application B), and an inventory center (Application C). An MSE Nacos instance is provided for service registration and discovery of the applications.

The spring-cloud-zuul application initiates service calls to both the canary version and base version of the spring-cloud-a application at 100 queries per second (QPS).

Deploy demo applications and enable Microservices Governance for the applications

Important

In this demo, a Cron Horizontal Pod Autoscaler (CronHPA) is used. Therefore, you must install the ack-kubernetes-cronhpa-controller component in the cluster before you deploy demo applications. For more information, see the "Install the CronHPA controller" section in CronHPA.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage and choose Workloads > Deployments in the left-side navigation pane.
On the Deployments page, click Create from YAML.

Select Custom from the Sample Template drop-down list, enter the following YAML code in Template, and then click Create.

The demo file in this example is named mse-demo.yaml. In this example, a Zuul gateway and Applications A, B, and C are deployed. A base version and a canary version are deployed for Application A and Application B. For Application B, the graceful shutdown feature is disabled for the base version and is enabled for the canary version. For Application C, the service prefetching feature is enabled and the prefetching duration is 120 seconds.


# Nacos Server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nacos-server
  name: nacos-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nacos-server
  template:
    metadata:
      labels:
        app: nacos-server
        msePilotCreateAppName: nacos-server
        msePilotAutoEnable: "on"
    spec:
      containers:
      - env:
        - name: MODE
          value: standalone
        image: registry.cn-shanghai.aliyuncs.com/yizhan/nacos-server:latest
        imagePullPolicy: Always
        name: nacos-server
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
      dnsPolicy: ClusterFirst
      restartPolicy: Always

# The configuration of the nacos-server service.
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-server
spec:
  ports:
  - port: 8848
    protocol: TCP
    targetPort: 8848
  selector:
    app: nacos-server
  type: ClusterIP

# The spring-cloud-zuul application.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-zuul
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-cloud-zuul
  template:
    metadata:
      labels:
        app: spring-cloud-zuul
        msePilotCreateAppName: spring-cloud-zuul
        msePilotAutoEnable: "on"
    spec:
      containers:
        - env:
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
            - name: LANG
              value: C.UTF-8
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-zuul:1.0.1
          imagePullPolicy: Always
          name: spring-cloud-zuul
          ports:
            - containerPort: 20000

# Enable end-to-end pass-through at the machine level for the base version of Application A. 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-a
  name: spring-cloud-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata: 
      labels:
        app: spring-cloud-a
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: "on"
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: profiler.micro.service.tag.trace.enable
          value: "true"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30

# Enable end-to-end pass-through at the machine level for the canary version of Application A. 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-a-gray
  name: spring-cloud-a-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a-gray
  template:
    metadata:
      labels:
        alicloud.service.tag: gray
        app: spring-cloud-a-gray
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: "on"
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: profiler.micro.service.tag.trace.enable
          value: "true"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a-gray
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30

# Disable graceful shutdown for the base version of Application B. 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-b
  name: spring-cloud-b
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b
  template:
    metadata:
      labels:
        app: spring-cloud-b
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: micro.service.shutdown.server.enable
          value: "false"
        - name: profiler.micro.service.http.server.enable
          value: "false"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b
        ports:
        - containerPort: 20002
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30

# By default, graceful shutdown is enabled for the canary version of Application B. 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-b-gray
  name: spring-cloud-b-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b-gray
  template:
    metadata:
      labels:
        alicloud.service.tag: gray
        app: spring-cloud-b-gray
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b-gray
        ports:
        - containerPort: 20002
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
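        lifecycle:
          # Before the container stops, call the local offline endpoint and then
          # wait 30 seconds so that in-flight requests can finish.
          preStop:
            exec:
              command:
                - /bin/sh
                - '-c'
                - >-
                  wget http://127.0.0.1:54199/offline 2>/tmp/null;sleep
                  30;exit 0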
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30

# The base version of Application C.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-c
  name: spring-cloud-c
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      labels:
        app: spring-cloud-c
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: "on"
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-c:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-c
        ports:
        - containerPort: 20003
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 10
          periodSeconds: 30

# The CronHPA configurations for Application B (base and canary versions) and Application C.
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: spring-cloud-b
spec:
   scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-b
   jobs:
   - name: "scale-down"
     schedule: "0 0/5 * * * *"
     targetSize: 1
   - name: "scale-up"
     schedule: "10 0/5 * * * *"
     targetSize: 2
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: spring-cloud-b-gray
spec:
   scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-b-gray
   jobs:
   - name: "scale-down"
     schedule: "0 0/5 * * * *"
     targetSize: 1
   - name: "scale-up"
     schedule: "10 0/5 * * * *"
     targetSize: 2
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: spring-cloud-c
spec:
   scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-c
   jobs:
   - name: "scale-down"
     schedule: "0 2/5 * * * *"
     targetSize: 1
   - name: "scale-up"
     schedule: "10 2/5 * * * *"
     targetSize: 2


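# The Service for the spring-cloud-zuul application. The ClusterIP type exposes the application only inside the cluster. Change the type to LoadBalancer if you want a Server Load Balancer (SLB) instance to be created.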
---
apiVersion: v1
kind: Service
metadata:
  name: zuul-slb
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 20000
  selector:
    app: spring-cloud-zuul
  type: ClusterIP

# Expose the base and canary versions of Application A as Kubernetes Services.
---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-base
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a

---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-gray
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a-gray

# The configuration of the nacos-slb service.
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-slb
spec:
  ports:
  - port: 8848
    protocol: TCP
    targetPort: 8848
  selector:
    app: nacos-server
  type: LoadBalancer

Enable Microservices Governance for the applications. For more information, see Enable Microservices Governance for microservice applications in an ACK cluster.

View visibility data

A CronHPA is configured for both spring-cloud-b and spring-cloud-b-gray to simulate scale-in and scale-out activities every 5 minutes. To view the scaling records in the ACK console, click the application name and then click the Pod Scaling tab.

spring-cloud-b
spring-cloud-b-gray

Log on to the MSE console, and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance. On the page that appears, click the resource card of the spring-cloud-a application.
On the Application overview page, view the relevant visibility data of the application.
The data shows the following results:
- No request errors are returned for the spring-cloud-a-gray version during pod scaling, so no traffic loss occurs. This version calls the canary version of spring-cloud-b, for which graceful shutdown is enabled.
- Errors are returned for 20 requests that are sent from the spring-cloud-a version to spring-cloud-b during pod scaling, so traffic loss occurs. This version calls the base version of spring-cloud-b, for which graceful shutdown is disabled.
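
The difference between the two versions can be illustrated with a minimal Python sketch of a graceful shutdown sequence: the instance first stops accepting new requests and then waits for in-flight requests to finish before it exits. The DrainingServer class and its method names are hypothetical and are not part of MSE. The sketch only mimics the idea of an adaptive wait inside a single process and does not model how MSE notifies consumers.

```python
import threading
import time


class DrainingServer:
    """Toy provider that tracks in-flight requests so that it can drain before exit."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self.accepting = True  # set to False once the instance is taken offline

    def handle(self, work_seconds):
        """Process one request. Return False if the instance is already offline."""
        with self._lock:
            if not self.accepting:
                return False
            self._in_flight += 1
        try:
            time.sleep(work_seconds)  # stand-in for real request processing
            return True
        finally:
            with self._lock:
                self._in_flight -= 1

    def shutdown(self, max_wait_seconds, poll_seconds=0.01):
        """Stop accepting traffic, then wait until in-flight requests finish.

        Return True if the instance drained before the deadline.
        """
        with self._lock:
            self.accepting = False  # step 1: leave the pool
        deadline = time.monotonic() + max_wait_seconds
        while time.monotonic() < deadline:  # step 2: wait only as long as needed
            with self._lock:
                if self._in_flight == 0:
                    return True
            time.sleep(poll_seconds)
        with self._lock:
            return self._in_flight == 0
```

With this ordering, a request that is already being processed when shutdown() is called completes normally, and a request that arrives afterwards is rejected immediately instead of failing halfway. If the process exited without the wait, requests that are still in flight would fail, which corresponds to the errors observed for the version without graceful shutdown.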

Enable graceful start

A CronHPA is configured for spring-cloud-c to simulate application startup. Scaling is performed every 5 minutes: the CronHPA scales the application in to one pod and then scales it out to two pods 10 seconds later.
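
The CronHPA jobs in mse-demo.yaml use six-field schedules such as "0 2/5 * * * *". The following Python sketch expands the seconds and minutes fields to show when a job fires within each hour. It assumes that the fields are second, minute, hour, day of month, month, and day of week, and that N/step means "start at N, then repeat every step". The function names are hypothetical, and the parser covers only the forms that appear in the demo.

```python
def expand_field(field, upper):
    """Expand one cron field ('*', 'N', '*/step' or 'N/step') into values in [0, upper)."""
    base, _, step = field.partition("/")
    start = 0 if base == "*" else int(base)
    if not step:
        # A bare '*' matches every value; a bare number matches only itself.
        return list(range(upper)) if base == "*" else [start]
    return list(range(start, upper, int(step)))


def firing_offsets(schedule):
    """Return the second-of-hour offsets at which a six-field schedule fires.

    Only the seconds and minutes fields are interpreted. The other four
    fields must be '*', as they are in the demo.
    """
    seconds, minutes, *rest = schedule.split()
    if rest != ["*"] * 4:
        raise ValueError("this sketch only handles '*' for hour, day, month, and weekday")
    return [m * 60 + s
            for m in expand_field(minutes, 60)
            for s in expand_field(seconds, 60)]
```

For spring-cloud-c, firing_offsets("0 2/5 * * * *") starts with 120, 420, 720 and firing_offsets("10 2/5 * * * *") starts with 130, 430, 730. Under these assumptions, each 5-minute cycle scales the Deployment in and then scales it out again 10 seconds later.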

Log on to the MSE console, and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance. On the page that appears, click the resource card of the spring-cloud-c application.
In the left-side navigation pane, click Traffic management. On the Graceful Start/Shutdown tab, turn on Graceful Start. In the Prompt message that appears, click OK. The default prefetching duration is 120 seconds.
After an application for which service prefetching is enabled restarts, the traffic that is routed to the application increases gradually over the prefetching duration instead of arriving all at once. Enable service prefetching for applications that start slowly because resources such as connection pools and caches must be initialized in advance. This way, the resources are created in an orderly manner, the application starts safely, and no traffic loss occurs.
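
The gradual increase can be pictured as a routing weight that grows with the uptime of the new instance. The following Python sketch assumes a linear ramp over the 120-second prefetching duration. The functions warmup_weight and traffic_share are hypothetical names, and the curve that MSE actually applies is not described here and may differ.

```python
def warmup_weight(uptime_seconds, warmup_seconds=120, full_weight=100):
    """Return the routing weight of an instance that started uptime_seconds ago.

    Assumes a linear ramp. This is an illustration, not the MSE algorithm.
    """
    if uptime_seconds <= 0:
        return 1  # keep a minimal weight so the instance still gets a trickle of traffic
    if uptime_seconds >= warmup_seconds:
        return full_weight
    return max(1, int(full_weight * uptime_seconds / warmup_seconds))


def traffic_share(new_instance_uptime, warmed_instances=1, warmup_seconds=120):
    """Fraction of requests sent to one new instance that runs next to warmed peers."""
    new = warmup_weight(new_instance_uptime, warmup_seconds)
    total = new + warmed_instances * warmup_weight(warmup_seconds, warmup_seconds)
    return new / total
```

Under this assumption, with two replicas as in the spring-cloud-c Deployment, a new pod receives about 20% of the requests 30 seconds after startup and reaches an even 50% share once the 120-second prefetching duration has elapsed.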