Alibaba Cloud Service Mesh FAQ (2): Use ASM to Implement Service Slow-Start Mode to Support the Warm-Up

Part 2 of this series introduces the ASM warm-up feature and explains how to implement it step by step.

By Xining Wang

When the slow-start warm-up capability is not enabled, the requester sends a proportional share of traffic to a new target pod as soon as the pod joins. Traffic to new pods cannot be increased progressively. This can be undesirable for services that need warm-up time before they can handle the full load, and it may result in request timeouts, data loss, and a degraded user experience.

As a practical example, the problem shows up in JVM-based web applications that use horizontal pod autoscaling. When a new instance starts, it is flooded with a large number of requests, which causes timeouts while the application is still warming up. As a result, each time the service scales out, data is lost or the response time for these requests increases. The basic principle of warm-up is to expose newly started instances to traffic gradually.

An Introduction to the ASM Warm-Up Feature

ASM supports slow-start warm-up as of version 1.14. In brief:

  • Slow-start mode (also known as progressive traffic increase) helps overcome the preceding problems. Users can configure a period for their service so that whenever a service instance starts, the requester sends only a portion of the request load to the instance and gradually increases the request volume over the configured period. When the slow-start window has elapsed, the instance exits slow-start mode.
  • Slow-start mode prevents newly added target service pods from being overwhelmed by a large number of requests. New pods warm up over the specified ramp-up period before they receive their full share of requests under the load balancing policy.
  • Slow-start is useful for applications that rely on caching and require a warm-up period to respond to requests with optimal performance.
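The progressive increase described above can be pictured as a weight factor that grows over the slow-start window. The following is a minimal sketch, not ASM's actual implementation: it assumes a linear ramp with a 10% floor, modeled on the slow-start formula documented for the Envoy proxy. The function name weight_factor and the default floor are assumptions made for illustration.

```shell
# Sketch of a linear slow-start ramp (assumed formula, not taken from ASM).
# Usage: weight_factor ELAPSED_SECONDS WINDOW_SECONDS [MIN_PERCENT]
weight_factor() {
  awk -v t="$1" -v w="$2" -v m="${3:-10}" 'BEGIN {
    if (t < 1) t = 1               # treat the first second as t = 1
    f = t / w                      # linear ramp over the window
    if (f > 1) f = 1               # full weight once the window has elapsed
    if (f < m / 100) f = m / 100   # assumed floor so new pods still get some traffic
    printf "%.2f\n", f
  }'
}

# Print the factor at a few points of a 120s window.
for t in 0 30 60 120; do
  echo "t=${t}s factor=$(weight_factor "$t" 120)"
done
```

Under these assumptions, a new pod starts at one tenth of its normal weight, reaches half weight after 60s, and receives its full share once the 120s window ends.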

In ASM, you only need to configure trafficPolicy/loadBalancer in the DestinationRule corresponding to the service.


  • The loadBalancer type must be ROUND_ROBIN or LEAST_REQUEST.
  • warmupDurationSecs indicates the warm-up duration of the service. If it is set, a newly created service endpoint remains in warm-up mode for this duration, counted from its creation time, and Istio gradually increases the traffic to the endpoint instead of immediately sending its proportional share.


apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: mocka
spec:
  host: mocka
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
      warmupDurationSecs: 100s
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2


Step 1: Configure Routing Rules and Confirm that the Application Runs

To simplify the demonstration, this example scales the reviews-v2 and reviews-v3 Deployments in the Kubernetes cluster down to 0 replicas.
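One way to change the replica counts is kubectl scale. The following is a minimal sketch, assuming the Deployments are named reviews-v2 and reviews-v3 and live in the current namespace. The helper name scale_reviews and the KUBECTL override variable are hypothetical and only serve this example.

```shell
# Hypothetical helper: set the replica count of one reviews Deployment.
# Usage: scale_reviews VERSION REPLICAS
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
scale_reviews() {
  ${KUBECTL:-kubectl} scale deployment "reviews-$1" --replicas="$2"
}
```

For example, running scale_reviews v2 0 and scale_reviews v3 0 scales both versions down, and scale_reviews v2 1 brings reviews-v2 back when it is needed in the later steps.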

Create the following rule configurations in ASM.

  • Configurations used to define access ingress:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - "*"
  gateways:
  - bookinfo-gateway
  http:
  - match:
    - uri:
        exact: /productpage
    - uri:
        prefix: /static
    - uri:
        exact: /login
    - uri:
        exact: /logout
    - uri:
        prefix: /api/v1/products
    route:
    - destination:
        host: productpage
        port:
          number: 9080
  • Define the configurations of the reviews service:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
    - labels:
        version: v1
      name: v1
    - labels:
        version: v2
      name: v2
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews

After the mesh topology is enabled, send requests continuously to the ingress gateway address. For example, use the hey command to send stress-test requests for ten seconds.

To download and install the hey command, see https://github.com/rakyll/hey

hey -z 10s http://${gateway address}/productpage

The following figure shows the normal call topology. Please visit this link for more information.


Step 2: View the Pod Startup Based on the Observability Data

Select the target ASM instance in the ASM console. Choose Prometheus Monitoring under Observability Management in the left-side navigation pane. On the right side of the page, select Mesh Service-level Monitoring and select the reviews service.

With the warm-up feature still disabled, scale the Deployment named reviews-v2 in the Kubernetes cluster from 0 to 1 replica. Run the hey command to send stress-test requests to the ingress gateway, and watch the dashboard data collected by Prometheus. It takes about 15s for the reviews-v2 Pod to receive its balanced share of requests (the exact time depends on the stress testing environment).


Then, in the Kubernetes cluster, scale the Deployment named reviews-v2 back down to 0 replicas. Wait about 1 minute, and then enable the warm-up feature.

Step 3: Enable the Warm-Up Feature

Update the DestinationRule named reviews and set warmupDurationSecs to 120s, which sets the warm-up duration to 120 seconds.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
    - labels:
        version: v1
      name: v1
    - labels:
        version: v2
      name: v2
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
      warmupDurationSecs: 120s

Step 4: View the Warm-Up Effect through Observability Data

Similarly, with the warm-up feature enabled, scale the Deployment named reviews-v2 in the Kubernetes cluster from 0 to 1 replica. Run the hey command to send stress-test requests to the ingress gateway, and watch the dashboard data collected by Prometheus. It takes about 45s for the reviews-v2 Pod to receive balanced requests (the exact time depends on the stress testing environment).


When the warm-up feature is enabled, each time a service instance starts, the requester sends only a portion of the request load to the instance and gradually increases the request volume within the configured period. When the slow-start window has elapsed, the instance exits slow-start mode.

After the warm-up feature is enabled, it takes about 150s for traffic to be evenly distributed between v1 and v2.


Step 5: View the Warm-Up Effect by Proxy Counter

Before you view the service warm-up effect, perform the following two steps:

1.  Run the following command to clear the existing counter data so that only new requests are counted:

## Sidecar Proxy Counters in Pods of reviews-v1 Version
kubectl exec reviews-v1-55b668fc65-jhxlj  -c istio-proxy -- curl localhost:15000/reset_counters -X POST

Run the following command to check the number of successful requests counted by the sidecar proxy. After the counters are cleared, the value should be 0.

kubectl exec reviews-v1-55b668fc65-jhxlj  -c istio-proxy -- curl localhost:15000/stats |grep inbound |grep upstream_rq_200

The output is similar to the following:

cluster.inbound|8000||.upstream_rq_200: 0

2.  Scale the Deployment named reviews-v2 in the Kubernetes cluster down to 0 replicas.

Then, scale the Deployment named reviews-v2 from 0 back to 1 replica and immediately run the hey command to send stress-test requests to the ingress gateway for 20s.

hey -z 20s http://${gateway address}/productpage
Status code distribution:
  [200]    3260 responses

This indicates that the hey command sent 3260 requests within the 20s.

Then compare the reviews-v1 and reviews-v2 pods. The reviews-v2 pod receives only a small proportion of the requests.

Run the following command to check how many requests the sidecar proxy in the reviews-v1 pod processed successfully.

kubectl exec reviews-v1-55b668fc65-jhxlj  -c istio-proxy -- curl localhost:15000/stats |grep inbound |grep upstream_rq_200
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20873    0 20873    0     0  19.9M      0 --:--:-- --:--:-- --:--:-- 19.9M
cluster.inbound|9080||.external.upstream_rq_200: 2927
cluster.inbound|9080||.upstream_rq_200: 2927

Run the following command to check how many requests the sidecar proxy in the reviews-v2 pod processed successfully.

kubectl exec reviews-v2-858f99c99-j6jww   -c istio-proxy -- curl localhost:15000/stats |grep inbound |grep upstream_rq_200
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 30149    0 30149    0     0  28.7M      0 --:--:-- --:--:-- --:--:-- 28.7M
cluster.inbound|9080||.external.upstream_rq_200: 333
cluster.inbound|9080||.upstream_rq_200: 333

According to these statistics, during the first 20s of the 120s slow-start window, the reviews-v2 Pod processed only a small share of the requests (about 10%).
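The share can be checked directly from the counter values shown above. The following is a minimal sketch: the helper names rq200_count and share are hypothetical, and the sample input lines are copied from the outputs in this step rather than read from a live cluster.

```shell
# Print the inbound upstream_rq_200 count from Envoy stats text on stdin.
# The ".external." variant is skipped so the same requests are not read twice.
rq200_count() {
  grep 'inbound' | grep 'upstream_rq_200' | grep -v 'external' | awk -F': ' '{ print $2 }'
}

# Print the percentage of all requests handled by the first pod.
# Usage: share COUNT_OF_POD COUNT_OF_OTHER_POD
share() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / (a + b) }'
}

# Sample lines copied from the reviews-v1 and reviews-v2 outputs above.
v1=$(printf 'cluster.inbound|9080||.external.upstream_rq_200: 2927\ncluster.inbound|9080||.upstream_rq_200: 2927\n' | rq200_count)
v2=$(printf 'cluster.inbound|9080||.external.upstream_rq_200: 333\ncluster.inbound|9080||.upstream_rq_200: 333\n' | rq200_count)

echo "reviews-v2 share: $(share "$v2" "$v1")%"
```

With 333 requests on reviews-v2 and 2927 on reviews-v1, the two counters add up to the 3260 requests reported by hey, and the reviews-v2 share comes out at 10.2%. Against a live pod, the same rq200_count helper can be fed from the kubectl exec command used in this step.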

Next, test again after the slow-start window has elapsed. First, clear the reviews-v1 and reviews-v2 counters again by running the following commands:

## Sidecar Proxy Counters in Pods of reviews-v1 Version
kubectl exec reviews-v1-55b668fc65-jhxlj  -c istio-proxy -- curl localhost:15000/reset_counters -X POST
## Sidecar Proxy Counters in Pods of reviews-v2 Version
kubectl exec reviews-v2-858f99c99-j6jww  -c istio-proxy -- curl localhost:15000/reset_counters -X POST

Then, run the hey command to send stress-test traffic for 20s, and run the following commands again to check how many requests the sidecar proxies in reviews-v1 and reviews-v2 processed successfully.

kubectl exec reviews-v1-55b668fc65-jhxlj  -c istio-proxy -- curl localhost:15000/stats |grep inbound |grep upstream_rq_200
cluster.inbound|9080||.external.upstream_rq_200: 1600
cluster.inbound|9080||.upstream_rq_200: 1600
kubectl exec reviews-v2-858f99c99-j6jww   -c istio-proxy -- curl localhost:15000/stats |grep inbound |grep upstream_rq_200
cluster.inbound|9080||.external.upstream_rq_200: 1600
cluster.inbound|9080||.upstream_rq_200: 1600

You can see that once the slow-start window has elapsed, reviews-v1 and reviews-v2 receive and process an equal number of requests.

