Use Mixerless Telemetry to automatically scale the pods of an application - Alibaba Cloud Service Mesh

The Mixerless Telemetry technology of Service Mesh (ASM) allows you to obtain telemetry data of application containers in a non-intrusive manner. If application performance fluctuates or resource utilization is unbalanced, you can use this telemetry data to scale your applications. Prometheus collects key metrics of an application, such as the number of requests, the average latency of requests, and the P99 latency of requests. Horizontal Pod Autoscalers (HPAs) then adjust the number of pods based on these real-time metrics, which keeps application performance stable under load fluctuations and improves resource utilization.

Prerequisites

Application metrics are collected by Prometheus. For more information, see Use Mixerless Telemetry to observe ASM instances.

Step 1: Deploy a metrics adapter and a Flagger load tester

Use kubectl to connect to your cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

Run the following command to deploy a metrics adapter:

Note

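To obtain the complete script of a metrics adapter, visit GitHub. In the following command, $KUBE_METRICS_ADAPTER_SRC is assumed to be the local path of the downloaded kube-metrics-adapter source code.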

helm --kubeconfig <Path of the kubeconfig file> -n kube-system install asm-custom-metrics \
  $KUBE_METRICS_ADAPTER_SRC/deploy/charts/kube-metrics-adapter \
  --set prometheus.url=http://prometheus.istio-system.svc:9090

Verify whether the metrics adapter is deployed as expected.

Run the following command to view the pod of the metrics adapter:

kubectl --kubeconfig <Path of the kubeconfig file> get po -n kube-system | grep metrics-adapter

Expected output:

asm-custom-metrics-kube-metrics-adapter-6fb4949988-ht8pv   1/1     Running     0          30s

Run the following command to check that the autoscaling/v2beta API versions are available in the cluster:

kubectl --kubeconfig <Path of the kubeconfig file> api-versions | grep "autoscaling/v2beta"

Expected output:

autoscaling/v2beta1
autoscaling/v2beta2

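Run the following command to query the external metrics API that the metrics adapter serves. The resources list stays empty until an HPA that references an external metric is deployed: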

kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output:

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}

Deploy a Flagger load tester.

Download the required YAML files of the Flagger load tester. For more information, visit GitHub.

Run the following commands to deploy the Flagger load tester:

kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/deployment.yaml -n test
kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/service.yaml -n test

Step 2: Create different HPAs based on your business requirements

Create an HPA to scale the pods of an application based on the istio_requests_total metric, which counts the requests that are sent to the application.

Use the following content to create the requests_total_hpa.yaml file:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-total
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(rate(istio_requests_total{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second
        target:
          type: AverageValue
          averageValue: "10"

- annotations: The annotation defines the Prometheus query named processed-requests-per-second, which calculates the per-second request rate from the istio_requests_total metric. The HPA scales the pods of the application based on the result of this query.
- target: In this example, the averageValue parameter is set to 10. If the number of requests per second, averaged across the pods of the application, rises above 10, the HPA automatically scales out the pods of the application.
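The scaling decision for an AverageValue target comes down to simple arithmetic. The following is a minimal sketch, assuming the standard Kubernetes HPA behavior: the total metric value is divided by the target average and rounded up, deviations within a tolerance of 10% (assumed here as the default) are ignored, and the result is clamped to minReplicas and maxReplicas. The function name desired_replicas and its arguments are hypothetical and serve only as an illustration.

```shell
# Hypothetical helper: estimate the replica count that an HPA would choose
# for an External metric with an AverageValue target.
# Usage: desired_replicas <total_metric> <target_average> <current> <min> <max>
desired_replicas() {
  awk -v total="$1" -v target="$2" -v cur="$3" -v min="$4" -v max="$5" 'BEGIN {
    tolerance = 0.1                     # assumed default HPA tolerance
    ratio = total / (target * cur)      # current per-pod average vs. target
    diff = ratio - 1; if (diff < 0) diff = -diff
    if (diff <= tolerance) {
      n = cur                           # close enough: keep current replicas
    } else {
      n = int(total / target)           # ceil(total / target)
      if (n < total / target) n++
    }
    if (n < min) n = min                # clamp to minReplicas
    if (n > max) n = max                # clamp to maxReplicas
    print n
  }'
}

desired_replicas 20 10 1 1 5   # 20 requests/s in total, 1 pod running: prints 2
desired_replicas 30 10 2 1 5   # 30 requests/s in total, 2 pods running: prints 3
```

With maxReplicas set to 5, any total rate above roughly 50 requests per second would still yield 5 pods in this sketch.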

Run the following command to deploy the HPA:

kubectl --kubeconfig <Path of the kubeconfig file> apply -f resources_hpa/requests_total_hpa.yaml

Verify whether the HPA is deployed as expected.

kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output:

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
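To read the current value of the metric rather than only its registration, the external metrics API can also be queried per namespace with a label selector. The following sketch assumes the standard path layout of the external.metrics.k8s.io API. The helper name external_metric_path is hypothetical; the namespace, metric name, and query-name label are taken from the HPA above.

```shell
# Hypothetical helper: build the external metrics API path for one query.
# Usage: external_metric_path <namespace> <metric-name> <query-name>
external_metric_path() {
  # %3D is the URL-encoded "=" of the label selector query-name=<value>.
  printf '/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s?labelSelector=query-name%%3D%s\n' \
    "$1" "$2" "$3"
}

external_metric_path test prometheus-query processed-requests-per-second
```

The printed path can then be passed to kubectl get --raw in the same way as in the preceding command.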

Create an HPA to scale the pods of an application based on the average latency of requests that are sent to the application. The average latency is calculated by dividing the rate of the istio_request_duration_milliseconds_sum metric by the rate of the istio_request_duration_milliseconds_count metric. Use the following content to create the podinfo-latency-avg.yaml file, and then deploy the HPA by running the kubectl apply command in the same way as for the requests_total_hpa.yaml file:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-latency-avg
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/latency-average: |
      sum(rate(istio_request_duration_milliseconds_sum{destination_workload_namespace="test",reporter="destination"}[1m]))
      /sum(rate(istio_request_duration_milliseconds_count{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: latency-average
        target:
          type: AverageValue
          averageValue: "0.005"

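- annotations: The annotation defines the Prometheus query named latency-average, which divides the rate of the istio_request_duration_milliseconds_sum metric by the rate of the istio_request_duration_milliseconds_count metric. The HPA scales the pods of the application based on the result of this query.
- target: In this example, the averageValue parameter is set to 0.005. Because the query is built from istio_request_duration_milliseconds metrics, its result is expressed in milliseconds. If the result, averaged across the pods of the application, is greater than or equal to 0.005, the HPA automatically scales out the pods of the application.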
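The latency-average query divides one rate by another, so the time window cancels out and the result is the increase of the sum counter divided by the increase of the count counter. A minimal sketch of that arithmetic follows. The helper name avg_latency_ms and the sample values are hypothetical and are not taken from a real cluster.

```shell
# Hypothetical helper: average latency (ms) between two scrapes of the
# istio_request_duration_milliseconds_sum and _count counters.
# Usage: avg_latency_ms <sum_t0> <sum_t1> <count_t0> <count_t1>
avg_latency_ms() {
  awk -v s0="$1" -v s1="$2" -v c0="$3" -v c1="$4" 'BEGIN {
    requests = c1 - c0
    if (requests <= 0) { print "NaN"; exit }   # no traffic in the window
    printf "%.3f\n", (s1 - s0) / requests
  }'
}

# Made-up samples: 600 requests took 1500 ms in total during the window.
avg_latency_ms 12000 13500 4000 4600   # prints 2.500
```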

Create an HPA to scale the pods of an application based on the P95 latency of requests that are sent to the application. The P95 latency is calculated from the istio_request_duration_milliseconds_bucket metric by using the histogram_quantile function. Use the following content to create the podinfo-p95.yaml file, and then deploy the HPA by running the kubectl apply command in the same way as for the requests_total_hpa.yaml file:
```
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-p95
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/p95-latency: |
      histogram_quantile(0.95,sum(irate(istio_request_duration_milliseconds_bucket{destination_workload_namespace="test",destination_canonical_service="podinfo"}[5m]))by (le))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: p95-latency
        target:
          type: AverageValue
          averageValue: "4"
```
- annotations: The annotation defines the Prometheus query named p95-latency, which calculates the P95 latency from the istio_request_duration_milliseconds_bucket metric. The HPA scales the pods of the application based on the result of this query.
- target: In this example, the averageValue parameter is set to 4. If the P95 latency of requests, averaged across the pods of the application, is greater than or equal to 4 ms, the HPA automatically scales out the pods of the application.
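The histogram_quantile function estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The following is a simplified sketch of that idea, not the Prometheus implementation itself. The helper name bucket_quantile and the bucket values are hypothetical.

```shell
# Hypothetical helper: estimate a quantile from cumulative histogram buckets.
# Usage: bucket_quantile <q> <le:count> [<le:count> ...]
# Buckets must be in ascending order of le, with the +Inf bucket last.
bucket_quantile() {
  q="$1"; shift
  printf '%s\n' "$@" | awk -F: -v q="$q" '
    { le[NR] = $1; cnt[NR] = $2 }
    END {
      if (cnt[NR] == 0) { print "NaN"; exit }    # no observations at all
      rank = q * cnt[NR]                         # target rank among all samples
      for (i = 1; i < NR && cnt[i] < rank; i++) ;
      if (i == NR) { print le[NR - 1]; exit }    # +Inf bucket: use last finite bound
      lo   = (i == 1) ? 0 : le[i - 1]            # lowest bucket is assumed to start at 0
      prev = (i == 1) ? 0 : cnt[i - 1]
      printf "%.3f\n", lo + (le[i] - lo) * (rank - prev) / (cnt[i] - prev)
    }'
}

# Made-up buckets: 100 requests, 90 of them within 5 ms and 98 within 10 ms.
bucket_quantile 0.95 1:40 5:90 10:98 25:100 +Inf:100   # prints 8.125
```

In this made-up example the estimate of 8.125 ms is above the target of 4, so the HPA would be expected to scale out.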

Verify whether the pods of an application can be scaled as expected

This example verifies the podinfo-total HPA, which scales the pods of an application based on the number of requests that are sent to the application. The goal is to confirm that the HPA scales out the pods when the number of requests per second reaches 10 or more.

Run the following commands to send requests for 5 minutes. The commands use 2 concurrent workers, each limited to 10 requests per second.
```
alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 10 http://podinfo:9898
```
- -z: the duration for which requests are sent.
- -c: the number of workers that send requests concurrently.
- -q: the rate limit of each worker, in requests per second.

Run the following command to check whether the pods are scaled out as expected:

watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total

Expected output:

Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo                                            East6C16G: Tue Jan 26 18:01:30 2021

NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         2          4m45s

A value of 2 appears in the REPLICAS column, which indicates that the current number of pods of the application is 2.
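The TARGETS column uses Kubernetes quantity notation, in which the suffix m stands for thousandths, so 10056m/10 means a current average of 10.056 against a target of 10. A minimal sketch of the conversion follows. The helper name quantity_to_decimal is hypothetical, and only plain numbers and the m suffix are handled.

```shell
# Hypothetical helper: convert a Kubernetes quantity such as "10056m" to a
# decimal number. Only plain numbers and the milli suffix "m" are handled.
quantity_to_decimal() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%.3f\n", v / 1000 }' ;;
    *)  awk -v v="$1"     'BEGIN { printf "%.3f\n", v }' ;;
  esac
}

quantity_to_decimal 10056m   # prints 10.056
quantity_to_decimal 10       # prints 10.000
```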

Run the following commands to send requests for 5 minutes. The commands use 2 concurrent workers, each limited to 15 requests per second.

alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 15 http://podinfo:9898

Run the following command to check whether the pods are scaled out as expected:
```
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
```
Expected output:
```
Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo                                            East6C16G: Tue Jan 26 18:01:30 2021

NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         3          4m45s
```
A value of 3 appears in the REPLICAS column, which indicates that the application now runs 3 pods. The pods of the application are scaled out when the number of requests that are sent to the application increases. If the number of requests drops far enough, the value in the REPLICAS column returns to 1, which shows that the pods are scaled in when the number of requests decreases.