The Mixerless Telemetry technology of Service Mesh (ASM) allows you to obtain telemetry data of application containers in a non-intrusive manner. When you encounter application performance fluctuations or unbalanced resource utilization, you can scale your applications by using the Mixerless Telemetry technology. You can use Prometheus to collect key metrics of an application, such as the number of requests, the average latency of requests, and the P99 latency of requests. Horizontal Pod Autoscalers (HPAs) can automatically adjust the number of pods based on this real-time data, which ensures optimal application performance under load fluctuations and improves resource utilization.
Prerequisites
Application metrics are collected by Prometheus. For more information, see Use Mixerless Telemetry to observe ASM instances.
Step 1: Deploy a metrics adapter and a Flagger load tester
Use kubectl to connect to your cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Run the following command to deploy a metrics adapter:
Note: To obtain the complete script of the metrics adapter, visit GitHub.
helm --kubeconfig <Path of the kubeconfig file> -n kube-system install asm-custom-metrics $KUBE_METRICS_ADAPTER_SRC/deploy/charts/kube-metrics-adapter --set prometheus.url=http://prometheus.istio-system.svc:9090
Verify whether the metrics adapter is deployed as expected.
Run the following command to view the pod of the metrics adapter:
kubectl --kubeconfig <Path of the kubeconfig file> get po -n kube-system | grep metrics-adapter
Expected output:
asm-custom-metrics-kube-metrics-adapter-6fb4949988-ht8pv 1/1 Running 0 30s
Run the following command to check which autoscaling/v2beta API versions are available:
kubectl --kubeconfig <Path of the kubeconfig file> api-versions | grep "autoscaling/v2beta"
Expected output:
autoscaling/v2beta1
autoscaling/v2beta2
Run the following command to view the metrics adapter:
kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
Deploy a Flagger load tester.
Download the required YAML files of the Flagger load tester. For more information, visit GitHub.
Run the following commands to deploy the Flagger load tester:
kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/deployment.yaml -n test
kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/service.yaml -n test
Step 2: Create different HPAs based on your business requirements
Create an HPA to scale the pods of an application based on the value of the istio_requests_total metric. The istio_requests_total metric indicates the number of requests that are sent to the application.
Use the following content to create the requests_total_hpa.yaml file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-total
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(rate(istio_requests_total{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second
        target:
          type: AverageValue
          averageValue: "10"
annotations: Add annotations to configure the HPA to scale the pods of the application based on the value of the istio_requests_total metric.
target: In this example, set the averageValue parameter to 10. If the average number of requests that are sent to the application is greater than or equal to 10, the HPA automatically scales out the pods of the application.
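The scaling decision that an HPA derives from an external metric with an AverageValue target can be sketched as follows. This is a simplified illustration of the HPA algorithm, not ASM-specific code, and the sample numbers are hypothetical:

```python
import math

def desired_replicas(metric_total, target_average, min_replicas=1, max_replicas=5):
    """Simplified HPA rule for an External metric with an AverageValue target:
    desired = ceil(metric_total / target_average), clamped to [min, max]."""
    desired = math.ceil(metric_total / target_average)
    return max(min_replicas, min(desired, max_replicas))

# With averageValue "10": a total of 20.1 requests per second across the
# workload asks for ceil(20.1 / 10) = 3 replicas.
print(desired_replicas(20.1, 10))  # 3
```

In other words, as long as the per-pod average stays at or below the target of 10 requests per second, the replica count is stable; once the total rate pushes the average above the target, the HPA scales out.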
Run the following command to deploy the HPA:
kubectl --kubeconfig <Path of the kubeconfig file> apply -f resources_hpa/requests_total_hpa.yaml
Verify whether the HPA is deployed as expected.
kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
Create an HPA to scale the pods of an application based on the average latency of requests that are sent to the application. The average latency is computed from the istio_request_duration_milliseconds_sum and istio_request_duration_milliseconds_count metrics. Use the following content to create the podinfo-latency-avg.yaml file:
Repeat Substep b in Step 1 to deploy the HPA.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-latency-avg
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/latency-average: |
      sum(rate(istio_request_duration_milliseconds_sum{destination_workload_namespace="test",reporter="destination"}[1m]))
      /sum(rate(istio_request_duration_milliseconds_count{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: latency-average
        target:
          type: AverageValue
          averageValue: "0.005"
annotations: Add annotations to configure the HPA to scale the pods of the application based on the average request latency, which is derived from the istio_request_duration_milliseconds_sum metric.
target: In this example, set the averageValue parameter to 0.005. If the average latency of requests that are sent to the application is greater than or equal to 0.005s, the HPA automatically scales out the pods of the application.
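The PromQL expression in the annotation divides the rate of the duration sum by the rate of the duration count over the same window. The same arithmetic can be sketched as follows, with hypothetical sample values:

```python
def average_latency_ms(duration_sum_rate, request_count_rate):
    """Average request latency: the per-second increase of
    istio_request_duration_milliseconds_sum divided by the per-second
    increase of istio_request_duration_milliseconds_count."""
    return duration_sum_rate / request_count_rate

# e.g. 120 ms of accumulated latency per second over 24 requests per second
print(average_latency_ms(120.0, 24.0))  # 5.0 ms per request
```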
Create an HPA to scale the pods of an application based on the P95 latency of requests that are sent to the application. The P95 latency is computed from the istio_request_duration_milliseconds_bucket histogram metric. Use the following content to create the podinfo-p95.yaml file:
Repeat Substep b in Step 1 to deploy the HPA.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-p95
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/p95-latency: |
      histogram_quantile(0.95,sum(irate(istio_request_duration_milliseconds_bucket{destination_workload_namespace="test",destination_canonical_service="podinfo"}[5m]))by (le))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: p95-latency
        target:
          type: AverageValue
          averageValue: "4"
annotations: Add annotations to configure the HPA to scale the pods of the application based on the P95 latency, which is derived from the istio_request_duration_milliseconds_bucket metric.
target: In this example, set the averageValue parameter to 4. If the average P95 latency of requests that are sent to the application is greater than or equal to 4 ms, the HPA automatically scales out the pods of the application.
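histogram_quantile estimates the P95 by linearly interpolating within the cumulative histogram buckets of istio_request_duration_milliseconds_bucket. A minimal sketch of that estimation, using hypothetical bucket data and simplified relative to the actual Prometheus implementation:

```python
def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets, given as a list of
    (upper_bound_le, cumulative_count) pairs sorted by bound, mirroring
    Prometheus's linear interpolation within the target bucket."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            # Interpolate between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative counts for le=1ms, 2.5ms, 5ms, 10ms buckets (hypothetical):
# 95% of 100 observations fall at or below the estimated value.
print(histogram_quantile(0.95, [(1, 40), (2.5, 70), (5, 95), (10, 100)]))  # 5.0
```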
Verify whether the pods of an application can be scaled as expected
In this example, verify the HPA that is deployed to scale the pods of an application based on the number of requests sent to the application. Verify whether the HPA works as expected if the number of requests that are sent to the application is greater than or equal to 10.
Run the following command to initiate requests for 5 minutes. Set the number of requests per second to 10 and the number of concurrent requests to 2.
alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 10 http://podinfo:9898
-z: the duration for which requests are sent.
-c: the number of concurrent workers.
-q: the number of requests per second per worker.
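Because hey applies the -q rate limit to each of the -c workers independently, the aggregate load is roughly the product of the two flags. A quick sanity check of that arithmetic (assuming hey's per-worker rate limiting):

```python
def total_rps(workers, qps_per_worker):
    """hey's -q limits each of the -c workers independently,
    so the aggregate request rate is their product."""
    return workers * qps_per_worker

print(total_rps(2, 10))  # 20 requests per second in this test
print(total_rps(2, 15))  # 30 requests per second in the later test
```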
Run the following command to check whether the pods are scaled out as expected:
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
Expected output:
Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo    East6C16G: Tue Jan 26 18:01:30 2021

NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         2          4m45s
A value of 2 appears in the REPLICAS column, which indicates that the current number of pods of the application is 2.
Run the following command to initiate requests for 5 minutes. Set the number of requests per second to 15 and the number of concurrent workers to 2.
alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 15 http://podinfo:9898
Run the following command to check whether the pods are scaled out as expected:
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
Expected output:
Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo    East6C16G: Tue Jan 26 18:01:30 2021

NAME      REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   10056m/10 (avg)   1         5         3          4m45s
A value of 3 appears in the REPLICAS column, which indicates that the current number of pods of the application is 3.
The result shows that the pods of the application are scaled out when the number of requests that are sent to the application increases, and scaled in when the number of requests decreases. If you decrease the number of requests to a sufficiently low level, a value of 1 appears in the REPLICAS column.