Service Mesh (ASM) collects telemetry data for Container Service for Kubernetes (ACK) clusters and Container Compute Service (ACS) clusters in a non-intrusive manner, which makes the service communication in the clusters observable. This telemetry feature makes service behaviors observable and helps O&M staff troubleshoot, maintain, and optimize applications without increasing maintenance costs. Based on the four golden signals (latency, traffic, errors, and saturation), ASM generates a series of metrics for the services that it manages. This topic describes how to implement auto scaling for workloads by using ASM metrics.
Prerequisites
An ACK cluster or ACS cluster is created. For more information, see Create an ACK managed cluster or Create an ACS cluster.
An ASM instance is created. For more information, see Create an ASM instance.
A Prometheus instance and a Grafana instance are deployed in the clusters. For more information, see Use open source Prometheus to monitor an ACK cluster.
A Prometheus instance is deployed to monitor the ASM instance. For more information, see Monitor ASM instances by using a self-managed Prometheus instance.
Background
ASM generates a series of metrics for the services that it manages. For more information, see Istio Standard Metrics.
Auto scaling is an approach that automatically scales workloads up or down based on resource usage. Kubernetes provides two autoscalers for this purpose.
Cluster Autoscaler (CA): increases or decreases the number of nodes in a cluster.
Horizontal Pod Autoscaler (HPA): increases or decreases the number of pods that are used to deploy applications.
The aggregation layer of Kubernetes allows third-party applications to extend the Kubernetes API by registering themselves as API add-ons. These add-ons can be used to implement the custom metrics API and allow HPAs to query any metrics. HPAs periodically query core metrics such as CPU utilization and memory usage by using the resource metrics API. In addition, HPAs use the custom metrics API to query application-specific metrics, such as the observability metrics that are provided by ASM.
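For an External metric with an AverageValue target, the HPA's decision reduces to a simple formula: the desired replica count is the ceiling of the total metric value divided by the target per-pod average, clamped to the configured bounds. The following Python sketch is illustrative only, not Kubernetes code; the real controller also applies tolerances and stabilization windows:

```python
import math

def desired_replicas(metric_total: float, target_average: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replica count for an External metric with an AverageValue target:
    ceil(total metric value / target per-pod average), clamped to bounds."""
    desired = math.ceil(metric_total / target_average)
    return max(min_replicas, min(desired, max_replicas))

# ~50 requests/s in total with a target of 10 requests/s per pod -> 5 replicas
print(desired_replicas(50, 10))   # 5
# Load drops to 3 requests/s -> scale back down to the minimum of 1 replica
print(desired_replicas(3, 10))    # 1
```

This is why the target in the HPA manifest later in this topic is expressed as an average value per pod rather than an absolute total.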
Step 1: Enable Prometheus monitoring for the ASM instance
For more information, see Collect metrics to Managed Service for Prometheus.
Step 2: Deploy the adapter for the custom metrics API
Download the kube-metrics-adapter and install it in the ACK cluster:

```shell
helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
  --set prometheus.url=http://prometheus.istio-system.svc:9090
```

Check whether kube-metrics-adapter is enabled.
Run the following command to verify that autoscaling/v2beta exists:

```shell
kubectl api-versions | grep "autoscaling/v2beta"
```

Expected output:

```
autoscaling/v2beta
```

Run the following command to check the status of the pod of kube-metrics-adapter:
```shell
kubectl get po -n kube-system | grep metrics-adapter
```

Expected output:

```
asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1   Running   0   19s
```

Run the following command to query the custom metrics that are provided by kube-metrics-adapter:
```shell
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
```

Expected output:

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
```
Step 3: Deploy a sample application
Create a namespace named test. Enable automatic sidecar proxy injection for the namespace. For more information, see Manage namespaces and resource quotas and Manage global namespaces.
Deploy a sample application.
Create a file named podinfo.yaml and copy the following content to the file:
Run the following command to deploy a podinfo application:
```shell
kubectl apply -n test -f podinfo.yaml
```
To trigger auto scaling, you must deploy a load testing service in the test namespace to generate requests.
Create a file named loadtester.yaml and copy the following content to the file:
Run the following command to deploy the load testing service:
```shell
kubectl apply -n test -f loadtester.yaml
```
Check whether the sample application and the load testing service are deployed.
Run the following command to check the pod status:
```shell
kubectl get pod -n test
```

Expected output:

```
NAME                          READY   STATUS    RESTARTS   AGE
loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
```

Run the following commands to log on to the load testing container and run the hey command to generate loads:

```shell
export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
```

If a load is generated, the sample application and the load testing service are deployed.
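For reference, the hey flags above determine the shape of the load: -c sets the number of concurrent workers, -q caps each worker's queries per second, and -z sets the duration. A quick back-of-the-envelope calculation (an upper bound; actual throughput also depends on service latency):

```python
def expected_load(workers: int, qps_per_worker: float, duration_s: float):
    """Upper bound on the request rate and total requests hey generates:
    rate = workers * per-worker QPS cap, total = rate * duration."""
    rate = workers * qps_per_worker   # requests per second
    total = rate * duration_s         # requests over the whole run
    return rate, total

# hey -z 5s -c 10 -q 2 -> at most 20 requests/s, ~100 requests in total
print(expected_load(10, 2, 5))   # (20, 100)
```

At roughly 20 requests per second in total, this short run stays above the scale-out threshold configured in the next step only briefly, which is why a longer run is used later to demonstrate scaling.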
Step 4: Configure an HPA by using ASM metrics
Define an HPA to scale the workloads of the podinfo application based on the number of requests per second that the application receives. When the average number of requests per second exceeds 10, the HPA increases the number of replicas.
Create a file named hpa.yaml and copy the following code to the file:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
          rate(
              istio_requests_total{
                destination_workload="podinfo",
                destination_workload_namespace="test",
                reporter="destination"
              }[1m]
          )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second
        target:
          type: AverageValue
          averageValue: "10"
```

Run the following command to deploy the HPA:

```shell
kubectl apply -f hpa.yaml
```

Check whether the HPA is deployed.
Run the following command to query the custom metrics that are provided by kube-metrics-adapter:
```shell
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
```

Expected output:

```json
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
```

The output contains the list of custom ASM metrics, which indicates that the HPA is deployed.
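The PromQL query in the HPA annotation computes the per-second request rate from the istio_requests_total counter over a one-minute window. A simplified Python sketch of what rate() does (illustrative only; the real PromQL function also extrapolates to the window boundaries and handles counter resets):

```python
def prom_rate(samples):
    """Simplified PromQL rate(): per-second increase of a monotonically
    increasing counter between the first and last sample in the window.
    `samples` is a list of (timestamp_seconds, counter_value) pairs."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# istio_requests_total grew from 1200 to 1800 over a 60 s window -> 10 requests/s,
# exactly the averageValue threshold configured in the HPA manifest.
print(prom_rate([(0, 1200), (60, 1800)]))  # 10.0
```

The surrounding sum() in the annotation then adds the rates of all matching time series, so the HPA compares the total request rate for podinfo against the per-pod average target.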
Verify auto scaling
Run the following command to log on to the container for load testing and run the hey command to generate loads:
```shell
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 10 -q 5 http://podinfo.test:9898
```

Run the following command to check the effect of auto scaling:
Note: Metrics are synchronized every 30 seconds by default, and the workload can be scaled only once every 3 to 5 minutes. This cooldown gives a scaling operation time to take effect before the next decision is made and prevents conflicting scaling actions.
```shell
watch kubectl -n test get hpa/podinfo
```

Expected output:

```
NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
```

The HPA starts to scale out workloads within 1 minute and continues until the number of requests per second drops below the specified threshold. After the load test ends, the number of requests per second drops to zero, and the HPA starts to decrease the number of pods. A few minutes later, the number of replicas decreases from the value in the preceding output to one.
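The TARGETS column can be read back into the scaling formula: 8308m is a per-pod average of about 8.3 requests per second, so with 6 replicas the total load is roughly 50 requests per second, which matches what hey generates here (10 workers × 5 queries per second). A rough sanity check, assuming the standard ceil(total / target) rule (illustrative; the real controller also applies tolerances):

```python
import math

def check_hpa_state(avg_per_pod: float, replicas: int, target: float):
    """Recover the total metric value from the per-pod average shown by
    `kubectl get hpa`, and the replica count the HPA converges toward."""
    total = avg_per_pod * replicas
    desired = math.ceil(total / target)
    return total, desired

# TARGETS 8308m/10 (avg) with 6 replicas -> ~49.8 requests/s in total,
# settling toward 5 pods as the HPA converges
total, desired = check_hpa_state(8.308, 6, 10)
print(round(total, 2), desired)  # 49.85 5
```

A value slightly above or below the target average while the replica count stabilizes is expected, since metrics are sampled and scaling decisions are rate-limited.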