Service Mesh (ASM) provides a non-intrusive method to generate telemetry data for service communication within Alibaba Cloud Container Service for Kubernetes (ACK) and Alibaba Cloud Container Service (ACS) clusters. This telemetry feature provides observability into service behavior. It helps operations and maintenance (O&M) engineers troubleshoot, maintain, and optimize applications without requiring changes to the application code. Based on the four golden signals of monitoring (latency, traffic, errors, and saturation), ASM generates a series of metrics for the services that it manages. This topic describes how to use ASM metrics to implement automatic scaling for workloads.
Prerequisites
An ACK cluster or ACS cluster is created. For more information, see Create an ACK managed cluster or Create an ACS cluster.
An ASM instance is created. For more information, see Create an ASM instance.
A Prometheus instance and a Grafana instance are created in the cluster. For more information, see Open source Prometheus monitoring.
Prometheus is integrated for mesh monitoring. For more information, see Integrate a self-managed Prometheus instance for mesh monitoring.
Background information
Service Mesh generates a series of metrics for the services that it manages. For more information, see Istio standard metrics.
Automatic scaling is a method to automatically scale workloads up or down based on resource usage. Kubernetes provides two dimensions for automatic scaling:
Cluster Autoscaler (CA): handles node scaling operations to increase or decrease the number of nodes.
Horizontal Pod Autoscaler (HPA): automatically scales the number of pods in a deployment.
The aggregation layer in Kubernetes allows third-party applications to extend the Kubernetes API by registering as API add-on components. These add-on components can implement the Custom Metrics API and allow the HPA to access any metric. The HPA periodically queries core metrics, such as CPU or memory, through the Resource Metrics API. It also retrieves application-specific metrics, including the observability metrics that ASM provides, through the Custom Metrics API.
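For example, you can query both API groups directly through the API server. The following commands are a sketch that assumes metrics-server is installed in the cluster and that jq is available on your machine; neither command changes any state.
# Core resource metrics (CPU and memory), served by metrics-server through the aggregation layer
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
# External metrics, served by a registered adapter such as the one deployed in Step 2
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .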
Step 1: Enable collection of Prometheus monitoring metrics
For more information, see Collect monitoring metrics in Managed Service for Prometheus.
Step 2: Deploy the custom metrics API adapter
Download kube-metrics-adapter and install it in the ACK cluster.
helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter --set prometheus.url=http://prometheus.istio-system.svc:9090
Confirm that kube-metrics-adapter is enabled.
Confirm that autoscaling/v2 exists.
kubectl api-versions | grep "autoscaling/v2"
Expected output:
autoscaling/v2
Check the status of the kube-metrics-adapter pod.
kubectl get po -n kube-system | grep metrics-adapter
Expected output:
asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1     Running   0          19s
List the custom external metrics that the Prometheus adapter provides.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
Step 3: Deploy the sample application
Create the test namespace and enable automatic sidecar injection. For more information, see Manage namespaces and quotas and Enable automatic injection.
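If you prefer kubectl over the console, the following sketch creates the namespace and enables injection; it assumes that your ASM instance honors the standard istio-injection namespace label.
# Create the test namespace and label it for automatic sidecar injection
kubectl create namespace test
kubectl label namespace test istio-injection=enabled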
Deploy the sample application.
Create a file named podinfo.yaml with the following content.
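The full manifest is not reproduced in this topic. The following is a minimal sketch that assumes the public stefanprodan/podinfo image; the Deployment name, the app: podinfo label, and port 9898 are chosen to match the HPA definition and the load test commands used later in this topic.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo                             # referenced by scaleTargetRef in the HPA in Step 4
spec:
  replicas: 1
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfod
          image: stefanprodan/podinfo:6.0.0 # assumption: the public podinfo image; pin the tag you actually use
          ports:
            - name: http
              containerPort: 9898           # the port targeted by the hey commands below
          resources:
            requests:                       # assumption: modest requests for a demo workload
              cpu: 100m
              memory: 64Mi
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  selector:
    app: podinfo
  ports:
    - name: http
      port: 9898
      targetPort: 9898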
Deploy podinfo.
kubectl apply -n test -f podinfo.yaml
Deploy a load testing service in the test namespace to trigger automatic scaling.
Create a file named loadtester.yaml.
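The manifest content is likewise not reproduced here. A minimal sketch follows; the image is an assumption (any image that bundles the hey load generator works, such as the Flagger load tester), while the app=loadtester label and the loadtester container name match the kubectl commands in the verification step.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadtester
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loadtester
  template:
    metadata:
      labels:
        app: loadtester                                   # matched by the -l "app=loadtester" selector used later
    spec:
      containers:
        - name: loadtester                                # matched by the -c loadtester flag used later
          image: ghcr.io/fluxcd/flagger-loadtester:0.29.0 # assumption: any image that ships hey works here
          ports:
            - name: http
              containerPort: 8080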
Deploy the load testing service.
kubectl apply -n test -f loadtester.yaml
Verify that the sample application and the load testing service are deployed.
Check the pod status.
kubectl get pod -n test
Expected output:
NAME                          READY   STATUS    RESTARTS   AGE
loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
Log on to the load tester container and generate a load.
export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
In the hey command, -z 5s sets the test duration, -c 10 sets the number of concurrent workers, and -q 2 limits each worker to two requests per second. A successful response indicates that a load is generated and that the sample application and the load testing service are deployed.
Step 4: Configure an HPA using ASM metrics
Define an HPA that scales the podinfo workload based on the number of requests received per second. When the average traffic load exceeds 10 requests per second, the HPA scales out the deployment.
Note: This example uses HPA API version autoscaling/v2, which is available in Kubernetes 1.23 and later. The autoscaling/v2beta2 API was removed in Kubernetes 1.26, so clusters that run Kubernetes 1.26 or later must use autoscaling/v2.
Create a file named hpa.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
        rate(
          istio_requests_total{
            destination_workload="podinfo",
            destination_workload_namespace="test",
            reporter="destination"
          }[1m]
        )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second
        target:
          type: AverageValue
          averageValue: "10"
Deploy the HPA.
kubectl apply -f hpa.yaml
Verify that the HPA is deployed.
List the custom external metrics that the Prometheus adapter provides.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
The resource list now contains the prometheus-query external metric that backs the custom ASM query. This indicates that the HPA is deployed successfully.
Verify automatic scaling
Log on to the load tester container to generate workload requests.
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 10 -q 5 http://podinfo.test:9898
Check the automatic scaling status.
Note: By default, metrics are synchronized every 30 seconds. A scaling operation can occur only if the workload has not been rescaled in the last 3 to 5 minutes. This prevents the HPA from making rapid, conflicting decisions and allows time for the cluster autoscaler to operate.
watch kubectl -n test get hpa/podinfo
Expected output:
NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
After one minute, the HPA starts to scale up the workload until the number of requests per second falls below the target value. When the load test is complete, the number of requests per second drops to zero, and the HPA starts to scale down the number of workload pods. After a few minutes, the number of replicas in the command output returns to one.
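To trace the individual scale-up and scale-down decisions, you can inspect the events that the HPA records. This is standard Kubernetes behavior and is not specific to ASM.
kubectl -n test describe hpa podinfo
If the default rescale window described in the note above is too conservative for your workload, the autoscaling/v2 API lets you tune it through the optional spec.behavior field. The following fragment is a sketch, not part of the manifest above:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait for 5 minutes of consistently low load before scaling down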