Alibaba Cloud Service Mesh (ASM) collects telemetry data for Container Service for
Kubernetes (ACK) clusters in a non-intrusive manner, which makes service communication
in the clusters observable. This telemetry helps O&M staff troubleshoot, maintain,
and optimize applications without extra maintenance costs. Based on the four golden
signals of monitoring (latency, traffic, errors, and saturation), ASM generates a
series of metrics for the services that it manages. This topic describes how to
implement auto scaling for workloads by using ASM metrics.
Background information
ASM generates a series of metrics for the services that it manages. For more information,
visit Istio Standard Metrics.
Auto scaling is an approach that automatically scales workloads up or down based
on resource usage. In Kubernetes, two autoscalers are used to implement auto scaling.
- Cluster Autoscaler (CA): CAs are used to increase or decrease the number of nodes
in a cluster.
- Horizontal Pod Autoscaler (HPA): HPAs are used to increase or decrease the number
of pod replicas that run an application.
The aggregation layer of Kubernetes allows third-party applications to extend the
Kubernetes API by registering themselves as API add-ons. Such an add-on can implement
the custom and external metrics APIs and thereby allow HPAs to query arbitrary metrics.
HPAs periodically query core metrics such as CPU utilization and memory usage through
the resource metrics API. In addition, HPAs can use the custom and external metrics
APIs to query application-specific metrics, such as the observability metrics that
are provided by ASM.
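For reference, the following commands show the two API paths involved. They assume that metrics-server and a metrics adapter are already installed in the cluster; the external metrics path is the one used later in this topic.
# Resource metrics API: core CPU and memory usage collected by metrics-server.
kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq .
# External metrics API: application-specific metrics exposed by a metrics adapter.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .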

Step 1: Enable Prometheus monitoring for the ASM instance
- Log on to the ASM console.
- In the left-side navigation pane, choose Mesh Management.
- On the Mesh Management page, find the ASM instance that you want to configure. Click the name of the ASM
instance or click Manage in the Actions column.
- On the details page of the ASM instance, choose Basic Information in the left-side navigation pane. On the Basic Information page, click Settings.
Note Make sure that the Istio version of the ASM instance is 1.6.8.4 or later.
- In the Settings Update panel, select Enable Prometheus, select Enable Self-managed Prometheus, enter the endpoint of the Prometheus instance, and then click OK.
After you enable Prometheus monitoring for the ASM instance, ASM automatically configures
the Envoy filters that are required for Prometheus.
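To optionally confirm that Istio standard metrics reach the self-managed Prometheus instance, you can query its HTTP API. The following sketch assumes that the endpoint you entered above is the prometheus Service in the istio-system namespace, which is also the endpoint used in Step 2.
kubectl -n istio-system port-forward svc/prometheus 9090:9090 &
# Istio standard metrics such as istio_requests_total appear once traffic flows in the mesh.
curl -s 'http://localhost:9090/api/v1/query?query=istio_requests_total' | jq '.status'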
Step 2: Deploy the adapter for the custom metrics API
- Download the installation package of the adapter. For more information, visit kube-metrics-adapter. Then, install and deploy the adapter for the custom metrics API in the ACK cluster.
## Use Helm 3.
helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter --set prometheus.url=http://prometheus.istio-system.svc:9090
- After the installation is complete, run the following commands to check whether kube-metrics-adapter
is deployed and running.
- Check whether the autoscaling/v2beta API versions exist.
kubectl api-versions |grep "autoscaling/v2beta"
Expected output:
autoscaling/v2beta1
autoscaling/v2beta2
- Check the status of the pod of kube-metrics-adapter.
kubectl get po -n kube-system |grep metrics-adapter
Expected output:
asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2cm57 1/1 Running 0 19s
- Query the custom metrics that are provided by kube-metrics-adapter.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
The resources list is empty at this point because no HPA that references a Prometheus query exists yet. The list is populated after you deploy the HPA in Step 4.
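If the query fails, or if the resource list stays empty after you complete Step 4, the adapter logs are usually the quickest way to check whether the Prometheus endpoint is reachable and whether the HPA annotations were parsed. The Deployment name below is derived from the pod name shown above.
kubectl -n kube-system logs deploy/asm-custom-metrics-kube-metrics-adapter --tail=20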
Step 3: Deploy a sample application
- Create a namespace named test. For more information, see Manage namespaces.
- Enable automatic sidecar injection. For more information, see Install a sidecar proxy.
- Deploy a sample application.
- Create a file named podinfo.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  minReadySeconds: 5
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        app: podinfo
    spec:
      containers:
      - name: podinfod
        image: stefanprodan/podinfo:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9898
          name: http
          protocol: TCP
        command:
        - ./podinfo
        - --port=9898
        - --level=info
        livenessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/healthz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/readyz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 2000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 64Mi
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9898
    targetPort: 9898
    protocol: TCP
  selector:
    app: podinfo
- Deploy the podinfo application.
kubectl apply -n test -f podinfo.yaml
- To trigger auto scaling, deploy a load testing service in the test namespace to
generate requests.
- Create a file named loadtester.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadtester
  namespace: test
  labels:
    app: loadtester
spec:
  selector:
    matchLabels:
      app: loadtester
  template:
    metadata:
      labels:
        app: loadtester
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: loadtester
        image: weaveworks/flagger-loadtester:0.18.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        command:
        - ./loadtester
        - -port=8080
        - -log-level=info
        - -timeout=1h
        livenessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        resources:
          limits:
            memory: "512Mi"
            cpu: "1000m"
          requests:
            memory: "32Mi"
            cpu: "10m"
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 10001
---
apiVersion: v1
kind: Service
metadata:
  name: loadtester
  namespace: test
  labels:
    app: loadtester
spec:
  type: ClusterIP
  selector:
    app: loadtester
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
- Deploy the load testing service.
kubectl apply -n test -f loadtester.yaml
- Check whether the sample application and the load testing service are deployed.
- Check the pod status.
kubectl get pod -n test
Expected output:
NAME                          READY   STATUS    RESTARTS   AGE
loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
- Run the hey command in the load testing container to generate a short burst of load.
export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
If hey prints a load test summary, the sample application and the load testing service
are deployed and can communicate. The flags specify a 5-second test (-z 5s) with 10
concurrent workers (-c 10), each limited to 2 requests per second (-q 2).
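As an optional cross-check before you configure the HPA, you can verify that the Istio standard metric used in Step 4 is now recorded for the podinfo workload. The sketch below assumes that the port-forward to Prometheus from Step 1 is still running on localhost:9090.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=istio_requests_total{destination_workload="podinfo",destination_workload_namespace="test"}' \
  | jq '.data.result | length'
A result greater than zero indicates that the sidecars report request metrics for podinfo to Prometheus.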
Step 4: Configure an HPA by using ASM metrics
Define an HPA to scale the podinfo Deployment based on the number of requests that
the podinfo application receives per second. When the workload receives an average
of more than 10 requests per second per replica, the HPA increases the number of replicas.
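With a target of type AverageValue, the HPA divides the total returned by the Prometheus query by the target value of 10 and rounds up to obtain the desired replica count, within the configured bounds. A rough sketch with an illustrative (not measured) total of 45 requests per second:
# ceil(45 / 10) = 5 replicas, capped by maxReplicas and floored by minReplicas.
echo $(( (45 + 10 - 1) / 10 ))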
- Create a file named hpa.yaml and copy the following code to the file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
        rate(
          istio_requests_total{
            destination_workload="podinfo",
            destination_workload_namespace="test",
            reporter="destination"
          }[1m]
        )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: processed-requests-per-second
      target:
        type: AverageValue
        averageValue: "10"
- Deploy the HPA.
kubectl apply -f hpa.yaml
- Check whether the HPA is deployed.
Query the custom metrics that are provided by kube-metrics-adapter.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
The output now contains the prometheus-query external metric, which indicates that
the HPA is deployed and the adapter has registered the query.
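You can also read the current value of the external metric that the HPA consumes. The label selector below matches the query-name label defined in hpa.yaml; the value is reported in Kubernetes quantity notation, so a value such as 8308m means about 8.3 requests per second.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/test/prometheus-query?labelSelector=query-name%3Dprocessed-requests-per-second" | jq '.items[0].value'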
Verify auto scaling
- Log on to the load testing container and run the hey command to generate load.
kubectl -n test exec -it ${loadtester} -c loadtester -- sh
~ $ hey -z 5m -c 10 -q 5 http://podinfo.test:9898
- View the effect of auto scaling.
Note Metrics are synchronized every 30 seconds by default, and the workload can be
scaled only once every 3 to 5 minutes. This cooldown gives the HPA time for a scaling
action to take effect before the next one is evaluated and prevents conflicting
scaling decisions.
watch kubectl -n test get hpa/podinfo
Expected output:
NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
The HPA starts to scale out the workload within about 1 minute and keeps adding
replicas until the average number of requests per second per replica drops below
the specified threshold. After the load test is completed, the number of requests
per second drops to zero and the HPA starts to scale the workload in. A few minutes
later, the number of replicas decreases from the value in the preceding output to one.
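To see the individual scaling decisions and the metric values that the HPA observed at each step, you can also inspect its events.
kubectl -n test describe hpa podinfo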