The Mixerless Telemetry technology of Alibaba Cloud Service Mesh (ASM) allows you
to obtain telemetry data on containers in a non-intrusive manner. You can use Prometheus
to collect the monitoring metrics of an application, such as the number of requests,
the average latency of requests, and the P99 latency of requests. Then, a Horizontal
Pod Autoscaler (HPA) automatically scales the pods of the application based on the
collected metrics. This topic describes how to use Mixerless Telemetry to scale the
pods of an application.
Step 1: Deploy a metrics adapter and a Flagger load tester
- Use kubectl to connect to a Container Service for Kubernetes (ACK) cluster. For more
information, see Connect to Kubernetes clusters by using kubectl.
- Run the following command to deploy a metrics adapter:
Note: To obtain the complete script of the metrics adapter, visit GitHub.
helm --kubeconfig <Path of the kubeconfig file> -n kube-system install asm-custom-metrics \
  $KUBE_METRICS_ADAPTER_SRC/deploy/charts/kube-metrics-adapter \
  --set prometheus.url=http://prometheus.istio-system.svc:9090
- Verify whether the metrics adapter is deployed as expected.
- Run the following command to view the pod of the metrics adapter:
kubectl --kubeconfig <Path of the kubeconfig file> get po -n kube-system | grep metrics-adapter
Expected output:
asm-custom-metrics-kube-metrics-adapter-6fb4949988-ht8pv 1/1 Running 0 30s
- Run the following command to check the available autoscaling/v2beta API versions:
kubectl --kubeconfig <Path of the kubeconfig file> api-versions | grep "autoscaling/v2beta"
Expected output:
autoscaling/v2beta1
autoscaling/v2beta2
- Run the following command to query the external metrics API that is served by the metrics adapter:
kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
The resources list is empty because no external metric query is registered yet. The list is populated after you create an HPA in Step 2.
- Deploy a Flagger load tester.
- Download the required YAML files of the Flagger load tester. For more information,
visit GitHub.
- Run the following commands to deploy the Flagger load tester:
kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/deployment.yaml -n test
kubectl --kubeconfig <Path of the kubeconfig file> apply -f <Path of the Flagger load tester>/kustomize/tester/service.yaml -n test
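To confirm that the load tester is deployed, you can check its pod in the test namespace. The app=flagger-loadtester label is the same label that is used to locate the pod later in this topic:
kubectl --kubeconfig <Path of the kubeconfig file> -n test get pod -l "app=flagger-loadtester"
The pod is expected to reach the Running state.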
Step 2: Create different HPAs based on your business requirements
- Create an HPA to scale the pods of an application based on the istio_requests_total metric, which indicates the number of requests that are sent to the application.
- Use the following content to create the requests_total_hpa.yaml file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-total
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(rate(istio_requests_total{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: processed-requests-per-second
      target:
        type: AverageValue
        averageValue: "10"
- annotations: Add annotations to configure the HPA to scale the pods of the application based on the value of the istio_requests_total metric. The query name processed-requests-per-second in the annotation key must match the query-name label in the metric selector.
- target: In this example, set the averageValue parameter to 10. If the average number of requests per second for each pod is greater than or equal to 10, the HPA automatically scales out the pods of the application.
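Before you deploy the HPA, you can test the Prometheus query directly against the Prometheus HTTP API. The following command is a minimal sketch that assumes curl is available in the load tester image and that Prometheus is reachable at the address that is configured in Step 1:
kubectl --kubeconfig <Path of the kubeconfig file> -n test exec -it \
  $(kubectl --kubeconfig <Path of the kubeconfig file> -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}') -- \
  curl -s -G 'http://prometheus.istio-system.svc:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(istio_requests_total{destination_workload_namespace="test",reporter="destination"}[1m]))'
If the query is valid, the response contains the current query result in JSON format.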
- Run the following command to deploy the HPA:
kubectl --kubeconfig <Path of the kubeconfig file> apply -f requests_total_hpa.yaml
- Verify whether the HPA is deployed as expected.
kubectl --kubeconfig <Path of the kubeconfig file> get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
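The resources list, which was empty before the HPA was created, now contains the prometheus-query external metric. As a further check, you can read the current value of the registered query through the external metrics API. The following command is a sketch; the label selector is URL-encoded and matches the query-name label in the HPA manifest:
kubectl --kubeconfig <Path of the kubeconfig file> get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/test/prometheus-query?labelSelector=query-name%3Dprocessed-requests-per-second" | jq .
The response is an ExternalMetricValueList that contains the current query result.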
- Create an HPA to scale the pods of an application based on the istio_request_duration_milliseconds_sum metric. Divided by istio_request_duration_milliseconds_count, this metric yields the average latency of requests that are sent to the application. Use the following content to create the podinfo-latency-avg.yaml file. Then, repeat the preceding deployment and verification substeps to deploy the HPA.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-latency-avg
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/latency-average: |
      sum(rate(istio_request_duration_milliseconds_sum{destination_workload_namespace="test",reporter="destination"}[1m]))
      /sum(rate(istio_request_duration_milliseconds_count{destination_workload_namespace="test",reporter="destination"}[1m]))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: latency-average
      target:
        type: AverageValue
        averageValue: "0.005"
- annotations: Add annotations to configure the HPA to scale the pods of the application based on the average request latency, which is computed from the istio_request_duration_milliseconds_sum and istio_request_duration_milliseconds_count metrics.
- target: In this example, set the averageValue parameter to 0.005. If the average latency of requests that are sent to the application is greater than or equal to 0.005s, the HPA automatically scales out the pods of the application.
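After you deploy this HPA, you can check that the metrics adapter resolves the latency-average query. If the TARGETS column shows <unknown>, the query is not being resolved:
kubectl --kubeconfig <Path of the kubeconfig file> -n test get hpa podinfo-latency-avg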
- Create an HPA to scale the pods of an application based on the istio_request_duration_milliseconds_bucket metric. The histogram_quantile function over this metric yields the P95 latency of requests that are sent to the application. Use the following content to create the podinfo-p95.yaml file. Then, repeat the preceding deployment and verification substeps to deploy the HPA.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-p95
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/p95-latency: |
      histogram_quantile(0.95,sum(irate(istio_request_duration_milliseconds_bucket{destination_workload_namespace="test",destination_canonical_service="podinfo"}[5m]))by (le))
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
  - type: External
    external:
      metric:
        name: prometheus-query
        selector:
          matchLabels:
            query-name: p95-latency
      target:
        type: AverageValue
        averageValue: "4"
- annotations: Add annotations to configure the HPA to scale the pods of the application based on the P95 latency, which is computed from the istio_request_duration_milliseconds_bucket metric.
- target: In this example, set the averageValue parameter to 4. If the average P95 latency of requests that are sent to the application is greater than or equal to 4 ms, the HPA automatically scales out the pods of the application.
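To inspect any of the three HPAs in more detail, you can use kubectl describe, which shows the current metric value and the recent scaling events. The podinfo-p95 HPA is used here as an example:
kubectl --kubeconfig <Path of the kubeconfig file> -n test describe hpa podinfo-p95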
Verify whether the pods of an application can be scaled as expected
In this example, verify the HPA that is deployed to scale the pods of an application
based on the number of requests sent to the application. Verify whether the HPA works
as expected if the number of requests that are sent to the application is greater
than or equal to 10.
- Run the following command to initiate requests for 5 minutes. Set the number of requests per second for each worker to 10 and the number of concurrent workers to 2, which produces a total of about 20 requests per second.
alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 10 http://podinfo:9898
- -z: the duration for which requests are initiated.
- -c: the number of concurrent workers.
- -q: the number of requests per second that each worker sends.
- Run the following command to check whether the pods are scaled out as expected:
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
Expected output:
Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo-total    East6C16G: Tue Jan 26 18:01:30 2021

NAME            REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo-total   Deployment/podinfo   10056m/10 (avg)   1         5         2          4m45s
The value 2 in the REPLICAS column indicates that the application currently runs two pods. This is the expected result: about 20 requests per second divided by the target average of 10 requests per second per pod yields two pods.
- Run the following command to initiate requests for 5 minutes. Set the number of requests per second for each worker to 15 and the number of concurrent workers to 2, which produces a total of about 30 requests per second.
alias k="kubectl --kubeconfig $USER_CONFIG"
loadtester=$(k -n test get pod -l "app=flagger-loadtester" -o jsonpath='{.items..metadata.name}')
k -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 2 -q 15 http://podinfo:9898
- Run the following command to check whether the pods are scaled out as expected:
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
Expected output:
Every 2.0s: kubectl --kubeconfig /Users/han/shop_config/ack_zjk -n test get hpa/podinfo-total    East6C16G: Tue Jan 26 18:01:30 2021

NAME            REFERENCE            TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
podinfo-total   Deployment/podinfo   10056m/10 (avg)   1         5         3          4m45s
The value 3 in the REPLICAS column indicates that the application currently runs three pods: about 30 requests per second divided by the target average of 10 requests per second per pod yields three pods.
The result shows that the pods of the application are scaled out when the number of requests that are sent to the application increases. If you decrease the number of requests to a sufficiently low level, the REPLICAS column eventually shows a value of 1, which indicates that the pods of the application are scaled in when the number of requests decreases.
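To observe the scale-in, you can keep watching the HPA after the load test ends. Note that the HPA applies a scale-down stabilization window, which is five minutes by default, before it removes pods, so the REPLICAS value does not drop immediately:
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total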