Service Mesh (ASM) provides a non-intrusive method to generate telemetry data for service communication within Alibaba Cloud Container Service for Kubernetes (ACK) and Alibaba Cloud Container Service (ACS) clusters. This telemetry feature provides observability into service behavior. It helps operations and maintenance (O&M) engineers troubleshoot, maintain, and optimize applications without requiring changes to the application code. Based on the four golden signals of monitoring (latency, traffic, errors, and saturation), ASM generates a series of metrics for the services that it manages. This topic describes how to use ASM metrics to implement automatic scaling for workloads.
Prerequisites
An ACK cluster or ACS cluster is created. For more information, see Create an ACK managed cluster or Create an ACS cluster.
An ASM instance is created. For more information, see Create an ASM instance.
A Prometheus instance and a Grafana instance are created in the cluster. For more information, see Open source Prometheus monitoring.
Prometheus is integrated for mesh monitoring. For more information, see Integrate a self-managed Prometheus instance for mesh monitoring.
Background information
Service Mesh generates a series of metrics for the services that it manages. For more information, see Istio standard metrics.
Automatic scaling is a method to automatically scale workloads up or down based on resource usage. Kubernetes provides two dimensions for automatic scaling:
Cluster Autoscaler (CA): handles node scaling operations to increase or decrease the number of nodes.
Horizontal Pod Autoscaler (HPA): automatically scales the number of pods in a deployment.
The aggregation layer in Kubernetes allows third-party applications to extend the Kubernetes API by registering as API add-on components. These add-on components can implement the Custom Metrics API and allow the HPA to access any metric. The HPA periodically queries core metrics, such as CPU or memory, through the Resource Metrics API. It also retrieves application-specific metrics, including the observability metrics that ASM provides, through the Custom Metrics API.
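For example, you can query both API groups directly through the API server. The following commands are a sketch that assumes metrics-server is installed in the cluster and that jq is available on your machine; neither command changes any state.
# Core resource metrics (CPU and memory), served by metrics-server through the aggregation layer
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
# External metrics, served by a registered adapter such as the one deployed in Step 2
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .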
Step 1: Enable collection of Prometheus monitoring metrics
For more information, see Collect monitoring metrics in Managed Service for Prometheus.
Step 2: Deploy the custom metrics API adapter
Download kube-metrics-adapter and install it in the ACK cluster.
helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter --set prometheus.url=http://prometheus.istio-system.svc:9090
Confirm that kube-metrics-adapter is enabled.
Confirm that autoscaling/v2 exists.
kubectl api-versions | grep "autoscaling/v2"
Expected output:
autoscaling/v2
Check the status of the kube-metrics-adapter pod.
kubectl get po -n kube-system | grep metrics-adapter
Expected output:
asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1     Running   0          19s
List the custom external metrics that the Prometheus adapter provides.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": []
}
Step 3: Deploy the sample application
Create the test namespace and enable automatic sidecar injection. For more information, see Manage namespaces and quotas and Enable automatic injection.
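If you prefer kubectl over the console, the following sketch creates the namespace and enables injection; it assumes that your ASM instance honors the standard istio-injection namespace label.
# Create the test namespace and label it for automatic sidecar injection
kubectl create namespace test
kubectl label namespace test istio-injection=enabled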
Deploy the sample application.
Create a file named podinfo.yaml with the following content.
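The full manifest is not reproduced in this topic. The following is a minimal sketch that assumes the public stefanprodan/podinfo image; the Deployment name, the app: podinfo label, and port 9898 are chosen to match the HPA definition and the load test commands used later in this topic.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo                             # referenced by scaleTargetRef in the HPA in Step 4
spec:
  replicas: 1
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfod
          image: stefanprodan/podinfo:6.0.0 # assumption: the public podinfo image; pin the tag you actually use
          ports:
            - name: http
              containerPort: 9898           # the port targeted by the hey commands below
          resources:
            requests:                       # assumption: modest requests for a demo workload
              cpu: 100m
              memory: 64Mi
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  selector:
    app: podinfo
  ports:
    - name: http
      port: 9898
      targetPort: 9898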
Deploy podinfo.
kubectl apply -n test -f podinfo.yaml
Deploy a load testing service in the test namespace to trigger automatic scaling.
Create a file named loadtester.yaml.
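The manifest content is likewise not reproduced here. A minimal sketch follows; the image is an assumption (any image that bundles the hey load generator works, such as the Flagger load tester), while the app=loadtester label and the loadtester container name match the kubectl commands in the verification step.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loadtester
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loadtester
  template:
    metadata:
      labels:
        app: loadtester                                   # matched by the -l "app=loadtester" selector used later
    spec:
      containers:
        - name: loadtester                                # matched by the -c loadtester flag used later
          image: ghcr.io/fluxcd/flagger-loadtester:0.29.0 # assumption: any image that ships hey works here
          ports:
            - name: http
              containerPort: 8080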
Deploy the load testing service.
kubectl apply -n test -f loadtester.yaml
Verify that the sample application and the load testing service are deployed.
Check the pod status.
kubectl get pod -n test
Expected output:
NAME                          READY   STATUS    RESTARTS   AGE
loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m
Log on to the load tester container and generate a load.
export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898
In the hey command, -z 5s sets the test duration, -c 10 sets the number of concurrent workers, and -q 2 limits each worker to two requests per second. A successful response indicates that a load is generated and that the sample application and the load testing service are deployed.
Step 4: Configure an HPA using ASM metrics
Define an HPA that scales the podinfo workload based on the number of requests received per second. When the average traffic load exceeds 10 requests per second, the HPA scales out the deployment.
Note: This example uses HPA API version autoscaling/v2, which is available in Kubernetes 1.23 and later. The autoscaling/v2beta2 API was removed in Kubernetes 1.26, so clusters that run Kubernetes 1.26 or later must use autoscaling/v2.
Create a file named hpa.yaml.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
  annotations:
    metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
      sum(
        rate(
          istio_requests_total{
            destination_workload="podinfo",
            destination_workload_namespace="test",
            reporter="destination"
          }[1m]
        )
      )
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  metrics:
    - type: External
      external:
        metric:
          name: prometheus-query
          selector:
            matchLabels:
              query-name: processed-requests-per-second
        target:
          type: AverageValue
          averageValue: "10"
Deploy the HPA.
kubectl apply -f hpa.yaml
Verify that the HPA is deployed.
List the custom external metrics that the Prometheus adapter provides.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Expected output:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "prometheus-query",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
The resource list now contains the prometheus-query external metric that backs the custom ASM query. This indicates that the HPA is deployed successfully.
Verify automatic scaling
Log on to the load tester container to generate workload requests.
kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5m -c 10 -q 5 http://podinfo.test:9898
Check the automatic scaling status.
Note: By default, metrics are synchronized every 30 seconds. A scaling operation can occur only if the workload has not been rescaled in the last 3 to 5 minutes. This prevents the HPA from making rapid, conflicting decisions and allows time for the cluster autoscaler to operate.
watch kubectl -n test get hpa/podinfo
Expected output:
NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m
After one minute, the HPA starts to scale up the workload until the number of requests per second falls below the target value. When the load test is complete, the number of requests per second drops to zero, and the HPA starts to scale down the number of workload pods. After a few minutes, the number of replicas in the command output returns to one.
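To trace the individual scale-up and scale-down decisions, you can inspect the events that the HPA records. This is standard Kubernetes behavior and is not specific to ASM.
kubectl -n test describe hpa podinfo
If the default rescale window described in the note above is too conservative for your workload, the autoscaling/v2 API lets you tune it through the optional spec.behavior field. The following fragment is a sketch, not part of the manifest above:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait for 5 minutes of consistently low load before scaling down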