Container Compute Service: Use HPA in Knative to implement auto scaling based on CPU and memory metrics

Last Updated: Jan 14, 2025

Alibaba Cloud Knative can integrate with the Horizontal Pod Autoscaler (HPA) to enable auto scaling based on resource load. Although Knative natively supports auto scaling based on the number of requests, integrating with HPA allows for fine-grained scaling using additional metrics such as CPU and memory usage.

Prerequisites

Step 1: Deploy a Knative Service

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose Applications > Knative.

  3. On the Services tab of the Knative page, select default from the Namespace drop-down list, click Create from Template, copy the following YAML content to the code editor, and then click Create.

    The following sample code creates a Knative Service named helloworld-go-hpa:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go-hpa # Specify the name of the Knative Service. 
    spec:
      template:
        metadata:
          labels:
            app: helloworld-go-hpa
          annotations:
            autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev" # Specify HPA as the scaler. 
            autoscaling.knative.dev/metric: "cpu" # The metrics supported by HPA include CPU utilization and memory utilization. In this example, HPA is configured to work based on CPU utilization. 
            autoscaling.knative.dev/target: "30" # Specify the threshold of CPU utilization. HPA automatically scales pods for the Knative Service when the threshold is exceeded. 
            autoscaling.knative.dev/minScale: "1" # Specify the minimum number of pods that must be guaranteed. 
            autoscaling.knative.dev/maxScale: "4" # Specify the maximum number of pods that are allowed. 
        spec:
          containers:
            - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/autoscale-go:v1024
              resources:
                requests:
                  cpu: '200m'

  4. Run the following command to check whether the Knative Service runs as expected:

    kubectl get ksvc

    Expected output:

    NAME                   URL                                               LATESTCREATED                LATESTREADY                  READY   REASON
    helloworld-go-hpa      http://helloworld-go-hpa.default.example.com      helloworld-go-hpa-00001      helloworld-go-hpa-00001      True        

    If True is displayed in the READY column, the Knative Service runs as expected.
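The manifest above scales on CPU. A memory-based variant can be sketched by changing the metric annotation, as in the following minimal example. The Service name helloworld-go-hpa-memory, the file name, the memory request, and the target value of 50 are all assumptions made for illustration. In particular, how the memory target is interpreted (a percentage or an absolute amount) may depend on the Knative version, so confirm the unit in the Knative documentation before you apply the file.

```shell
# Write a memory-based variant of the manifest to a local file.
# Every value marked "assumed" or "hypothetical" is illustrative only.
cat > helloworld-go-hpa-memory.yaml <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-hpa-memory # Hypothetical name for this sketch.
spec:
  template:
    metadata:
      labels:
        app: helloworld-go-hpa-memory
      annotations:
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev" # Keep HPA as the scaler.
        autoscaling.knative.dev/metric: "memory" # Scale on memory instead of CPU.
        autoscaling.knative.dev/target: "50" # Assumed placeholder; confirm the unit for your Knative version.
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "4"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/autoscale-go:v1024
          resources:
            requests:
              memory: '128Mi' # Assumed request value for illustration.
EOF

# After you review the values, the file could be applied with:
#   kubectl apply -f helloworld-go-hpa-memory.yaml
```

The block only writes the file. The apply command is left as a comment so that the placeholder values can be reviewed first.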

Step 2: Implement auto scaling based on the CPU metric

  1. Install the load testing tool hey.

    For more information about hey, see Hey.

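  2. Run the following command to perform a load test for 60 seconds. The -q 100 option limits each hey worker to 100 queries per second (QPS). Because hey runs 50 concurrent workers by default, the total request rate can be higher than 100 QPS: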

    Note

    Replace 121.XX.XX.10 with the IP address or domain name of the gateway.

    hey -z 60s -q 100 -host "helloworld-go-hpa.default.example.com" "http://121.XX.XX.10?prime=40000000" # 121.XX.XX.10 is the IP address or domain name of the gateway.

    During the load test, you can run the following command to check whether pods are scaled in real time:

    kubectl get pods --watch

    Expected output:

    NAME                                                     READY   STATUS    RESTARTS   AGE
    # The pod is running as expected and containers are in the ready state. 
    helloworld-go-hpa-00001-deployment-67cc8f979b-fxfl5      2/2     Running   0          101m
    # HPA starts to scale out the Service. The READY column displays 0/2 and the STATUS column displays Pending for each new pod. This means that the pods are pending and resources are not yet allocated to them. Because --watch prints a line for every status change, the same pod can appear more than once. 
    helloworld-go-hpa-00001-deployment-67cc8f979b-kv6rj      0/2     Pending   0          0s
    helloworld-go-hpa-00001-deployment-67cc8f979b-fxq85      0/2     Pending   0          0s
    helloworld-go-hpa-00001-deployment-67cc8f979b-kv6rj      0/2     Pending   0          0s
    helloworld-go-hpa-00001-deployment-67cc8f979b-fxq85      0/2     Pending   0          0s
    # The READY column displays 0/2 for each pod and the STATUS column displays ContainerCreating for each pod. This means that the containers in the pods are being created. 
    helloworld-go-hpa-00001-deployment-67cc8f979b-kv6rj      0/2     ContainerCreating   0          0s
    helloworld-go-hpa-00001-deployment-67cc8f979b-fxq85      0/2     ContainerCreating   0          0s
    helloworld-go-hpa-00001-deployment-67cc8f979b-kv6rj      0/2     ContainerCreating   0          0s
    helloworld-go-hpa-00001-deployment-67cc8f979b-fxq85      0/2     ContainerCreating   0          0s
    # The READY column changes from 1/2 to 2/2 for each new pod and the STATUS column displays Running. This means that both containers in each pod are created and become ready. 
    helloworld-go-hpa-00001-deployment-67cc8f979b-kv6rj      1/2     Running             0          1s
    helloworld-go-hpa-00001-deployment-67cc8f979b-kv6rj      2/2     Running             0          1s
    helloworld-go-hpa-00001-deployment-67cc8f979b-fxq85      1/2     Running             0          1s
    helloworld-go-hpa-00001-deployment-67cc8f979b-fxq85      2/2     Running             0          1s

    The output shows that HPA can automatically scale pods for the Knative Service. When the load increases, HPA scales the number of pods from one to four to improve the processing capability and throughput of the Knative Service.
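To see why a target of 30 can lead to four pods, it may help to work through the replica formula from the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric value / target value), bounded by the minimum and maximum pod counts. The following is a minimal sketch of that arithmetic only. The function name desired_replicas and the utilization figures are hypothetical, and the real controller also applies a tolerance and stabilization behavior that this sketch ignores.

```shell
# Simplified HPA replica calculation using integer arithmetic.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET MIN MAX
# Utilization and target are percentages of the pod's CPU request.
desired_replicas() {
  current=$1; utilization=$2; target=$3; min=$4; max=$5
  # ceil(current * utilization / target) without floating point.
  desired=$(( (current * utilization + target - 1) / target ))
  # Clamp the result to the minScale and maxScale bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# One pod at a hypothetical 120% of its CPU request, target 30, bounds 1 to 4.
desired_replicas 1 120 30 1 4   # prints 4
# A heavier hypothetical load is capped by maxScale.
desired_replicas 1 200 30 1 4   # prints 4
# Four pods at a hypothetical 5% utilization scale back in to minScale.
desired_replicas 4 5 30 1 4     # prints 1
```

With the 200m CPU request from the sample Service, 120% utilization corresponds to about 240m of CPU used by the single pod. Assuming the HPA object that Knative creates is visible in the same namespace, the live target and current values can be compared with this arithmetic by running kubectl get hpa.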

(Optional) Step 3: View the Knative monitoring dashboard

Knative provides out-of-the-box monitoring features. On the Knative page, click the Monitoring Dashboards tab to view the monitoring data of the specified Service. For more information about the Knative dashboard, see View the Knative dashboard.

