Container Service for Kubernetes: Implement elastic scaling for applications with HPA based on QPS data

Last Updated: Mar 07, 2026

If your application must dynamically adjust its compute resources based on request volume, use queries per second (QPS) data from your Application Load Balancer (ALB) instances to configure elastic scaling for the application's pods.

Before you begin

Read "Create and use an ALB Ingress to expose a service" to learn the basics of using an ALB Ingress.

How it works

Queries per second (QPS) is the number of requests received per second. An Application Load Balancer (ALB) instance records client access data using Simple Log Service (SLS). A Horizontal Pod Autoscaler (HPA) monitors the QPS data for the service from these records and scales the corresponding workloads, such as deployments and StatefulSets.
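
The scaling decision described above can be approximated with a short sketch. It assumes the standard HPA rule for an AverageValue target: the total metric is divided by the per-pod target, rounded up, and clamped to the replica bounds. It ignores tolerance, stabilization windows, and scaling policies. The function name desired_replicas and its parameters are hypothetical and are not part of any ACK or Kubernetes API.

```python
import math


def desired_replicas(total_qps: float, target_qps_per_pod: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule for an AverageValue target.

    Assumption: no tolerance band, stabilization window, or scaling policy.
    """
    if target_qps_per_pod <= 0:
        raise ValueError("target_qps_per_pod must be positive")
    # Number of pods needed so that each pod handles at most the target QPS.
    wanted = math.ceil(total_qps / target_qps_per_pod)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    print(desired_replicas(5, 2, 2, 10))       # 3
    print(desired_replicas(0, 2, 2, 10))       # 2 (held at the minimum)
    print(desired_replicas(143.75, 2, 2, 10))  # 10 (capped at the maximum)
```

With a target of 2 QPS per pod, a total of 5 QPS calls for 3 pods, and no traffic leaves the workload at its minimum replica count.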

Prerequisites

Step 1: Create an AlbConfig and associate a Simple Log Service project

  1. View the Simple Log Service project associated with the cluster.

    1. Log on to the Container Service Management Console. In the navigation pane on the left, click Clusters.

    2. On the Clusters page, click the name of your cluster. In the navigation pane on the left, click Cluster Information.

    3. On the Basic Information tab, locate the Log Service Project resource and record the project name on the right.

  2. Create an AlbConfig.

    1. Create a file named alb-qps.yaml, copy the following content into the file, and enter the Simple Log Service project information in the accessLogConfig field.

      apiVersion: alibabacloud.com/v1
      kind: AlbConfig
      metadata:
        name: alb-qps
      spec:
        config:
          name: alb-qps
          addressType: Internet
          zoneMappings:
          - vSwitchId: vsw-uf6ccg2a9g71hx8go**** # The ID of the vSwitch
          - vSwitchId: vsw-uf6nun9tql5t8nh15****
          accessLogConfig:
            logProject: <LOG_PROJECT> # The name of the Simple Log Service project associated with the cluster
            logStore: <LOG_STORE> # A custom Logstore name. The name must start with "alb_".
        listeners:
          - port: 80
            protocol: HTTP

      The following table describes the fields.

      | Field | Type | Description |
      | --- | --- | --- |
      | logProject | string | The name of the Simple Log Service project. Default value: "". |
      | logStore | string | The name of the Simple Log Service Logstore. The name must start with alb_. If the Logstore does not exist, it is automatically created. For a configuration example of a Simple Log Service Logstore, see Enable access logs. Default value: "alb_****". |

    2. Run the following command to create the AlbConfig.

       kubectl apply -f alb-qps.yaml

      Expected output:

      albconfig.alibabacloud.com/alb-qps created
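
Before applying alb-qps.yaml, it can help to check the accessLogConfig values against the rules stated in this step. The following is a minimal sketch, not an official validator. The function name check_access_log_config and the sample values my-project and alb_qps are hypothetical. It treats an empty logProject as a problem because this tutorial relies on access logs, even though the field defaults to "".

```python
def check_access_log_config(config: dict) -> list:
    """Return a list of problems found in an accessLogConfig mapping."""
    problems = []
    project = config.get("logProject", "")
    logstore = config.get("logStore", "")
    # QPS-based scaling needs access logs, so a project name is required here.
    if not project:
        problems.append("logProject is empty")
    # The Logstore name must start with "alb_".
    if not logstore.startswith("alb_"):
        problems.append('logStore must start with "alb_"')
    return problems


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(check_access_log_config({"logProject": "my-project", "logStore": "alb_qps"}))
    print(check_access_log_config({"logProject": "", "logStore": "qps"}))
```

An empty list means that the two fields satisfy the constraints described in this step.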

Step 2: Create sample resources

In addition to an AlbConfig, an ALB Ingress requires a deployment, a service, an IngressClass, and an Ingress to function. Follow these steps to quickly create these resources.

  1. Create a file named qps-quickstart.yaml with the following content.

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: qps-ingressclass
    spec:
      controller: ingress.k8s.alibabacloud/alb
      parameters:
        apiGroup: alibabacloud.com
        kind: AlbConfig
        name: alb-qps # Must be the same as the name of the AlbConfig
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: qps-ingress
    spec:
      ingressClassName: qps-ingressclass # Must be the same as the name of the IngressClass
      rules:
       - host: demo.alb.ingress.top # Replace with your domain name
         http:
          paths:
          - path: /qps
            pathType: Prefix
            backend:
              service:
                name: qps-svc
                port:
                  number: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: qps-svc
      namespace: default
    spec:
      ports:
        - port: 80
          protocol: TCP
          targetPort: 80
      selector:
        app: qps-deploy
      type: NodePort
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: qps-deploy
      labels:
        app: qps-deploy
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: qps-deploy
      template:
        metadata:
          labels:
            app: qps-deploy
        spec:
          containers:
          - name: qps-container
            image: nginx:1.7.9
            ports:
            - containerPort: 80
  2. Run the following command to create the sample resources.

    kubectl apply -f qps-quickstart.yaml

Step 3: Create an HPA

  1. Create and save a file named qps-hpa.yaml with the following content.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: qps-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: qps-deploy # The name of the workload controlled by the HPA
      minReplicas: 2 # The minimum number of pods
      maxReplicas: 10 # The maximum number of pods
      metrics:
        - type: External   # Use external metrics (non-native Kubernetes metrics)
          external:
            metric:
              name: sls_alb_ingress_qps # The metric name (QPS of Alibaba Cloud ALB Ingress). Do not modify this value.
              selector:
                matchLabels:
                  sls.project: <LOG_PROJECT> # The name of the Simple Log Service project. Replace with your actual project name.
                  sls.logstore: <LOG_STORE> # The name of the Logstore. Replace with your actual Logstore name.
                  sls.ingress.route: default-qps-svc-80 # The path of the service. The format is <namespace>-<svc>-<port>.
            target:
              type: AverageValue  # The target metric type (average value)
              averageValue: "2"     # The target value for the metric. In this example, the average QPS for all pods is 2.

    The following table describes the fields.

    | Field | Description |
    | --- | --- |
    | scaleTargetRef | The workload used by the application. This example uses the Deployment named qps-deploy created in Step 2. |
    | minReplicas | The minimum number of pods to which the Deployment can be scaled in. This value must be an integer greater than or equal to 1. |
    | maxReplicas | The maximum number of pods to which the Deployment can be scaled out. This value must be greater than the minimum number of replicas. |
    | external.metric.name | The metric for QPS data that the HPA uses. Do not modify this value. |
    | sls.project | The Simple Log Service project that provides the metric data. This must be the same as the project specified in the AlbConfig. |
    | sls.logstore | The Logstore that provides the metric data. This must be the same as the Logstore specified in the AlbConfig. |
    | sls.ingress.route | The path of the Service, in the format <namespace>-<svc>-<port>. This example uses the qps-svc Service created in Step 2. |
    | external.target | The target value for the metric. In this example, the target average QPS per pod is 2. The HPA adjusts the number of pods to keep the average QPS as close to the target value as possible. |

  2. Run the following command to create the HPA.

    kubectl apply -f qps-hpa.yaml
  3. Run the following command to view the HPA deployment status.

    kubectl get hpa

    Expected output:

    NAME      REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
    qps-hpa   Deployment/qps-deploy   0/2 (avg)   2         10        2          5m41s
  4. Run the following command to view the HPA configuration details.

    kubectl describe hpa qps-hpa

    Expected output:

    Name:                                            qps-hpa
    Namespace:                                       default
    Labels:                                          <none>
    Annotations:                                     <none>
    CreationTimestamp:                               ******** # The timestamp of the HPA. You can ignore this.
    Reference:                                       Deployment/qps-deploy
    Metrics:                                         ( current / target )
      "sls_alb_ingress_qps" (target average value):  0 / 2
    Min replicas:                                    2
    Max replicas:                                    10
    Deployment pods:                                 2 current / 2 desired
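
The sls.ingress.route label in qps-hpa.yaml follows the format <namespace>-<svc>-<port>. The sketch below shows how the value default-qps-svc-80 is derived from the sample Service and how the three selector labels fit together. The helper names ingress_route_label and hpa_match_labels are hypothetical, and the project and Logstore values are placeholders.

```python
def ingress_route_label(namespace: str, service: str, port: int) -> str:
    """Build the sls.ingress.route value in the <namespace>-<svc>-<port> format."""
    return f"{namespace}-{service}-{port}"


def hpa_match_labels(project: str, logstore: str,
                     namespace: str, service: str, port: int) -> dict:
    """Assemble the matchLabels mapping used by the external metric selector."""
    return {
        "sls.project": project,
        "sls.logstore": logstore,
        "sls.ingress.route": ingress_route_label(namespace, service, port),
    }


if __name__ == "__main__":
    # Namespace, Service name, and port come from qps-quickstart.yaml.
    print(ingress_route_label("default", "qps-svc", 80))  # default-qps-svc-80
    # "<LOG_PROJECT>" and "<LOG_STORE>" are placeholders, as in the YAML file.
    print(hpa_match_labels("<LOG_PROJECT>", "<LOG_STORE>", "default", "qps-svc", 80))
```

If the Service is in another namespace or listens on another port, the label value changes accordingly.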

(Optional) Step 4: Verify the result

  1. Verify application scale-out.

    1. Run the following command to view the Ingress information.

      kubectl get ingress

      Expected output:

      NAME            CLASS                HOSTS                  ADDRESS                         PORTS     AGE
      qps-ingress     qps-ingressclass     demo.alb.ingress.top   alb-********.alb.aliyuncs.com   80        10m31s

      Record the values of HOSTS and ADDRESS for later use.

    2. Run the following command to perform a stress test on the application.

      Replace demo.alb.ingress.top and alb-********.alb.aliyuncs.com with the values you recorded in the previous step.

      ab -r -c 5 -n 10000 -H Host:demo.alb.ingress.top http://alb-********.alb.aliyuncs.com/qps
    3. Run the following command to view the scaling status of the application.

      kubectl get hpa

      Expected output:

      NAME      REFERENCE               TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
      qps-hpa   Deployment/qps-deploy   14375m/2 (avg)   2         10        10         15m

      The REPLICAS value in the output is 10. This indicates that as the QPS increased, the application scaled out to the maximum of 10 pods specified by MAXPODS.

  2. Verify application scale-in.

    After the stress test completes, run the following command to view the scaling status of the application.

    kubectl get hpa

    Expected output:

    NAME      REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
    qps-hpa   Deployment/qps-deploy   0/2 (avg)   2         10        2          28m

    The REPLICAS value in the output is 2. This indicates that after the QPS dropped to 0, the application scaled in to the minimum of 2 pods specified by MINPODS.
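
The TARGETS column uses Kubernetes quantity notation, in which the m suffix means thousandths, so 14375m is an average of 14.375 QPS per pod. The following sketch converts the column values shown above into plain numbers. It handles only plain numbers and the m suffix, not other quantity suffixes or values such as <unknown>. The function names are hypothetical.

```python
def parse_quantity(text: str) -> float:
    """Convert a quantity such as "14375m" or "2" to a float (m = thousandths)."""
    text = text.strip()
    if text.endswith("m"):
        return float(text[:-1]) / 1000
    return float(text)


def parse_targets(column: str) -> tuple:
    """Split a TARGETS value such as "14375m/2 (avg)" into (current, target)."""
    pair = column.split()[0]            # drop the trailing "(avg)"
    current, target = pair.split("/")
    return parse_quantity(current), parse_quantity(target)


if __name__ == "__main__":
    print(parse_targets("14375m/2 (avg)"))  # (14.375, 2.0)
    print(parse_targets("0/2 (avg)"))       # (0.0, 2.0)
```

During the stress test the current average (14.375) is well above the target (2), so the HPA scales out to the maximum. After the test the average is 0, so the HPA scales in to the minimum.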
