Container Service for Kubernetes: Implement elastic scaling for applications with HPA based on QPS data

Last Updated: Mar 26, 2026

A Horizontal Pod Autoscaler (HPA) can automatically scale your application based on the queries-per-second (QPS) values that the Application Load Balancer (ALB) instance behind an ALB Ingress collects. This keeps your application stable under variable load while controlling resource costs.

Prerequisites

Before you begin, ensure that you have:

How it works

  1. Create a Deployment and Service for your application.

  2. Create an ALB Ingress to route external traffic to the Service.

  3. Create a Horizontal Pod Autoscaler (HPA) that watches the sls_alb_ingress_qps metric from Log Service.

  4. When QPS exceeds the per-pod threshold, the HPA scales out the Deployment. When QPS drops, the HPA scales it back in.
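The scaling decision in step 4 can be sketched in a few lines of Python. This is a simplified model of the replica calculation that Kubernetes documents for AverageValue targets, not the controller's actual code. The function name, the sample QPS figures, and the 10% tolerance are assumptions made for illustration.

```python
import math

def desired_replicas(total_qps, target_per_pod, current,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica count for an AverageValue target.

    Assumes current >= 1; the tolerance value is an assumed default.
    """
    # Ratio of the observed per-pod load to the per-pod target.
    ratio = total_qps / (target_per_pod * current)
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current
    # Enough pods so that each one handles at most target_per_pod QPS.
    wanted = math.ceil(total_qps / target_per_pod)
    # Clamp to the minReplicas/maxReplicas range.
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(9, 2, 2, 2, 10))    # 5
print(desired_replicas(125, 2, 2, 2, 10))  # 10
print(desired_replicas(0, 2, 10, 2, 10))   # 2
```

With a per-pod target of 2 QPS, a total of 9 QPS calls for 5 pods, a much larger load is capped at the maximum of 10, and zero traffic falls back to the minimum of 2.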

Step 1: Create an application and a service

  1. Create a file named tea.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-basic
      labels:
        app: tea
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: tea
      template:
        metadata:
          labels:
            app: tea
        spec:
          containers:
          - name: tea
            image: nginx:1.7.9
            ports:
            - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: tea-svc
      namespace: default
    spec:
      ports:
        - port: 80
          protocol: TCP
          targetPort: 80
      selector:
        app: tea
      type: NodePort

  2. Apply the manifest:

    kubectl apply -f tea.yaml

Step 2: Create an ALB Ingress

Create an AlbConfig object

  1. Create a file named alb-test.yaml with the following content. Set the fields described in the table to match your environment.

    Field          Description
    zoneMappings   Specify at least two vSwitch IDs from different zones in the same VPC.
    logProject     The name of the Log Service project you created.
    logStore       The Logstore name. Must start with alb_. If the Logstore does not exist, the system creates it automatically.

    apiVersion: alibabacloud.com/v1
    kind: AlbConfig
    metadata:
      name: alb-demo
    spec:
      config:
        name: alb-test
        addressType: Internet        # Internet-facing ALB
        zoneMappings:
        - vSwitchId: vsw-uf6ccg2a9g71hx8go****   # Replace with your first vSwitch ID
        - vSwitchId: vsw-uf6nun9tql5t8nh15****   # Replace with your second vSwitch ID (different zone)
        accessLogConfig:
          logProject: "****"         # Replace with your Log Service project name
          logStore: "alb_****"       # Replace with your Logstore name; must start with alb_

  2. Apply the manifest:

    kubectl apply -f alb-test.yaml

Create an IngressClass

  1. Create a file named alb.yaml with the following content:

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: alb
    spec:
      controller: ingress.k8s.alibabacloud/alb
      parameters:
        apiGroup: alibabacloud.com
        kind: AlbConfig
        name: alb-demo              # Must match the AlbConfig metadata.name above

  2. Apply the manifest:

    kubectl apply -f alb.yaml

Create the Ingress

  1. Create a file named tea-ingress.yaml with the following content:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: tea-ingress
    spec:
      ingressClassName: alb
      rules:
      - host: demo.ingress.top
        http:
          paths:
          - path: /tea
            pathType: Prefix
            backend:
              service:
                name: tea-svc
                port:
                  number: 80

  2. Apply the manifest:

    kubectl apply -f tea-ingress.yaml

  3. Get the ALB address assigned to the Ingress:

    kubectl get ingress

    Expected output:

    NAME          CLASS   HOSTS              ADDRESS                                            PORTS   AGE
    tea-ingress   alb     demo.ingress.top   alb-110zvs5nhsvfv*****.cn-chengdu.alb.aliyuncs.com   80      7m5s

    Note the ADDRESS value. You will use it in the stress test command in Step 4.
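Before running the stress test, it can help to confirm that the ALB answers for the Ingress host. The following is a minimal sketch using only the Python standard library. The helper name and the alb-example.invalid address are placeholders invented for this example; substitute the ADDRESS value from your own output.

```python
import urllib.request

def build_ingress_request(alb_address, host, path):
    """Build a GET request that targets the ALB but presents the Ingress host name."""
    url = f"http://{alb_address}{path}"
    # The Host header must match the host in the Ingress rule.
    return urllib.request.Request(url, headers={"Host": host})

# alb-example.invalid is a placeholder, not a real ALB address.
req = build_ingress_request("alb-example.invalid", "demo.ingress.top", "/tea")
print(req.full_url, req.get_header("Host"))

# To send the request, substitute your ALB address and uncomment:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status)
```

Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses. Such an error still shows that the ALB received and answered the request.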

Step 3: Create an HPA

  1. Create a file named hpa.yaml with the following content:

    apiVersion: autoscaling/v2         # autoscaling/v2beta2 was removed in Kubernetes 1.26; use v2beta2 only on clusters older than 1.23
    kind: HorizontalPodAutoscaler
    metadata:
      name: ingress-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx-deployment-basic   # The Deployment to scale
      minReplicas: 2                   # Minimum number of pods
      maxReplicas: 10                  # Maximum number of pods
      metrics:
        - type: External
          external:
            metric:
              name: sls_alb_ingress_qps   # ALB QPS metric from Log Service
              selector:
                matchLabels:
                  sls.project: "****"              # Replace with your Log Service project name
                  sls.logstore: "alb_****"         # Replace with your Logstore name
                  sls.ingress.route: "default-tea-svc-80"
                  # Format: <namespace>-<service-name>-<port>
                  # Example: default-nginx-80
            target:
              type: AverageValue        # Scale based on average QPS per pod
              averageValue: 2           # Scale out when average QPS per pod exceeds 2

    This HPA scales nginx-deployment-basic between 2 and 10 pods. It triggers a scale-out whenever the average QPS per pod exceeds 2, and scales back in when QPS drops below the threshold.

  2. Apply the manifest:

    kubectl apply -f hpa.yaml

  3. Verify the HPA was created:

    kubectl get hpa

    Expected output:

    NAME          REFERENCE                           TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
    ingress-hpa   Deployment/nginx-deployment-basic   0/2 (avg)   2         10        2          4h34m

    The current QPS is 0 because no traffic is hitting the application yet. The pod count is at its minimum (2), which is the expected baseline state.

  4. (Optional) Inspect the HPA details:

    kubectl describe hpa ingress-hpa

    Expected output:

    Name:                                            ingress-hpa
    Namespace:                                       default
    Labels:                                          <none>
    Annotations:                                     <none>
    CreationTimestamp:                               Tue, 31 Jan 2023 11:35:01 +0800
    Reference:                                       Deployment/nginx-deployment-basic
    Metrics:                                         ( current / target )
    "sls_alb_ingress_qps" (target average value):    0 / 2
    Min replicas:                                    2
    Max replicas:                                    10
    Deployment pods:                                 2 current / 2 desired
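The selector labels in hpa.yaml are easy to mistype, because sls.ingress.route must follow the <namespace>-<service-name>-<port> format and the Logstore name must start with alb_. As a small illustration, the labels could be composed and checked like this. Both helper names and the my-project and alb_demo values are hypothetical.

```python
def ingress_route_label(namespace, service, port):
    """Compose the sls.ingress.route value as <namespace>-<service-name>-<port>."""
    return f"{namespace}-{service}-{port}"

def hpa_match_labels(project, logstore, namespace, service, port):
    """Build the matchLabels mapping used by the External metric selector."""
    if not logstore.startswith("alb_"):
        raise ValueError("Logstore name must start with alb_")
    return {
        "sls.project": project,
        "sls.logstore": logstore,
        "sls.ingress.route": ingress_route_label(namespace, service, port),
    }

# my-project and alb_demo are placeholder names for this sketch.
labels = hpa_match_labels("my-project", "alb_demo", "default", "tea-svc", 80)
print(labels["sls.ingress.route"])  # default-tea-svc-80
```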

Step 4: Verify auto scaling

Verify scale-out

  1. Run the following stress test against the ALB address from Step 2. Replace the address with your actual ALB address.

    ab -c 5 -n 5000 -H Host:demo.ingress.top http://alb-110zvs5nhsvfv*****.cn-chengdu.alb.aliyuncs.com/tea

  2. While the test is running, watch the HPA status in real time:

    kubectl get hpa ingress-hpa --watch

    As QPS climbs above the threshold, you will see the replica count increase. Press Ctrl+C to stop watching when you are done.

  3. After the stress test completes, confirm the scale-out result:

    kubectl get hpa

    Expected output:

    NAME          REFERENCE                           TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    ingress-hpa   Deployment/nginx-deployment-basic   12500m/2 (avg)   2         10        10         15m

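    REPLICAS is 10, confirming that the Deployment scaled out to the maximum because QPS exceeded the per-pod threshold. The TARGETS value 12500m is expressed in milli-units and equals an average of 12.5 QPS per pod, well above the target of 2.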
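The TARGETS column mixes plain numbers with Kubernetes milli-unit quantities, where a trailing m means thousandths. A minimal sketch of reading such a value follows. The parser name is hypothetical, and it handles only the m suffix, not the other quantity suffixes that Kubernetes supports.

```python
from decimal import Decimal

def parse_metric_quantity(text):
    """Convert an HPA metric string such as '12500m' or '2' to a number."""
    if text.endswith("m"):
        # 'm' means milli-units: 12500m is 12500 / 1000.
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)

current, target = "12500m/2".split("/")
avg_qps = parse_metric_quantity(current)
print(avg_qps)                                   # 12.5
print(avg_qps > parse_metric_quantity(target))   # True
```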

Verify scale-in

After the stress test ends, QPS drops to 0 and the HPA automatically scales the Deployment back in. Scale-in uses a default stabilization window of 300 seconds (5 minutes), so wait a few minutes before you check:

    kubectl get hpa

Expected output:

    NAME          REFERENCE                           TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
    ingress-hpa   Deployment/nginx-deployment-basic   0/2 (avg)    2         10        2          60m

REPLICAS is back to 2, confirming the Deployment scaled in once QPS dropped below the threshold.
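The delay before scale-in comes from the stabilization window: when scaling down, the HPA acts on the highest replica recommendation made during the window rather than on the latest one. The sketch below models that behavior in a simplified way. The function name and the sample history are assumptions, and the history is assumed to include the most recent recommendation.

```python
def stabilized_replicas(recommendations, now, window_seconds=300):
    """Return the highest replica recommendation seen within the window.

    recommendations: list of (timestamp_seconds, replicas) tuples that
    includes at least one entry inside the window.
    """
    recent = [r for t, r in recommendations if now - t <= window_seconds]
    return max(recent)

# Load stops after t=60; later recommendations drop to the minimum of 2.
history = [(0, 10), (60, 10), (120, 2), (180, 2), (240, 2)]
print(stabilized_replicas(history, now=240))  # 10

# Once the high recommendations age out of the window, scale-in proceeds.
history += [(300, 2), (360, 2), (420, 2)]
print(stabilized_replicas(history, now=420))  # 2
```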

What's next