Container Service for Kubernetes:Implement horizontal pod autoscaling

Last Updated:Feb 27, 2026

When workloads surge, your application needs more replicas to handle the load. When demand drops, excess replicas waste resources. Horizontal Pod Autoscaler (HPA) solves this by automatically adjusting pod replica counts based on CPU utilization, memory usage, or other metrics -- no manual intervention required.

HPA suits services with fluctuating demand, frequent scaling needs, or large numbers of workloads. Common use cases include e-commerce platforms, online education, and financial services.

How HPA works

HPA runs as a control loop that periodically checks metric values against the targets you define. Every 15 seconds, the HPA controller queries the Metrics API and compares current resource usage against target thresholds. The Metrics API retrieves data from the kubelet every 60 seconds, so HPA effectively evaluates metrics on a 60-second cycle.

The core scaling formula:

desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))

For example, if current CPU utilization is 80% and the target is 50%, HPA calculates ceil(currentReplicas * 80/50) and scales the Deployment accordingly. A 10% tolerance band prevents thrashing -- HPA does not scale when the ratio is within 0.1 of 1.0.
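As a rough illustration of the formula and tolerance band described above, here is a minimal Python sketch (the function name is illustrative, and the real HPA controller applies additional stabilization logic not shown here):

```python
import math

TOLERANCE = 0.1  # HPA skips scaling when the usage ratio is within 0.1 of 1.0

def desired_replicas(current_replicas, current_metric, target_metric):
    """Apply the core HPA scaling formula, honoring the tolerance band."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # within tolerance: no scaling
    return math.ceil(current_replicas * ratio)

# 80% CPU against a 50% target: ceil(2 * 80/50) = ceil(3.2) = 4 replicas
print(desired_replicas(2, 80, 50))  # -> 4
# 52% against 50%: ratio 1.04 falls inside the tolerance band, so no change
print(desired_replicas(4, 52, 50))  # -> 4
```

Note that `ceil` rounds up, so even a slight excess over the tolerance band adds at least one replica.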

| Behavior | Detail |
|----------|--------|
| Scale-out | Immediate. HPA increases replicas as soon as a metric exceeds the target (plus tolerance). |
| Scale-in | 5-minute default cooldown to avoid premature scale-down during transient dips. |
| Multiple metrics | HPA scales when *any* specified metric exceeds its threshold. |
| Resource requests required | HPA calculates utilization as currentUsage / requests. Without resource requests on containers, HPA cannot compute utilization and will not function. |
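The scale-in cooldown works as a stabilization window: HPA scales down only to the highest replica recommendation seen within the window, so brief dips in load do not shrink the workload. A minimal sketch of that behavior, assuming the default 5-minute window (the function and variable names are illustrative):

```python
from collections import deque

WINDOW_SECONDS = 300  # default downscale stabilization window (5 minutes)

def stabilized_scale_down(history, now, new_recommendation):
    """Record a replica recommendation and return the count HPA would
    actually apply: the maximum recommendation inside the window. This
    delays scale-in during transient dips; scale-out stays immediate."""
    history.append((now, new_recommendation))
    while history and history[0][0] < now - WINDOW_SECONDS:
        history.popleft()  # drop recommendations older than the window
    return max(count for _, count in history)

history = deque()
print(stabilized_scale_down(history, 0, 8))    # -> 8
print(stabilized_scale_down(history, 60, 3))   # dip to 3 ignored -> 8
print(stabilized_scale_down(history, 400, 3))  # old peak expired -> 3
```

Once every recommendation in the window agrees the load has dropped, the lower replica count takes effect.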

For the full algorithm specification, see Algorithm details.

Prerequisites

Before you begin, ensure that you have:

  • Created an ACK cluster.
  • Connected to the cluster with kubectl (required for the kubectl method below).
  • A working Metrics API in the cluster (provided by metrics-server), so that HPA can read resource usage.

Create an HPA-enabled application in the ACK console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the target cluster and click its name or click Details in the Actions column.

  3. In the left-side navigation pane of the cluster details page, choose Workloads > Deployments.

  4. On the Deployments page, click Create from Image.

  5. On the Create page, configure the following sections:

    • Basic Information: Set the application name and number of replicas.

    • Container: Select the image and specify the required CPU and memory resources.

      Important: Set resource requests for the application. Otherwise, HPA does not take effect.

    • Advanced:

      • In the Access Control section, click Create next to Services to configure the Service.

      • In the Scaling section, set HPA to Enable and configure the scaling parameters:

        | Parameter | Description |
        |-----------|-------------|
        | Metrics | Select CPU Usage or Memory Usage. The metric type must match the resource type specified in Required Resources. If you specify both, HPA scales when either metric exceeds its threshold. |
        | Condition | The resource usage threshold that triggers scaling. |
        | Max. Replicas | The maximum number of replicas. Must be greater than the minimum. |
        | Min. Replicas | The minimum number of replicas. Must be an integer greater than or equal to 1. |

For detailed steps and all configuration parameters, see Create a stateless application from an image.

Create an HPA-enabled application with kubectl

This section uses an NGINX Deployment to demonstrate HPA configuration with kubectl. Create only one HPA per workload.

Step 1: Create a Deployment

Create a file named nginx.yml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with your actual image_name:tag.
        ports:
        - containerPort: 80
        resources:
          requests:       # Required for HPA to calculate utilization.
            cpu: 500m
Important

Define resources.requests for your containers. HPA calculates utilization as currentUsage / requests. Without requests, HPA cannot determine utilization and will not scale pods.
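Utilization is measured against requests, not limits or node capacity. As a quick sketch of the calculation (function name and the 400m usage figure are illustrative):

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """HPA-style utilization: current usage as a percentage of the
    container's resource request (not of limits or node capacity)."""
    if request_millicores == 0:
        raise ValueError("no CPU request set: HPA cannot compute utilization")
    return 100 * usage_millicores / request_millicores

# A pod using 400m against the 500m request from nginx.yml runs at 80%.
print(cpu_utilization_percent(400, 500))  # -> 80.0
```

With a 50% target, such a pod would be well above threshold and trigger scale-out.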

Apply the Deployment:

kubectl apply -f nginx.yml

Step 2: Create an HPA

Create a file named hpa.yml. The HPA uses scaleTargetRef to associate with the nginx Deployment and triggers scaling when average CPU utilization across all pods exceeds 50%.

For Kubernetes 1.24 and later (recommended -- uses autoscaling/v2):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx                # Target Deployment name.
  minReplicas: 1               # Minimum replica count. Integer >= 1.
  maxReplicas: 10              # Maximum replica count. Must exceed minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Target average CPU utilization (percentage of requests).

For Kubernetes versions earlier than 1.24 (legacy):

Use autoscaling/v2beta2 instead:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Note: autoscaling/v2beta2 is deprecated since Kubernetes 1.23 and removed in 1.26. Upgrade to autoscaling/v2 when possible.

Apply the HPA:

kubectl apply -f hpa.yml

(Optional) Use multiple metrics

To scale based on both CPU and memory, specify both resource types under the metrics field in a single HPA. Do not create separate HPAs for each metric. HPA scales when *any* metric exceeds its threshold.

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
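With multiple metrics, HPA evaluates each metric independently and applies the largest resulting replica count, which is why any single metric above its threshold can trigger scale-out. A minimal sketch (function names are illustrative):

```python
import math

def desired_for_metric(current_replicas, current, target):
    """The core HPA formula for one metric."""
    return math.ceil(current_replicas * current / target)

def desired_replicas(current_replicas, metrics):
    """metrics: list of (current_value, target_value) pairs.
    HPA picks the highest per-metric recommendation."""
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)

# CPU at 80% (target 50%) wants 4 replicas; memory at 40% (target 50%)
# wants only 2. HPA uses the larger value.
print(desired_replicas(2, [(80, 50), (40, 50)]))  # -> 4
```

Conversely, HPA scales in only when *every* metric's recommendation is below the current replica count.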

Verify HPA status

After applying the HPA, initial metric collection takes a few moments, and kubectl describe hpa may briefly show FailedGetResourceMetric warnings that clear on their own once metrics are available. However, warnings like the following indicate a configuration problem rather than initialization:

Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7

Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5

The message "missing request for cpu" means the target containers do not define CPU resource requests. These warnings persist until you add resources.requests to the Deployment, as shown in Step 1.

Check HPA status:

kubectl get hpa

Check scaling events:

kubectl describe hpa nginx-hpa

When HPA operates correctly, the Events section shows output similar to:

Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
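The TARGETS column of kubectl get hpa reports values in a current/target form such as 80%/50%, or <unknown>/50% before metrics arrive. A small illustrative helper for reading that field in scripts (the function name is an assumption, not part of kubectl):

```python
def parse_targets(column):
    """Parse a kubectl get hpa TARGETS entry like '80%/50%' into a
    (current, target) pair; '<unknown>/50%' yields (None, 50)."""
    current_str, target_str = column.split("/")
    current = None if current_str == "<unknown>" else int(current_str.rstrip("%"))
    target = int(target_str.rstrip("%"))
    return current, target

print(parse_targets("80%/50%"))        # -> (80, 50)
print(parse_targets("<unknown>/50%"))  # -> (None, 50)
```

A persistent <unknown> current value usually means metrics are unavailable, for example because resource requests are missing.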

Clean up

To remove the resources created in this tutorial:

kubectl delete hpa nginx-hpa
kubectl delete deployment nginx

What's next