When workloads surge, your application needs more replicas to handle the load. When demand drops, excess replicas waste resources. Horizontal Pod Autoscaler (HPA) solves this by automatically adjusting pod replica counts based on CPU utilization, memory usage, or other metrics -- no manual intervention required.
HPA suits services with fluctuating demand, frequent scaling needs, or large numbers of workloads. Common use cases include e-commerce platforms, online education, and financial services.
How HPA works
HPA runs as a control loop that periodically checks metric values against the targets you define. Every 15 seconds, the HPA controller queries the Metrics API and compares current resource usage against target thresholds. The Metrics API retrieves data from the kubelet every 60 seconds, so HPA effectively evaluates metrics on a 60-second cycle.
The core scaling formula:
```
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))
```

For example, if current CPU utilization is 80% and the target is 50%, HPA calculates ceil(currentReplicas * 80/50) and scales the Deployment accordingly. A 10% tolerance band prevents thrashing -- HPA does not scale when the ratio is within 0.1 of 1.0.
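The formula and tolerance band can be sketched as a small Python function (a hypothetical illustration, not the actual controller code):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA scaling formula with the 10% tolerance band."""
    ratio = current_metric / target_metric
    # Within the tolerance band, HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 80% current CPU vs. a 50% target: 4 replicas -> ceil(4 * 1.6) = 7
print(desired_replicas(4, 80, 50))  # -> 7
# 52% vs. 50% is within the 10% tolerance band: no change
print(desired_replicas(4, 52, 50))  # -> 4
```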
| Behavior | Detail |
|---|---|
| Scale-out | Immediate. HPA increases replicas as soon as a metric exceeds the target (plus tolerance). |
| Scale-in | 5-minute default cooldown to avoid premature scale-down during transient dips. |
| Multiple metrics | HPA scales when *any* specified metric exceeds its threshold. |
| Resource requests required | HPA calculates utilization as currentUsage / requests. Without resource requests on containers, HPA cannot compute utilization and will not function. |
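The 5-minute scale-in cooldown corresponds to the default scale-down stabilization window. With the autoscaling/v2 API, you can tune it through the behavior field; a minimal sketch (the 300-second value shown is the default):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling in (default).
```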
For the full algorithm specification, see Algorithm details.
Prerequisites
Before you begin, ensure that you have:
- Metrics Server installed in your cluster. You can install it from the Add-ons page in the ACK console.
Create an HPA-enabled application in the ACK console
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the target cluster and click its name or click Details in the Actions column.
In the left-side navigation pane of the cluster details page, choose Workloads > Deployments.
On the Deployments page, click Create from Image.
On the Create page, configure the following sections:
Basic Information: Set the application name and number of replicas.
Container: Select the image and specify the required CPU and memory resources. Important: Set resource requests for the application; otherwise, HPA does not take effect.
Advanced:
In the Access Control section, click Create next to Services to configure the Service.
In the Scaling section, set HPA to Enable and configure the scaling parameters:

| Parameter | Description |
|---|---|
| Metrics | Select CPU Usage or Memory Usage. The metric type must match the resource type specified in Required Resources. If you specify both, HPA scales when either metric exceeds its threshold. |
| Condition | The resource usage threshold that triggers scaling. |
| Max. Replicas | The maximum number of replicas. Must be greater than the minimum. |
| Min. Replicas | The minimum number of replicas. Must be an integer greater than or equal to 1. |
For detailed steps and all configuration parameters, see Create a stateless application from an image.
Create an HPA-enabled application with kubectl
This section uses an NGINX Deployment to demonstrate HPA configuration with kubectl. Create only one HPA per workload.
Step 1: Create a Deployment
Create a file named nginx.yml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with your actual image_name:tag.
        ports:
        - containerPort: 80
        resources:
          requests: # Required for HPA to calculate utilization.
            cpu: 500m
```

Define resources.requests for your containers. HPA calculates utilization as currentUsage / requests. Without requests, HPA cannot determine utilization and will not scale pods.
Apply the Deployment:
```shell
kubectl apply -f nginx.yml
```

Step 2: Create an HPA
Create a file named hpa.yml. The HPA uses scaleTargetRef to associate with the nginx Deployment and triggers scaling when average CPU utilization across all pods exceeds 50%.
For Kubernetes 1.24 and later (recommended -- uses autoscaling/v2):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx # Target Deployment name.
  minReplicas: 1 # Minimum replica count. Integer >= 1.
  maxReplicas: 10 # Maximum replica count. Must exceed minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Target average CPU utilization (percentage of requests).
```

For Kubernetes versions earlier than 1.24 (legacy)
Use autoscaling/v2beta2 instead:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

autoscaling/v2beta2 is deprecated in Kubernetes 1.23 and removed in 1.26. Upgrade to autoscaling/v2 when possible.
Apply the HPA:
```shell
kubectl apply -f hpa.yml
```

(Optional) Use multiple metrics
To scale based on both CPU and memory, specify both resource types under the metrics field in a single HPA. Do not create separate HPAs for each metric. HPA scales when *any* metric exceeds its threshold.
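Conceptually, the controller computes a desired replica count per metric and applies the largest, which is why exceeding any single threshold triggers a scale-out. A hypothetical Python sketch of that merge step (not the actual controller code):

```python
import math

def desired_for_metric(current_replicas: int, current: float, target: float) -> int:
    """Desired replicas for one metric (tolerance band omitted for brevity)."""
    return math.ceil(current_replicas * current / target)

def desired_replicas_multi(current_replicas: int, metrics) -> int:
    """metrics: list of (current_value, target_value) pairs.
    HPA takes the maximum desired count across all metrics."""
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)

# CPU at 80% (target 50) asks for 7 replicas; memory at 40% (target 50) asks for 4.
# The higher demand wins.
print(desired_replicas_multi(4, [(80, 50), (40, 50)]))  # -> 7
```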
```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
```

Verify HPA status
After applying the HPA, initial metric collection takes a few moments. During this period, kubectl describe hpa may show warnings like the following:
```
Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
```

These warnings typically indicate that HPA is still initializing and metrics have not yet been collected. If they persist, confirm that resource requests are set on the target containers.
Check HPA status:
```shell
kubectl get hpa
```

Check scaling events:
```shell
kubectl describe hpa nginx-hpa
```

When HPA operates correctly, the Events section shows output similar to:
```
Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
```

Clean up
To remove the resources created in this tutorial:
```shell
kubectl delete hpa nginx-hpa
kubectl delete deployment nginx
```