Container Service for Kubernetes: Configure elastic scaling for a cluster

Last Updated: Mar 26, 2026

ACK Serverless clusters are built on Alibaba Cloud Elastic Container Instance (ECI) and provide elastic scaling capabilities. The cluster can scale out severalfold within minutes, or scale in quickly to reduce costs when demand drops.

This tutorial covers two scaling approaches:

  • Manual scaling: set the exact number of Pods in a Deployment on demand

  • Automatic scaling: configure a Horizontal Pod Autoscaler (HPA) that adjusts Pod count based on CPU utilization

Important

Completing this tutorial costs approximately USD 0.5, assuming the resources run for 0.5 hours. Release the resources promptly after you finish.

Prerequisites

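Before you begin, ensure that you have:

  • An ACK Serverless cluster

  • A Deployment named nginx-deploy running in the cluster with 1 Pod

  • For the kubectl steps, a kubectl client connected to the cluster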

Step 1: Install the metrics-server add-on

The metrics-server add-on collects CPU and memory metrics from cluster nodes. HPA uses these metrics to make scaling decisions.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the name of your cluster. In the left navigation pane, click Add-ons.

  3. Click the Logs and Monitoring tab. Find the metrics-server card and click Install in the lower-right corner. Wait for the installation to complete.
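Before you continue, you may want to confirm from the command line that the metrics API is being served. The following is a minimal sketch, assuming kubectl is connected to the cluster and that metrics-server registers the standard v1beta1.metrics.k8s.io APIService. The function name metrics_api_available is illustrative and not part of any tool.

```shell
# Sketch: check whether the resource metrics API reports itself as available.
# Assumption: metrics-server registers the APIService v1beta1.metrics.k8s.io.
metrics_api_available() {
  # Read the status of the APIService's "Available" condition ("True" when ready).
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  [ "$status" = "True" ]
}

# Example usage (requires access to the cluster):
#   metrics_api_available && kubectl top pod
```

If the check succeeds, kubectl top pod should list CPU and memory usage for the running Pods once the first metrics have been collected.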

Step 2: Scale a Deployment manually

Manual scaling lets you set an exact Pod count immediately, without waiting for metrics thresholds to trigger.

Console

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the name of your cluster. In the left navigation pane, click Workloads > Deployments.

  3. Click the Deployment named nginx-deploy to open its details page.

  4. In the upper-right corner, click Scale. In the Scale panel, set Desired Pod Count to 10, then click OK. Refresh the page. Nine new Pods appear, bringing the total to 10, which confirms a successful scale-out.

  5. Repeat the previous step and set Desired Pod Count to 1. Refresh the page. The count drops to 1, which confirms a successful scale-in.

kubectl

  1. View the current state of the Deployment.

    kubectl get deploy

    Expected output:

    NAME           READY   UP-TO-DATE   AVAILABLE   AGE
    nginx-deploy   1/1     1            1           9m32s

  2. Scale out to 10 Pods.

    kubectl scale deploy nginx-deploy --replicas=10

    Expected output:

    deployment.apps/nginx-deploy scaled

  3. Verify the Pods are running.

    kubectl get pod

    Expected output (10 Pods, all Running):

    NAME                            READY   STATUS    RESTARTS   AGE
    nginx-deploy-55d8dcf755-8jlz2   1/1     Running   0          39s
    nginx-deploy-55d8dcf755-9jbzk   1/1     Running   0          39s
    nginx-deploy-55d8dcf755-bqhcz   1/1     Running   0          38s
    nginx-deploy-55d8dcf755-bxk8n   1/1     Running   0          38s
    nginx-deploy-55d8dcf755-cn6x9   1/1     Running   0          38s
    nginx-deploy-55d8dcf755-jsqjn   1/1     Running   0          38s
    nginx-deploy-55d8dcf755-lhp8l   1/1     Running   0          38s
    nginx-deploy-55d8dcf755-r2clb   1/1     Running   0          38s
    nginx-deploy-55d8dcf755-rchhq   1/1     Running   0          10m
    nginx-deploy-55d8dcf755-xspnt   1/1     Running   0          39s

  4. Scale in to 1 Pod.

    kubectl scale deploy nginx-deploy --replicas=1

  5. Verify the scale-in is complete.

    kubectl get pod

    Expected output:

    NAME                            READY   STATUS    RESTARTS   AGE
    nginx-deploy-55d8dcf755-bqhcz   1/1     Running   0          1m
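Instead of re-running kubectl get pod until the count looks right, you can let kubectl wait for the Deployment to settle. The sketch below combines the scale command with kubectl rollout status. The helper name scale_and_wait and the 120-second timeout are assumptions chosen for illustration. After a scale-in, Pods that are still terminating may remain visible in kubectl get pod for a short time.

```shell
# Sketch: scale nginx-deploy, then block until the Deployment reports completion.
# Assumptions: the Deployment is in the current namespace; 120s is an arbitrary timeout.
scale_and_wait() {
  replicas=$1
  kubectl scale deployment nginx-deploy --replicas="$replicas" &&
    kubectl rollout status deployment/nginx-deploy --timeout=120s
}

# Example usage:
#   scale_and_wait 10   # scale out
#   scale_and_wait 1    # scale in
```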

Step 3: Configure automatic scaling with HPA

HPA automatically adjusts the number of Pods based on observed CPU utilization, keeping the average close to your target threshold.

Console

  1. On the nginx-deploy details page, click the Pod Scaling tab.

  2. In the HPA section, click Create.

  3. In the Create panel, enter the following values and click OK.

    Configuration item    Sample value
    Name                  nginx-deploy
    Metric                Click Add: Metric = CPU Usage, Threshold = 20%
    Max. Containers       10
    Min. Containers       1

kubectl

Create an HPA that targets 20% average CPU utilization, with a minimum of 1 Pod and a maximum of 10 Pods.

kubectl autoscale deployment nginx-deploy --cpu-percent=20 --min=1 --max=10

Expected output:

horizontalpodautoscaler.autoscaling/nginx-deploy autoscaled

Verify the HPA is created.

kubectl get hpa

Expected output:

NAME           REFERENCE                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deploy   Deployment/nginx-deploy   0%/20%    1         10        1          35s

The TARGETS column shows 0%/20% (current/target): current CPU utilization is 0% because the server is not yet under load. The Pod count is already at the minimum (1), so the HPA cannot scale in further.
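If you prefer to keep the policy in a file, the same settings can be expressed declaratively. The following is a sketch of an equivalent manifest, assuming the cluster serves the autoscaling/v2 API. The helper name hpa_manifest is illustrative. Note that CPU utilization is calculated relative to the CPU requests of the Pod's containers, so the target only works when requests are set.

```shell
# Sketch: print an HPA manifest equivalent to the kubectl autoscale command above.
# Assumption: the cluster serves the autoscaling/v2 API.
hpa_manifest() {
  cat <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-deploy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20
EOF
}

# Example usage (requires access to the cluster):
#   hpa_manifest | kubectl apply -f -
```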

Step 4 (Optional): Test the automatic scaling policy

Generate artificial CPU load to verify the HPA scales out and then scales back in.

Generate load

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. Click the name of your cluster. In the left navigation pane, click Workloads > Pods.

  3. Find the test Pod and click Terminal in the Actions column. Select the test container to open a command-line interface.

  4. Run the following command to create an infinite CPU load loop. Run this only on a test container to avoid disrupting production traffic.

    while : ; do : ; done

  5. Return to the console. Click More > Monitor in the Actions column to observe the CPU load.
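The same load can be started from a workstation with kubectl exec. The sketch below wraps the loop in timeout so that it stops on its own instead of running indefinitely. It assumes the container image provides sh and a coreutils-style timeout command. The helper name burn_cpu, the Pod name, and the 300-second default are placeholders for illustration.

```shell
# Sketch: run the busy loop inside a Pod for a bounded number of seconds.
# Assumptions: the image ships `sh` and a coreutils-style `timeout SECONDS CMD`.
burn_cpu() {
  pod=$1
  seconds=${2:-300}   # hypothetical default duration
  # timeout ends the loop after $seconds; it then exits non-zero, and so does kubectl exec.
  kubectl exec "$pod" -- timeout "$seconds" sh -c 'while : ; do : ; done'
}

# Example usage (replace <pod-name> with the Pod you are testing):
#   burn_cpu <pod-name> 300
```

Because the loop ends by itself, there is no leftover process to find and kill if the terminal session is lost.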

Observe scale-out

Wait a few minutes and refresh the page. Four new Pods are created, bringing the total to 5.

At this point, one Pod runs at 100% CPU utilization while the other four Pods are at approximately 0%, bringing the average across all five Pods to approximately 20%. The scale-out is complete.
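The total of 5 matches the scaling rule documented for the Kubernetes HPA: desired replicas = ceil(current replicas * current utilization / target utilization). The sketch below reproduces that arithmetic in shell. The helper name desired_replicas is illustrative, and the sketch deliberately ignores the controller's tolerance band and the minimum and maximum bounds.

```shell
# Sketch: HPA replica calculation using integer ceiling division.
# Arguments: current replica count, current average utilization (%), target utilization (%).
desired_replicas() {
  current=$1
  utilization=$2
  target=$3
  # (a + b - 1) / b is the integer form of ceil(a / b) for positive b.
  echo $(( (current * utilization + target - 1) / target ))
}

desired_replicas 1 100 20   # one Pod at 100% against a 20% target: prints 5
desired_replicas 5 20 20    # five Pods averaging 20%: prints 5, so no further change
```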

Stop load and verify scale-in

  1. Return to the container terminal. Press Ctrl+C to stop the loop.

    If you closed the terminal page, open a terminal for the Pod again, run top to find the process, then run kill -9 <PID> to terminate it.

  2. After 5–10 minutes, refresh the page. The number of Pods is reduced to 1, which confirms the automatic scale-in is complete.
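The delay before scale-in is consistent with upstream Kubernetes behavior, where the HPA applies a scale-down stabilization window (300 seconds by default) before removing Pods. Whether ACK changes that default is not stated here. As a sketch, the following polls the HPA until it reports a single replica. The helper names, the attempt count, and the polling interval are assumptions for illustration.

```shell
# Sketch: poll the HPA until it reports one current replica.
hpa_replicas() {
  kubectl get hpa nginx-deploy -o jsonpath='{.status.currentReplicas}'
}

wait_for_scale_in() {
  attempts=${1:-40}   # hypothetical: number of polls before giving up
  interval=${2:-15}   # hypothetical: seconds between polls
  while [ "$attempts" -gt 0 ]; do
    [ "$(hpa_replicas)" = "1" ] && return 0
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  return 1
}

# Example usage (requires access to the cluster):
#   wait_for_scale_in && echo "scale-in complete"
```

For an interactive view, kubectl get hpa nginx-deploy --watch streams each change to the HPA as it happens.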

Step 5: Release resources

Delete the application and Service

  1. In the ACK console, go to the Clusters page and click the name of your cluster.

  2. In the left navigation pane, choose Workloads > Deployments. Select the Nginx application, click Batch Delete, and follow the on-screen instructions.
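The Deployment and its HPA can also be removed with kubectl. The following is a minimal sketch, assuming both objects are named nginx-deploy in the current namespace, as in the earlier steps. The helper name cleanup_tutorial_resources is illustrative. The name of the Service is not given in this tutorial, so the sketch does not attempt to delete it.

```shell
# Sketch: delete the HPA and the Deployment created for this tutorial.
# Assumption: both are named nginx-deploy and live in the current namespace.
cleanup_tutorial_resources() {
  # --ignore-not-found keeps the commands from failing if an object is already gone.
  kubectl delete hpa nginx-deploy --ignore-not-found
  kubectl delete deployment nginx-deploy --ignore-not-found
}

# Example usage (requires access to the cluster):
#   cleanup_tutorial_resources
```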

Delete the cluster

ACK Serverless clusters do not incur cluster management fees. However, other Alibaba Cloud resources used by the cluster — such as ECI — are billed separately according to each product's billing rules.

  • To delete the cluster: Go to the ACK console, open the Clusters page, click More > Delete next to the target cluster, select the associated cloud resources to delete, and follow the on-screen instructions. For details, see Delete a cluster.

  • To keep the cluster: Maintain a sufficient account balance to avoid service interruption. For billing details, see Billing for cloud product resources.