
Container Service for Kubernetes: Elastic scaling based on the Ray autoscaler and ACK autoscaler

Last Updated: Mar 26, 2026

Elastic scaling combines the Ray autoscaler and the ACK autoscaler to dynamically adjust both Ray worker pods and the underlying Kubernetes nodes in response to workload demand. This two-layer coordination lets your Ray cluster on ACK grow and shrink automatically, improving resource efficiency and reducing cost without manual intervention.

How it works

Elastic scaling operates across two layers:

Pod layer (Ray autoscaler): Monitors the logical resource requests declared in @ray.remote decorators and adds or removes Ray worker pods to satisfy them. Runs as a sidecar container inside the head pod. No separate configuration is required; it is included with the Ray cluster.

Node layer (ACK autoscaler): Monitors pods stuck in the Pending state. When newly created worker pods cannot be scheduled due to insufficient node capacity, it provisions new nodes. It must be enabled separately on the ACK cluster.

Scale-up sequence: When a job is submitted, the Ray autoscaler reads the logical resource requests (not physical CPU or memory utilization) and creates worker pods to satisfy them. If the ACK cluster lacks capacity to schedule those pods, the pending pods trigger the ACK autoscaler to provision new nodes.
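
To observe this hand-off on a running cluster, you can tail the Ray autoscaler sidecar and inspect a Pending worker pod. The commands below are a minimal sketch: the container name autoscaler and the ${HEAD_POD} and <pending-worker-pod-name> placeholders are assumptions rather than values taken from this guide; the sidecar name is consistent with the two-container (2/2) head pod shown in the verification step below.

# Tail the Ray autoscaler sidecar to watch it turn logical resource demands
# into requests for new worker pods (container name "autoscaler" is assumed).
kubectl -n ${RAY_CLUSTER_NS} logs -f ${HEAD_POD} -c autoscaler

# Describe a Pending worker pod to confirm that the scheduler reports
# insufficient capacity, which is the condition the ACK autoscaler reacts to.
kubectl -n ${RAY_CLUSTER_NS} describe pod <pending-worker-pod-name>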

Prerequisites

Before you begin, ensure that you have:

- Created an ACK cluster and can access it with kubectl.
- Enabled node auto scaling (the ACK autoscaler) for the cluster so that new nodes can be provisioned for pending pods.
- Installed Helm and added the chart repository that provides the aliyunhub/ack-ray-cluster chart.

You can spot-check these prerequisites with the commands below.
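
The following commands are a quick, optional sanity check; the repository name aliyunhub matches the installation command used later in this guide and may differ in your environment.

# kubectl points at the right cluster and the nodes are Ready.
kubectl get node

# Helm is installed and the chart repository is configured.
helm version
helm repo list | grep aliyunhub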

Set up elastic scaling

Step 1: Deploy the Ray cluster

Run the following commands to deploy a Ray cluster in the ACK cluster by using Helm. Set RAY_CLUSTER_NAME and RAY_CLUSTER_NS to the release name and namespace that you want to use before you run them:

# Remove any existing release with the same name (ignore the error if no such release exists).
helm uninstall ${RAY_CLUSTER_NAME} -n ${RAY_CLUSTER_NS}
# Deploy the Ray cluster chart into the target namespace.
helm install ${RAY_CLUSTER_NAME} aliyunhub/ack-ray-cluster -n ${RAY_CLUSTER_NS}
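
After the installation completes, you can optionally verify that the Helm release exists and that a RayCluster custom resource was created. This sketch assumes the ack-ray-cluster chart creates a KubeRay-managed RayCluster object in the same namespace.

# List the Helm release and the RayCluster resource created by the chart.
helm list -n ${RAY_CLUSTER_NS}
kubectl get raycluster -n ${RAY_CLUSTER_NS}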

Step 2: Verify the cluster is running

Run the following command to check that the head pod is ready:

kubectl get pod -n ${RAY_CLUSTER_NS}

Expected output:

NAME                                           READY   STATUS    RESTARTS   AGE
myfirst-ray-cluster-head-kvvdf                 2/2     Running   0          22m

Log in to the head pod and run ray status to confirm the autoscaler is active and no resource demands are pending:

kubectl -n ${RAY_CLUSTER_NS} exec -it myfirst-ray-cluster-head-kvvdf -- bash
ray status

Expected output:

======== Autoscaler status: 2024-01-25 00:00:19.879963 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0B/1.86GiB memory
 0B/452.00MiB object_store_memory

Demands:
 (no resource demands)

Replace myfirst-ray-cluster-head-kvvdf with the actual head pod name from your cluster.
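
Instead of copying the name by hand, you can look the head pod up by label. This is a sketch that assumes the head pod carries the ray.io/node-type=head label commonly applied by KubeRay; check the labels on your pods if the query returns nothing.

# Resolve the head pod name by label and store it for later commands.
HEAD_POD=$(kubectl get pod -n ${RAY_CLUSTER_NS} -l ray.io/node-type=head -o jsonpath='{.items[0].metadata.name}')
echo ${HEAD_POD}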

Step 3: Submit a workload to trigger scaling

The following script submits 15 tasks, each requesting 1 vCPU. Because the head pod has --num-cpus set to 0, it does not accept task scheduling. Each worker pod provides 1 vCPU and 1 GB memory by default, so the Ray autoscaler creates 15 worker pods to satisfy the demand. If the ACK cluster lacks sufficient node capacity, the pending worker pods trigger the ACK autoscaler to add nodes.

import time
import ray
import socket

# Connect to the running Ray cluster (the script is executed on the head pod).
ray.init()

# Each task requests 1 logical CPU, which is what the Ray autoscaler uses to
# decide how many worker pods are needed.
@ray.remote(num_cpus=1)
def get_task_hostname():
    time.sleep(120)
    host = socket.gethostbyname(socket.gethostname())
    return host

# Submit 15 tasks; with 1 vCPU per worker pod and 0 CPUs on the head pod,
# satisfying them requires 15 worker pods.
object_refs = []
for _ in range(15):
    object_refs.append(get_task_hostname.remote())

# Wait for at least one task to finish, then fetch and print the IP address of
# the pod that ran each task (ray.get blocks until each result is ready).
ray.wait(object_refs)

for t in object_refs:
    print(ray.get(t))
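
One way to run the script is to copy it into the head pod and execute it there. The file name scale_test.py is only an example, and ${HEAD_POD} stands for the head pod name from Step 2.

# Copy the example script into the head pod and run it with the pod's Python.
kubectl cp scale_test.py ${RAY_CLUSTER_NS}/${HEAD_POD}:/tmp/scale_test.py
kubectl -n ${RAY_CLUSTER_NS} exec -it ${HEAD_POD} -- python /tmp/scale_test.py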

Step 4: Monitor the scaling process

Watch pod creation as the Ray autoscaler provisions worker pods:

kubectl get pod -n ${RAY_CLUSTER_NS} -w

Expected output:

NAME                                           READY   STATUS    RESTARTS   AGE
myfirst-ray-cluster-head-kvvdf                 2/2     Running   0          47m
myfirst-ray-cluster-worker-workergroup-btgmm   1/1     Running   0          30s
myfirst-ray-cluster-worker-workergroup-c2lmq   0/1     Pending   0          30s
myfirst-ray-cluster-worker-workergroup-gstcc   0/1     Pending   0          30s
myfirst-ray-cluster-worker-workergroup-hfshs   0/1     Pending   0          30s
myfirst-ray-cluster-worker-workergroup-nrfh8   1/1     Running   0          30s
myfirst-ray-cluster-worker-workergroup-pjbdw   0/1     Pending   0          29s
myfirst-ray-cluster-worker-workergroup-qxq7v   0/1     Pending   0          30s
myfirst-ray-cluster-worker-workergroup-sm8mt   1/1     Running   0          30s
myfirst-ray-cluster-worker-workergroup-wr87d   0/1     Pending   0          30s
myfirst-ray-cluster-worker-workergroup-xc4kn   1/1     Running   0          30s
...
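
To list only the worker pods that are still waiting for capacity, which are the pods that trigger the node-level scale-up, filter by phase:

kubectl get pod -n ${RAY_CLUSTER_NS} --field-selector=status.phase=Pending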

Watch node provisioning as the ACK autoscaler adds nodes for the pending pods:

kubectl get node -w

Expected output:

NAME                       STATUS     ROLES    AGE   VERSION
cn-hangzhou.172.16.0.204   Ready      <none>   44h   v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    NotReady   <none>   0s    v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    NotReady   <none>   0s    v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    NotReady   <none>   0s    v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    NotReady   <none>   1s    v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    NotReady   <none>   11s   v1.24.6-aliyun.1
cn-hangzhou.172.16.0.16    NotReady   <none>   10s   v1.24.6-aliyun.1
cn-hangzhou.172.16.0.16    NotReady   <none>   14s   v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    NotReady   <none>   31s   v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    NotReady   <none>   60s   v1.24.6-aliyun.1
cn-hangzhou.172.16.0.17    Ready      <none>   61s   v1.24.6-aliyun.1
cn-hangzhou.172.16.0.16    Ready      <none>   64s   v1.24.6-aliyun.1
...

As shown in the sample output above, the two new nodes transitioned from NotReady to Ready approximately 61 and 64 seconds after they were provisioned.
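
Scale-down follows the same two layers in reverse: once the tasks finish, the Ray autoscaler removes worker pods after they have been idle for its configured timeout, and the ACK autoscaler later reclaims nodes that become empty. You can reuse the watch commands to observe this; the exact timing depends on the idle-timeout and scale-down settings of your cluster.

# Watch worker pods terminate as they go idle, then watch empty nodes be removed.
kubectl get pod -n ${RAY_CLUSTER_NS} -w
kubectl get node -w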

What's next