Container Service for Kubernetes:Enable serverless use of cloud CPU/GPU resources for K8s clusters

Last Updated:Mar 26, 2026

When your on-premises Kubernetes cluster can't absorb traffic spikes or batch workloads without expensive fixed cloud capacity, you can burst pods directly to Alibaba Cloud Elastic Container Instance (ECI) in a serverless manner. There is no cloud node pool to provision or maintain: pods start on ECI, run, and are billed only for their actual runtime.

How it works

Installing the ack-virtual-node add-on on your ACK One registered cluster creates a virtual node backed by ECI. Pods scheduled to this virtual node run as ECI instances: they start quickly, their lifecycle matches that of the pod, and you pay only while they're running.

The ack-co-scheduler add-on introduces a ResourcePolicy custom resource (CR) that enables multilevel scheduling. The scheduler prioritizes your on-premises nodes. When on-premises capacity is exhausted, it automatically overflows pods to ECI, with no manual intervention.

[Figure: Architecture diagram showing an on-premises Kubernetes cluster extended with serverless ECI via virtual-kubelet]

Use cases

| Scenario | Description |
| --- | --- |
| Online services with fluctuating traffic | Burst to ECI during peak hours in e-commerce or online education, then release resources immediately after. |
| Data processing jobs | Run Spark, Presto, or Argo Workflows on serverless ECI. The pay-as-you-go model (billed per pod runtime) keeps compute costs low. |
| CI/CD pipelines | Scale Jenkins or GitLab Runner jobs to ECI without maintaining a dedicated node pool. |
| Batch and AI training jobs | Submit scheduled tasks and AI training jobs to ECI for elastic, cost-effective execution. |

Prerequisites

Before you begin, ensure that you have:

Step 1: Install add-ons

Install the following add-ons on your registered cluster:

| Add-on | Purpose |
| --- | --- |
| ack-virtual-node | Creates a virtual node that connects your cluster to ECI for serverless pod execution. |
| ack-co-scheduler | Enables the ResourcePolicy CR and multilevel resource scheduling. |

Choose one of the following installation methods.

Use onectl (recommended)

  1. Install onectl on your on-premises machine. For more information, see Use onectl to manage registered clusters.

  2. Run the following commands to install both add-ons.

    onectl addon install ack-virtual-node
    onectl addon install ack-co-scheduler

    Expected output:

    Addon ack-virtual-node, version **** installed.
    Addon ack-co-scheduler, version **** installed.

Use the console

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your target cluster. In the left navigation pane, click Add-ons.

  3. On the Add-ons page, click the Others tab. Find the ack-virtual-node and ack-co-scheduler add-ons and click Install in the lower-right corner of each card.

  4. In the dialog box that appears, click OK.

Step 2: Verify the virtual node

After installing ack-virtual-node, run the following command to confirm that the virtual node appears in your cluster.

kubectl get node

Expected output:

NAME                               STATUS   ROLES    AGE    VERSION
iz8vb1xtnuu0ne6b58hvx0z            Ready    master   4d3h   v1.20.9   # An on-premises node. In this example, the master node also runs application workloads.
virtual-kubelet-cn-zhangjiakou-a   Ready    agent    99s    v1.20.9   # The virtual node created by ack-virtual-node, backed by ECI.

The virtual-kubelet node connects your cluster to ECI. Pods scheduled to this node run as ECI instances.
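If you want to script this check, the following is a minimal sketch that filters the `kubectl get node` output for virtual nodes in the Ready state. The helper name `ready_virtual_nodes` is hypothetical, and the sketch assumes that virtual node names start with `virtual-kubelet`, as in the expected output above.

```shell
# Hypothetical helper: reads `kubectl get node` output on stdin and prints
# the names of nodes that start with "virtual-kubelet" and whose STATUS
# column is exactly "Ready" (so "NotReady" is excluded).
ready_virtual_nodes() {
  awk 'NR > 1 && $1 ~ /^virtual-kubelet/ && $2 == "Ready" { print $1 }'
}

# Usage, assuming kubectl is configured for the registered cluster:
#   kubectl get node | ready_virtual_nodes
```

An empty result suggests that the add-on has not created the virtual node yet or that the node is not Ready.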

Step 3: Run pods on ECI

Use one of the following methods to schedule pods to ECI.

Method 1: Label individual pods

Add the alibabacloud.com/eci: "true" label to a pod to schedule it on ECI. This method gives you fine-grained control over which pods run on ECI.

The following example runs a CUDA task on a GPU-accelerated ECI instance. No NVIDIA driver or runtime installation is required—the experience is fully serverless.

  1. Apply the following manifest to create the pod.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-pod
      labels:
        alibabacloud.com/eci: "true"  # Schedule this pod on ECI.
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn5-c4g1.xlarge  # GPU instance type with one NVIDIA P100.
    spec:
      restartPolicy: Never
      containers:
        - name: cuda-container
          image: acr-multiple-clusters-registry.cn-hangzhou.cr.aliyuncs.com/ack-multiple-clusters/cuda10.2-vectoradd
          resources:
            limits:
              nvidia.com/gpu: 1

    kubectl apply -f <your-manifest-file>.yaml

  2. Check the pod status.

    kubectl get pod -o wide

    Expected output:

    NAME       READY   STATUS      RESTARTS   AGE     IP              NODE
    gpu-pod    0/1     Completed   0          5m30s   172.16.XX.XX    virtual-kubelet-cn-zhangjiakou-a

  3. Check the pod logs to confirm the CUDA task ran successfully.

    kubectl logs gpu-pod

    Expected output:

    Using CUDA Device [0]: Tesla P100-PCIE-16GB
    GPU Device has SM 6.0 compute capability
    [Vector addition of 50000 elements]
    Copy input data from the host memory to the CUDA device
    CUDA kernel launch with 196 blocks of 256 threads
    Copy output data from the CUDA device to the host memory
    Test PASSED
    Done

The pod ran on the virtual-kubelet virtual node, backed by a serverless GPU ECI instance.

Method 2: Label a namespace

Apply the alibabacloud.com/eci=true label to a namespace to schedule all new pods in that namespace on ECI automatically.

kubectl label namespace <namespace-name> alibabacloud.com/eci=true
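To confirm that the label is in place, you can inspect the namespace labels. The following is a small sketch, not an official tool: the helper name `has_eci_label` is hypothetical, and it only checks a comma-separated label list such as the LABELS column printed by `kubectl get namespace --show-labels`.

```shell
# Hypothetical helper: succeeds if the comma-separated label list passed as
# the first argument contains alibabacloud.com/eci=true.
has_eci_label() {
  case ",$1," in
    *,alibabacloud.com/eci=true,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, assuming kubectl access; <namespace-name> is a placeholder:
#   labels=$(kubectl get namespace <namespace-name> --no-headers --show-labels | awk '{ print $NF }')
#   has_eci_label "$labels" && echo "new pods in this namespace are scheduled on ECI"
```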

Step 4: Use multilevel resource scheduling

The multilevel resource scheduling feature of ACK One registered clusters prioritizes your on-premises nodes and automatically overflows to ECI when on-premises capacity is exhausted.

The ack-co-scheduler add-on implements this through a ResourcePolicy CR. ResourcePolicy is a namespaced resource.

ResourcePolicy parameters

| Parameter | Description |
| --- | --- |
| selector | Selects pods in the same namespace with matching labels. |
| strategy | The scheduling strategy. Only prefer is supported. |
| units | Ordered list of scheduling targets. The scheduler selects resources top-to-bottom when scaling out, and releases them bottom-to-top when scaling in. |
| units[].resource | The resource type. Supported values: idc (on-premises nodes), ecs (ECS nodes), eci (serverless ECI). |
| units[].nodeSelector | Selects nodes by labels. Applies to ecs resources only. |
| units[].max | Maximum number of replicas that can run on this resource group. |
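The top-to-bottom ordering of units can be illustrated with a simplified model. The sketch below is not the ack-co-scheduler implementation: the function name `place_replicas` and the `name:limit` encoding are hypothetical. Each limit stands for the smaller of the unit's max value and what its nodes can actually fit, and `-` means no limit. The numbers mirror the Nginx example in this step, where idc has max: 3 but the on-premises node fits only 2 pods.

```shell
# Hypothetical model of the "prefer" strategy: fill units in the order given.
# Arguments: total replicas, then one name:limit pair per unit.
place_replicas() {
  remaining=$1
  shift
  for unit in "$@"; do
    name=${unit%%:*}
    limit=${unit#*:}
    # A unit takes everything that is left unless its limit is smaller.
    if [ "$limit" = "-" ] || [ "$limit" -gt "$remaining" ]; then
      take=$remaining
    else
      take=$limit
    fi
    printf '%s=%s\n' "$name" "$take"
    remaining=$((remaining - take))
  done
}

# 4 replicas: idc is effectively limited to 2 pods, eci has no limit.
place_replicas 4 idc:2 eci:-
# prints:
#   idc=2
#   eci=2
```

Scaling in follows the reverse order in this model: replicas on the last unit (eci) are released before replicas on idc.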

The following example demonstrates multilevel resource scheduling. It scales an Nginx Deployment to 4 replicas, each requesting 2 CPUs, in a data center cluster with a single 6-CPU node. After system resource reservations, that node can run at most 2 Nginx pods.

  1. Create a ResourcePolicy that prioritizes on-premises nodes and falls back to ECI.

    apiVersion: scheduling.alibabacloud.com/v1alpha1
    kind: ResourcePolicy
    metadata:
      name: cost-balance-policy
    spec:
      selector:
        app: nginx       # Selects pods in this namespace that carry this label.
      strategy: prefer
      units:
      - resource: idc    # Use on-premises nodes first.
        max: 3           # At most 3 replicas on on-premises nodes.
      - resource: eci    # Overflow to ECI when on-premises capacity is exhausted.

    kubectl apply -f <your-policy-file>.yaml

  2. Create a Deployment with 2 replicas, each requesting 2 CPUs.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          annotations:
            addannotion: "true"
          labels:
            app: nginx    # Must match the selector in the ResourcePolicy above.
        spec:
          schedulerName: ack-co-scheduler
          containers:
          - name: nginx
            image: acr-multiple-clusters-registry.cn-hangzhou.cr.aliyuncs.com/ack-multiple-clusters/nginx
            resources:
              requests:
                cpu: 2
              limits:
                cpu: 2

    kubectl apply -f <your-deployment-file>.yaml

  3. Scale the Deployment to 4 replicas.

    kubectl scale deployment nginx --replicas 4

    The on-premises node can run at most 2 Nginx pods. The remaining 2 replicas automatically overflow to ECI.

  4. Verify pod placement.

    kubectl get pod -o wide

    Expected output:

    NAME                     READY   STATUS    RESTARTS   AGE     IP              NODE
    nginx-79cd98b4b5-97s47   1/1     Running   0          84s     10.100.XX.XX    iz8vb1xtnuu0ne6b58h****
    nginx-79cd98b4b5-gxd8z   1/1     Running   0          84s     10.100.XX.XX    iz8vb1xtnuu0ne6b58h****
    nginx-79cd98b4b5-k55rb   1/1     Running   0          58s     10.100.XX.XX    virtual-kubelet-cn-zhangjiakou-a
    nginx-79cd98b4b5-m9jxm   1/1     Running   0          58s     10.100.XX.XX    virtual-kubelet-cn-zhangjiakou-a

    Two pods are running on the on-premises node. The other two are running on the virtual node, backed by serverless ECI.
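To summarize placement without reading the table by eye, the following sketch counts pods per target from a list of node names. The helper name `count_placement` is hypothetical, and it assumes that virtual node names start with `virtual-kubelet`, as in the output above. Pods that are not yet scheduled show `<none>` as their node name and are counted separately.

```shell
# Hypothetical helper: reads one node name per line on stdin and prints how
# many pods run on on-premises nodes, on ECI (virtual nodes), or nowhere yet.
count_placement() {
  awk '
    $1 == "<none>"          { pending++; next }
    $1 ~ /^virtual-kubelet/ { eci++; next }
    NF > 0                  { idc++ }
    END { printf "on-premises=%d eci=%d unscheduled=%d\n", idc, eci, pending }
  '
}

# Usage, assuming kubectl access and the app=nginx label from the Deployment:
#   kubectl get pod -l app=nginx --no-headers \
#     -o custom-columns=NODE:.spec.nodeName | count_placement
```

For the placement shown above, this reports two on-premises pods and two ECI pods.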