
Container Service for Kubernetes: Configure node auto scaling for GPU applications

Last Updated: Apr 02, 2026

Creating a GPU node pool with auto scaling enabled lets you dynamically add or remove nodes based on actual resource needs. This on-demand, elastic scheduling improves GPU resource utilization and reduces operational costs.

Prerequisites

Step 1: Create an auto-scaling GPU node pool

To ensure independent scheduling and resource isolation for your GPU workloads, create a dedicated GPU node pool and enable auto scaling. This lets the system dynamically adjust compute resources based on workload changes.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Nodes > Node Pools.

  3. Click Create Node Pool and configure the node pool settings as prompted.

    The following are key parameters. For more information about all parameters, see Create and manage node pools.

    • Scaling Mode: Select Auto, and then configure the minimum and maximum numbers of instances. If the cluster has insufficient resources for pods, ACK automatically scales the number of nodes within the range that you specify.

    • Instance configuration:

      Select Specify Instance Type.

      • Architecture: Select GPU-accelerated instance.

      • Instance Type: Select a suitable GPU instance family based on your business requirements, such as ecs.gn7i-c8g1.2xlarge (NVIDIA A10). To improve the success rate of scale-outs, we recommend selecting multiple instance types.


    • Taints: To prevent non-target applications from being scheduled to GPU nodes, we recommend that you add a taint to the node pool. For example:

      • Key: scaler

      • Value: gpu

      • Effect: NoSchedule


    • Node Labels: Add a unique label to the node pool, such as gpu-spec: NVIDIA-A10. This label ensures that GPU applications are scheduled only to the specified node pool.

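
    After the node pool scales out for the first time, you can verify that the label and taint were applied to the new nodes. The following check assumes the example label and taint described above:

      kubectl get nodes -l gpu-spec=NVIDIA-A10 --show-labels
      kubectl describe node <node-name> | grep Taints

    The second command should report the taint scaler=gpu:NoSchedule.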

Step 2: Configure GPU resources and node affinity

To schedule your application to the GPU node pool, you must modify its Deployment configuration to request GPU resources and set node affinity.

  • Configure GPU resource requests.

    In the container's resources field, declare the number of GPUs to use.

    # ...
    spec:
      containers:
      - name: gpu-auto-scaler
        # ...
        resources:
          limits:
            nvidia.com/gpu: 1 # Request 1 GPU
    # ...
    
  • Configure node affinity.

    Use nodeAffinity to schedule pods to GPU nodes with the specified label.

    # ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: gpu-spec       # Match the label key on the node pool
                  operator: In
                  values:
                  - NVIDIA-A10    # Match the label value on the node pool
    # ...
    
  • Configure tolerations.

    Add a toleration that matches the node pool's taint to ensure that pods can be scheduled to the tainted GPU nodes.

    # ...
    spec:
      tolerations:
      - key: "scaler"          # Match the taint key on the node pool
        operator: "Equal"
        value: "gpu"           # Match the taint value on the node pool
        effect: "NoSchedule"   # Match the taint effect on the node pool
    # ...
    
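
  If you only need to match a single required label, a nodeSelector is a more concise equivalent of the requiredDuringSchedulingIgnoredDuringExecution rule above. This sketch assumes the same gpu-spec: NVIDIA-A10 label:

    # ...
    spec:
      nodeSelector:
        gpu-spec: NVIDIA-A10   # Equivalent to the required node affinity rule above
    # ...

  Use nodeAffinity instead when you need richer matching, such as an In operator with multiple label values.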

Step 3: Deploy the application and verify node scaling

This section uses a Deployment to demonstrate dynamic node scaling.

  1. Create a file named gpu-deployment.yaml.

    This Deployment runs two pod replicas. The pods must be scheduled to GPU nodes that have the gpu-spec=NVIDIA-A10 label, and each pod uses one GPU.

    YAML example

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: gpu-auto-scaler
      namespace: default
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: gpu-auto-scaler
      template:
        metadata:
          labels:
            app: gpu-auto-scaler
        spec:
          containers:
            - name: gpu-auto-scaler
              image: registry.cn-hangzhou.aliyuncs.com/ack/ubuntu:22.04
              command: ["bash", "-c"]
              args: ["while true; do date; nvidia-smi -L; sleep 60; done"]
              resources:
                limits:
                  # Request 1 GPU
                  nvidia.com/gpu: 1 
          # Configure node affinity to ensure pods are scheduled to GPU nodes with the specified label.
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                    - key: gpu-spec
                      operator: In
                      values:
                      - NVIDIA-A10
          # Declare a toleration to ensure pods can be scheduled to the tainted GPU nodes.
          tolerations:
          - key: "scaler"
            operator: "Equal"
            value: "gpu"
            effect: "NoSchedule"
  2. Deploy the application and verify the initial scale-out.

    1. Deploy the application.

      kubectl apply -f gpu-deployment.yaml

      Because the cluster does not have any GPU nodes that meet the requirements, the pods enter the Pending state and trigger a scale-out of the node pool. It typically takes several minutes for the new GPU nodes to become ready.

    2. Check the pod events to confirm that the scale-out was triggered.

      kubectl describe pod <your-pod-name>

      Expected output:

      Events:
        Type     Reason            Age                    From                Message
        ----     ------            ----                   ----                -------
        Normal   TriggeredScaleUp  8m32s                  cluster-autoscaler  pod triggered scale-up: [{asg-uf646aomci1pkqya54y7 0->2 (max: 10)}]
        Normal   Scheduled         6m8s                   default-scheduler   Successfully assigned default/gpu-auto-scaler-565994fcf9-6nmz2 to cn-shanghai.10.XX.XX.244
        Normal   AllocIPSucceed    6m4s                   terway-daemon       Alloc IP 10.XX.XX.245/16 took 4.505870999s
        Normal   Pulling           6m4s                   kubelet             Pulling image "registry.cn-hangzhou.aliyuncs.com/ack/ubuntu:22.04"
        Normal   Pulled            6m2s                   kubelet             Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/ack/ubuntu:22.04" in 1.687s (1.687s including waiting).
        Normal   Created           6m2s                   kubelet             Created container gpu-auto-scaler
        Normal   Started           6m2s                   kubelet             Started container gpu-auto-scaler
    3. After the pods are running, check the GPU nodes in the cluster that have the corresponding label.

      kubectl get nodes -l gpu-spec=NVIDIA-A10

      The output should show two GPU nodes.

      NAME                       STATUS   ROLES    AGE     VERSION
      cn-shanghai.10.XX.XX.243   Ready    <none>   7m26s   v1.34.1-aliyun.1
      cn-shanghai.10.XX.XX.244   Ready    <none>   7m25s   v1.34.1-aliyun.1
  3. Verify auto scale-out.

    1. Scale up the number of application replicas to three.

      kubectl scale deployment gpu-auto-scaler --replicas=3

      Run kubectl get pod. At this point, two pods are running and one new pod is in the Pending state due to insufficient resources. This triggers another scale-out of the node pool.

    2. After a few minutes, run kubectl get nodes -l gpu-spec=NVIDIA-A10 again.

      The output shows that the number of nodes with the specified label in the cluster has increased to three.

      NAME                       STATUS   ROLES    AGE   VERSION
      cn-shanghai.10.XX.XX.243   Ready    <none>   11m   v1.34.1-aliyun.1
      cn-shanghai.10.XX.XX.244   Ready    <none>   11m   v1.34.1-aliyun.1
      cn-shanghai.10.XX.XX.247   Ready    <none>   45s   v1.34.1-aliyun.1
  4. Verify auto scale-in.

    1. Scale down the number of application replicas to one.

      kubectl scale deployment gpu-auto-scaler --replicas=1

      The two extra pods are terminated, leaving two GPU nodes idle. After a specified idle duration, the node scaling component automatically removes these nodes to save costs. This feature is part of node instant scaling.

    2. After waiting for the scale-in delay, run kubectl get nodes -l gpu-spec=NVIDIA-A10 again.

      The output should show that the number of nodes with the specified label in the cluster has been reduced to one.

      NAME                       STATUS   ROLES    AGE   VERSION
      cn-shanghai.10.XX.XX.243   Ready    <none>   31m   v1.34.1-aliyun.1

Production recommendations

  • Cost optimization: GPU-accelerated instances are costly, and scaled-out nodes are billed on a pay-as-you-go basis. We recommend adding spot instances to the node pool to significantly reduce compute costs. In addition, configure a reasonable value for Max. Instances to prevent unexpected cost overruns during traffic peaks.

  • High availability: To prevent scale-out failures caused by insufficient inventory in a single zone or of a single instance type, we recommend that you select vSwitches in multiple zones and specify multiple GPU instance types when you create the node pool.

  • Monitoring and alerting: Enable GPU monitoring to track GPU-related metrics. This helps you monitor GPU usage, health, and workload performance, which enables rapid issue diagnosis and resource optimization.

FAQ

Pod pending without node pool scale-out

Possible reasons include:

  • Affinity configuration error: Check that the labels in the application's nodeAffinity configuration exactly match the node pool's labels.

  • Resource request mismatch: Confirm that the number of GPUs requested by the application (nvidia.com/gpu) does not exceed the capacity of a single node in the pool.

  • Node scaling component error: Check the logs of the node scaling component for error messages.

  • Node pool configuration limit: Check whether the maximum number of nodes set for the node pool has reached the quota limit.
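
For example, if the node scaling component in your cluster runs as a cluster-autoscaler Deployment in the kube-system namespace (the workload name and namespace may differ depending on your cluster's configuration), you can locate it and inspect its recent logs:

    # Find the node scaling component (name is an assumption; adjust to your cluster)
    kubectl get deployment -n kube-system | grep -i autoscal
    kubectl logs -n kube-system deployment/cluster-autoscaler --tail=100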

Use different GPU types in a cluster

You can create multiple node pools. Configure each node pool with a different GPU instance type and a unique node label, such as gpu-spec: NVIDIA-A10 and gpu-spec: NVIDIA-L20. When you deploy applications, use nodeAffinity to specify the corresponding labels to schedule different applications to different types of GPU nodes.
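
For example, an application that must run on the L20 nodes would use the same affinity pattern as in Step 2, with the other label value. This sketch assumes a node pool labeled gpu-spec: NVIDIA-L20:

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: gpu-spec
                  operator: In
                  values:
                  - NVIDIA-L20   # Schedule only to the L20 node pool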

View GPU devices on a node

After a node pool is created, you can view the GPU devices attached to a node.
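
For example, you can read the GPU capacity that the device plugin reports for a node. The command below is standard kubectl; the capacity value depends on the instance type:

    kubectl get node <node-name> -o jsonpath='{.status.capacity.nvidia\.com/gpu}'

For the ecs.gn7i-c8g1.2xlarge instance type used in this example, this prints 1.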