
Container Service for Kubernetes: Configure a GPU selection policy for nodes with the GPU sharing feature enabled

Last Updated: Mar 26, 2026

By default, the scheduler fills one GPU completely before moving workloads to the next GPU on the same node. This prevents GPU memory fragmentation but concentrates risk: if that GPU fails, all pods sharing it are affected simultaneously. ACK's GPU sharing feature lets you choose between two GPU selection policies—binpack and spread—to match your fault-tolerance requirements.

Prerequisites

Before you begin, ensure that you have:

  • An ACK cluster that contains GPU-accelerated nodes and has the GPU sharing (cGPU) component installed.
  • kubectl connected to the cluster, with the kubectl-inspect-cgpu plugin installed. Step 3 uses the kubectl inspect cgpu command to verify GPU allocation.

GPU selection policies

If a node with GPU sharing enabled has multiple GPUs, you can apply one of the following policies:

  • Binpack (default): Fills one GPU completely before allocating pods to the next GPU. Use it when you want to maximize GPU utilization and GPU-level fault isolation is not required.
  • Spread: Distributes pods across all available GPUs on the node. Use it when you need to limit the blast radius of a single GPU failure or otherwise require fault isolation across GPUs.
The spread policy only takes effect when a node has more than one GPU. Select an instance type with multiple GPU cards when creating the node pool.

Example: A node has two GPUs, each with 15 GiB of GPU memory. Pod1 requests 2 GiB of GPU memory and Pod2 requests 3 GiB. Under binpack, both pods are placed on the same GPU: GPU0 has 5 GiB allocated and GPU1 stays idle. Under spread, Pod1 is placed on GPU0 and Pod2 on GPU1, so the failure of either GPU affects only one pod.
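
In manifest terms, the only GPU-related setting in this example is each pod's aliyun.com/gpu-mem resource limit; the node's ack.node.gpu.placement label, not the pod spec, determines which physical GPU serves each request. The following is a minimal sketch of the two example pods; the pod names, container name, and image are illustrative placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: pod1  # Illustrative name for the pod requesting 2 GiB.
spec:
  containers:
  - name: app  # Illustrative container name; any GPU workload image works here.
    image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
    resources:
      limits:
        aliyun.com/gpu-mem: 2  # Pod1 requests 2 GiB of GPU memory.
---
apiVersion: v1
kind: Pod
metadata:
  name: pod2  # Illustrative name for the pod requesting 3 GiB.
spec:
  containers:
  - name: app
    image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
    resources:
      limits:
        aliyun.com/gpu-mem: 3  # Pod2 requests 3 GiB of GPU memory.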

Configure the spread policy

By default, nodes use the binpack policy. To switch a node pool to the spread policy, complete the following steps:

  1. Create a node pool and apply the required node labels.

  2. Submit a GPU sharing job with a node selector.

  3. Verify that pods are distributed across GPUs.

Step 1: Create a node pool

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Nodes > Node Pools.

  3. In the upper-right corner of the Node Pools page, click Create Node Pool.

  4. In the Create Node Pool dialog box, configure the following key parameters, and then click Confirm Order. For all other parameters, see Create and manage a node pool.

    • Instance type: Set Architecture to GPU-accelerated and select an instance type with multiple GPUs. The spread policy only takes effect on nodes with more than one GPU.
    • Expected nodes: Specify the initial number of nodes in the node pool. Enter 0 if you do not want to create nodes immediately.
    • Node label: Add the following two labels:
      • Key ack.node.gpu.schedule, value cgpu: enables GPU sharing and GPU memory isolation.
      • Key ack.node.gpu.placement, value spread: enables the spread policy.
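
After the node pool scales out, you can confirm that both labels are present. This is a minimal check using standard kubectl; the label selector matches exactly the two labels configured above:

kubectl get nodes -l ack.node.gpu.schedule=cgpu,ack.node.gpu.placement=spread

Only nodes returned by this command are subject to the spread policy.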

Step 2: Submit a job

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Workloads > Jobs.

  3. In the upper-right corner of the page, click Create from YAML. Paste the following YAML into the Template editor, update the placeholder values based on the inline comments, and then click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-spread
    spec:
      parallelism: 3
      template:
        metadata:
          labels:
            app: tensorflow-mnist-spread
        spec:
          nodeSelector:
            kubernetes.io/hostname: <NODE_NAME> # Replace with the name of a GPU-accelerated node, for example, cn-shanghai.192.0.2.109.
          containers:
          - name: tensorflow-mnist-spread
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 4  # Each pod requests 4 GiB of GPU memory.
            workingDir: /root
          restartPolicy: Never

    The YAML defines a TensorFlow MNIST job with the following behavior:

    • Creates 3 pods in parallel (parallelism: 3), each requesting 4 GiB of GPU memory (aliyun.com/gpu-mem: 4).

    • Pins all pods to a specific node using kubernetes.io/hostname: <NODE_NAME> so the GPU selection policy applies to that node.
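
    If you prefer the command line to the console, the same manifest can be submitted with standard kubectl. The file name tensorflow-mnist-spread.yaml is a placeholder for wherever you saved the YAML above:

    # Save the YAML above to a file, then create the job.
    kubectl apply -f tensorflow-mnist-spread.yaml

    # Watch the three pods start on the selected node.
    kubectl get pods -l app=tensorflow-mnist-spread -o wide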

Step 3: Verify the spread policy

Run the following command to query GPU allocation on the node:

kubectl inspect cgpu

Expected output:

NAME                       IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109    192.0.2.109    4/15                   4/15                   0/15                   4/15                   12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)

Each GPU<N>(Allocated/Total) column shows how much memory is allocated on that GPU. In this example, GPU0, GPU1, and GPU3 each have 4 GiB allocated (one pod per GPU), while GPU2 has none. The pods are spread across multiple GPUs rather than stacked on one, which confirms the spread policy is in effect.
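
For contrast, under the default binpack policy the same three pods would stack onto a single GPU, because one GPU's 15 GiB is enough to hold all three 4 GiB requests. The following output is illustrative (constructed from the policy description above), not captured from a cluster:

NAME                       IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109    192.0.2.109    12/15                  0/15                   0/15                   0/15                   12/60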