Shared GPU scheduling uses NVIDIA MPS (Multi-Process Service) as the underlying GPU isolation module. This enables multiple application pods to share a single GPU while ensuring GPU memory isolation between pods. This topic describes how to enable NVIDIA MPS isolation and integrate it with the shared GPU scheduling component.
Background information
MPI (Message Passing Interface) parallelizes work across CPU cores so that multiple compute tasks run concurrently and overall computation accelerates. However, when CUDA kernels are used to accelerate MPI processes, each individual MPI process may not submit enough work to keep the GPU busy. Individual MPI processes run faster, but overall GPU efficiency remains low, and GPU resources sit idle. In such cases, use NVIDIA MPS (Multi-Process Service). MPS enables multiple CUDA applications to run on a single NVIDIA GPU. It works well in multi-user environments or when many small tasks run simultaneously, and it improves both GPU utilization and application throughput.
MPS enables different applications to run concurrently on the same GPU device, improving GPU resource utilization across your cluster. MPS uses a client-server architecture and maintains binary compatibility, so you do not need major changes to your existing CUDA applications. MPS consists of three main components.
Control Daemon Process: Starts and stops the MPS Server and manages connections between clients and the MPS Server. This ensures that clients can connect to the MPS service to request and use GPU resources.
Client Runtime: Built into the CUDA driver library. You do not need major code changes to use MPS in your CUDA applications. When an application uses the CUDA driver to access the GPU, the Client Runtime handles communication with the MPS Server. This enables multiple applications to share the GPU safely and efficiently.
Server Process: Receives requests from different clients and uses efficient scheduling to run those requests on a single GPU device. This enables concurrency between clients.
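Outside Kubernetes, this client-server architecture is driven by the `nvidia-cuda-mps-control` utility that ships with the NVIDIA driver. The following sketch shows the bare-metal lifecycle on a single GPU host; the directory paths are illustrative assumptions, and in this topic's setup the ack-mps-control component performs these steps for you inside a container.

```shell
# Illustrative MPS lifecycle on a bare GPU host.
# (Not needed when ack-mps-control manages MPS for you.)
# The directory paths below are assumptions; any writable paths work.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps      # where clients find the control pipe
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log   # Control Daemon and Server logs

nvidia-cuda-mps-control -d        # start the Control Daemon in background mode

# CUDA applications launched with the same CUDA_MPS_PIPE_DIRECTORY now attach
# to the MPS Server through the Client Runtime built into the CUDA driver library.

echo quit | nvidia-cuda-mps-control   # stop the Control Daemon (and the MPS Server)
```

This is why restarting the Control Daemon is disruptive: clients attached through the pipe directory lose their server connection and exit with errors.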
Important notes
In the NVIDIA MPS architecture, MPS Clients—your GPU applications that use MPS—must remain connected to the MPS Control Daemon. If the MPS Control Daemon restarts, these MPS Clients exit with errors.
In this example, the MPS Control Daemon runs as a container. A DaemonSet deploys one MPS Control Daemon pod on each GPU node. Here is what you need to know about the MPS Control Daemon pod.
Do not delete or restart the MPS Control Daemon pod. Deleting it makes GPU applications on that node unavailable. Run `kubectl get po -l app.aliyun.com/name=mps-control-daemon -A` to check the status of MPS Control Daemon pods in your cluster.
The container that runs the MPS Control Daemon requires the `privileged`, `hostIPC`, and `hostPID` permissions. These permissions carry potential security risks. Assess them carefully before you use this solution.
The MPS Control Daemon pod uses `priorityClassName: system-node-critical` to maintain high priority. This prevents the pod from being terminated when node resources run low, which would leave business applications unable to use the GPU. If node resources are low during deployment, the MPS Control Daemon may preempt lower-priority business pods and cause them to be evicted. Before you deploy the component, ensure that your nodes have sufficient CPU and memory.
When you request and use GPU resources for applications on GPU nodes managed in Container Service for Kubernetes (ACK) clusters, pay attention to the following items.
Do not run GPU-heavy applications directly on nodes.
Do not use tools such as `docker`, `podman`, or `nerdctl` to create containers that request GPU resources. For example, do not run `docker run --gpus all` or `docker run -e NVIDIA_VISIBLE_DEVICES=all` and then run GPU-heavy applications.
Do not add the `NVIDIA_VISIBLE_DEVICES=all` or `NVIDIA_VISIBLE_DEVICES=<GPU ID>` environment variable to the `env` section of the pod YAML file. Do not use the `NVIDIA_VISIBLE_DEVICES` environment variable to request GPU resources for pods that run GPU-heavy applications.
If the `NVIDIA_VISIBLE_DEVICES` environment variable is not specified in the pod YAML file, do not set `NVIDIA_VISIBLE_DEVICES=all` when you build container images and then run GPU-heavy applications.
Do not add `privileged: true` to the `securityContext` section of the pod YAML file and then run GPU-heavy applications.
The following potential risks may exist when you use the preceding methods to request GPU resources for your application:
If you use one of the preceding methods to request GPU resources on a node, the allocation is not recorded in the scheduler's device resource ledger, so the actual GPU resource allocation can differ from what the ledger records. The scheduler may then schedule additional pods that request GPU resources to that node. As a result, applications may compete for the same GPU, and some applications may fail to start because of insufficient GPU resources.
Using the preceding methods may also cause other unknown issues, such as the issues reported by the NVIDIA community.
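To keep the scheduler's device ledger consistent with the actual allocation, declare GPU resources through the scheduler instead of the approaches above. The following is a minimal sketch of a pod that requests GPU memory through the shared GPU scheduling resource `aliyun.com/gpu-mem` used later in this topic; the pod name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-demo          # hypothetical pod name
spec:
  containers:
  - name: app
    image: <YOUR_GPU_IMAGE>   # placeholder: your CUDA application image
    resources:
      limits:
        aliyun.com/gpu-mem: 4 # request 4 GiB of GPU memory from the scheduler
```

Because the request goes through an extended resource, the scheduler accounts for it in its device ledger and avoids overcommitting the GPU.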
Applicable scope
You have created an ACK managed cluster Pro edition that runs Kubernetes 1.20 or later. If your cluster runs an earlier version, upgrade the cluster.
Procedure
Step 1: Install the MPS Control Daemon component
Log on to the ACK console. In the left navigation pane, choose Marketplace.
In the Marketplace, enter ack-mps-control in the search box. Click the component in the search results to open its installation page.
In the ack-mps-control installation interface, click Deploy, select the Cluster where you want to deploy components, and then click Next.
On the Create page, select the Chart Version. Click OK to complete the installation.
Important: Uninstalling or upgrading the MPS Control Daemon component ack-mps-control affects GPU applications that are already running on the node. These applications exit with errors. Perform these operations during off-peak hours.
The upgrade strategy is `OnDelete`. The system does not restart pods automatically. After the upgrade, manually delete the old pods in the ack-mps-control DaemonSet to complete the update. For details, see How do I upgrade the MPS Control Daemon component?
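With the `OnDelete` strategy, the upgrade takes effect only after you roll the DaemonSet pods yourself. A sketch with kubectl, assuming the pods run in the `kube-system` namespace and carry the `app.aliyun.com/name=mps-control-daemon` label shown earlier in this topic; do this during off-peak hours, because GPU applications running on the affected nodes exit with errors.

```shell
# Check the current MPS Control Daemon pods.
kubectl get po -l app.aliyun.com/name=mps-control-daemon -A

# Delete the old pods; the DaemonSet recreates them from the upgraded template.
kubectl -n kube-system delete po -l app.aliyun.com/name=mps-control-daemon
```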
Step 2: Install the shared GPU component
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, choose Applications > Cloud-native AI Suite.
On the Cloud-native AI Suite page, click Deploy.
On the Deploy Now page for the Cloud-native AI Suite, select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling).
At the bottom of the Cloud-native AI Suite page, click Deploy Cloud-native AI Suite.
After the component installs successfully, find the installed shared GPU component ack-ai-installer in the component list on the Cloud-native AI Suite page.
Step 3: Enable GPU sharing scheduling and GPU memory isolation
On the Clusters page, click the name of your cluster. In the left navigation pane, choose Nodes > Node Pools.
On the Node Pools page, click Create Node Pool.
On the Create Node Pool page, configure the node pool settings. Click Confirm.
For details on other settings, see Create and manage node pools.
Desired number of nodes: Set the initial number of nodes in the node pool.
Note: After you create the node pool, you can add GPU nodes to it. When you add GPU nodes, set the instance type architecture to Elastic GPU Service. For details, see Add existing nodes or Create and manage node pools.
Node labels: Click the add icon next to Node Label, set Key to `ack.node.gpu.schedule`, and set Value to `mps`.
Important: You must label each GPU node with `ack.node.gpu.schedule=mps` for the MPS Control Daemon pod to be deployed on the node. If your cluster includes the shared GPU scheduling component, labeling a node with `ack.node.gpu.schedule=mps` enables both shared GPU scheduling and MPS isolation on that node.
After you add the shared GPU scheduling label, do not change the node's GPU scheduling label by using the `kubectl label nodes` command or the label management feature on the Nodes page in the console. Doing so can cause issues. For more information, see Enable scheduling.
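After the node pool is created and nodes are added, you can verify from the command line which nodes carry the label. These are read-only checks and do not modify the label.

```shell
# List nodes that have MPS-backed shared GPU scheduling enabled.
kubectl get nodes -l ack.node.gpu.schedule=mps

# Alternatively, show the label value for all nodes in an extra column.
kubectl get nodes -L ack.node.gpu.schedule
```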
Step 4: Deploy a sample application
Create a sample application using the following YAML file.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: mps-sample
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: mps-sample
    spec:
      hostIPC: true # Required. Otherwise, the pod fails to start.
      hostPID: true # Optional. Added here only to help you see the effect of MPS.
      nodeSelector:
        # Replace <NODE_NAME> with the hostname of a GPU node that has the label
        # ack.node.gpu.schedule=mps. For example: cn-shanghai.192.0.2.109.
        kubernetes.io/hostname: <NODE_NAME>
      containers:
      - name: mps-sample
        image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=100000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            aliyun.com/gpu-mem: 7 # Request 7 GiB of GPU memory for this pod.
        workingDir: /root
      restartPolicy: Never
```
Important: After you enable MPS on a node, GPU application pods on that node must set `hostIPC: true`. Otherwise, the pod fails to start.
Wait for the pod to reach the Running state. Then run the following command to check whether MPS is active.
```shell
kubectl exec -ti mps-sample-xxxxx -- nvidia-smi
```
Expected output:
```
Tue May 27 05:32:12 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla ****-****-****           On  | 00000000:00:09.0 Off |                    0 |
| N/A   33C    P0             55W /  300W |    345MiB / 16384MiB |      0%   E. Process |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     14732      C   nvidia-cuda-mps-server                       30MiB |
|    0   N/A  N/A    110312    M+C   python                                      312MiB |
+---------------------------------------------------------------------------------------+
```
The output shows that the `nvidia-smi` command lists the `nvidia-cuda-mps-server` process, whose process ID on the host is 14732. It also shows a Python process with process ID 110312 running under MPS (type `M+C`). This confirms that MPS is working.
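You can also confirm the scheduler-side view of the allocation. A sketch, assuming the node exposes the `aliyun.com/gpu-mem` extended resource shown in the sample manifest; replace `<NODE_NAME>` with the node that runs the sample pod.

```shell
# Show the node's total and allocated GPU memory as tracked by the scheduler.
kubectl describe node <NODE_NAME> | grep -i "gpu-mem"
```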
FAQ
How do I upgrade the MPS Control Daemon component?
Upgrading ack-mps-control v0.2.0 requires ack-ai-installer >= 1.13.1. Upgrade the MPS Control Daemon component in this order.
In the component list on the Cloud-native AI Suite page, upgrade the Helm version of the shared GPU scheduling component ack-ai-installer.
On the Helm page in the console, select the `kube-system` namespace and upgrade the Helm version of the ack-mps-control component.
The upgrade strategy is `OnDelete`. The system does not restart pods automatically. After the upgrade, manually delete the old pods in the ack-mps-control DaemonSet to complete the update.
For each node, mark the node as Unschedulable, drain it, and then delete the ack-mps-control pod on that node.
1. Set the node to Unschedulable and drain it.
2. Delete the ack-mps-control pod on that node, then confirm that the new pod runs normally.
3. After you upgrade ack-mps-control and confirm that the related pods are updated, manually delete the ack-ai-installer pod. The pod is recreated automatically.
4. After you confirm that both the ack-mps-control pod and the ack-ai-installer pod run normally on the target node, mark the node as schedulable again.
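The per-node steps above can be sketched with kubectl as follows. The drain flags and pod label are assumptions based on the label used earlier in this topic, so adapt them to your cluster.

```shell
NODE=<NODE_NAME>   # the GPU node being upgraded

# 1. Mark the node as Unschedulable and drain it.
kubectl cordon "$NODE"
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data

# 2. Delete the old ack-mps-control pod on this node; the DaemonSet recreates it.
kubectl -n kube-system delete po -l app.aliyun.com/name=mps-control-daemon \
  --field-selector spec.nodeName="$NODE"

# 3. After confirming the new pod is Running, delete the ack-ai-installer pod
#    on this node so that it is rebuilt automatically.

# 4. Mark the node as schedulable again.
kubectl uncordon "$NODE"
```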