Alibaba Cloud Container Service (ACS) supports GPU sharing on GPU-HPN nodes. This feature lets you run multiple pods on a single GPU device. In an exclusive GPU scheduling scenario, a pod must request an entire GPU. If a pod does not require the resources of a full GPU, resources are wasted. GPU sharing lets you request fine-grained heterogeneous computing power for your pods. GPU sharing also supports flexible requests and limits constraints for pods. This capability meets the resource isolation and sharing requirements of various application scenarios.
Introduction
This topic applies only to ACS clusters.
GPU sharing provides a more fine-grained resource description. It allows a single pod to request resources in increments smaller than one full GPU, such as 0.5 of a GPU's computing power. It does not support aggregated requests across multiple GPUs, such as requesting 0.5 of the computing power from two different GPUs at the same time.
The GPU sharing module maintains the driver version for pods that use GPU sharing. You cannot specify a driver version for an individual pod.
This feature is in public preview in the Ulanqab and Shanghai Finance Cloud regions. To use this feature in other regions, please submit a ticket.
When you use GPU sharing, pods do not directly access a specific GPU device. Instead, they interact with the device through the GPU sharing module. The GPU sharing module consists of a proxy module and a resource management module. The proxy module is integrated into the pod by default. It intercepts API calls related to the GPU device and forwards them to the backend resource module. The resource module runs the GPU instructions on the actual GPU device and limits GPU resource usage based on the pod's resource description.
The resource module for GPU sharing also consumes some CPU and memory resources, which are automatically reserved when the feature is enabled. For more information, see Node configuration.
Resource configuration and QoS
Shared GPU resources are described using Kubernetes requests/limits constraints. You can configure computing power and GPU memory as percentages. The feature also supports resource descriptions where limits are greater than requests. This may cause multiple pods to compete for GPU resources simultaneously. ACS defines a Quality of Service (QoS) for shared GPU resources. When multiple pods on a node use GPU resources simultaneously, the pods are queued and preemption may be triggered. The following is an example:
...
resources:
  requests: # Controls the number of pods that can be scheduled on the node.
    alibabacloud.com/gpu-core.percentage: 10 # The percentage of computing power that the pod requires.
    alibabacloud.com/gpu-memory.percentage: 10 # The percentage of GPU memory that the pod requires.
  limits: # Controls the upper limit of resources that can be used at runtime. For more information about the effects, see the configuration instructions.
    alibabacloud.com/gpu-core.percentage: 100 # The upper limit of computing power usage.
    alibabacloud.com/gpu-memory.percentage: 100 # The upper limit of GPU memory usage. Exceeding this limit causes a CUDA OOM error.
...
Similar to the process management mechanism of an operating system, the GPU sharing module classifies pods into three states: hibernation, ready, and running. The state transition process is shown in the following figure.
When a pod starts, it enters the hibernation state.
When the pod attempts to use GPU resources, it enters the ready state. The GPU sharing module then allocates GPU resources to the pod based on a priority policy.
After the pod is allocated GPU resources, it enters the running state.
If pods are still in the ready state after all resources are allocated, a preemption mechanism is triggered to ensure resource fairness among pods.
When a pod is preempted, the process that occupies the GPU resources is killed, and the pod returns to the hibernation state.
Queuing policy
Pods in the ready state are queued based on the First In, First Out (FIFO) policy. The GPU sharing module allocates resources to the pod that entered the ready state first. If current resources are insufficient, the preemption policy is triggered.
Preemption policy
When resources cannot meet the demands of a pod in the ready state, the GPU sharing module attempts to preempt other pods. First, it filters the running pods based on specific conditions. Then, it scores and sorts the eligible pods and preempts them one by one until the resource demands of the queued pod are met.
If none of the currently running pods meet the filter conditions, the pod in the ready state remains in the queue and waits for resources. The details are as follows.
Policy type | Description |
Filter policy | The currently running pod has continuously occupied GPU resources for 2 hours. This is customizable. For more information, see QoS configuration. |
Scoring policy | The duration for which a pod has continuously occupied GPU resources. Pods that have occupied resources for a longer time are preempted first. |
Resource sharing models
GPU sharing is based on a shared model and allows multiple pods to run on a single GPU card simultaneously. ACS currently supports the following sharing models:
Model name | Effect | GPU shared resource configuration | Queuing policy | Preemption policy | Scenarios |
share-pool | Treats all GPUs on a node as a share pool. A pod can use any physical GPU that has idle resources. | | FIFO | Allows custom configurations. | Notebook development scenarios. Combined with the request/limit configuration in resource QoS, this model supports off-peak GPU resource usage by multiple users. When resources are insufficient, QoS mechanisms such as queuing and preemption are triggered. For more information, see Example: Use the share-pool model for off-peak resource usage in Notebook scenarios. |
static | GPU slicing scenario. Assigns a fixed GPU device to a pod, which does not change during runtime. The scheduler prioritizes placing pods on the same GPU to avoid fragmentation. | Warning: If request is less than limit for GPU computing power or memory, resource competition occurs between pods. This can even cause pods to be killed due to an out-of-memory (OOM) error. | Not supported | Not supported | Small-scale AI applications where multiple pods share a GPU device to improve resource utilization. Due to the `request`==`limit` constraint, a pod can obtain GPU resources at any time during runtime without queuing. |
Example: Use the share-pool model for off-peak resource usage in Notebook scenarios
In Notebook development, applications typically do not occupy resources for long periods. You can use the share-pool model to allow pods to run on different GPU cards during off-peak hours. A pod enters the ready queue to wait for resources only when it requires resources.
The following is a use case for a Notebook scenario:
Pods A and B are configured with `requests=0.5` and `limits=0.5`. Pods C and D are configured with `requests=0.5` and `limits=1`. Based on the `requests` values, these pods can be scheduled to a single GPU-HPN node that has two GPUs.
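The following sketch shows how these two resource profiles could be written in the pods' container specs. Only the resources section is shown, and 0.5 of a GPU is expressed as a value of 50 in the percentage-based resource names.
# Pods A and B: requests equal limits, so they never compete for GPU resources.
resources:
  requests:
    alibabacloud.com/gpu-core.percentage: 50
    alibabacloud.com/gpu-memory.percentage: 50
  limits:
    alibabacloud.com/gpu-core.percentage: 50
    alibabacloud.com/gpu-memory.percentage: 50
---
# Pods C and D: limits exceed requests, so each pod can burst to a full GPU
# but may wait in the ready queue when GPU resources are contended.
resources:
  requests:
    alibabacloud.com/gpu-core.percentage: 50
    alibabacloud.com/gpu-memory.percentage: 50
  limits:
    alibabacloud.com/gpu-core.percentage: 100
    alibabacloud.com/gpu-memory.percentage: 100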
Time T1:
Pod A and Pod C are occupying resources. Pod B and Pod D are in the ready queue, waiting to be scheduled.
The GPU sharing module attempts to allocate resources to Pod D, which is at the head of the queue. However, GPU 0 has only 0.5 GPU of idle resources. Although Pod D's `request` is 0.5, which the available capacity can satisfy, its `limit` is 1. Running Pod A and Pod D on the same GPU would cause resource competition. Therefore, the GPU sharing module keeps Pod D in the queue.
Time T2 - Phase 1:
Pod C's task is complete, and it enters the hibernation queue.
After GPU 1 becomes idle, its resources are allocated to Pod D.
Time T2 - Phase 2:
Pod B is allocated resources. Because Pod B's `limit` is 0.5, it can run on GPU 0 simultaneously with Pod A without resource competition.
Example: Use GPU sharing
This example demonstrates how to use the GPU sharing feature. The procedure covers enabling the GPU sharing feature (share-pool) on a GPU-HPN node, submitting a pod that uses shared GPU resources, and then disabling the feature on the node.
Step 1: Add a label to the GPU-HPN node
View the GPU-HPN nodes.
Important: Before enabling this feature, you must delete any pods on the node that request exclusive GPU resources. You do not need to delete pods that request only CPU and memory resources.
kubectl get node -l alibabacloud.com/node-type=reserved
Expected output:
NAME                     STATUS   ROLES   AGE   VERSION
cn-wulanchabu-c.cr-xxx   Ready    agent   59d   v1.28.3-aliyun
Add the label alibabacloud.com/gpu-share-policy=share-pool to the node cn-wulanchabu-c.cr-xxx to enable the GPU sharing feature.
$ kubectl label node cn-wulanchabu-c.cr-xxx alibabacloud.com/gpu-share-policy=share-pool
Step 2: Check the enabling status of the node
Wait for the feature to be enabled on the node, and then check the status. The feature is enabled if the GPU shared resources appear in the capacity field and the GPUSharePolicyValid condition in the conditions field is True.
$ kubectl get node cn-wulanchabu-c.cr-xxx -o yaml
After the GPU sharing policy takes effect, the node status is updated. Expected output:
# The actual output may vary.
apiVersion: v1
kind: Node
spec:
  # ...
status:
  allocatable:
    # GPU shared resource description
    alibabacloud.com/gpu-core.percentage: "1600"
    alibabacloud.com/gpu-memory.percentage: "1600"
    # After the feature is enabled, CPU, memory, and storage resources are reserved for the GPU sharing module.
    cpu: "144"
    memory: 1640Gi
    nvidia.com/gpu: "16"
    ephemeral-storage: 4608Gi
  capacity:
    # GPU shared resource description
    alibabacloud.com/gpu-core.percentage: "1600"
    alibabacloud.com/gpu-memory.percentage: "1600"
    cpu: "176"
    memory: 1800Gi
    nvidia.com/gpu: "16"
    ephemeral-storage: 6Ti
  conditions:
    # Indicates whether the GPU Share policy configuration is valid.
    - lastHeartbeatTime: "2025-01-07T04:13:04Z"
      lastTransitionTime: "2025-01-07T04:13:04Z"
      message: gpu share policy is valid.
      reason: Valid
      status: "True"
      type: GPUSharePolicyValid
    # Indicates the GPU Share policy that is in effect on the current node.
    - lastHeartbeatTime: "2025-01-07T04:13:04Z"
      lastTransitionTime: "2025-01-07T04:13:04Z"
      message: gpu share policy is share-pool.
      reason: share-pool
      status: "True"
      type: GPUSharePolicy
For more information about the configuration items for GPU shared resources, see Node configuration.
Step 3: Deploy a pod with GPU shared resource specifications
Create a file named gpu-share-demo.yaml. Configure it to use the same share-pool model as the node.
apiVersion: v1
kind: Pod
metadata:
  labels:
    alibabacloud.com/compute-class: "gpu-hpn"
    # Set the GPU sharing model for the pod to share-pool, which is the same as the node configuration.
    alibabacloud.com/gpu-share-policy: "share-pool" # static
  name: gpu-share-demo
  namespace: default
spec:
  containers:
    - name: demo
      image: registry-cn-wulanchabu-vpc.ack.aliyuncs.com/acs/stress:v1.0.4
      args:
        - '1000h'
      command:
        - sleep
      # Specify the GPU shared resources gpu-core.percentage and gpu-memory.percentage in the resource description.
      # For more information about the effects of request and limit, see the configuration instructions.
      resources:
        limits:
          cpu: '5'
          memory: 50Gi
          alibabacloud.com/gpu-core.percentage: 100
          alibabacloud.com/gpu-memory.percentage: 100
        requests:
          cpu: '5'
          memory: 50Gi
          alibabacloud.com/gpu-core.percentage: 10
          alibabacloud.com/gpu-memory.percentage: 10
Deploy the sample pod.
kubectl apply -f gpu-share-demo.yaml
Step 4: Check the GPU shared resource usage of the pod
Log on to the container to check the GPU shared resource usage of the pod.
kubectl exec -it gpu-share-demo -- /bin/bash
Use commands such as `nvidia-smi` to view the GPU resource allocation and usage of the container. The actual output may vary.
For pods of the share-pool type, the BusID field displays `Pending` when the pod is not using GPU resources.
The specific command depends on the GPU card type. For example, nvidia-smi corresponds to NVIDIA series GPU devices. For other card types, submit a ticket for assistance.
(Optional) Step 5: Disable the GPU sharing policy on the node
Before disabling the policy, you must delete any pods on the node that request GPU shared resources. You do not need to delete pods that request only CPU and memory resources.
Delete the pod that uses the GPU sharing feature.
$ kubectl delete pod gpu-share-demo
Disable the GPU sharing feature on the node.
$ kubectl label node cn-wulanchabu-c.cr-xxx alibabacloud.com/gpu-share-policy=none --overwrite
Check the policy configuration status of the node again.
$ kubectl get node cn-wulanchabu-c.cr-xxx -o yaml
Expected output:
apiVersion: v1
kind: Node
spec:
  # ...
status:
  allocatable:
    # After the feature is disabled, the reserved CPU and memory resources are restored to their initial values.
    cpu: "176"
    memory: 1800Gi
    nvidia.com/gpu: "16"
    ephemeral-storage: 4608Gi
  capacity:
    cpu: "176"
    memory: 1800Gi
    nvidia.com/gpu: "16"
    ephemeral-storage: 6Ti
  conditions:
    # Indicates whether the GPU Share policy configuration is valid.
    - lastHeartbeatTime: "2025-01-07T04:13:04Z"
      lastTransitionTime: "2025-01-07T04:13:04Z"
      message: gpu share policy config is valid.
      reason: Valid
      status: "True"
      type: GPUSharePolicyValid
    # Indicates the GPU Share policy that is in effect on the current node.
    - lastHeartbeatTime: "2025-01-07T04:13:04Z"
      lastTransitionTime: "2025-01-07T04:13:04Z"
      message: gpu share policy is none.
      reason: none
      status: "False"
      type: GPUSharePolicy
Detailed configuration instructions
Node configuration
Enablement configuration
To enable GPU sharing, you can configure a label on the node. The details are as follows.
Configuration item | Description | Valid values | Example |
alibabacloud.com/gpu-share-policy | The GPU resource sharing policy. | share-pool, static, none. The value none disables GPU sharing on the node. | alibabacloud.com/gpu-share-policy=share-pool |
If pods that use exclusive GPUs already exist on the node, you must delete them before you enable the sharing policy.
If pods that use GPU shared resources already exist on the node, you cannot modify or disable the GPU sharing policy. You must delete these pods first.
You do not need to delete pods that request only CPU and memory resources.
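Before you enable, modify, or disable the sharing policy, you can check which pods on the node still request GPU resources. The following commands are a sketch that uses the example node from this topic and only standard kubectl options.
# List the pods that run on the node.
kubectl get pods --all-namespaces --field-selector spec.nodeName=cn-wulanchabu-c.cr-xxx

# Check whether any of these pods request exclusive GPUs or GPU shared resources.
kubectl get pods --all-namespaces --field-selector spec.nodeName=cn-wulanchabu-c.cr-xxx -o yaml | grep -E "nvidia.com/gpu|gpu-core.percentage|gpu-memory.percentage"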
QoS configuration
On GPU-HPN nodes, you can configure the Quality of Service (QoS) parameters for GPU sharing in the node annotations. Use the following format.
apiVersion: v1
kind: Node
...
metadata:
  annotations:
    alibabacloud.com/gpu-share-qos-config: '{"preemptEnabled": true, "podMaxDurationMinutes": 120, "reservedEphemeralStorage": "1.5Ti"}'
...
The following describes the details:
Parameter | Type | Valid values | Description |
preemptEnabled | Boolean | true, false | Applies only to the share-pool model. Specifies whether to enable preemption. The default value is true, which enables preemption. |
podMaxDurationMinutes | Int | An integer greater than 0. Unit: minutes. | Applies only to the share-pool model. A pod can be preempted only if it has occupied a GPU for longer than this time. The default value is 120, which is 2 hours. |
reservedEphemeralStorage | resource.Quantity | Greater than or equal to 0. The unit is in Kubernetes string format, such as 500Gi. | The reserved capacity for the node's local temporary storage. The default value is 1.5 TiB. |
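The annotation can also be applied with kubectl instead of editing the node object directly. The following is a sketch that disables preemption and keeps the other defaults; the node name is the example node used in this topic.
# Apply or update the QoS configuration annotation. --overwrite replaces an existing value.
kubectl annotate node cn-wulanchabu-c.cr-xxx \
  alibabacloud.com/gpu-share-qos-config='{"preemptEnabled": false, "podMaxDurationMinutes": 120, "reservedEphemeralStorage": "1.5Ti"}' \
  --overwrite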
View shared resources on a node
After the feature is enabled, the corresponding GPU shared resource names are added to the `allocatable` and `capacity` fields of the node. The basic resource overhead is deducted from the `allocatable` field. The resource names are described as follows.
Configuration item | Description | Calculation method |
alibabacloud.com/gpu-core.percentage | The computing power of the GPU shared resource, in percentage format. This field is added when the feature is enabled and deleted when the feature is disabled. | Number of devices × 100. For example, for a machine with 16 GPUs, the value is 1600. |
alibabacloud.com/gpu-memory.percentage | The GPU memory of the GPU shared resource, in percentage format. This field is added when the feature is enabled and deleted when the feature is disabled. | Number of devices × 100. For example, for a machine with 16 GPUs, the value is 1600. |
cpu | After the feature is enabled, the basic overhead is deducted from the allocatable field of the node. | Number of devices × 2. For example, for a machine with 16 GPUs, 32 cores are reserved. |
memory | After the feature is enabled, the basic overhead is deducted from the allocatable field of the node. | Number of devices × 10 GB. For example, for a machine with 16 GPUs, 160 GB is reserved. |
ephemeral-storage | After the feature is enabled, the basic overhead is deducted from the allocatable field of the node. | 1.5 TB of disk space per node. |
Configuration validity
Field | Value | Description |
type | GPUSharePolicyValid | Indicates whether the current GPU Share configuration is valid. |
status | "True", "False" | "True" indicates that the configuration is valid. "False" indicates that the configuration is invalid. |
reason | Valid, InvalidParameters, InvalidExistingPods, ResourceNotEnough | The machine-readable reason for the current status. |
message | - | A user-friendly message. |
lastHeartbeatTime, lastTransitionTime | UTC | The time when the condition was last updated. |
Current effective GPU sharing policy
Field | Value | Description |
type | GPUSharePolicy | Indicates the GPU sharing policy that is in effect on the current node. |
status | "True", "False" | "True" indicates that a GPU sharing policy is in effect. "False" indicates that GPU sharing is disabled. |
reason | none, share-pool, static | The GPU sharing policy that is currently in effect on the node. |
message | - | A user-friendly message. |
lastHeartbeatTime, lastTransitionTime | UTC | The time when the condition was last updated. |
If the node resources do not change as described above after you enable or disable the feature, the configuration modification has failed. You can check the validity condition message in the conditions field.
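Instead of scanning the full node YAML, you can read the two conditions directly. The following commands are a sketch that uses standard kubectl JSONPath filtering.
# Print the GPUSharePolicyValid condition (whether the configuration is valid).
kubectl get node cn-wulanchabu-c.cr-xxx -o jsonpath='{.status.conditions[?(@.type=="GPUSharePolicyValid")]}{"\n"}'

# Print the GPUSharePolicy condition (the policy that is currently in effect).
kubectl get node cn-wulanchabu-c.cr-xxx -o jsonpath='{.status.conditions[?(@.type=="GPUSharePolicy")]}{"\n"}'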
Pod configuration
After the feature is enabled, you can use it by configuring the GPU shared resource label in the pod.
apiVersion: v1
kind: Pod
metadata:
  labels:
    # Only the gpu-hpn compute class is supported.
    alibabacloud.com/compute-class: "gpu-hpn"
    # Set the GPU sharing model for the pod to share-pool, which is the same as the node configuration.
    alibabacloud.com/gpu-share-policy: "share-pool"
  name: gpu-share-demo
  namespace: default
spec:
  containers:
    - name: demo
      image: registry-cn-wulanchabu-vpc.ack.aliyuncs.com/acs/stress:v1.0.4
      args:
        - '1000h'
      command:
        - sleep
      resources:
        limits:
          cpu: '5'
          memory: 50Gi
          alibabacloud.com/gpu-core.percentage: 100
          alibabacloud.com/gpu-memory.percentage: 100
        requests:
          cpu: '5'
          memory: 50Gi
          alibabacloud.com/gpu-core.percentage: 10
          alibabacloud.com/gpu-memory.percentage: 10
The configuration items are described as follows:
Compute class
Configuration item | Value | Description |
metadata.labels.alibabacloud.com/compute-class | gpu-hpn | Only the gpu-hpn compute class is supported. |
GPU sharing policy
Configuration item | Type | Valid values | Description |
metadata.labels.alibabacloud.com/gpu-share-policy | String | share-pool, static | Specifies the GPU sharing model for the pod. Only nodes that match this model are considered for scheduling. |
Resource requirements
Configure GPU shared resources in the container's resource requests to describe the computing power and GPU memory requirements and limits. These settings control the number of pods that can be scheduled on a node. The number of pods on a node is also limited by other resource dimensions, such as CPU, memory, and the maximum number of pods.
Requirement category | Configuration item | Type | Valid values | Description |
requests | alibabacloud.com/gpu-core.percentage | Int | share-pool policy: [10, 100]; static policy: [10, 100) | The computing power percentage. This indicates the requested proportion of a single GPU's computing power. The minimum is 10%. |
requests | alibabacloud.com/gpu-memory.percentage | Int | share-pool policy: [10, 100]; static policy: [10, 100) | The GPU memory percentage. This indicates the requested proportion of a single GPU's memory. The minimum is 10%. |
limits | alibabacloud.com/gpu-core.percentage | Int | | The computing power percentage. This indicates the limit on the proportion of a single GPU's computing power that the pod can use. The minimum is 10%. |
limits | alibabacloud.com/gpu-memory.percentage | Int | | The GPU memory percentage. This indicates the limit on the proportion of a single GPU's memory that the pod can use. The minimum is 10%. |
Configuration constraints
In addition to the constraints on individual configuration items, the following constraints apply when a pod requests resources.
You must specify both GPU computing power and GPU memory (alibabacloud.com/gpu-core.percentage and alibabacloud.com/gpu-memory.percentage) in both requests and limits.
A pod can have at most one container that uses GPU shared resources. This is typically the main container. Other containers, such as sidecar containers, can request only non-GPU resources such as CPU and memory.
A container cannot request both exclusive GPU resources (such as nvidia.com/gpu) and GPU shared resources (alibabacloud.com/gpu-core.percentage, alibabacloud.com/gpu-memory.percentage).
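As a counterpart to the share-pool example in this topic, the following is a minimal sketch of a pod that uses the static model. It assumes the target node is labeled alibabacloud.com/gpu-share-policy=static; the pod name and resource values are illustrative. Setting request equal to limit keeps pods on the fixed GPU from competing with each other.
apiVersion: v1
kind: Pod
metadata:
  labels:
    alibabacloud.com/compute-class: "gpu-hpn"
    # Match nodes that run the static sharing model.
    alibabacloud.com/gpu-share-policy: "static"
  name: gpu-share-static-demo
  namespace: default
spec:
  containers:
    - name: demo
      image: registry-cn-wulanchabu-vpc.ack.aliyuncs.com/acs/stress:v1.0.4
      command:
        - sleep
      args:
        - '1000h'
      resources:
        # request == limit: the pod can obtain its GPU share at any time without queuing.
        requests:
          cpu: '5'
          memory: 50Gi
          alibabacloud.com/gpu-core.percentage: 50
          alibabacloud.com/gpu-memory.percentage: 50
        limits:
          cpu: '5'
          memory: 50Gi
          alibabacloud.com/gpu-core.percentage: 50
          alibabacloud.com/gpu-memory.percentage: 50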
FAQ
What happens to a pod in the ready queue if no GPU resources are available?
When a GPU sharing pod is waiting for resources, it periodically prints a message. The following is a sample message.
You have been waiting for ${1} seconds. Approximate position: ${2}
The ${1} parameter indicates the waiting time, and the ${2} parameter indicates the current position in the ready queue.
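Assuming the message is written to the container's standard output, you can watch for it with kubectl logs. A sketch using the example pod from this topic:
# Follow the container log and watch for the waiting message.
kubectl logs -f gpu-share-demo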
What are the pod monitoring metrics specific to the GPU sharing mode?
For pods that use GPU shared resources, you can use the following metrics to view their resource usage.
Metric | Description | Example |
DCGM_FI_POOLING_STATUS | Provided only in share-pool mode. Indicates the pod's status in the GPU sharing mode: hibernation, ready, or running. A value of 1 indicates that the pod is in the ready state and waiting for resources. | |
DCGM_FI_POOLING_POSITION | Provided only in share-pool mode. Indicates that the pod is waiting for resources in the ready queue. The value indicates the pod's position in the ready queue, starting from 1. This metric appears only when POOLING_STATUS=1. | |
How are GPU utilization metrics different when a pod uses GPU sharing?
Pods that use GPU sharing report the same set of GPU utilization metrics as other pods, but the labels and meanings of some metrics differ, as described below.
In the pod monitoring data provided by ACS, metrics such as GPU computing power utilization and GPU memory usage are absolute values based on the entire GPU card, which is the same as in the exclusive GPU scenario.
The GPU memory usage seen within a pod using commands such as `nvidia-smi` is an absolute value, which is the same as in the exclusive GPU scenario. However, the computing power utilization is a relative value, where the denominator is the pod's limit.
The device information, such as the ID number in the pod's GPU utilization metrics, corresponds to the actual ID on the node. The numbering does not always start from 0.
For the share-pool sharing model, the device number in the metrics may change because the pod elastically uses different GPU devices from the pool.
If GPU sharing is enabled on only some nodes in a cluster, how can I avoid scheduling conflicts with exclusive GPU pods?
The default scheduler in an ACS cluster automatically matches pod and node types to avoid scheduling conflicts.
If you use a custom scheduler, an exclusive GPU pod might be scheduled to a GPU sharing node because the node's capacity includes both GPU device resources and GPU shared resources. You can choose one of the following solutions:
Solution 1: Write a scheduler plugin that automatically detects the configuration labels and condition protocol of ACS nodes to filter out nodes of a mismatched type. For more information, see Scheduling Framework.
Solution 2: Use Kubernetes labels or taints. Add a label or taint to the nodes where GPU sharing is enabled. Then, configure different affinity policies for exclusive GPU pods and shared GPU pods. A minimal sketch of this approach follows.
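The following is a sketch of Solution 2. It assumes you add a custom label such as gpu-mode=share to the GPU sharing nodes (for example, kubectl label node cn-wulanchabu-c.cr-xxx gpu-mode=share); the label key and value are illustrative. Exclusive GPU pods then use node affinity to avoid those nodes.
# Fragment of an exclusive GPU pod spec; only the affinity-related fields are shown.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu-mode
                operator: NotIn
                values:
                  - share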
What information is available when a GPU sharing pod is preempted?
For the share-pool sharing model, when preemption is triggered, the pod has an Event and a Condition. An Event is in an unstructured data format. To read structured data, you can retrieve it from the `reason` and `status` fields of the corresponding Condition. The details are as follows.
# Indicates that the GPU resources of the current pod were preempted. The name of the preempting pod is <new-pod-name>.
Warning GPUSharePreempted 5m15s gpushare GPU is preempted by <new-pod-name>.
# Indicates that the current pod preempted the GPU resources of another pod. The name of the preempted pod is <old-pod-name>.
Warning GPUSharePreempt 3m47s gpushare GPU is preempted from <old-pod-name>.
- type: Interruption.GPUShareReclaim # The condition type for a GPU sharing pod preemption.
  status: "True" # True indicates that the pod preempted another pod or was itself preempted.
  reason: GPUSharePreempt # GPUSharePreempt indicates that this pod preempted another pod. GPUSharePreempted indicates that this pod was preempted by another pod.
  message: GPU is preempted from <old-pod-name>. # A user-friendly message similar to the event.
  lastTransitionTime: "2025-04-22T08:12:09Z" # The time when the preemption occurred.
  lastProbeTime: "2025-04-22T08:12:09Z"
How can I run more pods on a node in a Notebook scenario?
For pods with GPU sharing enabled, ACS also lets you configure CPU and memory specifications where the `request` is less than the `limit`. This helps to fully utilize node resources.

Note that when the total `limit` of resources for pods submitted to a node exceeds the node's allocatable resources, the pods compete for CPU and memory. You can analyze this competition by reviewing the node's resource utilization data. For more information, see ACS GPU-HPN node-level monitoring metrics. For a pod, CPU resource competition is reflected in the pod's CPU steal time. Memory resource competition triggers a machine-wide out-of-memory (OOM) error, which causes some pods to be killed.

Plan your pod priorities and resource specifications based on your application's characteristics so that resource competition does not degrade pod service quality.
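For example, the container resources of a Notebook pod could be declared as follows. This is a sketch with illustrative values: each pod is guaranteed its requests at scheduling time and can burst up to its limits when the node has spare CPU and memory.
resources:
  requests:
    cpu: '2'        # Guaranteed amount used for scheduling decisions.
    memory: 16Gi
    alibabacloud.com/gpu-core.percentage: 50
    alibabacloud.com/gpu-memory.percentage: 50
  limits:
    cpu: '8'        # Upper bound; pods may compete for CPU when the node is saturated.
    memory: 32Gi
    alibabacloud.com/gpu-core.percentage: 100
    alibabacloud.com/gpu-memory.percentage: 100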