Platform For AI: Manage integrated resources for model training and inference

Last Updated: Dec 16, 2025

In a multi-team collaborative environment, you must properly allocate computing resources to each team to ensure efficient operations. This topic describes how to manage and utilize resources efficiently by creating quotas and assigning the resources in those quotas to different teams.

Background information

Example

AI computing resources (128 GPUs) are purchased for Team A, Team B, and Team C.

  • Team A is responsible for inference services and requires resource assurance.

  • Team B and Team C are training teams and are responsible for submitting training jobs.

  • The inference service of Team A takes precedence over the training jobs of Team B and Team C. If the inference resources required by Team A become insufficient, the system can quickly reclaim the resources used for training to meet the requirements of inference services.

  • The amount of computing resources used by Team B and Team C can be dynamically increased or decreased based on actual requirements.

  • Team B and Team C can manage their resources and jobs.

Overview

image

The preceding figure shows the sample scenario used in this topic. Solution description:

  • Create a quota of 128 GPUs named Quota1 and turn on the child-level preemption switch. Then, create two child-level quotas for Quota1: Quota1.1 (48 GPUs) and Quota1.2 (80 GPUs). In the preceding figure, Quota1 is in a parent-child relationship with Quota1.1 and Quota1.2. Quota1 is the parent-level quota, and Quota1.1 and Quota1.2 are the child-level quotas.

  • Create a workspace named workspace-a for Team A and associate the workspace with Quota1. Deploy EAS services on Quota1 for model inference.

  • Create a workspace named workspace-b for Team B and associate the workspace with Quota1.1. Create DLC jobs on Quota1.1.

  • Create a workspace named workspace-c for Team C and associate the workspace with Quota1.2. Create a DSW instance on Quota1.2 for model development.
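The quota layout above can be sketched as a small in-memory model. This is only an illustration, not a PAI API: the `Quota` class, its `add_child` method, and the `workspace_quota` mapping are hypothetical names, and the rule that child-level quotas together cannot exceed the parent-level quota is an assumption based on the sample numbers (48 + 80 = 128).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Quota:
    """Hypothetical in-memory model of a resource quota (not a PAI API object)."""
    name: str
    gpus: int
    child_preemption: bool = False
    children: list[Quota] = field(default_factory=list)

    def add_child(self, name: str, gpus: int) -> Quota:
        # Assumption: child-level quotas together cannot exceed the parent's GPUs.
        allocated = sum(child.gpus for child in self.children)
        if allocated + gpus > self.gpus:
            raise ValueError(
                f"{name}: only {self.gpus - allocated} GPUs left in {self.name}"
            )
        child = Quota(name, gpus)
        self.children.append(child)
        return child


# Parent-level quota with child-level preemption turned on.
quota1 = Quota("Quota1", 128, child_preemption=True)
quota1_1 = quota1.add_child("Quota1.1", 48)
quota1_2 = quota1.add_child("Quota1.2", 80)

# Workspace-to-quota associations from the sample scenario.
workspace_quota = {
    "workspace-a": quota1.name,    # Team A, EAS inference services
    "workspace-b": quota1_1.name,  # Team B, DLC training jobs
    "workspace-c": quota1_2.name,  # Team C, DSW instance
}
```

In this sketch, a third child-level quota of any size would be rejected, because Quota1.1 and Quota1.2 already account for all 128 GPUs.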

Procedure

  1. Prepare AI computing resources (general computing resources or Lingjun resources). For more information about how to purchase resources, see Resource pool. If you have already purchased AI computing resources, skip this step.

  2. Create a quota.

    1. Create a quota named Quota1 and configure the following key parameters. For more information about the configurations, see Create resource quotas or General computing resource quotas.

      • Specifications/Resources: Select a resource, such as 128 GPUs.

      • Child-level Preemption: Turn on this switch.

    2. In the Actions column of Quota1, click New Child-level Resource Quota to create the following child-level quotas. For more information, see Create parent-child quotas.

      • Add a child-level quota named Quota1.1, and configure the Specifications/Resources parameter for the quota, such as 48 GPUs.

      • Add a child-level quota named Quota1.2, and configure the Specifications/Resources parameter for the quota, such as 80 GPUs.

  3. Create the following workspaces and associate the workspaces with quotas. For more information, see Create and manage a workspace.

    • Create a workspace named workspace-a for Team A and associate the workspace with Quota1.

    • Create a workspace named workspace-b for Team B and associate the workspace with Quota1.1.

    • Create a workspace named workspace-c for Team C and associate the workspace with Quota1.2.

  4. Grant workspace administrator permissions to Team A, Team B, and Team C. For more information, see Manage a workspace. You can also grant other permissions. For more information, see Appendix: Roles and permissions.

  5. Create an inference service and training jobs.

Scenarios

Scenario 1: Inference resources are insufficient, and the inference service preempts resources of training jobs

The administrator must go to the Resource Quota page, click the parent-level quota Quota1, and turn on the Child-level Preemption switch on the Overview tab.

When Team A uses Quota1 to submit a new inference service in workspace-a, resources may be insufficient because Team B and Team C are using the child-level quotas to run training jobs. In this case, the system reclaims the computing resources that are used to run the jobs of Team B and Team C. This ensures that the inference service of Team A runs as expected.
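The arithmetic behind this scenario can be illustrated with a short sketch. The function `plan_reclaim` and its parameters are hypothetical, and the order in which the system actually chooses training jobs to preempt is not described in this topic. Reclaiming from the child-level quota with the highest usage first is an assumption made only to keep the example concrete.

```python
def plan_reclaim(
    total: int,
    inference_used: int,
    training_used: dict[str, int],
    requested: int,
) -> dict[str, int]:
    """Return how many GPUs to reclaim from each child-level quota.

    Hypothetical helper: the "largest usage first" order is an assumption,
    not the documented preemption policy.
    """
    free = total - inference_used - sum(training_used.values())
    shortfall = max(0, requested - free)
    if shortfall > sum(training_used.values()):
        raise ValueError("request exceeds the capacity of the parent-level quota")

    plan: dict[str, int] = {}
    by_usage = sorted(training_used.items(), key=lambda item: item[1], reverse=True)
    for name, used in by_usage:
        if shortfall == 0:
            break
        taken = min(used, shortfall)
        if taken:
            plan[name] = taken
            shortfall -= taken
    return plan


# Team A already runs 16 GPUs of inference; training occupies the other 112.
plan = plan_reclaim(
    total=128,
    inference_used=16,
    training_used={"Quota1.1": 48, "Quota1.2": 64},
    requested=24,
)
print(plan)  # {'Quota1.2': 24}
```

If enough GPUs are idle to cover the request, the returned plan is empty and no training job is affected.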

Scenario 2: Resources of Team B and Team C are reallocated

The administrator reallocates the resources of Quota1.1 and Quota1.2 by using the quota scaling feature based on the requirements of Team B and Team C. For more information, see Scale quotas.

  • For example, you can increase the number of GPUs of Quota1.1 from 48 to 56 (8 GPUs added).

  • For example, you can decrease the number of GPUs of Quota1.2 from 80 to 72 (8 GPUs removed).
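The two scaling operations above cancel out, so the child-level quotas still add up to the 128 GPUs of Quota1. A minimal sketch of that bookkeeping follows. The `rebalance` function is a hypothetical helper, not part of the quota scaling feature, and it assumes that the child-level quotas must fit within the parent-level quota.

```python
def rebalance(
    children: dict[str, int],
    source: str,
    target: str,
    gpus: int,
    parent_total: int,
) -> dict[str, int]:
    """Move GPUs from one child-level quota to another (hypothetical helper)."""
    if gpus <= 0:
        raise ValueError("gpus must be a positive number")
    if children[source] < gpus:
        raise ValueError(f"{source} has only {children[source]} GPUs")

    updated = dict(children)  # leave the caller's mapping unchanged
    updated[source] -= gpus
    updated[target] += gpus

    # Assumption: child-level quotas must fit within the parent-level quota.
    if sum(updated.values()) > parent_total:
        raise ValueError("child-level quotas exceed the parent-level quota")
    return updated


children = {"Quota1.1": 48, "Quota1.2": 80}
scaled = rebalance(children, source="Quota1.2", target="Quota1.1", gpus=8, parent_total=128)
print(scaled)  # {'Quota1.1': 56, 'Quota1.2': 72}
```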

Scenario 3: The permissions for Team B and Team C are isolated

Quota1.1 is assigned to workspace-b of Team B, and Quota1.2 is assigned to workspace-c of Team C. Team B and Team C can manage permissions on resources and jobs in their respective workspaces. For more information, see Workspace scheduling center.