Allocate computing resources across multiple teams by creating quotas and assigning resources to each team.
Background
Example
A pool of 128 GPUs serves Team A, Team B, and Team C with the following requirements:
- Team A runs inference services and requires guaranteed resources.
- Team B and Team C run training jobs.
- Inference services of Team A take priority over training jobs. When Team A needs more resources, the system reclaims resources from training jobs to keep inference services running.
- Resources for Team B and Team C are dynamically adjusted based on actual needs.
- Team B and Team C independently manage their own resources and jobs.
Solution
The preceding figure illustrates the sample scenario. The solution is as follows:
- Create a parent quota named Quota1 with 128 GPUs and enable child-level preemption. Create two child quotas: Quota1.1 (48 GPUs) and Quota1.2 (80 GPUs).
- Create workspace-a for Team A and associate it with Quota1 to deploy EAS inference services.
- Create workspace-b for Team B and associate it with Quota1.1 to run DLC training jobs.
- Create workspace-c for Team C and associate it with Quota1.2 to run DSW instances for model development.
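The quota hierarchy in this solution can be pictured as a small tree. The following Python sketch is only an illustration of that structure, not a product API: the `Quota` class, its fields, and the rule that child quotas cannot claim more GPUs than the parent holds are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    """Illustrative model of a quota with optional child quotas (hypothetical)."""
    name: str
    gpus: int
    child_preemption: bool = False
    children: list["Quota"] = field(default_factory=list)

    def add_child(self, name: str, gpus: int) -> "Quota":
        # Assumption: the children together may not exceed the parent's GPUs.
        allocated = sum(c.gpus for c in self.children)
        if allocated + gpus > self.gpus:
            raise ValueError(
                f"{name}: {allocated + gpus} GPUs exceeds {self.name} ({self.gpus} GPUs)"
            )
        child = Quota(name, gpus)
        self.children.append(child)
        return child

    def unallocated(self) -> int:
        # GPUs of the parent that are not assigned to any child quota.
        return self.gpus - sum(c.gpus for c in self.children)


quota1 = Quota("Quota1", 128, child_preemption=True)
quota1.add_child("Quota1.1", 48)
quota1.add_child("Quota1.2", 80)
print(quota1.unallocated())  # prints 0, because 48 + 80 = 128
```

In this model the two child quotas account for the whole pool, which is why Team A, attached to the parent quota, depends on child-level preemption to obtain GPUs when the training workloads are busy.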
Procedure
- Prepare AI computing resources (general computing resources or Lingjun resources). For more information, see Resource pool overview. Skip this step if AI computing resources are already purchased.
- Create a quota.
  - Create a quota named Quota1 with the following key parameters. For more information, see Create a resource quota or General computing resource quotas.
    - Specifications/Resources: Select a resource specification, such as 128 GPUs.
    - Turn on the Child-level Preemption switch.
  - In the Actions column for Quota1, click New Child-level Resource Quota to create the following two child quotas. For details, see Create parent-child quotas.
    - Create a child quota named Quota1.1 with 48 GPUs.
    - Create a child quota named Quota1.2 with 80 GPUs.
- Create the following workspaces and associate them with quotas. For more information, see Create and manage workspaces.
  - Create workspace-a for Team A and associate it with Quota1.
  - Create workspace-b for Team B and associate it with Quota1.1.
  - Create workspace-c for Team C and associate it with Quota1.2.
- Grant workspace administrator permissions to each team. For more information, see Manage a workspace. For other permission types, see Roles and permissions.
- Create an inference service and training jobs.
  - Team A creates an inference service in workspace-a. For more information, see Service deployment.
  - Team B creates a DLC job in workspace-b. For more information, see Create a training job.
  - Team C creates a DSW instance in workspace-c. For more information, see Create a DSW instance.
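After the procedure is complete, the resulting setup can be written down as a simple table of workspaces and the quotas they draw from. The sketch below records that end state in plain Python and checks that every workspace points at a defined quota. The dictionary layout and the `check_bindings` helper are hypothetical and exist only for this illustration; they do not correspond to any console or SDK object.

```python
# Quota names and sizes are taken from the example; the layout is hypothetical.
QUOTAS = {"Quota1": 128, "Quota1.1": 48, "Quota1.2": 80}

WORKSPACES = {
    "workspace-a": {"team": "Team A", "quota": "Quota1", "workload": "EAS inference service"},
    "workspace-b": {"team": "Team B", "quota": "Quota1.1", "workload": "DLC training job"},
    "workspace-c": {"team": "Team C", "quota": "Quota1.2", "workload": "DSW instance"},
}


def check_bindings(workspaces, quotas):
    """Return the names of workspaces whose quota is not defined."""
    return [name for name, ws in workspaces.items() if ws["quota"] not in quotas]


missing = check_bindings(WORKSPACES, QUOTAS)
for name, ws in WORKSPACES.items():
    print(f"{name}: {ws['team']} -> {ws['quota']} ({ws['workload']})")
```

An empty `missing` list means each workspace is associated with one of the three quotas created earlier.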
Use cases
Scenario 1: Inference service preempts resources from training jobs
An administrator goes to the Resource Quota page, clicks the parent quota Quota1, and on the Overview tab, turns on Child-level Preemption.
When Team A submits a new inference service in workspace-a but not enough GPUs are free because training jobs from Team B and Team C occupy them, the system automatically reclaims resources from the training jobs so that the inference service can run.
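The following Python sketch simulates this behavior on a toy scale. It is a simplified model under stated assumptions, not the scheduler's actual algorithm: the `Job` class, the job names, the whole-job preemption, and the newest-first victim order are all hypothetical, because the selection policy is not described here.

```python
from dataclasses import dataclass

TOTAL_GPUS = 128  # capacity of the parent quota Quota1 in the example


@dataclass
class Job:
    name: str
    quota: str
    gpus: int
    preemptible: bool  # True for training jobs in child quotas (assumption)


def admit_inference(running: list[Job], request: Job) -> list[Job]:
    """Admit an inference job, preempting training jobs if needed.

    Returns the preempted jobs. The newest-first order is an assumption.
    """
    free = TOTAL_GPUS - sum(j.gpus for j in running)
    victims = []
    # Walk from the most recently started job backwards until enough is free.
    for job in reversed(running):
        if free >= request.gpus:
            break
        if job.preemptible:
            victims.append(job)
            free += job.gpus
    if free < request.gpus:
        raise RuntimeError("not enough preemptible GPUs to admit the request")
    # Only change the running set once admission is known to succeed.
    for job in victims:
        running.remove(job)
    running.append(request)
    return victims


running = [
    Job("train-b-1", "Quota1.1", 32, True),
    Job("train-b-2", "Quota1.1", 16, True),
    Job("train-c-1", "Quota1.2", 64, True),
    Job("train-c-2", "Quota1.2", 16, True),
]
victims = admit_inference(running, Job("infer-a", "Quota1", 16, False))
print([v.name for v in victims])  # ['train-c-2']
```

In this run all 128 GPUs are busy with training jobs, so admitting a 16-GPU inference service requires stopping one 16-GPU training job.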
Scenario 2: Reallocate resources between teams
An administrator uses quota scaling to adjust the resources of Quota1.1 and Quota1.2 based on the needs of Team B and Team C. For details, see Scale quotas.
- Increase Quota1.1 GPUs from 48 to 56 (+8 GPUs).
- Decrease Quota1.2 GPUs from 80 to 72 (-8 GPUs).
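The arithmetic behind this reallocation is easy to check: the two changes cancel out, so the child quotas still total 128 GPUs. The sketch below applies the deltas and rejects any result that would overcommit the parent. The `scale_children` function is a hypothetical helper, and the rule that child quotas cannot exceed the parent is an assumption made for the example.

```python
def scale_children(parent_gpus: int, children: dict[str, int], deltas: dict[str, int]) -> dict[str, int]:
    """Apply GPU deltas to child quotas and reject overcommitting changes."""
    scaled = {name: gpus + deltas.get(name, 0) for name, gpus in children.items()}
    if any(gpus < 0 for gpus in scaled.values()):
        raise ValueError("a child quota cannot have a negative GPU count")
    # Assumption: the children together may not exceed the parent quota.
    if sum(scaled.values()) > parent_gpus:
        raise ValueError("child quotas would exceed the parent quota")
    return scaled


before = {"Quota1.1": 48, "Quota1.2": 80}
after = scale_children(128, before, {"Quota1.1": +8, "Quota1.2": -8})
print(after)  # {'Quota1.1': 56, 'Quota1.2': 72}
```

Because 48 + 80 already equals 128, growing Quota1.1 alone would fail this check; under the assumption above, the 8 GPUs have to be released from Quota1.2 as part of the same adjustment.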
Scenario 3: Isolate permissions between teams
Quota1.1 is assigned to workspace-b for Team B, and Quota1.2 is assigned to workspace-c for Team C. This setup lets each team independently manage permissions for their resources and jobs within their own workspace. For more information, see Workspace Scheduling Center.