Platform For AI: Manage integrated training and inference resources

Last Updated: Apr 03, 2026

Allocate computing resources across multiple teams by creating quotas and assigning resources to each team.

Background

Example

A pool of 128 GPUs serves Team A, Team B, and Team C with the following requirements:

  • Team A runs inference services and requires guaranteed resources.

  • Team B and Team C run training jobs.

  • Team A's inference services take priority over training jobs. When Team A needs more resources, the system reclaims them from training jobs to keep the inference services running.

  • Resources for Team B and Team C are dynamically adjusted based on actual needs.

  • Team B and Team C independently manage their own resources and jobs.

Solution

image

The preceding figure illustrates the sample scenario. The solution is:

  • Create a parent quota named Quota1 with 128 GPUs and enable child-level preemption. Create two child quotas: Quota1.1 (48 GPUs) and Quota1.2 (80 GPUs).

  • Create workspace-a for Team A and associate it with Quota1 to deploy EAS inference services.

  • Create workspace-b for Team B and associate it with Quota1.1 to run DLC training jobs.

  • Create workspace-c for Team C and associate it with Quota1.2 to run DSW instances for model development.
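
The quota layout above can be sketched as a small data model. This is a minimal illustration in Python, not the PAI API: the Quota class, its fields, and the rule that child quotas cannot exceed the parent's GPU count are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Quota:
    """Hypothetical model of a resource quota (illustration only, not a PAI API)."""
    name: str
    gpus: int
    child_preemption: bool = False
    workspace: Optional[str] = None
    children: List["Quota"] = field(default_factory=list)

    def add_child(self, child: "Quota") -> None:
        # Assumption: child quotas are carved out of the parent's GPUs,
        # so their sum may not exceed the parent's size.
        allocated = sum(c.gpus for c in self.children) + child.gpus
        if allocated > self.gpus:
            raise ValueError(
                f"{child.name} would bring child quotas to {allocated} GPUs, "
                f"more than the {self.gpus} GPUs in {self.name}"
            )
        self.children.append(child)

    def unallocated(self) -> int:
        """GPUs of this quota that are not assigned to any child quota."""
        return self.gpus - sum(c.gpus for c in self.children)


# Layout from the sample scenario: one parent quota and two child quotas.
quota1 = Quota("Quota1", gpus=128, child_preemption=True, workspace="workspace-a")
quota1.add_child(Quota("Quota1.1", gpus=48, workspace="workspace-b"))
quota1.add_child(Quota("Quota1.2", gpus=80, workspace="workspace-c"))

print(quota1.unallocated())  # 0: the two child quotas cover all 128 GPUs
```

In this sketch, the two child quotas account for the whole pool, so any GPUs that workspace-a needs while training jobs occupy the pool must come from reclaiming them.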

Procedure

  1. Prepare AI computing resources (general computing resources or Lingjun resources). For more information, see Resource pool overview. Skip this step if you have already purchased AI computing resources.

  2. Create a quota.

    1. Create a quota named Quota1 with the following key parameters. For more information, see Create a resource quota or General computing resource quotas.

      • Specifications/Resources: Select a resource specification, such as 128 GPUs.

      • Turn on the Child-level Preemption switch.

    2. In the Actions column for Quota1, click New Child-level Resource Quota to create the following two child quotas. For details, see Create parent-child quotas.

      • Create a child quota named Quota1.1 with 48 GPUs.

      • Create a child quota named Quota1.2 with 80 GPUs.

  3. Create the following workspaces and associate them with quotas. For more information, see Create and manage workspaces.

    • Create workspace-a for Team A and associate it with Quota1.

    • Create workspace-b for Team B and associate it with Quota1.1.

    • Create workspace-c for Team C and associate it with Quota1.2.

  4. Grant workspace administrator permissions to each team. For more information, see Manage a workspace. For other permission types, see Roles and permissions.

  5. Create the inference service in workspace-a, and create the training workloads in workspace-b and workspace-c.

Use cases

Scenario 1: Inference service preempts resources from training jobs

An administrator goes to the Resource Quota page, clicks the parent quota Quota1, and turns on Child-level Preemption on the Overview tab.

When Team A deploys a new inference service in workspace-a but training jobs from Team B and Team C leave too few free resources, the system automatically reclaims resources from those training jobs so that the inference service can run.
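
The reclaim behavior in this scenario can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the TrainingJob class, the reclaim_for_inference function, the job names, and the job sizes are hypothetical, and the order in which jobs are preempted (last listed first) is an assumption, because this document does not describe how the scheduler picks which training jobs to reclaim.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainingJob:
    """Hypothetical running training job (illustration only)."""
    name: str
    quota: str
    gpus: int


def reclaim_for_inference(
    total_gpus: int,
    inference_in_use: int,
    running_jobs: List[TrainingJob],
    requested: int,
) -> Tuple[List[TrainingJob], List[TrainingJob]]:
    """Return (preempted, kept) so that `requested` GPUs fit in the parent quota."""
    free = total_gpus - inference_in_use - sum(j.gpus for j in running_jobs)
    kept = list(running_jobs)
    preempted: List[TrainingJob] = []
    # Assumption: preempt the last listed job first until enough GPUs are free.
    while free < requested and kept:
        job = kept.pop()
        preempted.append(job)
        free += job.gpus
    if free < requested:
        raise RuntimeError("the request does not fit in the parent quota")
    return preempted, kept


jobs = [
    TrainingJob("train-b-1", "Quota1.1", 48),
    TrainingJob("train-c-1", "Quota1.2", 40),
    TrainingJob("train-c-2", "Quota1.2", 40),
]
# Training jobs occupy all 128 GPUs; Team A asks for 32 GPUs for inference.
preempted, kept = reclaim_for_inference(128, 0, jobs, requested=32)
print([j.name for j in preempted])  # ['train-c-2']
print([j.name for j in kept])       # ['train-b-1', 'train-c-1']
```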

Scenario 2: Reallocate resources between teams

An administrator uses quota scaling to adjust the resources of Quota1.1 and Quota1.2 based on the needs of Team B and Team C. For details, see Scale quotas.

  • Increase Quota1.1 GPUs from 48 to 56 (+8 GPUs).

  • Decrease Quota1.2 GPUs from 80 to 72 (-8 GPUs).
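
The arithmetic behind this adjustment can be checked with a short sketch. The rebalance function below is a hypothetical helper, not a PAI API, and it assumes that GPUs move between the two child quotas while the parent's 128-GPU total stays unchanged.

```python
from typing import Dict


def rebalance(children: Dict[str, int], source: str, target: str, gpus: int) -> Dict[str, int]:
    """Move `gpus` from one child quota to another, keeping the total fixed."""
    if gpus <= 0:
        raise ValueError("gpus must be a positive number")
    if children[source] < gpus:
        raise ValueError(f"{source} has only {children[source]} GPUs")
    updated = dict(children)  # leave the caller's mapping untouched
    updated[source] -= gpus
    updated[target] += gpus
    return updated


before = {"Quota1.1": 48, "Quota1.2": 80}
after = rebalance(before, source="Quota1.2", target="Quota1.1", gpus=8)

print(after)                # {'Quota1.1': 56, 'Quota1.2': 72}
print(sum(after.values()))  # 128, the same as before the adjustment
```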

Scenario 3: Isolate permissions between teams

Quota1.1 is assigned to workspace-b for Team B, and Quota1.2 is assigned to workspace-c for Team C. This setup lets each team independently manage permissions for their resources and jobs within their own workspace. For more information, see Workspace Scheduling Center.