Container Compute Service: GPU pod capacity reservation

Last Updated: Dec 15, 2025

Pod-based capacity reservation guarantees resource availability for elastic services. A GPU pod capacity reservation does not need to be attached to a cluster. When you purchase a reservation, you specify properties such as the pod specification, zone, and reservation duration. ACS then guarantees that you can start pods of that specification within minutes. GPU pod capacity reservation ensures resource availability and is more cost-effective than keeping pay-as-you-go pods running. This topic describes the features of GPU pod capacity reservation.

Features

  • Resource availability: During the effective period of the capacity reservation, the system guarantees that you can successfully start pods that match the reservation.

  • Cost reduction: While a pod is running, it is billed at the pay-as-you-go rate. After the pod is destroyed, the released reservation is billed at the capacity reservation rate. You can schedule when pods start and are destroyed based on your service traffic.

  • Resource flexibility: You can create GPU pod capacity reservations with different specifications to meet various business needs.

Note
  • GPU pod capacity reservation does not guarantee resources for pods of the BestEffort computing power type.

  • GPU pod capacity reservation supports savings plans that match the reservation's properties, such as region and type.

  • The successful creation of a GPU pod capacity reservation depends on inventory availability.

Use cases

  • Periodic real-time service resource demands: Your service traffic exhibits a tidal pattern on a daily or weekly basis, and tasks must be executed and completed in real time. An example is a real-time inference service.

  • Occasional high resource demands: Your service has sudden needs for real-time computing that require fast resource delivery and scale-out to prevent business impact. An example is resource demand triggered by hot spot events in an Internet service.

Billing example

GPU pod capacity reservation uses the pay-as-you-go billing method. During the effective period of the capacity reservation, your fees include the following:

  • Pay-as-you-go fees for unused capacity reservations.

  • Pay-as-you-go fees for started pods.

This example uses a scenario where you purchase two GPU pod capacity reservations and create two pay-as-you-go pods, Pod1 and Pod2. The following figure shows the process and the billing algorithm for each phase.

image

Phase 1: Purchase and create a capacity reservation

Before you perform the following operations, you must first activate GPU capacity reservation.

In the Container Compute Service console, choose Capacity Reservation > Create GPU Capacity Reservation, configure the parameters, and then click Create Capacity Reservation.

| Configuration item | Description |
| --- | --- |
| Capacity Reservation Name | A custom name for the capacity reservation. |
| Reservation Type | The GPU card type. |
| Region | The region where you want to reserve resources. |
| Zone | The zone where you want to reserve resources. |
| Resource Specification | The specification of the capacity reservation. You only need to select the number of GPUs. The system automatically matches the highest CPU and memory specifications available for that number of GPUs. |
| Reservation Method | Pod reservation (cannot be modified). |
| Billing Mode | Pay-as-you-go (cannot be modified). |
| Quantity | The number of GPU pod capacity reservations for this specification. |

The fee algorithm for this phase is as follows:

| Phase | Fee | Description |
| --- | --- | --- |
| Phase 1 | None | No capacity reservation is created. |

Phases 2-6: Capacity reservation effective period

During the effective period, you can create pods at any time, provided that they do not exceed the reservation configuration. The system guarantees that the pods are created successfully and deducts the corresponding capacity reservation quota. For a reservation to apply, the pod's GPU card type and GPU count must match the reservation, and its CPU and memory must not exceed the reserved specifications. If a match is found, the entire reservation is applied, regardless of how much CPU and memory the pod requests. For example, if you purchase one capacity reservation (1 GPU, 10 vCPUs, and 80 GB of memory) and create a pod with 1 GPU, 1 vCPU, and 2 GB of memory, the system applies the entire capacity reservation. After the pod is destroyed, the corresponding GPU pod capacity reservation quota is restored.

The fee algorithms for these phases are as follows:

| Phase | Fee |
| --- | --- |
| Phase 2 | 2 × capacity reservation unit price × duration of Phase 2 |
| Phase 3 | 1 × capacity reservation unit price × duration of Phase 3 + Pod1 pay-as-you-go unit price × duration of Phase 3 |
| Phase 4 | Pod1 pay-as-you-go unit price × duration of Phase 4 + Pod2 pay-as-you-go unit price × duration of Phase 4 |
| Phase 5 | 1 × capacity reservation unit price × duration of Phase 5 + Pod2 pay-as-you-go unit price × duration of Phase 5 |
| Phase 6 | 2 × capacity reservation unit price × duration of Phase 6 |

The capacity reservation unit price is the pay-as-you-go fee for the unused capacity reservation. The pay-as-you-go unit prices for Pod1 and Pod2 are the standard pay-as-you-go rates that apply after the pods are started.
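
The formulas for Phases 2 to 6 follow one rule: each unused reservation is billed at the capacity reservation unit price, and each running pod is billed at its own pay-as-you-go unit price. The following Python sketch applies that rule to the two-reservation example. The hourly prices and phase durations are made-up placeholder numbers, and `Phase` and `phase_fee` are hypothetical names for illustration, not part of any ACS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    hours: float              # phase duration (placeholder value)
    unused_reservations: int  # reservations not currently consumed by a pod
    running_pods: tuple       # names of the pods running during the phase


# Placeholder hourly prices; real prices must come from the ACS pricing page.
RESERVATION_PRICE = 1.0
POD_PRICES = {"Pod1": 4.0, "Pod2": 4.0}


def phase_fee(phase, reservation_price=RESERVATION_PRICE, pod_prices=POD_PRICES):
    """Fee for one phase: unused reservations plus running pods."""
    reservation_fee = phase.unused_reservations * reservation_price * phase.hours
    pod_fee = sum(pod_prices[name] for name in phase.running_pods) * phase.hours
    return reservation_fee + pod_fee


# Two reservations are held throughout; Pod1 runs in Phases 3-4, Pod2 in Phases 4-5.
PHASES = [
    Phase("Phase 2", 2, 2, ()),
    Phase("Phase 3", 1, 1, ("Pod1",)),
    Phase("Phase 4", 3, 0, ("Pod1", "Pod2")),
    Phase("Phase 5", 1, 1, ("Pod2",)),
    Phase("Phase 6", 2, 2, ()),
]

fees = {phase.name: phase_fee(phase) for phase in PHASES}
total = sum(fees.values())
```

With these placeholder numbers, Phase 4 carries no reservation fee because both reservations are consumed by running pods.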

Phase 7: Capacity reservation expires

When the capacity reservation expires, the system automatically releases the GPU pod capacity reservation.

Available resource reservation types

After you increase the quota for capacity reservation specifications, the capacity reservation supports the following card types and specifications:

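| Card type | GPU | vCPU | Memory (GiB) |
| --- | --- | --- | --- |
| L20 (GN8IS) | 1 (48 G GPU memory) | 16 | 128 |
| L20 (GN8IS) | 2 (48 G × 2 GPU memory) | 32 | 230 |
| L20 (GN8IS) | 4 (48 G × 4 GPU memory) | 64 | 460 |
| L20 (GN8IS) | 8 (48 G × 8 GPU memory) | 128 | 920 |
| T4 | 1 (16 G GPU memory) | 24 | 90 |
| T4 | 2 (16 G × 2 GPU memory) | 48 | 180 |
| A10 | 1 (24 G GPU memory) | 16 | 60 |
| A10 | 2 (24 G × 2 GPU memory) | 32 | 120 |
| A10 | 4 (24 G × 4 GPU memory) | 64 | 240 |
| A10 | 8 (24 G × 8 GPU memory) | 128 | 480 |
| P16EN | 1 (96 G GPU memory) | 10 | 80 |
| P16EN | 2 (96 G × 2 GPU memory) | 22 | 225 |
| P16EN | 4 (96 G × 4 GPU memory) | 46 | 450 |
| P16EN | 8 (96 G × 8 GPU memory) | 92 | 900 |
| P16EN | 16 (96 G × 16 GPU memory) | 184 | 1800 |
| GU8TF | 1 (96 G GPU memory) | 16 | 128 |
| GU8TF | 2 (96 G × 2 GPU memory) | 46 | 230 |
| GU8TF | 4 (96 G × 4 GPU memory) | 92 | 460 |
| GU8TF | 8 (96 G × 8 GPU memory) | 184 | 920 |
| GU8TEF | 1 (141 G GPU memory) | 22 | 225 |
| GU8TEF | 2 (141 G × 2 GPU memory) | 46 | 450 |
| GU8TEF | 4 (141 G × 4 GPU memory) | 92 | 900 |
| GU8TEF | 8 (141 G × 8 GPU memory) | 184 | 1800 |
| L20X (GX8SF) | 1 (141 G GPU memory) | 22 | 225 |
| L20X (GX8SF) | 2 (141 G × 2 GPU memory) | 46 | 450 |
| L20X (GX8SF) | 4 (141 G × 4 GPU memory) | 92 | 900 |
| L20X (GX8SF) | 8 (141 G × 8 GPU memory) | 184 | 1800 |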
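
Because you select only the number of GPUs and the system fills in the vCPU and memory values, the specification table can be read as a lookup from (card type, GPU count) to the reserved specification. The following Python sketch shows that lookup for the T4 and A10 rows only. `RESERVED_SPECS` and `reserved_spec` are illustrative names, not an ACS interface.

```python
# (card type, GPU count) -> (vCPUs, memory in GiB), copied from the T4 and A10 rows.
RESERVED_SPECS = {
    ("T4", 1): (24, 90),
    ("T4", 2): (48, 180),
    ("A10", 1): (16, 60),
    ("A10", 2): (32, 120),
    ("A10", 4): (64, 240),
    ("A10", 8): (128, 480),
}


def reserved_spec(card_type, gpus):
    """Return (vcpus, memory_gib) for a reservation, or raise ValueError."""
    try:
        return RESERVED_SPECS[(card_type, gpus)]
    except KeyError:
        raise ValueError(
            f"no reservation specification for {gpus} x {card_type}"
        ) from None


vcpus, memory_gib = reserved_spec("A10", 4)
```

A combination that is not listed, such as four T4 GPUs, raises `ValueError` in this sketch instead of returning a guess.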

Deduction rules

A capacity reservation is applied only if all the following conditions are met:

  • The GPU card type of the pod exactly matches the reserved card type. For example, both the reserved card type and the pod card type are L20.

  • The number of GPUs exactly matches the reserved configuration. For example, both the reserved number of GPUs and the pod's number of GPUs are 1.

  • The pod's vCPU count ≤ the reserved vCPU count.

  • The pod's memory ≤ the reserved memory.
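
The four conditions above can be expressed as a single predicate. This is a minimal Python sketch of that check. The `Spec` type and the `reservation_applies` function are hypothetical names introduced for illustration, and the sketch does not describe how ACS implements the check internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    card_type: str
    gpus: int
    vcpus: int
    memory_gb: int


def reservation_applies(pod: Spec, reservation: Spec) -> bool:
    """True only if all four deduction conditions hold."""
    return (
        pod.card_type == reservation.card_type        # card type matches exactly
        and pod.gpus == reservation.gpus              # GPU count matches exactly
        and pod.vcpus <= reservation.vcpus            # vCPUs fit within the reservation
        and pod.memory_gb <= reservation.memory_gb    # memory fits within the reservation
    )


reservation = Spec("L20", gpus=1, vcpus=16, memory_gb=128)

smaller_pod = Spec("L20", gpus=1, vcpus=8, memory_gb=16)       # fits
other_card = Spec("A10", gpus=1, vcpus=8, memory_gb=16)        # wrong card type
two_gpu_pod = Spec("L20", gpus=2, vcpus=8, memory_gb=16)       # wrong GPU count
too_much_memory = Spec("L20", gpus=1, vcpus=8, memory_gb=256)  # exceeds memory
```

Only `smaller_pod` satisfies the predicate. The other three pods each break one condition and would run as pay-as-you-go pods.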

The following matching scenarios assume that the pod's card type is the same as the reserved card type:

| Deduction rule | Scenario description | Result and description |
| --- | --- | --- |
| Exact match or downward compatible | Reservation: 1 × (1 GPU, 16 vCPUs, 128 GB). Pod creation: 1 × (1 GPU, 8 vCPUs, 16 GB). | Result: The reservation is successfully applied. Description: The pod's GPU count matches the reservation, and its CPU and memory do not exceed the reserved specifications, so a match is found. This pod uses the entire capacity reservation. |
| Smallest specification first | Reservations: 1 × (1 GPU, 10 vCPUs, 80 GB) and 1 × (1 GPU, 16 vCPUs, 128 GB). Pod creation: 1 × (1 GPU, 5 vCPUs, 30 GB). | Result: The reservation with the specification of 1 GPU, 10 vCPUs, and 80 GB is applied first. Description: To maximize resource utilization, the system prioritizes the smallest reservation that fits the pod's requirements. |
| First in, first out (FIFO) | Reservations: 4 × (1 GPU, 10 vCPUs, 80 GB), created at different times. Pod creation: 4 × (1 GPU, 5 vCPUs, 30 GB). | Result: The four pods use the reservations in order of reservation creation time, oldest first. Description: For reservations of the same specification, the first in, first out principle is followed. |
| Atomicity of multi-GPU specifications (cannot be split) | Reservation: 1 × (4 GPUs, 46 vCPUs, 450 GB). Pod creation: 4 × (1 GPU, 10 vCPUs, 60 GB). | Result: The reservation is not applied. Description: A multi-GPU reservation is an atomic unit and cannot be split to accommodate multiple single-GPU pods. These four pods are created as pay-as-you-go instances instead. |
| Mixed specification matching | Reservations: 1 × (2 GPUs, 22 vCPUs, 225 GB) and 1 × (4 GPUs, 46 vCPUs, 450 GB). Pod creation: 2 × (1 GPU, 12 vCPUs, 60 GB) and 2 × (2 GPUs, 20 vCPUs, 120 GB). | Result: Only one pod with 2 GPUs, 20 vCPUs, and 120 GB successfully uses the reservation for 2 GPUs, 22 vCPUs, and 225 GB. Description: The remaining pods cannot match the remaining 4-GPU reservation and are created as pay-as-you-go instances instead. |
| Real-time dynamic matching | Existing pay-as-you-go pod: 1 × (1 GPU, 5 vCPUs, 30 GB). New reservation purchase: 1 × (1 GPU, 10 vCPUs, 80 GB). | Result: After the new reservation is successfully created, it is immediately and automatically applied to the existing pay-as-you-go pod. Description: A capacity reservation can be applied to existing pay-as-you-go pods that match the reservation's specifications. |
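
The matching scenarios can be approximated by a small selection routine: among the free reservations that fit a pod, pick the smallest one, break ties by the oldest creation time, and consume the reservation whole. The following Python sketch illustrates this under stated assumptions. Ordering "smallest" by (vCPUs, memory) is an assumption, the `P16EN` card type is chosen only because its specifications match the mixed-specification scenario, and `Reservation`, `Pod`, `fits`, and `assign` are hypothetical names rather than ACS APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    name: str
    card_type: str
    gpus: int
    vcpus: int
    memory_gb: int
    created_at: int  # any sortable timestamp; smaller means older


@dataclass(frozen=True)
class Pod:
    name: str
    card_type: str
    gpus: int
    vcpus: int
    memory_gb: int


def fits(pod, res):
    """Card type and GPU count match exactly; vCPUs and memory do not exceed."""
    return (pod.card_type == res.card_type and pod.gpus == res.gpus
            and pod.vcpus <= res.vcpus and pod.memory_gb <= res.memory_gb)


def assign(pods, reservations):
    """Map each pod name to the reservation it consumes, or None (pay-as-you-go)."""
    free = list(reservations)
    result = {}
    for pod in pods:
        candidates = [r for r in free if fits(pod, r)]
        if not candidates:
            result[pod.name] = None
            continue
        # Assumption: smallest (vCPUs, memory) first, then oldest reservation (FIFO).
        chosen = min(candidates, key=lambda r: (r.vcpus, r.memory_gb, r.created_at))
        free.remove(chosen)  # a reservation is atomic: one pod consumes it whole
        result[pod.name] = chosen.name
    return result


# Mixed specification scenario from the table of matching scenarios.
reservations = [
    Reservation("res-2gpu", "P16EN", 2, 22, 225, created_at=1),
    Reservation("res-4gpu", "P16EN", 4, 46, 450, created_at=2),
]
pods = [
    Pod("pod-a", "P16EN", 1, 12, 60),
    Pod("pod-b", "P16EN", 1, 12, 60),
    Pod("pod-c", "P16EN", 2, 20, 120),
    Pod("pod-d", "P16EN", 2, 20, 120),
]
mixed = assign(pods, reservations)
```

In this run only `pod-c` consumes `res-2gpu`. The 4-GPU reservation stays unused because no pod requests exactly four GPUs. Real-time dynamic matching could be modeled by calling `assign` again over the existing pay-as-you-go pods whenever a new reservation is created.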