Container Compute Service: GPU Pod capacity reservation

Last Updated: Feb 27, 2026

GPU Pod capacity reservation guarantees resource availability for elastic GPU workloads at a lower cost than standard pay-as-you-go Pods. To reserve capacity, specify the Pod specification, zone, and lock-in period. No cluster binding is required. You can create multiple reservations with different specifications for resource flexibility. When resources are needed, ACS starts Pods of the reserved specifications within minutes.

Typical use cases:

  • Periodic real-time workloads: Tasks with daily or weekly tidal patterns that must execute in real time, such as inference services.

  • Sporadic large-scale resource demands: Sudden computing requirements that need rapid resource delivery and scaling, such as resource spikes triggered by trending events.

How it works

GPU Pod capacity reservation operates in three states:

  • Idle reservation: No Pods are running. The reservation holds capacity and charges at the capacity reservation rate.

  • Active Pod: A Pod starts and deducts the reservation quota. Billing switches to the Pod pay-as-you-go rate.

  • Restored reservation: A Pod stops. The reservation quota is restored, and billing reverts to the capacity reservation rate.

This cycle repeats throughout the reservation validity period. When the reservation expires, the system automatically releases it.

Create a GPU Pod capacity reservation

In the ACS console, go to Capacity Reservation > Create GPU Capacity Reservation. Configure the following parameters and click Create Capacity Reservation.

| Parameter | Description |
| --- | --- |
| Capacity Reservation Name | A user-defined name for the reservation. |
| Region | Region where resources are reserved. |
| Zone | Zone where resources are reserved. |
| Reservation Type | GPU card type. |
| Resource Specification | Number of GPU cards. The system automatically matches the highest CPU and memory specifications for the selected card count. |
| Reservation Mode | Pod reservation (not modifiable). |
| Billing Mode | Pay-as-you-go (not modifiable). |
| Release Method | Default time to release the reservation. |
| Quantity | Number of GPU Pod capacity reservations for this specification. |

Reservation creation depends on inventory availability. If the requested GPU resources are not available in the selected zone, creation may fail.

Supported GPU specifications

The following GPU card types and specifications are available after the capacity reservation specification upgrade:

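| Card type | GPU cards | GPU memory per card | vCPU | Memory (GiB) |
| --- | --- | --- | --- | --- |
| L20 (GN8IS) | 1 | 48 GB | 16 | 128 |
| L20 (GN8IS) | 2 | 48 GB | 32 | 230 |
| L20 (GN8IS) | 4 | 48 GB | 64 | 460 |
| L20 (GN8IS) | 8 | 48 GB | 128 | 920 |
| T4 | 1 | 16 GB | 24 | 90 |
| T4 | 2 | 16 GB | 48 | 180 |
| A10 | 1 | 24 GB | 16 | 60 |
| A10 | 2 | 24 GB | 32 | 120 |
| A10 | 4 | 24 GB | 64 | 240 |
| A10 | 8 | 24 GB | 128 | 480 |
| P16EN | 1 | 96 GB | 10 | 80 |
| P16EN | 2 | 96 GB | 22 | 225 |
| P16EN | 4 | 96 GB | 46 | 450 |
| P16EN | 8 | 96 GB | 92 | 900 |
| P16EN | 16 | 96 GB | 184 | 1800 |
| GU8TF | 1 | 96 GB | 16 | 128 |
| GU8TF | 2 | 96 GB | 46 | 230 |
| GU8TF | 4 | 96 GB | 92 | 460 |
| GU8TF | 8 | 96 GB | 184 | 920 |
| GU8TEF | 1 | 141 GB | 22 | 225 |
| GU8TEF | 2 | 141 GB | 46 | 450 |
| GU8TEF | 4 | 141 GB | 92 | 900 |
| GU8TEF | 8 | 141 GB | 184 | 1800 |
| L20X (GX8SF) | 1 | 141 GB | 22 | 225 |
| L20X (GX8SF) | 2 | 141 GB | 46 | 450 |
| L20X (GX8SF) | 4 | 141 GB | 92 | 900 |
| L20X (GX8SF) | 8 | 141 GB | 184 | 1800 |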

Quota deduction rules

When a Pod starts, the system attempts to match it against available capacity reservations. A successful match deducts the reservation quota and charges the Pod at pay-as-you-go rates. All four conditions must be met:

  1. GPU card type exactly matches the reserved card type.

  2. GPU card count exactly matches the reserved configuration.

  3. Pod vCPU is less than or equal to reserved vCPU.

  4. Pod memory is less than or equal to reserved memory.

When a Pod is terminated, the corresponding capacity reservation quota is restored.

Deduction example

If a reservation specifies 1 card, 10 vCPU, and 80 GB memory, creating a Pod with 1 card, 1 vCPU, and 2 GB memory fully deducts this reservation. The card type and count match exactly, and the Pod's CPU and memory do not exceed the reserved amounts.
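
The four matching conditions and the example above can be expressed as a single predicate. The following is a minimal sketch, not ACS code: the `Spec` class, the `can_deduct` function, and the field names are hypothetical, and the card type label `"P16EN"` is chosen only because its 1-card row in the specification table is 10 vCPU and 80 GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical description of a Pod or a reserved specification."""
    card_type: str
    cards: int
    vcpu: int
    memory_gb: int


def can_deduct(pod: Spec, reserved: Spec) -> bool:
    """Return True only if all four documented conditions hold."""
    return (
        pod.card_type == reserved.card_type      # 1. card type matches exactly
        and pod.cards == reserved.cards          # 2. card count matches exactly
        and pod.vcpu <= reserved.vcpu            # 3. Pod vCPU within reserved vCPU
        and pod.memory_gb <= reserved.memory_gb  # 4. Pod memory within reserved memory
    )


reserved = Spec("P16EN", cards=1, vcpu=10, memory_gb=80)

# The documented example: a small 1-card Pod deducts the reservation.
print(can_deduct(Spec("P16EN", 1, 1, 2), reserved))    # True

# A Pod that asks for more vCPU than reserved does not deduct it.
print(can_deduct(Spec("P16EN", 1, 12, 60), reserved))  # False
```

Note that a matching Pod consumes the whole reservation even when it uses far less vCPU and memory than was reserved.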

Deduction principles

The system applies the following principles when matching Pods to reservations. All examples assume the Pod card type matches the reserved card type.

| Principle | Example | Result |
| --- | --- | --- |
| Exact match or downward compatibility | Reserved: 1 x (1 card, 16 vCPU, 128 GB). Pod: 1 x (1 card, 8 vCPU, 16 GB). | Deducted. The Pod resources do not exceed the reserved specification. |
| Smallest specification first | Reserved: 1 x (1 card, 10 vCPU, 80 GB) and 1 x (1 card, 16 vCPU, 128 GB). Pod: 1 x (1 card, 5 vCPU, 30 GB). | Deducted from the 1 card, 10 vCPU, 80 GB reservation. The system selects the smallest matching reservation to maximize resource utilization. |
| First-in-first-out (FIFO) | Reserved: 4 x (1 card, 10 vCPU, 80 GB), created at different times. Pods: 4 x (1 card, 5 vCPU, 30 GB). | Deducted in order from earliest to latest reservation creation time. |
| Multi-card atomicity | Reserved: 1 x (4 cards, 46 vCPU, 450 GB). Pods: 4 x (1 card, 10 vCPU, 60 GB). | Not deducted. Multi-card reservations cannot be split across multiple single-card Pods. The 4 Pods run as standard pay-as-you-go. |
| Mixed specification matching | Reserved: 1 x (2 cards, 22 vCPU, 225 GB) and 1 x (4 cards, 46 vCPU, 450 GB). Pods: 2 x (1 card, 12 vCPU, 60 GB) and 2 x (2 cards, 20 vCPU, 120 GB). | Only 1 Pod with 2 cards, 20 vCPU, 120 GB deducts the 2-card reservation. The remaining Pods cannot match the 4-card reservation and run as pay-as-you-go. |
| Real-time dynamic matching | Existing pay-as-you-go Pod: 1 x (1 card, 5 vCPU, 30 GB). New reservation purchased: 1 x (1 card, 10 vCPU, 80 GB). | Deducted. New reservations automatically match and deduct existing pay-as-you-go Pods that meet the conditions. |
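
The selection principles above can be sketched as a small function that picks a reservation for a Pod. This is an illustrative model only, built on several assumptions: all names (`Reservation`, `Pod`, `pick_reservation`) are hypothetical, each reservation holds a single slot, "smallest" is taken to mean lowest vCPU and then lowest memory, creation time is a plain integer, and the card type is assumed to match already. Real-time dynamic matching is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    name: str
    cards: int
    vcpu: int
    memory_gb: int
    created_at: int       # hypothetical timestamp; smaller means earlier
    in_use: bool = False  # True once a Pod has deducted this slot


@dataclass(frozen=True)
class Pod:
    cards: int
    vcpu: int
    memory_gb: int


def pick_reservation(pod: Pod, pool: List[Reservation]) -> Optional[Reservation]:
    """Pick the reservation a Pod deducts, or None if it runs as pay-as-you-go."""
    candidates = [
        r for r in pool
        if not r.in_use
        and r.cards == pod.cards          # exact card count: no splitting multi-card slots
        and pod.vcpu <= r.vcpu            # downward compatibility on vCPU
        and pod.memory_gb <= r.memory_gb  # downward compatibility on memory
    ]
    if not candidates:
        return None
    # Smallest specification first, then earliest creation time (FIFO).
    chosen = min(candidates, key=lambda r: (r.vcpu, r.memory_gb, r.created_at))
    chosen.in_use = True
    return chosen


pool = [
    Reservation("large", cards=1, vcpu=16, memory_gb=128, created_at=1),
    Reservation("small", cards=1, vcpu=10, memory_gb=80, created_at=2),
    Reservation("quad", cards=4, vcpu=46, memory_gb=450, created_at=3),
]

print(pick_reservation(Pod(1, 5, 30), pool).name)  # small
print(pick_reservation(Pod(1, 5, 30), pool).name)  # large
print(pick_reservation(Pod(1, 10, 60), pool))      # None
```

The third Pod prints `None` because both single-card slots are taken and the 4-card reservation cannot be split, so that Pod would run as standard pay-as-you-go.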

Billing

GPU Pod capacity reservation uses pay-as-you-go billing. During the reservation validity period, charges consist of:

  • Capacity reservation fees for unused (idle) reservations

  • Pod pay-as-you-go fees for running Pods

When a Pod starts, it deducts the reservation quota. The reservation fee stops for that slot, and the Pod pay-as-you-go fee begins. When the Pod stops, the quota is restored and the reservation fee resumes.

Billing example

The following example shows charges across seven phases for two GPU Pod capacity reservations and two pay-as-you-go Pods (Pod1 and Pod2).

| Phase | State | Charges |
| --- | --- | --- |
| Phase 1 | Purchase and create reservation | None (reservation not yet active) |
| Phase 2 | Both reservations idle | 2 x capacity reservation rate x duration |
| Phase 3 | Pod1 running, 1 reservation idle | 1 x capacity reservation rate x duration + Pod1 pay-as-you-go rate x duration |
| Phase 4 | Both Pods running | Pod1 pay-as-you-go rate x duration + Pod2 pay-as-you-go rate x duration |
| Phase 5 | Pod2 running, 1 reservation idle | 1 x capacity reservation rate x duration + Pod2 pay-as-you-go rate x duration |
| Phase 6 | Both reservations idle again | 2 x capacity reservation rate x duration |
| Phase 7 | Reservation expires | System releases the reservation automatically |

GPU Pod capacity reservation supports savings plans that match region and type attributes.
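
To make the phase arithmetic in the billing example concrete, the sketch below totals the charges for Phases 2 through 6. The rates and durations are invented purely for illustration and are not actual ACS prices; the function name `phase_charge` is hypothetical, and the sketch assumes Pod1 and Pod2 share the same pay-as-you-go rate and that savings plans are not applied.

```python
def phase_charge(idle_reservations, running_pods, hours, reservation_rate, pod_rate):
    """Charge for one phase: idle slots pay the reservation rate,
    running Pods pay the Pod pay-as-you-go rate."""
    return (idle_reservations * reservation_rate + running_pods * pod_rate) * hours


# Hypothetical hourly rates in arbitrary currency units (not real prices).
RESERVATION_RATE = 2
POD_RATE = 10

# (phase, idle reservations, running Pods, duration in hours)
phases = [
    ("Phase 2", 2, 0, 1),
    ("Phase 3", 1, 1, 1),
    ("Phase 4", 0, 2, 1),
    ("Phase 5", 1, 1, 1),
    ("Phase 6", 2, 0, 1),
]

total = 0
for name, idle, running, hours in phases:
    charge = phase_charge(idle, running, hours, RESERVATION_RATE, POD_RATE)
    total += charge
    print(f"{name}: {charge}")

print(f"Total: {total}")  # Total: 52
```

In every phase the number of idle reservations plus the number of running Pods equals two, which reflects that a slot is billed at exactly one of the two rates at any moment.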

Limitations

  • GPU Pod capacity reservation does not support BestEffort compute class Pods.

  • Reservation creation depends on inventory availability.

FAQ

Am I charged twice when a Pod is running, once for the reservation and once for the Pod?

No. When a Pod starts and deducts a reservation, the capacity reservation fee for that slot stops. Only the Pod pay-as-you-go fee applies. When the Pod stops, the reservation fee resumes at the capacity reservation rate.

What happens if my Pod exceeds the reserved specification?

The Pod does not deduct the reservation. A Pod must match the reserved GPU card type and count exactly, and its vCPU and memory must not exceed the reserved amounts. Pods that do not match any reservation run as standard pay-as-you-go.

Can a new reservation match an existing pay-as-you-go Pod?

Yes. When a new capacity reservation is created, it automatically matches and deducts existing pay-as-you-go Pods that meet the matching conditions. This is the real-time dynamic matching principle.

What happens when a capacity reservation expires?

The system automatically releases the GPU Pod capacity reservation.
