GPU Pod capacity reservation guarantees resource availability for elastic GPU workloads at a lower cost than keeping standard pay-as-you-go Pods running. To reserve capacity, specify the Pod specification, zone, and lock-in period. No cluster binding is required. You can create multiple reservations with different specifications for resource flexibility. When resources are needed, ACS starts Pods of the reserved specification within minutes.
Typical use cases:
Periodic real-time workloads: Tasks with daily or weekly tidal patterns that must execute in real time, such as inference services.
Sporadic large-scale resource demands: Sudden computing requirements that need rapid resource delivery and scaling, such as resource spikes triggered by trending events.
How it works
GPU Pod capacity reservation operates in three states:
Idle reservation: No Pods are running. The reservation holds capacity and charges at the capacity reservation rate.
Active Pod: A Pod starts and deducts the reservation quota. Billing switches to the Pod pay-as-you-go rate.
Restored reservation: A Pod stops. The reservation quota is restored, and billing reverts to the capacity reservation rate.
This cycle repeats throughout the reservation validity period. When the reservation expires, the system automatically releases it.
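The cycle above can be pictured as a small state machine. The following is a minimal sketch for illustration only: the class and method names are hypothetical and are not part of any ACS API, and the "released" state simply models what happens after expiry.

```python
from enum import Enum


class SlotState(Enum):
    IDLE = "idle"          # holds capacity, billed at the capacity reservation rate
    ACTIVE = "active"      # a Pod has deducted the quota, billed at the Pod rate
    RELEASED = "released"  # validity period ended, reservation released


class ReservationSlot:
    """Hypothetical model of a single reservation; not an ACS API object."""

    def __init__(self):
        self.state = SlotState.IDLE

    def start_pod(self):
        # A Pod can only deduct a reservation that is currently idle.
        if self.state is not SlotState.IDLE:
            raise RuntimeError(f"cannot start a Pod on a {self.state.value} reservation")
        self.state = SlotState.ACTIVE

    def stop_pod(self):
        # Stopping the Pod restores the quota, so the slot becomes idle again.
        if self.state is not SlotState.ACTIVE:
            raise RuntimeError("no Pod is running against this reservation")
        self.state = SlotState.IDLE

    def expire(self):
        # Assumption: this sketch only tracks the reservation itself, not
        # what happens to a Pod that is still running at expiry.
        self.state = SlotState.RELEASED

    def billed_as(self):
        return {
            SlotState.IDLE: "capacity reservation rate",
            SlotState.ACTIVE: "Pod pay-as-you-go rate",
            SlotState.RELEASED: "no reservation charge (assumed)",
        }[self.state]


slot = ReservationSlot()
print(slot.billed_as())  # capacity reservation rate
slot.start_pod()
print(slot.billed_as())  # Pod pay-as-you-go rate
slot.stop_pod()
print(slot.billed_as())  # capacity reservation rate
```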
Create a GPU Pod capacity reservation
In the ACS console, go to Capacity Reservation > Create GPU Capacity Reservation. Configure the following parameters and click Create Capacity Reservation.
| Parameter | Description |
|---|---|
| Capacity Reservation Name | A user-defined name for the reservation. |
| Region | Region where resources are reserved. |
| Zone | Zone where resources are reserved. |
| Reservation Type | GPU card type. |
| Resource Specification | Number of GPU cards. The system automatically matches the highest CPU and memory specifications for the selected card count. |
| Reservation Mode | Pod reservation (not modifiable). |
| Billing Mode | Pay-as-you-go (not modifiable). |
| Release Method | Default time to release the reservation. |
| Quantity | Number of GPU Pod capacity reservations for this specification. |
Reservation creation depends on inventory availability. If the requested GPU resources are not available in the selected zone, creation may fail.
Supported GPU specifications
The following GPU card types and specifications are available after the capacity reservation specification upgrade:
| Card type | GPU cards | GPU memory per card | vCPU | Memory (GiB) |
|---|---|---|---|---|
| L20 (GN8IS) | 1 | 48 GB | 16 | 128 |
| L20 (GN8IS) | 2 | 48 GB | 32 | 230 |
| L20 (GN8IS) | 4 | 48 GB | 64 | 460 |
| L20 (GN8IS) | 8 | 48 GB | 128 | 920 |
| T4 | 1 | 16 GB | 24 | 90 |
| T4 | 2 | 16 GB | 48 | 180 |
| A10 | 1 | 24 GB | 16 | 60 |
| A10 | 2 | 24 GB | 32 | 120 |
| A10 | 4 | 24 GB | 64 | 240 |
| A10 | 8 | 24 GB | 128 | 480 |
| P16EN | 1 | 96 GB | 10 | 80 |
| P16EN | 2 | 96 GB | 22 | 225 |
| P16EN | 4 | 96 GB | 46 | 450 |
| P16EN | 8 | 96 GB | 92 | 900 |
| P16EN | 16 | 96 GB | 184 | 1800 |
| GU8TF | 1 | 96 GB | 16 | 128 |
| GU8TF | 2 | 96 GB | 46 | 230 |
| GU8TF | 4 | 96 GB | 92 | 460 |
| GU8TF | 8 | 96 GB | 184 | 920 |
| GU8TEF | 1 | 141 GB | 22 | 225 |
| GU8TEF | 2 | 141 GB | 46 | 450 |
| GU8TEF | 4 | 141 GB | 92 | 900 |
| GU8TEF | 8 | 141 GB | 184 | 1800 |
| L20X (GX8SF) | 1 | 141 GB | 22 | 225 |
| L20X (GX8SF) | 2 | 141 GB | 46 | 450 |
| L20X (GX8SF) | 4 | 141 GB | 92 | 900 |
| L20X (GX8SF) | 8 | 141 GB | 184 | 1800 |
Quota deduction rules
When a Pod starts, the system attempts to match it against available capacity reservations. On a successful match, the Pod deducts the reservation quota and is billed at the pay-as-you-go rate. A match requires all four of the following conditions:
GPU card type exactly matches the reserved card type.
GPU card count exactly matches the reserved configuration.
Pod vCPU is less than or equal to reserved vCPU.
Pod memory is less than or equal to reserved memory.
When a Pod is terminated, the corresponding capacity reservation quota is restored.
Deduction example
If a reservation specifies 1 card, 10 vCPU, and 80 GB memory, creating a Pod with 1 card, 1 vCPU, and 2 GB memory fully deducts this reservation. The card type and count match exactly, and the Pod's CPU and memory do not exceed the reserved amounts.
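The four conditions can be written as a single predicate. This is a minimal sketch, not ACS code: the `Spec` type and `can_deduct` function are hypothetical names, and the card type `P16EN` is an assumption chosen only because its 1-card row matches the numbers in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    card_type: str
    cards: int
    vcpu: int
    memory: int  # same unit on both sides of the comparison


def can_deduct(pod: Spec, reserved: Spec) -> bool:
    """Return True when the Pod satisfies all four matching conditions."""
    return (
        pod.card_type == reserved.card_type   # card type matches exactly
        and pod.cards == reserved.cards       # card count matches exactly
        and pod.vcpu <= reserved.vcpu         # vCPU does not exceed the reservation
        and pod.memory <= reserved.memory     # memory does not exceed the reservation
    )


reserved = Spec("P16EN", cards=1, vcpu=10, memory=80)
print(can_deduct(Spec("P16EN", 1, 1, 2), reserved))    # True
print(can_deduct(Spec("P16EN", 1, 12, 60), reserved))  # False: vCPU exceeds the reservation
print(can_deduct(Spec("P16EN", 2, 10, 80), reserved))  # False: card count differs
```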
Deduction principles
The system applies the following principles when matching Pods to reservations. All examples assume the Pod card type matches the reserved card type.
| Principle | Example | Result |
|---|---|---|
| Exact match or downward compatibility | Reserved: 1 x (1 card, 16 vCPU, 128 GB). Pod: 1 x (1 card, 8 vCPU, 16 GB). | Deducted. The Pod resources do not exceed the reserved specification. |
| Smallest specification first | Reserved: 1 x (1 card, 10 vCPU, 80 GB) and 1 x (1 card, 16 vCPU, 128 GB). Pod: 1 x (1 card, 5 vCPU, 30 GB). | Deducted from the 1 card, 10 vCPU, 80 GB reservation. The system selects the smallest matching reservation to maximize resource utilization. |
| First-in-first-out (FIFO) | Reserved: 4 x (1 card, 10 vCPU, 80 GB), created at different times. Pods: 4 x (1 card, 5 vCPU, 30 GB). | Deducted in order from earliest to latest reservation creation time. |
| Multi-card atomicity | Reserved: 1 x (4 cards, 46 vCPU, 450 GB). Pods: 4 x (1 card, 10 vCPU, 60 GB). | Not deducted. Multi-card reservations cannot be split across multiple single-card Pods. The 4 Pods run as standard pay-as-you-go. |
| Mixed specification matching | Reserved: 1 x (2 cards, 22 vCPU, 225 GB) and 1 x (4 cards, 46 vCPU, 450 GB). Pods: 2 x (1 card, 12 vCPU, 60 GB) and 2 x (2 cards, 20 vCPU, 120 GB). | Only 1 Pod with 2 cards, 20 vCPU, 120 GB deducts the 2-card reservation. The remaining Pods cannot match the 4-card reservation and run as pay-as-you-go. |
| Real-time dynamic matching | Existing pay-as-you-go Pod: 1 x (1 card, 5 vCPU, 30 GB). New reservation purchased: 1 x (1 card, 10 vCPU, 80 GB). | Deducted. New reservations automatically match and deduct existing pay-as-you-go Pods that meet the conditions. |
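Several of these principles can be combined into one selection routine. The sketch below is illustrative only and makes assumptions the source does not state: "smallest specification" is ordered by vCPU and then memory, `created_at` is a hypothetical integer timestamp, and the card type `P16EN` is a placeholder. It covers downward compatibility, smallest specification first, FIFO, and multi-card atomicity; real-time dynamic matching of already running Pods is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    card_type: str
    cards: int
    vcpu: int
    memory: int
    created_at: int        # hypothetical timestamp; smaller means created earlier
    in_use: bool = False


@dataclass
class Pod:
    card_type: str
    cards: int
    vcpu: int
    memory: int


def pick_reservation(pod: Pod, reservations: List[Reservation]) -> Optional[Reservation]:
    """Pick the reservation a Pod deducts, or None if it runs as pay-as-you-go."""
    candidates = [
        r for r in reservations
        if not r.in_use
        and r.card_type == pod.card_type
        and r.cards == pod.cards      # exact count: multi-card reservations are never split
        and pod.vcpu <= r.vcpu
        and pod.memory <= r.memory
    ]
    if not candidates:
        return None
    # Smallest specification first, then earliest creation time (FIFO).
    best = min(candidates, key=lambda r: (r.vcpu, r.memory, r.created_at))
    best.in_use = True
    return best


# The "mixed specification matching" row from the table above.
reservations = [
    Reservation("P16EN", 2, 22, 225, created_at=1),
    Reservation("P16EN", 4, 46, 450, created_at=2),
]
pods = [
    Pod("P16EN", 1, 12, 60), Pod("P16EN", 1, 12, 60),
    Pod("P16EN", 2, 20, 120), Pod("P16EN", 2, 20, 120),
]
print([pick_reservation(p, reservations) is not None for p in pods])
# [False, False, True, False]
```

Only the first 2-card Pod deducts a reservation, which agrees with the result column for that row.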
Billing
GPU Pod capacity reservation uses pay-as-you-go billing. During the reservation validity period, charges consist of:
Capacity reservation fees for unused (idle) reservations
Pod pay-as-you-go fees for running Pods
When a Pod starts, it deducts the reservation quota. The reservation fee stops for that slot, and the Pod pay-as-you-go fee begins. When the Pod stops, the quota is restored and the reservation fee resumes.
Billing example
The following example shows charges across seven phases for two GPU Pod capacity reservations and two pay-as-you-go Pods (Pod1 and Pod2).
| Phase | State | Charges |
|---|---|---|
| Phase 1 | Purchase and create reservation | None (reservation not yet active) |
| Phase 2 | Both reservations idle | 2 x capacity reservation rate x duration |
| Phase 3 | Pod1 running, 1 reservation idle | 1 x capacity reservation rate x duration + Pod1 pay-as-you-go rate x duration |
| Phase 4 | Both Pods running | Pod1 pay-as-you-go rate x duration + Pod2 pay-as-you-go rate x duration |
| Phase 5 | Pod2 running, 1 reservation idle | 1 x capacity reservation rate x duration + Pod2 pay-as-you-go rate x duration |
| Phase 6 | Both reservations idle again | 2 x capacity reservation rate x duration |
| Phase 7 | Reservation expires | System releases the reservation automatically |
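The arithmetic behind the table can be checked with a short calculation. This is a sketch under stated assumptions: the hourly rates are invented placeholders rather than actual ACS prices, each phase is assumed to last one hour, and `phase_charge` is a hypothetical helper.

```python
def phase_charge(idle_reservations, running_pod_rates, reservation_rate, hours):
    """Charge for one phase: idle reservations plus running Pods."""
    reservation_fee = idle_reservations * reservation_rate * hours
    pod_fee = sum(rate * hours for rate in running_pod_rates)
    return reservation_fee + pod_fee


# Placeholder hourly rates for illustration only; not real prices.
RESERVATION_RATE = 1.0
POD_RATE = 10.0

# (phase, idle reservations, rates of running Pods), following phases 2 to 6.
phases = [
    ("Phase 2", 2, []),
    ("Phase 3", 1, [POD_RATE]),
    ("Phase 4", 0, [POD_RATE, POD_RATE]),
    ("Phase 5", 1, [POD_RATE]),
    ("Phase 6", 2, []),
]

total = 0.0
for name, idle, pod_rates in phases:
    charge = phase_charge(idle, pod_rates, RESERVATION_RATE, hours=1.0)
    total += charge
    print(name, charge)

print("Total", total)  # Total 46.0
```

At no point is a slot charged both the reservation rate and the Pod rate, because each running Pod reduces the idle reservation count by one.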
Savings plans can be applied to GPU Pod capacity reservations when the plan's region and type attributes match.
Limitations
GPU Pod capacity reservation does not support BestEffort compute class Pods.
Reservation creation depends on inventory availability.
FAQ
Am I charged twice when a Pod is running, once for the reservation and once for the Pod?
No. When a Pod starts and deducts a reservation, the capacity reservation fee for that slot stops. Only the Pod pay-as-you-go fee applies. When the Pod stops, the reservation fee resumes at the capacity reservation rate.
What happens if my Pod exceeds the reserved specification?
The Pod does not deduct the reservation. A Pod must match the reserved GPU card type and count exactly, and its vCPU and memory must not exceed the reserved amounts. Pods that do not match any reservation run as standard pay-as-you-go.
Can a new reservation match an existing pay-as-you-go Pod?
Yes. When a new capacity reservation is created, it automatically matches and deducts existing pay-as-you-go Pods that meet the matching conditions. This is the real-time dynamic matching principle.
What happens when a capacity reservation expires?
The system automatically releases the GPU Pod capacity reservation.