Pod-based capacity reservation guarantees resource availability for elastic services. A GPU pod capacity reservation does not need to be attached to a cluster. When you purchase a reservation, you specify properties such as the pod specifications, zone, and reservation duration. ACS then guarantees that you can start pods with the specified specifications within minutes. GPU pod capacity reservation ensures resource availability and is more cost-effective than pay-as-you-go pods. This topic describes the features of GPU pod capacity reservation.
Features
Resource availability: During the effective period of the capacity reservation, the system guarantees that you can successfully start the resources.
Cost reduction: While a pod runs against a reservation, you are billed at the pod's pay-as-you-go rate. While the reservation is unused, either before a pod starts or after the pod is destroyed, you are billed at the capacity reservation rate. You can flexibly configure the start and destroy times for pods based on your service traffic.
Resource flexibility: You can create GPU pod capacity reservations with different specifications to meet various business needs.
GPU pod capacity reservation does not guarantee resources for pods of the BestEffort computing power type.
GPU pod capacity reservation supports savings plans that match the reservation's properties, such as region and type.
The successful creation of a GPU pod capacity reservation depends on inventory availability.
Use cases
Periodic real-time service resource demands: Your service traffic exhibits a tidal pattern on a daily or weekly basis, and tasks must be executed and completed in real time. An example is a real-time inference service.
Occasional high resource demands: Your service has sudden needs for real-time computing that require fast resource delivery and scale-out to prevent business impact. An example is resource demand triggered by hot spot events in an Internet service.
Billing example
GPU pod capacity reservation uses the pay-as-you-go billing method. During the effective period of the capacity reservation, your fees include the following:
Pay-as-you-go fees for unused capacity reservations.
Pay-as-you-go fees for started pods.
This example uses a scenario where you purchase two GPU pod capacity reservations and create two pay-as-you-go pods, Pod1 and Pod2. The following figure shows the process and the billing algorithm for each phase.
Phase 1: Purchase and create a capacity reservation
Before you perform the following operations, you must first activate GPU capacity reservation.
In the Container Compute Service console, choose Capacity Reservation > Create GPU Capacity Reservation, configure the parameters, and then click Create Capacity Reservation.
Configuration item | Description |
Capacity Reservation Name | A custom name for the capacity reservation. |
Reservation Type | The GPU card type. |
Region | The region where you want to reserve resources. |
Zone | The zone where you want to reserve resources. |
Resource Specification | The specification of the capacity reservation. You only need to select the number of GPUs. The system automatically matches the highest CPU and memory specifications available for that number of GPUs. |
Reservation Method | Pod reservation (cannot be modified). |
Billing Mode | Pay-as-you-go (cannot be modified). |
Quantity | The number of GPU pod capacity reservations for this specification. |
The fee algorithm for this phase is as follows:
Phase | Fee | Description |
Phase 1 | None | No fee is charged because the capacity reservation has not been created yet. |
Phases 2-6: Capacity reservation effective period
During the effective period, you can create pod instances at any time, provided that they do not exceed the reservation configuration. The system guarantees that the pods are created successfully and deducts the corresponding capacity reservation quota. For a reservation to apply, the pod's GPU card type and GPU count must match the reservation, and the pod's CPU and memory must not exceed the reserved specifications. If a match is found, the entire reservation is applied, even if the pod requests less CPU or memory than the reservation provides. For example, if you purchase one capacity reservation (1 GPU, 10 vCPUs, and 80 GB of memory) and create a pod with the specification of 1 GPU, 1 vCPU, and 2 GB of memory, the system applies the entire capacity reservation. After the pod is destroyed, the corresponding GPU pod capacity reservation quota is restored.
The fee algorithms for these phases are as follows:
Phase | Fee |
Phase 2 | 2 × capacity reservation unit price × duration of Phase 2 |
Phase 3 | 1 × capacity reservation unit price × duration of Phase 3 + Pod1 pay-as-you-go unit price × duration of Phase 3 |
Phase 4 | Pod1 pay-as-you-go unit price × duration of Phase 4 + Pod2 pay-as-you-go unit price × duration of Phase 4 |
Phase 5 | 1 × capacity reservation unit price × duration of Phase 5 + Pod2 pay-as-you-go unit price × duration of Phase 5 |
Phase 6 | 2 × capacity reservation unit price × duration of Phase 6 |
The capacity reservation unit price is the pay-as-you-go fee for the unused capacity reservation. The pay-as-you-go unit prices for Pod1 and Pod2 are the standard pay-as-you-go rates that apply after the pods are started.
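The per-phase formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the prices, the phase durations, and the use of hours as the billing unit are hypothetical assumptions, and the `Phase` class and `phase_fee` function are names invented for this example, not part of any ACS API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phase:
    name: str
    hours: float                  # duration of the phase (hours are an assumption)
    unused_reservations: int      # reservations not consumed by a running pod
    pod_unit_prices: List[float] = field(default_factory=list)  # price of each running pod


def phase_fee(phase: Phase, reservation_unit_price: float) -> float:
    """Fee for one phase: unused reservations plus running pods."""
    reservation_fee = phase.unused_reservations * reservation_unit_price * phase.hours
    pod_fee = sum(phase.pod_unit_prices) * phase.hours
    return reservation_fee + pod_fee


# Hypothetical unit prices and durations, chosen only for illustration.
RESERVATION_PRICE = 2.0
POD1_PRICE = 10.0
POD2_PRICE = 12.0

phases = [
    Phase("Phase 2", hours=3, unused_reservations=2),
    Phase("Phase 3", hours=2, unused_reservations=1, pod_unit_prices=[POD1_PRICE]),
    Phase("Phase 4", hours=4, unused_reservations=0,
          pod_unit_prices=[POD1_PRICE, POD2_PRICE]),
    Phase("Phase 5", hours=1, unused_reservations=1, pod_unit_prices=[POD2_PRICE]),
    Phase("Phase 6", hours=2, unused_reservations=2),
]

fees = {p.name: phase_fee(p, RESERVATION_PRICE) for p in phases}
total = sum(fees.values())
for name, fee in fees.items():
    print(f"{name}: {fee:.2f}")
print(f"Total: {total:.2f}")
```

In Phase 4 both reservations are consumed by running pods, so the reservation term drops to zero and only the two pod rates are charged.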
Phase 7: Capacity reservation expires
When the capacity reservation expires, the system automatically releases the GPU pod capacity reservation.
Available resource reservation types
After you increase the quota for capacity reservation specifications, the capacity reservation supports the following card types and specifications:
Card type | GPU | vCPU | Memory (GiB) |
L20 (GN8IS) | 1 (48 G GPU memory) | 16 | 128 |
L20 (GN8IS) | 2 (48 G × 2 GPU memory) | 32 | 230 |
L20 (GN8IS) | 4 (48 G × 4 GPU memory) | 64 | 460 |
L20 (GN8IS) | 8 (48 G × 8 GPU memory) | 128 | 920 |
T4 | 1 (16 G GPU memory) | 24 | 90 |
T4 | 2 (16 G × 2 GPU memory) | 48 | 180 |
A10 | 1 (24 G GPU memory) | 16 | 60 |
A10 | 2 (24 G × 2 GPU memory) | 32 | 120 |
A10 | 4 (24 G × 4 GPU memory) | 64 | 240 |
A10 | 8 (24 G × 8 GPU memory) | 128 | 480 |
P16EN | 1 (96 G GPU memory) | 10 | 80 |
P16EN | 2 (96 G × 2 GPU memory) | 22 | 225 |
P16EN | 4 (96 G × 4 GPU memory) | 46 | 450 |
P16EN | 8 (96 G × 8 GPU memory) | 92 | 900 |
P16EN | 16 (96 G × 16 GPU memory) | 184 | 1800 |
GU8TF | 1 (96 G GPU memory) | 16 | 128 |
GU8TF | 2 (96 G × 2 GPU memory) | 46 | 230 |
GU8TF | 4 (96 G × 4 GPU memory) | 92 | 460 |
GU8TF | 8 (96 G × 8 GPU memory) | 184 | 920 |
GU8TEF | 1 (141 G GPU memory) | 22 | 225 |
GU8TEF | 2 (141 G × 2 GPU memory) | 46 | 450 |
GU8TEF | 4 (141 G × 4 GPU memory) | 92 | 900 |
GU8TEF | 8 (141 G × 8 GPU memory) | 184 | 1800 |
L20X (GX8SF) | 1 (141 G GPU memory) | 22 | 225 |
L20X (GX8SF) | 2 (141 G × 2 GPU memory) | 46 | 450 |
L20X (GX8SF) | 4 (141 G × 4 GPU memory) | 92 | 900 |
L20X (GX8SF) | 8 (141 G × 8 GPU memory) | 184 | 1800 |
Deduction rules
A capacity reservation is applied only if all the following conditions are met:
The GPU card type of the pod exactly matches the reserved card type. For example, both the reserved card type and the pod card type are L20.
The number of GPUs exactly matches the reserved configuration. For example, both the reserved number of GPUs and the pod's number of GPUs are 1.
The pod's vCPU count ≤ the reserved vCPU count.
The pod's memory ≤ the reserved memory.
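The four conditions above amount to a simple predicate. The following Python sketch expresses them directly; the `Spec` class and the `reservation_applies` function are hypothetical names used only for illustration and do not correspond to any ACS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    card_type: str   # for example "L20"
    gpus: int        # number of GPUs
    vcpus: int       # number of vCPUs
    memory: int      # memory size, in the same unit for pod and reservation


def reservation_applies(pod: Spec, reserved: Spec) -> bool:
    """Return True only if all four conditions hold."""
    return (
        pod.card_type == reserved.card_type   # card type must match exactly
        and pod.gpus == reserved.gpus         # GPU count must match exactly
        and pod.vcpus <= reserved.vcpus       # vCPUs may be lower, not higher
        and pod.memory <= reserved.memory     # memory may be lower, not higher
    )


reserved = Spec("L20", gpus=1, vcpus=16, memory=128)
print(reservation_applies(Spec("L20", 1, 8, 16), reserved))    # smaller pod, same card and GPU count
print(reservation_applies(Spec("L20", 2, 8, 16), reserved))    # GPU count differs
print(reservation_applies(Spec("A10", 1, 8, 16), reserved))    # card type differs
```

Note that card type and GPU count use equality while vCPU and memory use an upper bound, which is why a small single-GPU pod can consume a large single-GPU reservation but never part of a multi-GPU one.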
The following matching scenarios assume that the pod's card type is the same as the reserved card type:
Deduction rule | Scenario description | Result and description |
Exact match or backward compatible | Reservation: 1 × (1 GPU, 16 vCPUs, 128 GB). Pod creation: 1 × (1 GPU, 8 vCPUs, 16 GB). | Result: Description: The resources required by the pod (number of GPUs, CPU, and memory) do not exceed the reserved specifications, so a match is found. This pod uses the entire capacity reservation. |
Smallest specification first | Reservations:
Pod creation: 1 × (1 GPU, 5 vCPUs, 30 GB). | Result: Description: To maximize resource utilization, the system prioritizes the smallest reservation that best fits the pod's requirements. |
First in, first out (FIFO) | Reservations: 4 × (1 GPU, 10 vCPUs, 80 GB), created at different times. Pod creation: 4 × (1 GPU, 5 vCPUs, 30 GB). | Result: Description: For reservations of the same specification, the first in, first out principle is followed. |
Atomicity of multi-GPU specifications (cannot be split) | Reservation: 1 × (4 GPUs, 46 vCPUs, 450 GB). Pod creation: 4 × (1 GPU, 10 vCPUs, 60 GB). | Result: Description: A multi-GPU reservation is an atomic unit and cannot be split to accommodate multiple single-GPU pods. These four pods are created as pay-as-you-go instances instead. |
Mixed specification matching | Reservations:
Pod creation:
| Result: Description: The remaining pods cannot match the remaining 4-GPU reservation and are created as pay-as-you-go instances instead. |
Real-time dynamic matching | Existing pay-as-you-go pod: 1 × (1 GPU, 5 vCPUs, 30 GB) New reservation purchase: 1 × (1 GPU, 10 vCPUs, 80 GB) | Result: Description: A capacity reservation can be applied to existing pay-as-you-go pods that match the reservation's specifications. |
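The selection rules in the table (smallest specification first, FIFO among identical specifications, and no splitting of multi-GPU reservations) can be sketched as a small selection function. This is a minimal illustration under stated assumptions, not the scheduler's actual implementation: the `Reservation` and `Pod` classes, the `pick_reservation` function, the integer `created_at` timestamp, and the choice to rank "smallest" by vCPU count and then memory are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    name: str
    card_type: str
    gpus: int
    vcpus: int
    memory: int
    created_at: int        # hypothetical creation timestamp; smaller means older
    in_use: bool = False   # True while a pod consumes this reservation


@dataclass
class Pod:
    card_type: str
    gpus: int
    vcpus: int
    memory: int


def pick_reservation(pod: Pod, reservations: List[Reservation]) -> Optional[Reservation]:
    """Pick one free reservation for the pod, or None (pay-as-you-go fallback)."""
    candidates = [
        r for r in reservations
        if not r.in_use
        and r.card_type == pod.card_type
        and r.gpus == pod.gpus          # exact GPU count, so reservations are never split
        and pod.vcpus <= r.vcpus
        and pod.memory <= r.memory
    ]
    if not candidates:
        return None
    # Smallest specification first (assumed ranking), then oldest first (FIFO).
    chosen = min(candidates, key=lambda r: (r.vcpus, r.memory, r.created_at))
    chosen.in_use = True                # one pod consumes the entire reservation
    return chosen


# Atomicity scenario from the table: one 4-GPU reservation, four single-GPU pods.
reservations = [Reservation("r-4gpu", "P16EN", 4, 46, 450, created_at=1)]
pods = [Pod("P16EN", 1, 10, 60) for _ in range(4)]
matches = [pick_reservation(p, reservations) for p in pods]
# Every entry in matches is None, so all four pods fall back to pay-as-you-go.
```

Restoring the quota when a pod is destroyed would correspond to setting `in_use` back to `False` on the reservation that the pod consumed.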