Alibaba Cloud Container Compute Service (ACS) supports multiple GPU types for different scenarios. Request a specific GPU model series in your cluster by using the alibabacloud.com/gpu-model-series label. Refer to the following specifications to select the instance family that best meets your needs.
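For example, a pod that targets the GU8TF series described below might look like the following minimal sketch. The alibabacloud.com/gpu-model-series label is the one named above, with the family names on this page assumed as its values; the alibabacloud.com/compute-class label, the nvidia.com/gpu resource name, and the image are illustrative assumptions rather than confirmed ACS fields, so verify them against the ACS pod specification.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-series-demo
  labels:
    alibabacloud.com/compute-class: gpu        # assumed ACS compute class for GPU pods
    alibabacloud.com/gpu-model-series: GU8TF   # series label; value assumed to match the family name
spec:
  containers:
  - name: main
    image: registry.example.com/app:latest     # placeholder image
    resources:
      requests:
        cpu: "16"
        memory: 64Gi
      limits:
        cpu: "16"
        memory: 64Gi
        nvidia.com/gpu: "1"                    # assumed GPU resource name
```

Whichever family you choose, the GPU count, vCPU, and memory values must match one of the combinations listed in that family's table below.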
GU8TF
This family features high-performance compute GPUs.
Each GPU provides 96 GB of memory and natively supports the FP8 floating-point format, enabling single-node inference for 70B and larger models.
High-speed NVLink interconnects all 8 GPUs, making this family ideal for small to medium-scale model training. It also provides 1.6 Tbps of Remote Direct Memory Access (RDMA) bandwidth for internode communication.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 1 (96 GB) | 2 | 2–16 | 1 | 30–256 |
|  | 4 | 4–32 | 1 |  |
|  | 6 | 6–48 | 1 |  |
|  | 8 | 8–64 | 1 |  |
|  | 10 | 10–80 | 1 |  |
|  | 12 | 12–96 | 1 |  |
|  | 14 | 14–112 | 1 |  |
|  | 16 | 16–128 | 1 |  |
|  | 22 | 22, 32, 64, 128 | N/A |  |
| 2 (96 GB) | 16 | 16–128 | 1 | 30–512 |
|  | 32 | 32, 64, 128, 230 | N/A |  |
|  | 46 | 64, 128, 230 | N/A |  |
| 4 (96 GB) | 32 | 32, 64, 128, 256 | N/A | 30–1,024 |
|  | 64 | 64, 128, 256, 460 | N/A |  |
|  | 92 | 128, 256, 460 | N/A |  |
| 8 (96 GB) | 64 | 64, 128, 256, 512 | N/A | 30–2,048 |
|  | 128 | 128, 256, 512, 920 | N/A |  |
|  | 184 | 256, 512, 920 | N/A |  |
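Reading off the table's last row, a full-node GU8TF training pod could request the following resources. This is a sketch that reuses the assumed fields (compute-class label, nvidia.com/gpu resource name, placeholder image) from the example at the top of this page:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gu8tf-train
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: GU8TF
spec:
  containers:
  - name: trainer
    image: registry.example.com/train:latest  # placeholder image
    resources:
      requests:
        cpu: "184"       # a valid vCPU option for 8 GPUs
        memory: 920Gi    # one of the memory options listed for 184 vCPUs
      limits:
        cpu: "184"
        memory: 920Gi
        nvidia.com/gpu: "8"  # assumed GPU resource name
```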
GU8TEF
This family features high-performance compute GPUs.
Each GPU provides 141 GB of memory and natively supports the FP8 floating-point format. Multi-GPU configurations support single-node inference for models such as DeepSeek-LLM 67B.
High-speed NVLink interconnects all 8 GPUs, making this family ideal for small to medium-scale model training. It also provides 1.6 Tbps of RDMA bandwidth for internode communication.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 1 (141 GB) | 2 | 2–16 | 1 | 30–768 |
|  | 4 | 4–32 | 1 |  |
|  | 6 | 6–48 | 1 |  |
|  | 8 | 8–64 | 1 |  |
|  | 10 | 10–80 | 1 |  |
|  | 12 | 12–96 | 1 |  |
|  | 14 | 14–112 | 1 |  |
|  | 16 | 16–128 | 1 |  |
|  | 22 | 22, 32, 64, 128, 225 | N/A |  |
| 2 (141 GB) | 16 | 16–128 | 1 | 30–1,536 |
|  | 32 | 32, 64, 128, 256 | N/A |  |
|  | 46 | 64, 128, 256, 450 | N/A |  |
| 4 (141 GB) | 32 | 32, 64, 128, 256 | N/A | 30–3,072 |
|  | 64 | 64, 128, 256, 512 | N/A |  |
|  | 92 | 128, 256, 512, 900 | N/A |  |
| 8 (141 GB) | 64 | 64, 128, 256, 512 | N/A | 30–6,144 |
|  | 128 | 128, 256, 512, 1,024 | N/A |  |
|  | 184 | 256, 512, 1,024, 1,800 | N/A |  |
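For instance, a single-GPU GU8TEF inference pod matching the table's first row group might be sketched as follows (same assumed fields as the earlier examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gu8tef-infer
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: GU8TEF
spec:
  containers:
  - name: inference
    image: registry.example.com/infer:latest  # placeholder image
    resources:
      requests:
        cpu: "16"
        memory: 128Gi    # within the 16–128 GiB range (1 GiB increments)
      limits:
        cpu: "16"
        memory: 128Gi
        nvidia.com/gpu: "1"  # assumed GPU resource name
```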
L20 (GN8IS)
This family features compute GPUs suitable for a wide range of AI workloads.
Supports common acceleration libraries such as TensorRT as well as the FP8 floating-point format. Peer-to-peer (P2P) communication between GPUs is enabled.
Each GPU provides 48 GB of memory. Multi-GPU configurations support single-node inference for 70B and larger models.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 1 (48 GB) | 2 | 2–16 | 1 | 30–256 |
|  | 4 | 4–32 | 1 |  |
|  | 6 | 6–48 | 1 |  |
|  | 8 | 8–64 | 1 |  |
|  | 10 | 10–80 | 1 |  |
|  | 12 | 12–96 | 1 |  |
|  | 14 | 14–112 | 1 |  |
|  | 16 | 16–120 | 1 |  |
| 2 (48 GB) | 16 | 16–128 | 1 | 30–512 |
|  | 32 | 32, 64, 128, 230 | N/A |  |
| 4 (48 GB) | 32 | 32, 64, 128, 256 | N/A | 30–1,024 |
|  | 64 | 64, 128, 256, 460 | N/A |  |
| 8 (48 GB) | 64 | 64, 128, 256, 512 | N/A | 30–2,048 |
|  | 128 | 128, 256, 512, 920 | N/A |  |
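A four-GPU L20 pod matching the table's 32-vCPU row might look like this sketch (same assumed fields as above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: l20-infer
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: GN8IS  # assumed value; the heading pairs L20 with GN8IS
spec:
  containers:
  - name: main
    image: registry.example.com/infer:latest  # placeholder image
    resources:
      requests:
        cpu: "32"
        memory: 128Gi    # one of the options listed for 32 vCPUs
      limits:
        cpu: "32"
        memory: 128Gi
        nvidia.com/gpu: "4"  # assumed GPU resource name
```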
L20X (GX8SF)
This family features high-performance compute GPUs for large-scale AI workloads.
Each GPU provides 141 GB of memory. Multi-GPU configurations support single-node inference for very large models.
High-speed NVLink interconnects all 8 GPUs, making this family ideal for large model training and inference. It also provides 3.2 Tbps of RDMA bandwidth for internode communication.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 8 (141 GB) | 184 | 1,800 | N/A | 30–6,144 |
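Because this family exposes a single pod shape, a manifest pins exactly that combination; the following sketch uses the same assumed fields as the earlier examples:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: l20x-train
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: GX8SF  # assumed value; the heading pairs L20X with GX8SF
spec:
  containers:
  - name: trainer
    image: registry.example.com/train:latest  # placeholder image
    resources:
      requests:
        cpu: "184"
        memory: 1800Gi   # the only memory option for this family
      limits:
        cpu: "184"
        memory: 1800Gi
        nvidia.com/gpu: "8"  # assumed GPU resource name
```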
P16EN
This family features high-performance compute GPUs.
Each GPU provides 96 GB of memory and supports the FP16 floating-point format. Multi-GPU configurations support single-node inference for models such as DeepSeek R1.
A 700 GB/s high-speed interconnect links all 16 GPUs, making this family ideal for small to medium-scale model training. It also provides 1.6 Tbps of RDMA bandwidth for internode communication.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 1 (96 GB) | 2 | 2–16 | 1 | 30–384 |
|  | 4 | 4–32 | 1 |  |
|  | 6 | 6–48 | 1 |  |
|  | 8 | 8–64 | 1 |  |
|  | 10 | 10–80 | 1 |  |
| 2 (96 GB) | 4 | 4–32 | 1 | 30–768 |
|  | 6 | 6–48 | 1 |  |
|  | 8 | 8–64 | 1 |  |
|  | 16 | 16–128 | 1 |  |
|  | 22 | 32, 64, 128, 225 | N/A |  |
| 4 (96 GB) | 8 | 8–64 | 1 | 30–1,536 |
|  | 16 | 16–128 | 1 |  |
|  | 32 | 32, 64, 128, 256 | N/A |  |
|  | 46 | 64, 128, 256, 450 | N/A |  |
| 8 (96 GB) | 16 | 16–128 | 1 | 30–3,072 |
|  | 32 | 32, 64, 128, 256 | N/A |  |
|  | 64 | 64, 128, 256, 512 | N/A |  |
|  | 92 | 128, 256, 512, 900 | N/A |  |
| 16 (96 GB) | 32 | 32, 64, 128, 256 | N/A | 30–6,144 |
|  | 64 | 64, 128, 256, 512 | N/A |  |
|  | 128 | 128, 256, 512, 1,024 | N/A |  |
|  | 184 | 256, 512, 1,024, 1,800 | N/A |  |
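A 16-GPU P16EN pod matching the 128-vCPU row could be sketched as follows (same assumed fields as the earlier examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: p16en-train
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: P16EN
spec:
  containers:
  - name: trainer
    image: registry.example.com/train:latest  # placeholder image
    resources:
      requests:
        cpu: "128"
        memory: 1024Gi   # one of the options listed for 128 vCPUs
      limits:
        cpu: "128"
        memory: 1024Gi
        nvidia.com/gpu: "16"  # assumed GPU resource name
```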
G49E
This family features compute GPUs suitable for a wide range of AI and graphics workloads.
Each GPU provides 48 GB of memory, and the family supports technologies such as RTX and TensorRT. P2P communication between GPUs is enabled.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 1 (48 GB) | 2 | 2–16 | 1 | 30–256 |
|  | 4 | 4–32 | 1 |  |
|  | 6 | 6–48 | 1 |  |
|  | 8 | 8–64 | 1 |  |
|  | 10 | 10–80 | 1 |  |
|  | 12 | 12–96 | 1 |  |
|  | 14 | 14–112 | 1 |  |
|  | 16 | 16–120 | 1 |  |
| 2 (48 GB) | 16 | 16–128 | 1 | 30–512 |
|  | 32 | 32, 64, 128, 230 | N/A |  |
| 4 (48 GB) | 32 | 32, 64, 128, 256 | N/A | 30–1,024 |
|  | 64 | 64, 128, 256, 460 | N/A |  |
| 8 (48 GB) | 64 | 64, 128, 256, 512 | N/A | 30–2,048 |
|  | 128 | 128, 256, 512, 920 | N/A |  |
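For example, a two-GPU G49E pod using the 16-vCPU row (any whole GiB from 16 to 128 is valid, such as 96) might request the following, with the same assumed fields as above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: g49e-render
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: G49E
spec:
  containers:
  - name: main
    image: registry.example.com/render:latest  # placeholder image
    resources:
      requests:
        cpu: "16"
        memory: 96Gi     # 16–128 GiB range, 1 GiB increments
      limits:
        cpu: "16"
        memory: 96Gi
        nvidia.com/gpu: "2"  # assumed GPU resource name
```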
T4
This family features versatile GPUs based on the Turing architecture, suitable for inference and graphics workloads.
Each GPU provides 16 GB of memory with 320 GB/s of memory bandwidth.
Variable-precision Tensor Cores deliver 65 TFLOPS (FP16), 130 TOPS (INT8), and 260 TOPS (INT4).
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 1 (16 GB) | 2 | 2–8 | 1 | 30–1,536 |
|  | 4 | 4–16 | 1 |  |
|  | 6 | 6–24 | 1 |  |
|  | 8 | 8–32 | 1 |  |
|  | 10 | 10–40 | 1 |  |
|  | 12 | 12–48 | 1 |  |
|  | 14 | 14–56 | 1 |  |
|  | 16 | 16–64 | 1 |  |
|  | 24 | 24, 48, 90 | N/A |  |
| 2 (16 GB) | 16 | 16–64 | 1 |  |
|  | 24 | 24, 48, 96 | N/A |  |
|  | 32 | 32, 64, 128 | N/A |  |
|  | 48 | 48, 96, 180 | N/A |  |
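A single-GPU T4 pod at the top of its 8-vCPU range might be sketched as follows (same assumed fields as the earlier examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: t4-infer
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: T4
spec:
  containers:
  - name: inference
    image: registry.example.com/infer:latest  # placeholder image
    resources:
      requests:
        cpu: "8"
        memory: 32Gi     # 8–32 GiB range, 1 GiB increments
      limits:
        cpu: "8"
        memory: 32Gi
        nvidia.com/gpu: "1"  # assumed GPU resource name
```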
A10
This family features powerful GPUs based on the Ampere architecture, suitable for deep learning, HPC, and graphics.
Each GPU provides 24 GB of memory, and the family supports technologies such as RTX and TensorRT.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) |
| --- | --- | --- | --- | --- |
| 1 (24 GB) | 2 | 2–8 | 1 | 30–256 |
|  | 4 | 4–16 | 1 |  |
|  | 6 | 6–24 | 1 |  |
|  | 8 | 8–32 | 1 |  |
|  | 10 | 10–40 | 1 |  |
|  | 12 | 12–48 | 1 |  |
|  | 14 | 14–56 | 1 |  |
|  | 16 | 16–60 | 1 |  |
| 2 (24 GB) | 16 | 16–64 | 1 | 30–512 |
|  | 32 | 32, 64, 120 | N/A |  |
| 4 (24 GB) | 32 | 32, 64, 128 | N/A | 30–1,024 |
|  | 64 | 64, 128, 240 | N/A |  |
| 8 (24 GB) | 64 | 64, 128, 256 | N/A | 30–2,048 |
|  | 128 | 128, 256, 480 | N/A |  |
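A two-GPU A10 pod matching the 32-vCPU row might request the following (same assumed fields as above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: a10-train
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: A10
spec:
  containers:
  - name: main
    image: registry.example.com/train:latest  # placeholder image
    resources:
      requests:
        cpu: "32"
        memory: 64Gi     # one of the options listed for 32 vCPUs
      limits:
        cpu: "32"
        memory: 64Gi
        nvidia.com/gpu: "2"  # assumed GPU resource name
```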
G59
This family features compute GPUs suitable for various AI and HPC workloads.
Each GPU provides 32 GB of memory, and the family supports technologies such as RTX and TensorRT. P2P communication between GPUs is enabled.
Pod resource constraints:
| GPU count (memory per GPU) | vCPU | Memory options (GiB) | Memory increment (GiB) | Storage range (GiB) | Network bandwidth |
| --- | --- | --- | --- | --- | --- |
| 1 (32 GB) | 2 | 2–16 | 1 | 30–256 | 1 Gbps per vCPU |
|  | 4 | 4–32 | 1 |  |  |
|  | 6 | 6–48 | 1 |  |  |
|  | 8 | 8–64 | 1 |  |  |
|  | 10 | 10–80 | 1 |  |  |
|  | 12 | 12–96 | 1 |  |  |
|  | 14 | 14–112 | 1 |  |  |
|  | 16 | 16–128 | 1 |  |  |
|  | 22 | 22, 32, 64, 128 | N/A |  |  |
| 2 (32 GB) | 16 | 16–128 | 1 | 30–512 |  |
|  | 32 | 32, 64, 128, 256 | N/A |  |  |
|  | 46 | 64, 128, 256, 360 | N/A |  |  |
| 4 (32 GB) | 32 | 32, 64, 128, 256 | N/A | 30–1,024 |  |
|  | 64 | 64, 128, 256, 512 | N/A |  |  |
|  | 92 | 128, 256, 512, 720 | N/A |  |  |
| 8 (32 GB) | 64 | 64, 128, 256, 512 | N/A | 30–2,048 |  |
|  | 128 | 128, 256, 512, 1,024 | N/A |  | 100 Gbps per vCPU |
|  | 184 | 256, 512, 1,024, 1,440 | N/A |  |  |
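An eight-GPU G59 pod matching the 64-vCPU row could be sketched as follows (same assumed fields as the earlier examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: g59-train
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/gpu-model-series: G59
spec:
  containers:
  - name: main
    image: registry.example.com/train:latest  # placeholder image
    resources:
      requests:
        cpu: "64"
        memory: 512Gi    # one of the options listed for 64 vCPUs
      limits:
        cpu: "64"
        memory: 512Gi
        nvidia.com/gpu: "8"  # assumed GPU resource name
```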