Alibaba Cloud Container Compute Service (ACS) provides unified scheduling for heterogeneous computing resources. By offering a serverless model for GPU utilization, ACS significantly reduces the operational complexity of managing heterogeneous computing clusters. This topic provides an overview of the supported GPU types and usage patterns in ACS.
Typical ACS GPU workflow for AI
ACS provides a highly elastic and cost-effective solution for AI workloads, covering the entire lifecycle from data pre-processing and model training to inference deployment. By combining the on-demand and auto-scaling nature of a serverless architecture with powerful GPU computing, ACS empowers developers and data scientists to focus on business logic and algorithm innovation instead of underlying resource management.
Data pre-processing: For large-scale data cleansing, transformation, and augmentation tasks, you can use the parallel processing power of serverless CPUs. On-demand CPU instances can be launched in large numbers to accelerate computation and are released immediately upon completion, eliminating costs for idle time. This approach is particularly efficient for periodic or bursty data batch processing, significantly shortening the data preparation cycle.
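The fan-out pattern described above can be sketched in plain Python. The snippet below is a minimal illustration, not ACS-specific code: `clean_record` is a hypothetical cleansing step, and the local worker pool stands in for the on-demand CPU instances that ACS would launch and release for you.

```python
from multiprocessing import Pool

def clean_record(record: str) -> str:
    """Illustrative cleansing step: trim whitespace and normalize case."""
    return record.strip().lower()

def preprocess(records: list[str], workers: int = 4) -> list[str]:
    # Fan records out across parallel workers, then collect the results.
    # On ACS, each worker could instead run in its own short-lived CPU pod
    # that is released as soon as the batch completes.
    with Pool(processes=workers) as pool:
        return pool.map(clean_record, records)

if __name__ == "__main__":
    print(preprocess(["  Foo ", "BAR", " baz  "]))  # ['foo', 'bar', 'baz']
```

The same structure maps naturally onto a batch job: many identical workers, no shared state, and all resources released when the last record is processed.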
Model training: During the compute-intensive model training phase, serverless GPUs allow you to flexibly select the appropriate GPU instance types based on your model size and required convergence speed. You are billed for the exact duration of your training jobs with per-second precision, completely eliminating the cost of idle GPU servers common in traditional setups. This model is ideal for experimental hyperparameter tuning and iterative training.
If your training jobs require guaranteed capacity, you can use GPU-HPN capacity reservations to reserve GPU resources in advance while retaining the flexibility of the serverless model.
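To make the per-second billing model concrete, the following sketch compares the cost of a single training job under per-second billing with the cost of keeping a dedicated GPU running for a full day. The hourly rate is a made-up figure for illustration only, not an ACS price.

```python
def serverless_gpu_cost(hourly_rate: float, job_seconds: int) -> float:
    """Per-second billing: pay only for the seconds the training job runs."""
    return hourly_rate * job_seconds / 3600

# Hypothetical rate: a GPU priced at $10/hour, used for one 90-minute job.
job_cost = serverless_gpu_cost(10.0, 90 * 60)   # billed for 5,400 seconds
dedicated_day = 10.0 * 24                       # a dedicated GPU bills all 24 hours
print(f"job: ${job_cost:.2f}, dedicated day: ${dedicated_day:.2f}")
```

For bursty, experiment-driven training, the gap between the two numbers is exactly the idle time you no longer pay for.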
Inference deployment: Once a model is trained, you can seamlessly deploy it as an online inference service. The ACS serverless architecture automatically scales GPU instances in or out within seconds based on real-time request traffic, even scaling down to zero. This means you incur no resource costs when there is no traffic. This extreme elasticity is ideal for AI applications with highly variable or bursty traffic patterns, such as image recognition and natural language processing. This ensures high availability for your service while maximizing cost savings.
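The scale-to-zero behavior described above amounts to traffic-proportional replica counts with a floor of zero. The sketch below is a simplified model of that decision, under the assumption that each replica can serve a fixed number of requests per second; it is not the ACS autoscaler itself.

```python
import math

def desired_replicas(current_rps: float, rps_per_replica: float,
                     max_replicas: int) -> int:
    """Traffic-proportional scaling that can go all the way down to zero."""
    if current_rps <= 0:
        return 0  # no traffic: scale to zero, so no GPU cost is incurred
    return min(max_replicas, math.ceil(current_rps / rps_per_replica))

print(desired_replicas(0, 50, 8))    # 0 replicas while idle
print(desired_replicas(120, 50, 8))  # 3 replicas for 120 req/s
```

Bursty workloads such as image recognition benefit most: capacity follows the request curve second by second instead of being provisioned for the peak.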
By using serverless GPUs in ACS, you can efficiently manage your entire AI workflow on a unified, seamless platform, achieving optimal resource allocation and cost-effectiveness to accelerate the development and deployment of your AI applications.

Supported GPU types in ACS
| GPU type | Memory | Supported GPU count | RDMA support |
| --- | --- | --- | --- |
|  | 96 GB | 1/2/4/8 | Yes |
|  | 141 GB | 1/2/4/8 | Yes |
|  | 48 GB | 1/2/4/8 | No |
|  | 141 GB | 8 | Yes |
|  | 96 GB | 1/2/4/8/16 | Yes |
|  | 48 GB | 1/2/4/8 | No |
|  | 16 GB | 1/2 | No |
|  | 24 GB | 1/2/4/8 | No |
|  | 32 GB | 1/2/4/8 | No |
For details on GPU specifications, see GPU instance families supported by ACS.
Availability zones for ACS GPU resources
| Availability zone | Supported GPU types |
| --- | --- |
| cn-wulanchabu-a | GU8TF, L20, G49E |
| cn-wulanchabu-b | G59 |
| cn-wulanchabu-c | P16EN |
| cn-beijing-d | GU8TF, GU8TEF, P16EN |
| cn-beijing-i | A10 |
| cn-beijing-l | L20, G49E, G59 |
| cn-shanghai-e | G59 |
| cn-shanghai-f | GU8TF, GU8TEF, P16EN |
| cn-shanghai-l | L20, G49E, T4 |
| cn-shanghai-n | L20 |
| cn-shanghai-o | P16EN |
| cn-hangzhou-b | GU8TF, L20, G49E, P16EN, G59 |
| cn-hangzhou-i | T4 |
| cn-shenzhen-c | L20 |
| cn-shenzhen-d | GU8TEF, G49E, G59 |
| cn-shenzhen-e | T4 |
| cn-hongkong-d | GU8TEF |
| ap-southeast-1 | GU8TF, L20, L20X |