This topic describes the use scenarios, customer requirements, architecture, and references for using GPUs to train AI models.
Use scenarios
You can use GPUs to train AI image generation models, use Cloud Parallel File System (CPFS) and File Storage NAS (NAS) file systems to store and share model data, and use Container Service for Kubernetes (ACK) to manage the GPU-accelerated Elastic Compute Service (ECS) instances that run the training jobs.
Customer requirements
Build environments for training AI models based on images
Use CPFS to store model training data
Use Apsara AI acceleration tools to accelerate model training
Use Arena to submit training jobs
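As a rough illustration of how the last two requirements fit together, the following minimal sketch builds an Arena command that submits a distributed PyTorch training job whose workers mount shared training data from a PersistentVolumeClaim. It assumes that the Arena CLI is installed and configured against the ACK cluster, and that a PVC backed by CPFS already exists. The PVC name `cpfs-pvc`, the job name, the image address, and the training command are hypothetical placeholders, as are the helper functions `build_arena_pytorch_cmd` and `submit`. The flags used (`--name`, `--image`, `--gpus`, `--workers`, `--data`) belong to `arena submit pytorch`; check `arena submit pytorch --help` for your Arena version before relying on them.

```python
import shlex
import subprocess
from typing import List, Optional


def build_arena_pytorch_cmd(
    name: str,
    image: str,
    gpus: int,
    workers: int,
    data_pvc: str,
    mount_path: str,
    train_command: str,
) -> List[str]:
    """Build an `arena submit pytorch` command line as an argument list.

    data_pvc is the name of an existing PVC (assumed here to be backed by CPFS).
    Arena mounts it into every worker at mount_path through --data.
    """
    return [
        "arena", "submit", "pytorch",
        f"--name={name}",
        f"--image={image}",
        f"--gpus={gpus}",        # GPUs requested by each worker
        f"--workers={workers}",  # number of worker pods
        f"--data={data_pvc}:{mount_path}",
        train_command,           # training entry point, passed as a single argument
    ]


def submit(cmd: List[str], dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    """Print the command (dry run) or execute it with the local Arena CLI."""
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values: replace them with your own job name, image, and PVC.
    cmd = build_arena_pytorch_cmd(
        name="sd-train",
        image="registry.example.com/ai/sd-train:latest",
        gpus=8,
        workers=2,
        data_pvc="cpfs-pvc",
        mount_path="/mnt/data",
        train_command="python train.py --data-dir /mnt/data",
    )
    submit(cmd, dry_run=True)
```

Building the command as an argument list rather than a single string avoids shell-quoting problems with the training command. The dry-run default lets you review the command before `subprocess.run` actually submits the job to the cluster.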