Platform for AI (PAI) uses a four-layer architecture that covers the full AI development lifecycle, from foundational resources and platform tools to model services and industry solutions.

As shown in the figure, the PAI architecture consists of the following four layers:
Infrastructure layer (computing resources & infrastructure):
Infrastructure: Provides CPUs, GPUs, high-speed RDMA networks, and Container Service for Kubernetes (ACK).
Computing resources: Includes cloud-native computing resources (Lingjun specialized resources and general-purpose computing resources) and big data engines, such as MaxCompute and Flink.
Platform and tools layer (AI services & frameworks):
AI frameworks: Supports popular AI frameworks such as Alink, TensorFlow, PyTorch, Megatron, and DeepSpeed, as well as Reinforcement Learning from Human Feedback (RLHF) training.
Optimization and acceleration: Provides Dataset Acceleration (DatasetAcc), Training Acceleration (TorchAcc), Parallel Training (EPL), Inference Acceleration (BladeLLM), Automatic Fault-tolerant Training (AIMaster), and Training Snapshot (EasyCkpt).
End-to-end machine learning tools:
Data preparation: iTAG data annotation service and dataset management.
Model development and training: Machine Learning Designer, Data Science Workshop (DSW), Deep Learning Containers (DLC), and FeatureStore.
Model deployment: Elastic Algorithm Service (EAS) for deploying models as services.
Application layer (model services): Integrates with various model service platforms, including the ModelScope community, PAI-DashScope, third-party Model-as-a-Service (MaaS) platforms, and Alibaba Cloud Model Studio.
Business layer (industry solutions): Provides industry solutions for fields such as autonomous driving, AI for Science (AI4Science), financial risk management, and intelligent recommendation systems. For example, internal systems at Alibaba Group use PAI for data mining in search, recommendations, and financial services.
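
The layering above can be summarized as plain data, which may help when checking which layer a given component belongs to. The following is a minimal sketch only: `PAI_LAYERS` and `layer_of` are hypothetical names chosen for illustration and are not part of any PAI SDK or API. The component lists are abbreviated from the descriptions above.

```python
from typing import Optional

# Illustrative summary of the four layers, ordered from bottom to top.
# Hypothetical structure for explanation only; not a PAI API.
PAI_LAYERS = {
    "Infrastructure layer": [
        "CPU", "GPU", "RDMA", "ACK", "Lingjun", "MaxCompute", "Flink",
    ],
    "Platform and tools layer": [
        "Alink", "TensorFlow", "PyTorch", "Megatron", "DeepSpeed",
        "DatasetAcc", "TorchAcc", "EPL", "BladeLLM", "AIMaster", "EasyCkpt",
        "iTAG", "Designer", "DSW", "DLC", "FeatureStore", "EAS",
    ],
    "Application layer": [
        "ModelScope", "PAI-DashScope", "Model Studio",
    ],
    "Business layer": [
        "autonomous driving", "AI4Science",
        "financial risk management", "intelligent recommendation",
    ],
}


def layer_of(component: str) -> Optional[str]:
    """Return the layer that lists the component (case-insensitive), or None."""
    wanted = component.casefold()
    for layer, components in PAI_LAYERS.items():
        if any(wanted == name.casefold() for name in components):
            return layer
    return None
```

For instance, `layer_of("EAS")` returns the platform and tools layer, while a name that does not appear in any list returns `None`.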