After you go to the details page of a workspace, you can use the modules of Machine Learning Platform for AI (PAI) to perform AI development. This topic describes how to get started with PAI and the commonly used AI development workflow.
AI development workflow
After you go to the details page of a workspace, you can view all PAI modules in the left-side navigation pane. You can use these modules throughout the AI development lifecycle, depending on your business scenario. The following figures show common use cases. Each numbered section in a figure is described below the figure, so you can see which module to use at each stage.
Cloud-native development scenario
① High-quality datasets are essential to high-precision models, so the goal of data preparation is to create high-quality datasets. You can use the dataset management module to register public datasets or to create datasets from files that are uploaded from on-premises machines or stored in Alibaba Cloud storage services. You can also create index datasets by scanning Object Storage Service (OSS) folders. This lets you manage PAI data in a centralized manner and prepare it for data labeling and model training.
② Data Science Workshop (DSW) is an interactive machine learning integrated development environment (IDE) designed for AI development in the cloud. You can use notebooks in DSW to obtain data, develop algorithms, and train and deploy models.
③ The image management module provides the public images offered by PAI and lets you add custom images, so you can manage application images centrally in the PAI console.
④ Deep Learning Containers (DLC) provides a flexible, stable, easy-to-use, and high-performance training environment for machine learning. DLC supports various algorithm frameworks, runs ultra-large distributed deep learning tasks, and lets you create custom algorithm frameworks.
⑤ PAI lets you use datasets stored in Apsara File Storage NAS (NAS) and OSS, as well as code stored in Git repositories. You specify the required datasets and code repositories when you submit jobs.
⑥ The model management module lets you manage trained models in a centralized manner. It is integrated with Elastic Algorithm Service (EAS), so you can deploy trained models as online services.
⑦ EAS lets you load models and deploy them as online services on CPU or GPU resources. EAS provides high throughput and low latency, lets you deploy large numbers of complex models with a few clicks, and supports real-time auto scaling.
Note: EAS does not support DSW images or CPFS datasets.
Best practices for AI and big data
① Source data for model training is stored as MaxCompute tables, preprocessed in DataWorks, and then referenced in PAI.
② Machine Learning Designer supports large-scale distributed model training for traditional machine learning, deep learning, reinforcement learning, and stream and batch processing. It provides hundreds of machine learning algorithms, supports automatic parameter tuning, and lets you build models by dragging and dropping components, so you can develop AI applications with little or no code.
③ DataWorks schedules tasks based on the scheduling parameters and time properties that you configure.
④ The task management module stores the experiment data generated by Machine Learning Designer and the records of custom tasks in the PAI task management service. This lets you compare experiments that belong to different tasks.
⑤ The model management module lets you manage trained models in a centralized manner. It is integrated with Elastic Algorithm Service (EAS), so you can deploy trained models as online services.
⑥ EAS lets you load models and deploy them as online services on CPU or GPU resources. EAS provides high throughput and low latency, lets you deploy large numbers of complex models with a few clicks, and supports real-time auto scaling.
Note: EAS does not support DSW images or CPFS datasets.