Platform for AI (PAI)
|
Module |
Feature |
Description |
Reference |
|
AI computing resource management |
Lingjun resources |
PAI provides Lingjun resources for large-scale, high-density computing workloads. Lingjun resources deliver heterogeneous computing power required for high-performance AI training and computation. You can leverage Lingjun resources for training tasks in PAI. |
|
|
General training resources |
General training resources are deep learning training resources built on Container Service for Kubernetes (ACK). These resources provide scalable, stable, user-friendly, and high-performance runtimes for training deep learning models. |
||
|
Other big data computing resources |
Other big data computing resources include MaxCompute and Realtime Compute for Apache Flink. |
||
|
Workspaces |
Resource management |
Workspace administrators can associate AI computing resources from the current Alibaba Cloud account with the workspace, enabling workspace members to utilize these resources for development and training activities. |
|
|
Workspace notification |
PAI provides a notification mechanism for workspaces. You can create notification rules to track and monitor Deep Learning Containers (DLC) jobs or Machine Learning Designer pipelines. Notification rules can also trigger events based on model version status changes. |
||
|
Workspace storage and SLS configuration |
Workspace administrators can configure the default storage path for development and training within the workspace, as well as the storage lifecycle for temporary tables. |
||
|
Member and permission management |
PAI employs role-based access control with multiple predefined roles, including labeling administrators, algorithm developers, and algorithm operations and maintenance (O&M) personnel, facilitating efficient collaboration. You can manage AI asset visibility scope within workspaces and configure access permissions for different roles. |
||
|
QuickStart |
Model Hub |
PAI provides access to diverse pre-trained models from open-source communities including ModelScope and Hugging Face. |
|
|
Pre-trained model training |
You can utilize the pre-trained models for training tasks in PAI. |
||
|
Pre-trained model deployment |
You can deploy the pre-trained models as services in PAI. |
||
|
Machine Learning Designer |
Pipeline building |
Machine Learning Designer enables you to build and debug models using visual pipelines. You can drag and drop components onto the canvas to construct pipelines tailored to your business requirements. |
|
|
Pipeline import and export |
You can export pipelines as JSON files and import JSON files into workspaces to reconstruct pipelines. |
||
|
Pipeline scheduling |
You can leverage DataWorks to schedule Machine Learning Designer pipelines on a periodic basis. |
Use DataWorks tasks to schedule pipelines in Machine Learning Designer |
|
|
Preset pipeline templates |
PAI provides industry-specific pipeline templates covering domains such as product recommendation, news classification, financial risk management, weather prediction, healthcare diagnostics, agricultural lending, and demographic analysis. These templates include complete datasets and documentation for streamlined implementation. |
||
|
Custom pipeline templates |
You can create custom pipeline templates based on your proprietary algorithm workflows and share them with team members. Team members can directly perform modeling, deployment, and production validation using these custom templates. |
||
|
Dashboards |
Machine Learning Designer provides interactive dashboards to visualize data analysis, model performance, and prediction results. |
||
|
Preset algorithm component library |
PAI provides hundreds of built-in algorithm components spanning multiple domains including data sources, data preprocessing, feature engineering, statistical analysis, machine learning, time series analysis, recommendation systems, anomaly detection, natural language processing, network analysis, financial analytics, computer vision, speech processing, and custom algorithms. |
||
|
Custom algorithms |
You can implement custom nodes using multiple programming interfaces including SQL, Python, and PyAlink scripts. |
||
|
Data Science Workshop (DSW) |
Cloud-native development environment |
DSW provides a flexible, stable, user-friendly, and high-performance environment for AI development, offering both CPU and GPU resources to support training workflows. |
|
|
DSW Gallery |
DSW Gallery provides ready-to-use examples from various industries and technical fields to help improve development efficiency. |
||
|
JupyterLab |
DSW integrates open source JupyterLab and provides plug-ins for custom development. You can start a notebook directly to write, debug, and run Python code without any O&M configuration. |
||
|
WebIDE |
DSW provides WebIDE in which you can install open source plug-ins for modeling. |
||
|
Terminal |
DSW provides a command-line terminal that you can use to debug models. |
||
|
Persistent instance environment |
You can manage the lifecycle of the development environment, save the instance environment, mount and share data, and persist the environment image. |
||
|
Resource usage monitoring |
You can view real-time resource usage in a visualized manner. |
||
|
Image creation |
You can create an image and save the image to Container Registry for subsequent distributed training or inference. |
||
|
SSH remote connection |
DSW provides the following SSH connection methods: direct connection and proxy client connection. You can select a connection method based on the resource dependencies, usage methods, and limits of the connection methods to meet your business requirements. |
||
|
Deep Learning Containers (DLC) |
Cloud-native distributed training environment |
DLC is a deep learning platform built on Container Service for Kubernetes (ACK). It provides stable, easy-to-use, scalable, and high-performance runtimes for training deep learning models. |
|
|
Dataset mounting |
You can mount multiple datasets, such as File Storage NAS or Object Storage Service (OSS) datasets, in DLC at the same time. |
||
|
Public and dedicated resource groups |
DLC provides public and dedicated resource groups. |
||
|
Official and custom images |
DLC allows you to use official images or custom images to submit training jobs. |
||
|
Distributed training |
DLC provides a distributed deployment solution for implementing data parallelism, model parallelism, and hybrid parallelism. |
||
|
Training job management |
DLC allows you to manage jobs during the entire lifecycle. |
||
|
Elastic Algorithm Service (EAS) |
Resource group management |
EAS provides resources in resource groups for isolation. When you create a model service, you can deploy the model service in the public resource group provided by the system or a dedicated resource group that you created. |
|
|
Service and application deployment |
You can deploy models that you downloaded from the open source community or models that you trained as inference services or AI-powered web applications in EAS. EAS provides multiple methods that you can use to deploy models. You can use the PAI console to deploy models as API services. |
||
|
Service debugging and stress testing |
After you deploy the service, you can use the online debugging and stress testing feature to test whether the service runs as expected. |
||
|
Auto scaling |
You can configure automatic scaling, scheduled scaling, and elastic resource pools for EAS services. |
||
|
Service calls |
EAS provides the following service call methods based on the network environment of the client: Internet access, VPC access, and VPC direct connection. |
||
|
Asynchronous inference |
EAS provides an asynchronous inference feature, which allows you to obtain inference results by subscription or by polling. |
||
|
Integrated resource group and service management capabilities |
EAS provides standard OpenAPI operations and SDKs that you can use to integrate resource group and service management. |
||
|
AI computing asset management |
Datasets |
PAI provides public datasets and supports dataset management during labeling and modeling. PAI also supports OSS and NAS datasets and SDK calls. |
|
|
Models |
PAI allows you to manage versions, lineages, evaluation metrics, and associated services of models in a centralized manner. |
||
|
Tasks |
PAI supports management of distributed training tasks and PAIFlow pipeline runs. |
||
|
Images |
PAI provides official images and supports image management. |
||
|
Code builds |
You can register code repositories to PAI to facilitate code version management in PAI modules. |
||
|
Custom components |
You can create custom algorithm components based on your business requirements. You can use custom components together with preset components in Machine Learning Designer to manage pipelines in a flexible manner. |
- |
|
|
AutoML |
Automatic hyperparameter optimization (HPO) |
HPO automatically tunes model-related parameters and training parameters. |
|
|
Scenario-based solutions |
Multimedia analysis |
PAI provides ready-to-use image-related services such as image labeling, classification, and quality evaluation. |
|
|
AI acceleration |
Dataset Accelerator |
Dataset Accelerator (DatasetAcc) is a PaaS service developed by Alibaba Cloud to accelerate AI datasets in the cloud. DatasetAcc provides dataset acceleration solutions for various cloud-native training engines by pre-analyzing and preprocessing the datasets used in machine learning training, which helps improve overall training efficiency. |
- |
|
Easy Parallel Library (EPL) |
EPL is an efficient and easy-to-use framework for distributed model training. EPL uses multiple training optimization technologies and provides easy-to-use API operations that allow you to use parallelism strategies. You can use EPL to reduce costs and improve the efficiency of distributed model training. |
||
|
PAI-Rapidformer |
PAI-Rapidformer applies various technologies to optimize the training of PyTorch transformers and improve training performance. |
||
|
Blade |
PAI-Blade integrates various optimization technologies. You can use PAI-Blade to optimize the inference performance of a trained model. |
||
|
PAI-SDK |
Distributed model training |
PAI SDK for Python provides an easy-to-use high-level API that allows you to submit training jobs to PAI and run the jobs in the cloud. |
|
|
Service deployment |
PAI SDK for Python provides an easy-to-use high-level API that allows you to deploy models to PAI and create inference services. |