Platform for AI: Terms

Last Updated: Feb 07, 2024

This topic describes the terms related to management, AI development, and modules in Machine Learning Platform for AI (PAI).

Terms related to management

workspace

Workspaces are a core concept in PAI. They help enterprises and teams manage computing resources and control permissions in a centralized manner, and they provide AI developers with development tools and AI asset management capabilities so that different teams can collaborate throughout the entire AI development lifecycle. Because PAI runs on top of DataWorks, workspaces in PAI are mapped to workspaces in DataWorks.

Default workspace: The default workspace contains commonly used pay-as-you-go resources. You need to activate the default workspace before you can use these resources. The default workspace can help first-time users quickly get started with model development and training without the need to understand concepts such as resource groups.

Deep Learning Containers (DLC)

DLC is a cloud-native platform for basic AI computing jobs. DLC provides an elastic, stable, easy-to-use, and high-performance environment for model training. DLC provides multiple algorithm frameworks, allows you to run a large number of deep learning jobs in a distributed manner, and supports custom algorithm frameworks. DLC supports the following types of clusters:

  • Fully managed clusters: Fully managed clusters are public resource groups or dedicated resource groups. The administrators of a workspace can associate fully managed clusters with the workspace to use these clusters.

  • Semi-managed clusters: Semi-managed clusters are self-managed resource groups. Semi-managed clusters provide separate dashboards and can be used more flexibly than fully managed clusters.

resource group

  • You can create resource groups to classify computing resources by different dimensions, such as purposes, permissions, and ownership. Resource groups can be used to isolate computing resources that belong to different users or workspaces.

  • Resource groups refer to all underlying resources that are used by different modules in PAI, such as MaxCompute quota groups, DLC clusters, Kubernetes clusters, E-MapReduce (EMR) clusters, Flink clusters, and Elastic Compute Service (ECS) clusters.

  • You can purchase and create resource groups in the MaxCompute console, the EMR console, or other consoles by using an Alibaba Cloud account or the resource administrator role. The purchased resource groups can then be used in workspaces.

member

Members are Alibaba Cloud accounts or Resource Access Management (RAM) users that join workspaces. Members in the same workspace can assume different roles to collaborate throughout the AI development pipeline. Only the owner and administrators of a workspace can modify the members in the workspace.

role

Roles are mappings between members and permissions. You can use the system roles or create custom roles. The system roles include:

  • Resource administrator: This role has the permissions to purchase and manage computing resources. In most cases, Alibaba Cloud enterprise accounts assume this role. To manage the permissions provided by this role and assign the role, you need to log on to the RAM console.

  • Workspace owner: This role provides the permissions to modify the members in the workspace and reference resource groups. This role is automatically assigned to the user who created the workspace.

  • Workspace administrator: This role provides the permissions to modify the members and manage all assets in the workspace, including resource groups.

  • Algorithm developer: This role provides the permissions to develop and train models in the workspace.

  • Algorithm operator: This role provides the permissions to manage job priorities, publish models, and monitor online services.

  • Label administrator: This role provides the permissions to use iTAG.

  • Guest: This role provides only the permissions to view all assets in the workspace.

dependency

PAI relies on other Alibaba Cloud services. To use the features provided by PAI, you must first activate these Alibaba Cloud services and complete RAM authorization by using an Alibaba Cloud account or the resource administrator role. The Alibaba Cloud services that you need to activate include Object Storage Service (OSS), Apsara File Storage NAS (NAS), Log Service, Container Registry, and API Gateway.

Terms related to AI development

dataset

Datasets are used in labeling, model training, and model evaluation. You can create datasets from structured or unstructured data, or by mounting directories in data stores such as OSS, NAS, and MaxCompute. In addition, you can centrally manage the storage, versions, and schemas of datasets in PAI.

pipeline

Pipelines are directed acyclic graphs (DAGs) in which upstream and downstream components are connected according to their scheduling logic. You can submit multiple runs for a pipeline; each run generates a PipelineRun.

PipelineDraft

PipelineDrafts are configurable pipeline objects on the canvas of Designer. You can submit runs for PipelineDrafts to generate PipelineRuns.

component

Components are the smallest configurable units in pipelines and PipelineDrafts, and the smallest executable units in PipelineRuns. Components consist of the following types:

  • Built-in components: PAI provides built-in components that can be used throughout the model development pipeline. These components are developed based on the best practices of Alibaba Cloud and can be used to preprocess data, train models, and make predictions.

  • Custom components: You can create custom components based on code or images, and add these custom components to your pipelines.

node

Nodes represent components that are dragged and dropped onto the canvas to form a pipeline.

snapshot

Each time the system runs a single node in a PipelineDraft, multiple nodes in the PipelineDraft, or the entire PipelineDraft, the system creates a snapshot for the configurations of the PipelineDraft. The snapshot includes the node configurations, runtime parameters, and execution mode. Snapshots can be used in PipelineDraft versioning and configuration rollbacks.

PipelineRun

A PipelineRun represents a single run of a pipeline. After you use Designer to submit a run for a PipelineDraft or use the SDK to submit a run for a pipeline, a PipelineRun is generated.
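
The relationships among these terms can be summarized in code. The following is a purely conceptual sketch, not part of any PAI SDK: the class and field names are hypothetical and only illustrate how a pipeline (a DAG of nodes), its nodes, and the PipelineRuns generated by submitting runs relate to one another.

# Conceptual sketch only: these classes are hypothetical and are not the PAI SDK.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A component placed on the canvas, plus its configuration."""
    component: str                   # a built-in or custom component name
    parameters: dict = field(default_factory=dict)

@dataclass
class Pipeline:
    """A DAG of nodes; edges point from upstream to downstream nodes."""
    name: str
    nodes: List[Node]
    edges: List[tuple]               # (upstream_index, downstream_index) pairs

@dataclass
class PipelineRun:
    """A single execution of a pipeline, created each time a run is submitted."""
    pipeline: Pipeline
    run_id: str

# Submitting the same pipeline twice yields two independent PipelineRuns.
pipeline = Pipeline(
    name="churn-prediction",
    nodes=[Node("data-source"), Node("feature-engineering"), Node("training")],
    edges=[(0, 1), (1, 2)],
)
runs = [PipelineRun(pipeline, run_id=f"run-{i}") for i in range(2)]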

job

Jobs use different types of computing resources. You can create multiple types of jobs, such as DLC jobs and MaxCompute (MC) jobs. Compared with runs and PipelineRuns, jobs are a lower-level concept.

run

A run represents a single execution of a pipeline and is equivalent to the run concept in MLflow. All runs must belong to an experiment. You can use runs to track the training jobs that you submit in PAI. You can also use the MLflow client to submit runs from an on-premises machine.
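
For illustration, the following is a minimal sketch of submitting and tracking a run from an on-premises machine with the MLflow client. The tracking URI and experiment name are placeholders; the actual MLflow tracking endpoint for your workspace must be obtained from the PAI console.

import mlflow

# Placeholder endpoint: replace with the MLflow tracking URI of your PAI workspace.
mlflow.set_tracking_uri("https://<your-pai-mlflow-endpoint>")

# Every run must belong to an experiment.
mlflow.set_experiment("demo-experiment")

# Standard MLflow tracking calls: parameters and metrics are recorded for this run.
with mlflow.start_run(run_name="local-training-run"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)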

model

Models are generated by training jobs based on datasets, algorithms, and code. You can use models to make predictions.

processor

A processor is a package of online prediction logic, including the logic for loading models and handling requests. In most cases, processors are deployed together with model files to provision services. Processors consist of the following types:

  • Built-in processors: Elastic Algorithm Service (EAS) provides built-in processors for commonly used models, such as Predictive Model Markup Language (PMML) models and TensorFlow models.

  • Custom processors: If the built-in processors cannot meet your business requirements, you can create custom processors that conform to the processor development standards.

service

You can deploy model files together with the online prediction logic to provision services. EAS allows you to create, update, start, stop, scale out, and scale in services.
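
To make the processor and service terms concrete, the following is an illustrative sketch of a service description written in Python. The field names and the "pmml" processor code follow commonly documented EAS service configurations, but they are assumptions here; verify them against the current EAS documentation before deploying, and replace the OSS path with your own model location.

import json

# Illustrative only: a service that packages a model file with a built-in processor.
service_config = {
    "name": "demo_pmml_service",                            # service name
    "processor": "pmml",                                    # built-in processor for PMML models (assumed code)
    "model_path": "oss://<your-bucket>/models/demo.pmml",   # placeholder OSS path to the model file
    "metadata": {
        "instance": 2,      # number of service instances (processes) that handle requests
        "cpu": 2,
        "memory": 4000,     # in MB
    },
}

# The resulting JSON document would be submitted through the EAS console or client tool.
print(json.dumps(service_config, indent=2))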

image

PAI allows you to manage the following Docker images as AI assets:

  • Public images that are provided by PAI.

  • Custom images that are saved in Data Science Workshop (DSW).

  • Custom images that are saved in Container Registry.

You can use images in pipelines to create custom components to complete specific jobs. Images can also provide runtime environments for DSW instances and training jobs.

instance

Instances are the smallest units for provisioning computing resources. Instances consist of the following types:

  • DSW instances: DSW instances are notebook instances. Each DSW instance provides a certain amount of computing resources for you to modify code, perform debugging, or train models.

  • EAS instances: EAS instances are service processes. You can deploy more than one EAS instance for each service to increase the number of concurrent requests that can be handled by the service.

Terms related to modules

iTAG

iTAG is a dataset labeling tool that uses black-box models to help improve the quality and efficiency of dataset labeling.

Designer

Designer is a tool for designing AI pipelines. It provides a variety of built-in machine learning algorithm components that you can drag and drop to train models without writing code.

Data Science Workshop (DSW)

DSW is an integrated development environment (IDE) intended for interactive AI development in the cloud. DSW consists of Notebook, VS Code, and Terminal. You can use images to deploy DSW instances that use NAS as the storage.

Deep Learning Containers (DLC)

You can submit training jobs to computing resource groups, such as DLC clusters, in the current workspace. After you submit training jobs, you can view the details about the jobs in the Jobs module of the PAI console.

Elastic Algorithm Service (EAS)

EAS allows you to deploy complex models as services on a large scale with a few clicks. EAS supports real-time scaling and provides a sophisticated monitoring and maintenance system.

AI Asset Management

This module allows you to manage key AI assets, including datasets, models, and source code repositories.

Scenario-based solution

Scenario-based solutions are collections of solutions provided by PAI to help you resolve issues in vertical markets.