AI asset lineage service - Platform For AI - Alibaba Cloud Documentation Center

The AI asset lineage service helps enterprises manage their AI assets. It tracks where data and models come from and how they evolve, which improves asset transparency and management efficiency.

Introduction

The AI asset lineage service helps enterprises manage and optimize AI assets by tracking the origin, usage, and evolution of data and models. The service covers a range of AI assets, including datasets (versions), data processing pipelines, training jobs, models (versions), model services, and other metadata. The details page of each asset provides an entry point for viewing its lineage information.
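As a rough mental model, the asset types listed above can be pictured as nodes in a directed graph, where each dataset or model version is its own node. The following sketch is only an illustration: the `AssetKind`, `Asset`, and `LineageEdge` names and the sample assets are hypothetical and are not part of any PAI API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssetKind(Enum):
    # Asset types named in the text; the enum itself is hypothetical.
    DATASET = "dataset"
    PIPELINE_JOB = "pipeline_job"
    TRAINING_JOB = "training_job"
    MODEL = "model"
    MODEL_SERVICE = "model_service"


@dataclass(frozen=True)
class Asset:
    kind: AssetKind
    name: str
    version: Optional[str] = None  # set for versioned assets such as datasets and models


@dataclass(frozen=True)
class LineageEdge:
    upstream: Asset    # the asset that is consumed
    downstream: Asset  # the asset that is produced


# Hypothetical sample assets: two versions of one dataset are distinct nodes.
reviews_v1 = Asset(AssetKind.DATASET, "reviews", "v1")
reviews_v2 = Asset(AssetKind.DATASET, "reviews", "v2")
train_job = Asset(AssetKind.TRAINING_JOB, "train-sentiment")
model_v1 = Asset(AssetKind.MODEL, "sentiment", "v1")

edges = [LineageEdge(reviews_v2, train_job), LineageEdge(train_job, model_v1)]
```

Because the dataclasses are frozen, assets are hashable and compare by value, so `reviews_v1` and `reviews_v2` are treated as separate lineage nodes even though they share a name.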

The AI asset lineage service is suitable for the following application scenarios:

AI asset management: The service provides detailed asset lineage information, which allows enterprises to gain insights into the origin and usage of their AI assets. This improves the quality of data and models and ensures that the AI practices of an enterprise comply with regulatory standards to support precise data management and decision-making.
Model traceability: In the context of Responsible AI, maintaining AI model transparency is crucial. The service enables enterprises to trace datasets, feature engineering methods, and parameter adjustments used in model training to meet regulatory compliance, verify results of experiments, and perform model audits.
Troubleshooting and optimization: The asset lineage information can be used to quickly locate the root cause of performance issues in AI services. For example, a sudden drop in model prediction accuracy can be traced to changes in upstream data processing. A lineage graph helps enterprises quickly identify and resolve the issue.
Resource usage improvement: Understanding the dependencies between tasks and data helps enterprises allocate computing resources sensibly, avoid redundant computation, and reduce costs. In large-scale experiments, the lineage information also shows which tasks can be executed in parallel, which improves resource usage and processing efficiency.
Collaboration efficiency improvement: In a large organization, multiple teams may share the same infrastructure for research. Clear lineage information facilitates cross-team communication and knowledge sharing and accelerates innovation.
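The troubleshooting and resource usage scenarios above both come down to graph traversal over lineage edges. The following sketch assumes the lineage has already been exported as a list of (upstream, downstream) pairs; the asset names, the `EDGES` list, and both functions are hypothetical and do not reflect how the service stores or exposes lineage.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream, downstream).
EDGES = [
    ("raw_table", "clean_sql_task"),
    ("clean_sql_task", "features_v2"),
    ("features_v2", "train_job"),
    ("train_job", "model_v3"),
    ("model_v3", "prediction_service"),
    ("raw_table", "stats_task"),
]


def upstream_of(node, edges):
    """Return every asset that `node` depends on, directly or transitively."""
    parents = defaultdict(set)
    for up, down in edges:
        parents[down].add(up)
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def parallel_stages(edges):
    """Group assets into stages; assets in one stage do not depend on each other."""
    children = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for up, down in edges:
        nodes.update((up, down))
        if down not in children[up]:
            children[up].add(down)
            indegree[down] += 1
    stage = sorted(n for n in nodes if indegree[n] == 0)
    stages = []
    while stage:
        stages.append(stage)
        next_stage = []
        for n in stage:
            for child in children[n]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all upstream assets are done
                    next_stage.append(child)
        stage = sorted(next_stage)
    return stages
```

With this sample data, `upstream_of("prediction_service", EDGES)` returns the candidates to inspect after an accuracy drop, and `parallel_stages(EDGES)` places `clean_sql_task` and `stats_task` in the same stage because neither depends on the other.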

Prerequisites

To use the AI asset lineage service, activate DataWorks in the DataWorks console.

DataWorks Standard Edition: If you require the common AI asset lineage service, activate DataWorks Standard Edition.
DataWorks Professional Edition: If you require the lineage reporting capabilities for Deep Learning Containers (DLC) training and pipeline jobs, activate DataWorks Professional Edition.

For more information about DataWorks editions, see Differences among DataWorks editions.

Lineage reporting method and entry point

Lineage reporting is the automatic or manual recording of metadata related to AI models, and of the relationships between that metadata and the models, during development, training, deployment, and maintenance in Platform for AI (PAI). Lineage is reported for the following operations:

Create a dataset

Datasets support version management. Each version is an independent lineage asset. You can view the lineage information about a specific dataset version and the upstream and downstream relationships of the dataset version.

Lineage information structure
Entry point: Create and manage datasets
Entry point to view the lineage information: In the dataset list, find the desired dataset and click the name of the dataset. In the Version Details section of the dataset details page, click View Lineage to view the lineage information.

Preprocess data

Entry point: If you run a data analysis task in a production environment on the MaxCompute engine in DataWorks, and the input and output are MaxCompute tables or OSS paths, you can also view and analyze the lineage information. For example, if you obtain a MaxCompute table by running multiple SQL tasks and then register the table as a PAI dataset, you can use the lineage information to trace the tasks that generated the table.
Entry point to view the lineage information: In the dataset list, find the desired dataset and click the name of the dataset. In the Version Details section of the dataset details page, click View Lineage to view the lineage information.

Create a labeling job

When you create a data labeling job in iTAG, you must specify an input dataset. After the labeling job is created, the system automatically displays the lineage information in the following structure.

Lineage information structure
Entry point
- Create a labeling job
- Export labeling result data
Entry point to view the lineage information: In the labeled dataset list, find the desired labeled dataset and click the name of the source dataset. In the Version Details section of the dataset details page, click View Lineage to view the lineage information.

Create a pipeline job

A pipeline job is treated as an independent asset. When you submit a pipeline job in Machine Learning Designer, if the pipeline contains the Read Table, Read File Data, Model Register, or Dataset Register component, the system automatically reports the lineage information in the following structure after the pipeline job is executed.

Lineage information structure
Entry point: Create a pipeline
On the pipeline details page, add related components based on your business requirements. In this example, the Read File Data and Dataset Register components are added.
Entry point to view the lineage information: In the pipeline job list, find the desired pipeline job and click the name of the pipeline job. In the Basic Information section of the pipeline details page, click View Lineage to view the lineage information.
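The condition for automatic reporting described in this section can be expressed as a simple membership check. The sketch below is only an illustration: the four component names come from the text, while the function, the representation of a pipeline as a list of component names, and the `Split` placeholder component are assumptions.

```python
# Component names taken from the text above. Representing a pipeline as a
# plain list of component names is an assumption made for illustration.
LINEAGE_COMPONENTS = frozenset(
    {"Read Table", "Read File Data", "Model Register", "Dataset Register"}
)


def lineage_components_in(pipeline_components):
    """Return the components that would cause lineage to be reported."""
    return [c for c in pipeline_components if c in LINEAGE_COMPONENTS]


def reports_lineage(pipeline_components):
    """True if the pipeline contains at least one lineage-reporting component."""
    return bool(lineage_components_in(pipeline_components))


# Mirrors the example in this section; "Split" is a hypothetical placeholder
# for any component that does not report lineage.
example_pipeline = ["Read File Data", "Split", "Dataset Register"]
```

For the example pipeline, `lineage_components_in(example_pipeline)` returns the Read File Data and Dataset Register components, so lineage would be reported after the job is executed.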

Create a model training job

Model Gallery

After a model training job that you submit in Model Gallery is executed, the system automatically displays the lineage information in the following structure.

Lineage information structure
Entry point: Train a model
Entry point to view the lineage information: In the model list, find the desired model and click the name of the model. In the model version list of the model details page, select the desired model version and click the version number. In the Model Version panel, click View Lineage to view the lineage information.

DLC

When you submit a model training job, you can manually report the lineage information and configure the inputs and outputs based on your business requirements. This approach is intended for users who have advanced technical skills and a well-established business. If the inputs and outputs are configured incorrectly, the accuracy of the lineage information may be affected. If you have questions or need access to this feature, contact your business manager to be added to the whitelist.

Register a model

Models support version management. Each version is an independent lineage asset. You can view the lineage information about a specific model version and the upstream and downstream relationships of the model version.

Lineage information structure
Entry point: Register a model
Note
You can manually register a model. In addition, after a training job that you submit in Model Gallery is executed, the generated model is automatically registered as a model in the current workspace. For more information, see Model Gallery.
Entry point to view the lineage information: In the model list, find the desired model and click the name of the model. In the model version list of the model details page, select the desired model version and click the version number. In the Model Version panel, click View Lineage to view the lineage information.

Deploy a model service

Lineage information structure
Entry point: Register a model
- Choose AI Asset Management > Models. On the Model page, find the desired model and click Deploy in EAS in the Actions column.
- On the Events tab of the Workspace Details page, click Create Event Rule. In the Create Event Rule panel, set the Event Type parameter to Models and select Version Approved from the drop-down list.
  When the value of the Version Approval Status parameter for a specific model version changes from Pending to Approved, the model service is automatically updated.
Entry point to view the lineage information: In the model service list, find the desired model service and click the name of the model service. On the Overview tab of the model service details page, click View Lineage in the Basic Information section to view the lineage information.
Note
If a service in Elastic Algorithm Service (EAS) has multiple versions, all versions correspond to the same EAS instance in the lineage information. To analyze a specific version of the service, update the VersionId value of the service to locate the version.
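The event rule described in this section reacts to one specific transition: the approval status of a model version changing from Pending to Approved. The following sketch illustrates only that trigger logic. The `ModelVersion` class, the function names, and the callback are hypothetical and are not the PAI event API; only the two status values come from the text.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    model: str
    version: str
    approval_status: str = "Pending"  # status values taken from the text


def should_update_service(old_status, new_status):
    """The rule fires only on the Pending -> Approved transition."""
    return old_status == "Pending" and new_status == "Approved"


def set_approval_status(model_version, new_status, on_approved):
    """Change the status and call `on_approved` if the rule matches.

    `on_approved` stands in for whatever updates the model service;
    it is a hypothetical callback.
    """
    old_status = model_version.approval_status
    model_version.approval_status = new_status
    if should_update_service(old_status, new_status):
        on_approved(model_version)
        return True
    return False


# Usage: collect the versions that would trigger a service update.
updated = []
candidate = ModelVersion("sentiment", "v2")
first = set_approval_status(candidate, "Approved", updated.append)
second = set_approval_status(candidate, "Approved", updated.append)
```

In this sketch the first call triggers the callback, and the second call does not, because the version is already Approved and no transition takes place.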

References

Create and manage datasets

Manage models

Manage pipeline tasks