Platform for AI: Create and manage datasets

Last Updated: Mar 11, 2026

Create and manage datasets with version control to track changes, reproduce experiments, and roll back if needed.

Dataset types

AI Asset Management supports basic and labeled datasets. Basic datasets contain raw unlabeled data for pre-training models. Labeled datasets contain annotated data for fine-tuning and evaluating models.

Basic dataset

  • Definition: Raw, unlabeled data.

  • Data processing: Data cleaning, deduplication, and more.

  • Use cases: Unsupervised learning, and pre-training models to capture broad features.

Labeled dataset

  • Definition: Annotated data.

  • Data processing: Data labeling, validation, and quality control.

  • Use cases: Supervised learning and model evaluation, and fine-tuning models to improve performance on specific tasks.

Access the Datasets page

  1. Log on to the PAI console.

  2. In the upper-left corner, select the region where your workspace resides.

  3. In the left-side navigation pane, choose Workspaces. Click the workspace name.

  4. In the left-side navigation pane, choose AI Asset Management > Datasets.

Create a basic dataset

On the Custom Datasets tab, click Create Dataset and select Basic for Dataset Type. Create datasets from Object Storage Service (OSS) or file systems (General-purpose NAS, Extreme NAS, CPFS, AI-CPFS).

Storage type: Object Storage Service (OSS)

Parameter

Description

Content Type

Select the data type: image, text, audio, video, table, or general. The content type is used to filter datasets when you create labeling tasks.

Owner

Select the dataset owner. Only workspace administrators can configure this parameter.

Import Format / OSS Path

  • File: Specify the path of a single file in OSS. Use this format for datasets intended for iTAG.

  • Folder: Specify the path of a folder in OSS. The folder is mounted into containers. Use this format for datasets intended for DSW, DLC, or EAS.

Default Mount Path

Default mount path for data. Used in DSW and DLC:

  • In DSW, mount an existing file system to this path when creating instances.

  • In DLC, access files in this directory from code. For example, python /root/data/file.py.

Enable Version Acceleration

Enable dataset version acceleration when Import Format is set to Folder. Key settings:

  • Maximum Capacity: Capacity of the acceleration slot. Must be greater than or equal to the dataset size.

  • Accelerated Mount Target: Uses the internal mount target by default. You can also select an existing accelerated mount target or create one.

    Note

    When using Lingjun Intelligent Computing Resources, if Create Mount Target is selected for Accelerated Mount Target, set Mount Target Type to VPC. The selected VPC and vSwitch must match those used by Lingjun resources.

  • Accelerated Version Default Mount Path: Default mount path for the accelerated dataset version.
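The constraints in the table above can be checked before you fill in the console form. The following is a minimal sketch, not a PAI SDK call: the `OssDatasetSpec` class, its field names, the `oss://<bucket>/<key>` path shape, and the GiB unit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class OssDatasetSpec:
    """Hypothetical container for the console fields; not a PAI SDK type."""
    oss_path: str                              # assumed shape: oss://<bucket>/<key>
    import_format: str                         # "File" or "Folder"
    dataset_size_gib: float                    # size of the data (unit is an assumption)
    enable_acceleration: bool = False
    max_capacity_gib: Optional[float] = None   # acceleration slot capacity


def check_spec(spec: OssDatasetSpec) -> List[str]:
    """Return a list of problems; an empty list means the spec looks consistent."""
    problems = []
    parsed = urlparse(spec.oss_path)
    if parsed.scheme != "oss" or not parsed.netloc:
        problems.append("oss_path should look like oss://<bucket>/<key>")
    if spec.import_format not in ("File", "Folder"):
        problems.append("import_format must be 'File' or 'Folder'")
    if spec.enable_acceleration:
        # Version acceleration is available only when Import Format is Folder.
        if spec.import_format != "Folder":
            problems.append("version acceleration requires import_format 'Folder'")
        # Maximum Capacity must be greater than or equal to the dataset size.
        if spec.max_capacity_gib is None or spec.max_capacity_gib < spec.dataset_size_gib:
            problems.append("max_capacity_gib must be >= dataset_size_gib")
    return problems
```

For example, `check_spec(OssDatasetSpec("oss://my-bucket/train/", "Folder", 10.0, True, 20.0))` returns an empty list, while a spec that enables acceleration with the File format and too small a slot reports both problems.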

Storage type: file system

Parameter

Description

Content Type

Select the data type: image, text, audio, video, table, or general. The content type is used to filter datasets when you create labeling tasks.

Owner

Select the dataset owner. Only workspace administrators can configure this parameter.

File System

Select a file system that corresponds to Storage Type.

Mount Target

Configure a mount target to access the file system.

File System Path

Specify the path to your data in the file system. For example, /.

Default Mount Path

Default mount path for data. Used in DSW and DLC:

  • In DSW, mount an existing file system to this path when creating instances.

  • In DLC, access files in this directory from code. For example, python /root/data/file.py.

Enable Version Acceleration

You can enable dataset version acceleration when Storage Type is General-purpose NAS, Extreme NAS, or CPFS. Key parameters:

  • Maximum Capacity: Capacity of the acceleration slot. Must be greater than or equal to the dataset size.

  • Accelerated Version Default Mount Path: Default mount path for the accelerated dataset version.
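Once a dataset is mounted, code in a DLC job reads it like any local directory. The sketch below assumes the dataset is mounted at `/root/data`, the directory used in the example above. The `list_dataset_files` helper is a hypothetical name and uses only the Python standard library.

```python
from pathlib import Path
from typing import List

# Assumption: the dataset is mounted at the directory used in the example above.
DEFAULT_MOUNT_PATH = "/root/data"


def list_dataset_files(mount_path: str = DEFAULT_MOUNT_PATH, suffix: str = "") -> List[Path]:
    """Return every file under the mount path, optionally filtered by suffix."""
    root = Path(mount_path)
    if not root.is_dir():
        # Fail early with a clear message if the dataset was not mounted.
        raise FileNotFoundError(f"dataset mount path not found: {root}")
    return sorted(p for p in root.rglob("*") if p.is_file() and p.name.endswith(suffix))
```

A training script could call `list_dataset_files(suffix=".csv")` at startup, so that a wrong mount path surfaces immediately rather than partway through a run.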

Create a basic dataset version

On the Custom Datasets tab, click Create Version in the Actions column for the target dataset.

Notes on version creation:

  • Dataset name, storage type, and data type are inherited from V1 and are read-only.

  • The system automatically generates the dataset version number, which is read-only.

  • For other parameters, see Create a basic dataset.
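The inheritance rules above can be modeled as follows. This is an illustrative sketch only: the `DatasetVersion` class and `create_version` function are hypothetical, and the sequential integer numbering is an assumption, since the page states only that the system generates the version number.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical record of one dataset version; not a PAI SDK type."""
    name: str
    storage_type: str
    content_type: str
    version: int
    path: str
    default_mount_path: str


# Fields that are inherited from V1 or generated by the system, hence read-only.
READ_ONLY_FIELDS = {"name", "storage_type", "content_type", "version"}


def create_version(v1: DatasetVersion, latest_version: int, **changes) -> DatasetVersion:
    """Build a new version from V1, rejecting changes to read-only fields."""
    blocked = READ_ONLY_FIELDS.intersection(changes)
    if blocked:
        raise ValueError(f"read-only fields cannot be changed: {sorted(blocked)}")
    # Assumption: version numbers increase by one.
    return replace(v1, version=latest_version + 1, **changes)
```

Calling `create_version(v1, latest_version=1, path="oss://b/v2/")` yields a version 2 that keeps the name, storage type, and content type of V1, while passing `name="other"` raises `ValueError`.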

View public datasets

PAI provides built-in public datasets such as MMLU, CMMLU, and GSM8K. On the Public Dataset tab, click a dataset name to view its basic information.

Manage datasets

For custom datasets, you can view versions, create versions, set the dataset to public, or delete it. For labeled datasets, you can view data, set the dataset to public, or delete it.

Important notes:

  • For datasets with Visibility set to Dataset Owner, click Set Dataset to Public to share within the workspace. All workspace members can then view the dataset. Making a dataset public is irreversible.

  • If a RAM user encounters an access denied error when viewing dataset data, grant permissions to the RAM user.

  • Deleting a dataset might affect running tasks that depend on the dataset. Deletion is irreversible.
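Because deletion cannot be undone, it can help to check which running tasks still reference a dataset before you delete it. The sketch below assumes you have already collected task records from your own inventory. The `Task` class, its fields, and the `"Running"` status string are hypothetical and do not represent a PAI API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical task record; field names and status values are assumptions."""
    task_id: str
    status: str
    dataset_ids: List[str]


def running_dependents(dataset_id: str, tasks: List[Task]) -> List[str]:
    """Return the IDs of running tasks that mount the given dataset."""
    return [
        t.task_id
        for t in tasks
        if t.status == "Running" and dataset_id in t.dataset_ids
    ]
```

If the returned list is not empty, consider waiting for those tasks to finish or stopping them before you delete the dataset.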