Create and manage datasets with version control to track changes, reproduce experiments, and roll back if needed.
Dataset types
AI Asset Management supports basic and labeled datasets. Basic datasets contain raw unlabeled data for pre-training models. Labeled datasets contain annotated data for fine-tuning and evaluating models.
| Item | Basic dataset | Labeled dataset |
| --- | --- | --- |
| Definition | Raw, unlabeled data | Annotated data |
| Data processing | Data cleaning, deduplication, and more | Data labeling, validation, and quality control |
| Use cases | Pre-training models | Fine-tuning and evaluating models |
Access the Datasets page
- Log on to the PAI console.
- In the upper-left corner, select the region where your workspace resides.
- In the left-side navigation pane, choose Workspaces, and then click the workspace name.
- In the left-side navigation pane, choose AI Asset Management > Datasets.
Create a basic dataset
On the Custom Datasets tab, click Create Dataset and select Basic for Dataset Type. Create datasets from Object Storage Service (OSS) or file systems (General-purpose NAS, Extreme NAS, CPFS, AI-CPFS).
Storage type: Object Storage Service (OSS)
| Parameter | Description |
| --- | --- |
| Content Type | Select the data type: image, text, audio, video, table, or general. Specify the type to filter datasets when creating labeling tasks. |
| Owner | Select the dataset owner. Only workspace administrators can configure this parameter. |
| Import Format / OSS Path | |
| Default Mount Path | Default mount path for data. Used in DSW and DLC. |
| Enable Version Acceleration | Enable dataset version acceleration when Import Format is set to Folder. |
Storage type: file system
| Parameter | Description |
| --- | --- |
| Content Type | Select the data type: image, text, audio, video, table, or general. Specify the type to filter datasets when creating labeling tasks. |
| Owner | Select the dataset owner. Only workspace administrators can configure this parameter. |
| File System | Select a file system that corresponds to Storage Type. |
| Mount Target | Configure a mount target to access the file system. |
| File System Path | Specify the path to your data in the file system. |
| Default Mount Path | Default mount path for data. Used in DSW and DLC. |
| Enable Version Acceleration | If Storage Type is General-purpose NAS, Extreme NAS, or CPFS, you can enable dataset version acceleration. |
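The version acceleration conditions stated in the two parameter tables can be summarized as a small check. This is a minimal sketch in plain Python, not a PAI SDK call: the function name and the string values are hypothetical stand-ins for the console options.

```python
# Hypothetical helper that models the console rules described in this section.
# It does not call any PAI API.

# File system storage types for which the text says acceleration can be enabled.
ACCELERATED_FILE_SYSTEMS = {"General-purpose NAS", "Extreme NAS", "CPFS"}


def version_acceleration_available(storage_type, import_format=None):
    """Return True if dataset version acceleration can be enabled."""
    if storage_type == "OSS":
        # For OSS, acceleration requires Import Format to be set to Folder.
        return import_format == "Folder"
    # For file systems, only the listed storage types qualify.
    return storage_type in ACCELERATED_FILE_SYSTEMS
```

Read this way, AI-CPFS can be used as a dataset source but is not among the storage types listed for version acceleration.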
Create a basic dataset version
On the Custom Datasets tab, click Create Version in the Actions column for the target dataset.

Notes on version creation:
- The dataset name, storage type, and data type are inherited from V1 and are read-only.
- The system automatically generates the dataset version number, which is read-only.
- For other parameters, see Create a basic dataset.
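The inheritance rules for new versions can be pictured with a small data model. This is an illustrative sketch only: the `DatasetVersion` class, its field names, `create_next_version`, and the example paths are hypothetical and not part of any PAI SDK. It also assumes that version numbers increase by one (V1, V2, and so on), which the text suggests but does not state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetVersion:
    # Inherited from V1 and read-only in later versions.
    name: str
    storage_type: str
    data_type: str
    # Generated by the system; assumed here to increment by one.
    version: int
    # Hypothetical example of a parameter that can differ between versions.
    path: str


def create_next_version(latest: DatasetVersion, path: str) -> DatasetVersion:
    """Return a new version that keeps the inherited fields and bumps the number."""
    return replace(latest, version=latest.version + 1, path=path)


# Hypothetical example values.
v1 = DatasetVersion("demo", "OSS", "image", 1, "oss://example-bucket/v1/")
v2 = create_next_version(v1, "oss://example-bucket/v2/")
```

The frozen dataclass mirrors the read-only behavior: fields of an existing version cannot be reassigned, and a new version is produced as a separate object.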
View public datasets
PAI provides built-in public datasets such as MMLU, CMMLU, and GSM8K. On the Public Dataset tab, click a dataset name to view its basic information.

Manage datasets
For custom datasets, you can view versions, create versions, set the dataset to public, or delete it. For labeled datasets, you can view data, set the dataset to public, or delete it.

Important notes:
- For datasets with Visibility set to Dataset Owner, click Set Dataset to Public to share within the workspace. All workspace members can then view the dataset. Making a dataset public is irreversible.
- If a RAM user encounters an access denied error when viewing dataset data, grant permissions to the RAM user.
- Deleting a dataset might affect running tasks that depend on the dataset. Deletion is irreversible.