DataWorks:Manage datasets

Last Updated:Nov 14, 2025

The dataset feature in DataWorks lets you manage unstructured data, such as images and documents, for use within DataWorks. This topic describes how to create and use datasets.

Background

During data development in DataWorks, you can use the dataset feature to read and write data stored in Object Storage Service (OSS) and NAS file systems. The feature lets you create and manage datasets and their versions. With version management, you can track data versions and quickly revert to a previous version if a new one has issues, which helps keep your business operations running smoothly.

Precautions

The dataset feature is currently in beta. Its features and stability may change in the final release.

Billing

The DataWorks dataset feature is free of charge. However, storing data in OSS or NAS incurs storage and network access fees. For more information, see OSS billing and NAS billing.

Create a dataset

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Map. On the page that appears, click Go to Data Map.

  2. In the navigation pane on the left of the Data Map page, click the Data Catalog icon to open the Data Catalog page. In the Directory List, click Dataset Catalog.

  3. Find the workspace in which you want to create a dataset and click its name. This action opens the dataset details page for the workspace, which displays all existing datasets. Click the Create Dataset button and follow the instructions to create a DataWorks dataset.

Storage class: OSS

  • Dataset configuration:

    | Configuration item | Description |
    | --- | --- |
    | Storage class | OSS |
    | Content type | Optional. Select the type of data you are registering. The default is General. |

  • Import configuration:

    | Configuration item | Description |
    | --- | --- |
    | OSS path | Specify the path of the OSS folder to mount. Note: Make sure you have the required OSS bucket permissions. |
    | Default mount path | Specify the default mount path for the OSS folder. You can use this path to access the data in DataWorks. The system default is /mnt/data/. You can change the mount path manually. |
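To illustrate what accessing data through the mount path can look like, the following is a minimal Python sketch that lists the files visible under a mounted dataset. It assumes that the dataset is mounted at the system default /mnt/data/ and that the code runs in an environment where the dataset is attached, such as a Python node. The function name list_dataset_files is hypothetical and is not part of any DataWorks API.

```python
from pathlib import Path
from typing import List

# Assumption: the dataset uses the system default mount path.
# Change this value if you set a custom Default mount path.
MOUNT_PATH = Path("/mnt/data")


def list_dataset_files(mount_path: Path) -> List[Path]:
    """Return every regular file under mount_path, as sorted relative paths."""
    if not mount_path.is_dir():
        raise FileNotFoundError(f"Mount path not found: {mount_path}")
    return sorted(
        p.relative_to(mount_path) for p in mount_path.rglob("*") if p.is_file()
    )
```

Calling list_dataset_files(MOUNT_PATH) in such an environment would return the relative paths of the mounted files. Large folders may take a while to walk, so you may prefer to pass a subfolder of the mount path instead.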

Storage class: NAS

  • Dataset configuration:

    | Configuration item | Description |
    | --- | --- |
    | Storage class | Select File Storage (General-purpose NAS file systems) or File Storage (Extreme NAS file systems). |
    | Content type | Optional. Select the type of data you are registering. The default is General. |

  • Import configuration:

    | Configuration item | Description |
    | --- | --- |
    | File system | Select the destination NAS file system created in the current region under your Alibaba Cloud account. |
    | File system mount target | Configure a mount target to access the NAS file system. Important: Make sure the VPC of the mount target is connected to the VPC of the resource group. Use the same VPC for the NAS mount target and the resource group to ensure network connectivity. For other scenarios, see Network connectivity solutions to connect the VPC of the NAS mount target to the VPC configured for the resource group. |
    | File system path | Specify the path of the NAS folder to mount. The default is the root directory /. Make sure this path exists in the NAS file system. Otherwise, an error occurs when you use the dataset. |
    | Default mount path | Specify the default mount path in the dataset for the NAS folder. You can then use this path to access the data in the NAS path from DataWorks. The system default is /mnt/data/. You can change the mount path manually. |
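Because a misconfigured path causes an error when the dataset is used, it can help to check the mount path before reading or writing. The following Python sketch fails fast if the mount path is missing and then writes a small text file under it. It assumes that the mounted folder is writable from your runtime environment. How a missing NAS path actually surfaces is not specified in this topic, so the check only covers a mount path that is absent or unreadable. The helper names check_mount and write_text_result are hypothetical.

```python
import os
from pathlib import Path


def check_mount(mount_path: str) -> Path:
    """Return mount_path as a Path, or raise with a hint if it is unusable."""
    path = Path(mount_path)
    if not path.is_dir():
        raise FileNotFoundError(
            f"{path} is not a directory; check the File system path and "
            "Default mount path settings of the dataset."
        )
    if not os.access(path, os.R_OK):
        raise PermissionError(f"No read access to {path}")
    return path


def write_text_result(mount_path: str, relative_name: str, text: str) -> Path:
    """Write text to relative_name under the mount path and return the file path."""
    root = check_mount(mount_path)
    target = root / relative_name
    # Create intermediate folders so nested names such as "results/a.txt" work.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

For example, write_text_result("/mnt/data/", "results/summary.txt", "ok") would create results/summary.txt under the default mount path, provided that the path is mounted and writable.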

Manage datasets

In Data Catalog > Dataset Catalog, open the dataset list of the target workspace. In the Operation column of the dataset that you want to manage, click Details to open the dataset details page. On this page, you can view the Overview and Dataset Version information and perform the following operations:

  • Create Version: Click the Create Version button in the upper-right corner to open the version creation page. When you create a new version, you can customize the OSS Path or NAS File System Configuration and set the Default Mount Path.

  • Delete Dataset: Click the Delete button in the upper-right corner of the dataset details page to delete the dataset.

  • View Dataset Data: This operation is supported only for Object Storage Service (OSS) datasets. In the Dataset Version section, select the desired version from the drop-down menu next to the title. Then, click View In OSS. You will be redirected to the storage path for that version in the OSS console.

  • Delete Version: In the Dataset Version section, select the desired version from the drop-down menu next to the title. Then, click the Delete button.

Important

Deleting a dataset or a dataset version does not delete the original files. However, the deleted dataset or version cannot be recovered from the DataWorks dataset feature. Proceed with caution.

Use a dataset

You can use the datasets that you create in Data Studio, for example in Shell nodes, Python nodes, and Notebook development, and in your personal development environment.

For more information, see Use a dataset.