Read and write dataset data

Platform for AI (PAI) provides a disk with a fixed quota for each Data Science Workshop (DSW) instance that is created in the public resource group. You can use the disk to persist data, but the disk is cleared if the DSW instance is stopped and not restarted within 15 days. DSW instances that are created in a dedicated resource group provide only non-persistent local storage. To persist DSW data, create an Apsara File Storage NAS or Object Storage Service (OSS) dataset and mount the dataset to a specified path of the DSW instance. You can then read data from and write data to the dataset in DSW. This topic describes how to create and mount datasets.

Step 1: Create datasets

Log on to the PAI console and create a dataset in AI Computing Asset Management of the workspace. For more information, see Create and manage datasets.

Key parameters:

Create Dataset: Select From Alibaba Cloud Product.
Select data store: Select Alibaba Cloud Object Storage (OSS) or Alibaba Cloud file storage (NAS).
Property: Folder is automatically selected and cannot be changed.
Note
Only directories are supported. Files are not supported.

For information about how to configure other parameters, see Create a dataset based on data that is stored in an Alibaba Cloud storage service.

Step 2: Mount datasets

A dedicated resource group is selected in the following example.

Go to the page on which you can create DSW instances and configure the parameters in the Configure Instance step. For more information, see Create a DSW instance.
Configure the required parameters and click Next.
Storage: Specify the dataset that you want to mount to the instance.
- If you set Resource Group to Public Resource Group, you can click Shared Datasets and select a NAS or OSS dataset that you want to mount.
- If you set Resource Group to a dedicated resource group, you can click Shared Datasets and select the dataset that you want to mount to persist data.
Note
- You cannot mount multiple datasets to the same directory.
- We recommend that you do not frequently perform write operations on directories to which OSS datasets are mounted.
- If you use a CPFS dataset, you must configure a virtual private cloud (VPC) for the instance. The VPC must be the same as the one used by the CPFS dataset. Otherwise, the DSW instance may fail to be created.
- NAS provides better support for the Filesystem in Userspace (FUSE) interface than OSS. Therefore, the first dataset that you add must be a NAS dataset, mounted either to a directory that you specify or to the default DSW working directory /home/admin/workspace.
For information about how to configure other parameters, see Create a DSW instance.
On the Confirm page, confirm the Datasets configuration and click Create Instance.
Creating the instance takes about 10 minutes. After the instance is created, it appears in the Running state.
Find the created instance and click Launch in the Actions column.
In the top navigation bar of the DSW page, click the Terminal tab.
On the Terminal tab, run the following commands to check whether the NAS and OSS datasets are mounted:
```
# Query the mount directory of a NAS dataset.
mount | grep nas
# Query the mount directory of an OSS dataset.
mount | grep oss
```
If a response similar to the following figure is returned, the datasets are mounted.
- NAS datasets are mounted to the /mnt/data_nas, /mnt/workspace, or /home/admin/workspace directory. /mnt/data_nas indicates the mount directory that you specified when you created the DSW instance. The other two directories are the default directories of DSW provided for your first NAS dataset. As long as your NAS resources and server work as expected, your data and code persist.
- OSS datasets are mounted to the /mnt/data_oss directory.
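
To go one step beyond listing the mounts, you can confirm that a mounted directory actually accepts reads and writes. The following is a minimal sketch, not part of PAI or DSW: the function names `list_mounts` and `check_read_write` and the probe file name are hypothetical, and the sketch assumes a Linux terminal in which `/proc/mounts` is available. The paths in the usage comments are the mount directories described above and may differ on your instance.

```shell
#!/bin/sh
# Sketch only: hypothetical helper functions for a DSW terminal session.

# Print the mount point of every entry in a mounts table that mentions a keyword.
# $1: keyword, such as nas or oss (matched case-insensitively against the whole line)
# $2: mounts table to read (assumption: defaults to /proc/mounts on Linux)
list_mounts() {
  awk -v kw="$1" 'index(tolower($0), tolower(kw)) { print $2 }' "${2:-/proc/mounts}"
}

# Write a small probe file into a directory, read it back, and delete it.
# Returns 0 only if the content that is read back matches what was written.
# $1: directory to test, for example /mnt/data_nas
check_read_write() {
  dir="$1"
  probe="$dir/.dsw_rw_probe_$$"   # hypothetical probe file name
  expected="probe-$$"
  [ -d "$dir" ] || return 1
  printf '%s\n' "$expected" > "$probe" 2>/dev/null || return 1
  actual=$(cat "$probe" 2>/dev/null)
  rm -f "$probe"
  [ "$actual" = "$expected" ]
}

# Example usage:
#   list_mounts nas
#   check_read_write /mnt/data_nas && echo "read/write OK"
```

Because frequent writes to directories that hold OSS datasets are not recommended, run the read/write probe against an OSS mount only as an occasional check.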