Mounting Object Storage Service (OSS) in Deep Learning Containers (DLC) - Platform For AI

When you submit a Deep Learning Containers (DLC) training job, you can configure Object Storage Service (OSS), File Storage NAS (NAS), Cloud Parallel File Storage (CPFS), or MaxCompute storage by using code configuration or mounts. This lets you directly read data from and write data to the specified storage during the training process.

Prerequisites

You have activated PAI (DLC) and created a workspace. For more information, see Activate PAI and create a default workspace.
(Optional) To use Object Storage Service (OSS) storage, complete the following steps:
- Activate OSS and grant the required permissions to PAI. For more information, see Activate OSS and Authorize PAI service accounts.
- Create an OSS bucket. For more information, see Console quick start.
(Optional) To use File Storage NAS (NAS), you must create a general-purpose NAS file system. For more information, see Create a file system.
(Optional) To use MaxCompute storage, you must activate MaxCompute and create a MaxCompute project. For more information, see Activate MaxCompute and Create a MaxCompute project.

Use OSS storage

Configure OSS storage with mounts

When you create a Deep Learning Containers (DLC) training job, you can mount Object Storage Service (OSS). The following mount types are supported. For configuration details, see Create a training job.

| Mount type | Description |
| --- | --- |
| Mount dataset | Mount a dataset, which can be a custom dataset or a public dataset. Public datasets support only read-only mounts. For custom datasets from Object Storage Service (OSS), you can use the Read-only switch to set read and write permissions. Select a dataset of the OSS type and configure the Mount Path. When the DLC job runs, the system accesses the OSS data at this path. |
| Mount storage | Mount an OSS bucket path and use the Read-only switch to set read and write permissions. |

DLC supports JindoFuse and ossfs for mounting OSS:

- JindoFuse: This is the default mount method. Its default configuration has functional limitations and may not suit all scenarios, but you can adjust the parameters for specific scenarios. For more information, see JindoFuse.
- ossfs: When you mount an OSS bucket path by using the Mount storage type, you can specify {"mountType":"ossfs"} in Advanced Settings to use ossfs.
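
Once a dataset or bucket path is mounted, training code can treat the Mount Path like a local directory. The following is a minimal sketch of that idea, not an official example: the helper names, the DATA_MOUNT environment variable, and the /mnt/data/ path are assumptions chosen for illustration.

```python
import os
from pathlib import Path


def list_mounted_files(mount_path, suffix=""):
    """Return sorted relative paths of files under mount_path that end with suffix."""
    root = Path(mount_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Mount path not found: {root}")
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    )


def is_writable(mount_path):
    """Best-effort check of whether the mount accepts writes (Read-only switch off)."""
    return os.access(mount_path, os.W_OK)


if __name__ == "__main__":
    # /mnt/data/ is a hypothetical Mount Path; use the one configured on the job.
    mount = os.environ.get("DATA_MOUNT", "/mnt/data/")
    if os.path.isdir(mount):
        print(list_mounted_files(mount, ".csv")[:10])
        print("writable:", is_writable(mount))
```

The writability check is only a hint. A FUSE mount may report permissions differently from a local disk, so a job that must write should still handle write errors.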

Configure OSS storage without mounts

DLC jobs can read from and write to Object Storage Service (OSS) using the OSS PyTorch Connector or an OSS SDK. When you create a training job, configure the relevant code files. For code examples, see OSS Connector for AI/ML or OSS SDK.

In the Code Configuration section, on the Online Configuration tab, select the code source from the drop-down list and enter the Mount Path for your code.
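
As a rough illustration of the no-mount approach, the sketch below uses the OSS Python SDK (the oss2 package) to upload and re-download an object. It assumes oss2 is installed in the job image and that credentials are supplied through the environment variables OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET. Those variable names, the helper functions, and the checkpoint key layout are illustrative assumptions, not part of the product.

```python
import os


def checkpoint_key(prefix, step):
    """Build an object key such as 'runs/exp1/ckpt-000100.pt' (hypothetical naming)."""
    return f"{prefix.strip('/')}/ckpt-{step:06d}.pt"


def open_bucket(endpoint, bucket_name):
    """Create an oss2.Bucket using credentials read from environment variables."""
    import oss2  # imported lazily so the other helpers work without the SDK

    auth = oss2.Auth(
        os.environ["OSS_ACCESS_KEY_ID"],      # assumed variable name
        os.environ["OSS_ACCESS_KEY_SECRET"],  # assumed variable name
    )
    return oss2.Bucket(auth, endpoint, bucket_name)


def save_and_reload(bucket, key, payload):
    """Upload bytes to OSS, then download the same object and return its content."""
    bucket.put_object(key, payload)
    return bucket.get_object(key).read()
```

In a job, open_bucket would be called with the bucket's endpoint and name, and the returned object passed to save_and_reload. Reading through the SDK avoids the mount layer but requires the code to manage credentials and object keys itself.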

Use NAS/CPFS storage

When you create a Deep Learning Containers (DLC) job, you can use File Storage NAS (NAS) or Cloud Parallel File Storage (CPFS) storage by binding a custom dataset of the NAS/CPFS type or by using a storage mount. For configuration details, see Use NAS/CPFS.

In the Dataset Mount section, set the Mount Path, for example, /mnt/data/. In the Storage Mount section, configure Select File System, File System Mount Point, File System Path, and Mount Path (for example, /mnt/data2/) in sequence. If the selected storage type has a single-availability-zone risk, an orange warning appears.

| Mount type | Description |
| --- | --- |
| Mount dataset | Mount a custom dataset. You can use the Read-only switch to set read and write permissions. |
| Mount storage | Mount a File Storage NAS (NAS) or Cloud Parallel File Storage (CPFS) file system and use the Read-only switch to set read and write permissions. |

Additionally, in Advanced Settings, you can set the nconnect parameter to improve throughput when DLC containers access NAS. nconnect is a Linux mount option for NFS clients that increases throughput by establishing more TCP transport connections between the client and the server. For more information, see How do I resolve poor performance when accessing NAS from a Linux operating system? Example:

// Replace <SampleValue> with a positive integer. 
{"nconnect":"<SampleValue>"}
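
If the Advanced Settings value is generated by a script, a small helper can validate the number before serializing it. This is only a sketch: the function name is hypothetical, and it assumes the value should be written as a string, as in the example above.

```python
import json


def nconnect_setting(connections):
    """Return the Advanced Settings JSON string for a given nconnect value."""
    if (
        isinstance(connections, bool)
        or not isinstance(connections, int)
        or connections < 1
    ):
        raise ValueError("nconnect must be a positive integer")
    # The value is serialized as a string, matching the example above.
    return json.dumps({"nconnect": str(connections)}, separators=(",", ":"))


print(nconnect_setting(4))  # prints {"nconnect":"4"}
```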

Use MaxCompute storage

You can use MaxCompute storage without mounts. When you create a training job, configure the relevant code files. For code examples, see Use MaxCompute.

FAQ

Q: PAIIO job 'killed' without error

This issue is usually caused by insufficient memory. PAIIO does not limit its own memory usage, so it can consume a large amount of memory when reading MaxCompute data. Because the operating system and other system components also consume memory, the container can run out of memory, at which point the operating system most likely terminates the process.
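
To check whether a job is approaching its memory limit before it is killed, one option is to read the container's cgroup memory counters from inside the training code. The sketch below assumes a Linux container that exposes cgroup v2 or cgroup v1 files under /sys/fs/cgroup. The function names and the 0.9 threshold are illustrative, and this is a generic diagnostic rather than a PAIIO feature.

```python
from pathlib import Path

# (limit file, usage file) for cgroup v2 first, then cgroup v1.
_CGROUP_FILES = [
    ("memory.max", "memory.current"),
    ("memory/memory.limit_in_bytes", "memory/memory.usage_in_bytes"),
]


def container_memory(cgroup_root="/sys/fs/cgroup"):
    """Return (usage_bytes, limit_bytes), or None if the counters are unavailable.

    limit_bytes is None when cgroup v2 reports the limit as "max" (unlimited).
    """
    root = Path(cgroup_root)
    for limit_name, usage_name in _CGROUP_FILES:
        limit_file, usage_file = root / limit_name, root / usage_name
        if limit_file.is_file() and usage_file.is_file():
            raw_limit = limit_file.read_text().strip()
            limit = None if raw_limit == "max" else int(raw_limit)
            return int(usage_file.read_text().strip()), limit
    return None


def memory_pressure(threshold=0.9, cgroup_root="/sys/fs/cgroup"):
    """Return True when usage is at least threshold * limit, else False."""
    stats = container_memory(cgroup_root)
    if stats is None or stats[1] is None:
        return False  # cannot be determined, or no limit is set
    usage, limit = stats
    return usage >= threshold * limit
```

Logging the result of container_memory() periodically while reading data can help confirm whether memory growth precedes the 'killed' message.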