Access and store OSS data in PyTorch training jobs

Object Storage Service (OSS) Connector for AI/ML reads large-scale training data from OSS efficiently, which accelerates PyTorch model training and improves the performance of AI/ML tasks.

Benefits

Item	Without OSS Connector for AI/ML	With OSS Connector for AI/ML
Performance	You must manually optimize performance, which may be inefficient.	OSS Connector for AI/ML automatically optimizes the performance of OSS data download and checkpoint storage.
Data loading method	You must download data in advance, which increases costs and management workloads.	OSS Connector for AI/ML streams data as it is needed, which reduces cost and management complexity.
Data access	You must read and write data by using adapters, which increases access complexity.	OSS Connector for AI/ML directly reads and writes data in OSS to simplify access.
Configuration difficulty	You must compile code, which makes configuration difficult.	OSS Connector for AI/ML provides simple configuration options to improve development efficiency.

How it works

The following figure shows how OSS Connector for AI/ML works with PyTorch training jobs and OSS data.

Feature description

The following table describes the features of OSS Connector for AI/ML.

Item	Feature	Class	Method
Map-style dataset	Supports random access for quick retrieval of specific data during training.	OssMapDataset	The OssMapDataset and OssIterableDataset classes provide the same methods to build a dataset. from_prefix(): builds a dataset from an OSS_URI prefix; ideal when OSS data storage paths follow uniform rules. from_objects(): builds a dataset from a list of OSS_URIs; ideal when OSS data storage paths are known but scattered. from_manifest_file(): builds a dataset from a manifest file that you create; ideal for datasets that contain a large number of files (tens of millions) and are loaded frequently, when data indexing is enabled on the bucket.
Iterable-style dataset	Supports sequential streaming reads for efficiently processing large volumes of continuous data.	OssIterableDataset	Same methods as OssMapDataset: from_prefix(), from_objects(), and from_manifest_file().
Checkpoint API operations	Loads checkpoints from OSS during model training and saves checkpoints to OSS at regular intervals, simplifying the training workflow.	OssCheckpoint	OssCheckpoint(): initializes an OssCheckpoint object for reading and writing checkpoints during model training. reader(): reads checkpoints from OSS. writer(): writes checkpoints to OSS.
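
The difference between the two dataset styles in the table can be illustrated with a minimal sketch. It does not call OSS Connector for AI/ML: FAKE_BUCKET, MapStyleSketch, and IterableStyleSketch are hypothetical stand-ins that use an in-memory dictionary in place of a bucket, and the bucket name and object keys are invented. The sketch only mirrors the from_prefix() and from_objects() builder names from the table and the access protocols that PyTorch expects (`__len__` and `__getitem__` for map-style datasets, `__iter__` for iterable-style datasets). For the real parameters of each method, see the connector documentation.

```python
# Hypothetical in-memory stand-in for an OSS bucket: URI -> object bytes.
FAKE_BUCKET = {
    "oss://demo-bucket/train/0001.txt": b"cat",
    "oss://demo-bucket/train/0002.txt": b"dog",
    "oss://demo-bucket/val/0001.txt": b"bird",
}


class MapStyleSketch:
    """Random access: implements __len__ and __getitem__ (map-style protocol)."""

    def __init__(self, uris):
        self._uris = list(uris)

    @classmethod
    def from_prefix(cls, prefix):
        # Select every object whose URI starts with the prefix.
        return cls(sorted(u for u in FAKE_BUCKET if u.startswith(prefix)))

    @classmethod
    def from_objects(cls, uris):
        # Use an explicit list of URIs, in the order given.
        return cls(uris)

    def __len__(self):
        return len(self._uris)

    def __getitem__(self, index):
        uri = self._uris[index]
        return uri, FAKE_BUCKET[uri]


class IterableStyleSketch:
    """Sequential streaming: implements only __iter__ (iterable-style protocol)."""

    def __init__(self, uris):
        self._uris = list(uris)

    @classmethod
    def from_prefix(cls, prefix):
        return cls(sorted(u for u in FAKE_BUCKET if u.startswith(prefix)))

    def __iter__(self):
        # Objects are produced one at a time, in order, with no index lookup.
        for uri in self._uris:
            yield uri, FAKE_BUCKET[uri]


# Uniform path rule: everything under one prefix.
train_map = MapStyleSketch.from_prefix("oss://demo-bucket/train/")
second_item = train_map[1]  # jump straight to any index

# Known but scattered paths: pass the URIs explicitly.
picked = MapStyleSketch.from_objects(
    ["oss://demo-bucket/val/0001.txt", "oss://demo-bucket/train/0001.txt"]
)

# Streaming read of every object under the bucket root.
streamed = [data for _, data in IterableStyleSketch.from_prefix("oss://demo-bucket/")]
```

A map-style dataset suits shuffled sampling because any index can be fetched directly, whereas an iterable-style dataset suits a single ordered pass over a large volume of data.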

Procedure

Before you access and store data in OSS in a PyTorch training job, you must install and configure OSS Connector for AI/ML. For more information, see Install OSS Connector for AI/ML and Configure OSS Connector for AI/ML.
After you install and configure OSS Connector for AI/ML, you can perform the following operations in PyTorch training jobs:
- Use OssMapDataset to build a map-style dataset suitable for random reading. For more information, see Use data in OSS to build a map dataset suitable for random reading.
- Use OssIterableDataset to build an iterable-style dataset suitable for sequential streaming reading. For more information, see Build an iterable dataset for sequential stream reading from OSS data.
- Use OssCheckpoint to store and access checkpoints. For more information, see Store and access checkpoints in OSS.
Note: Map-style datasets, iterable-style datasets, and checkpoints all use the same data type. For more information about the methods that this data type supports, see Data type in OSS Connector for AI/ML.
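
The checkpoint workflow in the list above (write during training, read back when resuming) can be sketched as follows. This is an illustration under stated assumptions, not the connector's implementation: InMemoryCheckpoint is a hypothetical class that keeps objects in a dictionary, it assumes that writer() and reader() hand back file-like objects used in a with-block, and the URI is invented. pickle stands in for torch.save and torch.load, which also accept file-like objects, so the example runs without PyTorch.

```python
import io
import pickle
from contextlib import contextmanager


class InMemoryCheckpoint:
    """Hypothetical stand-in that mimics a reader()/writer() checkpoint interface."""

    def __init__(self):
        self._objects = {}  # URI -> bytes, standing in for an OSS bucket

    @contextmanager
    def writer(self, uri):
        buffer = io.BytesIO()
        yield buffer
        # Reached only if the with-block finished without raising,
        # so a failed save never leaves a partial object behind.
        self._objects[uri] = buffer.getvalue()

    @contextmanager
    def reader(self, uri):
        yield io.BytesIO(self._objects[uri])


checkpoint = InMemoryCheckpoint()
state = {"epoch": 3, "weights": [0.1, 0.2, 0.3]}  # placeholder for a model state dict
uri = "oss://demo-bucket/checkpoints/epoch-3.pkl"  # invented URI

# Save at the end of an epoch.
with checkpoint.writer(uri) as w:
    pickle.dump(state, w)

# Load when resuming training.
with checkpoint.reader(uri) as r:
    restored = pickle.load(r)
```

In a real training job, the state dictionary would come from the model and optimizer, and the object would be stored in OSS through OssCheckpoint rather than in local memory.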

Use cases

For a quick start, OSS Connector for AI/ML provides a demo that trains a handwritten digit recognition model using OSS data and saves the training results to OSS. For more information, see Get started with OSS Connector for AI/ML.
To further improve performance, use the accelerated endpoint of an OSS accelerator instead of the OSS internal endpoint. For a performance comparison, see Performance Tests.
To use OSS Connector for AI/ML in a containerized environment, you can use a Docker image that contains an OSS Connector for AI/ML environment. For more information about how to build a Docker image, see Build a Docker image that contains an OSS Connector for AI/ML environment.