Object Storage Service (OSS) Connector for AI/ML lets PyTorch training jobs read large-scale training data efficiently from OSS, which accelerates model training and improves the performance of AI/ML tasks.
Benefits
| Item | Without OSS Connector for AI/ML | With OSS Connector for AI/ML |
| --- | --- | --- |
| Performance | You must manually optimize performance, which may be inefficient. | OSS Connector for AI/ML automatically optimizes the performance of OSS data download and checkpoint storage. |
| Data loading method | You must download data in advance, which increases costs and management workloads. | OSS Connector for AI/ML supports stream load to reduce cost and management complexity. |
| Data access | You must read and write data by using adapters, which increases access complexity. | OSS Connector for AI/ML directly reads and writes data in OSS to simplify access. |
| Configuration difficulty | You must compile code, which makes configuration difficult. | OSS Connector for AI/ML provides simple configuration options to improve development efficiency. |
How it works
The following figure shows how OSS Connector for AI/ML works with PyTorch training jobs and OSS data.
Feature description
The following table describes the features of OSS Connector for AI/ML.
| Item | Feature | Class | Method |
| --- | --- | --- | --- |
| Map-style dataset | Supports random access for quick retrieval of specific data during training. | OssMapDataset | The OssMapDataset and OssIterableDataset classes provide the same methods to build a dataset. |
| Iterable-style dataset | Supports sequential streaming reads for efficiently processing large volumes of continuous data. | OssIterableDataset | |
| Checkpoint API operations | Loads checkpoints from OSS during model training and saves checkpoints to OSS at regular intervals, simplifying the training workflow. | OssCheckpoint | |
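The difference between the two dataset styles described above can be sketched in plain Python. This is a minimal illustration of the access patterns only, not the connector's implementation: `FAKE_BUCKET`, `MapStyleObjects`, and `IterableStyleObjects` are hypothetical names, and an in-memory dictionary stands in for objects stored in OSS.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for a bucket: object key -> object bytes.
FAKE_BUCKET: Dict[str, bytes] = {
    "train/0001.bin": b"sample-1",
    "train/0002.bin": b"sample-2",
    "train/0003.bin": b"sample-3",
    "val/0001.bin": b"held-out",
}


class MapStyleObjects:
    """Map-style: indexable, so a sampler can request any item in any order."""

    def __init__(self, bucket: Dict[str, bytes], prefix: str) -> None:
        self._bucket = bucket
        # Listing the keys up front is what makes access by index possible.
        self._keys: List[str] = sorted(k for k in bucket if k.startswith(prefix))

    def __len__(self) -> int:
        return len(self._keys)

    def __getitem__(self, index: int) -> Tuple[str, bytes]:
        key = self._keys[index]
        return key, self._bucket[key]


class IterableStyleObjects:
    """Iterable-style: yields objects one by one, with no length or indexing."""

    def __init__(self, bucket: Dict[str, bytes], prefix: str) -> None:
        self._bucket = bucket
        self._prefix = prefix

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        for key in sorted(self._bucket):
            if key.startswith(self._prefix):
                yield key, self._bucket[key]


map_ds = MapStyleObjects(FAKE_BUCKET, "train/")
iter_ds = IterableStyleObjects(FAKE_BUCKET, "train/")

random_pick = map_ds[2]                  # jump straight to the third object
streamed = [key for key, _ in iter_ds]   # consume objects in sequence
```

A map-style dataset suits shuffled training because any index can be fetched directly, while an iterable-style dataset suits sequential streaming because it does not need random access to its items.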
Procedure
- Before you access and store data in OSS in a PyTorch training job, you must install and configure OSS Connector for AI/ML. For more information, see Install OSS Connector for AI/ML and Configure OSS Connector for AI/ML.
- After you install and configure OSS Connector for AI/ML, you can perform the following operations in PyTorch training jobs:
  - Use OssMapDataset to build a map-style dataset suitable for random reading. For more information, see Use data in OSS to build a map dataset suitable for random reading.
  - Use OssIterableDataset to build an iterable-style dataset suitable for sequential streaming reading. For more information, see Build an iterable dataset for sequential stream reading from OSS data.
  - Use OssCheckpoint to store and access checkpoints. For more information, see Store and access checkpoints in OSS.

Note
Data in map-style and iterable-style datasets and checkpoints is of the same type. For more information about the supported methods of the data type, see Data type in OSS Connector for AI/ML.
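The checkpoint step above, saving at regular intervals and loading the latest checkpoint later, can be sketched as follows. This is a self-contained illustration under stated assumptions, not the OssCheckpoint interface: `InMemoryCheckpointStore` and its `writer`/`reader` context managers are hypothetical, `pickle` stands in for PyTorch serialization, and the `oss://example-bucket/...` URIs are placeholders. For the actual interface, see Store and access checkpoints in OSS.

```python
import io
import pickle
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class InMemoryCheckpointStore:
    """Hypothetical stand-in for an object store; keeps checkpoint bytes in a dict."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    @contextmanager
    def writer(self, uri: str) -> Iterator[io.BytesIO]:
        buffer = io.BytesIO()
        yield buffer
        # Reached only if the caller's block finished without raising,
        # so a failed write never leaves a partial checkpoint behind.
        self._objects[uri] = buffer.getvalue()

    @contextmanager
    def reader(self, uri: str) -> Iterator[io.BytesIO]:
        yield io.BytesIO(self._objects[uri])


def save_checkpoint(store: InMemoryCheckpointStore, uri: str, state: Any) -> None:
    with store.writer(uri) as stream:
        pickle.dump(state, stream)  # pickle stands in for a framework save call


def load_checkpoint(store: InMemoryCheckpointStore, uri: str) -> Any:
    with store.reader(uri) as stream:
        return pickle.load(stream)


store = InMemoryCheckpointStore()
CHECKPOINT_INTERVAL = 2  # save every second epoch (illustrative value)
saved: List[str] = []

for epoch in range(1, 5):
    state = {"epoch": epoch, "step": epoch * 100}  # placeholder training state
    if epoch % CHECKPOINT_INTERVAL == 0:
        uri = f"oss://example-bucket/checkpoints/epoch_{epoch}.ckpt"
        save_checkpoint(store, uri, state)
        saved.append(uri)

# Resume from the most recent checkpoint.
restored = load_checkpoint(store, saved[-1])
```

Exposing the checkpoint as a file-like stream keeps the training loop independent of where the bytes are stored, so the same loop could write to local disk or to object storage.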
Use cases
- For a quick start, OSS Connector for AI/ML provides a demo that trains a handwritten digit recognition model using OSS data and saves the training results to OSS. For more information, see Get started with OSS Connector for AI/ML.
- To further improve performance, use the accelerated endpoint of an OSS accelerator instead of the OSS internal endpoint. For a performance comparison, see Performance Tests.
- To use OSS Connector for AI/ML in a containerized environment, you can use a Docker image that contains an OSS Connector for AI/ML environment. For more information about how to build a Docker image, see Build a Docker image that contains an OSS Connector for AI/ML environment.