Efficient data loading is critical for large-scale machine learning and deep learning workloads. This topic compares the performance of different dataset construction methods (OssIterableDataset, OssMapDataset, and ossfs with ImageFolder) when accessing data through the OSS internal endpoint and the OSS transfer acceleration endpoint.
Test setup
- Test scenario: Performance and peak performance tests were conducted for datasets constructed in different ways, using both the OSS internal endpoint and the OSS transfer acceleration endpoint.
- Test data: 10,000,000 images, approximately 100 KB each, totaling about 1 TB.
- Test environment: g7nex, network-enhanced general-purpose instance family, 128 vCPUs, 512 GB memory, 160 Gbps internal bandwidth.
- Dataset construction methods: OssIterableDataset and OssMapDataset were constructed using OSS Connector for AI/ML. The ossfs with ImageFolder dataset was constructed by mounting a remote bucket with ossfs 1.0.
Performance test
- Test parameters

| Parameter | Value | Description |
| --- | --- | --- |
| dataloader batch size | 256 | Each batch processes 256 samples. |
| dataloader workers | 32 | 32 worker processes load data in parallel. |
| transform | See the code below. | Data preprocessing is applied. |

```python
import io

from PIL import Image
from torchvision import transforms

trans = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def transform(object):
    # Decode the OSS object body into an RGB image, then preprocess it.
    img = Image.open(io.BytesIO(object.read())).convert('RGB')
    val = trans(img)
    return val, object.label
```
- Test results

| Dataset construction method | Dataset type | Time to load 10 million images (OSS internal endpoint) | Time to load 10 million images (transfer acceleration endpoint) |
| --- | --- | --- | --- |
| OSS Connector for AI/ML | OssIterableDataset | 2,182 seconds | 2,107 seconds |
| OSS Connector for AI/ML | OssMapDataset | 2,493 seconds | 2,288 seconds |
| ossfs | ImageFolder | 178,571 seconds | 39,840 seconds |
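As a quick sanity check, the speedup factors discussed in the analysis section can be recomputed directly from the timings in the table above (plain arithmetic on the measured values; no new data):

```python
# Timings in seconds, taken from the performance test results table.
ossfs_internal = 178_571      # ossfs with ImageFolder, internal endpoint
ossfs_accel = 39_840          # ossfs with ImageFolder, acceleration endpoint
iterable_internal = 2_182     # OssIterableDataset, internal endpoint
iterable_accel = 2_107        # OssIterableDataset, acceleration endpoint

# Speedup of OSS Connector over ossfs on the internal endpoint (~81.8x).
speedup_internal = ossfs_internal / iterable_internal

# Speedup when both approaches use the acceleration endpoint (~18.9x).
speedup_accel = ossfs_accel / iterable_accel

print(f"internal: {speedup_internal:.1f}x, accelerated: {speedup_accel:.1f}x")
```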
Peak performance test
- Test parameters

| Parameter | Value | Description |
| --- | --- | --- |
| dataloader batch size | 256 | Each batch processes 256 samples. |
| dataloader workers | 32 | 32 worker processes load data in parallel. |
| transform | See the code below. | No data preprocessing is applied. |

```python
def transform(object):
    # Read the object body but skip all preprocessing.
    data = object.read()
    return object.key, object.label
```
- Test results

| Dataset construction method | Dataset type | Time to load 10 million images (OSS internal endpoint) | Time to load 10 million images (transfer acceleration endpoint) |
| --- | --- | --- | --- |
| OSS Connector for AI/ML | OssIterableDataset | 100 seconds | 81 seconds |
| OSS Connector for AI/ML | OssMapDataset | 176 seconds | 127 seconds |
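The fastest peak result implies sustained throughput well within the instance's 160 Gbps network limit. A rough back-of-the-envelope check, assuming the stated average object size of 100 KB (taken here as a decimal 100,000 bytes, an approximation):

```python
# Dataset size, from the test setup: 10 million images at ~100 KB each.
num_images = 10_000_000
image_size_bytes = 100_000                     # ~100 KB per image (assumed decimal)
total_bytes = num_images * image_size_bytes    # ~1 TB in total

# Fastest peak result: OssIterableDataset over the acceleration endpoint.
elapsed_s = 81
throughput_gbps = total_bytes * 8 / elapsed_s / 1e9

print(f"~{throughput_gbps:.0f} Gbit/s sustained, "
      f"within the instance's 160 Gbit/s bandwidth")
```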
Analysis
The performance test results show that OssIterableDataset and OssMapDataset built with OSS Connector load the dataset approximately 81 times faster than the traditional ossfs with ImageFolder approach over the OSS internal endpoint (2,182 seconds versus 178,571 seconds). Even when both use the OSS transfer acceleration endpoint, OSS Connector remains about 18 times faster (2,107 seconds versus 39,840 seconds). This demonstrates a significant advantage in data processing speed and model training efficiency.
The peak performance test results show that enabling the OSS transfer acceleration endpoint reduces loading time by up to approximately 1.4 times compared with the OSS internal endpoint (176 seconds versus 127 seconds for OssMapDataset). OSS Connector handles high-concurrency, high-bandwidth access without the accelerator, and delivers even stronger performance when paired with it.
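The ~1.4x acceleration figure corresponds to the OssMapDataset rows of the peak test; the same arithmetic for OssIterableDataset gives a smaller but still notable gain (plain ratios of the measured peak timings):

```python
# Peak test timings in seconds, from the peak performance results table.
map_internal, map_accel = 176, 127
iter_internal, iter_accel = 100, 81

map_speedup = map_internal / map_accel     # ~1.39x for OssMapDataset
iter_speedup = iter_internal / iter_accel  # ~1.23x for OssIterableDataset

print(f"OssMapDataset: {map_speedup:.2f}x, OssIterableDataset: {iter_speedup:.2f}x")
```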
Conclusion
By using OSS Connector for AI/ML in Python projects, you can stream OSS objects with minimal setup. In most large-scale training scenarios, OSS Connector substantially improves data access efficiency. For workloads that require even higher throughput, combine OSS Connector for AI/ML with the OSS transfer acceleration endpoint.