Efficient data loading is critical for large-scale machine learning and deep learning workloads. This topic compares the performance of different dataset construction methods (OssIterableDataset, OssMapDataset, and ossfs with ImageFolder) when accessing data through the OSS internal endpoint and the OSS transfer acceleration endpoint.
Test setup
- Test scenario: Performance and peak performance tests were conducted for datasets constructed in different ways, using both the OSS internal endpoint and the OSS transfer acceleration endpoint.
- Test data: 10,000,000 images, approximately 100 KB each, totaling about 1 TB.
- Test environment: g7nex, network-enhanced general-purpose instance family, 128 vCPUs, 512 GB memory, 160 Gbps internal bandwidth.
- Dataset construction methods: OssIterableDataset and OssMapDataset were constructed using OSS Connector for AI/ML. The ossfs with ImageFolder dataset was constructed by mounting a remote bucket with ossfs 1.0.
Performance test
- Test parameters

| Parameter | Value | Description |
| --- | --- | --- |
| dataloader batch size | 256 | Each batch processes 256 samples. |
| dataloader workers | 32 | 32 worker processes load data in parallel. |
| transform | See the code below. | Data preprocessing is applied. |

```python
import io

from PIL import Image
from torchvision import transforms

trans = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def transform(object):
    # Decode the OSS object body into an RGB image, then preprocess it.
    img = Image.open(io.BytesIO(object.read())).convert('RGB')
    val = trans(img)
    return val, object.label
```
- Test results

| Dataset construction method | Dataset type | Time to load 10 million images (OSS internal endpoint) | Time to load 10 million images (transfer acceleration endpoint) |
| --- | --- | --- | --- |
| OSS Connector for AI/ML | OssIterableDataset | 2,182 seconds | 2,107 seconds |
| OSS Connector for AI/ML | OssMapDataset | 2,493 seconds | 2,288 seconds |
| ossfs | ImageFolder | 178,571 seconds | 39,840 seconds |
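As a quick sanity check, the speedup factors discussed in the analysis section can be recomputed directly from the timings in the table above (plain arithmetic on the measured values; no new data):

```python
# Timings in seconds, taken from the performance test results table.
ossfs_internal = 178_571      # ossfs with ImageFolder, internal endpoint
ossfs_accel = 39_840          # ossfs with ImageFolder, acceleration endpoint
iterable_internal = 2_182     # OssIterableDataset, internal endpoint
iterable_accel = 2_107        # OssIterableDataset, acceleration endpoint

# Speedup of OSS Connector over ossfs on the internal endpoint (~81.8x).
speedup_internal = ossfs_internal / iterable_internal

# Speedup when both approaches use the acceleration endpoint (~18.9x).
speedup_accel = ossfs_accel / iterable_accel

print(f"internal: {speedup_internal:.1f}x, accelerated: {speedup_accel:.1f}x")
```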
Peak performance test
- Test parameters

| Parameter | Value | Description |
| --- | --- | --- |
| dataloader batch size | 256 | Each batch processes 256 samples. |
| dataloader workers | 32 | 32 worker processes load data in parallel. |
| transform | See the code below. | No data preprocessing is applied. |

```python
def transform(object):
    # Read the object body but skip all preprocessing.
    data = object.read()
    return object.key, object.label
```
- Test results

| Dataset construction method | Dataset type | Time to load 10 million images (OSS internal endpoint) | Time to load 10 million images (transfer acceleration endpoint) |
| --- | --- | --- | --- |
| OSS Connector for AI/ML | OssIterableDataset | 100 seconds | 81 seconds |
| OSS Connector for AI/ML | OssMapDataset | 176 seconds | 127 seconds |
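The fastest peak result implies sustained throughput well within the instance's 160 Gbps network limit. A rough back-of-the-envelope check, assuming the stated average object size of 100 KB (taken here as a decimal 100,000 bytes, an approximation):

```python
# Dataset size, from the test setup: 10 million images at ~100 KB each.
num_images = 10_000_000
image_size_bytes = 100_000                     # ~100 KB per image (assumed decimal)
total_bytes = num_images * image_size_bytes    # ~1 TB in total

# Fastest peak result: OssIterableDataset over the acceleration endpoint.
elapsed_s = 81
throughput_gbps = total_bytes * 8 / elapsed_s / 1e9

print(f"~{throughput_gbps:.0f} Gbit/s sustained, "
      f"within the instance's 160 Gbit/s bandwidth")
```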
Analysis
The performance test results show that OssIterableDataset and OssMapDataset built with OSS Connector load the dataset approximately 81 times faster than the traditional ossfs with ImageFolder approach over the OSS internal endpoint (2,182 seconds versus 178,571 seconds). Even when both use the OSS transfer acceleration endpoint, OSS Connector remains about 18 times faster (2,107 seconds versus 39,840 seconds). This demonstrates a significant advantage in data processing speed and model training efficiency.
The peak performance test results show that enabling the OSS transfer acceleration endpoint reduces loading time by up to approximately 1.4 times compared with the OSS internal endpoint (176 seconds versus 127 seconds for OssMapDataset). OSS Connector handles high-concurrency, high-bandwidth access without the accelerator, and delivers even stronger performance when paired with it.
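The ~1.4x acceleration figure corresponds to the OssMapDataset rows of the peak test; the same arithmetic for OssIterableDataset gives a smaller but still notable gain (plain ratios of the measured peak timings):

```python
# Peak test timings in seconds, from the peak performance results table.
map_internal, map_accel = 176, 127
iter_internal, iter_accel = 100, 81

map_speedup = map_internal / map_accel     # ~1.39x for OssMapDataset
iter_speedup = iter_internal / iter_accel  # ~1.23x for OssIterableDataset

print(f"OssMapDataset: {map_speedup:.2f}x, OssIterableDataset: {iter_speedup:.2f}x")
```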
Conclusion
By using OSS Connector for AI/ML in Python projects, you can stream OSS objects with minimal setup. In most large-scale training scenarios, OSS Connector substantially improves data access efficiency. For workloads that require even higher throughput, combine OSS Connector for AI/ML with the OSS transfer acceleration endpoint.