OSS accelerator caches OSS data at edge nodes close to your compute resources, reducing the distance data must travel on each request. This page shows benchmark results across five workloads so you can estimate the impact for your scenario.
The gains range from roughly 1.2× to 10×. The higher your data volume, request concurrency, and throughput demand, the more headroom OSS accelerator provides.
Benchmark summary
| Workload | Speedup vs. OSS | Key metric |
|---|---|---|
| Batch downloads (ossutil) | ~10× faster | 2.2 MB/s → 24 MB/s |
| ML/DL training data reads | ~1.2–1.4× faster | Up to 123,043 img/s |
| Download response latency | ~10× lower latency | P50 and P999 both reduced |
| Data lake queries (analytics) | 2–2.6× faster (large scans) | 87–97% of local ESSD CacheFS speed |
| Simulation training (containers) | ~70% shorter training time | 100 Gbps → 300 Gbps peak bandwidth |
Batch downloads with ossutil
Test setup: ossutil cp command downloading 10,000 objects (100 KB each, 976 MB total) from an OSS bucket to a local computer. Compares the OSS internal endpoint against the OSS accelerator accelerated endpoint with data preloading enabled.
| Tool | OSS internal endpoint | OSS accelerator accelerated endpoint |
|---|---|---|
| ossutil | 2.2 MB/s | 24 MB/s |
Result: ~10× faster. OSS accelerator significantly improves throughput for batch data transfers using tools like ossutil.
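To put the two throughput figures in wall-clock terms, here is a quick back-of-the-envelope calculation (illustrative only, using the numbers from the table above):

```python
# Estimate total transfer time for the 976 MB batch from the measured throughputs.
TOTAL_MB = 976
DIRECT_MBPS = 2.2        # OSS internal endpoint
ACCELERATED_MBPS = 24.0  # OSS accelerator accelerated endpoint

direct_s = TOTAL_MB / DIRECT_MBPS            # ~444 s (about 7.4 minutes)
accelerated_s = TOTAL_MB / ACCELERATED_MBPS  # ~41 s
speedup = direct_s / accelerated_s           # ~10.9x

print(f"direct: {direct_s:.0f} s, accelerated: {accelerated_s:.0f} s, "
      f"speedup: {speedup:.1f}x")
```

For a one-off 976 MB download the difference is minutes versus under a minute; for recurring batch jobs the saved time compounds with every run.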
Machine learning and deep learning
Test setup: Reading data from OssIterableDataset and OssMapDataset datasets created by OSS Connector for AI/ML. Dataset: 10,000,000 objects averaging 100 KB each (1 TB total).
| Parameter | Value |
|---|---|
| Dataloader batch size | 256 |
| Dataloader workers | 32 |
| Transform | No preprocessing (object.read(), returns object.key and object.label) |

| Dataset type | OSS internal endpoint | OSS accelerator accelerated endpoint |
|---|---|---|
| OssIterableDataset | 99,920 img/s | 123,043 img/s |
| OssMapDataset | 56,564 img/s | 78,264 img/s |
Result: ~1.2–1.4× faster (123,043 vs. 99,920 img/s for OssIterableDataset; 78,264 vs. 56,564 img/s for OssMapDataset). OSS Connector for AI/ML already sustains high-concurrency, high-bandwidth access on its own, so the relative gain is smaller than in other workloads; OSS accelerator adds throughput on top of an already fast baseline.
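The per-dataset speedups follow directly from the table. A small illustrative calculation:

```python
# Speedup ratios computed from the benchmark table: (internal endpoint, accelerated) in img/s.
results = {
    "OssIterableDataset": (99_920, 123_043),
    "OssMapDataset": (56_564, 78_264),
}

speedups = {name: accelerated / baseline
            for name, (baseline, accelerated) in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x with OSS accelerator")
```

OssMapDataset sees the larger relative gain because its random-access pattern benefits more from caching than the sequential reads of OssIterableDataset.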
Download response latency
Test setup: Downloading 10 MB objects multiple times, measuring response latency in milliseconds with OSS accelerator disabled (direct OSS access) and enabled.
P50 is the 50th percentile — half of all requests complete within this time. P999 is the 99.9th percentile — effectively the worst-case tail latency. Tail latency matters for interactive workloads: even if median response is fast, slow outliers degrade the user experience.
Result: ~10× lower latency. The improvement appears at both P50 and P999, so OSS accelerator reduces both typical and worst-case response times.
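If you want to measure P50 and P999 for your own workload, both can be read off a sorted sample of per-request latencies. A minimal sketch using synthetic data (the latency values below are invented for illustration, not the benchmark measurements):

```python
import random

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value below which a fraction p of samples fall."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, int(p * len(ordered) + 0.5) - 1))
    return ordered[rank]

random.seed(0)
# Synthetic latencies in ms: mostly fast requests, plus a 0.2% slow tail.
latencies = [random.uniform(5, 15) for _ in range(9_980)] + \
            [random.uniform(100, 300) for _ in range(20)]

p50 = percentile(latencies, 0.50)
p999 = percentile(latencies, 0.999)
print(f"P50 = {p50:.1f} ms, P999 = {p999:.1f} ms")
```

Note how a handful of outliers leaves P50 untouched but dominates P999; that is why the benchmark reports both.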
Data lakes and data warehouses
Test setup: Query performance on a lineitem table (~2 billion rows, 760 GB), comparing local ESSD CacheFS, direct OSS access, and OSS accelerator.
| Scenario | Local ESSD CacheFS | OSS | OSS accelerator |
|---|---|---|---|
| Point queries | 382 ms | 2,451 ms | 1,160 ms |
| Random queries on 1,000 rows | 438 ms | 3,786 ms | 1,536 ms |
| Random queries on 10% of data | 130,564 ms | 345,707 ms | 134,659 ms |
| Full scan | 171,548 ms | 398,681 ms | 197,134 ms |
Results:
Full scans and large random queries (10% of data): OSS accelerator is 2–2.6× faster than direct OSS access (398,681 ms → 197,134 ms for full scans; 345,707 ms → 134,659 ms for 10% random queries) and reaches 87–97% of local ESSD CacheFS performance.
Point queries and small random queries (1,000 rows): OSS accelerator is ~2–2.5× faster than direct OSS access but reaches only ~30% of local ESSD CacheFS performance. The fixed per-request latency of 8–10 ms caps the gains when each request reads very little data.
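The per-scenario speedups and CacheFS ratios can be derived directly from the query table. An illustrative calculation (times in ms; lower is better):

```python
# Columns per scenario: (local ESSD CacheFS, direct OSS, OSS accelerator), all in ms.
queries = {
    "point":      (382,     2_451,   1_160),
    "random 1k":  (438,     3_786,   1_536),
    "random 10%": (130_564, 345_707, 134_659),
    "full scan":  (171_548, 398_681, 197_134),
}

# speedup vs. direct OSS, and fraction of local CacheFS performance.
stats = {name: (oss / accel, cachefs / accel)
         for name, (cachefs, oss, accel) in queries.items()}

for name, (speedup, vs_local) in stats.items():
    print(f"{name}: {speedup:.1f}x vs OSS, {vs_local:.0%} of CacheFS")
```

The pattern is clear: the larger the scan, the closer OSS accelerator gets to local-disk performance, because fixed per-request latency is amortized over more data.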
Simulation training for containers and autonomous driving
Test setup: Simultaneous container startup to pull images, maps, and log data for simulation training. Total OSS data: 204 TB.
| Storage configuration | Data volume | Peak bandwidth | Training duration |
|---|---|---|---|
| OSS only | 204 TB | 100 Gbps | 2.2 hours |
| OSS + OSS accelerator | 204 TB (OSS) + 128 TB (OSS accelerator cache) | 300 Gbps | 40 minutes |
Result: ~70% reduction in total training time (2.2 hours, or about 132 minutes, down to 40 minutes). The 3× increase in peak bandwidth, from 100 Gbps to 300 Gbps, drives the speedup when many containers read data simultaneously at startup.
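As a cross-check, the reduction implied by the table figures can be computed directly:

```python
# Training-time reduction implied by the table above.
baseline_min = 2.2 * 60       # OSS only: 2.2 hours, in minutes
accelerated_min = 40          # OSS + OSS accelerator
reduction = 1 - accelerated_min / baseline_min

bandwidth_gain = 300 / 100    # peak bandwidth: 100 Gbps -> 300 Gbps

print(f"training time reduced by {reduction:.0%}, "
      f"peak bandwidth up {bandwidth_gain:.0f}x")
```

The time saving (~70%) is somewhat smaller than the bandwidth gain (3×) because startup work other than data transfer is not accelerated.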