MaxCompute provides the Storage API to improve integration with the big data ecosystem and allow external compute engines to access MaxCompute data. By calling the Storage API, mainstream third-party compute engines can directly access the underlying storage of MaxCompute. This capability significantly improves the efficiency of data access and interaction. This feature is in public preview.
Introduction
The Storage API is a data service interface that provides an efficient, low-latency, and secure method for data read operations. The Storage API allows mainstream third-party compute engines, such as Spark on EMR, StarRocks, Presto, and PAI, to directly access the underlying storage system of MaxCompute. This improves the integration and data processing efficiency between MaxCompute and open source compute engines or machine learning engines. To simplify the data reading process and improve data access performance, Spark on EMR, StarRocks, and Presto can use a connector to directly read data from MaxCompute. The following figure shows the architecture.

Use cases
The Storage API is suited to scenarios that require data accessibility and multi-engine computing. When enterprises or developers need to switch between computing frameworks, or use the features of a specific engine to process data stored in MaxCompute, the Storage API acts as a bridge that lets those engines read the data and broadens the available processing options.
Key features
High throughput: The Storage API supports efficient columnar data reads, predicate pushdown for data filtering before transmission, and the Arrow data format.
Secure and user-friendly: The Storage API offers direct read access to the underlying storage with table semantics. It abstracts away storage complexities while adhering to security policies such as project isolation, access control, and data encryption.
Ecosystem integration: Spark on EMR and StarRocks can use a connector to directly read data from MaxCompute. This simplifies the integration of compute engines.
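The columnar reads and predicate pushdown mentioned under "High throughput" can be illustrated with a small, self-contained Python sketch. It does not call the Storage API or any MaxCompute SDK. The `scan` function and the in-memory table are hypothetical and only show the general idea: the filter is evaluated on a single column before any data is returned, and only the requested columns are materialized.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical in-memory columnar table: column name -> list of values.
# This stands in for columnar storage; it is not a MaxCompute structure.
ColumnarTable = Dict[str, List[object]]


def scan(
    table: ColumnarTable,
    columns: Sequence[str],
    predicate_column: str,
    predicate: Callable[[object], bool],
) -> ColumnarTable:
    """Return the requested columns for the rows that pass the predicate."""
    # Predicate pushdown: evaluate the filter on one column at the source,
    # so rows that do not match are never transmitted.
    keep = [i for i, value in enumerate(table[predicate_column]) if predicate(value)]
    # Columnar read: touch only the columns the caller asked for.
    return {name: [table[name][i] for i in keep] for name in columns}


table: ColumnarTable = {
    "id": [1, 2, 3, 4],
    "region": ["cn", "us", "cn", "eu"],
    "amount": [10.0, 25.5, 7.25, 3.0],
}

# Read only "id" and "amount" for rows where region == "cn".
result = scan(table, ["id", "amount"], "region", lambda r: r == "cn")
print(result)  # {'id': [1, 3], 'amount': [10.0, 7.25]}
```

In this toy example, two of the four rows and two of the three columns are returned, which is the kind of reduction in transmitted data that pushdown and column pruning aim for.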
Limits
Third-party engines that access MaxCompute can read standard tables, partitioned tables, clustered tables, Delta Tables, and materialized views. They cannot read external tables or logical views in MaxCompute.
Reading data of the JSON type is not supported.
For the pay-as-you-go Storage API, the default limit is 1,000 concurrent requests per tenant, with a transmission rate of 10 MB/s per concurrent request.
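The limits above can be turned into a rough pre-flight check and a throughput estimate. The following Python sketch is illustrative only and makes several assumptions: the table-kind labels (`"external_table"`, `"logical_view"`) and all function names are hypothetical, the per-request rate is assumed to scale linearly with concurrency, and the result is an upper bound that ignores network and engine overhead.

```python
from typing import List, Sequence

# Default pay-as-you-go limits stated in this topic.
MAX_CONCURRENT_REQUESTS = 1_000      # per tenant
RATE_MB_PER_S_PER_REQUEST = 10       # MB/s per concurrent request

# Hypothetical labels for the object and column types that cannot be read.
UNSUPPORTED_TABLE_KINDS = {"external_table", "logical_view"}
UNSUPPORTED_COLUMN_TYPES = {"JSON"}


def check_readable(table_kind: str, column_types: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means no documented limit is hit."""
    problems = []
    if table_kind in UNSUPPORTED_TABLE_KINDS:
        problems.append(f"table kind '{table_kind}' cannot be read")
    for col_type in column_types:
        if col_type.upper() in UNSUPPORTED_COLUMN_TYPES:
            problems.append(f"column type '{col_type}' cannot be read")
    return problems


def aggregate_ceiling_mb_per_s(concurrency: int) -> int:
    """Upper bound on aggregate throughput, assuming linear scaling."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Requests beyond the tenant limit do not add throughput.
    effective = min(concurrency, MAX_CONCURRENT_REQUESTS)
    return effective * RATE_MB_PER_S_PER_REQUEST


def min_transfer_seconds(size_mb: float, concurrency: int) -> float:
    """Lower bound on the time needed to read size_mb megabytes."""
    return size_mb / aggregate_ceiling_mb_per_s(concurrency)


print(aggregate_ceiling_mb_per_s(1_000))     # 10000
print(min_transfer_seconds(1_000_000, 100))  # 1000.0
print(check_readable("external_table", ["BIGINT", "json"]))
```

Under these assumptions, a tenant that uses all 1,000 concurrent requests has a ceiling of 1,000 × 10 MB/s = 10,000 MB/s, and reading 1,000,000 MB with 100 concurrent requests takes at least 1,000 seconds.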
Data transmission resources
When a third-party engine transfers data through the MaxCompute Storage API, you can use subscription-based exclusive resource groups for Data Transmission Service (DTS). The following table describes these resources.
Resource group name | Billing description | Supported regions | Usage instructions |
Exclusive resource group for Data Transmission Service (subscription) | This resource group is based on the subscription billing method. You are charged based on the number of concurrent instances that you purchase. For more information, see Subscription fees for exclusive resources for data transmission. | | Purchase and use exclusive resource groups for Data Transmission Service |
You can go to the Resource Observation page to view the usage details of exclusive resource groups for Data Transmission Service (subscription). For more information, see Use resource observation.
Usage examples
Access MaxCompute using a connector. For more information, see the following topics:
Access MaxCompute using an SDK. For more information, see the following topics: