MaxCompute Streaming Tunnel lets you write data to MaxCompute in streaming mode using a dedicated set of APIs and backend services. These APIs significantly reduce the development costs of distributed services and remove the performance bottlenecks that MaxCompute Tunnel encounters in high-concurrency, high-QPS (queries per second) scenarios, such as partition locking conflicts, small-file fragmentation, and complex synchronization code.
MaxCompute Streaming Tunnel has been in public preview since January 1, 2021, and is free of charge during the preview period. Follow Service notices to stay informed about future billing changes.
When to use Streaming Tunnel
MaxCompute Streaming Tunnel complements MaxCompute Tunnel rather than replacing it. Use this table to decide which channel fits your workload:
| Dimension | MaxCompute Streaming Tunnel | MaxCompute Tunnel |
|---|---|---|
| Data form | Streaming rows | Batched files |
| Concurrency | High concurrency supported; no partition locking contention | Concurrent writes can cause partition locking conflicts |
| Write throughput | Optimized for high QPS; prevents small-file fragmentation | Small batch size at high QPS generates many small files |
| Incremental data | Asynchronously merged in the background without service interruption | No built-in async merge; data is written as-is |
| Partitioning | Automatic partitioning across concurrent jobs | Manual partition management required |
| Best for | Real-time log ingestion, stream processing results, message queue sync | Large-batch ETL, periodic bulk loads |
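The small-file row in the table above can be made concrete with a back-of-the-envelope calculation. The numbers below are invented for illustration, not measured MaxCompute limits: at high QPS, one file per batch multiplies file counts quickly, while a background merge pass collapses them into a few large files.

```python
# Hypothetical illustration of small-file fragmentation at high QPS.
# All rates and sizes are made-up example values, not MaxCompute limits.

def files_per_hour(qps: int, batches_per_file: int = 1) -> int:
    """Each write batch lands as one file unless batches are merged."""
    return qps * 3600 // batches_per_file

# Batch-mode Tunnel: 500 small batches per second, one file each.
fragmented = files_per_hour(qps=500)

# Streaming Tunnel with background merge: suppose the merge job
# compacts everything written in the hour into ~128 MiB files,
# with an average batch payload of 4 KiB.
hourly_bytes = 500 * 3600 * 4096
merged = max(1, hourly_bytes // (128 * 1024 * 1024))

print(fragmented, merged)  # 1.8 million files vs. a few dozen
```

Even with generous assumptions, the unmerged path produces orders of magnitude more files per hour, which is the fragmentation the streaming channel is designed to avoid.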
Key capabilities
- Streaming semantic APIs: Facilitate the development of distributed data synchronization services and reduce development costs.
- Automatic partitioning: Eliminates concurrent partition locking when multiple synchronization jobs write to the same table at the same time.
- Asynchronous data merging (Merge): Merges incremental data in the background without interrupting active write operations, which improves storage efficiency and prevents small-file accumulation.
- Asynchronous ZORDER BY sorting for incremental data: Improves storage and query efficiency. For more information about ZORDER BY, see Insert or overwrite data (INSERT INTO | INSERT OVERWRITE).
- Complete isolation between the data link and metadata access: Resolves lock contention delays and errors that are caused by metadata access in high-concurrency write scenarios.
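The automatic-partitioning capability can be sketched conceptually: each record is routed to a partition derived from its event time, so concurrent writers append to independent per-partition buffers instead of contending for a single partition lock. All names below are hypothetical; the real Streaming Tunnel SDK exposes its own session API.

```python
# Conceptual sketch only: routing records to partitions by event time,
# with one independent buffer per partition and no shared lock.
# StreamingWriterSketch is a hypothetical class, not the real SDK.
from collections import defaultdict
from datetime import datetime, timezone

class StreamingWriterSketch:
    def __init__(self):
        # One buffer per partition value; writers never block each other.
        self.partitions = defaultdict(list)

    def write(self, record: dict) -> str:
        ts = datetime.fromtimestamp(record["event_ts"], tz=timezone.utc)
        part = ts.strftime("ds=%Y%m%d/hh=%H")  # partition spec from event time
        self.partitions[part].append(record)
        return part

w = StreamingWriterSketch()
p1 = w.write({"event_ts": 1609459200, "msg": "a"})  # 2021-01-01T00:00:00Z
p2 = w.write({"event_ts": 1609462800, "msg": "b"})  # 2021-01-01T01:00:00Z
print(p1, p2)
```

In the real service, the partition assignment and buffer management happen on the Streaming Tunnel backend; the sketch only shows why time-derived routing removes the need for a shared partition lock.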
Use cases
| Scenario | Description |
|---|---|
| Real-time event log ingestion | Write log data directly into MaxCompute for downstream batch processing—no intermediate storage service needed, which reduces pipeline costs. |
| Stream processing result storage | Persist Flink or other stream computing results into MaxCompute without concurrency or batch size limits, avoiding small-file accumulation from high-frequency writes. MaxCompute Streaming Tunnel ensures the availability of streaming services in scenarios that involve high-concurrency locking. |
| Message queue synchronization | Sync data from DataHub or ApsaraMQ for Kafka into MaxCompute at high concurrency and large batch volumes, replacing workarounds previously needed with the Simple Message Queue connector. |
Integrate with upstream services
By default, Realtime Compute for Apache Flink, DataWorks, and ApsaraMQ for Kafka write to MaxCompute via MaxCompute Tunnel. To switch to Streaming Tunnel:
| Service | How to enable Streaming Tunnel |
|---|---|
| Realtime Compute for Apache Flink | Use the built-in Streaming Tunnel plug-in provided by Realtime Compute for Apache Flink. |
| DataWorks | Contact the DataWorks engineer on duty to enable Streaming Tunnel in the background. |
| ApsaraMQ for Kafka | Contact the Kafka engineer on duty to enable Streaming Tunnel in the background. |
Limitations
Table or partition locking during writes
During a streaming write, the MaxCompute Tunnel service locks the target table or partition. All DML operations that modify data, such as INSERT INTO and INSERT OVERWRITE, are blocked until the write completes and the lock is released.
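Because a blocked DML statement simply waits for the lock, a client that must run DML alongside streaming writes may want to retry with backoff rather than fail immediately. The sketch below is generic client-side logic; `run_dml`, `TableLockedError`, and the simulated insert are placeholders, not a real MaxCompute API.

```python
# Generic retry-with-exponential-backoff sketch for DML that can be
# rejected while a streaming write holds the table or partition lock.
# TableLockedError and fake_insert_overwrite are illustrative stand-ins.
import time

class TableLockedError(Exception):
    pass

def run_with_backoff(run_dml, max_attempts=5, base_delay=0.01):
    for attempt in range(max_attempts):
        try:
            return run_dml()
        except TableLockedError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # wait longer each retry

# Simulate a lock that clears after two failed attempts.
calls = {"n": 0}
def fake_insert_overwrite():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TableLockedError("partition locked by streaming write")
    return "ok"

result = run_with_backoff(fake_insert_overwrite)
print(result, calls["n"])
```

Exponential backoff keeps retry pressure low while the streaming write drains; tune the attempt count and base delay to your write durations.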
Schema modification not supported
If the schema of the target table is modified while Streaming Tunnel is active, streaming data cannot be written to the table.
Temporary storage overhead for hot data
When asynchronous data merging or ZORDER BY is enabled, Streaming Tunnel retains two copies of data written within the previous hour: the original ingested data and the asynchronously merged copy. This redundant storage is automatically cleaned up after the default retention period of 1 hour.
Plan storage capacity accordingly if your workload has a high ingestion rate during the merge window.
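A quick way to size that window: at steady state, the redundant copy roughly equals everything ingested during the retention period. The ingestion rate below is an assumed example value.

```python
# Rough capacity estimate for the dual-copy window described above:
# data written in the last hour exists twice (raw copy + merged copy).
# The 50 MiB/s ingestion rate is an assumed example, not a service limit.

def peak_extra_bytes(ingest_bytes_per_sec: int, retention_sec: int = 3600) -> int:
    """Extra temporary storage at steady state: one redundant copy of
    everything ingested during the retention window."""
    return ingest_bytes_per_sec * retention_sec

extra = peak_extra_bytes(50 * 1024 * 1024)  # 50 MiB/s sustained ingestion
print(extra / (1024 ** 3))  # ~176 GiB of temporary hot-data overhead
```

If your ingestion rate spikes, the temporary overhead spikes with it, so budget for the peak rate over the merge window rather than the average.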