When you need to bring data into MaxCompute, the right channel and tool depend on how your data arrives — in scheduled batches or as a continuous stream. MaxCompute provides three channels — MaxCompute Tunnel, Streaming Tunnel, and DataHub — each optimized for a different data delivery pattern. This topic explains how to choose the right channel and which tools are available under each.
Choose a channel
Select a channel based on how your data arrives and how often it changes.
| Channel | Best for | Typical sources | Tools available |
|---|---|---|---|
| MaxCompute Tunnel | One-time or scheduled batch loads | Files, databases, object storage, log files | Data Integration (DataWorks), Tunnel commands, open source connectors, Tunnel SDK |
| Streaming Tunnel | Continuous low-latency writes where data arrives in real time | Message queues, event streams, application logs | Data Integration (real-time sync), Simple Log Service shipping, ApsaraMQ for Kafka shipping, Realtime Compute for Apache Flink |
| DataHub | Streaming data that requires publish, subscribe, distribute, or archive workflows | Any streaming source via DataHub APIs | DataHub APIs |
Upload data in batches
MaxCompute Tunnel handles batch operations — uploading files, migrating databases, pulling from object storage, and ingesting log files. The following tools are available, ordered from most automated to most customizable.
Data Integration (recommended)
Use Data Integration, the DataWorks data synchronization service, to extract, transform, and load data into MaxCompute without writing connector code. This is the recommended starting point for offline data synchronization.
For setup details, see Overview.
Open source connectors
If your pipeline already uses open source tooling, the following connectors write directly to MaxCompute:
| Connector | Best for |
|---|---|
| Sqoop | Relational database sources |
| Kettle | ETL workflows |
| Flume | Log and event streams |
| Fluentd | Unified log collection |
| Oracle GoldenGate (OGG) | Oracle database replication |
Tunnel commands
The MaxCompute client includes built-in Tunnel commands based on the Tunnel SDK. Use these for ad hoc uploads from the command line.
For command reference, see Tunnel commands.
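As a sketch, an ad hoc upload from the command line might look like the following. The table name `sale_detail`, the two-column CSV schema, and the file name are hypothetical, and the upload itself assumes a configured MaxCompute client (odpscmd) and an existing target table.

```shell
# Stage a small CSV file to upload (hypothetical two-column schema).
printf 'item1,100\nitem2,250\n' > sale.csv

# Upload it with the built-in Tunnel command. -fd sets the field delimiter.
# Requires a configured MaxCompute client and an existing table sale_detail;
# uncomment to run against your own project:
# odpscmd -e "tunnel upload sale.csv sale_detail -fd ',';"

# Two records are staged for upload.
wc -l < sale.csv
```

The same command also works interactively inside the client: enter `tunnel upload sale.csv sale_detail -fd ',';` at the odpscmd prompt.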
Tunnel SDK
Use the Tunnel SDK directly when you need full programmatic control over uploads — for example, to build a custom integration or handle non-standard data formats.
For API details, see MaxCompute Tunnel.
Write data in streaming mode
Streaming Tunnel uses a separate set of APIs and backend services from MaxCompute Tunnel. Use it when data arrives continuously and must be available in MaxCompute with minimal delay.
| Solution | How it works |
|---|---|
| Data Integration — real-time sync | Write streaming data to MaxCompute using real-time synchronization nodes in DataWorks. See Overview of real-time synchronization nodes. |
| Data shipping | Ship data to MaxCompute from services that integrate streaming write APIs — for example, Simple Log Service and ApsaraMQ for Kafka. |
| Realtime Compute for Apache Flink | Write streaming data to MaxCompute in real time using Realtime Compute for Apache Flink. |
Improve upload performance
Network conditions significantly affect upload throughput. Under normal conditions, upload speed ranges from 1 MB/s to 10 MB/s.
To improve performance for large uploads:
Use a Virtual Private Cloud (VPC) or cloud product interconnection network endpoint. Access the Tunnel endpoint through an Elastic Compute Service (ECS) instance or a leased line. For the list of Tunnel endpoints by region, see Endpoints.
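For example, a MaxCompute client that reaches the Tunnel service over an internal endpoint carries the endpoint in its `odps_config.ini`. This is a configuration sketch only: every value below is a placeholder, and `<region>` must be replaced with the actual endpoint for your region from Endpoints.

```ini
# odps_config.ini (fragment); all values are placeholders.
project_name=my_project
access_id=<your-access-key-id>
access_key=<your-access-key-secret>
end_point=http://service.<region>.maxcompute.aliyun-inc.com/api
tunnel_endpoint=http://dt.<region>.maxcompute.aliyun-inc.com
```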
Use multi-threaded upload. If upload speed is slow, switch to a multi-threaded implementation so that data blocks are transferred in parallel.
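The `tunnel upload` command accepts a `-threads` option for this. The following is a sketch with hypothetical file and table names; the upload line assumes a configured MaxCompute client (odpscmd) and an existing target table.

```shell
# Stage a sample CSV file (in practice this would be a large file
# worth parallelizing).
printf 'item%d,%d\n' 1 100 2 250 3 300 > large_sale.csv

# -threads splits the upload across 4 parallel threads.
# Uncomment to run against your own project:
# odpscmd -e "tunnel upload large_sale.csv sale_detail -threads 4 -fd ',';"

# Three sample records staged for upload.
wc -l < large_sale.csv
```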
Service level agreement
MaxCompute Tunnel and Streaming Tunnel run on shared resources that are free of charge. The Tunnel service allocates slots to requests on a first-come, first-served basis. Be aware of the following service level agreement (SLA) constraints:
If no resources are available, the request waits until resources are released.
If fewer than 100 valid requests arrive within a 5-minute window, availability for that window is not covered by the SLA.
Request latency and request-rate limits are not covered by the SLA.
What's next
MaxCompute Tunnel — Tunnel architecture and SDK reference
Tunnel commands — Command-line upload reference
Endpoints — Tunnel endpoint reference by region