MaxCompute: Data upload scenarios and tools

Last Updated: Mar 26, 2026

When you need to bring data into MaxCompute, the right channel and tool depend on how your data arrives — in scheduled batches or as a continuous stream. MaxCompute provides three channels — MaxCompute Tunnel, Streaming Tunnel, and DataHub — each optimized for a different data delivery pattern. This topic explains how to choose the right channel and which tools are available under each.

Choose a channel

Select a channel based on how your data arrives and how often it changes.

| Channel | Best for | Typical sources | Tools available |
| --- | --- | --- | --- |
| MaxCompute Tunnel | One-time or scheduled batch loads | Files, databases, object storage, log files | Data Integration (DataWorks), Tunnel commands, open source connectors, Tunnel SDK |
| Streaming Tunnel | Continuous low-latency writes where data arrives in real time | Message queues, event streams, application logs | Data Integration (real-time sync), Simple Log Service shipping, ApsaraMQ for Kafka shipping, Realtime Compute for Apache Flink |
| DataHub | Streaming data that requires publish, subscribe, distribute, or archive workflows | Any streaming source via DataHub APIs | DataHub APIs |

Upload data in batches

MaxCompute Tunnel handles batch operations — uploading files, migrating databases, pulling from object storage, and ingesting log files. The following tools are available, ordered from most automated to most customizable.

Data Integration (recommended)

Use Data Integration, the DataWorks data synchronization service, to extract, transform, and load data into MaxCompute without writing connector code. This is the recommended starting point for offline data synchronization.

For setup details, see Overview.

Open source connectors

If your pipeline already uses open source tooling, the following connectors write directly to MaxCompute:

| Connector | Best for |
| --- | --- |
| Sqoop | Relational database sources |
| Kettle | ETL workflows |
| Flume | Log and event streams |
| Fluentd | Unified log collection |
| Oracle GoldenGate (OGG) | Oracle database replication |

Tunnel commands

The MaxCompute client includes built-in Tunnel commands based on the Tunnel SDK. Use these for ad hoc uploads from the command line.

For command reference, see Tunnel commands.
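
As an illustration, a typical ad hoc upload from inside the MaxCompute client might look like the following. The file name, project, and table are placeholders; substitute your own.

```shell
# Upload a local CSV file into test_project.test_table,
# using a comma as the field delimiter and skipping the header row.
tunnel upload data.csv test_project.test_table -fd "," -h true;
```

Run `tunnel help upload` in the client to list the remaining options, such as the record delimiter and null indicator.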

Tunnel SDK

Use the Tunnel SDK directly when you need full programmatic control over uploads — for example, to build a custom integration or handle non-standard data formats.

For API details, see MaxCompute Tunnel.
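
For orientation, the sketch below uses PyODPS, the MaxCompute Python SDK, which wraps the Tunnel upload API; the Java Tunnel SDK follows the same session-based pattern. The credentials, endpoint, project, and table names are placeholders, and the record must match your table's schema.

```python
from odps import ODPS  # PyODPS: pip install pyodps

# Placeholder credentials and endpoint -- substitute your own.
o = ODPS("<access_id>", "<secret_access_key>",
         project="my_project", endpoint="<odps_endpoint>")

table = o.get_table("test_table")

# open_writer creates a Tunnel upload session and commits it on exit.
with table.open_writer() as writer:
    writer.write([["2026-03-26", 42]])  # one record matching the table schema
```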

Write data in streaming mode

Streaming Tunnel uses a separate set of APIs and backend services from MaxCompute Tunnel. Use it when data arrives continuously and must be available in MaxCompute with minimal delay.

| Solution | How it works |
| --- | --- |
| Data Integration (real-time sync) | Write streaming data to MaxCompute using real-time synchronization nodes in DataWorks. See Overview of real-time synchronization nodes. |
| Data shipping | Ship data to MaxCompute from services that integrate the streaming write APIs, such as Simple Log Service and ApsaraMQ for Kafka. |
| Realtime Compute for Apache Flink | Write streaming data to MaxCompute in real time using Realtime Compute for Apache Flink. |

Improve upload performance

Network conditions significantly affect upload throughput. Under normal conditions, upload speed ranges from 1 MB/s to 10 MB/s.

To improve performance for large uploads:

  • Use a Virtual Private Cloud (VPC) or cloud product interconnection network endpoint. Access the Tunnel endpoint through an Elastic Compute Service (ECS) instance or a leased line. For the list of Tunnel endpoints by region, see Endpoints.

  • Use multi-threaded uploads. If a single-threaded upload is slow, split the data into blocks and upload the blocks in parallel threads.
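
The multi-threaded pattern can be sketched in plain Python: split the data into chunks, upload each chunk on its own thread, then commit the results. This mirrors how a Tunnel upload session assigns one block ID per writer. The `upload_block` function here is a stand-in for the real per-block write call, not part of any SDK.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_block(block_id, rows):
    """Stand-in for writing one block through a Tunnel upload session."""
    # A real implementation would open a record writer for block_id,
    # write each row, and close the writer.
    return block_id, len(rows)

def parallel_upload(rows, num_threads=4):
    # Split rows into one chunk per thread; each chunk becomes one block.
    chunks = [rows[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(upload_block, range(num_threads), chunks))
    # In a real session you would now commit the uploaded block IDs.
    committed = [block_id for block_id, _ in results]
    return committed, sum(n for _, n in results)

block_ids, total = parallel_upload(list(range(1000)))
print(block_ids, total)  # [0, 1, 2, 3] 1000
```

Because each thread writes an independent block, a slow or failed block can be retried without restarting the whole upload.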

Service level agreement

MaxCompute Tunnel and Streaming Tunnel run on shared resources that are free of charge. The Tunnel service allocates slots to requests in the order in which data access requests arrive. Be aware of the following service level agreement (SLA) constraints:

  • If no resources are available, the request waits until resources are released.

  • If fewer than 100 valid requests arrive within a 5-minute window, that window is excluded when service availability is calculated.

  • Request latency and request-rate limits are not covered by the SLA.

What's next