This topic describes how to upload data to MaxCompute and download data from MaxCompute. It also describes the service connections, SDKs, and tools that are required, and common operations such as data import, data export, and data migration to the cloud.
Background information
MaxCompute provides the following types of channels for data uploads and downloads. You can select a channel based on your business requirements.
MaxCompute Tunnel: allows you to upload and download data in batches.
Streaming Tunnel: allows you to write data to MaxCompute in streaming mode.
DataHub: allows you to process streaming data. DataHub allows you to subscribe to streaming data, publish and distribute streaming data, and archive streaming data to MaxCompute.
Features
Upload data by using MaxCompute Tunnel
You can perform a single batch operation to upload data to MaxCompute by using MaxCompute Tunnel. For example, you can upload data from external files, external databases, external object storage systems, or log files to MaxCompute. MaxCompute Tunnel supports the following upload solutions:
Tunnel SDK: You can upload data to MaxCompute by using the interfaces of Tunnel SDK. For more information, see MaxCompute Tunnel.
Data synchronization: You can extract, transform, and load data to MaxCompute by using the Data Integration service of DataWorks. For more information, see Overview.
Open source tools and plug-ins: You can upload data to MaxCompute by using Sqoop, Kettle, Flume, Fluentd, and Oracle GoldenGate (OGG).
Built-in tool of MaxCompute: The MaxCompute client provides built-in commands based on Tunnel SDK. You can upload data to MaxCompute by using Tunnel commands. For more information about how to use Tunnel commands, see Tunnel commands.
Note: To perform offline data synchronization, we recommend that you use Data Integration of DataWorks. For more information, see Overview.
Write data by using Streaming Tunnel
MaxCompute Streaming Tunnel allows you to write data to MaxCompute in streaming mode and provides a set of APIs and backend services that are different from the APIs and backend services of MaxCompute Tunnel. Streaming Tunnel supports the following data write solutions:
Data synchronization of Data Integration: allows you to write streaming data to MaxCompute. For more information, see Overview of real-time synchronization nodes.
Data shipping: allows you to write streaming data to MaxCompute by using services that integrate the streaming write APIs, such as Simple Log Service and ApsaraMQ for Kafka.
Data writing to MaxCompute in real time: allows you to write streaming data to MaxCompute in real time by using Realtime Compute for Apache Flink.
Reliability of solutions
MaxCompute provides a service level agreement (SLA) guarantee. By default, MaxCompute Tunnel and Streaming Tunnel use shared resources that are free of charge. Before you upload or download data by using MaxCompute Tunnel or Streaming Tunnel, consider the reliability of the solution that you want to use. The Tunnel service allocates the available slots to services in the order in which data access requests arrive.
If no resources are available for data access, data cannot be accessed until resources are released.
If fewer than 100 valid requests are received within a 5-minute window, that window is excluded from the service availability calculation. For more information, see Status codes.
The request latency and the limits on requests are not included in the scope of SLA guarantee. For more information about the limits on requests, see Limits.
Limits
Limits on using MaxCompute Tunnel
Data uploads
Lifecycle of an upload session: 24 hours
Maximum number of blocks that can be written in a single upload session: 20,000
Maximum data write speed of a single block: 10 MB/s
Maximum amount of data that can be written in a single block: 100 GB
Maximum number of upload sessions that can be created for a single table: 500 per 5 minutes
Maximum number of blocks that can be written to a single table: 500 per 5 minutes
Maximum number of upload sessions that can be concurrently committed for a single table: 32
Maximum number of blocks that can be written at the same time: depends on the number of Data Transmission Service (DTS) slots that can be used at the same time. One DTS slot is occupied each time data is written to a block.
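Taken together, the upload limits above put a hard ceiling on what a single upload session can carry. The following sketch is only a back-of-the-envelope feasibility check that hard-codes the figures from this topic (20,000 blocks per session, 100 GB per block); it is not part of any MaxCompute SDK.

```python
# Feasibility check against the batch-upload limits listed above.
# The constants come from this topic; the helper itself is illustrative.

MAX_BLOCKS_PER_SESSION = 20_000   # blocks per upload session
MAX_BLOCK_SIZE_GB = 100           # data per block

def fits_in_one_session(total_gb: float, block_size_gb: float) -> bool:
    """Return True if the data can be uploaded within a single session."""
    if block_size_gb > MAX_BLOCK_SIZE_GB:
        return False
    blocks_needed = -(-total_gb // block_size_gb)  # ceiling division
    return blocks_needed <= MAX_BLOCKS_PER_SESSION

print(fits_in_one_session(1_000, 1))        # 1 TB in 1 GB blocks -> True
print(fits_in_one_session(3_000_000, 100))  # 3 PB needs 30,000 blocks -> False
```

For example, 1 TB split into 1 GB blocks needs only 1,000 of the 20,000 allowed blocks, whereas 3 PB would require 30,000 blocks and must be spread across multiple sessions.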
Data downloads
Lifecycle of a download session: 24 hours
Lifecycle of a session that is used to download instance data: 24 hours (limited by the instance lifecycle)
Maximum number of instance-data download sessions that can be created for a single project: 200 per 5 minutes
Maximum number of download sessions that can be created for a single table: 200 per 5 minutes
Maximum speed of a single download: 10 MB/s
Maximum number of download sessions that can be created at the same time: depends on the number of DTS slots that can be used at the same time. One DTS slot is occupied each time a download session is created.
Maximum number of instance-data download sessions that can be created at the same time: depends on the number of DTS slots that can be used at the same time. One DTS slot is occupied each time an instance-data download session is created.
Maximum number of download requests that can be sent at the same time: depends on the number of DTS slots that can be used at the same time. One DTS slot is occupied each time a download request is sent.
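The slot accounting described above, where each concurrent session or request occupies one DTS slot that is released when the work finishes, behaves like a counting semaphore. The following local sketch models a hypothetical project with 3 slots; it does not call any MaxCompute API.

```python
# Model DTS slot accounting with a counting semaphore (illustrative only).
# Each "session" must acquire a slot before it runs and releases the slot
# when it finishes, so at most 3 sessions run at the same time here.

import threading

SLOTS = threading.Semaphore(3)  # pretend the project has 3 DTS slots
finished = []

def run_session(name: str) -> None:
    with SLOTS:  # blocks until a slot is free, releases it on exit
        finished.append(name)

threads = [threading.Thread(target=run_session, args=(f"s{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(finished))  # ['s0', 's1', 's2', 's3', 's4']
```

All five sessions eventually complete; sessions beyond the slot count simply wait, which mirrors the behavior described in the Reliability of solutions section.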
Limits on using Streaming Tunnel
Maximum write speed per slot: 1 MB/s
Maximum number of write requests per slot: 10 per second
Maximum number of partitions to which data can be concurrently written in a single table: 64
Maximum number of slots that are available for a single partition: 32
Maximum number of slots that can be used by a single streaming-data upload session: depends on the number of DTS slots that can be used at the same time. You can specify the number of DTS slots when you create a streaming-data upload session.
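The per-slot figures above imply a throughput ceiling for each partition. The following sketch multiplies the documented per-slot limits (1 MB/s and 10 requests per second, at most 32 slots per partition) to estimate that ceiling; the function is illustrative and not part of Streaming Tunnel.

```python
# Estimate the per-partition throughput ceiling for Streaming Tunnel,
# using the per-slot limits listed above (illustrative only).

MAX_SLOTS_PER_PARTITION = 32
WRITE_SPEED_PER_SLOT_MBPS = 1
REQUESTS_PER_SLOT_PER_SECOND = 10

def partition_ceiling(slots: int) -> dict:
    """Return the maximum write speed and request rate for one partition."""
    slots = min(slots, MAX_SLOTS_PER_PARTITION)  # cap at the partition limit
    return {
        "slots": slots,
        "max_mb_per_second": slots * WRITE_SPEED_PER_SLOT_MBPS,
        "max_requests_per_second": slots * REQUESTS_PER_SLOT_PER_SECOND,
    }

print(partition_ceiling(8))   # 8 slots -> up to 8 MB/s and 80 requests/s
print(partition_ceiling(64))  # capped at 32 slots -> up to 32 MB/s
```

A single partition therefore tops out at 32 MB/s regardless of how many slots the session requests.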
Limits on data uploads by using DataHub
The size of each field cannot exceed its upper limit. For more information, see Data type editions.
Note: The size of a STRING value cannot exceed 8 MB.
During the upload, multiple data entries are packaged into the same file.
Free shared DTS slots in different regions
The following table describes the maximum number of shared DTS slots that can be assigned at the project level in each region. The shared DTS slots are free of charge.
| Country or region | Region | Number of DTS slots |
| --- | --- | --- |
| China | China (Hangzhou) | 300 |
| China | China (Shanghai) | 600 |
| China | China East 2 Finance | 50 |
| China | China (Beijing) | 300 |
| China | China North 2 Ali Gov | 100 |
| China | China (Zhangjiakou) | 300 |
| China | China (Shenzhen) | 150 |
| China | China South 1 Finance | 50 |
| China | China (Chengdu) | 150 |
| China | China (Hong Kong) | 50 |
| Other countries or regions | Singapore | 100 |
| Other countries or regions | Australia (Sydney) | 50 |
| Other countries or regions | Malaysia (Kuala Lumpur) | 50 |
| Other countries or regions | Indonesia (Jakarta) | 50 |
| Other countries or regions | Japan (Tokyo) | 50 |
| Other countries or regions | Germany (Frankfurt) | 50 |
| Other countries or regions | US (Silicon Valley) | 100 |
| Other countries or regions | US (Virginia) | 50 |
| Other countries or regions | UK (London) | 50 |
| Other countries or regions | India (Mumbai) | 50 |
| Other countries or regions | UAE (Dubai) | 50 |
Status codes
| Status code | Meaning |
| --- | --- |
| 200 | HTTP_OK |
| 201 | HTTP_CREATED |
| 400 | HTTP_BAD_REQUEST |
| 401 | HTTP_UNAUTHORIZED |
| 403 | HTTP_FORBIDDEN |
| 404 | HTTP_NOT_FOUND |
| 405 | HTTP_METHOD_NOT_ALLOWED |
| 409 | HTTP_CONFLICT |
| 422 | HTTP_UNPROCESSABLE_ENTITY |
| 429 | HTTP_TOO_MANY_REQUESTS |
| 499 | HTTP_CLIENT_CLOSED_REQUEST |
| 500 | HTTP_INTERNAL_SERVER_ERROR |
| 502 | HTTP_BAD_GATEWAY |
| 503 | HTTP_SERVICE_UNAVAILABLE |
| 504 | HTTP_GATEWAY_TIME_OUT |
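One common way for a client to act on these status codes is to treat 429 and most 5xx responses as transient and retry them with backoff, while treating other 4xx responses as permanent client errors. The classification policy below is an assumption on the caller's side, not behavior defined by the Tunnel service.

```python
# Classify Tunnel status codes into retryable and non-retryable errors.
# The retry policy is an illustrative assumption, not a service guarantee.

RETRYABLE = {429, 500, 502, 503, 504}  # throttling and transient server errors

def should_retry(status_code: int) -> bool:
    """Return True if a request that received this status code is worth retrying."""
    return status_code in RETRYABLE

print(should_retry(503))  # True  (HTTP_SERVICE_UNAVAILABLE)
print(should_retry(404))  # False (HTTP_NOT_FOUND is a permanent error)
```

In practice, retries for 429 should also back off, because the code indicates that the shared DTS slots are exhausted rather than that the request itself failed.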
Precautions
The network status has a significant impact on Tunnel uploads and downloads. In normal cases, the upload speed ranges from 1 MB/s to 20 MB/s. If you want to upload a large amount of data, we recommend that you configure the Tunnel endpoint of the classic network or a virtual private cloud (VPC). You can access the Tunnel endpoint of the classic network or a VPC by using Elastic Compute Service (ECS) instances or a leased line. If the upload speed is slow, you can use the multi-thread upload method.
For more information about Tunnel endpoints, see Endpoints.
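The multi-thread upload method mentioned above can be sketched as splitting the payload into blocks and uploading each block from its own thread. In the following local sketch, `upload_block` is a hypothetical stand-in for a real per-block write (for example, one performed by using Tunnel SDK); everything below runs without any network access.

```python
# Sketch of a multi-threaded upload: split the payload into blocks and
# hand each block to a worker thread. upload_block is a hypothetical
# placeholder for a real Tunnel block write.

from concurrent.futures import ThreadPoolExecutor

def upload_block(block_id: int, data: bytes) -> int:
    # Placeholder for a real per-block upload; returns the bytes "sent".
    return len(data)

def parallel_upload(payload: bytes, block_size: int, workers: int = 4) -> int:
    """Split payload into blocks, upload them concurrently, return total bytes sent."""
    blocks = [payload[i:i + block_size] for i in range(0, len(payload), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = pool.map(upload_block, range(len(blocks)), blocks)
        return sum(sent)

payload = b"x" * 1_000
print(parallel_upload(payload, block_size=256))  # 1000
```

Each block maps naturally to a Tunnel block ID, which is why the per-session block limits listed earlier also bound how far this kind of parallelism can scale.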