This topic describes how to upload data to and download data from MaxCompute. It covers service connections, SDKs, tools, data import and export, and cloud migration.

MaxCompute provides two data upload and download channels:
  • DataHub: provides real-time data upload and download services. It includes the OGG, Flume, Logstash, and Fluentd plug-ins.
  • Tunnel: provides batch data upload and download services. It includes the MaxCompute client, DataWorks, DTS, Sqoop, Kettle plug-in, and MaxCompute Migration Assist (MMA).

DataHub and Tunnel provide their own SDKs. The data upload and download tools derived from these SDKs allow you to upload or download data in a variety of scenarios. For more information, see MaxCompute Tunnel overview.
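For example, the Tunnel commands of the MaxCompute client allow you to upload or download data in batch mode. In the following commands, log.txt, result.txt, test_project, and test_table are placeholder names:

```
-- Upload data from a local file to a table.
tunnel upload log.txt test_project.test_table;

-- Download table data to a local file.
tunnel download test_project.test_table result.txt;
```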

The preceding tools cover most scenarios in which you migrate data to the cloud. The subsequent topics describe these tools in detail and cover cloud migration scenarios, such as data migration from Hadoop to MaxCompute, database synchronization, and log collection, to help you select a proper technical scheme.

Note For offline data synchronization, we recommend that you use the data integration feature.

Limits

  • Limits of data uploads by using Tunnel:
    • You cannot run Tunnel commands to upload or download data of the ARRAY, MAP, and STRUCT types.
    • There is no limit on the upload speed, which depends on your network bandwidth and server performance.
    • There is a limit on the number of retries. When this limit is exceeded, the upload proceeds to the next block. After the upload is complete, you can execute the select count(*) from table_name; statement to check whether any data is lost.
    • A project supports a maximum of 2,000 concurrent Tunnel connections by default.
    • On the server, the lifecycle of a session is 24 hours after the session is created. A session can be shared among processes and threads on the server, but you must make sure that each block ID is unique.
  • Limits of data uploads by using DataHub:
    • The size of each field cannot exceed its upper limit. For more information, see Data types.
      Note The size of a STRING-type field cannot exceed 8 MB.
    • During the upload, multiple data entries are packaged.
  • Limits of TableTunnel SDK interfaces:
    • The value of a block ID must be greater than or equal to 0 and less than 20000. The volume of data you want to upload in a block cannot exceed 100 GB.
    • The lifecycle of a session is 24 hours. If you want to transfer large volumes of data, we recommend that you transfer your data in multiple sessions.
    • The lifecycle of an HTTP request that corresponds to a RecordWriter is 120 seconds. If no data flows over an HTTP connection within 120 seconds, the server closes the connection.
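The TableTunnel block limits above can be checked on the client side before an upload starts. The following Python sketch illustrates one way to split an upload into valid blocks; the function name plan_blocks and the constants are illustrative, not part of the real SDK:

```python
# Hedged sketch: client-side checks that mirror the TableTunnel limits
# described above. plan_blocks and the MAX_* constants are illustrative
# names, not part of the MaxCompute SDK.

MAX_BLOCK_ID = 20000                # block IDs must be in [0, 20000)
MAX_BLOCK_BYTES = 100 * 1024 ** 3   # at most 100 GB of data per block

def plan_blocks(total_bytes):
    """Split an upload of total_bytes into the fewest valid blocks.

    Returns a list of (block_id, block_bytes) tuples with unique,
    ascending block IDs. Raises if the upload cannot fit into the
    20,000 block IDs allowed in a single session.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    blocks = []
    block_id = 0
    remaining = total_bytes
    while remaining > 0:
        if block_id >= MAX_BLOCK_ID:
            raise ValueError(
                "upload too large for one session; "
                "transfer the data in multiple sessions")
        size = min(remaining, MAX_BLOCK_BYTES)
        blocks.append((block_id, size))
        block_id += 1
        remaining -= size
    return blocks
```

For instance, a 250 GB upload is planned as three blocks of 100 GB, 100 GB, and 50 GB, with block IDs 0, 1, and 2.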