MaxCompute: Data upload scenarios and tools

Last Updated: Jan 16, 2024

This topic describes how to upload data to and download data from MaxCompute. It also covers the required service connections, SDKs, and tools, and common operations such as data import and export and data migration to the cloud.

Background information

MaxCompute provides the following channels for uploading and downloading data. Select a channel based on your business requirements.

  • MaxCompute Tunnel: allows you to upload and download data in batches.

  • Streaming Tunnel: allows you to write data to MaxCompute in streaming mode.

  • DataHub: allows you to process streaming data. You can use DataHub to publish, subscribe to, and distribute streaming data, and to archive streaming data to MaxCompute.

Features

  • Upload data by using MaxCompute Tunnel

    You can use MaxCompute Tunnel to upload data to MaxCompute in batches. For example, you can upload data from external files, external databases, external object storage systems, or log files. MaxCompute Tunnel supports the following upload solutions:

    • Tunnel SDK: You can upload data to MaxCompute by calling the APIs of Tunnel SDK. For more information, see MaxCompute Tunnel.

    • Data synchronization: You can extract, transform, and load data into MaxCompute by using the Data Integration service of DataWorks. For more information, see Overview.

    • Open source tools and plug-ins: You can upload data to MaxCompute by using Sqoop, Kettle, Flume, Fluentd, and Oracle GoldenGate (OGG).

    • Built-in tool of MaxCompute: The MaxCompute client provides built-in commands based on Tunnel SDK. You can upload data to MaxCompute by using Tunnel commands. For more information about how to use Tunnel commands, see Tunnel commands.

    Note

    To perform offline data synchronization, we recommend that you use Data Integration of DataWorks. For more information, see Overview.

  • Write data by using Streaming Tunnel

    MaxCompute Streaming Tunnel allows you to write data to MaxCompute in streaming mode. It provides its own set of APIs and backend services, which are separate from those of MaxCompute Tunnel. Streaming Tunnel supports the following data write solutions:

    • Real-time synchronization in Data Integration: allows you to write streaming data to MaxCompute. For more information, see Overview of real-time synchronization nodes.

    • Data shipping: allows you to write streaming data to MaxCompute through services that integrate the streaming write APIs. For example, you can ship data to MaxCompute from Simple Log Service or ApsaraMQ for Kafka.

    • Data writing to MaxCompute in real time: allows you to write streaming data to MaxCompute in real time by using Realtime Compute for Apache Flink.
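A key difference between the batch path and the streaming path is when written data becomes visible. With the Tunnel SDK, a batch upload writes data into one or more blocks of an upload session, and the data becomes visible in the table only after the session is committed. The following Python sketch models that lifecycle in memory so that the behavior is easy to reason about. The class names, method names, and record schema are hypothetical and are not the Tunnel SDK API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical record schema for illustration only: (name, value).
Record = Tuple[str, int]


@dataclass
class InMemoryTable:
    """Stand-in for a MaxCompute table; only committed rows are visible."""
    rows: List[Record] = field(default_factory=list)


@dataclass
class UploadSession:
    """Conceptual model of a batch upload session, not the Tunnel SDK API."""
    table: InMemoryTable
    blocks: Dict[int, List[Record]] = field(default_factory=dict)
    committed: bool = False

    def write_block(self, block_id: int, records: List[Record]) -> None:
        if self.committed:
            raise RuntimeError("session is already committed")
        # In this model, writing the same block ID again replaces its data.
        self.blocks[block_id] = list(records)

    def commit(self, block_ids: List[int]) -> None:
        if self.committed:
            raise RuntimeError("session is already committed")
        missing = [b for b in block_ids if b not in self.blocks]
        if missing:
            raise ValueError(f"blocks were never written: {missing}")
        # All listed blocks become visible together, in the order given.
        for b in block_ids:
            self.table.rows.extend(self.blocks[b])
        self.committed = True


table = InMemoryTable()
session = UploadSession(table)
session.write_block(0, [("a", 1), ("b", 2)])
session.write_block(1, [("c", 3)])
rows_before_commit = list(table.rows)  # uncommitted data is not visible yet
session.commit([0, 1])
print(table.rows)  # [('a', 1), ('b', 2), ('c', 3)]
```

Modeling visibility this way makes it clear why a failed batch job that never commits leaves the target table unchanged.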

Reliability of solutions

MaxCompute provides a service level agreement (SLA). By default, MaxCompute Tunnel and Streaming Tunnel use shared resources that are free of charge, so you must consider the reliability of the solution that you choose when you upload or download data through them. The Tunnel service allocates available slots to requests in the order in which the requests access the service.

  • If no resources are available for data access, data cannot be accessed until resources are released.

  • If the number of valid requests does not reach 100 within 5 minutes, the Tunnel service is not available.

  • Request latency and request limits are not covered by the SLA.
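Because requests wait when no slots are free, client code that uses the shared Tunnel resources typically retries instead of failing on the first attempt. The following Python sketch shows a generic retry loop with capped exponential backoff and jitter. `SlotUnavailableError` is a hypothetical exception that stands in for whatever error your client library raises when no slot is available, and the retry parameters are illustrative assumptions rather than recommended values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SlotUnavailableError(Exception):
    """Hypothetical error meaning the Tunnel service had no free slot."""


def call_with_backoff(op: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry op while it raises SlotUnavailableError, backing off between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return op()
        except SlotUnavailableError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            # The delay cap grows 1 s, 2 s, 4 s, ... up to max_delay; full
            # jitter keeps many clients from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


# Usage: an operation that finds no free slot twice, then succeeds.
calls = {"n": 0}


def flaky_upload() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise SlotUnavailableError("no free slot")
    return "uploaded"


delays = []
result = call_with_backoff(flaky_upload, sleep=delays.append)
print(result)  # uploaded
```

The `sleep` parameter is injectable so that the retry schedule can be tested without actually waiting.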

Precautions

The network status has a significant impact on Tunnel uploads and downloads. Under normal conditions, the upload speed ranges from 1 MB/s to 10 MB/s. If you want to upload a large amount of data, we recommend that you use the Tunnel endpoint of the classic network or a virtual private cloud (VPC), which you can access from Elastic Compute Service (ECS) instances or over a leased line. If uploads are slow, you can upload data in multiple threads.
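A back-of-the-envelope estimate shows why the endpoint choice matters for large uploads. The following Python helper is a hypothetical illustration, not part of any MaxCompute tool: it converts a data volume and a sustained speed into hours, treats 1 GB as 1024 MB, and ignores protocol overhead, retries, and compression.

```python
def estimated_upload_hours(data_gb: float, speed_mb_per_s: float) -> float:
    """Rough transfer time in hours, treating 1 GB as 1024 MB."""
    if speed_mb_per_s <= 0:
        raise ValueError("speed must be positive")
    return data_gb * 1024 / speed_mb_per_s / 3600


# 100 GB at the two ends of the typical 1 MB/s to 10 MB/s range.
for speed in (1, 10):
    print(f"{speed:>2} MB/s: {estimated_upload_hours(100, speed):.1f} h")
# prints:
#  1 MB/s: 28.4 h
# 10 MB/s: 2.8 h
```

At the low end of the range, 100 GB takes more than a day, which is the kind of workload for which a classic network or VPC endpoint is worth configuring.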

For more information about Tunnel endpoints, see Endpoints.
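The multi-thread upload method recommended under Precautions can be sketched as follows. The sketch assumes that each thread writes a separate block of the same upload session and that all block IDs are committed together after every thread finishes. `upload_block` is a hypothetical callback; in a real client it would wrap the Tunnel SDK calls that write one block. Adding threads is most likely to help when a single connection, rather than the network link as a whole, is the bottleneck.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def split_into_blocks(records: Sequence[R], num_blocks: int) -> List[List[R]]:
    """Split records into num_blocks contiguous, near-equal chunks."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be >= 1")
    size, extra = divmod(len(records), num_blocks)
    blocks: List[List[R]] = []
    start = 0
    for i in range(num_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(list(records[start:end]))
        start = end
    return blocks


def parallel_upload(records: Sequence[R], num_threads: int,
                    upload_block: Callable[[int, List[R]], None]) -> List[int]:
    """Upload each non-empty block on its own thread; return IDs to commit."""
    blocks = split_into_blocks(records, num_threads)
    work = [(block_id, rows) for block_id, rows in enumerate(blocks) if rows]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(upload_block, block_id, rows)
                   for block_id, rows in work]
        for future in futures:
            future.result()  # re-raise an upload error, if any
    return [block_id for block_id, _ in work]


# Usage with a fake uploader that records what each block received.
uploaded = {}
lock = threading.Lock()


def fake_upload_block(block_id: int, rows: List[int]) -> None:
    with lock:
        uploaded[block_id] = rows


block_ids = parallel_upload(list(range(10)), 3, fake_upload_block)
print(block_ids)  # [0, 1, 2]
```

Returning the block IDs instead of committing inside `parallel_upload` keeps the commit as a single step that runs only after every block has been written successfully.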