MaxCompute provides several data upload and download tools, and the right one for migrating data to the cloud depends on the scenario. This topic describes how to select a data transmission tool in three typical scenarios: Hadoop data migration, database synchronization, and log collection.

Hadoop data migration

You can use MaxCompute Migration Assist (MMA), Sqoop, or DataWorks to migrate Hadoop data.

  • If you use DataWorks, you must use it together with DataX.
  • If you use Sqoop, it runs a MapReduce job on the source Hadoop cluster to transfer data to MaxCompute in a distributed manner. For more information, see Apache Sqoop.

Synchronization of data in a database

To synchronize data from a database to MaxCompute, you must select a tool based on the database type and synchronization policy.

  • Use DataWorks for offline batch synchronization. DataWorks supports a wide range of database types, including MySQL, SQL Server, and PostgreSQL. For more information about how to perform the related operations, see Create a batch synchronization node.
  • Use the Oracle GoldenGate (OGG) plug-in for real-time synchronization of data in an Oracle database.
  • Use Data Transmission Service (DTS) for real-time synchronization of data in an ApsaraDB for RDS database.
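
The selection rules in this list can be read as a small decision table. The following Python sketch is for illustration only: the function name choose_sync_tool, its parameters, and the accepted database names are hypothetical and are not part of any MaxCompute or DataWorks API. It returns None for combinations that this topic does not cover.

```python
from typing import Optional

# Hypothetical spellings that a caller might use for an ApsaraDB for RDS database.
RDS_ALIASES = {"rds", "apsaradb for rds"}


def choose_sync_tool(db_type: str, realtime: bool) -> Optional[str]:
    """Return the tool that this topic suggests, or None if the case is not covered."""
    db = db_type.strip().lower()
    if not realtime:
        # Offline batch synchronization: DataWorks. Verify separately that
        # DataWorks supports the specific database type.
        return "DataWorks"
    if db == "oracle":
        # Real-time synchronization from an Oracle database.
        return "OGG plug-in"
    if db in RDS_ALIASES:
        # Real-time synchronization from an ApsaraDB for RDS database.
        return "DTS"
    # Other real-time cases are not described in this topic.
    return None
```

For example, choose_sync_tool("MySQL", realtime=False) suggests DataWorks, and choose_sync_tool("Oracle", realtime=True) suggests the OGG plug-in.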

Log collection

You can use tools such as Flume, Fluentd, and Logstash to collect logs.