MaxCompute supports a wide range of data upload and download tools. The source code of most of these tools is available and maintained on GitHub by the open source community. You can select the appropriate tool based on your application scenario.
Alibaba Cloud DTplus products
- MaxCompute client (Tunnel)
Note: This is an open source project. You can view its source code in the aliyun-odps-console repository.
- Based on the batch data tunnel SDK, the client provides built-in Tunnel commands for data uploads and downloads. For more information about Tunnel commands, see Tunnel commands.
- For information about client installation and usage, see Client.
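As a quick sketch of what the built-in Tunnel commands look like when run in the client (the project, table, and file names below are hypothetical placeholders):

```shell
# Run inside the MaxCompute client (odpscmd); names are placeholders.
# Upload a local CSV file into a table, using "," as the field delimiter.
tunnel upload data.csv my_project.my_table -fd ",";
# Download table data to a local file with the same delimiter.
tunnel download my_project.my_table data_out.csv -fd ",";
```

For the full list of options, such as partition specifications and header handling, see the Tunnel commands documentation.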
- Data Integration of DataWorks (Tunnel)
Data Integration of DataWorks is a stable, efficient, and scalable data synchronization platform provided by Alibaba Cloud. It is designed to provide full offline and incremental real-time data synchronization, integration, and exchange services for the heterogeneous data storage systems on Alibaba Cloud.
Data synchronization tasks support the following data sources: MaxCompute, ApsaraDB for RDS (MySQL, SQL Server, and PostgreSQL), Oracle, FTP, AnalyticDB (ADS), OSS, ApsaraDB for Memcache, and DRDS.
- DTS (Tunnel)
Data Transmission Service (DTS) is an Alibaba Cloud data service that supports data exchange among multiple types of data sources, such as relational database management systems (RDBMS), NoSQL databases, and online analytical processing (OLAP) systems. It provides data transmission features such as data migration, real-time data subscription, and real-time data synchronization.
DTS supports data synchronization from ApsaraDB for RDS and MySQL instances to MaxCompute tables only. Other data sources are not supported.
Open source products
- Sqoop (Tunnel)
This tool is developed based on Sqoop 1.4.6 from the community and provides enhanced MaxCompute support. It can import data from relational databases such as MySQL, or from HDFS or Hive, to MaxCompute tables, and can export data from MaxCompute tables to relational databases such as MySQL.
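As a hedged sketch of an import job, assuming this fork follows standard Sqoop syntax and adds `--odps-*` options for the MaxCompute side (the option names, endpoint, and credential placeholders below are assumptions; verify them against the fork's documentation):

```shell
# Hypothetical example: import a MySQL table into a MaxCompute table.
# The --odps-* option names are assumptions about this fork's syntax.
sqoop import \
  --connect jdbc:mysql://localhost:3306/mydb \
  --username root \
  --password '***' \
  --table sales \
  --odps-table sales \
  --odps-project my_project \
  --odps-accessid ACCESS_ID \
  --odps-accesskey ACCESS_KEY \
  --odps-endpoint http://service.odps.aliyun.com/api
```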
- Kettle (Tunnel)
Kettle is an open source extract, transform, load (ETL) tool written in Java. It runs on Windows, Unix, and Linux, and provides a graphical interface in which you define a data transmission topology by dragging and dropping components.
- Apache Flume (DataHub)
Apache Flume is a distributed and reliable system that efficiently collects large volumes of log data from different data sources, then aggregates and stores the data in a centralized data store. It supports a variety of Source and Sink plug-ins.
The DataHub Sink plug-in of Apache Flume allows you to upload log data to DataHub in real time and archive the data in MaxCompute tables.
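As a sketch, a minimal Flume agent that tails a log file into DataHub might be configured as follows. The sink class name, property keys, endpoint, and project/topic names are all assumptions about the plug-in; verify them against the plug-in's documentation:

```
# Hypothetical Flume agent config; property names are assumptions.
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Tail an application log file as the source.
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log

# DataHub Sink; class name and datahub.* keys are assumptions.
a1.sinks.k1.type = com.aliyun.datahub.flume.sink.DatahubSink
a1.sinks.k1.datahub.accessId = ACCESS_ID
a1.sinks.k1.datahub.accessKey = ACCESS_KEY
a1.sinks.k1.datahub.endPoint = https://dh-cn-hangzhou.aliyuncs.com
a1.sinks.k1.datahub.project = my_project
a1.sinks.k1.datahub.topic = my_topic

# Buffer events in memory between source and sink.
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```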
- Fluentd (DataHub)
Fluentd is an open source log collector. It collects logs, such as application logs, system logs, and access logs, from various sources, and lets you use plug-ins to filter the log data and forward it to different backends, including MySQL, Oracle, MongoDB, Hadoop, and Treasure Data.
The DataHub plug-in of Fluentd allows you to upload log data to DataHub in real time and archive the data in MaxCompute tables.
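As a hedged sketch, a Fluentd match section that forwards events to DataHub might look like the following; the plug-in type and parameter names are assumptions, so verify them against the DataHub plug-in's documentation:

```
# Hypothetical fluentd.conf fragment; parameter names are assumptions.
<match app.**>
  @type datahub
  access_id ACCESS_ID
  access_key ACCESS_KEY
  endpoint https://dh-cn-hangzhou.aliyuncs.com
  project_name my_project
  topic_name my_topic
</match>
```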
- Logstash (DataHub)
Logstash is an open source log collection and processing framework. The logstash-output-datahub plug-in allows you to upload log data to DataHub in real time and archive the data in MaxCompute tables. Logstash is easy to configure for data collection and transmission, and can be used together with MaxCompute or StreamCompute to build an end-to-end streaming data solution, from data collection to analytics.
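As a sketch of a pipeline using the logstash-output-datahub plug-in (the parameter names, endpoint, and project/topic values are assumptions; verify them against the plug-in's documentation):

```
# Hypothetical logstash.conf; output parameter names are assumptions.
input {
  file {
    path => "/var/log/app.log"
  }
}
output {
  datahub {
    access_id => "ACCESS_ID"
    access_key => "ACCESS_KEY"
    endpoint => "https://dh-cn-hangzhou.aliyuncs.com"
    project_name => "my_project"
    topic_name => "my_topic"
  }
}
```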
- OGG (DataHub)
The DataHub plug-in of OGG allows you to incrementally synchronize data in an Oracle database to DataHub in real time and archive the data in MaxCompute tables.