Data Integration is a service that provides data transmission, data conversion, and data synchronization. It is built on a distributed architecture with multiple modules, such as dirty data processing and flow control. Its key features include support for multiple data sources, fast transmission, high reliability, scalability, and mass synchronization.
- Support for Multiple Disparate Data Sources: Data Integration supports data synchronization between more than 400 pairs of disparate data sources, including RDS databases, semi-structured storage, unstructured storage (such as audio, video, and images), NoSQL databases, and big data storage. Data Integration also supports real-time data reading and writing between data sources such as Oracle, MySQL, and DataHub.
- Scheduled Tasks: Data Integration allows you to schedule offline tasks by setting a specific trigger time (year, month, day, hour, and minute). Periodic incremental data extraction takes only a few steps to configure. Data Integration works with DataWorks data modeling, so operations and maintenance are integrated across the entire workflow.
- Mass Upload to Cloud: Data Integration leverages the computing capability of Hadoop clusters to synchronize HDFS data from the clusters to MaxCompute. This is called Mass Cloud Upload. Data Integration can transmit up to 5 TB of data per day. The maximum transmission rate is 2 GB/s.
- Monitoring and Alarms: With 19 built-in monitoring rules, Data Integration covers most monitoring scenarios. You can set alarm rules based on these monitoring rules. Additionally, you can predefine how task failure notifications are sent for Data Integration.
Data Source Management
Data sources and datasets define the source and destination of data. On top of them, Data Integration provides two data management plug-ins: the Reader plug-in reads data from the source, and the Writer plug-in writes data to the destination. Based on this framework, a set of simplified intermediate data transmission formats is used to exchange data between arbitrary structured and semi-structured data sources.
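The Reader/Writer split described above can be illustrated with a minimal Python sketch. This is not the Data Integration plug-in API: the names `Record`, `Reader`, `Writer`, `ListReader`, `ListWriter`, and `sync` are hypothetical, and the intermediate format is assumed here to be a plain ordered list of column values.

```python
from typing import Dict, Iterable, Iterator, List, Protocol, Tuple

# Hypothetical intermediate format: one record is an ordered list of column values.
Record = List[object]


class Reader(Protocol):
    """Anything that can produce records from a source."""
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    """Anything that can consume records into a destination."""
    def write(self, records: Iterable[Record]) -> int: ...


class ListReader:
    """Illustrative reader: an in-memory list of dicts stands in for a source table."""

    def __init__(self, rows: List[Dict[str, object]], columns: List[str]) -> None:
        self.rows = rows
        self.columns = columns

    def read(self) -> Iterator[Record]:
        for row in self.rows:
            # Project each row onto the configured columns; missing values become None.
            yield [row.get(column) for column in self.columns]


class ListWriter:
    """Illustrative writer: an in-memory list stands in for a destination table."""

    def __init__(self) -> None:
        self.sink: List[Tuple[object, ...]] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            self.sink.append(tuple(record))
            count += 1
        return count


def sync(reader: Reader, writer: Writer) -> int:
    """Move every record from the reader to the writer; return the number written."""
    return writer.write(reader.read())
```

Because the reader and writer only share the intermediate record format, any reader in this sketch can be paired with any writer, which is the property the framework relies on to connect disparate sources.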
Local Data Collection
Data Integration supports data synchronization in Alibaba Cloud classic networks and VPCs, as well as data collection in local IDCs.
Full Database Migration
Full database migration is a tool provided by Data Integration. It creates multiple data synchronization tasks at once and imports all tables in a MySQL database to MaxCompute, so you no longer need to create synchronization tasks one at a time.
Data Integration can filter business data by date using a WHERE clause. Data for different dates is synchronized to the corresponding partitions of MaxCompute partitioned tables. By setting the synchronization interval to 1 hour or 10 minutes, Data Integration can perform quasi-real-time incremental synchronization.
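As a rough illustration of date-based filtering, the following Python sketch builds a WHERE condition for one day of data and the partition value it would map to, plus a condition for a short interval. The function names, the column name `gmt_modified`, and the partition key `ds` are assumptions made for the example, not settings taken from Data Integration.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def daily_filter(column: str, day: date) -> Tuple[str, str]:
    """Return a WHERE condition selecting one day of rows and a target partition value."""
    next_day = day + timedelta(days=1)
    # Half-open range [day, next_day) avoids picking up rows from the following day.
    where = f"{column} >= '{day.isoformat()}' AND {column} < '{next_day.isoformat()}'"
    # Assumed partition layout: one partition per day, keyed as ds=YYYYMMDD.
    partition = f"ds={day.strftime('%Y%m%d')}"
    return where, partition


def interval_filter(column: str, run_time: datetime, minutes: int) -> str:
    """Return a WHERE condition covering the `minutes` before the scheduled run time."""
    end = run_time.replace(second=0, microsecond=0)
    start = end - timedelta(minutes=minutes)
    return (
        f"{column} >= '{start:%Y-%m-%d %H:%M:%S}' "
        f"AND {column} < '{end:%Y-%m-%d %H:%M:%S}'"
    )
```

For example, `daily_filter("gmt_modified", date(2024, 1, 31))` selects rows modified on 31 January 2024 and pairs them with the partition value `ds=20240131`, while `interval_filter` with `minutes=10` covers the 10 minutes before a run, matching the quasi-real-time case.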