MaxCompute: Use DataWorks Data Integration to run data synchronization jobs

Last Updated: Jan 09, 2024

You can use the Data Integration service of DataWorks to synchronize data from a data source to MaxCompute. Three synchronization methods are supported: batch synchronization, real-time synchronization, and integrated synchronization. This topic describes each method.

Batch synchronization

For batch synchronization, you define data sources or datasets as the source and destination of a synchronization task. Data Integration pairs them with readers and writers to form a simple data synchronization framework: a reader extracts data from the source, and a writer loads it into the destination. In this way, you can synchronize structured and semi-structured data from a data source to MaxCompute.
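The reader and writer pairing described above can be illustrated with a minimal Python sketch. This is not the DataWorks API: ListReader, TableWriter, and run_batch_sync are hypothetical names invented for illustration, and the in-memory list stands in for a real data source and a MaxCompute table.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


class ListReader:
    """Hypothetical reader: yields records from an in-memory source."""

    def __init__(self, rows: Iterable[Record]) -> None:
        self._rows = list(rows)

    def read(self) -> Iterator[Record]:
        yield from self._rows


@dataclass
class TableWriter:
    """Hypothetical writer: stores records projected onto destination columns."""

    columns: List[str]
    rows: List[Record] = field(default_factory=list)

    def write(self, record: Record) -> None:
        # Keep only the destination columns; missing fields become None.
        self.rows.append({col: record.get(col) for col in self.columns})


def run_batch_sync(reader: ListReader, writer: TableWriter) -> int:
    """Move every record from the reader to the writer and return the count."""
    count = 0
    for record in reader.read():
        writer.write(record)
        count += 1
    return count


# Assumed sample data: one structured row and one row with a missing column.
source_rows = [
    {"id": 1, "name": "alice", "extra": "ignored"},
    {"id": 2, "name": "bob"},
]
writer = TableWriter(columns=["id", "name"])
synced = run_batch_sync(ListReader(source_rows), writer)
print(synced, writer.rows)
```

The point of the design is that the reader knows only the source and the writer knows only the destination, so either side can be swapped without changing the other.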

Real-time synchronization

The real-time synchronization feature provided by DataWorks allows you to synchronize incremental data from one or more tables in source databases to MaxCompute in real time. This keeps the MaxCompute tables consistent with the source databases. When you run a real-time synchronization task, you can use multiple conversion plug-ins to cleanse the source data and multiple writers to write the cleansed data to your intended destinations at the same time. You can synchronize incremental data from a single table to a single MaxCompute table, from tables in sharded databases to a single MaxCompute table, or from multiple tables in a database to multiple MaxCompute tables.
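As a rough illustration of that flow, the following Python sketch passes incremental change events through a chain of cleansing steps and then hands each surviving event to several writers. Everything here is an assumption made for the example, not DataWorks behavior: the event shape, the function names, the MemoryWriter class, and the rule that a numeric suffix such as _01 marks a shard. For simplicity the writers are called one after another rather than concurrently.

```python
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, object]
Transform = Callable[[Event], Optional[Event]]


def drop_missing_id(event: Event) -> Optional[Event]:
    """Cleansing step: discard events that carry no primary key."""
    return event if event.get("id") is not None else None


def strip_shard_suffix(event: Event) -> Optional[Event]:
    """Map sharded names such as orders_01 to the logical table orders."""
    base, _, suffix = str(event["table"]).rpartition("_")
    if base and suffix.isdigit():
        return {**event, "table": base}
    return event


class MemoryWriter:
    """Hypothetical writer that collects events in memory."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def write(self, event: Event) -> None:
        self.events.append(event)


def process(
    events: Iterable[Event],
    transforms: List[Transform],
    writers: List[MemoryWriter],
) -> None:
    """Apply each transform in order, then fan the result out to all writers."""
    for event in events:
        current: Optional[Event] = event
        for transform in transforms:
            current = transform(current)
            if current is None:  # a cleansing step rejected the event
                break
        if current is None:
            continue
        for writer in writers:
            writer.write(current)


# Assumed sample stream: two sharded tables feeding one logical table.
incoming = [
    {"table": "orders_01", "op": "insert", "id": 1},
    {"table": "orders_02", "op": "update", "id": 2},
    {"table": "orders_02", "op": "insert", "id": None},
]
primary, audit = MemoryWriter(), MemoryWriter()
process(incoming, [drop_missing_id, strip_shard_suffix], [primary, audit])
print([e["table"] for e in primary.events])
```

In this sketch the event without an id is dropped during cleansing, and both remaining events arrive at each writer under the single logical table name, which mirrors the sharded-tables-to-one-table case.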

Integrated synchronization

In practice, data synchronization is often a complex operation: a single scenario can require multiple batch synchronization tasks, real-time synchronization tasks, and data processing tasks, and configuring all of them is complicated.