You can use the Data Integration service provided by DataWorks to synchronize data from a data source to MaxCompute. Three synchronization methods are supported: batch synchronization, real-time synchronization, and integrated synchronization. This topic describes each method and the usage notes that apply to it.
Batch synchronization
The Data Integration service provided by DataWorks allows you to define data sources or datasets as the sources and destinations of a synchronization task and to pair them with readers and writers. Together, these form a simple data synchronization framework that you can use to synchronize structured and semi-structured data from a data source to MaxCompute.
For more information about how to configure a batch synchronization task, see Configure a batch synchronization node by using the codeless UI and Configure a batch synchronization node by using the code editor.
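The reader and writer pairing described above can be illustrated with a small conceptual sketch. It is not the Data Integration implementation or its API: the class and function names (`InMemoryTableReader`, `InMemoryTableWriter`, `run_batch_sync`) are hypothetical, and in-memory lists stand in for real source tables and the MaxCompute destination table. The sketch only shows the shape of the idea, in which several readers (for example, one per shard) feed a single writer.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]


class Reader(Protocol):
    """Anything that can produce records from a source."""
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    """Anything that can persist records to a destination."""
    def write(self, records: Iterable[Record]) -> int: ...


@dataclass
class InMemoryTableReader:
    """Hypothetical reader: yields rows from one source table (or one shard)."""
    rows: List[Record]

    def read(self) -> Iterator[Record]:
        yield from self.rows


@dataclass
class InMemoryTableWriter:
    """Hypothetical writer: stands in for a single destination table."""
    table: List[Record] = field(default_factory=list)

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.table)
        self.table.extend(records)
        return len(self.table) - before


def run_batch_sync(readers: List[Reader], writer: Writer) -> int:
    """Drain every reader into the one writer and return the rows written."""
    return sum(writer.write(reader.read()) for reader in readers)


# Two shards of the same logical table, merged into one destination table.
shard_0 = InMemoryTableReader([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
shard_1 = InMemoryTableReader([{"id": 3, "name": "c"}])
destination = InMemoryTableWriter()

written = run_batch_sync([shard_0, shard_1], destination)
print(written, [row["id"] for row in destination.table])  # prints: 3 [1, 2, 3]
```

In an actual batch synchronization node, the reader and writer are selected and configured in the codeless UI or the code editor rather than written by hand.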
Usage notes
Batch synchronization allows you to synchronize data from a single table in a database or from tables in sharded databases to a single MaxCompute table.
Before you configure a synchronization task, you need to add a MaxCompute data source on the Data Sources page in the DataWorks console. For more information, see Add a MaxCompute data source.
Before you configure a synchronization task, you need to make sure that a network connection is established between a resource group for Data Integration and your data source. For more information, see Establish a network connection between a resource group and a data source.
Real-time synchronization
The real-time synchronization feature provided by DataWorks allows you to synchronize incremental data from one or more tables in source databases to MaxCompute in real time. This keeps the MaxCompute tables consistent with the source databases. When you run a real-time synchronization task, you can use multiple conversion plug-ins to cleanse the source data and multiple writers to write the cleansed data to your intended destinations at the same time. You can synchronize incremental data in the following ways: from a single table to a single MaxCompute table, from tables in sharded databases to a single MaxCompute table, and from multiple tables in a database to multiple MaxCompute tables.
For more information about how to configure a real-time synchronization task, see Create a real-time synchronization node to synchronize incremental data from a single table and Configure a real-time synchronization node in DataStudio.
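The flow described above, in which incremental changes pass through conversion steps and are then written by several writers, can be sketched conceptually as follows. This is an illustration only and does not reflect the DataWorks plug-in interface: the event shape (the `op`, `id`, and `email` fields), the transform functions, and `process_stream` are all assumptions made for the example, and Python lists stand in for the destination tables.

```python
from typing import Callable, Dict, Iterable, List, Optional

Change = Dict[str, object]
# A transform returns the (possibly modified) change, or None to drop it.
Transform = Callable[[Change], Optional[Change]]
Sink = Callable[[Change], None]


def drop_missing_key(change: Change) -> Optional[Change]:
    """Filter step: discard change events that carry no primary key."""
    return change if change.get("id") is not None else None


def normalize_email(change: Change) -> Optional[Change]:
    """Cleansing step: trim and lowercase the email column if present."""
    email = change.get("email")
    if isinstance(email, str):
        return {**change, "email": email.strip().lower()}
    return change


def process_stream(
    changes: Iterable[Change],
    transforms: List[Transform],
    sinks: List[Sink],
) -> int:
    """Apply each transform in order, then hand surviving changes to every sink."""
    delivered = 0
    for change in changes:
        current: Optional[Change] = change
        for transform in transforms:
            current = transform(current)
            if current is None:
                break  # a transform dropped this change
        if current is None:
            continue
        for sink in sinks:
            sink(current)
        delivered += 1
    return delivered


# Hypothetical destinations: two lists stand in for two writers.
primary_table: List[Change] = []
audit_table: List[Change] = []

incoming = [
    {"op": "INSERT", "id": 1, "email": "  Alice@Example.COM "},
    {"op": "UPDATE", "id": None, "email": "broken@example.com"},
    {"op": "INSERT", "id": 2, "email": "bob@example.com"},
]

delivered = process_stream(
    incoming,
    transforms=[drop_missing_key, normalize_email],
    sinks=[primary_table.append, audit_table.append],
)
print(delivered, primary_table[0]["email"], len(audit_table))
# prints: 2 alice@example.com 2
```

The second event is dropped by the filter step, so both destinations receive the same two cleansed records.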
Usage notes
Before you configure a synchronization task, you need to add a MaxCompute data source on the Data Sources page in the DataWorks console. For more information, see Add a MaxCompute data source.
You need to purchase an exclusive resource group for Data Integration with appropriate specifications based on your requirements. For more information, see Create and use an exclusive resource group for Data Integration.
Before you configure a synchronization task, you need to make sure that a network connection is established between a resource group for Data Integration and your data source. For more information, see Establish a network connection between a resource group and a data source.
Before you run a real-time synchronization task, you need to configure the environment in which the MaxCompute data source runs. For more information, see Prepare a MaxCompute environment.
Integrated synchronization
In practice, data synchronization is often complex. A single scenario can require multiple batch synchronization tasks, real-time synchronization tasks, and data processing tasks, each of which must be configured separately.
To reduce this configuration effort, DataWorks provides configurable synchronization solutions that are tailored for specific business scenarios. The solutions allow you to synchronize data to MaxCompute with a few clicks. For more information, see Create a real-time synchronization solution to synchronize data to MaxCompute and Create a batch synchronization solution to synchronize all data in a database to MaxCompute.
Usage notes
Before you configure a synchronization task, you need to add a MaxCompute data source on the Data Sources page in the DataWorks console. For more information, see Add a MaxCompute data source.
You need to purchase an exclusive resource group for Data Integration with appropriate specifications based on your requirements. For more information, see Create and use an exclusive resource group for Data Integration.
Before you configure a synchronization task, you need to make sure that a network connection is established between a resource group for Data Integration and your data source. For more information, see Establish a network connection between a resource group and a data source.
Before you run a real-time synchronization task, you need to configure the environment in which the MaxCompute data source runs. For more information, see Prepare a MaxCompute environment.