DataWorks provides different types of data synchronization solutions for various data synchronization scenarios, such as real-time synchronization, batch full synchronization, and batch incremental synchronization. These types of solutions help you migrate your business data to the cloud in an efficient and convenient manner.
Background information
In some business scenarios, data cannot be synchronized by using only one or a few simple batch or real-time synchronization nodes. Instead, multiple batch synchronization nodes, real-time synchronization nodes, and data processing nodes are required, and the configuration becomes complex.
For example, a large amount of data is stored in your database, and you want to synchronize full and incremental data from the database to MaxCompute for analysis. With the traditional data synchronization method, you can perform full synchronization, or perform incremental synchronization based on fields such as modify_time in the tables in your database. However, in actual business scenarios, such fields may not exist in the tables. In this case, you cannot use the Java Database Connectivity (JDBC) driver to extract data for incremental synchronization. Instead, you can use the one-click real-time synchronization to MaxCompute solution to synchronize full and incremental data from your database to MaxCompute in real time. After the synchronization, the full and incremental data is automatically merged in MaxCompute, which simplifies data synchronization. The solution provides the following features:
- Synchronizes all full data in a single run.
- Synchronizes incremental data in real time.
- Automatically merges incremental and full data on a regular basis and writes the merged data to the related partition in a table that is used to store full data.
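The merge step in the list above can be pictured with a minimal sketch. It is not the DataWorks or MaxCompute implementation. It assumes that every row has a primary key and that each incremental change carries an operation name (`INSERT`, `UPDATE`, or `DELETE`). The record shape, the operation names, and the function name are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change record: (operation, primary key, row payload).
# The payload is None for deletes.
Change = Tuple[str, int, Optional[Row]]


def merge_full_and_incremental(full: Dict[int, Row], changes: List[Change]) -> Dict[int, Row]:
    """Apply an ordered change log to a full snapshot and return a new snapshot."""
    merged = {key: dict(row) for key, row in full.items()}  # do not mutate the input
    for op, key, row in changes:
        if op in ("INSERT", "UPDATE"):
            if row is None:
                raise ValueError(f"{op} for key {key} has no row payload")
            merged[key] = dict(row)  # the latest version of the row wins
        elif op == "DELETE":
            merged.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


# Full data from the previous run, keyed by primary key.
yesterday_full = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "paid"}}
# Incremental changes captured since the previous run, in commit order.
todays_changes = [
    ("UPDATE", 1, {"id": 1, "status": "paid"}),
    ("DELETE", 2, None),
    ("INSERT", 3, {"id": 3, "status": "new"}),
]
# The result corresponds to the data written to the new partition of the full table.
today_full = merge_full_and_incremental(yesterday_full, todays_changes)
```

In this sketch the change log is applied in order, so a later change to the same key overrides an earlier one. The previous snapshot is left untouched, which mirrors writing the merged result to a separate partition.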
Overview
The following table describes the capabilities of the solution-based synchronization feature.
Capability | Description |
---|---|
Data synchronization from or to data sources that are deployed in complex network environments | The solution-based synchronization feature supports data synchronization from or to Alibaba Cloud data sources, data centers, data sources that are hosted on Elastic Compute Service (ECS) instances, and data sources that do not belong to Alibaba Cloud. You can select appropriate network connectivity solutions to establish network connections between your resource group and data sources based on the network environments in which the data sources are deployed. Before you configure a data synchronization solution, you must make sure that network connections are established between your resource group for Data Integration and data sources. For more information about how to establish a network connection between a resource group and a data source, see Establish a network connection between a resource group and a data source. |
Data synchronization scenarios | The solution-based synchronization feature supports the synchronization of data from a single table to another single table, from tables in sharded databases to a single table, and from multiple tables in a database to multiple tables. DataWorks provides the following types of data synchronization solutions: batch synchronization of data from all tables in a database (one-time full synchronization, periodical full synchronization, one-time full synchronization and periodical incremental synchronization, one-time incremental synchronization, and periodical incremental synchronization), and one-time full synchronization and real-time incremental synchronization. For more information, see Supported data source types and data synchronization solutions. |
Configurations for data synchronization solutions | For information about how to configure a data synchronization solution, see Configure a data synchronization solution in Data Integration. For information about the capabilities that you can use when you configure a data synchronization solution, see the Capabilities that you can use when you configure a data synchronization solution section in this topic. |
O&M for data synchronization solutions |
|
Capabilities that you can use when you configure a data synchronization solution
Capability | Description |
---|---|
Refresh mappings | Click the button that is used to refresh the mappings between the source tables and destination tables. Then, the system displays the mappings. |
View or modify the schema of a destination table | Find a mapping record and click the name of the destination table to open the dialog box that displays the schema of the table. In the dialog box, you can modify the schema of the table based on your business requirements. In the preceding figure, the add_col field is added to the automatically created destination Hologres table hudi_b.tb_order_3, the data type of the field is set to TEXT, and the description of the field is set to "Add a field to an automatically created destination table". After the modification is complete, click Apply and Refresh Mapping to save the modifications. Important When you modify the schema of an automatically created destination table, take note of the following items about the fields that have the same names as fields in the source table:
The add_col field is added to the existing destination Hologres table hudi_b.tb_order_1, the data type of the field is set to TEXT, and the description of the field is set to "Add a field to an existing destination table". After the modification is complete, click Apply and Refresh Mapping to save the modifications. Important When you modify the schema of an existing destination table, take note of the following items:
|
Modify the schemas of multiple destination tables at a time | Select multiple mapping records and click Batch Modify Table Schema. In the dialog box that appears, you can modify the schemas of all selected tables at the same time. After the modification is complete, click Apply and Refresh Mapping to save the modifications. Important
Then, click the name of a destination table on which the batch operation is performed to view the new schema of the table. |
Specify the name of a destination schema or table | By default, data is written to the destination schema and table that are named the same as the source database and table. If no such destination schema or table exists, the system automatically creates the schema or table in the destination. You can specify the name of the destination schema or table to which you want to write data. For more information, see Configure rules for mapping databases or tables. Note
|
Assign values to the fields that are newly added to a destination table | By default, source fields specified in a data synchronization solution are mapped to destination fields that are named the same as the source fields. The values of the source fields are written to the destination fields that are named the same as the source fields. You can add fields to a destination table and assign constants or variables to the fields as values. You can find the desired mapping record and click Edit in the Value Assignment for Destination Field column. In the dialog box that appears, the new schema of the destination table to which fields are added is displayed.
Note Descriptions for the supported variables:
|
Configure rules to process DDL or DML messages | When you configure a data synchronization solution, you can configure rules to process different types of DDL messages for destination tables. For information about the DDL and DML operations supported by different destinations, see Supported DML and DDL operations. Note
|
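The default name mapping and the value assignment for newly added destination fields described in the preceding table can be illustrated with a minimal sketch. It is not the DataWorks mapping engine. The function names, the `overrides` dictionary, the `${...}` placeholder syntax, and the variable name `source_table` are all hypothetical and are used only to show the idea: same-named fields are copied as they are, and added fields receive a constant or a variable value.

```python
from typing import Dict, Optional, Tuple


def resolve_destination_table(
    source_db: str,
    source_table: str,
    overrides: Optional[Dict[Tuple[str, str], str]] = None,
) -> str:
    """Return the destination "schema.table" name for a source table."""
    overrides = overrides or {}
    # Default behavior: the destination schema and table use the source names.
    return overrides.get((source_db, source_table), f"{source_db}.{source_table}")


def build_destination_row(
    source_row: Dict[str, object],
    assignments: Dict[str, object],
    variables: Dict[str, object],
) -> Dict[str, object]:
    """Copy same-named fields, then fill the added destination fields."""
    row = dict(source_row)  # same-named source fields map to destination fields
    for field, value in assignments.items():
        is_variable = isinstance(value, str) and value.startswith("${") and value.endswith("}")
        # Hypothetical "${name}" placeholders are looked up in `variables`;
        # anything else is written as a constant.
        row[field] = variables[value[2:-1]] if is_variable else value
    return row


default_name = resolve_destination_table("hudi_b", "tb_order_1")
custom_name = resolve_destination_table(
    "hudi_b", "tb_order_1", overrides={("hudi_b", "tb_order_1"): "ods.orders"}
)
dest_row = build_destination_row(
    {"id": 7, "amount": 12.5},
    assignments={"add_col": "fixed text", "src_table": "${source_table}"},
    variables={"source_table": "tb_order_1"},
)
```

Here `default_name` keeps the source names, `custom_name` follows the explicit mapping rule, and `dest_row` contains the two source fields plus the two added fields.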