Data Integration supports offline synchronization of entire databases from AnalyticDB for MySQL 3.0, ClickHouse, Hologres, MySQL, PolarDB, and other sources to MaxCompute. This topic uses MySQL to MaxCompute synchronization as an example to demonstrate how to synchronize an entire Hologres database to MaxCompute in a single operation.
Prerequisites
You have purchased a Serverless resource group or an exclusive resource group for Data Integration.
You have created a MySQL data source and a MaxCompute data source. For more information, see Create a data source for Data Integration.
You have completed network connectivity between the resource group and data sources. For more information, see Network connectivity solutions.
Limits
Real-time synchronization of all data in a MySQL database to MaxCompute does not support a MaxCompute data source with tenant-level schema syntax enabled.
Synchronization of source data to MaxCompute external tables is not supported.
Procedure
Step 1: Select the synchronization task type
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Synchronization Tasks, and then click Create Synchronization Task at the top of the page to go to the synchronization task creation page. Configure the following basic information:
Data Source And Destination:
MySQL
→MaxCompute
New Task Name: Customize the synchronization task name.
Synchronization Type:
Offline Database
.Synchronization Steps: Select Full Synchronization and Incremental Synchronization.
Step 2: Configure network and resources
In the Network And Resource Configuration section, select the Resource Group for the synchronization task. You can allocate the number of CUs for Task Resource Usage.
For Source Data Source, select the added
MySQL
data source. For Destination Data Source, select the addedMaxCompute
data source, and then click Test Connectivity.After confirming that both the source and destination data sources are connected, click Next.
Step 3: Select the databases and tables to synchronize
In this step, you can select the tables from the source data source that you want to synchronize in the Source Database Tables section, and click the icon to move them to the Selected Database Tables section on the right.
Step 4: Set destination table properties
You need to click the Configure button after Partition Initialization Settings to uniformly initialize the partitions for the new destination tables. The modifications made here will overwrite the partition settings for all new destination tables.
Step 5: Configure full and incremental control
Configure the synchronization mode for the synchronization task.
If you selected both Synchronization Steps Full Synchronization and Incremental Synchronization, this section defaults to one-time full synchronization and periodic incremental synchronization, which cannot be changed.
If you selected Synchronization Steps Full Synchronization, you can choose whether the task type is one-time full synchronization or periodic full synchronization.
If you selected Synchronization Steps Incremental Synchronization, you can choose whether the task type is one-time incremental synchronization or periodic incremental synchronization.
NoteIn this example, the synchronization mode of one-time synchronization of full data and periodic synchronization of incremental data is used.
Configure the parameters for periodic scheduling.
If your task involves periodic synchronization, you can click Scheduling Parameters to configure the settings.
Step 6: Map destination tables
After selecting the tables to synchronize in the previous step, the tables are automatically displayed on this page. However, the properties of the destination tables are in the Pending Refresh Mapping state by default. You need to define and confirm the mapping relationship between the source tables and destination tables, which determines how data is read and written. Then, click Refresh Mapping to proceed to the next step. You can directly refresh mappings between source tables and destination tables, or you can refresh mappings after configuring destination table settings.
You can select the tables to be synchronized and click Batch Refresh Mapping. If no mapping rule is configured, the default table name rule is
${source_database_name}_${table_name}
. If a table with the same name does not exist in the destination, a new table is automatically created.Because periodic scheduling is required, you need to define the related properties for the periodic scheduling task, including Scheduling Cycle, Scheduling Time, Scheduling Resource Group, and more. The scheduling configuration for the current synchronization is the same as the scheduling configuration for nodes in Data Development. For more information about the parameters, see Node scheduling.
You need to set Incremental Condition to filter the source using a WHERE clause. You only need to write the WHERE clause here, without the WHERE keyword. If you configure the scheduling parameters for implementing periodic synchronization of incremental data, you can use the system parameter variables.
In the Custom Destination Table Name Mapping column, you can click the Edit button to customize the destination table name rule.
You can concatenate built-in variables and specified strings into a destination table name. You can edit built-in variables. For example, when creating a mapping rule, you can add a suffix to a variable that indicates a source table name to form a destination table name.
1. Edit field type mapping
The synchronization task has default mappings between source field types and destination field types. You can click Edit Field Type Mapping in the upper-right corner of the table to customize the field type mapping relationship between source tables and destination tables. After configuration, click Apply And Refresh Mapping.
2. Edit destination table structure and add field assignments
When the destination table is in the To Be Created state, you can add fields to the destination table based on the original table structure. To do this, perform the following steps:
Add fields to destination tables.
Add fields to a single table: Click the button in the Destination Table Name
column to add fields.
Add fields in batches: Select all tables to be synchronized, and at the bottom of the table, select
.
Assign values to the fields. You can perform one of the following operations to assign values to the fields:
Assign values to a single table: Click the Configure button in the Destination Table Field Assignment column to assign values to the destination table fields.
Assign values in batches: At the bottom of the list, select
to assign values to the same fields in the destination tables in batches.
NoteWhen assigning values, you can assign constants and variables. You can switch the assignment mode by clicking the
icon.
3. Customize advanced parameters
If you need to make detailed configurations for the task to meet custom synchronization requirements, you can click Configure in the Custom Advanced Parameters column to modify the advanced parameters.
Before modifying the configurations of advanced parameters, make sure that you understand the meanings of the parameters to prevent unexpected errors or data quality issues.
4. Set source split column
You can select a field from the source table in the source split column drop-down list or select No Split.
Step 7: Configure alerts
To prevent synchronization task failures from causing latency in business data synchronization, you can configure different alert rules for the synchronization task.
Click Alert Configuration in the upper-right corner of the page to go to the alert settings page.
Select the scheduling task for the synchronization table and configure alerts for it. For more information, see Alert information.
Step 8: Configure advanced parameters
You can change the values of specific parameters configured for the synchronization task based on your business requirements. For example, you can specify an appropriate value for the Maximum read connections parameter to prevent the current synchronization task from imposing excessive pressure on the source database and affecting data production.
To prevent unexpected errors or data quality issues, we recommend that you understand the meanings of parameters before changing their values.
Click Advanced Parameter Configuration in the upper-right corner of the interface to go to the advanced parameter configuration page.
Modify the parameter values on the Advanced Parameter Configuration page.
Step 9: Configure resource groups
You can click Resource Group Configuration in the upper-right corner of the interface to view and switch the resource group used by the current task.
Step 10: Execute the synchronization task
After completing all configurations, click Complete Configuration at the bottom of the page.
On the
page, find the created synchronization task and click Start in the Operation column.Click the Name/ID of the corresponding task in the Task List to view the detailed execution process of the task.
Perform O&M operations on the synchronization task
View the status of the synchronization task
After the synchronization task is created, you can go to the Synchronization Task page to view all synchronization tasks created in the workspace and their basic information.
You can Start or Stop the synchronization task in the Operation column. In the More menu, you can Edit, View, and perform other operations on the synchronization task.
For started tasks, you can see the basic status of the task in Execution Overview. You can also click the corresponding overview area to view the execution details.
In the offline synchronization task from MySQL to MaxCompute:
If your task synchronization step is Full Synchronization, this section shows structure migration and full synchronization.
If your task synchronization step is Incremental Synchronization, this section shows structure migration and incremental synchronization.
If your task synchronization step is Full Synchronization + Incremental Synchronization, this section shows structure migration, full synchronization, and incremental synchronization.
Task rerun
Direct rerun: Without modifying the task configuration, directly click the Rerun operation.
Effect: Reruns one-time tasks and updates periodic task properties.
Modify and rerun (add or remove tables): Edit the task, add or remove tables, and click Complete. At this point, the task operation will change to Apply Update. Clicking Apply Update will directly trigger the modified task to rerun.
Effect: Only newly added tables will be synchronized. Previously synchronized tables will not be synchronized again.
Modify and rerun (modify table name information): Edit the task, modify the destination table name or switch the destination table for synchronization, and click Complete. At this point, the task operation will change to Apply Update. Clicking Apply Update will directly trigger the modified task to rerun.
Effect: Synchronizes the modified tables. Unmodified tables will not be synchronized again.