Data Integration supports offline synchronization of entire databases from sources such as AnalyticDB for MySQL 3.0, ClickHouse, Hologres, and PolarDB to Hologres. This topic describes how to synchronize full data and incremental data from an AnalyticDB for MySQL V3.0 database to Hologres in offline mode.
Prerequisites
A serverless resource group or an exclusive resource group for Data Integration is purchased.
An AnalyticDB for MySQL 3.0 data source and a Hologres data source are created. For more information, see Create a data source for Data Integration.
Network connections between the resource group and data sources are established. For more information, see Network connectivity solutions.
Procedure
Step 1: Select a synchronization task type
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Synchronization Task, and then click Create Synchronization Task at the top of the page to go to the synchronization task creation page. Configure the following basic information:
Source And Destination: AnalyticDB for MySQL (V3.0) → Hologres
New Node Name: Specify a name for the synchronization task.
Synchronization Method: Offline synchronization of the entire database
Synchronization Mode: Select Full Initialization and Incremental Synchronization.
Step 2: Configure network and resources
In the Network And Resource Configuration section, select the Resource Group to be used for the synchronization task. You can allocate the number of CUs for Task Resource Usage.
For Source Data Source, select the added AnalyticDB for MySQL (V3.0) data source. For Destination Data Source, select the added Hologres data source, and then click Test Connectivity.
After you ensure that both the source data source and the destination data source are connected successfully, click Next.
Step 3: Select tables for synchronization
In this step, you can select the tables that you want to synchronize from the Source Table section and click the icon to move them to the Selected Tables section on the right.
Step 4: Configure full and incremental synchronization
Configure the synchronization mode for the synchronization task.
If you select Full Initialization and Incremental Synchronization for the Synchronization Mode parameter, One-time Synchronization is automatically selected for the Full Synchronization parameter, and Periodical Synchronization is automatically selected for the Method of Incremental Synchronization parameter. The two values cannot be changed.
If you select Full Initialization for the Synchronization Mode parameter, you can select One-time Synchronization or Periodical Synchronization for the Full Synchronization parameter.
If you select Incremental Synchronization for the Synchronization Mode parameter, you can select One-time Synchronization or Periodical Synchronization for the Method of Incremental Synchronization parameter.
Note: In this example, full data is synchronized once and incremental data is synchronized periodically.
Configure the parameters for periodic scheduling for the synchronization task.
If your task involves periodic synchronization, you can click Configure Scheduling Parameters For Periodical Scheduling to configure the parameters.
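The mode rules in this step can be summarized as a small lookup table. The following Python sketch is only an illustration of those rules: the names `MODE_OPTIONS`, `allowed_options`, and `needs_scheduling` are hypothetical and are not part of any DataWorks API.

```python
# Allowed values per synchronization mode, transcribed from the rules in this step.
ONE_TIME = "One-time Synchronization"
PERIODICAL = "Periodical Synchronization"

MODE_OPTIONS = {
    "Full Initialization and Incremental Synchronization": {
        "Full Synchronization": [ONE_TIME],                    # fixed value
        "Method of Incremental Synchronization": [PERIODICAL], # fixed value
    },
    "Full Initialization": {
        "Full Synchronization": [ONE_TIME, PERIODICAL],
    },
    "Incremental Synchronization": {
        "Method of Incremental Synchronization": [ONE_TIME, PERIODICAL],
    },
}


def allowed_options(mode: str) -> dict[str, list[str]]:
    """Return the parameters shown for a mode and the values each one accepts."""
    if mode not in MODE_OPTIONS:
        raise ValueError(f"unknown synchronization mode: {mode!r}")
    return MODE_OPTIONS[mode]


def needs_scheduling(mode: str, choices: dict[str, str]) -> bool:
    """Validate the choices and report whether any of them is periodical."""
    options = allowed_options(mode)
    for param, value in choices.items():
        if value not in options.get(param, []):
            raise ValueError(f"{value!r} is not allowed for {param!r} in mode {mode!r}")
    return PERIODICAL in choices.values()
```

For example, `needs_scheduling("Full Initialization", {"Full Synchronization": ONE_TIME})` returns `False`, which corresponds to a task that does not need scheduling parameters.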
Step 5: Configure destination table mapping
The tables that you selected in the previous step are automatically displayed on this page. By default, the properties of the destination tables are in the pending refresh mapping state. Define and confirm the mappings between source tables and destination tables, which determine where data is read from and written to, and then click Refresh before you proceed to the next step. You can refresh the mappings directly, or refresh them after you configure the settings related to destination tables.
You can select the tables to be synchronized and click Batch Refresh Mapping Results. If no mapping rule is configured, the default table name rule is ${Source table name}. If a table with the same name does not exist in the destination, a new table is automatically created.
Because periodic scheduling is required, you need to define the related properties for the periodic scheduling task, including Scheduling Cycle, Scheduling Time, and Resource Group For Scheduling. The scheduling configuration for the current synchronization task is the same as the scheduling configuration for nodes in Data Development. For more information about the parameters, see Node scheduling.
You need to set the Condition For Incremental Synchronization parameter to specify a WHERE clause to filter data in the source. When you configure the parameter, you do not need to include the WHERE keyword in the clause. If you configure the scheduling parameters for implementing periodic synchronization of incremental data, you can use the system parameter variables.
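To make the shape of the incremental condition concrete, the following Python sketch shows how a condition entered without the WHERE keyword might be combined with a scheduling parameter value for one run. Everything in it is an assumption for illustration: the table `orders`, the column `gmt_modified`, the parameter name `bizdate`, and the helper `build_incremental_query` are hypothetical, and Data Integration does not necessarily build its read query this way. For the system parameters that are actually supported, see Node scheduling.

```python
from string import Template


def build_incremental_query(table: str, condition: str, params: dict[str, str]) -> str:
    """Compose an illustrative read query for one scheduled run."""
    cond = condition.strip()
    if not cond:
        raise ValueError("the condition must not be empty")
    # The condition is entered without the WHERE keyword, so reject it if present.
    if cond.split(maxsplit=1)[0].upper() == "WHERE":
        raise ValueError("omit the WHERE keyword from the condition")
    # Replace ${name} placeholders with the values resolved for this run.
    resolved = Template(cond).substitute(params)
    return f"SELECT * FROM {table} WHERE {resolved}"


# Hypothetical condition: rows modified on or after the data timestamp of the run.
query = build_incremental_query(
    "orders",
    "gmt_modified >= '${bizdate}'",
    {"bizdate": "20240101"},
)
print(query)  # SELECT * FROM orders WHERE gmt_modified >= '20240101'
```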
In the Custom Destination Schema Name Mapping column, you can click the Configure button to customize the destination schema name rule.
You can concatenate built-in variables and specified strings into a final destination schema name. You can edit built-in variables. For example, when you create a mapping rule, you can add a suffix to a variable that indicates a source schema name to form a destination schema name.
In the Custom Destination Database Name Mapping column, you can click the Configure button to customize the destination database name rule.
You can concatenate built-in variables and specified strings into a final destination database name. You can edit built-in variables. For example, when you create a mapping rule, you can add a suffix to a variable that indicates a source database name to form a destination database name.
In the Custom Destination Table Name Mapping column, you can click the Edit button to customize the destination table name rule.
You can concatenate built-in variables and specified strings into a final destination table name. You can edit built-in variables. For example, when you create a mapping rule, you can add a suffix to a variable that indicates a source table name to form a destination table name.
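The naming rules above amount to simple template substitution. The following Python sketch illustrates the idea for table names, using the default rule ${Source table name} and a rule that appends a suffix. The function `render_name` and the suffix `_offline` are hypothetical and only show how a rule and a source name combine; they do not describe how DataWorks evaluates rules internally.

```python
import re


def render_name(rule: str, variables: dict[str, str]) -> str:
    """Replace each ${variable} in a naming rule with its value."""

    def lookup(match: re.Match) -> str:
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"unknown built-in variable: {key}")
        return variables[key]

    # Variable names may contain spaces, so match everything up to the closing brace.
    return re.sub(r"\$\{([^}]+)\}", lookup, rule)


source = {"Source table name": "orders"}

# Default rule: the destination table takes the source table name unchanged.
default_name = render_name("${Source table name}", source)

# Custom rule: a built-in variable concatenated with a fixed string.
suffixed_name = render_name("${Source table name}_offline", source)

print(default_name)   # orders
print(suffixed_name)  # orders_offline
```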
1. Edit field data type mapping
The synchronization task has default mappings between source field types and destination field types. You can click Edit Mapping Of Field Data Types in the upper-right corner of the table to customize the mapping relationship between source table field types and destination table field types. After configuration, click Apply And Refresh Mapping.
2. Edit destination table structure and assign values to fields
When the destination table is in the To Be Created status, you can add fields to the destination table based on the original table structure. To configure advanced parameters, perform the following operations:
Add fields to one or more destination tables.
Add fields to a single table: Click the button in the Destination Table Name column to add fields.
Add fields to multiple tables: Select all tables to be synchronized, and at the bottom of the table, select
.
You can perform one of the following operations to assign values to the fields:
Assign values to fields in a single table: Click the Configure button in the Value Assignment For Destination Table Fields column to assign values to the destination table fields.
Assign values to fields in multiple tables: At the bottom of the list, select
to batch assign values to the same fields in the destination tables.
Note: When assigning values, you can assign constants and variables. You can switch the assignment mode by clicking the icon.
3. Configure custom advanced parameters
If you need to make fine-grained configurations for the task to meet custom synchronization requirements, you can click Configure in the Custom Advanced Parameters column to modify the advanced parameters.
Before you modify the configurations of advanced parameters, make sure that you understand the meanings of the parameters to prevent unexpected errors or data quality issues.
Step 6: Configure alert rules
To prevent the failure of the synchronization task from causing latency on business data synchronization, you can configure different alert rules for the synchronization task.
Click Alert Configuration in the upper-right corner of the page to go to the alert settings page.
Select the scheduling task for the synchronization table and set alerts for it. For more information, see Alert information.
Step 7: Configure advanced parameters
You can change the values of specific parameters configured for the synchronization task based on your business requirements. For example, you can specify an appropriate value for the Maximum read connections parameter to prevent the synchronization task from placing excessive load on the source database and affecting production workloads.
To prevent unexpected errors or data quality issues, we recommend that you understand the meanings of the parameters before you change the values of the parameters.
Click Configure Advanced Parameter in the upper-right corner of the interface to go to the advanced parameter configuration page.
Modify the relevant parameter values on the Configure Advanced Parameters page.
Step 8: Configure resource groups
You can click Resource Group Configuration in the upper-right corner of the interface to view and switch the resource group used for the current task.
Step 9: Execute the synchronization task
After all configurations are complete, click Complete at the bottom of the page.
On the page, find the created synchronization task and click Start in the Actions column.
Click the Name/ID of the corresponding task in the Task List to view the detailed execution process of the task.
Perform O&M operations on the data synchronization solution
View the status of the data synchronization solution
After the data synchronization solution is created, you can go to the Synchronization Task page to view all data synchronization solutions that are created in the workspace and the basic information of each data synchronization solution.
You can Start or Stop the synchronization task in the Actions column. You can also Edit, View, and perform other operations on the synchronization task from the More menu.
For tasks that have been started, you can see the basic status of the task in Execution Overview, or click the corresponding overview area to view execution details.
In the offline synchronization task from AnalyticDB for MySQL (V3.0) to Hologres:
If your task's synchronization mode is Full Initialization, the Schema Migration and Full Synchronization sections are displayed.
If your task's synchronization mode is Incremental Synchronization, the Schema Migration and Incremental Sync sections are displayed.
If your task's synchronization mode is Full Initialization + Incremental Synchronization, the Schema Migration, Full Synchronization, and Incremental Sync sections are displayed.
Rerun the synchronization task
Directly rerun the synchronization task: Find the synchronization task in the Tasks section, click More in the Actions column, and then select Rerun.
Effect: The one-time subtask is rerun and the properties of the periodic subtask are updated.
Add tables to or remove tables from the synchronization task and then rerun the synchronization task: Find the synchronization task in the Tasks section, add tables to or remove tables from the synchronization task, and then click Complete. In this case, Apply Updates is displayed in the Actions column of the synchronization task in the Tasks section. Click Apply Updates to rerun the modified synchronization task for the modifications to take effect.
Effect: If you add tables to the synchronization task, only data in the added tables is synchronized. Data in the original tables in the synchronization task is not re-synchronized.
Modify the task and rerun the task: Modify the names of the destination tables or use other tables for data synchronization and click Complete. In this case, Apply Updates is displayed in the Actions column of the synchronization task in the Tasks section. Click Apply Updates to rerun the modified synchronization task for the modifications to take effect.
Effect: Data is re-synchronized to the destination table on which a change is made. Data is not re-synchronized to destination tables on which no change is made.
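The effects of Apply Updates described above can be expressed as a comparison between the old and new table mappings. The following Python sketch is a hypothetical model of that behavior, not DataWorks code: `tables_to_resync` and the sample table names are assumptions. It covers added tables and changed destination tables only. A direct rerun, which reruns the one-time subtask, is not modeled.

```python
def tables_to_resync(previous: dict[str, str], updated: dict[str, str]) -> list[str]:
    """Return source tables whose data is synchronized again after Apply Updates.

    Both arguments map a source table name to its destination table name.
    A table is re-synchronized if it is new or if its destination changed.
    Tables removed from the task simply drop out of the result.
    """
    return sorted(
        source
        for source, destination in updated.items()
        if previous.get(source) != destination
    )


before = {"orders": "orders", "users": "users", "logs": "logs"}
after = {
    "orders": "orders",        # unchanged: not re-synchronized
    "users": "users_offline",  # destination renamed: re-synchronized
    "payments": "payments",    # newly added: synchronized
}                              # "logs" was removed from the task

print(tables_to_resync(before, after))  # ['payments', 'users']
```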
Data development scenarios
You can refer to Node scheduling to configure scheduling dependencies for the auto triggered subtask generated by the synchronization task to meet data development requirements. You can view information about the auto triggered subtask in the Scheduling Configuration column here.