All Products
Search
Document Center

DataWorks:Periodically synchronize full data and incremental data from an AnalyticDB for MySQL 3.0 database to Hologres

Last Updated:Jan 15, 2024

This topic describes how to synchronize full data and incremental data from an AnalyticDB for MySQL 3.0 database to Hologres in offline mode.

Limits

In this example, only exclusive resource groups for Data Integration are supported.

Prerequisites

  • The data sources that you want to use as the source and destination are configured.

    In this example, an AnalyticDB for MySQL 3.0 data source is used as the source, and a Hologres data source is used as the destination. You must add the AnalyticDB for MySQL 3.0 data source and the Hologres data source to the current workspace on the Data Source page in SettingCenter in the DataWorks console. For information about the synchronization capabilities that are supported by the two types of data sources, see AnalyticDB for MySQL 3.0 data source and Hologres data source.

  • An exclusive resource group for Data Integration is purchased and associated with the current DataWorks workspace. Network connections between the resource group and data sources are established.

    In this example, only exclusive resource groups for Data Integration are supported. You must purchase and configure an exclusive resource group for Data Integration, and establish network connections between the resource group and the AnalyticDB for MySQL 3.0 data source and Hologres data source. For information about how to purchase and configure an exclusive resource group for Data Integration, see Create and use an exclusive resource group for Data Integration.

Create and configure a synchronization task

  1. Log on to the DataWorks console and go to the Nodes page in Data Integration. On the Nodes page, click Create Node to create and configure a synchronization task.

  2. Configure basic information for the synchronization task.

    1. Task name: In this section, specify a name for the synchronization task based on your business requirements.

    2. Synchronization Type: In this section, select AnalyticDB for MySQL 3.0 as the source type and Hologres as the destination type. After you select the source and destination types, Offline Full Sync is automatically selected for the Synchronization Method parameter, which cannot be changed. You can configure the Synchronization Mode parameter based on your business requirements.

      • If you want to perform one-time or periodic synchronization of full data, select full database offline synchronization for the Synchronization Mode parameter.

      • If you want to perform one-time or periodic synchronization of incremental data, select inc offline synchronization for the Synchronization Mode parameter.

      • If you want to perform one-time synchronization of full data and periodic synchronization of incremental data, select full database offline synchronization and inc offline synchronization for the Synchronization Mode parameter.

    3. Network and Resource Configuration: In this section, select the AnalyticDB for MySQL 3.0 data source, Hologres data source, and exclusive resource group for Data Integration that you prepared, and click Test Connectivity for All Resource Groups and Data Sources to test the network connectivity between the resource group and data sources.

  3. Configure the information that is specific to your synchronization task.

    In this example, a synchronization task used to synchronize data from an AnalyticDB for MySQL 3.0 data source to a Hologres data source is configured based on two common scenarios. You can configure a synchronization task based on your business requirements. For more information, see the Scenario 1: Configure a synchronization task to periodically synchronize full data and Scenario 2: Configure a synchronization task to synchronize full data at a time and periodically synchronize incremental data sections in this topic.

Scenario 1: Configure a synchronization task to periodically synchronize full data

If you select Structural migration and full database offline synchronization for the Synchronization Mode parameter, you must make the following configurations for the synchronization task.

  1. Select the tables from which you want to read data.

    Select the tables from which you want to read data from the Source Table list in the Select Data Sources and Tables for Data Synchronization section and click the rightward arrow to move the selected tables to the Selected Tables list. You can use a regular expression to search for the desired table by name.

  2. Configure the mode of full synchronization.

    In the Configurations for Full and Incremental Synchronization section, select Periodical Synchronization for the Full Sync parameter. Then, click Configure Scheduling Parameters for Periodical Scheduling to configure scheduling parameters for periodic scheduling. For information about scheduling parameters, see Supported formats of scheduling parameters.

    image.png

  3. Refresh mappings between tables in the source and tables in the destination.

    • In the Mapping Rules for Destination Tables section, find a source table and click Refresh Mapping in the Actions column. Then, the system displays a schema name in the Dst Schema Name column of the source table.

      • If a table is newly created in the destination, you can click the image.png icon in the Destination Table Name column to modify the schema of the table. For example, you can add fields to the table or specify a field as a primary key for the table.

      • If a table already exists in the destination, you can only view the table.

    • Select all source tables and click Batch Refresh Mapping Results to refresh mappings between the source tables and destination tables.

  4. Configure the setting for clearing all data in a destination table before data synchronization.

    Clearing all data in a destination table before full synchronization can ensure that the data synchronized to the destination table is the same as the data in the mapped source table. In the upper-right corner of the configuration page, click Configure Advanced Parameters, click the Writer Config tab in the Configure Advanced Parameters panel, set the Clear table before synchronization parameter to true, and then click OK.

    image.png

  5. Configure the settings for periodic scheduling at the table level.

    You can configure scheduling parameters for periodic scheduling at the table level. You can modify the configurations for a single table-mapping record or for multiple table-mapping records at a time based on your business requirements. You can configure parameters such as Scheduling period and Scheduling Resource Groups. For more information about scheduling parameters for periodic scheduling, see Overview.

    image.png

  6. After the preceding configuration is complete, click Complete.

    For information about other parameters that you can configure, see the Configure other advanced settings in this topic.

Scenario 2: Configure a synchronization task to synchronize full data at a time and periodically synchronize incremental data

If you select Structural migration, full database offline synchronization, and inc offline synchronization for the Synchronization Mode parameter, you must make the following configurations for the synchronization task.

  1. Select the tables from which you want to read data.

    In the Select Data Sources and Tables for Data Synchronization section, select the tables from which you want to read data from the Source Table list and click the rightward arrow to move the selected tables to the Selected Tables list. You can use a regular expression to search for the desired table by name.

  2. Configure the modes of full synchronization and incremental synchronization.

    One-time Synchronization is automatically selected for the Full Sync parameter, and Periodical Synchronization is automatically selected for the Method of Incremental Synchronization parameter. The two values cannot be changed. You can click Configure Scheduling Parameters for Periodical Scheduling to configure the scheduling parameters for implementing incremental synchronization. The configured scheduling parameters can be used for various purposes. For example, you can use the scheduling parameters in a filter condition, and assign the scheduling parameters to additional fields as values in a destination table. For more information about scheduling parameters, see Supported formats of scheduling parameters.

    image.png

  3. Refresh mappings between tables in the source and tables in the destination.

    • In the Mapping Rules for Destination Tables section, find a source table and click Refresh Mapping in the Actions column. Then, the system displays a schema name in the Dst Schema Name column of the source table.

      • If a table is newly created in the destination, you can click the image.png icon in the Destination Table Name column to modify the schema of the table. For example, you can add fields to the table or specify a field as a primary key for the table.

      • If a table already exists in the destination, you can only view the table.

    • Select all source tables and click Batch Refresh Mapping Results to refresh mappings between the source tables and destination tables.

  4. Configure the condition for incremental synchronization at the table level.

    You can configure the condition for incremental synchronization at the table level. You can modify the configuration for a single table-mapping record or for multiple table-mapping records at a time based on your business requirements. You can configure the Condition For Incremental Synchronization parameter to specify a WHERE clause to filter data in the source. When you configure the parameter, you do not need to include the WHERE keyword in the clause. If you configure the scheduling parameters for implementing periodic synchronization of incremental data, you can use the system parameter variables.

    For example, you can configure a WHERE clause to enable the synchronization task to synchronize incremental data that is generated during the time range from 00:00 on the previous day to 00:00 on the current day.

    STR_TO_DATE('${bizdate}', '%Y%m%d') <= columnName AND columnName < DATE_ADD(STR_TO_DATE('${bizdate}', '%Y%m%d'), interval 1 day)

    image.png

  5. Configure the settings for periodic scheduling at the table level.

    You can configure the scheduling parameters for periodic scheduling at the table level. You can modify the configurations for a single table-mapping record or for multiple table-mapping records at a time based on your business requirements. You can configure parameters such as Scheduling period and Scheduling Resource Groups. For more information about the scheduling parameters for periodic scheduling, see Overview.

    image.png

  6. After the preceding configuration is complete, click Complete.

    For information about other parameters that you can configure, see Configure other advanced settings.

Configure other advanced settings

Configure a custom name for a destination table

You can concatenate a built-in variable and a specified string into a destination table name. You can edit built-in variables. For example, you can replace built-in variables with strings.

image.png

Configure the modes of full synchronization and incremental synchronization

  • If you select full database offline synchronization and inc offline synchronization for the Synchronization Mode parameter in the Synchronization Type section of the configuration page, One-time synchronization is automatically selected for the Full Sync parameter and Periodical Synchronization is automatically selected for the Method of Incremental Synchronization parameter. The two values cannot be changed.

    image.png

  • If you select full database offline synchronization for the Synchronization Mode parameter in the Synchronization Type section of the configuration page, you can select One-time Synchronization or Periodical Synchronization for the Full Sync parameter.

    image.png

  • If you select inc offline synchronization for the Synchronization Mode parameter in the Synchronization Type section, you can select One-time Synchronization or Periodical Synchronization for the Method of Incremental Synchronization parameter.

    image.png

Modify data type mappings between fields in the source and fields in the destination

If the data type mappings that are automatically established do not meet your business requirements, you can click Edit Mapping of Field Data Types to modify data type mappings between fields in the source and fields in the destination, and click Apply and Refresh Mapping.

image.png

Assign scheduling parameters to additional fields as values

You can assign scheduling parameters to additional fields as values.

  1. In the Mapping Rules for Destination Tables section, find a destination table whose schema you want to modify and click the image.png icon in the Destination Table Name column to modify the schema of the destination table.

  2. In the dialog box that appears, click Add field on the Field tab of the Edit Table Schema section to add fields. After the fields are added, click Apply and Refresh Mapping.

    image.png

  3. In the Mapping Rules for Destination Tables section, select the destination table, click Batch Modify, and then select Value assignment. In the Value assignment dialog box, you can assign scheduling parameters to the additional fields as values. In this example, cyc_time is assigned to an additional field as the value.

    image.png

    image.png

  4. In the Value assignment dialog box, enter ${cyc_time} in the Manual assignment field in the Assignment column of the additional field and click OK.

    image.png

Configure advanced parameters

You can use one of the following methods to configure advanced parameters:

  • Click Configure Advanced Parameters in the upper-right corner of the configuration page. In the Configure Advanced Parameters panel, configure fine-grained settings for the reader, writer, and channel control. For example, you can configure the maximum numbers of connections that are allowed for the source and destination, and throttling-related parameters for the batch synchronization subtask.

    You can configure advanced parameters to control the action of a synchronization task. The advanced parameters that you configure for a synchronization task are global settings, which take effect for each table involved in the synchronization task.

  • In the Mapping Rules for Destination Tables section, you can separately configure advanced parameters for a destination table based on your business requirements. The advanced parameters that you configure in the section take precedence over the advanced parameters that you configure in the Configure Advanced Parameters panel.

    image.png

Perform O&M operations on the synchronization task

Start the synchronization task

After the configuration of the synchronization task is complete, the Nodes page appears. On the Nodes page, you can find the synchronization task and click Start in the Actions column to start the synchronization task.

image.png

View the details of the synchronization task

You can click the name of the synchronization task or click Running Details in the Actions column of the synchronization task to view the details of the synchronization task. The running details page displays the following information about the synchronization task:

  • Basic information: You can view the basic information of the synchronization task, such as the data sources, the resource group used to run the synchronization task, and the synchronization task type.

  • Running details: You can view the details of the synchronization steps after the synchronization task is subdivided.

    • If you select full database offline synchronization for the Synchronization Mode parameter when you configure the synchronization task, the Schema Migration and Full Data Synchronization sections are displayed.

    • If you select inc offline synchronization for the Synchronization Mode parameter in the Synchronization Type section when you configure the synchronization task, the Schema Migration and Incremental Synchronization sections are displayed.

    • If you select full database offline synchronization and inc offline synchronization for the Synchronization Mode parameter in the Synchronization Type section when you configure the synchronization task, the Schema Migration, Full Data Synchronization, and Incremental Synchronization sections are displayed.

  • Details: You can view the steps that are performed during schema migration, full synchronization, and incremental synchronization.

    image.png

    • The Schema Migration tab displays the generation method of a destination table. The generation methods of a destination table include Use Existing Table and Create Table. If the generation method of a destination table is Create Table, the DDL statement that is used to create the table is displayed.

    • The Full Data Synchronization tab displays the information about the tables involved in batch synchronization, the synchronization progress, and the number of data records that are written to the destination.

    • The Incremental Synchronization tab displays information about metrics related to the latest instance of the auto triggered subtask generated by the synchronization task.

Rerun the synchronization task

  • Directly rerun the synchronization task: Find the synchronization task on the Nodes page, click More in the Actions column, and then select Rerun.

    image.png

    Effect: The one-time subtask is rerun and the properties of the periodic subtask are updated.

  • Add tables to or remove tables from the synchronization task and then rerun the synchronization task: Find the synchronization task on the Nodes page, add tables to or remove tables from the synchronization task, and then click Complete. In this case, Apply Updates is displayed in the Actions column of the synchronization task on the Nodes page. Click Apply Updates to rerun the modified synchronization task for the modifications to take effect.

    Effect: If you add tables to the synchronization task, only data in the added tables is synchronized. Data in the original tables in the synchronization task is not re-synchronized.

  • Change the name of a destination table and then rerun the synchronization task: Find the synchronization task, change the name of a destination table or change the destination table to which you want to write data, and then click Complete. In this case, Apply Updates is displayed in the Actions column of the synchronization task on the Nodes page. Click Apply Updates to rerun the modified synchronization task for the changes to take effect.

    Effect: Data is re-synchronized to the destination table on which a change is made. Data is not re-synchronized to destination tables on which no change is made.

Data development scenarios

You can refer to Scheduling dependency configuration guide to configure scheduling dependencies for the auto triggered subtask generated by the synchronization task to meet data development requirements. You can view information about the auto triggered subtask in the Scheduling Configuration section.

image.png

image.png