All Products
Search
Document Center

DataWorks:Periodically synchronize full data and incremental data from a Hologres database to MaxCompute

Last Updated:Dec 10, 2024

This topic describes how to synchronize full data and incremental data from a Hologres database to MaxCompute in offline mode.

Limits

In this example, only exclusive resource groups for Data Integration are supported.

Prerequisites

  • The data sources that you want to use as the source and destination are configured.

    In this example, a Hologres data source is used as the source, and a MaxCompute data source is used as the destination. You must add the Hologres data source and the MaxCompute data source to the current workspace on the Data Sources page in Management Center in the DataWorks console. For information about the synchronization capabilities that are supported by the two types of data sources, see Hologres data source and MaxCompute data source.

  • An exclusive resource group for Data Integration is purchased and associated with a DataWorks workspace, and network connections between the resource group and data sources are established.

    In this example, only exclusive resource groups for Data Integration are supported. You must purchase and configure an exclusive resource group for Data Integration and establish network connections between the resource group and the Hologres data source and MaxCompute data source. For information about how to purchase and configure an exclusive resource group for Data Integration, see Create and use an exclusive resource group for Data Integration.

Create and configure a synchronization task

  1. Go to the Data Integration page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane of the Data Integration page, click Synchronization Task. On the Synchronization Task page, select Hologres from the Source drop-down list and MaxCompute from the Destination drop-down list, and click Create Synchronization Task.

  3. Configure basic information for the synchronization task.

    1. Source And Destination: Select Hologres as the source type and MaxCompute as the destination type.

    2. New Node Name: Specify a name for the synchronization task based on your business requirements.

    3. Synchronization Method: Select Offline synchronization of the entire database.

    4. Synchronization Mode:

      • If you want to perform one-time or periodic synchronization of full data, select Full initialization for this parameter.

      • If you want to perform one-time or periodic synchronization of incremental data, select Incremental synchronization for the Synchronization Mode parameter.

      • If you want to perform one-time synchronization of full data and periodic synchronization of incremental data, select Full initialization and Incremental synchronization for the Synchronization Mode parameter.

    5. Network and Resource Configuration: In this section, select the Hologres data source, MaxCompute data source, and exclusive resource group for Data Integration that you prepare in advance, and click Test Connectivity to test the network connectivity between the resource group and the data sources.

  4. Configure the information that is specific to your synchronization task.

    In this example, a synchronization task used to synchronize data from a Hologres data source to a MaxCompute data source is configured based on two common scenarios. In an actual business scenario, you can configure a synchronization task based on your business requirements. For more information, see the Scenario 1: Configure a synchronization task to periodically synchronize full data and Scenario 2: Configure a synchronization task to synchronize full data at a time and periodically synchronize incremental data sections in this topic.

Scenario 1: Configure a synchronization task to periodically synchronize full data

If you select Structural migration and Full initialization for the Synchronization Mode parameter, you must make the following configurations for the synchronization task.

image

  1. Select the tables from which you want to read data.

    In the Select Data Sources and Tables for Data Synchronization section, select the tables from which you want to read data from the Source Table list and click the rightward arrow to move the selected tables to the Selected Tables list. You can use a regular expression to search for the desired table by name.image

  2. Configure the mode of full synchronization.

    In the Configurations for Full and Incremental Synchronization section, select Periodical Synchronization for the Full Synchronization parameter. Then, click Configure Scheduling Parameters for Periodical Scheduling to configure scheduling parameters for periodic scheduling. For information about scheduling parameters, see Supported formats of scheduling parameters.

    image.png

  3. Refresh mappings between tables in the source and tables in the destination.

    • In the Mapping Rules for Destination Tables section, find a source table and click Refresh in the Actions column. Then, the system displays a destination table name in the Destination Table Name column.

      • If a table is newly created in the destination, you can click the image.png icon in the Destination Table Name column to modify the schema of the table. For example, you can add fields to the table or specify a field as a primary key for the table.

      • If a table already exists in the destination, you can only view the table.

    • Select all source tables and click Batch Refresh Mapping Results to refresh mappings between the source tables and destination tables.

  4. Configure the setting for clearing all data in a destination table before data synchronization.

    Clearing all data in a destination table before full synchronization can ensure the data synchronized to the destination table is the same as the data in the mapped source table. In the upper-right corner of the configuration page, click Configure Advanced Parameters, click the Writer Config tab in the Configure Advanced Parameters panel, set the Clear table before synchronization parameter to true, and then click OK.

    image.png

  5. Configure the settings for periodic scheduling at the table level.

    You can configure the scheduling parameters for periodic scheduling at the table level. You can modify the configurations for a single table-mapping record or for multiple table-mapping records at a time based on your business requirements. You can configure parameters such as Scheduling Cycle and Resource Group for Scheduling. For more information about scheduling parameters for periodic scheduling, see Overview.

    image.png

  6. After the preceding configuration is complete, click Complete.

    For information about other parameters that you can configure, see the Configure other advanced settings section in this topic.

Scenario 2: Configure a synchronization task to synchronize full data at a time and periodically synchronize incremental data

If you select Structural migration, Full initialization, and Incremental synchronization for the Synchronization Mode parameter, you must make the following configurations for the synchronization task.

image

  1. Select the tables from which you want to read data.

    In the Select Data Sources and Tables for Data Synchronization section, select the tables from which you want to read data from the Source Table list and click the rightward arrow to move the selected tables to the Selected Tables list. You can use a regular expression to search for the desired table by name.

  2. Configure the modes of full synchronization and incremental synchronization.

    One-time Synchronization is automatically selected for the Full Synchronization parameter, and Periodical Synchronization is automatically selected for the Method Of Incremental Synchronization parameter. The two values cannot be changed. You can click Configure Scheduling Parameters for Periodical Scheduling to configure the scheduling parameters for implementing incremental synchronization. The configured scheduling parameters can be used for various purposes. For example, you can use the scheduling parameters in a filter condition, and assign the scheduling parameters to additional fields as values in a destination table. For more information about scheduling parameters, see Supported formats of scheduling parameters.

    image.png

  3. Refresh mappings between tables in the source and tables in the destination.

    • In the Mapping Rules for Destination Tables section, find a source table and click Refresh in the Actions column. Then, the system displays a destination table name in the Destination Table Name column.

      • If a table is newly created in the destination, you can click the image.png icon in the Destination Table Name column to modify the schema of the table. For example, you can add fields to the table or specify a field as a primary key for the table.

      • If a table already exists in the destination, you can only view the table.

    • Select all source tables and click Batch Refresh Mapping Results to refresh mappings between the source tables and destination tables.

  4. Configure the condition for incremental synchronization at the table level.

    You can configure the condition for incremental synchronization at the table level. You can modify the configuration for a single table-mapping record or for multiple table-mapping records at a time based on your business requirements. You can configure the Condition for Incremental Synchronization parameter to specify a WHERE clause to filter data in the source. When you configure the parameter, you do not need to include the WHERE keyword in the clause. If you configure scheduling parameters for implementing incremental synchronization, you can use the system parameter variables.

    For example, you can configure a WHERE clause to enable the synchronization task to synchronize incremental data that is generated during the time range from 00:00 on the previous day to 00:00 on the current day.

    STR_TO_DATE('${bizdate}', '%Y%m%d') <= columnName AND columnName < DATE_ADD(STR_TO_DATE('${bizdate}', '%Y%m%d'), interval 1 day)

    image.png

  5. Configure the settings for periodic scheduling at the table level.

    You can configure the scheduling parameters for periodic scheduling at the table level. You can modify the configurations for a single table-mapping record or for multiple table-mapping records at a time based on your business requirements. You can configure parameters such as Scheduling Cycle and Resource Group for Scheduling. For more information about scheduling parameters for periodic scheduling, see Overview.

    image.png

  6. After the preceding configuration is complete, click Complete.

    For information about other parameters that you can configure, see the Configure other advanced settings section in this topic.

Configure other advanced settings

Configure a custom name for a destination table

You can concatenate built-in variables and specified strings into a destination table name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables.

image.png

Configure the modes of full synchronization and incremental synchronization

  • If you select Full initialization and Incremental synchronization for the Synchronization Mode parameter, One-time Synchronization is automatically selected for the Full Synchronization parameter and Periodical Synchronization is automatically selected for the Method Of Incremental Synchronization parameter. The two values cannot be changed.

    image.png

  • If you select Full initialization for the Synchronization Mode parameter, you can select One-time Synchronization or Periodical Synchronization for the Full Synchronization parameter.

    image.png

  • If you select Incremental synchronization for the Synchronization Mode parameter, you can select One-time Synchronization or Periodical Synchronization for the Method Of Incremental Synchronization parameter.

    image.png

Modify data type mappings between fields in the source and fields in the destination

If the data type mappings that are automatically established do not meet your business requirements, you can click Edit Mapping of Field Data Types to modify data type mappings between fields in the source and fields in the destination, and click Apply and Refresh Mapping.

image.png

Assign scheduling parameters to additional fields as values

You can assign scheduling parameters to additional fields as values.

  1. In the Mapping Rules for Destination Tables section, find a destination table whose schema you want to modify and click the image.png icon in the Destination Table Name column to modify the schema of the destination table.

  2. In the dialog box that appears, click Add Field in the Field section to add fields. After the fields are added, click Apply and Refresh Mapping.

    image.png

  3. In the Mapping Rules for Destination Tables section, select the destination table, click Batch Modify, and then select Value assignment. In the Value assignment dialog box, you can assign scheduling parameters to the additional fields as values. In this example, cyc_time is assigned to an additional field as the value.

    image.png

    image.png

  4. In the Value assignment dialog box, enter ${cyc_time} in the Manually Assign Value field in the Assignment column of the additional field and click OK.

    image.png

Configure advanced parameters

You can use one of the following methods to configure advanced parameters:

  • In the upper-right corner of the configuration page, click Configure Advanced Parameters. In the Configure Advanced Parameters panel, configure fine-grained settings for the reader, writer, and channel control. For example, you can configure the maximum numbers of connections that are allowed for the source and destination, and throttling-related parameters for the batch synchronization subtask.

    You can configure advanced parameters to control the action of a synchronization task. The advanced parameters that you configure for a synchronization task are global settings, which take effect for each table involved in the synchronization task.

  • In the Mapping Rules for Destination Tables section, you can separately configure advanced parameters for a destination table based on your business requirements. The advanced parameters that you configure in the section take precedence over the advanced parameters that you configure in the Configure Advanced Parameters panel.

    image.png

Perform O&M operations on the synchronization task

Start the synchronization task

After the configuration of the synchronization task is complete, you are navigated to the Tasks section of the Synchronization Task page. You can find the synchronization task and click Start in the Operation column to start the synchronization task.

image.png

View the details of the synchronization task

You can click the name of the synchronization task or click Running Details in the Operation column of the synchronization task to view the details of the synchronization task. The running details page displays the following information about the synchronization task:

  • Basic information: You can view the basic information of the synchronization task, such as the data sources, the resource group used to run the synchronization task, and the synchronization task type.

  • Running details: You can view the details of the synchronization steps after the synchronization task is subdivided.

    • If you select Full initialization for the Synchronization Mode parameter when you configure the synchronization task, the Schema Migration and Full Data Synchronization sections are displayed.

    • If you select Incremental synchronization for the Synchronization Mode parameter when you configure the synchronization task, the Schema Migration and Incremental Synchronization sections are displayed.

    • If you select Full initialization and Incremental synchronization for the Synchronization Mode parameter when you configure the synchronization task, the Schema Migration, Full Data Synchronization, and Incremental Synchronization sections are displayed.

  • Details: You can view the steps that are performed during schema migration, full synchronization, and incremental synchronization.

    • The Schema Migration tab displays information such as whether a destination table is a newly created table or an existing table. For a newly created table, the DDL statement that is used to create the table is displayed.

    • The Full Data Synchronization tab displays the information about tables involved in batch synchronization, the synchronization progress, and the number of data records that are written to the destination.

    • The Incremental Synchronization tab displays information about metrics related to the latest instance of the auto triggered subtask generated by the synchronization task.

Rerun the data synchronization task

  • Rerun the task directly: Find the synchronization task in the Tasks section, click More in the Operation column, and select Rerun to rerun the task without modifying the task configurations.

    image.png

    Effect: The one-time subtask is rerun and the properties of the periodic subtask are updated.

  • Add tables to or remove tables from the synchronization task and then rerun the synchronization task: Find the synchronization task in the Tasks section, add tables to or remove tables from the synchronization task, and then click Complete. In this case, Apply Updates is displayed in the Operation column of the synchronization task in the Tasks section. Click Apply Updates to rerun the modified synchronization task for the modifications to take effect.

    Effect: If you add tables to the synchronization task, only data in the added tables is synchronized. Data in the original tables in the synchronization task is not re-synchronized.

  • Change the name of a destination table and then rerun the synchronization task: Find the synchronization task, change the name of a destination table or change the destination table to which you want to write data, and then click Complete. In this case, Apply Updates is displayed in the Operation column of the synchronization task in the Tasks section. Click Apply Updates to rerun the modified synchronization task for the modifications to take effect.

    Effect: Data is re-synchronized to the destination table on which a change is made. Data is not re-synchronized to destination tables on which no change is made.

Data development scenarios

You can refer to Scheduling dependency configuration guide to configure scheduling dependencies for the auto triggered subtask generated by the synchronization task to meet data development requirements. You can view the information about the auto triggered subtask in the Scheduling Configuration section.

image.png

image.png