All Products
Search
Document Center

DataWorks:Create a real-time synchronization solution to synchronize data to AnalyticDB for MySQL 3.0

Last Updated:Jan 15, 2024

You can create a real-time synchronization solution and use the solution to synchronize full data and incremental data from a data source to an AnalyticDB for MySQL 3.0 data source. The real-time synchronization solution first synchronizes full data from the data source at a time and then synchronizes incremental data from the data source in real time to the AnalyticDB for MySQL data source. This topic describes how to create a real-time synchronization solution to synchronize data to AnalyticDB for MySQL 3.0 in real time.

Prerequisites

  1. The required data sources are configured. Before you configure a synchronization task, you must configure the data sources from which you want to read data and to which you want to write data. This way, you can select the data sources when you configure a synchronization task. For information about the supported data source types and the configuration of a data source, see Supported data source types and synchronization operations.

    Note

    For information about the items that you need to understand before you configure a data source, see Overview.

  2. The data source environments are prepared. Before you configure a synchronization task, you must create an account that can be used to access a database and grant the account the permissions required to perform specific operations on the database based on your configurations for data synchronization. For more information, see Overview.

Procedure

  1. Step 1: Select a synchronization solution

  2. Step 2: Configure network connections for data synchronization

  3. Step 3: Configure the source and synchronization rules

  4. Step 4: Configure a destination table

  5. Step 5: Configure rules to process DDL messages

  6. Step 6: Configure the resources required by the data synchronization solution

  7. Step 7: Run the data synchronization solution

Step 1: Select a synchronization solution

Go to the Data Integration page in the DataWorks console and click Create Data Synchronization Solution. On the Create Data Synchronization Solution page, select a source type and a destination type for data synchronization from the drop-down lists. Then, select One-click realtime synchronization to AnalyticDB for MySQL 3.0 and click Next.

Step 2: Configure network connections for data synchronization

Select a source, a destination, and a resource group that is used to run nodes. Test the network connectivity to make sure that the resource group is connected to the source and destination. For more information, see Configure network connections for data synchronization.

Step 3: Configure the source and synchronization rules

  1. In the Basic Configuration section, configure the parameters, such as the Solution Name and Location parameters, based on your business requirements.

  2. In the Data Source section, confirm the information about the source.

  3. In the Source Table section, select the tables from which you want to read data from the Source Table list. Then, click the Icon icon to add the tables to the Selected Source Table list.

    The Selected Source Table list displays all tables in the source. You can select all or specific tables.

  4. In the Mapping Rules for Table Names section, click Add Rule, select a rule type, and then configure a mapping rule of the selected type.

    By default, data in the source tables is written to the destination tables that are named the same as the source tables. You can also configure mapping rules to specify the names of the destination tables to which you want to write data. Data Integration allows you to use a regular expression to configure a mapping rule to specify the names of destination tables. You can also concatenate built-in variables to specify the names of the destination tables. You can configure a mapping rule to synchronize data from multiple tables in the source to the same table in the destination. You can also configure a mapping rule to synchronize data from source tables whose names start with a specified prefix to the destination tables whose names start with another specified prefix. For more information, see Configure a synchronization task in Data Integration.

Step 4: Configure a destination table

  1. Confirm the setting of the Write Mode parameter.

    The Write Mode parameter supports only Replay. If a data record is inserted into a source table, the same data record is also inserted into the related destination table. If data records in a source table are updated or deleted, the same data records are also updated in or deleted from the related destination table.

  2. Configure mappings between the source tables and destination tables.

    You can click Refresh source table and AnalyticDB for MySQL 3.0 Table mapping to generate a destination table based on the rules that you configured in the Mapping Rules for Table Names section in Step 3. If no mapping rule is configured in Step 3, data in a source table is written to the destination table that is named the same as the source table. If no destination table that is named the same as the source table exists, the system automatically creates such a destination table. You can change the method used to generate the destination table and add additional fields to the destination table.

    Note

    The name of the destination table is generated based on the mapping rules that you configured in the Mapping Rules for Table Names section.

    Operation

    Description

    Select a primary key for a source table

    Source tables that do not have primary keys cannot be synchronized. If a source table does not have a primary key, you can click the Edit icon in the Synchronized Primary Key column to specify one or more fields in the table as the primary key of the table.

    Select the method used to generate a destination table

    You can select Create Table or Use Existing Table from the drop-down list in the Table creation method column. If you select Create Table, the name of the table that is automatically created appears in the AnalyticDB for MySQL 3.0Table name column. If you select Use Existing Table, you can select a destination table from the drop-down list in the AnalyticDB for MySQL 3.0Table name column.

    Specify a table type

    In the Table Type column, you can view and modify the type of a destination table. For information about partitioned tables and dimension tables, see Terms.

    Select a distribution key

    You can click the Edit icon in the Distribution Field Column to select an appropriate distribution key for a destination table. By default, the primary key of a source table is used as the distribution key of the related destination AnalyticDB for MySQL 3.0 table.

    Note

    The distribution field is the distribution key of the destination AnalyticDB for MySQL 3.0 table. The distribution key is used to specify the distribution of the data in the destination AnalyticDB for MySQL 3.0 table based on the hash value of the distribution field.

    Edit additional fields

    You can click Edit additional fields in the Actions column to add additional fields to a destination table and assign values to the fields. The values can be constants or variables.

    Note

    You can add additional fields to a destination table only if Create Table is selected from the drop-down list in the Table creation method column of the table.

    The following additional variables are supported by Data Integration:

    EXECUTE_TIME: the execution time
    UPDATE_TIME: the update time
    DB_NAME_SRC: the name of the original database
    DB_NAME_SRC_TRANSED: the converted name of the database
    DATASOURCE_NAME_SRC: the name of the source
    DATASOURCE_NAME_DEST: the name of the destination
    DB_NAME_DEST: the name of the destination database
    TABLE_NAME_DEST: the name of the destination table
    TABLE_NAME_SRC: the name of the source table

    Edit destination tables

    By default, the data synchronization solution generates destination table schemas based on the source tables. As a result, field type conversion may occur. For example, if the data types of the fields in a destination table are different from the data types of the fields in a source table, the data synchronization solution converts the fields in the source table to the data types that can be written to the destination table. You can click a destination table name in the AnalyticDB for MySQL 3.0Table name column to modify field type mappings of the destination table.

  3. After you confirm that the configurations in the Configure Source and Synchronization Rules step are correct, click Next.

Step 5: Configure rules to process DDL messages

DDL operations are performed on a source. Data Integration provides default rules to process DDL messages. You can also configure processing rules for different DDL messages based on your business requirements. For more information, see Rules for processing DDL messages.

Step 6: Configure the resources required by the data synchronization solution

After you create a synchronization solution, the synchronization solution generates batch synchronization nodes for full data synchronization and real-time synchronization nodes for incremental data synchronization. You must configure the parameters in the Configure Resources step.

You can configure the exclusive resource groups for Data Integration that you want to use to run real-time synchronization nodes and batch synchronization nodes, and the resource groups for scheduling that you want to use to run batch synchronization nodes. You can also click Advanced Configuration to configure the Number of concurrent writes on the target side and Allow Dirty Data Records parameters.

Note
  • DataWorks uses resource groups for scheduling to issue batch synchronization nodes to resource groups for Data Integration and runs the nodes on the resource groups for Data Integration. Therefore, a batch synchronization node also occupies the resources of a resource group for scheduling. You are charged fees for using the resource group for scheduling to schedule the batch synchronization nodes. For information about the node issuing mechanism, see Mechanism for issuing nodes.

  • We recommend that you use different resource groups to run batch and real-time synchronization nodes. If you use the same resource group to run batch and real-time synchronization nodes, the nodes compete for resources and affect each other. For example, CPU resources, memory resources, and networks used by the two types of nodes may affect each other. In this case, the batch synchronization nodes may slow down, or the real-time synchronization node may be delayed. Out-of-memory (OOM) errors may also occur due to insufficient resources.

Step 7: Run the data synchronization solution

  1. Go to the Tasks page in Data Integration and find the created data synchronization solution.
  2. Click Submit and Run in the Actions column to run the data synchronization solution.
  3. Click Execution details in the Actions column to view the execution details of the data synchronization solution.

What to do next

After a data synchronization solution is configured, you can manage the solution. For example, you can add tables to or remove tables from the solution, configure alerting and monitoring settings for the nodes that are generated by the solution, and view information about the running of the nodes. For more information, see Perform O&M on a full and incremental synchronization task.