This topic describes how to use Data Integration in DataWorks or Realtime Compute for Apache Flink to synchronize data from tables in various ApsaraDB RDS for MySQL databases to a Hologres table. You can select a service based on your business requirements.

Background information

In actual business scenarios, data cannot be synchronized by using only one or several simple batch or real-time synchronization tasks. Instead, multiple batch synchronization tasks, real-time synchronization tasks, and data processing tasks are required to synchronize data. This involves complex configurations. The synchronization of data from tables in various ApsaraDB RDS for MySQL databases to a single Hologres table requires even more complex configurations that involve multiple tasks. These tasks may be hard to be scheduled and managed.

To resolve this problem in business scenarios, Data Integration in DataWorks provides a solution for you to synchronize data from multiple source tables with a few clicks. This solution improves the efficiency of data synchronization. Realtime Compute for Apache Flink also provides a powerful feature that allows you to ingest data into data warehouses or data lakes in real time. You can use this feature to write data from various data sources to Hologres.

Prerequisites

Note Make sure that the activated services reside in the same region.

Use DataWorks to synchronize data from tables in various ApsaraDB RDS for MySQL databases to Hologres

Hologres is deeply integrated with Hologres. You can use the solution provided by DataWorks to synchronize data from multiple source tables at a time to Hologres. The following steps describe the procedure for ApsaraDB RDS for MySQL data:

  1. Prepare data in source ApsaraDB RDS for MySQL databases.
    Prepare data in tables of different ApsaraDB RDS for MySQL databases. In this example, data in three tables of two databases is prepared. The following table describes the details.
    Database name Table name Data amount
    hmtest1 product_20220420 6301
    hmtest1 product_20220421 6331
    hmtest2 product_20220422 6227
    The three tables have the same schema, but only partial data in the three tables is duplicate. The following code shows the DDL statement of the table named product_20220420:
    CREATE TABLE product_20220420 (
     value_id int8,
     attribute_id int8 not null,
     id_card int8,
     name text,
     potion text,
        ds text,
    PRIMARY KEY (`value_id`)
    );
  2. Synchronize data to Hologres in real time.
    1. Open the Create Data Synchronization Solution wizard and specify the source service, destination service, and type of the real-time synchronization solution that you want to create.
      1. Go to Data Integration to open the wizard for creating a real-time synchronization solution that synchronizes data to Hologres. For more information, see Select a synchronization solution. In the Select Synchronization Solution step, select MySQL as the source service and Hologres as the destination service. Select a synchronization solution
      2. Click Next.
    2. Configure network connections for data synchronization.
      1. Select the source ApsaraDB RDS for MySQL instance, the exclusive resource group, and the destination Hologres instance. Then, respectively click Test Connectivity for the source and destination. Configure network connections
      2. After the source and destination pass the connectivity tests, click Next.
    3. Configure basic information about the source and mapping rules for data synchronization.
      1. Specify a name for the synchronization solution and configure basic information about the source, including the time zone of the source data and the tables from which you want to synchronize data. Configure basic information about the source and mapping rules for data synchronization
      2. Configure the mapping rules for the names of the source and destination tables. In this example, regular expressions are used to specify the source databases and tables. This way, you can migrate data from different sources to the same destination table. Configure the mapping rules.
      3. Click Next.
    4. Configure the destination table.
      1. Click Refresh source table and Hologres Table mapping.
        Note Mapping rules specify the mapping relationships between the source and destination tables. If the source tables are mapped to the same destination table, data in the source tables are synchronized to the same destination table.
      2. Add additional fields.
        To distinguish between different source tables, add additional fields to the destination table.
        1. Select all configured mapping rules and click Batch Edit Additional Fields in Destination Table.
        2. In the Batch Edit Additional Fields in Destination Table dialog box, click New field and add the db_name and table_name fields.
        3. Click the Add fields icon for each field to add the additional fields to the destination table. In this example, DB_NAME_SRC is selected to indicate the name of the source database, and TABLE_NAME_SRC is selected to indicate the name of the source table.
        4. Optional. Set an additional field as the primary key.
          If you want to synchronize a large amount of data from many source tables, we recommend that you set an additional field as the primary key. Then, you can use the additional field and the primary keys of the source tables to compose a composite key. This prevents data conflicts among primary keys of different source tables. You can also set an additional field as the distribution key to ensure that duplicate data is synchronized to the same shard. This improves the performance of data synchronization.
          1. Click the table name. The Table creation statement dialog box appears.
          2. Modify the table creation statement to set the table_name field as the primary key and the distribution key of the destination table.
            Note
            • We recommend that you use the field for the source table name to compose a composite key if a composite key is required based on your business scenario.
            • You can also create more indexes on the destination table to ensure better data synchronization performance. For more information, see CREATE TABLE.
            Set an additional field as the primary key
            BEGIN;
            CREATE SCHEMA IF NOT EXISTS "holotest";
            CREATE TABLE IF NOT EXISTS "holotest"."product" (
             "value_id" int8,
             "attribute_id" int8,
             "id_card" int8,
             "name" text,
             "potion" text,
             "ds" text, 
             "db_name" text,
             "table_name" text,
            PRIMARY KEY ("value_id","table_name")
            );
            call set_table_property('holotest_product','distribution','table_name');
            comment on table "hmtest1"."product" is 'Please write your comments';
            COMMIT;
                                                                            
        5. Click Next.
    5. Configure rules for DML statements.
      After you configure fields for the destination table, configure rules for DML statements. You can configure rules for one or more tables at a time based on your business requirements.
      1. Select all configured mapping rules and click Batch Configure DML Rule.
      2. In the Configure DML Rule dialog box, set the rules of all types of DML statements to Normal. Batch Configure DML Rule
      3. Click OK to save the configuration.
      4. Click Next.
    6. Configure processing rules for DDL messages.
      1. Configure processing rules for DDL messages based on your business requirements. In this example, the processing rules are configured as shown in the following figure. Configure processing rules for DDL messages
      2. Click Next.
    7. Configure resources.
      1. Configure the resources required by the synchronization solution based on your business requirements. You can configure the resource group, connections, and concurrency.
      2. Click Complete Configuration.
  3. Run the synchronization solution.
    After you configure and run the synchronization solution, you can click Execution details to view the details.
  4. Query the synchronized data.
    After you run the synchronization solution, full data in the source tables is first synchronized to Hologres, and then incremental data in the source tables is synchronized in real time. After full data is synchronized, you can go to Hologres to query the data. In this example, the query result is displayed as shown in the following figure. Query the synchronized data.In this figure, the names of the source tables and databases are populated in the additional fields. The result shows that data is synchronized from various source tables to the same destination table.
If the source tables contain incremental data, the incremental data is also synchronized to Hologres in real time. This example describes how to synchronize data from tables in various ApsaraDB RDS for MySQL databases to a Hologres table by using the synchronization solution feature. You can use this feature provided by Data Integration to perform other types of synchronization operations based on your business requirements.

Use Realtime Compute for Apache Flink to synchronize data from tables in various ApsaraDB RDS for MySQL databases to Hologres

For more information about how to use Realtime Compute for Apache Flink to synchronize data from tables in various ApsaraDB RDS for MySQL databases to Hologres, see Ingest data into data warehouses or data lakes in real time.