This topic describes how to use Data Integration in DataWorks or Realtime Compute
for Apache Flink to synchronize data from tables in various ApsaraDB RDS for MySQL
databases to a Hologres table. You can select a service based on your business requirements.
Background information
In actual business scenarios, data cannot be synchronized by using only one or several
simple batch or real-time synchronization tasks. Instead, multiple batch synchronization
tasks, real-time synchronization tasks, and data processing tasks are required to
synchronize data. This involves complex configurations. The synchronization of data
from tables in various ApsaraDB RDS for MySQL databases to a single Hologres table
requires even more complex configurations that involve multiple tasks. These tasks
may be hard to be scheduled and managed.
To resolve this problem in business scenarios, Data Integration in DataWorks provides
a solution for you to synchronize data from multiple source tables with a few clicks.
This solution improves the efficiency of data synchronization. Realtime Compute for
Apache Flink also provides a powerful feature that allows you to ingest data into
data warehouses or data lakes in real time. You can use this feature to write data
from various data sources to Hologres.
Prerequisites
Note Make sure that the activated services reside in the same region.
Use DataWorks to synchronize data from tables in various ApsaraDB RDS for MySQL databases
to Hologres
Hologres is deeply integrated with Hologres. You can use the solution provided by
DataWorks to synchronize data from multiple source tables at a time to Hologres. The
following steps describe the procedure for ApsaraDB RDS for MySQL data:
- Prepare data in source ApsaraDB RDS for MySQL databases.
Prepare data in tables of different ApsaraDB RDS for MySQL databases. In this example,
data in three tables of two databases is prepared. The following table describes the
details.
Database name |
Table name |
Data amount |
hmtest1 |
product_20220420 |
6301 |
hmtest1 |
product_20220421 |
6331 |
hmtest2 |
product_20220422 |
6227 |
The three tables have the same schema, but only partial data in the three tables is
duplicate. The following code shows the DDL statement of the table named product_20220420:
CREATE TABLE product_20220420 (
value_id int8,
attribute_id int8 not null,
id_card int8,
name text,
potion text,
ds text,
PRIMARY KEY (`value_id`)
);
- Synchronize data to Hologres in real time.
- Open the Create Data Synchronization Solution wizard and specify the source service,
destination service, and type of the real-time synchronization solution that you want
to create.
- Go to Data Integration to open the wizard for creating a real-time synchronization
solution that synchronizes data to Hologres. For more information, see Select a synchronization solution. In the Select Synchronization Solution step, select MySQL as the source service and Hologres as the destination service.
- Click Next.
- Configure network connections for data synchronization.
- Select the source ApsaraDB RDS for MySQL instance, the exclusive resource group, and
the destination Hologres instance. Then, respectively click Test Connectivity for the source and destination.
- After the source and destination pass the connectivity tests, click Next.
- Configure basic information about the source and mapping rules for data synchronization.
- Specify a name for the synchronization solution and configure basic information about
the source, including the time zone of the source data and the tables from which you
want to synchronize data.
- Configure the mapping rules for the names of the source and destination tables. In
this example, regular expressions are used to specify the source databases and tables.
This way, you can migrate data from different sources to the same destination table.
- Click Next.
- Configure the destination table.
- Click Refresh source table and Hologres Table mapping.
Note Mapping rules specify the mapping relationships between the source and destination
tables. If the source tables are mapped to the same destination table, data in the
source tables are synchronized to the same destination table.
- Add additional fields.
To distinguish between different source tables, add additional fields to the destination
table.
- Select all configured mapping rules and click Batch Edit Additional Fields in Destination Table.
- In the Batch Edit Additional Fields in Destination Table dialog box, click New field and add the db_name and table_name fields.
- Click the icon for each field to add the additional fields to the destination table. In this
example, DB_NAME_SRC is selected to indicate the name of the source database, and TABLE_NAME_SRC is selected to indicate the name of the source table.
- Optional. Set an additional field as the primary key.
If you want to synchronize a large amount of data from many source tables, we recommend
that you set an additional field as the primary key. Then, you can use the additional
field and the primary keys of the source tables to compose a composite key. This prevents
data conflicts among primary keys of different source tables. You can also set an
additional field as the distribution key to ensure that duplicate data is synchronized
to the same shard. This improves the performance of data synchronization.
- Click the table name. The Table creation statement dialog box appears.
- Modify the table creation statement to set the table_name field as the primary key and the distribution key of the destination table.
Note
- We recommend that you use the field for the source table name to compose a composite
key if a composite key is required based on your business scenario.
- You can also create more indexes on the destination table to ensure better data synchronization
performance. For more information, see CREATE TABLE.
BEGIN;
CREATE SCHEMA IF NOT EXISTS "holotest";
CREATE TABLE IF NOT EXISTS "holotest"."product" (
"value_id" int8,
"attribute_id" int8,
"id_card" int8,
"name" text,
"potion" text,
"ds" text,
"db_name" text,
"table_name" text,
PRIMARY KEY ("value_id","table_name")
);
call set_table_property('holotest_product','distribution','table_name');
comment on table "hmtest1"."product" is 'Please write your comments';
COMMIT;
- Click Next.
- Configure rules for DML statements.
After you configure fields for the destination table, configure rules for DML statements.
You can configure rules for one or more tables at a time based on your business requirements.
- Select all configured mapping rules and click Batch Configure DML Rule.
- In the Configure DML Rule dialog box, set the rules of all types of DML statements to Normal.
- Click OK to save the configuration.
- Click Next.
- Configure processing rules for DDL messages.
- Configure processing rules for DDL messages based on your business requirements. In
this example, the processing rules are configured as shown in the following figure.
- Click Next.
- Configure resources.
- Configure the resources required by the synchronization solution based on your business
requirements. You can configure the resource group, connections, and concurrency.
- Click Complete Configuration.
- Run the synchronization solution.
After you configure and run the synchronization solution, you can click Execution details to view the details.
- Query the synchronized data.
After you run the synchronization solution, full data in the source tables is first
synchronized to Hologres, and then incremental data in the source tables is synchronized
in real time. After full data is synchronized, you can go to Hologres to query the
data. In this example, the query result is displayed as shown in the following figure.
In this figure, the names of the source tables and databases are populated in the
additional fields. The result shows that data is synchronized from various source
tables to the same destination table.
If the source tables contain incremental data, the incremental data is also synchronized
to Hologres in real time. This example describes how to synchronize data from tables
in various ApsaraDB RDS for MySQL databases to a Hologres table by using the synchronization
solution feature. You can use this feature provided by Data Integration to perform
other types of synchronization operations based on your business requirements.
Use Realtime Compute for Apache Flink to synchronize data from tables in various ApsaraDB
RDS for MySQL databases to Hologres
For more information about how to use Realtime Compute for Apache Flink to synchronize
data from tables in various ApsaraDB RDS for MySQL databases to Hologres, see Ingest data into data warehouses or data lakes in real time.