After you configure data sources, network environments, and resource groups, you can
create and run a real-time sync solution to synchronize all data in a database. This
topic describes how to create a real-time sync solution to synchronize data in specific
or all tables in a database to Elasticsearch in offline mode and then synchronize
incremental data in the database to Elasticsearch in real time. This topic also describes
how to view the statuses of the nodes generated by the real-time sync solution.
Prerequisites
Before you create a data synchronization solution, make sure that the following operations
are performed:
Background information
You can use the hot-warm architecture that is provided by Elasticsearch in a fully
managed manner to store the real-time data of enterprises. DataWorks provides real-time
sync solutions that can be used to synchronize all data in a database to Elasticsearch
in real time based on the architecture. You can use such a solution to first synchronize
data in specific or all tables in a database to Elasticsearch in offline mode and
then synchronize incremental data in the database to Elasticsearch in real time. You
can also view the details of the solution, the statuses of the nodes generated by
the solution, and data updates in the database in real time. This facilitates subsequent
data searches, analysis, and development.
Real-time sync solutions that are used to synchronize all data in a database provide
the following benefits:
- Synchronizes the full data of a database.
You do not need to create multiple batch data synchronization nodes to synchronize
source tables one by one. You can directly create a batch synchronization solution
to synchronize some or all of the tables in a database at a time.
- You can configure synchronization rules in a flexible manner.
- You can configure synchronization rules for different DDL messages based on your business
requirements. For example, if you select Ignore for a DDL message that is specified in the source and used to drop a table in the
destination, the system ignores the message and does not drop the table in the destination
when the system receives the message.
- You can add or remove source tables for a sync solution that is running.
- You can configure synchronization rules for destination Elasticsearch indexes to determine
whether to synchronize the incremental data in source tables to destination Elasticsearch
indexes based on your business requirements. After the incremental data is synchronized,
the incremental data can be searched in destination Elasticsearch indexes.
- Requires only simple configurations.
You do not need to perform complex operations, such as creating synchronization nodes,
databases, and tables, configuring dependencies for nodes, and configure mappings
between sources and destinations. Instead, you need only to configure a batch synchronization
solution in a configuration wizard.
- Large amounts of data can be updated in real time. This improves the efficiency of
automated O&M.
Scenarios
If you want the system to monitor data updates in business databases in real time,
you can use real-time sync solutions to synchronize all data in the databases. This
way, upper-layer applications can search for, analyze, and develop data in real time.
Create a real-time sync solution to synchronize all data in a database
- Go to the Data Integration page and choose to go to the Tasks page.
- On the Tasks page, click New task in the upper-right corner. In the Select Source and Destination section, specify
the types of the source and the destination data sources by setting the Select source
parameters in the Source and the Destination parts.
- In the Create Data Synchronization Solution dialog box, click One-click realtime synchronization to Elasticsearch.
- In the Set Synchronization Sources and Rules step, configure basic information such
as the solution name for the data synchronization solution.
In the
Basic Configuration section, set the parameters.

Parameter |
Description |
Solution Name |
The name of the sync solution. The name can be up to 50 characters in length. |
Description |
The description of the sync solution. The description can be up to 50 characters in
length.
|
Location |
If you select Automatic Workflow Creation, DataWorks automatically creates a workflow
named in the format of clone_database_Source name+to+Destination name. All sync nodes generated by the sync solution are placed in the Data Integration folder of this workflow.
If you clear Automatically establish workflow, select a directory from the Select Location drop-down list. All sync nodes generated by the sync solution are placed in the specified
directory.
|
- Select a source and configure synchronization rules.
- In the Data Source section, specify the Type and Data source parameters.
Note
You can use a real-time sync solution to synchronize all data only from a MySQL or
PolarDB database to Elasticsearch.
- In the Source Table section, select the tables whose data you want to synchronize from the Source Table list. Then, click the
icon to move the tables to the Selected Source Table list. 
The Source Table list displays all tables in the selected source. You can synchronize
data in specific or all tables in the source.
Notice If a selected table has no primary key, you must customize a primary key when you
map the table to a destination Elasticsearch index. This primary key is used to remove
duplicate data during synchronization. For example, you can use one field or a combination
of several fields as the primary key of the table. For more information, see
Step 6 in this topic.
- In the Conversion Rule for Table Name section, click Add rule to select a rule.
Supported options include
Conversion Rule for Table Name and
Rule for Destination Index Name.
- Conversion Rule for Table Name: the rule for converting the names of source tables to those of destination Elasticsearch
indexes.
- Rule for Destination Index Name: the rule for adding a prefix and a suffix to the converted names of destination
Elasticsearch indexes.
- Click Next Step.
- Select the destination and configure destination Elasticsearch indexes.
- In the Set Destination Index step, specify the Destination parameter.
- Click Refresh source table and Elasticsearch Index mapping to configure the mappings between the source tables and destination Elasticsearch
indexes.
- View the mapping progress, source tables, and mapped destination Elasticsearch indexes.

No. |
Description |
1 |
The progress of mapping the source tables to destination Elasticsearch indexes.
Note The mapping may require a long period of time if you synchronize data from a large
number of tables.
|
2 |
- If the tables in the source database contain primary keys, the system removes duplicate
data based on the primary keys during the synchronization.
- If the tables in the source database do not contain primary keys, you can click the
icon to customize primary keys. You can use one field or a combination of several
fields as the primary keys of the tables. This way, the system removes duplicate data
based on the primary keys during the synchronization.
|
3 |
The method that is used to create an index. Valid values:
- Create Index: If you select this method, the name of the Elasticsearch index that is automatically
created appears in the Elasticsearch Index Name column. You can click the name of the index to change the values of the parameters
related to the index.
- Use Existing Index: If you select this method, select the name of the desired index from the drop-down
list in the Elasticsearch Index Name column. Then, you can click View Field Mapping to view the mappings between the source tables and destination Elasticsearch indexes.
|
If you set the
Index creation method parameter to
Create Index , you can click the Elasticsearch index name that appears in the Elasticsearch Index
Name column to change the values of the parameters related to the destination Elasticsearch
index based on your business requirements.

- Dynamic Mapping Status: specifies whether to dynamically synchronize new fields in the source tables to
the destination Elasticsearch indexes during synchronization. Valid values:
- true: If the system detects that the source tables contain new fields, the system synchronizes
the fields to the mapped destination Elasticsearch indexes, and the fields can be
searched in the indexes after synchronization. Default value: true.
- false: If the system detects that the source tables contain new fields, the system synchronizes
the fields to the mapped destination Elasticsearch indexes, but the fields cannot
be searched in the indexes after synchronization.
- strict: If the system detects that the source tables contain new fields, the system does
not synchronize the fields to the mapped destination Elasticsearch indexes, and an
error is reported. You can view the details of the error in the node logs.
For more information about dynamic mappings, see the description of the dynamic parameter for open source Elasticsearch.
- Shards and Replicas: the number of primary shards for the destination Elasticsearch index and the number
of replica shards for each primary shard. The shards are distributed on different
nodes in an Elasticsearch cluster to support distributed searches. This improves the
query efficiency of Elasticsearch. For more information, see Terms.
Note The values of the Shards and Replicas parameters cannot be changed after you specify them and the real-time sync solution
starts to run. The default values of the Shards and Replicas parameters are 1.
- Partition settings: You can use a column in a source table as a partition key column. This parameter
must be used together with the Shards and Replicas parameters. By default, the Enable Partitioning for Elasticsearch Indexes check box
is not selected.
- Data field structure: This section allows you to configure the types and extended attributes of the fields
in the mapped destination Elasticsearch indexes. For more information, see Field data types in open source Elasticsearch.
Note If you do not change the values of the parameters related to the destination Elasticsearch
indexes after the indexes are created, the system synchronizes data based on the default
values of the parameters.
- Click Next Step.
- Configure rules to process DDL messages.
Sources, such as MySQL, PolarDB, may contain multiple DDL messages. Before you synchronize
data, you can configure different rules to process DDL messages based on your business
requirements.
Note The rules apply when a real-time sync solution is run for the first time. If you want
to modify the rules in subsequent operations, go to the configuration page of the
real-time sync solution to perform the operation. For more information, see
Manage the real-time sync solution.
- In the Set Processing Policy for DDL Messages step, configure rules to process DDL messages during data synchronization.

The following table describes the processing rules for different DDL messages.
DDL message |
Rule |
CreateTable |
DataWorks processes a DDL message of the related type based on the following rules
after it receives the message:
- Normal: sends the message to the destination. Then, the destination processes the message.
Each destination may process DDL messages based on its own business logic. If you
select Normal for CreateTable, DataWorks only forwards the messages.
- Ignore: ignores the message and does not send it to the destination.
- Alert: ignores the message and records the alert in real-time synchronization logs. In
addition, the alert contains information about the reason indicating that a message
is ignored because of a running error.
- Error: returns an error when the real-time sync solution is running and terminates the
real-time sync solution.
|
DropTable |
AddColumn |
DropColumn |
RenameTable |
RenameColumn |
ChangeColumn |
TruncateTable |
- Click Next Step.
- Configure the resources required by the sync solution.
In the
Set Resources for Solution Running step, set the parameters as required.

- Offline Sync
Parameter |
Description |
Offline task name rules |
The name of the batch sync node that is used to synchronize the full data of the source.
After a sync solution is created, DataWorks first generates a batch sync node to synchronize
full data, and then generates real-time sync nodes to synchronize incremental data.
|
Resource Groups for Full Batch Sync Nodes |
The exclusive resource group for Data Integration that is used to run the batch sync
node.
Only exclusive resource groups for Data Integration can be used to run solutions.
You can set this parameter to the name of the exclusive resource group for Data Integration
that you purchased. For more information, see Plan and configure resources.
Note If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
|
- Scheduling Settings
Parameter |
Description |
Select scheduling Resource Group |
The resource group for scheduling that is used to run the nodes.
Only exclusive resource groups for Data Integration can be used to run sync solutions.
You can set this parameter to the name of the exclusive resource group for Data Integration
that you purchased. For more information, see Plan and configure resources.
Note If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
|
- Incremental Sync
Parameter |
Description |
Resource Groups for Incremental Batch Sync Nodes |
The exclusive resource group that is used to run the real-time sync nodes.
Only exclusive resource groups for Data Integration can be used to run solutions.
You can set this parameter to the name of the exclusive resource group for Data Integration
that you purchased. For more information, see Plan and configure resources.
Note If you do not have an exclusive resource group, click Create a new exclusive Resource Group to create one.
|
- Channel Settings
Parameter |
Description |
Maximum number of connections supported by source read |
The maximum number of Java Database Connectivity (JDBC) connections that are allowed
for the source. Specify an appropriate number based on the resources of the source.
Default value: 20.
|
- Click Complete Configuration. The real-time sync solution used to synchronize all data in a database is created.
Run the real-time sync solution
On the Tasks page, find the newly created sync solution and choose More > Submit and Run in the Operation column to run the sync solution.
View the statuses and results of the sync nodes
- On the Tasks page, find the solution that has been run and click in the Operation column. Then, you can view the execution details of all nodes generated
by the sync solution.

- Find a node whose execution details you want to view and click Execution details in the Status column. Then, you can click the link provided in the dialog box that
appears to go to the DataStudio page.
Manage the real-time sync solution