This topic describes how to use Data Transmission Service (DTS) to synchronize data from an ApsaraDB RDS for MySQL instance to an Alibaba Cloud Elasticsearch cluster. After you create a data synchronization task in the DTS console, you can synchronize production data in the ApsaraDB RDS for MySQL instance to the Elasticsearch cluster.
DTS is a data transmission service that integrates data migration, data subscription, and real-time data synchronization. For more information, see Data Transmission Service.
- DTS uses read and write resources of the source and destination databases during initial full data synchronization. This may increase the database load. If the database performance is unfavorable, the specification is low, or the data volume is large, database services may become unavailable. For example, DTS occupies a large amount of read and write resources in the following cases: a large number of slow SQL queries are performed on the source database, the tables have no primary keys, or a deadlock occurs in the destination database. Before synchronizing data, you must evaluate the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. For example, you can synchronize data when the CPU usage of the source and destination databases is less than 30%.
- DDL operations cannot be synchronized. If a DDL operation is performed on the table in the source database during data synchronization, you must perform the following steps: Remove the table from the required objects, remove the index for the table from the Elasticsearch instance, and then add the table to the required objects. For more information, see Remove an object from a data synchronization task and Add an object to a data synchronization task.
- To add columns to the table that you want to synchronize, perform the following steps: Modify the mappings of the table in the Elasticsearch instance, perform DDL operations in the source MySQL database, and then pause and start the data synchronization task.
SQL statements supported for data synchronization
INSERT, DELETE, and UPDATE statements
Data type mappings
The data types of MySQL and Elasticsearch instances do not have one-to-one correspondence. During initial schema synchronization, DTS maps the data types of the source database to the destination database. For more information, see Data type mappings for initial schema synchronization.
- Create an Elasticsearch cluster and enable the Auto Indexing feature for the cluster.
For more information, see Create an Elasticsearch cluster.Note To ensure data security, Alibaba Cloud Elasticsearch disables Auto Indexing by default. When you use DTS to synchronize data to an Elasticsearch cluster, you must create indexes on the Elasticsearch cluster by submitting data instead of calling the Create index operation. Therefore, you must enable Auto Indexing for the Elasticsearch cluster. For more information, see Enable auto indexing.
- Create a database. You can use an ApsaraDB for RDS database or create a database on
your local server. An ApsaraDB RDS for MySQL database is used in this topic. The following
figure shows the schema and fields of the table.
For more information about how to create an ApsaraDB RDS for MySQL database and a table, see Quick Start.Notice We recommend that you create the ApsaraDB RDS for MySQL instance to which the ApsaraDB RDS for MySQL database belongs in the same region as the Elasticsearch cluster. If the instance and cluster are created in different regions, you may fail to interconnect them.
- Create a data synchronization task in the DTS console.
For more information, see Purchase a data synchronization instance. Select MySQL for Source Instance, Elasticsearch for Target Instance, and One-Way Synchronization for Synchronization Topology.
Configure a synchronization channel
- Log on to the Data Transmission Service console. In the left-side navigation pane, click Data Synchronization.
- On the page that appears, select the region. Then, find the target instance and click Configure Synchronization Channel in the Actions column.
- In the Create Data Synchronization Task wizard, configure the source ApsaraDB RDS for MySQL instance and destination Elasticsearch
cluster for synchronization.
Field/Section Parameter Description Synchronization Task Name None
- DTS automatically generates a task name. You do not need to use a unique task name.
- We recommend that you use an informative name for easy identification.
Source Instance Details Instance Type Select RDS Instance. Instance Region The region that you specified for Source Region when you created the data synchronization task. You cannot change the value of this parameter. Instance ID The ID of the ApsaraDB RDS for MySQL instance. Encryption Select Non-encrypted. If you select SSL-encrypted, more CPU resources are consumed. Destination Instance Details Instance Type The type of the destination cluster. The value can only be Elasticsearch. You cannot change the value of this parameter. Instance Region The region that you specified for Target Region when you created the data synchronization task. You cannot change the value of this parameter. Elasticsearch The ID of the Elasticsearch cluster. Database Account The username of the Elasticsearch cluster. Default value: elastic. Database Password The password of the Elasticsearch cluster. Enter the password that corresponds to the username specified by Database Account.
- Click Set Whitelist and Next. After the synchronization account is created, click Next.
Notice In this step, the IP address of the DTS server is automatically added to the whitelists of the ApsaraDB RDS for MySQL instance and Elasticsearch cluster. This ensures that DTS can communicate with both the instance and cluster.
- Select the objects that you want to synchronize.
Parameter Description Index Name
- Table Name
If you select Table Name, the indexes and tables created on the Elasticsearch cluster use the same names as those on the ApsaraDB RDS for MySQL instance.
If you select DatabseName_TableName, the indexes created on the Elasticsearch cluster are named in the format of Database name_Table name.
Processing Mode In Existed Target Table
- Pre-check and Intercept: The system checks whether the destination Elasticsearch cluster contains indexes
that have the same names as tables in the source database. If the destination Elasticsearch
cluster does not contain indexes that have the same names as tables in the source
database, the precheck is passed. Otherwise, the system displays an error message
during the precheck and does not start the data synchronization task.
Note If indexes in the destination Elasticsearch cluster have the same names as tables in the source database, and cannot be deleted or renamed, you can perform the operations described in Specify the name of an object in the destination instance to avoid table name conflicts.
- Ignore: The system skips the precheck for identical table names in the source database and
destination Elasticsearch cluster.
Warning If you select Ignore, data inconsistency may occur and your business may be affected.
- The source database and destination Elasticsearch cluster have the same mappings. If the primary key of a record in the destination Elasticsearch cluster is the same as that in the source database, the record remains unchanged during initial data synchronization. However, the record is overwritten during incremental data synchronization.
- The source database and destination Elasticsearch cluster have different mappings. This may cause initial data synchronization to fail, only some columns to be synchronized, or the entire data synchronization to fail.
Objects to be synchronized
Select objects from the Available section and click the button to move the objects to the Selected section.
- Table Name
- In the Selected section, move the pointer over the name of the table whose data you want to synchronize
and click Edit. In the Edit Table dialog box, configure parameters for the table in the Elasticsearch
cluster, such as Index Name and Type Name. Then, click OK.
Parameter Description Index Name For more information, see Terms. Type Name For more information, see Terms. Filter Specifies SQL filter conditions to filter data. Only data that meets the specific conditions is synchronized to the Elasticsearch cluster. For more information, see Use SQL conditions to filter data. IsPartition Specifies whether to set partitions. If you select Yes, you must also specify the partition key column and number of partitions. _id value
- the primary key of table
Composite primary key fields are merged into one column.
- Bis id
If you select a business key, you must also specify the business key column.
add param Select the required column param and column param value parameters. For more information, see Mapping parameters.
- the primary key of table
- In the lower-right corner of the page, click Precheck.
- You can start a data synchronization task only after the task passes the precheck.
- If the task fails to pass the precheck, click the icon next to each failed item to view the details. Troubleshoot the issues and run the precheck again.
- After the The precheck is passed. message appears, close the Precheck dialog box.
- Wait until the initial synchronization is completed and the synchronization task is in the Synchronizing state.
Verify the data synchronization result
- Log on to the Kibana console of the Elasticsearch cluster.
For more information, see Log on to the Kibana console.
- In the left-side navigation pane, click Dev Tools.
- On the Console tab of the page that appears, run the following command to check whether the data
is successfully synchronized:
GET /es_test_index/es_test_type/_searchIf the command is executed successfully, the result shown in the following figure is returned.