If you want to search for and analyze the production data in your ApsaraDB RDS for MySQL database, you can use Data Transmission Service (DTS) to dynamically synchronize the data to an Alibaba Cloud Elasticsearch cluster. The synchronization is implemented based on a real-time synchronization task. This topic describes the procedure in detail.
DTS can be used to synchronize data in real time, migrate data, and subscribe to data. For more information, see DTS.
You can use DTS to synchronize the following SQL operations: INSERT, DELETE, and UPDATE. For more information about the data source versions that support data synchronization, see Database types, initial synchronization types, and synchronization topologies.
This topic can be used to guide the real-time synchronization of data in relational databases.
- DTS uses read and write resources of the source and destination databases during initial full data synchronization. This may increase the database load. If the database performance is unfavorable, the specification is low, or the data volume is large, database services may become unavailable. For example, DTS occupies a large amount of read and write resources in the following cases: a large number of slow SQL queries are performed on the source database, the tables have no primary keys, or a deadlock occurs in the destination database. Before synchronizing data, you must evaluate the performance of the source and destination databases. We recommend that you synchronize data during off-peak hours. For example, you can synchronize data when the CPU usage of the source and destination databases is less than 30%.
- DTS cannot be used to synchronize data definition language (DDL) operations. If a DDL operation is performed on the source table during synchronization, you must remove the table from the synchronization task. Then, remove the index that corresponds to the table from your Elasticsearch cluster, and add the table to the synchronization task again. For more information, see Remove an object from a data synchronization task and Add an object to a data synchronization task.
- If you want to add columns to the source table, modify the mappings of the index that corresponds to the table. Then, perform DDL operations on the source table, pause the data synchronization task, and start the task again.
- Create an Alibaba Cloud Elasticsearch cluster and enable the Auto Indexing feature
for the cluster.
Note To ensure data security, Alibaba Cloud Elasticsearch disables Auto Indexing by default. When you use DTS to synchronize data to an Elasticsearch cluster, you must create indexes on the Elasticsearch cluster by submitting data instead of calling the create index operation. Therefore, before you use DTS to synchronize data, you must enable Auto Indexing for the cluster.
- Create a database and a table. Then, insert data into the table.
In this topic, an ApsaraDB RDS for MySQL database is created. You can also create a database on your on-premises machine. The following figure shows the schema and data of the table.
For more information about how to create an ApsaraDB RDS for MySQL database and a table in the database, see Quick start.Notice We recommend that you create an ApsaraDB RDS for MySQL instance in the same region as the Elasticsearch cluster. If the instance and cluster are created in different regions, you may fail to ensure connectivity between them.
Configure a data synchronization channel
- Create a data synchronization task in the DTS console.
- Log on to the DTS console.
- In the left-side navigation pane, click Data Synchronization.
- On the page that appears, click Create Data Synchronization Task. Then, purchase a data synchronization instance on the buy page as prompted.For more information, see Purchase a data synchronization instance. Set Source Instance to MySQL, Target Instance to Elasticsearch, and Synchronization Topology to One-Way Synchronization.
- Go back to the Data Synchronization page. In the upper part of the page, select the region where the data synchronization instance you created resides. Then, find the instance in the instance list and click Configure Synchronization Channel in the Actions column.
- In the Create Data Synchronization Task wizard, configure the ApsaraDB RDS for MySQL instance and Elasticsearch cluster for
Field/Section Parameter Description Synchronization Task Name None
- DTS automatically generates a task name. You do not need to use a unique task name.
- We recommend that you use an informative name for easy identification.
Source Instance Details Instance Type Select RDS Instance. Instance Region The region that you specified for Source Region when you purchased the data synchronization instance. You cannot change the value of this parameter. Instance ID The ID of the ApsaraDB RDS for MySQL instance. Database Account The account that is used to access the ApsaraDB RDS for MySQL database. Database Password The password that is used to access the ApsaraDB RDS for MySQL database. Encryption Select Non-encrypted. If you select SSL-encrypted, more CPU resources will be consumed. Destination Instance Details Instance Type The value of this parameter is Elasticsearch and cannot be changed. Instance Region The region that you specified for Target Region when you created the data synchronization instance. You cannot change the value of this parameter. Elasticsearch The ID of the Elasticsearch cluster. Database Account The username that is used to access the Elasticsearch cluster. Default value: elastic. Database Password The password that is used to access the Elasticsearch cluster. Enter the password that corresponds to the username specified by Database Account.
- Click Set Whitelist and Next.Notice In this step, the IP address of the DTS server is automatically added to the whitelists of the ApsaraDB RDS for MySQL instance and Elasticsearch cluster. This ensures that DTS can communicate with both the instance and cluster.
- Select the object that you want to synchronize.
Parameter Description Index Name
- Table Name
If you select Table Name, the index created in the Elasticsearch cluster uses the same name as the table in the ApsaraDB RDS for MySQL database.
If you select DatabseName_TableName, the index created in the Elasticsearch cluster is named in the format of Database name_Table name.
Processing Mode In Existed Target Table
- Pre-check and Intercept: The system checks whether the Elasticsearch cluster contains an index that has the
same name as the table in the ApsaraDB RDS for MySQL database. If the Elasticsearch
cluster does not contain such an index, the cluster passes the precheck. Otherwise,
the system displays an error message during the precheck and does not start the data
Note If the Elasticsearch cluster contains such an index, but the index cannot be deleted or renamed, you can perform the operations described in Specify the name of an object in the destination instance to avoid a table name conflict.
- Ignore: The system does not check whether the Elasticsearch cluster contains an index that
has the same name as the table in the ApsaraDB RDS for MySQL database.
Warning If you select Ignore, data inconsistency may occur, and your business may be affected.
- The schema of the source table matches the mappings of the destination index: If the primary key of a record in the Elasticsearch cluster is the same as that in the ApsaraDB RDS for MySQL database, the record remains unchanged during data initialization. However, the record is overwritten during incremental data synchronization.
- The schema of the source table does not match the mappings of the destination index: Data initialization may fail, only some columns are synchronized, or all data fails to be synchronized.
Objects to be synchronized
Select an object from the Available section and click the icon to move the object to the Selected section.
- Table Name
- In the Selected section, move the pointer over the name of the source table and click Edit. In the Edit Table dialog box, configure parameters for the destination index, such
as Index Name and Type Name. Then, click OK.
Parameter Description Index Name The name of the destination index. For more information, see Terms.Notice The name you entered must be unique in the Elasticsearch cluster. Otherwise, the
index already existserror message appears.
Type Name The name of the destination index type. For more information, see Terms. Filter Specifies SQL filter conditions to filter data. Only data that meets the specified conditions is synchronized to the Elasticsearch cluster. For more information, see Use SQL conditions to filter data. IsPartition Specifies whether to set partitions. If you select Yes, you must also specify the partition key column and number of partitions (shards). _id value
- the primary key of table
Composite primary key fields are merged into one column.
- Bis id
If you select a business key, you must also specify the business key column.
add param Select the required column param and column param value parameters. For more information, see Mapping parameters.Notice If you set index to false for a field you added, the field cannot be queried. For more information, see index.
- the primary key of table
- In the lower-right corner of the page, click Precheck.Notice
- Before you start the data synchronization task, the system performs a precheck. You can start the data synchronization task only after the task passes the precheck.
- If the task fails to pass the precheck, click the icon next to each failed item to view the details. Troubleshoot the issues and run the precheck again.
- After the The precheck is passed message appears, close the Precheck dialog box.The data synchronization task starts. Wait until the data synchronization channel is initialized and the data synchronization task enters the Synchronizing state.Notice ApsaraDB RDS for MySQL and Elasticsearch support different data types. During schema initialization, DTS maps the data types of the ApsaraDB RDS for MySQL database to those of the Elasticsearch cluster. For more information, see Data type mappings for initial schema synchronization.
Verify the data synchronization results
- Log on to the Kibana console of the Elasticsearch cluster.For more information, see Log on to the Kibana console.
- In the left-side navigation pane, click Dev Tools.
- On the Console tab of the page that appears, run the following command to query the synchronized
GET /es_test_index/es_test_type/_searchIf the command is successfully executed, the result shown in the following figure is returned.