This section explains how to use Data Transmission Service (DTS) to quickly create a real-time data synchronization task from an RDS for MySQL instance to an Alibaba Cloud Elasticsearch (ES) instance. With this feature, you can synchronize RDS for MySQL data to an ES instance and query the data there in real time.
Real-time synchronization type
DTS supports real-time synchronization from RDS for MySQL to ES between instances under the same Alibaba Cloud account.
SQL operation types
The main SQL operation types supported are as follows:
If a DDL operation is run on a table in the source RDS for MySQL instance, DML operations for that table may fail. To resolve this problem, complete the following steps:
- Delete the object from the synchronization list. For more information, see Delete synchronization objects.
- Delete the index corresponding to this table in the ES instance.
- Re-add the table to the synchronization list and re-initialize it. For more information, see Add a synchronization object.
If the DDL operation adds a column or otherwise modifies a table, perform the operations in the following order:
- Manually modify the corresponding mapping in your ES instance and add the new column to it.
- Modify the table schema and add the new column in the source RDS for MySQL instance.
- Stop the DTS synchronization instance, and then restart it to reload the mapping that was modified in ES.
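The first step in this order, adding the new field to the index mapping, can be sketched as a request built in Python. This is a minimal sketch, not DTS or ES tooling: the helper `build_put_mapping_request`, the index `dbtest_sbtest1`, the type `sbtest1`, and the field `new_col` are hypothetical, and the path form assumes an ES version that still uses mapping types (newer versions omit the type segment).

```python
import json


def build_put_mapping_request(index, doc_type, field, es_field_type):
    """Build the method, path, and JSON body for a put-mapping request.

    Assumes an ES version that still uses mapping types; newer versions
    drop the type segment from the path.
    """
    path = f"/{index}/_mapping/{doc_type}"
    body = {"properties": {field: {"type": es_field_type}}}
    return "PUT", path, json.dumps(body, sort_keys=True)


# Hypothetical index, type, and column names, for illustration only.
method, path, body = build_put_mapping_request(
    "dbtest_sbtest1", "sbtest1", "new_col", "keyword"
)
print(method, path)
print(body)
```

The request only needs to be sent once per new column, before the matching column is added in the source RDS for MySQL instance.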
Configure data synchronization
To synchronize data from an RDS for MySQL instance to an ES instance, complete these steps:
- Purchase a DTS synchronization instance
Log on to the Data Transmission Service console and go to the Data Synchronization pane. In the upper-right corner, click Create Synchronization Task to purchase a synchronization instance. You can then configure the synchronization instance.
Note You must purchase a synchronization instance before you can configure it. Two billing modes are supported: Subscription and Pay-As-You-Go.
Select Data Synchronization.
- Source Instance
- Source Region: Because this example uses an RDS for MySQL instance, select the region where the RDS for MySQL instance is located.
- Target Instance
- Target Region: Select the region where your Elasticsearch instance is located. After the synchronization instance has been purchased, you cannot change its region.
- Instance Specification: Each instance specification corresponds to the performance of a synchronization instance. For more information, see Data synchronization specifications.
- Order Time: If the synchronization instance is prepaid, the order time is one month by default.
- Quantity: By default, the quantity is 1.
Note The region of your DTS synchronization instance is the target region that you selected. For example, if the synchronization instance is from the Hangzhou-region RDS for MySQL to the Hangzhou-region Elasticsearch, the region of the DTS synchronization instance is Hangzhou.
To configure your synchronization instance, go to the instance list in that region in DTS, search for the synchronization instance you just purchased, and click Configure Synchronization Instance in the upper-right corner.
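The purchase rules above (the DTS instance takes the target region, prepaid orders default to one month, quantity defaults to 1) can be summarized in a small model. This is only an illustrative sketch: `SyncInstanceOrder` is a hypothetical class, not a DTS API object, and the region ID `cn-hangzhou` is assumed here to stand for the Hangzhou region mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncInstanceOrder:
    """Hypothetical model of the purchase form; not a DTS API object."""
    source_region: str
    target_region: str
    billing_mode: str = "Pay-As-You-Go"  # or "Subscription" (prepaid)
    quantity: int = 1                    # default quantity from the text

    @property
    def dts_region(self) -> str:
        # The DTS synchronization instance is placed in the target region.
        return self.target_region

    @property
    def order_months(self) -> Optional[int]:
        # Only prepaid orders have an order time; the default is one month.
        return 1 if self.billing_mode == "Subscription" else None


order = SyncInstanceOrder("cn-hangzhou", "cn-hangzhou", billing_mode="Subscription")
print(order.dts_region, order.order_months, order.quantity)
```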
- Configure your synchronization instance
Synchronization task name
There are no requirements for the name of a synchronization instance.
Source instance
This example uses RDS for MySQL as the data source. You need to set the instance type, region and ID, and database account and password.
Target instance
You need to configure the ID, account, and password for the ES instance.
Once you complete these configurations, click Authorize Whitelist and Enter Next Step to add the DTS IP addresses to the whitelists of the RDS for MySQL and ES instances.
- Authorize instance whitelists
Note If the source instance is RDS for MySQL, DTS automatically adds its IP addresses to the whitelist or security group of the RDS instance. This prevents synchronization task failures caused by a disconnection between the DTS instance and the RDS database. To ensure the stability of the synchronization task, do not delete these IP addresses from the RDS instance.
After the whitelist is authorized, click Next to create a synchronization account.
- Select the synchronization object
To configure synchronization objects and naming rules for indexes, complete these steps:
Select a naming rule for indexes: table name or database name_table name.
- If you select table name, the name of the index is the name of the table.
- If you select database name_table name, the index is named database name_table name. For example, if a database is named dbtest and a table is named sbtest1, the index name after the table is synchronized to your ES instance is dbtest_sbtest1.
- If two tables in different databases have the same name, we recommend that the index name be set to database name_table name.
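The two naming rules, and the reason for the recommendation above, can be illustrated with a short sketch. The function names and the rule keys `"table"` and `"database_table"` are hypothetical labels chosen for this example; `dbtest2` is a made-up second database.

```python
def index_name(database, table, rule="table"):
    """Return the ES index name for a source table under a naming rule.

    rule "table"          -> table name
    rule "database_table" -> database name_table name
    """
    if rule == "table":
        return table
    if rule == "database_table":
        return f"{database}_{table}"
    raise ValueError(f"unknown naming rule: {rule!r}")


def has_collisions(tables, rule):
    """True if two (database, table) pairs map to the same index name."""
    names = [index_name(db, tbl, rule) for db, tbl in tables]
    return len(names) != len(set(names))


tables = [("dbtest", "sbtest1"), ("dbtest2", "sbtest1")]
print(index_name("dbtest", "sbtest1", "database_table"))  # dbtest_sbtest1
print(has_collisions(tables, "table"))                     # True
print(has_collisions(tables, "database_table"))            # False
```

With the table name rule, both sbtest1 tables would map to the same index, which is why the database name_table name rule is the safer choice in that case.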
Select a specific database, table, and column. The selectable granularity of the synchronization objects supports table-level operations. This means that you can synchronize several databases and tables.
By default, the docid of every table is its primary key. If a table does not have a primary key, configure which column of the source table is used as its docid. In the box of selected objects on the right, move the pointer over the corresponding table, and click Edit to enter the advanced settings pane.
In advanced settings, you can configure the index name, type name, partition column and quantity, and _id value column. If the value of _id is set to the business primary key, you need to select the corresponding business primary key column.
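The choice of _id value can be pictured as follows. This is a sketch under stated assumptions, not DTS behavior taken from the source: `document_id` is a hypothetical helper, it handles single-column keys only, and the column names `id` and `order_no` are invented for illustration.

```python
def document_id(row, primary_key=None, business_key=None):
    """Pick the _id value for one source row.

    row: dict of column name -> value.
    primary_key: primary key column, or None if the table has none.
    business_key: column selected as the _id value in advanced settings.
    """
    column = business_key or primary_key
    if column is None:
        raise ValueError("table has no primary key; select an _id value column")
    return str(row[column])


row = {"id": 7, "order_no": "A-1001"}
print(document_id(row, primary_key="id"))          # 7
print(document_id(row, business_key="order_no"))   # A-1001
```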
After synchronization objects are configured, proceed to the advanced setup.
- Advanced setup
Synchronization Initialization: We recommend that you select both Structure Initialization and Full Data Initialization, which allows DTS to automatically create indexes and initialize data. If you do not select Structure Initialization, you need to define the mapping for indexes in ES manually before synchronizing. If you do not select Full Data Initialization, incremental data synchronization starts from the time at which the synchronization task starts.
Shard Configuration: By default, each index has 5 partitions (shards) and 1 replica. If you adjust this configuration, all indexes are created with the adjusted values.
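For reference, the defaults above correspond to the following index settings in ES. The sketch only builds the settings body; `index_settings` is a hypothetical helper, and whether DTS sends exactly this body is not stated in the source.

```python
import json


def index_settings(shards=5, replicas=1):
    """Build an index settings body using the defaults from the text."""
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas >= 0")
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


print(json.dumps(index_settings(), sort_keys=True))
```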
String Index: The analyzer applied to string fields. The default is Standard Analyzer. Other values are Simple Analyzer, Whitespace Analyzer, Stop Analyzer, Keyword Analyzer, English Analyzer, and Fingerprint Analyzer. The string fields of all indexes use the analyzer selected here.
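The analyzer options listed above match ES built-in analyzers. The sketch below maps each display name to the built-in analyzer name and produces a mapping fragment for one string field. It assumes string columns become `text` fields, which the source does not confirm; `ANALYZERS` and `string_field_mapping` are hypothetical names.

```python
# Display names from the text -> ES built-in analyzer names.
ANALYZERS = {
    "Standard Analyzer": "standard",
    "Simple Analyzer": "simple",
    "Whitespace Analyzer": "whitespace",
    "Stop Analyzer": "stop",
    "Keyword Analyzer": "keyword",
    "English Analyzer": "english",
    "Fingerprint Analyzer": "fingerprint",
}


def string_field_mapping(analyzer="Standard Analyzer"):
    """Return a mapping fragment for one analyzed string field.

    Assumes the field is mapped as "text"; raises KeyError for an
    analyzer that is not in the list above.
    """
    return {"type": "text", "analyzer": ANALYZERS[analyzer]}


print(string_field_mapping())
print(string_field_mapping("Whitespace Analyzer"))
```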
Time Zone: The time zone in which time fields synchronized to your ES instance are stored. The default time zone for China is UTC+8.
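To show what a time zone setting of UTC+8 means for a stored value, the sketch below attaches that offset to a MySQL-style datetime string. It illustrates the concept only; how DTS actually serializes time fields is not described in the source, and `with_time_zone` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def with_time_zone(naive_value, utc_offset_hours=8):
    """Attach a fixed UTC offset to a 'YYYY-MM-DD HH:MM:SS' string."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    parsed = datetime.strptime(naive_value, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=tz).isoformat()


print(with_time_zone("2019-03-01 12:30:00"))  # 2019-03-01T12:30:00+08:00
```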
After the synchronization task is configured, DTS performs a pre-check. If the pre-check passes, click Start to start the synchronization task.
After the synchronization task starts, go to the synchronization job list and verify that the task's status is Sync initialization. The time it takes to initialize depends on the amount of data that the synchronization objects have in the source instance. After initialization is complete, the synchronization instance's status changes to Synchronizing, which indicates that the synchronization link between the source and target instances is established.
- Validate data
After completing all of the preceding steps, log on to the ES console to check the corresponding indexes created in your ES instances and the synchronized data.
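One simple check, beyond browsing the console, is to compare the row count of a source table with the document count of its index. The sketch below assumes the ES count API (`/{index}/_count`) and parses a sample response body. The helper names, the index name, and the numbers are made up for illustration, and no request is actually sent.

```python
import json


def count_path(index):
    """Path of the ES count API for one index."""
    return f"/{index}/_count"


def counts_match(source_rows, count_response_body):
    """Compare a source row count with the body returned by the count API."""
    es_count = json.loads(count_response_body)["count"]
    return es_count == source_rows


# Sample body shaped like a count API reply; the values are invented.
sample = '{"count": 1000, "_shards": {"total": 5, "successful": 5, "failed": 0}}'
print(count_path("dbtest_sbtest1"))
print(counts_match(1000, sample))
```

While the source table is still receiving writes, the two counts can differ briefly, so a mismatch is a reason to look closer rather than proof of a failure.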