Released by ELK Geek
Data Transmission Service (DTS) synchronizes production data from an ApsaraDB RDS for MySQL instance to an Alibaba Cloud Elasticsearch instance in real time once you create a real-time data synchronization task in the DTS console. This article covers the supported synchronization types and SQL operations, then walks through the configuration procedure.
Supported SQL operations include:
Note: Currently, DTS does not support Data Definition Language (DDL) synchronization and ignores DDL operations during synchronization.
If a DDL operation is performed on a table during synchronization, subsequent Data Manipulation Language (DML) operations on that table may fail to synchronize. To recover from the failure, follow the steps below:
1) Delete the table from the synchronization object list. For more information, see "Remove an object from a data synchronization task."
2) Delete the index corresponding to this table in the Alibaba Cloud Elasticsearch instance.
3) Modify the synchronization task, re-add the table to the synchronization objects, and re-initialize the table. For more information, see "Add an object to a data synchronization task."
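Step 2 of the recovery procedure above can also be done through the Elasticsearch REST API, where deleting an index is an HTTP DELETE on the index name. The following is a minimal sketch, not part of the original procedure: the endpoint URL and the index name `orders` are hypothetical placeholders, and the request is only built, not sent. A real Alibaba Cloud Elasticsearch instance would also require authentication, which is omitted here.

```python
import urllib.request


def build_delete_index_request(es_endpoint: str, index_name: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a single Elasticsearch index."""
    # Guard against patterns that would match more than one index.
    if not index_name or index_name == "_all" or any(ch in index_name for ch in "*,"):
        raise ValueError("refusing empty, wildcard, or multi-index names")
    url = f"{es_endpoint.rstrip('/')}/{index_name}"
    return urllib.request.Request(url, method="DELETE")


# Hypothetical endpoint and index name, for illustration only.
req = build_delete_index_request("http://es.example.internal:9200", "orders")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to review the exact target before anything is deleted.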
To add a column to a table with a DDL operation, we recommend performing the steps in the following order:
1) Modify the mapping of the corresponding table and add a new column in the Alibaba Cloud Elasticsearch instance.
2) Modify the table structure and add a column in the source ApsaraDB RDS for MySQL instance.
3) Pause and restart the DTS synchronization task to allow DTS to reload the modified mapping in the Alibaba Cloud Elasticsearch instance.
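The ordering above can be sketched as a small plan builder. This is an illustrative sketch under stated assumptions: the index, table, and column names are hypothetical, and the mapping path `PUT /<index>/_mapping` is the typeless form used by recent Elasticsearch versions; older versions that still use mapping types include the type name in the path.

```python
import json


def add_column_plan(index: str, table: str, column: str, es_type: str, mysql_type: str):
    """Return the three steps in the recommended order: Elasticsearch, MySQL, DTS."""
    # Adding a field to an existing mapping only needs the new property.
    mapping_body = {"properties": {column: {"type": es_type}}}
    return [
        ("elasticsearch", f"PUT /{index}/_mapping", json.dumps(mapping_body)),
        ("mysql", f"ALTER TABLE `{table}` ADD COLUMN `{column}` {mysql_type};", None),
        ("dts", "pause, then restart the synchronization task", None),
    ]


# Hypothetical names, for illustration only.
plan = add_column_plan("orders", "orders", "note", "keyword", "VARCHAR(255)")
for target, action, body in plan:
    print(target, action, body or "")
```

Updating the mapping first means the destination already knows the new field when rows containing it start to arrive.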
Log on to the DTS console and click Data Synchronization in the left-side navigation pane. Click Create Data Synchronization Task in the upper-right corner and purchase a synchronization channel. Then, return to the DTS console to configure the synchronization channel.
Note: Purchase a synchronization channel before configuring it. Select either the monthly subscription or the pay-as-you-go billing mode for the synchronization channel.
Parameters on the Synchronization Channel Purchase Page
Note: The DTS console displays synchronization instances by region. The region where the purchased synchronization instance is located is its destination region. For example, the synchronization instance here is used to synchronize data from the Hangzhou ApsaraDB RDS for MySQL instance to the Hangzhou Alibaba Cloud Elasticsearch instance. Therefore, it is located in Hangzhou. Go to the synchronization instance list for Hangzhou, find the purchased synchronization instance, and click Configure Synchronization Task next to the instance to configure the instance.
After completing the preceding configurations, click Set Whitelist and Next to add whitelists for the ApsaraDB RDS for MySQL and Alibaba Cloud Elasticsearch instances.
Note: DTS automatically adds a whitelist or security group for an ApsaraDB RDS for MySQL instance.
If the source instance is an ApsaraDB RDS for MySQL instance, DTS adds its IP address to the security group of the RDS instance whitelist to prevent a synchronization task creation failure due to the DTS server's failure to connect to the database. To guarantee the stability of a synchronization task, do not delete the server IP addresses from the security group of the RDS instance whitelist during synchronization. After the whitelist is configured, click Next to create a synchronization account.
After the whitelist is configured, select the objects to be synchronized. In this step, configure the list of objects to be synchronized and the index naming rules.
1) Set Index Name to Table Name or DatabaseName_TableName.
2) Select the objects to be synchronized. Objects for real-time synchronization can be selected at the database level or the table level, so you may synchronize entire databases or only specific tables.
3) By default, the docid is the primary key of each table. If a table does not have a primary key, configure a source table column to serve as its docid. In the Selected box on the right, move your cursor over the corresponding table and click Edit to enter the advanced settings page of the table.
4) On the advanced settings page, configure Index Name, Type Name, IsPartition, and _id value. If _id value is set to the primary key of the table, select the corresponding business primary key column.
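The two index naming rules and the docid choice described above can be illustrated with a short sketch. The function names and sample values are hypothetical, and the code only mirrors the rules as this article states them; it does not reproduce how DTS implements them internally.

```python
def index_name(rule: str, database: str, table: str) -> str:
    """Derive the destination index name from the selected naming rule."""
    if rule == "Table Name":
        return table
    if rule == "DatabaseName_TableName":
        return f"{database}_{table}"
    raise ValueError(f"unknown index naming rule: {rule!r}")


def doc_id(row: dict, id_column: str) -> str:
    """Use the primary key column, or a configured column when there is no primary key."""
    return str(row[id_column])


# Hypothetical database, table, and row, for illustration only.
print(index_name("Table Name", "shop", "orders"))
print(index_name("DatabaseName_TableName", "shop", "orders"))
print(doc_id({"order_no": 42, "amount": 9.5}, "order_no"))
```

The DatabaseName_TableName rule is the safer choice when tables with the same name exist in more than one synchronized database, because it keeps their indexes separate.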
After the objects to be synchronized are configured, proceed to the Advanced Settings step.
Main configuration settings include the following fields:
1) Synchronization Initialization: We recommend selecting Schema Initialization and Full Data Initialization so that DTS automatically creates indexes and initializes full data. If you do not select Schema Initialization, manually define the mapping for indexes in the Alibaba Cloud Elasticsearch instance before creating a synchronization task. If you do not select Full Data Initialization, DTS synchronizes incremental data starting from the time when synchronization is started.
2) Shard Configuration: Five shards and one replica are configured by default. Adjust the configuration according to specific business requirements. Once the configuration has been changed, all indexes will define shards according to the new configuration.
3) String Definition: Select a string analyzer whose default value is Standard Analyzer. Valid values include Standard Analyzer, Simple Analyzer, Whitespace Analyzer, Stop Analyzer, Keyword Analyzer, English Analyzer, and Fingerprint Analyzer. The string fields of all indexes define Analyzer according to this configuration.
4) Time Zone: Configure the time zone for the time fields that are synchronized to the Alibaba Cloud Elasticsearch instance. The default time zone is UTC+8.
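If Schema Initialization is not selected, the index must be defined by hand before the task is created. The sketch below builds an index definition that mirrors the defaults listed above (five shards, one replica, Standard Analyzer for string fields). It is an assumption-laden illustration: the field name is hypothetical, the body uses the typeless mapping layout of recent Elasticsearch versions (older versions nest the properties under the type name), and the exact mappings DTS generates are not shown in this article.

```python
import json

# Lowercase Elasticsearch names of the analyzers listed in the String Definition setting.
VALID_ANALYZERS = {
    "standard", "simple", "whitespace", "stop", "keyword", "english", "fingerprint",
}


def build_index_body(string_fields, shards=5, replicas=1, analyzer="standard"):
    """Build the JSON body for creating an index with the given shard and analyzer settings."""
    if analyzer not in VALID_ANALYZERS:
        raise ValueError(f"unsupported analyzer: {analyzer!r}")
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer} for field in string_fields
            }
        },
    }


# Hypothetical string column, for illustration only.
body = build_index_body(["title"])
print(json.dumps(body, indent=2))
```

The number of primary shards cannot be changed on an existing index without reindexing, so it is worth settling the shard configuration before the first initialization.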
After the synchronization task is configured, DTS performs a precheck before synchronization begins. After the precheck passes, click Start to start the synchronization task.
After the synchronization task is started, the synchronization task list appears. The task that was just started is in the synchronization initialization state. The initialization time depends on the data volume of the objects to be synchronized in the source instance. After initialization is completed, the synchronization channel enters the synchronizing state, and the synchronization channel between the source and destination instances is established.
After verifying that the preceding tasks have been completed, log on to the Elasticsearch console and check whether the corresponding index has been created in your Elasticsearch instance and whether the synchronized data is as expected.
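One simple way to check that the synchronized data is as expected is to compare the document count of an index with the row count of its source table. The sketch below assumes you have already fetched the JSON body returned by the Elasticsearch `_count` API for the index and obtained the row count from MySQL (for example with `SELECT COUNT(*)`); the sample values are made up for illustration.

```python
import json


def counts_match(count_response_body: str, source_row_count: int) -> bool:
    """Compare the 'count' field of a _count response with the source table's row count."""
    es_count = json.loads(count_response_body)["count"]
    return es_count == source_row_count


# Made-up response body and row count, for illustration only.
sample_response = '{"count": 1200, "_shards": {"total": 5, "successful": 5, "failed": 0}}'
print(counts_match(sample_response, 1200))
```

While incremental synchronization is running, the two counts can differ briefly if the source table is being written to, so a mismatch is best rechecked after a short wait rather than treated as an immediate failure.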