Alibaba Cloud Elasticsearch is fully compatible with open-source Elasticsearch and offers commercial features such as Security, Machine Learning, Graph, and APM. Designed for data analytics and search scenarios, it supports enterprise-level access control, security monitoring and alerts, and automatic report generation. You can use Data Transmission Service (DTS) to synchronize data from a self-managed MySQL database on an ECS instance to an Alibaba Cloud Elasticsearch instance to quickly build your datasets.
Prerequisites
-
You have created a destination Alibaba Cloud Elasticsearch instance of version 5.5, 5.6, 6.3, 6.7, or 7.x. For more information, see Create an Alibaba Cloud Elasticsearch instance.
-
The self-managed MySQL database is version 5.1, 5.5, 5.6, 5.7, or 8.0.
Notes
-
During initial full data synchronization, DTS consumes read and write resources from the source and destination databases, which increases the database load. If database performance is poor, instance specifications are low, or business traffic is heavy (for example, the source database has many slow SQL queries or tables without primary keys, or the destination database experiences deadlocks), the database load increases and may even cause the service to become unavailable. Before you synchronize data, evaluate the performance of your source and destination instances. We recommend performing data synchronization during off-peak hours, for example, when the CPU utilization of both instances is below 30%.
-
DTS does not synchronize DDL operations. If a DDL operation is performed on a table in the source database during data synchronization, you must remove the table from the synchronization objects, delete the corresponding index from the Alibaba Cloud Elasticsearch instance, and then add the table back to the synchronization objects. For more information, see Remove synchronization objects and Add synchronization objects.
-
To add columns to a table that you want to synchronize, first modify the mapping of the corresponding index in the Alibaba Cloud Elasticsearch instance. Then, run the DDL operation in the source MySQL database. Finally, pause and restart the DTS instance.
-
If data synchronized from a MySQL instance to an Alibaba Cloud Elasticsearch instance contains empty string values, the data is converted to the LONG type before it is written to the Alibaba Cloud Elasticsearch instance. This causes the task to fail.
-
If data synchronized from a MySQL instance to an Alibaba Cloud Elasticsearch instance contains location information and the latitude and longitude are stored in reverse order, an error occurs when the data is written to the Alibaba Cloud Elasticsearch instance.
-
Development and test specifications of Elasticsearch instances are not supported.
Billing
Synchronization type | Pricing |
Schema synchronization and full data synchronization | Free of charge. |
Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported SQL operations
INSERT, DELETE, UPDATE
Data type mapping
-
A source database and an Elasticsearch instance support different data types that cannot always be mapped directly. During structure initialization, DTS maps data types based on those supported by the target Elasticsearch instance. For more information, see Data type mapping for structure initialization.
NoteDuring the DTS schema migration process, DTS does not set the
mapping中的dynamicparameter. The behavior of this parameter depends on the settings of your Elasticsearch instance. If your source data is of the JSON type, you must ensure that for a specific key, its corresponding values have the same data type across all rows in a table. Otherwise, DTS may encounter synchronization issues. For more information, see dynamic. -
The mapping between Elasticsearch and a relational database varies by Elasticsearch version.
ImportantStarting with Elasticsearch 7.0, an index no longer supports multiple types, and types were completely removed in Elasticsearch 8.0. By default, when you configure a synchronization or migration task, DTS maps a table from a relational database to an index in Elasticsearch. You can change this mapping when you configure the objects to synchronize or migrate.
Elasticsearch 7.0 and later
Elasticsearch
Relational database
index
table
document
row
field
column
mapping
schema
Versions before Elasticsearch 7.0
Elasticsearch
Relational database
index
database
type
table
document
row
field
column
mapping
schema
Preparations
Create a database account for a self-managed MySQL database and configure binary logging
Procedure
-
Purchase a data synchronization instance. For more information, see Purchase procedure.
NoteWhen you purchase the instance, set the source instance to MySQL, the destination instance to Elasticsearch, and the synchronization topology to One-way Synchronization.
-
Log on to the DTS console.
NoteIf you are automatically redirected to the Data Management (DMS) console, you can click the
icon in the lower-right corner and then click
to return to the classic DTS console. -
In the left-side navigation pane, click Data Synchronization.
-
At the top of the Synchronization Tasks page, select the region where your destination instance is located.
-
Find the data synchronization task that you purchased and click Configure Task.
-
Configure the source and destination instances.
Category
Configuration
Description
N/A
Synchronization task name
DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not have to be unique.
Source instance
Instance type
Select Self-managed database on an ECS instance.
Instance region
The region you selected for the source instance during purchase. This parameter cannot be changed.
ECS instance ID
Select the ID of the ECS instance where your self-managed MySQL database is deployed.
Database type
This is fixed as MySQL and cannot be changed.
Port
Enter the port number of the self-managed MySQL database.
Database account
Enter the account of the self-managed MySQL database.
NoteThe account must have the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions on all synchronization objects.
Database password
Enter the password for the database account.
Destination instance
Instance type
This is fixed as Elasticsearch and cannot be changed.
Instance region
The region of the destination instance that you selected when you purchased the data synchronization instance. This parameter cannot be changed.
Elasticsearch
Select the ID of the Alibaba Cloud Elasticsearch instance.
Database account
Enter the database account used to connect to the Alibaba Cloud Elasticsearch instance. The default value is elastic.
Database password
Enter the password for the database account.
-
In the lower-right corner of the page, click Set Whitelist and Next.
If the source or destination database is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance, DTS automatically adds the CIDR blocks of DTS servers to the IP address whitelist of the instance. If the source or destination database is a self-managed database hosted on an Elastic Compute Service (ECS) instance, DTS automatically adds the CIDR blocks of DTS servers to the security group rules of the ECS instance, and you must make sure that the ECS instance can access the database. If the self-managed database is hosted on multiple ECS instances, you must manually add the CIDR blocks of DTS servers to the security group rules of each ECS instance. If the source or destination database is a self-managed database that is deployed in a data center or provided by a third-party cloud service provider, you must manually add the CIDR blocks of DTS servers to the IP address whitelist of the database to allow DTS to access the database. For more information, see Whitelist DTS server IP addresses.
WarningAdding the public IP address blocks of the DTS service, either automatically or manually, may pose security risks. Using this product, you acknowledge that you understand and accept the potential security risks and that you must implement basic security measures. These measures include, but are not limited to, strengthening password security, limiting the ports open to each CIDR block, using authentication for internal API calls, and regularly checking and restricting unnecessary CIDR blocks. Alternatively, you can connect through a private network using a leased line, VPN Gateway, or Smart Access Gateway.
-
Configure the index name, the handling mode for existing tables, and the synchronization objects.
Configuration
Description
Index name
-
Table Name
If you select Table Name, the name of the index created in the destination Alibaba Cloud Elasticsearch instance is the same as the table name. In this example, the index name is
customer. -
Database Name_Table Name
If you select Database Name_Table Name, the name of the index created in the destination Alibaba Cloud Elasticsearch instance is in the format of
DatabaseName_TableName. In this example, the index name isdtstestdata_customer.
Handling mode for existing tables
-
Precheck and Report Errors: DTS checks for an existing index with the same name in the destination database. If one is found, the precheck fails, and the task does not start. Otherwise, the precheck passes.
NoteIf you cannot delete or rename the index with the same name in the destination database, you can specify a new name for the synchronization object in the destination instance to avoid name conflicts.
-
Ignore Errors and Proceed: DTS skips the check for indexes with the same name in the destination database.
WarningIf you select Ignore Errors and Proceed, data inconsistency may occur and expose your business to risks. For example:
-
If the mappings are consistent and a record in the destination database has the same primary key value as a record in the source database, the record in the destination database is retained during full data synchronization. During incremental data synchronization, it is overwritten by changes from the source.
-
If the mappings are inconsistent, the initial data synchronization may fail, result in partial data, or fail entirely.
-
Select synchronization objects
In the Source Objects box, click the object that you want to synchronize, and then click the
icon to move it to the Selected Objects box.You can select databases and tables as synchronization objects.
Change a mapping name
Change the names of synchronized objects in the destination instance. For more information, see Map databases, tables, and columns.
Replicate temporary tables during DMS online DDL
If you use Data Management (DMS) to perform online DDL changes on the source database, you can choose whether to synchronize the temporary tables generated by the DDL changes.
-
Yes: Synchronizes the temporary tables generated by online DDL changes.
NoteIf a large amount of temporary table data is generated by online DDL changes, the data synchronization task may be delayed.
-
No: Does not synchronize the temporary tables generated by online DDL changes. Only the original DDL operations from the source database are synchronized.
NoteThis option causes tables in the destination database to be locked.
Retry duration for network instability
If DTS cannot connect to the source or destination instance, it retries for 720 minutes (12 hours) by default. You can also specify a custom retry duration. If DTS reconnects to the source or destination instance within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails.
NoteYou are billed for task run time during connection retries. Customize the retry duration based on your business needs, or release the DTS instance as soon as the source and destination instances are released.
-
-
In the Selected Objects box, hover over the table that you want to synchronize and click Edit. Configure the index name, type name, and other information for the table in the destination Alibaba Cloud Elasticsearch instance.
Parameter
Description
Index name
For more information, see Elasticsearch basic concepts.
Warning-
Index names and type names can contain only underscores (_) as special characters.
-
To synchronize multiple source tables with the same structure into a single destination index, you must configure the same destination index and type name for each source table. Otherwise, the task might fail or data could be lost.
Type name
Filter condition
You can set SQL filter conditions to filter the data to be synchronized. Only data that meets the filter conditions is synchronized to the destination instance. For more information, see Filter data by using SQL conditions.
Partition
Select whether to configure a partition. If you select Yes, you must also set Partition key column and Number of partitions.
Set _routing
The
_routingsetting can route a document to a specific shard of the destination Alibaba Cloud Elasticsearch instance. For more information, see _routing.-
Select Yes to use custom columns for routing.
-
Select No to use the
_idfor routing.
NoteIf the destination Alibaba Cloud Elasticsearch instance is version 7.x, you must select No.
_id value
-
Primary key column of table
A composite primary key is merged into a single column.
-
Business key
If you select Business key, you must also set the corresponding Business key column.
Add parameter
Select the required Parameter and Value. For more information about the field parameters and their values, see the official Elasticsearch documentation.
NoteDTS supports only the parameters that can be selected from the drop-down list.
-
-
After completing the preceding configurations, click Precheck and Start in the lower-right corner of the page.
Note-
A precheck runs before the synchronization task starts, and you can only start the task after it passes.
-
If the precheck fails, click the
icon next to the failed item to view the details.-
You can fix the issues based on the cause and run the precheck again.
-
If you do not need to fix the items that triggered warnings, you can click Ignore or Ignore Warnings and Rerun Precheck to skip the warnings and run the precheck again.
-
-
-
After the Precheck dialog box displays Precheck Passed, close the Precheck dialog box. The synchronization task starts automatically.
-
Wait for the task to finish initialization and enter the Synchronizing state.
You can view the status of the data synchronization task on the Data Synchronization page.
View the synchronized index and data
Once the task enters the Synchronizing state, connect to the Alibaba Cloud Elasticsearch instance to confirm that the synchronized index and data meet your requirements. In this example, the Elasticsearch-Head plug-in is used.
If the results do not meet your requirements, you can delete the index and its data, and then reconfigure the data synchronization task.
On the Browser tab of the Head plug-in, you will see that the dtstestdata_customer and dtstestdata_order indices exist. The table displays the synchronized records with fields such as address, id, and name. Click a record to view its raw JSON data.