Migrate data from PolarDB for MySQL to Elasticsearch - Data Transmission Service

Prerequisites

You have created the destination Elasticsearch instance. For detailed instructions, see Basic Edition: From Instance Creation to Data Retrieval.
The storage space of the destination Elasticsearch instance must be larger than the storage space used by the source PolarDB for MySQL instance.

Notes

Type	Description
Source database restrictions	Bandwidth requirements: The server that hosts the source database must have sufficient outbound bandwidth. Otherwise, the data migration speed is affected. The tables to be migrated must have primary keys or UNIQUE constraints, and the fields in the tables must be unique. Otherwise, duplicate data may exist in the destination database. If you migrate objects at the table level and need to edit them, such as by mapping column names, a single migration task can migrate a maximum of 1,000 tables. To migrate more than 1,000 tables, split the tables across multiple migration tasks or configure a migration task for the entire database. Otherwise, a request error may be reported after you submit the task. If you perform incremental migration: You must enable binary logging and set the loose_polar_log_bin parameter to on. Otherwise, the precheck reports an error and the data migration task cannot start. For more information about how to enable binary logging and modify parameters, see Enable binary logging and Modify parameters. Note Enabling binary logging for a PolarDB for MySQL cluster consumes storage space and incurs storage fees. The binary logs of the PolarDB for MySQL cluster must be retained for at least 3 days. We recommend a retention period of 7 days. Otherwise, DTS may fail to obtain the binary logs, which can cause the task to fail. In extreme cases, this can lead to data inconsistency or data loss. Issues caused by a binary log retention period shorter than the DTS requirement are not covered by the DTS Service-Level Agreement (SLA). Note For more information about how to set the retention period for binary logs of a PolarDB for MySQL cluster, see Modify the retention period. Limits on operations in the source database: During schema migration, do not perform DDL operations to change the schemas of databases or tables. Otherwise, the data migration task fails.
Other restrictions	To add a column to a table to be migrated in the source database, first modify the mapping of the corresponding table in the Elasticsearch instance. Then, perform the DDL operation in the source database. Finally, pause and restart the migration task. You cannot migrate data to an index that has a parent-child relationship or a Join field type mapping in the destination instance. Otherwise, the task may become abnormal or data queries in the destination instance may fail. You cannot migrate data from read-only nodes of the source PolarDB for MySQL instance. DTS does not support the migration of OSS external tables from the source PolarDB for MySQL instance. Migration of INDEX, PARTITION, VIEW, PROCEDURE, FUNCTION, TRIGGER, and FK is not supported. DTS does not support primary/standby switchover scenarios for the database instance during full data migration. In such a scenario, reconfigure the migration task promptly. During full data migration, DTS consumes read and write resources of the source and destination databases. This may increase the loads on the database servers. Before you migrate data, evaluate the performance of the source and destination databases. We recommend that you migrate data during off-peak hours. Do not use tools such as pt-online-schema-change to perform online DDL operations on migration objects in the source database. Otherwise, the migration fails. For columns of the FLOAT or DOUBLE data type, DTS uses `ROUND(COLUMN,PRECISION)` to read the values of these columns. If you do not define the precision, DTS uses a precision of 38 for FLOAT columns and 308 for DOUBLE columns. Confirm that the migration precision meets your business requirements. DTS tries to resume tasks that failed within the last seven days. Therefore, before you switch your business to the destination instance, end or release the migration task. You can also run the `REVOKE` command to revoke the write permissions from the database account that DTS uses to access the destination instance. This prevents the task from being automatically resumed, which would cause data in the source instance to overwrite data in the destination instance. The data types supported by PolarDB for MySQL and Elasticsearch are different. Therefore, the data types cannot be mapped one-to-one. During initial schema synchronization, DTS maps the data types based on the data types supported by the destination database. For more information, see Data type mappings for initial schema synchronization. Development and test specifications of Elasticsearch instances are not supported. If a task fails, DTS support staff will attempt to restore it within eight hours. During restoration, they may restart the task or adjust its parameters. Note Only DTS task parameters are modified, not database parameters. Parameters that may be adjusted include those listed in Modify instance parameters.
Other notes	DTS periodically runs the CREATE DATABASE IF NOT EXISTS `test` command on the source database to advance the binary log offset.
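
The 1,000-table limit for table-level tasks that edit objects can be planned for before you create any task. The following is a minimal sketch, not part of DTS, that splits a list of table names into batches of at most 1,000 so that each batch can go into its own migration task. The table names and the `split_into_tasks` helper are hypothetical.

```python
from typing import List

# Limit stated in the notes above for table-level tasks that edit objects.
MAX_TABLES_PER_TASK = 1000


def split_into_tasks(tables: List[str], limit: int = MAX_TABLES_PER_TASK) -> List[List[str]]:
    """Split a list of table names into batches of at most `limit` tables."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [tables[i:i + limit] for i in range(0, len(tables), limit)]


# Hypothetical table names, for illustration only.
tables = [f"orders_{n:04d}" for n in range(2500)]
batches = split_into_tasks(tables)
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch would then be selected as the migration objects of a separate task in the console.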

Billing

Migration type	Instance configuration fee	Internet traffic fee
Schema migration and full data migration	Free of charge.	When the Access Method parameter of the destination database is set to Public IP Address, you are charged for Internet traffic. For more information, see Billing overview.
Incremental data migration	Charged. For more information, see Billing overview.

Migration types

Schema migration

DTS migrates the schema definitions of the migration objects from the source database to the destination database.

Full migration

DTS migrates all historical data of the specified migration objects from the source database to the destination database.

Incremental migration

After a full migration is complete, DTS migrates incremental data updates from the source database to the destination database. Incremental migration lets you smoothly migrate data without interrupting your self-managed applications.

Supported SQL operations for incremental migration

Operation type	SQL statement
DML	INSERT, UPDATE, DELETE Note Operations that use the UPDATE statement to remove fields are not supported.

Database account permissions

Database	Permission requirements	Method to create an account and grant permissions
PolarDB for MySQL cluster	Read permissions on the migration objects	Create and manage database accounts

Data type mappings

Because source databases and Elasticsearch instances support different data types, data types cannot always be mapped directly. During initial schema synchronization, DTS maps data types based on the types that the destination Elasticsearch instance supports. For more information, see Data type mappings for initial schema synchronization.
Note
DTS does not set the dynamic parameter in the mapping during schema migration. The behavior of this parameter depends on your Elasticsearch instance settings. If your source data is in JSON format, ensure that the values for the same key have the same data type across all rows in a table. Otherwise, DTS may report synchronization errors. For more information, see dynamic.
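
One way to check the JSON requirement in the note before migration is to scan a sample of the JSON values and list the keys whose value types differ between rows. The sketch below is an illustration only and is not a DTS feature. It assumes each value is a JSON object, inspects top-level keys only, and uses Python type names (for example, `int` and `float` are reported as different types, and `null` appears as `NoneType`).

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def find_mixed_type_keys(json_rows: Iterable[str]) -> Dict[str, List[str]]:
    """Return top-level keys whose values have more than one type across rows."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for raw in json_rows:
        doc = json.loads(raw)  # assumption: every row holds a JSON object
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    # Keep only keys with conflicting types; sort type names for stable output.
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


# Hypothetical sample rows: "price" is a number in one row and a string in another.
rows = ['{"id": 1, "price": 9.5}', '{"id": 2, "price": "9.5"}']
print(find_mixed_type_keys(rows))  # {'price': ['float', 'str']}
```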
The following table describes the mappings between Elasticsearch and relational databases.
Elasticsearch	Relational database
Index	Database
Type	Table
Document	Row
Field	Column
Mapping	Database schema
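
To make the correspondence above concrete, the following sketch expresses one relational row in Elasticsearch terms. It is a conceptual illustration only. The database, table, and column names are hypothetical, and the index that DTS actually creates is named according to the Index Name setting chosen during task configuration, not necessarily after the database.

```python
from typing import Any, Dict


def row_to_document(database: str, table: str, row: Dict[str, Any], pk: str) -> Dict[str, Any]:
    """Label the parts of one relational row with their Elasticsearch counterparts."""
    return {
        "_index": database,     # Index    <-> Database (conceptual correspondence)
        "_type": table,         # Type     <-> Table
        "_id": str(row[pk]),    # assumption: document ID taken from the primary key
        "_source": dict(row),   # Document <-> Row, Fields <-> Columns
    }


# Hypothetical row from a hypothetical shop.orders table.
doc = row_to_document("shop", "orders", {"id": 42, "status": "paid"}, pk="id")
print(doc["_id"], doc["_source"])
```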

Procedure

Navigate to the migration task list page for the destination region using one of the following methods.
From the DTS console
1. Log on to the Data Transmission Service (DTS) console.
2. In the navigation pane on the left, click Data Migration.
3. In the upper-left corner of the page, select the region where the migration instance is located.
From the DMS console

Note
The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS console.
1. Log on to the Data Management (DMS) console.
2. In the top menu bar, choose Data + AI > Data Transmission (DTS) > Data Migration.
3. To the right of Data Migration Tasks, select the region where the migration instance is located.
Click Create Task to navigate to the task configuration page.

Configure the source and destination databases.

Category	Configuration	Description
None	Task Name	DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not need to be unique.
Source Database	Select Existing Connection	To use a database instance that has been added to the system (created or saved), select the desired database instance from the drop-down list. The database information below will be automatically configured. Note In the DMS console, this parameter is named Select a DMS database instance. If you have not registered the database instance with the system, or do not need to use a registered instance, manually configure the database information below.
	Database Type	Select PolarDB for MySQL.
	Connection Type	Select Cloud Instance.
	Instance Region	Select the region where the source PolarDB for MySQL instance resides.
	Cross-account	This example demonstrates migration within the same Alibaba Cloud account. You can select Within The Same Account.
	PolarDB Instance ID	Select the ID of the source PolarDB for MySQL instance.
	Database Account	Enter the database account of the source PolarDB for MySQL instance. For more information about the permission requirements, see Database account permissions.
	Database Password	Enter the password for the database account.
	Encryption	You can select an option that meets your needs. For details about the Secure Sockets Layer (SSL) encryption feature, see Set SSL encryption.
Destination Database	Select Existing Connection	To use a database instance that has been added to the system (created or saved), select the desired database instance from the drop-down list. The database information below will be automatically configured. Note In the DMS console, this parameter is named Select a DMS database instance. If you have not registered the database instance with the system, or do not need to use a registered instance, manually configure the database information below.
	Database Type	Select Elasticsearch.
	Connection Type	Select Cloud Instance.
	Instance Region	Select the region where the destination Elasticsearch instance resides.
	Type	Select Cluster Edition or Serverless as needed.
	Instance ID	Select the ID of the destination Elasticsearch instance.
	Database Account	Enter the database account of the Elasticsearch instance. The default value is elastic.
	Database Password	Enter the logon password that you specified when you created the Elasticsearch instance.
	Encryption	Select HTTP or HTTPS as needed.

After you complete the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note
- Ensure that the IP address segment of the DTS service is automatically or manually added to the security settings of the source and destination databases to allow access from DTS servers. For more information, see Add DTS server IP addresses to a whitelist.
- If the source or destination database is a self-managed database (the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box that appears.
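
If you maintain the whitelist manually, a quick local check can show which DTS address blocks are not yet covered. The sketch below uses Python's standard `ipaddress` module. The CIDR blocks are documentation-range placeholders, not real DTS addresses, so take the actual blocks from Add DTS server IP addresses to a whitelist. A block that is covered only by the union of several whitelist entries is still reported as uncovered.

```python
import ipaddress
from typing import List


def uncovered_blocks(required: List[str], whitelist: List[str]) -> List[str]:
    """Return the required CIDR blocks that no single whitelist entry fully contains."""
    allowed = [ipaddress.ip_network(entry) for entry in whitelist]
    missing = []
    for block in required:
        net = ipaddress.ip_network(block)
        # subnet_of() requires both networks to use the same IP version.
        if not any(net.version == a.version and net.subnet_of(a) for a in allowed):
            missing.append(block)
    return missing


# Placeholder blocks from the documentation address ranges (not real DTS addresses).
required = ["192.0.2.0/24", "198.51.100.0/25"]
whitelist = ["192.0.2.0/24", "198.51.100.128/25"]
print(uncovered_blocks(required, whitelist))  # ['198.51.100.0/25']
```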

Configure the task objects.

On the Configure Objects page, configure the objects that you want to migrate.

Configuration	Description
Migration Types	If you only need to perform a full migration, select both Schema Migration and Full Data Migration. To perform a migration with no downtime, select Schema Migration, Full Data Migration, and Incremental Data Migration. Note If you do not select Schema Migration, you must ensure that a database and tables to receive the data exist in the destination database. You can also use the object name mapping feature in the Selected Objects box as needed. If you do not select Incremental Data Migration, do not write new data to the source instance during data migration to ensure data consistency.
Processing Mode for Existing Destination Tables	Precheck and Report an Error: Checks whether a table with the same name exists in the destination database. If no such table exists, the check passes. If a matching table exists, the precheck reports an error and the data migration task does not start. Note If you cannot delete or rename the conflicting table in the destination database, you can use the object name mapping feature to map the migrated table to a different name in the destination database. For more information, see Map table and column names. Ignore the Error and Continue: Skips the check for tables with the same name in the destination database. Warning Selecting Ignore the Error and Continue may result in data inconsistency and pose risks to your business. For example: If the table schemas match and a record with the same primary key value as in the source database exists in the destination database: During full migration, DTS retains the existing record in the destination cluster and does not migrate the corresponding record from the source database. During incremental migration, DTS overwrites the existing record in the destination cluster with the record from the source database. If the table schemas differ, data initialization may fail, only some columns may be migrated, or the migration may fail entirely.
Index Name	Table Name If you select Table Name, the index created in the destination Elasticsearch instance uses the same name as the source table. DatabaseName_TableName If you select DatabaseName_TableName, the index created in the destination Elasticsearch instance follows the DatabaseName_TableName naming convention.
Case Policy for Destination Object Names	You can configure how DTS handles case sensitivity for database, table, and column names in the destination instance. By default, DTS Default Policy is selected. You can also choose a policy consistent with the default behavior of either the source or destination database. For more information, see Case-sensitivity policy for object names in the destination database.
Source Objects	Select one or more objects from the Source Objects section. Click the icon and add the objects to the Selected Objects section. Note You can select databases and tables as migration objects. If you select tables, other objects—such as views, triggers, and stored procedures—are not migrated to the destination database.
Selected Objects	To rename an object that you want to migrate to the destination instance, right-click the object in the Selected Objects section. For more information, see Individual table column mapping. To rename multiple objects at a time, click Batch Edit in the upper-right corner of the Selected Objects section. For more information, see Map multiple object names at a time. Note Only underscores (_) are supported as special characters in index names and type names. To apply a WHERE clause for data filtering—or to set the index name, type name, or column names after migration—right-click the table to be migrated in the Selected Objects box and configure the settings in the dialog box that appears. For more information about setting filter conditions, see Set filter conditions. To specify which SQL operations to migrate at the database or table level, right-click the migration object in the Selected Objects box and select the desired operations in the dialog box that appears. For more information about supported operations, see Supported SQL operations for incremental migration.
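
The two Index Name options and the special-character rule in the table above can be previewed locally before you configure the task. The following is a minimal sketch with hypothetical helper names. It assumes that "only underscores are supported as special characters" means a name may contain ASCII letters, digits, and underscores.

```python
import re


def index_name(database: str, table: str, policy: str) -> str:
    """Return the index name implied by the selected Index Name option."""
    if policy == "Table Name":
        return table
    if policy == "DatabaseName_TableName":
        return f"{database}_{table}"
    raise ValueError(f"unknown Index Name option: {policy}")


def has_only_supported_characters(name: str) -> bool:
    """True if the name contains only ASCII letters, digits, and underscores (assumption)."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


# Hypothetical database and table names.
print(index_name("shop", "orders", "Table Name"))              # orders
print(index_name("shop", "orders", "DatabaseName_TableName"))  # shop_orders
print(has_only_supported_characters("shop-orders"))            # False
```

A name that fails the check can be changed with the object name mapping feature in the Selected Objects section.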

Click Next: Advanced Settings to configure advanced parameters.

Configuration	Description
Dedicated Cluster for Task Scheduling	By default, DTS schedules tasks on a shared cluster. You do not need to select one. If you want more stable tasks, you can purchase a dedicated cluster to run DTS migration tasks.
Retry Time for Failed Connections	After the migration task starts, if the connection to the source or destination database fails, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can customize the retry time to a value from 10 to 1440 minutes. We recommend that you set the duration to more than 30 minutes. If DTS reconnects to the source and destination databases within the specified duration, the migration task automatically resumes. Otherwise, the task fails. Note For multiple DTS instances that share the same source or destination, the network retry time is determined by the setting of the last created task. Because you are charged for the task during the connection retry period, we recommend that you customize the retry time based on your business needs, or release the DTS instance as soon as possible after the source and destination database instances are released.
Retry Time for Other Issues	After the migration task starts, if a non-connectivity issue, such as a DDL or DML execution exception, occurs in the source or destination database, DTS reports an error and immediately begins to retry the operation. The default retry duration is 10 minutes. You can customize the retry time to a value from 1 to 1440 minutes. We recommend that you set the duration to more than 10 minutes. If the related operations succeed within the specified retry duration, the migration task automatically resumes. Otherwise, the task fails. Important The value of Retry Time for Other Issues must be less than the value of Retry Time for Failed Connections.
Enable Throttling for Full Data Migration	During full migration, DTS consumes read and write resources on the source and destination databases, which may increase the database load. If required, you can enable throttling for the full migration task. You can set Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) to reduce the load on the destination database. Note This configuration item is available only if you select Full Data Migration for Migration Types. You can also adjust the full migration speed after the migration instance is running.
Enable Throttling for Incremental Data Migration	If required, you can also choose to set speed limits for the incremental migration task. You can set RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s) to reduce the load on the destination database. Note This configuration item is available only if you select Incremental Data Migration for Migration Types. You can also adjust the incremental migration speed after the migration instance is running.
Environment Tag	You can optionally select an environment tag to identify the instance. In this example, no tag is required.
Sharding Configuration	Set the number of primary shards and replica shards for the index based on the maximum shard configuration allowed by the destination Elasticsearch instance.
String Index	Specify how strings migrated to the destination Elasticsearch instance are indexed. analyzed: The string is analyzed before indexing. You must also select a specific analyzer. For more information about analyzers, see Analyzers. not analyzed: The string is not analyzed. Its original value is indexed directly. no: The string is not indexed.
Time Zone	When DTS migrates time-related data types, such as DATETIME and TIMESTAMP, to the destination Elasticsearch instance, you can select a time zone. Note If time-zone information is not required for these data types in the destination instance, configure the document type (type) for them in the destination instance before you start the migration.
DOCID	By default, the DOCID corresponds to the table’s primary key. If the table has no primary key, Elasticsearch automatically generates an ID column as the DOCID.
Whether to delete SQL operations on heartbeat tables of forward and reverse tasks	Choose whether DTS writes heartbeat SQL information to the source database while the instance is running. Yes: Does not write heartbeat SQL information to the source database. The DTS instance may display latency. No: Writes heartbeat SQL information to the source database. This may interfere with source database operations like physical backups and cloning.
Configure ETL	Based on your business needs, select whether to configure the ETL feature to process data. Yes: Configures the ETL feature. You must also enter data processing statements in the text box. No: Does not configure the ETL feature.
Monitoring and Alerting	Select whether to set alerts and receive alert notifications based on your business needs. No: Does not set an alert. Yes: Configures alerts. You must also set an alert threshold and alert notifications. If a migration fails or the latency exceeds the threshold, the system sends an alert notification.
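
The constraints on the two retry settings described above (10 to 1440 minutes for failed connections, 1 to 1440 minutes for other issues, and the second value less than the first) can be checked before you enter them in the console. This is a minimal sketch with a hypothetical helper, not a DTS API.

```python
from typing import List


def validate_retry_settings(connection_minutes: int, other_minutes: int) -> List[str]:
    """Return a list of problems with the two retry durations (empty if valid)."""
    problems = []
    if not 10 <= connection_minutes <= 1440:
        problems.append("Retry Time for Failed Connections must be 10 to 1440 minutes")
    if not 1 <= other_minutes <= 1440:
        problems.append("Retry Time for Other Issues must be 1 to 1440 minutes")
    if other_minutes >= connection_minutes:
        problems.append(
            "Retry Time for Other Issues must be less than Retry Time for Failed Connections"
        )
    return problems


# The documented defaults (720 and 10 minutes) satisfy all three rules.
print(validate_retry_settings(720, 10))  # []
```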

After completing the preceding configurations, click Next: Configure Table and Field Settings to define the _routing policy and _id value for the tables being migrated to the destination Elasticsearch instance.

Name	Description
Is _routing Set?	You can set _routing to route and store documents on specific shards of the destination Elasticsearch instance. For more information, see _routing. If you select Yes, you can use custom columns for routing. If you select No, _id is used for routing. Note If the destination Elasticsearch instance that you created is of version 7.x, select No.
_id Value	Primary Key Column: Composite primary keys are merged into one column. Business Primary Key: If you select Business Primary Key, you must also set the corresponding Business Primary Key Column.
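
The Primary Key Column option merges a composite primary key into a single value. The exact format that DTS produces is not described here, so the sketch below only illustrates the idea: it joins the key columns with a hypothetical underscore separator. The row and column names are also hypothetical.

```python
from typing import Any, Dict, Sequence


def build_doc_id(row: Dict[str, Any], key_columns: Sequence[str], separator: str = "_") -> str:
    """Merge the values of the key columns into one document ID string.

    The separator is an assumption for illustration; DTS may use a different format.
    """
    return separator.join(str(row[column]) for column in key_columns)


# Hypothetical row with a composite primary key (order_id, line_no).
row = {"order_id": 7, "line_no": 2, "sku": "A1"}
print(build_doc_id(row, ["order_id", "line_no"]))  # 7_2
```

With Is _routing Set? set to No, this single _id value is also what determines the shard on which the document is stored.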

Save the task and run a precheck.
- To view the parameters for configuring this instance when you call the API operation, move the pointer over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the bubble that appears.
- If you do not need to view or have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
Note
- Before the migration task starts, DTS performs a precheck. The task starts only after it passes the precheck.
- If the precheck fails, click View Details next to the failed check item, fix the issue based on the prompt, and then run the precheck again.
- If a warning is reported during the precheck:
  
  For check items that cannot be ignored, click View Details next to the failed item, fix the issue based on the prompt, and then run the precheck again.
  
  For check items that can be ignored, you can click Confirm Alert Details, Ignore, OK, and Precheck Again to skip the alert item and run the precheck again. If you choose to ignore a warning, it may cause issues such as data inconsistency and pose risks to your business.

Purchase the instance.

When the Success Rate is 100%, click Next: Purchase Instance.

On the Purchase page, select the link specification for the data migration instance. For more information, see the following table.

Category	Parameter	Description
New Instance Class	Resource Group Settings	Select the resource group to which the instance belongs. The default value is default resource group. For more information, see What is Resource Management?
New Instance Class	Instance Class	DTS provides migration specifications with different performance levels. The link specification affects the migration speed. You can select a specification based on your business scenario. For more information, see Data migration link specifications.

After the configuration is complete, read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the OK dialog box that appears, click OK.

You can view the progress of the migration task on the Data Migration Tasks list page.
Note
- If the migration task does not include incremental migration, it stops automatically after the full migration is complete. After the task stops, its Status changes to Completed.
- If the migration task includes incremental migration, it does not stop automatically. The incremental migration task continues to run. While the incremental migration task is running, the Status of the task is Running.
