Select a synchronization method

If slow queries on ApsaraDB RDS are affecting your analytics workload, syncing data to Alibaba Cloud Elasticsearch lets you run fast, scalable search and analytics without putting the analytics load on your production database. Alibaba Cloud Elasticsearch is a Lucene-based, distributed search and analytics engine that lets you store, query, and analyze large volumes of data in near real time. Four tools support this workflow: Data Transmission Service (DTS), Logstash, DataWorks, and Canal.

Choose a synchronization method

Pick the method that matches your latency and infrastructure requirements:

Millisecond-level latency, fully managed: Use DTS. No extra infrastructure required.
Millisecond-level latency, self-managed: Use Canal. Same latency as DTS, but you must build and operate a Canal environment on an Elastic Compute Service (ECS) instance.
Second-level latency, custom query: Use Logstash with the logstash-input-jdbc plug-in. Polls the database on a schedule; lets you define the query.
Minute-level interval, offline or batch: Use DataWorks. Best for large-volume offline data with WHERE clause filtering.
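
The decision list above can be written as a small lookup. This is only an illustrative sketch: the function name `choose_sync_method` and its parameters are hypothetical and are not part of any Alibaba Cloud SDK.

```python
def choose_sync_method(latency: str, fully_managed: bool = True) -> str:
    """Return the sync tool suggested by the list above.

    latency: "millisecond", "second", or "minute" (hypothetical labels).
    fully_managed: only matters for millisecond-level latency, where it
    separates DTS (managed) from Canal (self-managed on ECS).
    """
    if latency == "millisecond":
        return "DTS" if fully_managed else "Canal"
    if latency == "second":
        return "Logstash"   # scheduled polling with a custom query
    if latency == "minute":
        return "DataWorks"  # offline or batch sync
    raise ValueError(f"unknown latency requirement: {latency!r}")


print(choose_sync_method("millisecond", fully_managed=False))
```
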

The table below covers each method in detail.

Synchronization method	Mechanism	When to use	Limits	Tutorial
DTS	Subscribes to binary logs. Incremental sync has millisecond-level latency, and reading binary logs has no impact on the source database. During full data initialization, DTS uses read and write resources on both the source database and destination cluster, which may increase load.	You need millisecond-level, real-time data sync without managing sync infrastructure.	Full data initialization increases load on the source database and destination cluster. Index mappings are customizable, but mapped fields must match the source database schema. Requires a paid data synchronization instance. See Purchase a DTS instance for purchasing steps and Billing overview for pricing details.	Sync MySQL data to Elasticsearch in real time using DTS
Logstash (logstash-input-jdbc plug-in)	Runs a polling loop that queries the database at a regular interval for records inserted or updated since the last poll, then writes them to Elasticsearch. Latency is second-level.	You can tolerate a few seconds of latency and want to sync full data. You want to query a specific subset of data on a schedule.	Upload a JDBC driver compatible with your ApsaraDB RDS version before use. Add the IP addresses of Logstash cluster nodes to the ApsaraDB RDS IP address whitelist. The Logstash cluster and ApsaraDB RDS instance must be in the same time zone to avoid inconsistent timestamps. The _id field in Elasticsearch must match the id field in ApsaraDB RDS. Records in ApsaraDB RDS must include a field that stores the insertion or update time.	Sync data from ApsaraDB RDS for MySQL to Elasticsearch using Logstash
DataWorks	A data integration service that provides modules such as Data Integration, DataStudio, and Data Quality. You can use DataWorks to import, transform, and sync structured data to Elasticsearch. The shortest sync interval is measured in minutes.	You need to sync large volumes of offline data at minute-level intervals. You want to filter data using a WHERE clause. You want to sync all data in your ApsaraDB RDS database.	Activate the DataWorks service before use. For high-throughput or complex network environments, customize resource groups. Add the IP address of your resource group to the ApsaraDB RDS IP address whitelist.	Sync data from a MySQL database to Elasticsearch using DataWorks
Canal	Subscribes to binary logs. Latency is millisecond-level with no impact on the source database.	You need millisecond-level, real-time data sync and are comfortable managing your own sync infrastructure.	Requires building a Canal environment on an ECS instance, which adds operational overhead and cost. Canal 1.1.4 cannot sync data to Elasticsearch V7.X clusters. Use Canal 1.1.5, Logstash, or DTS instead. Index mappings are customizable, but mapped fields must match the source database schema. You must keep Canal available yourself: ECS instance restarts or Canal exceptions can interrupt sync and affect your business.	Sync MySQL data to Alibaba Cloud Elasticsearch using Canal
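
The polling mechanism in the Logstash row can be sketched in a few lines of Python. This is not how the logstash-input-jdbc plug-in is implemented. It is a minimal illustration, under stated assumptions, of why the source records need an update-time field and why the Elasticsearch _id should match the source id. The in-memory SQLite database stands in for ApsaraDB RDS, a plain dict stands in for an Elasticsearch index, and the `products` table, its columns, and the `poll_once` function are hypothetical.

```python
import sqlite3


def poll_once(conn, index, last_seen):
    """Copy rows changed after `last_seen` into `index`; return the new checkpoint."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        # Reusing the source primary key as the document _id means an updated
        # row overwrites its earlier document instead of creating a duplicate.
        index[str(row_id)] = {"name": name, "updated_at": updated_at}
        last_seen = max(last_seen, updated_at)
    return last_seen


# Stand-in for the source database (assumption: integer timestamps for brevity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)", [(1, "pen", 100), (2, "ink", 110)]
)

index = {}  # stand-in for an Elasticsearch index, keyed by _id
checkpoint = poll_once(conn, index, 0)  # first poll copies both rows

conn.execute("UPDATE products SET name = 'blue pen', updated_at = 120 WHERE id = 1")
checkpoint = poll_once(conn, index, checkpoint)  # second poll sees only the update
```

A real deployment would run the poll on a schedule and persist the checkpoint between runs. A row deleted from the source never appears in such a query, so a loop like this one does not propagate deletions.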