If slow queries on ApsaraDB RDS are affecting your analytics workload, syncing data to Alibaba Cloud Elasticsearch lets you run fast, scalable search and analytics without putting query load on your production database. Alibaba Cloud Elasticsearch is a Lucene-based, distributed search and analytics engine that lets you store, query, and analyze large datasets in near real time. Four tools support this workflow: Data Transmission Service (DTS), Logstash, DataWorks, and Canal.
Choose a synchronization method
Pick the method that matches your latency and infrastructure requirements:
- Millisecond-level latency, fully managed: Use DTS. No extra infrastructure required.
- Millisecond-level latency, self-managed: Use Canal. Same latency as DTS, but you must build and operate a Canal environment on an Elastic Compute Service (ECS) instance.
- Second-level latency, custom query: Use Logstash with the logstash-input-jdbc plug-in. Polls the database on a schedule; lets you define the query.
- Minute-level interval, offline or batch: Use DataWorks. Best for large-volume offline data with WHERE clause filtering.
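The selection rules above can be written down as a small lookup. The following is only an illustrative sketch: the function name `choose_sync_method`, the `Method` enum, and the latency labels are hypothetical names made up for this example and are not part of any Alibaba Cloud SDK.

```python
from enum import Enum


class Method(Enum):
    """The four synchronization tools named in this topic."""
    DTS = "DTS"
    CANAL = "Canal"
    LOGSTASH = "Logstash"
    DATAWORKS = "DataWorks"


def choose_sync_method(latency: str, self_managed: bool = False) -> Method:
    """Map a latency requirement to a tool, following the list above.

    latency: "millisecond", "second", or "minute" (hypothetical labels).
    self_managed: True if you are willing to run your own sync
        infrastructure on ECS; only matters for millisecond-level latency.
    """
    if latency == "millisecond":
        return Method.CANAL if self_managed else Method.DTS
    if latency == "second":
        return Method.LOGSTASH
    if latency == "minute":
        return Method.DATAWORKS
    raise ValueError(f"unknown latency requirement: {latency!r}")
```

For example, `choose_sync_method("millisecond", self_managed=True)` returns `Method.CANAL`. Real decisions usually weigh more than latency, such as data volume and whether you need a custom query.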
The table below covers each method in detail.
| Synchronization method | Mechanism | When to use | Limits | Tutorial |
|---|---|---|---|---|
| DTS | Subscribes to binary logs. Latency is millisecond-level with no impact on the source database. During full data initialization, DTS uses read and write resources on both the source database and destination cluster, which may increase load. | You need millisecond-level, real-time data sync without managing sync infrastructure. | | Sync MySQL data to Elasticsearch in real time using DTS |
| Logstash (logstash-input-jdbc plug-in) | Runs a polling loop that queries the database at a regular interval for records inserted or updated since the last poll, then writes them to Elasticsearch. Latency is second-level. | | | Sync data from ApsaraDB RDS for MySQL to Elasticsearch using Logstash |
| DataWorks | A comprehensive data integration service that provides modules such as Data Integration, DataStudio, and Data Quality. You can use DataWorks to import, transform, and sync structured data to Elasticsearch. Minimum sync interval is minutes. | | | Sync data from a MySQL database to Elasticsearch using DataWorks |
| Canal | Subscribes to binary logs. Latency is millisecond-level with no impact on the source database. | You need millisecond-level, real-time data sync and are comfortable managing your own sync infrastructure. | | Sync MySQL data to Alibaba Cloud Elasticsearch using Canal |
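To make the polling mechanism described for Logstash more concrete, here is a minimal sketch of checkpoint-based incremental polling. It is not the logstash-input-jdbc implementation: it uses an in-memory SQLite database in place of ApsaraDB RDS and a plain dictionary in place of an Elasticsearch index, and the `products` table, its `updated_at` column, and the function names are all assumptions made up for illustration.

```python
import sqlite3


def fetch_changes(conn, checkpoint):
    """Return rows changed after `checkpoint`, plus the new checkpoint."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (checkpoint,),
    ).fetchall()
    # Advance the checkpoint only when something was returned.
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint


def apply_to_index(index, rows):
    """Upsert rows by primary key, so re-sent rows overwrite old copies."""
    for row_id, name, updated_at in rows:
        index[row_id] = {"name": name, "updated_at": updated_at}


# Stand-in source database (assumption: rows carry an update timestamp).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "pen", 100), (2, "ink", 105)],
)

index = {}      # stand-in for the destination index
checkpoint = 0  # value of updated_at seen in the previous poll

# First poll picks up both rows.
rows, checkpoint = fetch_changes(conn, checkpoint)
apply_to_index(index, rows)

# A row is updated in the source; the next poll picks up only that row.
conn.execute("UPDATE products SET name = 'gel pen', updated_at = 110 WHERE id = 1")
rows, checkpoint = fetch_changes(conn, checkpoint)
apply_to_index(index, rows)
```

In a real deployment the poll would run on a schedule. A timestamp-based poll like this sketch only sees inserted and updated rows, so rows deleted from the source would remain in the destination unless handled separately.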