Select a synchronization method

If your business data is stored in PolarDB-X 1.0 and you want to run full-text searches and semantic analytics on it, you can synchronize the data to Alibaba Cloud Elasticsearch. Alibaba Cloud Elasticsearch is a distributed search and analytics engine based on Lucene that lets you store, query, and analyze large volumes of data in near real time. You can use Logstash or DataWorks to synchronize data from PolarDB-X 1.0 to Alibaba Cloud Elasticsearch. This topic describes the scenarios that each synchronization method suits so that you can select a method based on your business requirements.


Synchronization method	Description	Use scenarios	Limits	References
Use the logstash-input-jdbc plug-in to synchronize data	The logstash-input-jdbc plug-in queries batches of data records in a PolarDB-X 1.0 database and synchronizes them to an Elasticsearch cluster. During data synchronization, the plug-in polls the PolarDB-X 1.0 database at regular intervals to identify the most recently inserted or updated data records. The plug-in then queries all identified data records at a time and synchronizes them to the Elasticsearch cluster.	You want to synchronize full data and can accept a latency of a few seconds. You want to query specific data in a batch and synchronize it.	Before you use this method, you must upload an SQL Java Database Connectivity (JDBC) driver to the Logstash cluster. The version of the driver must be compatible with the version of the PolarDB-X 1.0 database. You must add the IP addresses of the nodes in your Logstash cluster to the IP address whitelist of your PolarDB-X 1.0 instance. Your Logstash cluster and PolarDB-X 1.0 instance must use the same time zone. Otherwise, timestamps may be inconsistent during data synchronization. The values of the _id field in your Elasticsearch cluster must be the same as the values of the id field in the PolarDB-X 1.0 database. Each data record that you insert or update in the PolarDB-X 1.0 database must contain a field that records the insertion or update time.	None
Use DataWorks to synchronize offline data	DataWorks is a comprehensive service that provides modules such as Data Integration, DataStudio, and Data Quality. You can use DataWorks to import and store structured data, transform and develop the data, and then synchronize the processed data to Elasticsearch clusters or other data systems.	You want to synchronize large volumes of offline data. DataWorks can collect offline data at a minimum interval of 5 minutes. You want to specify custom query statements, perform join queries across multiple tables, and then synchronize the results. You want to synchronize all data in your PolarDB-X 1.0 database.	You must activate the DataWorks service. If a high transmission speed is required or the network environment is complex, you must customize resource groups. You must add the IP address of the resource group that you use to the IP address whitelist of the PolarDB-X 1.0 instance.	Use DataWorks to synchronize data from a DRDS database to an Elasticsearch cluster in offline mode
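
The polling behavior described for the logstash-input-jdbc method can be illustrated with a minimal Python sketch. This is not the plug-in's implementation. It only shows the idea under stated assumptions: an in-memory SQLite database stands in for PolarDB-X 1.0, and the table name articles, the column update_time, and the functions fetch_changed_rows, to_bulk_actions, and poll_once are hypothetical names chosen for illustration. Each poll selects the records whose update time is later than the last checkpoint, and each resulting document uses the source id as its _id, which is why the limits above require an update-time field and matching identifiers.

```python
import sqlite3


def fetch_changed_rows(conn, last_seen):
    """Return rows modified after the checkpoint, oldest first."""
    cur = conn.execute(
        "SELECT id, title, update_time FROM articles "
        "WHERE update_time > ? ORDER BY update_time ASC",
        (last_seen,),
    )
    return cur.fetchall()


def to_bulk_actions(rows, index_name):
    """Build one index action per row; _id mirrors the source id so that
    an updated record overwrites its earlier document instead of duplicating it."""
    return [
        {
            "_op_type": "index",
            "_index": index_name,
            "_id": str(row_id),
            "_source": {"title": title, "update_time": update_time},
        }
        for row_id, title, update_time in rows
    ]


def poll_once(conn, last_seen, index_name="articles"):
    """Run a single polling cycle and return (actions, new_checkpoint)."""
    rows = fetch_changed_rows(conn, last_seen)
    if rows:
        last_seen = rows[-1][2]  # newest update_time seen in this cycle
    return to_bulk_actions(rows, index_name), last_seen


def demo():
    # SQLite is only a stand-in for the source database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, update_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, "first", "2024-01-01 00:00:01"), (2, "second", "2024-01-01 00:00:02")],
    )
    # First poll picks up every record.
    first, checkpoint = poll_once(conn, "1970-01-01 00:00:00")
    # A later update bumps update_time, so the next poll sees only that record.
    conn.execute(
        "UPDATE articles SET title = ?, update_time = ? WHERE id = ?",
        ("first (edited)", "2024-01-01 00:00:05", 1),
    )
    second, checkpoint = poll_once(conn, checkpoint)
    conn.close()
    return first, second, checkpoint
```

In this sketch, the strict comparison against the checkpoint can miss a record that is written later with the same timestamp as the checkpoint, and a time zone mismatch between the poller and the database would shift the comparison window. The same-time-zone limit in the table guards against the latter. In the plug-in itself, the stored :sql_last_value parameter plays a role similar to the checkpoint used here.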