All Products
Search
Document Center

Elasticsearch:Synchronize data from MySQL to Alibaba Cloud ES using DataWorks

Last Updated:Nov 24, 2025

You can use Alibaba Cloud Elasticsearch (ES) to perform full-text searches, multi-dimensional queries, or statistical analysis on data from a MySQL database. This topic describes how to use the Data Integration service of DataWorks to synchronize data from a MySQL database to an Alibaba Cloud ES cluster in minutes.

Background information

DataWorks is an end-to-end big data development and governance platform that is built on big data engines. It integrates features such as data development, task scheduling, and data management. You can use sync tasks in DataWorks to quickly synchronize data from various data sources to an Alibaba Cloud ES cluster.

  • Supported data sources include the following:

    • ApsaraDB databases (MySQL, PostgreSQL, SQL Server, MongoDB, and HBase)

    • Alibaba Cloud PolarDB-X (upgraded from the previous DRDS)

    • Alibaba Cloud MaxCompute

    • Alibaba Cloud OSS

    • Alibaba Cloud Tablestore

    • Self-managed versions of Hadoop Distributed File System (HDFS), Oracle, FTP, DB2, and the database types listed above

  • Scenarios:

Prerequisites

Note
  • You can synchronize data only to Alibaba Cloud ES. Self-managed Elasticsearch clusters are not supported.

  • The ApsaraDB RDS for MySQL instance, ES instance, and DataWorks workspace must be in the same region.

  • The ApsaraDB RDS for MySQL instance, ES instance, and DataWorks workspace must be in the same time zone. Otherwise, a time zone difference may exist between the source and destination data when you synchronize time-related data.

Billing

Procedure

Note

This topic uses offline synchronization as an example. To synchronize data in real time, see Real-time synchronization of an entire MySQL database to Elasticsearch.

Step 1: Prepare the source data

Create a database and a table in the ApsaraDB RDS for MySQL instance.

  • You can use an ApsaraDB for RDS database or a self-managed database on a local server. This topic uses an ApsaraDB RDS for MySQL database as an example to demonstrate how to create a database and a table. For more information, see Quick Start.

  • You can use the following statements to create the table and insert data.

    -- create table
    CREATE TABLE `es_test` (
        `id` bigint(32) NOT NULL,
        `name` varchar(32) NULL,
        `age` bigint(32) NULL,
        `hobby` varchar(32) NULL,
        PRIMARY KEY (`id`)
    ) ENGINE=InnoDB
    DEFAULT CHARACTER SET=utf8;
    
    -- insert data
    INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (1,'user1',22,'music');
    INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (2,'user2',23,'sport');
    INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (3,'user3',43,'game');
    INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (4,'user4',24,'run');
    INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (5,'user5',42,'basketball');

Step 2: Purchase and configure a resource group

Purchase a Data Integration resource group and attach a virtual private cloud (VPC) and a workspace to it.

  1. Log on to the DataWorks console.

  2. In the top menu bar, select the region. In the navigation pane on the left, click Resource Groups.

  3. Click Create Resource Group and configure the required parameters.

  4. In the Operations column of the resource group that you created, click Network Settings to attach a VPC to the resource group. For more information, see Network settings.

    The resource group must be able to connect to the VPCs of the ApsaraDB RDS for MySQL and ES instances to synchronize data. You must attach the VPC, Zone, and VSwitch to which the ApsaraDB RDS for MySQL and ES instances belong. To view the VPC information for an instance, see Switch the VPC and vSwitch for an ApsaraDB RDS for MySQL instance and View the basic information of an Elasticsearch instance.

    Important

    After you attach a VPC, you must add the VSwitch CIDR Block of the VPC to the internal-facing whitelists of the ApsaraDB RDS for MySQL and ES instances. For more information, see Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance and Configure a public or internal-facing access whitelist for an ES instance.

  5. In the upper-left corner of the page, click the back icon to return to the Resource Groups page.

  6. In the Operations column of the resource group that you created, click Attach Workspace to attach the target workspace to the resource group.

Step 3: Add data sources

Add the ApsaraDB RDS for MySQL and ES data sources to the Data Integration service of DataWorks.

  1. Go to the Data Integration page in DataWorks.

    1. Log on to the DataWorks console.

    2. In the navigation pane on the left, click Workspaces.

    3. In the Operations column of the target workspace, choose Quick Access > Data Integration.

  2. In the navigation pane on the left, click Data Source.

  3. Add a MySQL data source.

    1. On the Data Source List page, click Add Data Source.

    2. On the Add Data Source page, search for and select MySQL.

    3. In the Add MySQL Data Source dialog box, configure the data source parameters in the Basic Information section.

      For more information about the configuration, see Configure a MySQL data source.

    4. In the Connection Configuration section, click Test Connectivity. If the connectivity status is Connected, the connection is successful.

    5. Click Complete.

  4. Add an ES data source in the same way. For more information about the configuration, see Configure an ES data source.

Step 4: Configure and run an offline data synchronization task

An offline data synchronization task runs using a resource group. The resource group retrieves data from the data source in Data Integration and writes the data to the ES cluster.

Note
  1. Go to the Data Development page in DataWorks.

    1. Log on to the DataWorks console.

    2. In the navigation pane on the left, click Workspaces.

    3. In the Operations column of the target workspace, choose Quick Access > Data Development.

  2. Create an offline sync task.

    1. On the Data Development (image icon) tab in the navigation pane on the left, choose Create > Create Business Flow. Follow the on-screen instructions to create a business flow.

    2. Right-click the business flow that you created and choose Create Node > Data Integration > Offline Synchronization.

    3. In the Create Node dialog box, enter a node name and click Confirm.

  3. Configure the network and resources.

    1. In the Data Source section, set Data Source to MySQL and Data Source Name to the name of the data source from which to synchronize data.

    2. In the My Resource Group section, select a resource group.

    3. In the Data Destination section, set Data Destination to Elasticsearch and Data Source Name to the name of the data source to which to synchronize data.

  4. Click Next.

  5. Configure the task.

    1. In the Data Source section, select the table from which to synchronize data.

    2. In the Data Destination section, configure the parameters for the data destination.

    3. In the Field Mapping section, set the mapping relationship between the Source Field and the Destination Field.

    4. In the Channel Control section, configure the channel parameters.

    For more information about the configuration, see Configure a sync node in the codeless UI.

  6. Run the task.

    1. (Optional) Configure scheduling properties for the task. In the right-side pane, click Scheduling and configure the scheduling parameters as needed. For more information about the parameters, see Scheduling.

    2. In the upper-left corner of the node area, click the save icon to save the task.

    3. In the upper-left corner of the node area, click the commit icon to commit the task.

      If you configured scheduling properties for the task, the task runs automatically on a regular basis. You can also click the run icon in the upper-left corner of the node area to run the task immediately.

      If the log contains Shell run successfully!, it indicates that the task ran successfully.

Step 5: Verify the data synchronization result

  1. Log on to the Kibana console of the destination Alibaba Cloud ES instance. For more information, see Log on to the Kibana console.

  2. In the upper-left corner of the Kibana page, click the icon and choose Dev Tools.

  3. In the Console, run the following command to view the synchronized data.

    POST /es_test/_search?pretty
    {
    "query": { "match_all": {}}
    }
    Note

    Replace es_test with the index name that you specified in the data synchronization task.

    If the data is synchronized successfully, the following result is returned.image..png