All Products
Search
Document Center

Elasticsearch:Use DataWorks to synchronize data from RDS SQL Server to Alibaba Cloud ES

Last Updated:Aug 15, 2025

You can use Alibaba Cloud Elasticsearch to search and analyze data in RDS SQL Server. This topic describes how to use the Data Integration service of DataWorks to synchronize data offline to an Alibaba Cloud ES instance.

Background information

DataWorks is an end-to-end big data development and governance platform based on big data compute engines. DataWorks provides features such as data development, task scheduling, and data management. You can create synchronization tasks in DataWorks to rapidly synchronize data from various data sources to Alibaba Cloud Elasticsearch.

  • The following types of data sources are supported:

    • Alibaba Cloud databases: ApsaraDB RDS for MySQL, ApsaraDB RDS for PostgreSQL, ApsaraDB RDS for SQL Server, ApsaraDB for MongoDB, and ApsaraDB for HBase

    • Alibaba Cloud PolarDB for Xscale (PolarDB-X) (formerly DRDS)

    • Alibaba Cloud MaxCompute

    • Alibaba Cloud Object Storage Service (OSS)

    • Alibaba Cloud Tablestore

    • Self-managed databases: HDFS, Oracle, FTP, Db2, MySQL, PostgreSQL, SQL Server, MongoDB, and HBase

  • The following synchronization scenarios are supported:

Prerequisites

Note
  • Data can be synchronized only to Alibaba Cloud ES. Self-managed Elasticsearch instances are not supported.

  • The ApsaraDB RDS for SQL Server instance, ES instance, and DataWorks workspace must be in the same region.

  • The ApsaraDB RDS for SQL Server instance, ES instance, and DataWorks workspace must be in the same time zone. Otherwise, a time zone difference may occur when you synchronize time-related data.

Billing

Procedure

Step 1: Prepare the source data

The following test data is used in the ApsaraDB RDS for SQL Server instance.

Note

The data in this topic is for testing only. Do not use it in a production environment.

测试数据

  • CREATE TABLE statement

    CREATE TABLE students(
        id INT,
        name VARCHAR(20),
        age INT
    )
  • INSERT statement

    INSERT INTO [students]( [id] , [name] , [age] )values('1', 'Xiaoming','21');

Step 2: Purchase and configure an exclusive resource group

Purchase an exclusive resource group for Data Integration and attach a VPC and a workspace to the resource group. Exclusive resource groups ensure fast and stable data transmission.

  1. Log on to the DataWorks console.

  2. In the top menu bar, select a region. In the left navigation pane, click Resource Group.

  3. On the Exclusive Resource Groups tab, click Create Legacy Resource Group > Data Integration Resource Group.

  4. On the DataWorks Exclusive Resource (Subscription) purchase page, set Exclusive Resource Type to Exclusive Resource for Data Integration, enter a name for the resource group, and then click Buy Now.

    For more configuration information, see Step 1: Purchase a resource group.

  5. To attach a VPC to the exclusive resource group, click Network Settings in the Operation column. For more information, see Attach a VPC.

    Note

    This topic uses an example in which data is synchronized over a VPC internal network using an exclusive resource group for Data Integration. For more information, see Add a whitelist.

    The exclusive resource group must be connected to the VPCs of the RDS SQL Server and Elasticsearch instances to synchronize data. To establish this connection, attach the VPC, Zone, and VSwitch where the RDS SQL Server and Elasticsearch instances reside. To view the VPC information for the instances, see Switch the vSwitch for an ApsaraDB RDS for SQL Server instance and View the basic information of an Elasticsearch instance.

    Important

    After you attach the VPC, you must add the VSwitch CIDR Block to the internal-facing whitelists for the RDS SQL Server and Elasticsearch instances. For more information, see Configure a whitelist for an ApsaraDB RDS for SQL Server instance and Configure a public or internal-facing whitelist for an Elasticsearch instance.

  6. In the upper-left corner of the page, click the back icon to return to the Resource Group List page.

  7. In the Operation column for the exclusive resource group, click Attach Workspace to attach the target workspace.

    For more information, see Step 2: Attach a workspace.

Step 3: Add data sources

Add the ApsaraDB RDS for SQL Server and Elasticsearch data sources to the Data Integration service of DataWorks.

  1. In DataWorks, go to the Data Integration page.

    1. Log on to the DataWorks console.

    2. In the left navigation pane, click Workspaces.

    3. In the Operation column of the target workspace, choose Quick Access > Data Integration.

  2. In the navigation pane, click Data Source.

  3. Add an ApsaraDB RDS for SQL Server data source.

    1. On the Data Source List page, click Add Data Source.

    2. On the Add Data Source page, search for and select the SQL Server data source.

    3. In the Add SQL Server Data Source dialog box, configure the parameters in the Basic Information area.

      For more information, see Configure a SQL Server data source.

    4. In the Connection Configuration area, click Test Connectivity. A status of Connected indicates a successful connection.

    5. Click Finish.

  4. Add an Elasticsearch data source in the same way. For more information, see Configure an Elasticsearch data source.

Step 4: Configure and run an offline data sync task

The offline sync task uses the exclusive resource group to run. The exclusive resource group retrieves data from the data source in Data Integration and writes the data to Elasticsearch.

Note

You can configure an offline sync task in two ways. This topic uses the codeless UI as an example. You can also use the code editor. For more information, see Configure an offline sync task using the code editor and Elasticsearch Writer.

  1. Go to the Data Development page in DataWorks.

    1. Log on to the DataWorks console.

    2. In the left navigation pane, click Workspaces.

    3. In the Operation column of the target workspace, choose Quick Access > Data Development.

  2. Create an offline sync task.

    1. In the navigation pane on the left, click the image icon and choose New > New Business Flow.

    2. Right-click the new business flow and choose New Node > Offline Synchronization.

    3. In the New Node dialog box, enter a node name and click Confirm.

  3. Configure the network and resources.

    1. In the Data Source area, set Data Source to SQL Server and select a data source for Data Source Name.

    2. In the My Resource Groups area, select the exclusive resource group.

    3. In the Data Destination area, set Data Destination to Elasticsearch and select a data source for Data Source Name.

  4. Click Next.

  5. Configure the task.

    1. In the Data Source area, select the table to sync.

    2. Configure the parameters in the Data Destination area.

    3. In the Field Mapping area, configure the mapping between the Source Field and the Destination Field. For more information, see Configure an offline sync task using the codeless UI.

    4. In the Channel Control area, you can configure the channel parameters.

    For more configuration information, see Configure an offline sync task using the codeless UI.

  6. Run the task.

    1. (Optional) In the right pane, click Scheduling Configuration to configure the scheduling parameters for the task. For more information about each parameter, see Scheduling configuration.

    2. In the upper-right corner of the node configuration tab, click the save icon to save the task.

    3. In the upper-right corner of the node configuration tab, click the submit icon to submit the task.

      If you configured scheduling properties, the task runs automatically at the specified intervals. You can also click the run icon in the upper-right corner of the node configuration tab to run the task immediately.

      If the log contains the message Shell run successfully!, the task ran successfully.

Step 5: Verify the data synchronization result

  1. Log on to the Kibana console of the destination Alibaba Cloud ES instance. For more information, see Log on to the Kibana console.

  2. In the top-left corner of the Kibana page, click the icon and select Dev Tools.

  1. In the Console, you can run the following command to view the synchronized data.

    POST /dbo.students/_search?pretty
    {
       "query": { "match_all": {}}
    }
    Note

    Replace dbo.students with the value that you specified for the index parameter when you configured data synchronization in DataWorks.

    The expected result is shown in the following figure.预期结果