All Products
Search
Document Center

DataWorks:Synchronize an entire MySQL database to Elasticsearch

Last Updated:Mar 06, 2026

Data Integration lets you synchronize an entire database from a MySQL Source to Elasticsearch in real time. This topic uses MySQL as the Source and Elasticsearch as the Destination to demonstrate how to configure a real-time synchronization task that combines Full Synchronization with Incremental Synchronization (change data capture or CDC).

Prerequisites

Configure the task

Step 1: Create a synchronization task

  1. Go to the Data Integration page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Synchronization Task. At the top of the page, click Create Synchronization Task and configure the task information:

    • Source Type: Select MySQL as the source.

    • Destination Type: Select Elasticsearch as the destination.

    • Specific Type: Select Real-time synchronization for an entire database.

    • Synchronization Mode:

      • Schema Migration: Automatically creates matching index structures (such as indices and field mappings) in the Destination, but does not include data.

      • Full Synchronization (Optional): Copies all historical data from the specified source tables to the Destination.

      • Incremental Sync (Optional): After the full synchronization is complete, continuously captures and synchronizes data changes (inserts, updates, and deletes) from the Source to the Destination.

Step 2: Configure data sources and compute resources

  1. For Source, select your MySQL data source. For Destination, select your Elasticsearch data source.

  2. In the Running Resources section, select the Resource Group for the synchronization task and assign Resource Group CUs to the task.

    Note

    If the task log displays a message like Please confirm whether there are enough resources..., it means the resource group has insufficient compute units (CUs) to start or run the task. To provide more compute resources, increase the CUs allocated to the task in the Resource group configuration panel.

    For recommended resource settings, see Recommended CUs for Data Integration. Adjust the settings based on your actual needs.

  3. Ensure that both the Source and Destination data sources pass the Connectivity Check check.

Step 3: Configure the synchronization solution

1. Configure the data source

  • In the Source databases and tables section, select the source tables to synchronize and click the image icon to move them to the Selected databases and tables list. If you have a large number of databases or tables, you can use the Database Filtering or Table filtering and enter a regular expression to select the required tables.

    image

  • To write data from multiple sharded databases and tables with identical schemas to a single destination index, you can use a regular expression to select tables.

    image
    Enter a regular expression in the source table configuration. DataWorks automatically identifies and collects data from all matching source tables and writes it to the destination index mapped by the expression.



    Note

    This method is suitable for merging and synchronizing sharded tables, similar to database and table sharding synchronization. It improves configuration efficiency by eliminating the need for multiple many-to-one synchronization rules.

2. Destination index mapping

Actions

Description

Refresh

The system automatically lists your selected source tables, but you must refresh the mapping to confirm the destination index attributes.

  • Select the tables to synchronize and click Batch Refresh Mapping.

  • Destination index name: The destination index name is automatically generated based on the Customize Mapping Rules for Destination Table Names rule. The default is ${source_database_name}_${table_name}. If an index with this name does not exist in the Destination, the system automatically creates one.

Custom Mapping Rule for Destination Index Name (Optional)

The system has a default index naming rule: ${source_database_name}_${table_name}. You can also add a custom rule by clicking the Edit button in the Customize Mapping Rules for Destination Table Names column.

  • Rule name: Define a name for the rule. We recommend using a name that clearly reflects its business purpose.

  • Destination index name: Construct the destination index name by clicking the image icon and combining options from Manual Input and Built-in Variable. Supported variables include the source data source name, source database name, and source table name.

  • Edit built-in variables: You can apply string transformations to the built-in variables.

This feature supports the following scenarios:

  1. Add a prefix or suffix: Add a prefix or suffix to the source table name by setting a constant.

    Rule configuration

    Result

    image

    image

  2. Unified string replacement: Replace the string dev_ in source table names with prd_.

    Rule configuration

    Result

    image

    image

  3. Write data from multiple tables to a single index: Set the destination index name to a constant.

    Rule configuration

    Result

    image

    image

Edit field type mapping (Optional)

The system provides a default mapping between Source type and Destination type. You can click Edit Mapping of Field Data Types in the upper-right corner of the table to customize the source-to-destination field type mapping. After configuration, click Apply and Refresh Mapping.

When editing field type mappings, ensure the conversion rules are correct. Incorrect rules can lead to type conversion failures, generate Dirty Data, and affect task execution.

Edit destination index (Optional)

The system uses the index name mapping rule to either create a new destination index or reuse an existing one with the same name.

DataWorks automatically generates the destination index structure based on the source table schema. In most cases, no manual intervention is needed.

When a destination index has a status of To be created, you can add new fields to its structure. Perform the following steps:

  1. Add a field to the destination index

    • Add a field to a single index: Click the image.png icon in the Destination Index Name column and edit the Statement Used to Create Index to add a field.

      • Dynamic Mapping Status: Controls whether new fields from the source table are added to the destination index during data synchronization. Valid values are:

        • true: If a new field is detected in the source table, it is added to the destination index and becomes searchable. This is the default value.

        • false: If a new field is detected in the source table, it is added to the destination index but will not be searchable.

        • strict: If a new field is detected in the source table, the synchronization of that field is rejected, and an error is thrown. You can view the error details in the logs.

        • runtime: If a new field is detected in the source table, it is not added to the destination index mapping. Instead, it is treated as a runtime field at query time, allowing it to be used in script calculations and searches.

        For more information about dynamic mapping, see dynamic.

      • Shards and Replica Shards: The number of primary shards and replica shards for the index. An index is divided into multiple shards and distributed across different Elasticsearch nodes to enable distributed search and improve query efficiency. For more information, see Basic concepts.

        Note

        The Shards and Replica Shards parameters cannot be changed after the index is created. The default value is 1.

    • Add fields in bulk: Select all tables to be synchronized, and at the bottom of the table, choose Batch Modify > Destination Index Structure - Batch Add Fields.

Value assignment

Source fields are automatically mapped if their names match fields in the Destination. You must manually assign values for the newly added fields and destination index properties from the previous step. Perform the following steps:

  • Assign values for a single index: Click the Configuration button in the Value assignment column to assign values.

  • Assign values in bulk: At the bottom of the list, choose Batch Modify > Value assignment to assign values to the same fields across multiple destination indices.

You can assign constants or variables. Switch the type in the Value Type menu. The following options are supported:

  • Destination index field:

    • Manual assignment: Enter a constant value, such as abc.

    • Source field: Use the value of a source table field as the destination field value. You can select either the field value or a time value.

      • Field value: Writes the source field's value directly to the destination.

      • Time value: If the source field contains a time value, you can process it based on different formats and specify a Destination Format to format the extracted value.

        • Time string: A string representing a time, such as "2018-10-23 02:13:56" or "2021/05/18". Parse the string into a time object by specifying the source and destination time formats. For the examples above, you can use formats like yyyy-MM-dd HH:mm:ss and yyyy/MM/dd.

        • Time object: Select this type if the source value is already a time type, such as Date or Datetime.

        • Unix timestamp (seconds): A timestamp in seconds. Also supports numbers or strings in a 10-digit timestamp format, such as 1610529203 or "1610529203".

        • Unix timestamp (milliseconds): A timestamp in milliseconds. Also supports numbers or strings in a 13-digit timestamp format, such as 1610529203002 or "1610529203002".

    • Select variable: Select a system-provided variable as the value source.

    • Function: Use a function to perform a simple transformation on a source field before assigning its value. For usage details, see Use a function expression to assign a value to a destination field.

  • Assign destination index properties: Assign a value to the destination index's primary key, which is used for synchronization. You can concatenate multiple source fields to create a composite primary key. Ensure the resulting value is unique.

Source split column

In the Source Split Column, you can select a field from the source table or select Not Split. When the synchronization task runs, it is split into multiple tasks based on this field to read data concurrently and in batches.

Use the table's primary key as the source split column. String, float, and date types are not supported.

Currently, the source split column is only supported for MySQL sources.

Skip full synchronization

If you enabled Full Synchronization for the task, you can selectively disable it for a specific table. This is useful if the full data for that table has already been synchronized to the destination through other means.

Full condition

Applies a filter to the source data during the full synchronization phase. Enter only the WHERE clause content, without the WHERE keyword.

Configure DML Rule

DML message handling allows you to filter and control the change data (Insert, Update, Delete) captured from the Source before it is written to the Destination. This rule only applies during the incremental synchronization phase.

Step 4: Configure advanced settings

Configure advanced parameters

To fine-tune the task and meet custom synchronization requirements, go to the Advanced Parameters tab to modify advanced parameters.

  1. In the upper-right corner of the page, click Advanced Settings to go to the advanced parameter configuration page.

  2. Modify the parameter values according to the descriptions provided.

  3. You can also use AI-powered configuration. Input a command in natural language, such as adjusting the task concurrency, and the large language model will generate recommended parameter values. You can then choose whether to accept the AI-generated parameters.

    image

Important

Modify these parameters only if you fully understand their purpose. Incorrect settings can cause task delays, excessive resource consumption that blocks other tasks, data loss, or other unexpected issues.

DDL handling configuration

Some real-time synchronization pipelines can detect schema changes (DDL operations) in the source table and notify the destination. The destination can then apply the same changes or take other actions, such as sending an alert, ignoring the change, or terminating the task.

In the upper-right corner of the page, click Configure DDL Capability to set a handling policy for each type of change. The supported policies vary by data channel.

  • Process normally: The Destination processes the DDL change from the Source.

  • Ignore: The change message is ignored, and no modification is made to the Destination.

  • Error: Terminates the real-time synchronization task and sets its status to Error.

  • Alert: An alert is sent to the user when this type of change occurs at the Source. You must configure a DDL notification rule in Configure Alert Rule.

Note

When a new column is added at the Source and synchronized to the Destination via DDL, the system does not backfill data for that column in existing rows of the destination index.

Step 5: Publish and run the task

  1. After completing all configurations, click Save at the bottom of the page to save the task configuration.

  2. Real-time database synchronization tasks cannot be debugged directly and must be published to the Operation Center to run. Therefore, you must Deploy any new or edited task for the changes to take effect.

  3. During publishing, if you select Start immediately after deployment, the task will start automatically. Otherwise, after publishing is complete, you must go to the Data Integration > Synchronization Task page and manually start the task from the Actions column.

  4. Click the Name/ID of the task in the Tasks to view detailed execution information.

Step 7: Other configurations

Alarm configuration

1. Add Alarm

image

(1) Click Create Rule to configure alarm rules.

Set Alert Reason to monitor metrics like Business delay, Failover, Task status, DDL Notification, and Task Resource Utilization for the task. You can set CRITICAL or WARNING alarm levels based on specified thresholds.

  • By setting Configure Advanced Parameters, you can control the time interval for sending alarm messages to prevent alert fatigue and message backlogs.

  • If you select Business delay, Task status, or Task Resource Utilization as the alarm reason, you can also enable recovery notifications to notify recipients when the task returns to normal.

(2) Manage alarm rules.

For created alarm rules, you can use the alarm switch to control whether the alarm rule is enabled. Send alarms to specific recipients based on the alarm level.

2. View Alarm

Expand More > Configure Alert Rule in the task list to enter the alarm event page and view the alarm information that has occurred.

Resource group configuration

You can manage the resource group used by the task and its configuration in the Configure Resource Group panel in the upper-right corner of the interface.

1. View and switch resource groups

  • Click Configure Resource Group to view the resource group currently bound to the task.

  • To change the resource group, switch to another available resource group here.

2. Adjust resources and troubleshoot "insufficient resources" errors

  • If the task log displays a message such as Please confirm whether there are enough resources..., the available computing units (CUs) of the current resource group are insufficient to start or run the task. You can increase the number of CUs occupied by the task in the Configure Resource Group panel to allocate more computing resources.

For recommended resource settings, see Data Integration Recommended CUs. Adjust the settings based on actual conditions.

Advanced parameter configuration

For custom synchronization requirements, click Configure in the Advanced Settings column to modify advanced parameters.

  1. Click Advanced Settings in the upper-right corner of the interface to enter the advanced parameter configuration page.

  2. Modify the parameter values according to the prompts. The meaning of each parameter is explained after the parameter name.

Important

Understand parameters fully before modification to prevent issues like task delays, excessive resource consumption blocking other tasks, or data loss.

Manage the task

Edit the task

  1. On the Data Integration > Synchronization Task page, find the created synchronization task. In the Operation column, click More, then click Edit to modify the task information. The steps are the same as for configuring a new task.

  2. For tasks that are not running, you can directly modify the configuration, save it, and publish the task to the production environment to apply the changes.

  3. For Running tasks, if you edit and publish the task without selecting Start immediately after deployment, the action button changes to Apply Updates. You must click this button for the changes to take effect in the production environment.

  4. When you click Apply Update, the system performs the following actions in sequence: Stop, Publish, and Restart.

    • If the change involves adding a table or switching an existing table:

      You cannot select a Checkpoint when applying the update. After you confirm, the system performs Schema Migration and Full Data Initialization for the new table. Once the full initialization is complete, it begins incremental synchronization along with the other original tables.

    • If you modify other information:

      You can select a Checkpoint when applying the update. After you confirm, the task resumes from the specified Checkpoint. If you do not specify a Checkpoint, it resumes from the last recorded Checkpoint.

    Unmodified tables are not affected. After the update and restart, they will resume from their last recorded Checkpoint.

View the task

After creating a synchronization task, you can view a list of all created tasks and their basic information on the Synchronization Task page.

  • In the Actions column, you can Start or Stop a synchronization task. Under More, you can perform other operations such as Edit and View.image

  • For a running task, you can see its basic status in the Execution Overview. Click a specific area in the overview to view detailed execution information.image

    A real-time database synchronization task from MySQL to Elasticsearch consists of three steps:

    • Schema Migration: Includes the creation method for the destination index (existing index/auto-created index). If the index is auto-created, the DDL statement is displayed.

    • Full Data Initialization: Includes information about the tables being synchronized, their progress, and the number of rows written.

    • Real-time Synchronization: Includes real-time synchronization statistics, such as real-time progress, DDL records, DML records, and alert information.

Rerun the synchronization task

In some special cases, if you add tables to or remove tables from the source, or change the schema or name of a destination table, you can click More in the Operation column of the synchronization task and then click Rerun to rerun the task after the change. During the rerun process, the synchronization task synchronizes data only from the newly added tables to the destination or only from the mapped source table to the destination table whose schema or name is changed.

  • If you want to rerun the synchronization task without modifying the configuration of the task, click More in the Actions column and then click Rerun to rerun the task to perform full synchronization and incremental synchronization again.

  • If you want to rerun the synchronization task after you add tables to or remove tables from the task, click Complete after the change. In this case, Apply Updates is displayed in the Actions column of the synchronization task. Click Apply Updates to trigger the system to rerun the synchronization task. During the rerun process, the synchronization task synchronizes data from the newly added tables to the destination. Data in the original tables is not synchronized again.

Breakpoint resume

Use cases

Manually resetting the Checkpoint when starting or restarting a task is useful in the following scenarios:

  • Task recovery and data continuation: If a task is interrupted, you may need to manually specify the interruption time as the new starting Checkpoint to ensure data resumes accurately from that point.

  • Data troubleshooting and rollback: If you find missing or abnormal data after synchronization, you can roll back the Checkpoint to a time before the issue occurred to reprocess and fix the problematic data.

  • Major task configuration changes: After making significant changes to the task configuration, such as modifying the destination index structure or field mappings, we recommend resetting the Checkpoint to a specific time to ensure data accuracy under the new configuration.

Instructions

When you click Start, choose whether to Whether to reset the site in the pop-up window:

  • Do not reset the Checkpoint: The task resumes from the last stop time (the last Checkpoint).

  • Reset the Checkpoint and select a time: The task starts from the specified time. Ensure the selected time is not earlier than the earliest available time in the source's binary log (binlog).

Important

If you encounter an error indicating that the Checkpoint is invalid or does not exist, try the following solutions:

  • Reset the Checkpoint: When starting the real-time synchronization task, reset the Checkpoint and select the earliest available time from the source database.

  • Adjust the log retention period: If the database Checkpoint has expired, consider increasing the log retention period in the database settings, for example, to 7 days.

  • Resynchronize data: If data has been lost, consider performing a new full synchronization or configure an offline synchronization task to manually synchronize the missing data.

FAQ

For frequently asked questions about real-time database synchronization, see Data Integration FAQ and Data Integration errors.