
DataWorks:Real-time synchronization from a full MySQL database to an OSS data lake

Last Updated: Mar 27, 2026

Data Integration lets you run a single task that migrates all historical data from MySQL to Object Storage Service (OSS) and then keeps the destination in sync as new changes arrive—no manual handoff between full and incremental jobs required. The destination data can be stored in Apache Hudi, Apache Paimon, or Apache Iceberg format, and the task automatically creates tables in Data Lake Formation (DLF) when DLF is activated in the same region.

Prerequisites

Before you begin, ensure that you have:

Limitations

Data constraints

  • Primary key column values in the source cannot be NULL or empty strings. Rows that violate this constraint cause the synchronization task to fail.
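As an illustrative pre-flight check (a minimal sketch; the `find_invalid_pk_rows` helper and the sample column names are hypothetical, not part of DataWorks), you can scan rows for NULL or empty-string primary key values before starting the task:

```python
def find_invalid_pk_rows(rows, pk_column):
    """Return rows whose primary key value is NULL (None) or an empty string.

    Such rows would cause the synchronization task to fail, so they should
    be fixed in the source table before the task starts.
    """
    return [row for row in rows if row.get(pk_column) in (None, "")]

# Example: two of these rows would break synchronization.
rows = [
    {"id": 1, "name": "ok"},
    {"id": None, "name": "null pk"},
    {"id": "", "name": "empty pk"},
]
bad = find_invalid_pk_rows(rows, "id")  # the rows with None and "" keys
```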

DDL constraints

  • Supported DDL operations: table creation and column addition. Destination tables and columns are adjusted automatically when the source schema changes.

  • Unsupported DDL operations: table deletion and table renaming. If either operation is performed on a source table, the synchronization task fails immediately.

DLF constraints

  • The task can create metadatabases and metatables in DLF automatically, but only when DLF is activated in the same region as your workspace. Cross-region metadatabase creation is not supported.

Synchronize MySQL to OSS

The procedure consists of 10 steps: select a task type, configure network and resources, configure destination settings, select source tables, configure destination table mapping, configure alerts, configure advanced parameters, configure DDL capabilities, configure resource groups, and run the task.

Step 1: Select a synchronization task type

  1. Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Integration > Data Integration. Select the target workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Sync Task, then click Create Sync Task at the top of the page.

  3. Configure the following basic settings:

    • Source and destination: MySQL (source) and OSS (destination)

    • Task name: A name of your choice

    • Synchronization type: Real-time Database Synchronization

    • Synchronization steps: Select both Full Synchronization and Incremental Synchronization

Step 2: Configure network and resources

  1. In the Network And Resource Configuration section, select the Resource Group for the task. Optionally, specify the number of compute units (CUs) under Task Resource Usage.

  2. For Source Data Source, select the MySQL data source. For Destination Data Source, select the OSS data source. Click Test Connectivity.


  3. After both data sources pass the connectivity test, click Next.

Step 3: Configure destination settings

Configure the following fields:

  • Write Format: Select the table format for the data lake — Hudi, Paimon, or Iceberg.

  • Storage Path Selection: Select the OSS path where synchronized data will be stored.

  • Location For Creating Metadatabase: Choose whether to let the system create a metadatabase in DLF.

    Metadatabases can be created automatically only in DLF that is activated in the same region as your workspace.

Step 4: Select tables to synchronize

Select source tables using one of two methods:

Manual selection

In the Source Table list, select the tables to synchronize and click the icon to move them to Selected Tables.


Use Database Filter and Table Filter to narrow the list. To remove a table from the selection, select it in Selected Tables and click the icon to move it back.

Regular expression selection

Enter patterns in Database Filter and Table Filter to match tables dynamically. This method also supports adding or removing tables at runtime as the source schema evolves.

For example, to match databases whose names start with a and tables whose names start with order, enter a.* in Database Filter and order.* in Table Filter. Click Confirm Selection when done.
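The filter behavior described above can be sketched with standard regular-expression matching (a minimal illustration; the actual matching logic inside Data Integration may differ):

```python
import re

def matches_filters(database, table, db_pattern, table_pattern):
    """Return True if both the database and table names fully match the patterns."""
    return bool(re.fullmatch(db_pattern, database)) and bool(
        re.fullmatch(table_pattern, table)
    )

# With Database Filter "a.*" and Table Filter "order.*":
selected = matches_filters("analytics", "orders_2024", "a.*", "order.*")  # matched
skipped = matches_filters("billing", "orders_2024", "a.*", "order.*")  # db name does not start with "a"
```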

Step 5: Configure destination table mapping

After you select source tables, they appear in the Mapping Rules for Destination Tables section with their mapping status set to pending. Define the mapping between each source table and its destination table, then click Refresh in the Actions column to apply the mapping.

  • To apply the default table name rule (${tableName}) across all selected tables, click Batch Refresh Mapping. If a table with the same name does not exist at the destination, it is created automatically.

  • To customize the destination database name, click Configure in the Custom Destination Database Name Mapping column and build the name from built-in variables and literal strings—for example, appending a suffix to the source database name.

  • To customize destination table names, click Edit in the Customize Mapping Rules for Destination Table Names column and define a rule using built-in variables and strings.
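Conceptually, a naming rule combines built-in variables such as ${tableName} with literal strings. The sketch below models how such a rule might resolve (the `resolve_rule` helper and the ${dbName} variable are illustrative assumptions, not a DataWorks API):

```python
from string import Template

def resolve_rule(rule, table_name, db_name):
    """Expand built-in variables in a destination-name rule.

    Only ${tableName} and ${dbName} are modeled here; DataWorks provides
    its own set of built-in variables.
    """
    return Template(rule).substitute(tableName=table_name, dbName=db_name)

# The default rule keeps the source table name; a custom rule can combine
# variables with literal strings, for example to append a suffix.
default_name = resolve_rule("${tableName}", "orders", "shop")
custom_name = resolve_rule("${dbName}_${tableName}_ods", "orders", "shop")
```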

Modify data type mappings

Default data type mappings exist between source and destination fields. To override them, click Edit Mapping of Field Data Types in the upper-right corner of the Mapping Rules for Destination Tables section. After you make changes, click Apply and Refresh Mapping.

Add fields to destination tables

If a destination table is in the To Be Created state, you can add fields before the task runs:

  1. Add fields:

    • Single table: Find the table in the Destination Table Name column and click the icon. Add fields in the dialog box.

    • Multiple tables at once: Select the tables, click Batch Modify at the bottom of the page, then click Destination Table Schema - Batch Modify and Add Field.

  2. Assign values to the added fields:

    • Single table: Find the table and click Configure in the Value assignment column. Assign values in the Additional Field dialog box.

    • Multiple tables at once: Select the tables, click Batch Modify, then click Value assignment.

    Click the icon to switch the value assignment method between constants and variables.

Configure DML processing rules

Data Integration applies default DML processing rules. To customize them:

  • Single table: Click Configure in the Configure DML Rule column for the target table.

  • Multiple tables at once: Select the tables, click Batch Modify, then click Configure DML Rule.
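A DML processing rule decides, per operation type, how a change record is applied at the destination. The sketch below models this dispatch (the rule names and the `apply_dml` helper are illustrative, not the exact DataWorks options):

```python
def apply_dml(op, rules):
    """Map a DML operation to an action according to per-operation rules.

    rules maps "INSERT"/"UPDATE"/"DELETE" to "process" or "ignore";
    any operation not listed falls back to normal processing.
    """
    return rules.get(op, "process")

# Example: process inserts and updates normally, but drop deletes so the
# destination keeps every row ever written.
rules = {"DELETE": "ignore"}
actions = [apply_dml(op, rules) for op in ("INSERT", "UPDATE", "DELETE")]
```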

Step 6: Configure alerts

To get notified when the task encounters issues that could delay data synchronization:

  1. Click Configure Alert Rule in the upper-right corner of the page.

  2. In the Alert Rule Configurations for Real-time Synchronization Subnode panel, click Add Alert Rule and fill in the parameters.

  3. Enable or disable individual alert rules and assign recipients by severity level.

Alert rules configured here apply to the real-time synchronization subtask generated by this task. After the task is saved, you can also manage alert rules from the Real-time Synchronization Task page. For details, see Run and manage real-time synchronization tasks.

Step 7: Configure advanced parameters

To tune task behavior—for example, setting Maximum read connections to limit the load on the source database:

  1. Click Configure Advanced Parameters in the upper-right corner of the page.

  2. In the Configure Advanced Parameters panel, update the values.

To prevent unexpected errors or data quality issues, make sure that you understand what each parameter does before changing its value.

Step 8: Configure DDL capabilities

If DDL operations such as CREATE TABLE or ADD COLUMN may occur on the source during synchronization, define how the task handles them:

  1. Click Configure DDL Capability in the upper-right corner of the page.

  2. Set the processing rules for DDL messages from the source.

For the full list of configurable rules, see Configure rules to process DDL messages.
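The constraints described in this topic imply a simple dispatch for DDL messages, sketched below (a minimal illustration; operation names and the `handle_ddl` helper are assumptions, and the actual behavior for other DDL types is governed by the rules you configure):

```python
def handle_ddl(op):
    """Sketch of how DDL messages from the source might be dispatched.

    Mirrors the constraints in this topic: table creation and column
    addition are applied automatically, while table deletion and renaming
    stop the task.
    """
    if op in ("CREATE_TABLE", "ADD_COLUMN"):
        return "apply"
    if op in ("DROP_TABLE", "RENAME_TABLE"):
        return "fail_task"
    return "configurable"  # other DDL follows the configured processing rules

results = [handle_ddl(op) for op in ("ADD_COLUMN", "DROP_TABLE", "ALTER_INDEX")]
```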

Step 9: Configure resource groups

To review or change the resource groups used by this task, click Configure Resource Group in the upper-right corner of the page.

Step 10: Run the task

  1. Click Complete at the bottom of the page to save the task.

  2. On the Data Integration page, find the task in the Synchronization Task section and click Start in the Operation column.

  3. Click the Name/ID of the task to view the detailed execution progress.

Monitor and manage the task

View task status

After the task starts, its status appears in the Synchronization Task list. Click the Execution Overview area for a real-time summary.


The task runs through three sequential phases:

  • Structure migration: Creates destination tables, either from existing tables or by generating DDL statements automatically.

  • Full initialization: Performs a one-time export of all selected tables, reporting progress and record count.

  • Real-time data synchronization: Continuously replicates incremental changes, tracking DML records, DDL records, and any alerts.


Use the Operation column to Start or Stop the task, or open the More menu to Edit, View, or perform other management operations.

Rerun the task

Use Rerun in two scenarios:

Rerun without configuration changes

Click More > Rerun to repeat full synchronization and incremental synchronization from the beginning.

Rerun after adding or removing tables

If you add tables to or remove tables from the source, or change the schema or name of a destination table, rerun the task after the change:

  1. Update the task configuration with the new table selection and click Complete.

  2. In the Operation column, click Apply Updates.

During this rerun, the task synchronizes only the newly added tables, or only the source tables mapped to destination tables whose schema or name changed. Tables that were already synchronized are not re-synced.