Data Integration supports the offline synchronization of entire databases from sources such as AnalyticDB for MySQL 3.0, MySQL, Oracle, PolarDB, and PostgreSQL to OSS. This topic describes how to synchronize data from an entire MySQL database to an OSS data lake offline, using MySQL as the source and OSS as the destination.
Prerequisites
You have purchased a Serverless resource group or an exclusive resource group for Data Integration.
You have created a MySQL data source and an OSS data source. For more information, see Data Source Configuration.
Note: You must enable the binary logging (binlog) feature. For more information, see MySQL data source.
You have established a network connection between the resource group and the data source. For more information, see Network connectivity solutions.
Procedure
1. Select a sync task type
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the navigation pane on the left, click Sync Task. At the top of the page, click Create Sync Task to open the sync task creation page. Configure the basic information.
Source And Destination: MySQL→OSS.
New Task Name: Enter a custom name for the sync task.
Synchronization Type: Offline Full Database.
Synchronization Steps: Select Full Synchronization and Incremental Synchronization.
2. Configure network and resources
In the Network And Resource Configuration section, select a Resource Group for the sync task. You can allocate CUs for Task Resource Usage.
For Source Data Source, select your MySQL data source. For Destination Data Source, select your OSS data source. Then, click Test Connectivity.
After you confirm that the source and destination data sources are connected, click Next.
3. Configure basic destination settings
The offline full database synchronization to OSS supports multiple write formats, such as Paimon, Iceberg, CSV, text, Parquet, and ORC.
The configuration parameters vary based on the write format. Configure the parameters as required:
Paimon, Iceberg
| Parameter | Description |
| --- | --- |
| Storage Path Selection | Select the OSS path in which the data is stored after it is ingested into the data lake. |
| Select Metadatabase Auto-build Location | Specify whether to automatically build a metadatabase in DLF. Note: You can build a metadatabase only in a DLF instance that resides in the same region. |
CSV, text
| Parameter | Description |
| --- | --- |
| Destination Root Path | Format: Note: When you use the scheduling parameter |
| Column Delimiter | The character that separates fields in your data, such as a comma (,). If the delimiter is not a visible character, enter its Unicode encoding. |
| Prefix Conflict | Specifies how to handle the case in which a destination object has the same prefix as the object to be written. Select one of the provided conflict-handling operations. |
| Output Table Header | Specifies whether to write the table header as part of the object content. |
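To make the Column Delimiter option concrete: an invisible control character such as U+0001 (SOH) is a common choice when field values may contain commas. The snippet below is a minimal illustration of how such a delimiter splits a record; in the console you would enter the character's Unicode escape rather than the character itself, and the field values here are made up for the example.

```python
# Illustrative only: how an invisible delimiter such as U+0001 (SOH)
# separates fields in one record. The record content is hypothetical.
record = "1001\u0001Alice\u0001Hangzhou"  # three fields joined by SOH

fields = record.split("\u0001")
print(fields)  # ['1001', 'Alice', 'Hangzhou']
```

An invisible delimiter like this rarely collides with real data, which is why it is often preferred over a comma for free-text columns.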
Parquet, ORC
| Parameter | Description |
| --- | --- |
| Destination Root Path | Format: Note: When you use the scheduling parameter |
| Prefix Conflict | Specifies how to handle the case in which a destination object has the same prefix as the object to be written. Select one of the provided conflict-handling operations. |
4. Select the databases and tables to synchronize
In this step, select the tables that you want to synchronize from the source data source in the Source Database and Tables section, and then click the icon to move them to the Selected Database and Tables section on the right.

5. Configure full and incremental control
Configure the synchronization mode for the synchronization task.
If you selected both Full Synchronization and Incremental Synchronization in the Synchronization Steps, the mode defaults to a one-time synchronization of full data followed by periodic synchronization of incremental data, and this mode cannot be changed.
If you selected only Full Synchronization in the Synchronization Steps, you can choose between a one-time and a periodic synchronization of full data.
If you selected only Incremental Synchronization in the Synchronization Steps, you can choose between a one-time and a periodic synchronization of incremental data.
Note: In this example, the mode of one-time synchronization of full data and periodic synchronization of incremental data is used.
Configure the parameters for periodic scheduling for the synchronization task.
If your task involves periodic synchronization, you can click Scheduling Parameters to configure the settings.
6. Configure destination table mapping
After you select the tables to synchronize, they are automatically displayed on the current page. By default, the object file properties are in the "mapping to be refreshed" state. You must define and confirm the mapping between the source tables and object files, which determines the data read and write relationship. You can refresh the mapping directly, or customize the object file rules first and then click Refresh Mapping to proceed.
You can select the tables to synchronize and click Batch Refresh Mapping. If no mapping rule is configured, the default naming convention for the destination OSS object is ${Source Table Name}/data_${Data Timestamp}.
Because recurring scheduling is required, you must define the properties of the recurring scheduling task. These properties include Scheduling Cycle, Rerun Property, and Scheduling Resource Group. The scheduling configuration for this synchronization is consistent with the node scheduling configuration in Data Studio. For more information about the parameters, see Node scheduling.
Depending on the Synchronization Steps, set the Incremental Condition and Full Condition to filter the source data by using a WHERE clause. Write only the filter condition, without the WHERE keyword. If the synchronization task is periodically scheduled, you can use system parameter variables in the condition.
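As a sketch of how such a condition behaves at run time, the snippet below substitutes a scheduling parameter such as ${bizdate} into a filter condition before the WHERE keyword is prepended. The column name gmt_modified and the substitution logic are illustrative assumptions, not the actual DataWorks implementation.

```python
# Hypothetical rendering of an incremental condition that uses the
# scheduling parameter ${bizdate}. Only the condition is written by the
# user; the WHERE keyword is added by the task itself.
def render(template: str, params: dict) -> str:
    """Replace each ${name} placeholder with its parameter value."""
    for name, value in params.items():
        template = template.replace("${%s}" % name, value)
    return template

condition_template = "gmt_modified >= '${bizdate}'"  # assumed column name
params = {"bizdate": "20230101"}                     # data timestamp

print("WHERE " + render(condition_template, params))
# WHERE gmt_modified >= '20230101'
```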
In the Custom Destination Path Mapping and Custom Destination Filename Mapping columns, click Configure to customize the storage path and naming convention for the destination OSS objects. For more information, see Appendix: Description of destination OSS file paths and names.
1. Edit field type mapping
The synchronization task has default mappings between source field types and destination field types. You can click Edit Field Type Mapping in the upper-right corner of the table to customize the field type mapping relationship between source tables and destination tables. After you complete the configuration, click Apply And Refresh Mapping.
2. Add fields to the object file and assign values
You can add new fields to the object file that are not in the original table schema. To do this, perform the following steps:
Add a field and assign a value for a single table: Click Configure in the Add Field To Object File column. On the Add Field page, click Add Field to add a field to the object file and assign a value to it.
Assign values in batches: Select multiple tables. At the bottom of the list, use the batch operation to add the same field to the destination tables and assign values in batches.
Note: You can assign constants and variables. Click the icon to switch the assignment mode.
3. Customize advanced parameters
If you need to fine-tune the task to meet custom synchronization requirements, you can click Configure in the Custom Advanced Parameters column to modify the advanced parameters.
Before you modify the configurations of advanced parameters, make sure that you understand the meanings of the parameters to prevent unexpected errors or data quality issues.
4. Set the source chunking column
In the Source Chunking Column drop-down list, select a field from the source table to chunk by, or select Do Not Chunk.
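The point of a chunking column is to split the source table into ranges that can be read in parallel. The sketch below shows the general idea for an integer key; the chunk-size logic and bounds are illustrative assumptions, not what DataWorks does internally.

```python
# Illustrative sketch: split an integer key range into roughly equal
# chunks so that each chunk can be read by a separate connection.
def chunk_ranges(min_id: int, max_id: int, chunks: int):
    span = max_id - min_id + 1
    size = -(-span // chunks)  # ceiling division
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

print(chunk_ranges(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

A column with evenly distributed values (such as an auto-increment primary key) makes the chunks balanced; a heavily skewed column, or Do Not Chunk, forces a single sequential read.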
7. Configure advanced parameters
You can change the values of specific parameters configured for the synchronization task based on your business requirements. For example, you can specify an appropriate value for the Maximum read connections parameter to prevent the current synchronization task from imposing excessive pressure on the source database and data production from being affected.
To prevent unexpected errors or data quality issues, make sure that you understand the meanings of the parameters before you change the values of the parameters.
Click Advanced Parameter Settings in the upper-right corner of the page to go to the advanced parameter settings page.
Modify the parameter values on the Advanced Parameter Settings page.
8. Configure the resource group
You can click Resource Group Configuration in the upper-right corner of the page to view and switch the resource group used by the current task.
9. Run the sync task
After you complete all configurations, click Complete Configuration at the bottom of the page.
In the sync task list, find the created synchronization task and click Start in the Actions column.
Click Task List and then click the Name/ID of the corresponding task to view the detailed execution procedure of the task.
10. Configure alerts
To prevent the failure of the synchronization task from causing latency on business data synchronization, you can configure different alert rules for the synchronization task.
Click Alert Configuration in the upper-right corner of the page to go to the alert settings page.
Select the scheduling task for the synchronization table and configure alerts for it. For more information, see Alert information.
Sync task O&M
View task running status
After you create a sync task, you can view the list of created sync tasks and their basic information on the sync task page.

In the Actions column, you can Start or Stop the sync task. From the More menu, you can perform other operations, such as Edit and View.
For a running task, you can view its basic running status in the Execution Overview section. You can also click the corresponding overview area to view execution details.

In an offline full database synchronization task from MySQL to OSS:
If your task's synchronization step is Full Synchronization, schema migration and full synchronization are displayed.
If your task's synchronization step is Incremental Synchronization, schema migration and incremental synchronization are displayed.
If your task's synchronization steps are Full Synchronization + Incremental Synchronization, schema migration, full synchronization, and incremental synchronization are displayed.
Rerun the task
Directly rerun the task: Do not modify the task configuration and directly click Rerun.
Effect: Rerun a one-time task or update the properties of a periodic task.
Modify the task and rerun the task (add or remove tables): Edit the task, add or remove tables, and click Complete. The action for the task changes to Apply Update. Click Apply Update to directly trigger the modified task to rerun.
Effect: Only the newly added tables are synchronized. The tables that have been synchronized are not synchronized again.
Modify the task and rerun the task (change destination mapping): Modify the destination OSS object paths or names, or map tables to other destination objects, and click Complete. The action for the task changes to Apply Update. Click Apply Update to directly trigger the modified task to rerun.
Effect: Synchronize the modified tables. The tables that are not modified are not synchronized again.
Use cases
If you have downstream data dependencies and need to perform data development operations, you can refer to Node scheduling to set the upstream and downstream nodes. The corresponding auto-triggered task node information can be viewed in the Recurring Configuration column.

Appendix: Description of destination OSS file paths and names
DataWorks Data Integration provides custom rules for mapping the destination OSS path and the destination OSS filename in step 6 (Configure destination table mapping).
One built-in custom rule is available for mapping the destination OSS path: default_path_convert_rule. This rule uses the source database name as the destination OSS path. For example, if the source database name is di_ide_yufa, the storage path in OSS is di_ide_yufa.
Two built-in custom rules are available for mapping the destination OSS filename:
default_file_convert_rule_with_schedule_params: This rule is defined as ${srcTableName}/data_${bizdate}. The source table name ${srcTableName} is used as part of the OSS path, and the object file is named data_ followed by the value of the scheduling parameter ${bizdate}. Note: For example, if the source table name is base_c_app_config and the scheduling date value is 20230101, the generated destination object name in OSS is base_c_app_config/data_20230101.
default_file_convert_rule: This rule is defined as ${srcTableName}/data. The source table name ${srcTableName} is used as part of the OSS path, and the default object file name is data. Note: For example, if the source table name is base_c_app_config, the converted destination object name is base_c_app_config/data.
The final OSS file write path and filename are formed by concatenating the following three parts.
The destination root path.

The object file path that is obtained from the custom destination OSS path mapping.
The object file name that is obtained from the custom destination OSS filename mapping rule.
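The three-part concatenation above can be sketched as follows. The bucket name, root path, and example values below are assumptions chosen to match the examples in this appendix (di_ide_yufa as the database path and the ${srcTableName}/data_${bizdate} filename convention); they are not values DataWorks produces for you.

```python
# Sketch: assemble the final OSS object key from the three parts
# described above. All concrete values are illustrative.
root_path = "oss://my-bucket/warehouse"        # assumed destination root path
db_path = "di_ide_yufa"                        # from default_path_convert_rule
filename = "base_c_app_config/data_20230101"   # from the filename mapping rule

final_key = "/".join([root_path.rstrip("/"), db_path, filename])
print(final_key)
# oss://my-bucket/warehouse/di_ide_yufa/base_c_app_config/data_20230101
```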
