This topic describes how to synchronize an entire MySQL database to Hive offline, with MySQL as the source and Hive as the destination.
Prerequisites
Purchase a serverless resource group or an exclusive resource group for Data Integration.
Create a MySQL data source and a Hive data source. For more information, see Data source configuration.
Establish a network connection between the resource group and the data sources. For more information, see Overview of network connectivity solutions.
Procedure
1. Select a synchronization task type
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the navigation pane on the left, click Synchronization Task. At the top of the page, click Create Synchronization Task to go to the task creation page. Configure the following basic information.
Source and Destination: MySQL → Hive
New Task Name: Enter a name for the synchronization task.
Synchronization Type: Entire Database Offline
Synchronization Steps: Select Full Synchronization and Incremental Synchronization.
2. Configure the network and resources
In the Network and Resource Configuration section, select a Resource Group for the synchronization task. You can also specify the number of CUs for Task Resource Usage.
Set Source Data Source to the MySQL data source and Destination Data Source to the Hive data source. Then, click Test Connectivity.
After the connectivity tests for the source and destination data sources are successful, click Next.
3. Select the databases and tables to synchronize
In the Source Table area, select the tables to sync from the source data source. Click the icon to move the tables to the Selected Tables list.

4. Set destination database properties
This operation affects the schema of new tables created by Data Integration. The schemas of existing tables are not affected.
Storage Mode For New Tables: You can select Internal Table or External Table to specify whether the new destination table is an internal or external table.
Format For New Tables: You can select parquet, orc, or txt to specify the storage format of the new destination table.
Write Mode: Determines whether to clear the destination table or retain historical data during the write operation.
Partition Initialization Settings: Determines the initial partition value for a new table. By default, only a hash partition is created. You can click the configuration button to modify this setting.
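To make the effect of these properties concrete, the following Python sketch builds the kind of Hive DDL that the Storage Mode, Format, and partition settings roughly correspond to. It is an illustration only: the statement that Data Integration actually issues is not documented here, and the function name, the table and column names, and the `ds` partition column are all hypothetical.

```python
def sketch_hive_ddl(table, columns, storage_mode="Internal Table",
                    file_format="parquet", partition_column=None):
    """Return an illustrative CREATE TABLE statement for a new Hive table."""
    # Assumed mapping from the three formats named above to Hive keywords.
    stored_as = {"parquet": "PARQUET", "orc": "ORC", "txt": "TEXTFILE"}[file_format]
    # An external table adds the EXTERNAL keyword; an internal table omits it.
    external = "EXTERNAL " if storage_mode == "External Table" else ""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    parts = [f"CREATE {external}TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)"]
    if partition_column:
        # Hypothetical single string partition column.
        parts.append(f"PARTITIONED BY (`{partition_column}` STRING)")
    parts.append(f"STORED AS {stored_as}")
    return "\n".join(parts)


print(sketch_hive_ddl("shop_orders", [("id", "BIGINT"), ("amount", "DOUBLE")],
                      storage_mode="External Table", file_format="orc",
                      partition_column="ds"))
```

Because these settings apply only to tables that Data Integration creates, a sketch like this is relevant only for destination tables that do not exist yet.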
5. Configure full and incremental synchronization
Configure the full and incremental sync type for the task.
If you select both Full initialization and Incremental synchronization for Synchronization Mode, the task runs a one-time full sync followed by recurring incremental syncs. This setting cannot be changed.
If you select only Full initialization for Synchronization Mode, you can configure the task as a one-time or recurring full sync.
If you select only Incremental synchronization for Synchronization Mode, you can configure the task as a one-time or recurring incremental sync.
Note: The following steps use a task with a one-time full sync and recurring incremental syncs as an example.
Configure recurring schedule parameters.
If you want the task to run on a recurring schedule, click Configure Scheduling Parameters for Periodical Scheduling.
6. Configure destination table mappings
After you select the tables to sync in the previous step, they are automatically displayed on this page. The destination tables have a status of 'mapping to be refreshed'. You must define the mapping between the source and destination tables, which specifies how data is read from the source tables and written to the destination tables. Then, click Refresh to proceed. You can refresh the mapping immediately or customize the destination table rules first.
Select the tables to sync and click Batch Refresh Mapping. If a mapping rule is not configured, the system applies the default naming rule to destination tables: ${SourceDatabaseName}_${TableName}. If a table with the specified name does not exist in the destination, it is automatically created.
Because this task runs on a recurring schedule, you must configure its scheduling properties. These properties include Scheduling Cycle, Effective Date, and Skip Execution. The scheduling configuration for this sync task is the same as the node scheduling configuration in Data Development. For more information, see Node scheduling configuration.
Based on the selected Sync Step, set the Incremental Condition and Full Condition. These conditions apply a WHERE clause to filter the source data. Enter only the content of the clause, not the WHERE keyword. If you enable a recurring schedule, you can use system parameters.
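As a rough illustration of what goes into the condition fields, the Python sketch below builds clause content that selects rows modified on the day before a run, and rejects input that includes the WHERE keyword. The column name `gmt_modified` and both helper functions are hypothetical. In a recurring task, a scheduling parameter would take the place of the literal dates.

```python
from datetime import date, timedelta


def incremental_condition(column, run_date):
    """Build clause content for rows modified on the day before run_date."""
    start = run_date - timedelta(days=1)
    return f"{column} >= '{start.isoformat()}' AND {column} < '{run_date.isoformat()}'"


def clause_content(text):
    """Return the trimmed condition, rejecting input that starts with WHERE."""
    words = text.split()
    if words and words[0].upper() == "WHERE":
        raise ValueError("enter only the clause content, without the WHERE keyword")
    return text.strip()


condition = incremental_condition("gmt_modified", date(2024, 5, 2))
print(clause_content(condition))
# prints: gmt_modified >= '2024-05-01' AND gmt_modified < '2024-05-02'
```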
In the Custom Destination Database Name Mapping column, click Configure to customize the destination database naming rule.
You can use built-in variables and manually entered strings to create the destination database name. You can also edit the built-in variables. For example, you can create a new database naming rule that adds a suffix to the source database name to form the destination database name.
In the Customize Mapping Rules column, click Edit to customize the destination table naming rule.
You can use built-in variables and manually entered strings to create the destination table name. You can also edit the built-in variables. For example, you can create a new table naming rule that adds a suffix to the source table name to form the destination table name.
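The naming rules above behave like simple template substitution. The following Python sketch expands the default rule and a custom rule that adds a suffix to the source table name. The variable names come from the default rule shown earlier; the function name, the sample database and table names, and the `_ods` suffix are hypothetical, and the console may support variables that are not modeled here.

```python
from string import Template

# Default rule quoted in this topic.
DEFAULT_RULE = "${SourceDatabaseName}_${TableName}"


def destination_table_name(source_db, table, rule=DEFAULT_RULE):
    """Expand a naming rule into a destination table name."""
    return Template(rule).substitute(SourceDatabaseName=source_db, TableName=table)


print(destination_table_name("shop", "orders"))                     # shop_orders
print(destination_table_name("shop", "orders", "${TableName}_ods")) # orders_ods
```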
1. Edit mapping of field data types
A sync task maps source field types to destination field types by default. To customize this mapping, click Edit Mapping of Field Data Types in the upper-right corner of the table. After you configure the mapping, click Apply and Refresh Mapping.
2. Edit the destination table schema and assign field values
If a destination table has a status of To Be Created, you can add fields to its schema. Follow these steps:
Add fields to the destination table.
To add a field to a single table, click the button in the Target Table Name column.
To add fields in batches, select all tables to sync. At the bottom of the table, choose .
Assign values to the fields. You can use the following operations to assign values to the fields that you just added.
To assign values to a single table: In the Destination Table Field Assignment column, click Configure.
To assign values in batches, at the bottom of the list, choose to assign values to identical fields across multiple destination tables.
Note: You can assign constants or variables. Click the icon to switch between assignment modes.
3. Custom advanced parameters
For fine-grained control over the task, click Configure in the Custom Advanced Parameters column.
Modify these parameters only if you fully understand what they do. Incorrect settings can cause unexpected errors or data quality issues.
4. Set the source chunking column
In the source chunking column, you can select a field from the source table in the drop-down list or select Do Not Chunk.
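This topic does not describe how the chunking column is used internally. A common approach, assumed here, is to split the column's value range into contiguous sub-ranges that can be read separately. The Python sketch below shows that idea for an integer column; the function name and the sample bounds are hypothetical.

```python
def chunk_ranges(min_value, max_value, chunks):
    """Split the inclusive range [min_value, max_value] into contiguous sub-ranges."""
    if chunks < 1 or max_value < min_value:
        raise ValueError("need chunks >= 1 and max_value >= min_value")
    total = max_value - min_value + 1
    chunks = min(chunks, total)          # never create empty chunks
    base, extra = divmod(total, chunks)  # the first `extra` chunks get one more value
    ranges, start = [], min_value
    for i in range(chunks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(chunk_ranges(1, 10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

Under this assumption, a column with evenly distributed values, such as an auto-increment primary key, gives chunks of similar size, whereas a skewed column gives uneven chunks.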
7. Configure advanced parameters
The sync task provides several parameters that you can modify as needed. For example, you can limit the maximum number of connections to prevent the sync task from exerting too much pressure on your production database.
Modify these parameters only if you fully understand what they do. Incorrect settings can cause unexpected errors or data quality issues.
In the upper-right corner of the page, click Configure Advanced Parameters to go to the advanced parameter configuration page.
On the Configure Advanced Parameters page, modify the parameter values.
8. Configure the resource group
In the upper-right corner of the page, click Resource Group Configuration to view or switch the resource group for the current task.
9. Run the synchronization task
After you finish the configuration, click Complete at the bottom of the page.
On the page, find the created sync task and click Deploy in the Operation column.
In the Tasks, click the Name/ID of the task to view the execution details.
10. Configure alerts
After the task runs, a scheduled job is generated in the Operation Center. To prevent task errors from causing data sync latency, you can set an alert rule for the sync task.
In the Tasks, find the running sync task. In the Actions column, choose to open the task editing page.
Click Next. Then, click Configure Alert Rule in the upper-right corner of the page to open the alert settings page.
In the Scheduling Information column, click the scheduled job to open the task details page in the Operation Center and retrieve the Task ID.
In the navigation pane on the left of the Operation Center, choose to go to the Rule Management page.
Click Create Custom Rule and set Rule Object, Trigger Condition, and Alert Details. For more information, see Rule management.
In the Rule Object field, search for the target task using the obtained Task ID and set an alert.
Synchronization task O&M
View task status
After you create a synchronization task, you can view the list of created tasks and their basic information on the synchronization task page.

In the Actions column, you can Start or Stop a sync task. Under More, you can perform other operations, such as Edit and View.
In the Execution Overview section, you can view the basic status of a running task and click the corresponding area to view its execution details.

For an offline synchronization task from MySQL to Hive:
If the synchronization step for your task is Full Synchronization, schema migration and full synchronization are displayed.
If the synchronization step for your task is Incremental Synchronization, schema migration and incremental synchronization are displayed.
If the synchronization step for your task is Full Synchronization + Incremental Synchronization, this section displays schema migration, full synchronization, and incremental synchronization.
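The three cases above reduce to one rule: schema migration is always shown, and each selected synchronization step adds its own stage. A minimal Python sketch of that rule, using a hypothetical function name:

```python
def displayed_stages(steps):
    """Return the stages shown in Execution Overview for the selected steps."""
    stages = ["schema migration"]  # shown in all three cases listed above
    if "Full Synchronization" in steps:
        stages.append("full synchronization")
    if "Incremental Synchronization" in steps:
        stages.append("incremental synchronization")
    return stages


print(displayed_stages({"Full Synchronization", "Incremental Synchronization"}))
# ['schema migration', 'full synchronization', 'incremental synchronization']
```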
Rerun a task
Click Rerun to rerun the task without changing the task configuration.
Effect: This operation reruns a one-time task or updates the properties of a recurring task.
To rerun a task after modifying it by adding or removing tables, edit the task and click Complete. The task status then changes to Apply Update. Click Apply Update to immediately trigger a rerun of the modified task.
Effect: Only the new tables are synced. Tables that were previously synced are not synced again.
After you edit a task (for example, by changing a destination table name or switching to a different destination table) and click Complete, the available operation for the task changes to Apply Update. Click Apply Update to immediately trigger a rerun of the modified task.
Effect: The modified tables are synced. Unmodified tables are not synced again.
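The effects described for Apply Update can be summarized as a comparison between the table mapping before and after the edit. The Python sketch below is a simplified model of that behavior, not the service's implementation. The function name and the sample table names are hypothetical.

```python
def tables_synced_on_apply_update(previous, current):
    """Return the source tables that are synced after Apply Update.

    previous and current map each source table to its destination table.
    A table is synced if it is new or if its destination changed.
    Removed tables and unmodified tables are not synced.
    """
    return sorted(
        table for table, destination in current.items()
        if previous.get(table) != destination
    )


before = {"orders": "shop_orders", "users": "shop_users"}
after = {"orders": "shop_orders", "users": "shop_users_v2", "items": "shop_items"}
print(tables_synced_on_apply_update(before, after))  # ['items', 'users']
```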
Use cases
If you have downstream data dependencies and need to perform data development operations, you can set upstream and downstream dependencies for the node as described in Node scheduling configuration. You can view the corresponding recurring task node information in the Scheduling Configuration column.
