After you configure data sources, network environments, and resource groups, you can create and run sync nodes. This topic describes how to configure a sync solution to synchronize data to MaxCompute in real time and view the status of the nodes generated by the sync solution.
Configure a sync solution
- Go to the Create Data Synchronization Solution wizard. Select the source and the destination for data synchronization from the drop-down lists. In this scenario, select MaxCompute as the destination. After that, select One-click real-time synchronization to MaxCompute from the available sync solutions. For more information, see Select a data synchronization solution.
- In the Set synchronization sources and rules step, configure basic information such as the solution name for the data synchronization solution. In the Basic configuration section, configure the parameters.
| Parameter | Description |
| --- | --- |
| Scheme name | The name of the data synchronization solution. The name can be a maximum of 50 characters in length. |
| Description | The description of the data synchronization solution. The description can be a maximum of 50 characters in length. |
| Destination task storage location | The Automatically establish workflow check box is selected by default. This indicates that DataWorks automatically creates a workflow named in the format of clone_database_Source data source name+to+Destination data source name in the Data Integration directory. All synchronization nodes generated by the data synchronization solution are placed in the directory of this workflow. If you clear the Automatically establish workflow check box, select a directory from the Select Location drop-down list. All synchronization nodes generated by the data synchronization solution are then placed in the specified directory. |
- Select a source data source and configure synchronization rules.
- In the Data source section, specify the Type and Data source parameters. Note: You can set Type only to MySQL, Oracle, or PolarDB.
- In the Select the source table for synchronization section, select the tables whose data you want to synchronize in the Source Table section, and click the icon to move the tables to the Selected Source table section. The Source Table section displays all the tables in the source data source. You can select all or some tables. Notice: If a selected table does not have a primary key, the table cannot be synchronized in real time.
- In the Set synchronization rules section, click Add rule and select an option to configure the naming rules for destination tables. Supported options include Conversion Rule for Table Name and Rule for Destination Table name.
- Conversion Rule for Table Name: the rule used to convert the names of source tables to those of destination tables.
- Rule for Destination Table name: the rule used to add a prefix and a suffix to the converted names of destination tables.
- Click Next Step.
- Configure the destination and the formats for the destination tables.
- In the Set Destination Table step, set the Destination and Write Mode parameters.
- Click the icon next to Time automatic partition setting. In the Edit dialog box, modify the partition settings for the destination tables. You can configure daily partitions.
- Click Refresh source table and MaxCompute Table mapping to create the mappings between the source tables and destination MaxCompute tables.
- View the mapping progress, source tables, and mapped destination tables.
| No. | Description |
| --- | --- |
| 1 | The progress of mapping the source tables to the destination tables. Note: Mapping may take a long time if you synchronize data from a large number of tables. |
| 2 | The source of the destination table. Valid values: Create Table and Use Existing Table. |
| 3 | The name of the destination table. The name that appears depends on the value that you selected from the drop-down list in the Table creation method column. If you set Table creation method to Create Table, the name of the automatically created destination table appears, and you can click the table name to view and modify the table creation statements. If you set Table creation method to Use Existing Table, you must select a table name from the drop-down list in the MaxComputeBase Table name column. |
| 4 | If a source table does not have a primary key, an error message indicates that the table has no primary key and cannot be synchronized. Synchronization can still proceed as long as at least one of the selected source tables has a primary key. Source tables without primary keys are ignored during the synchronization. |
- Click Next Step.
- Configure the resources required by the sync solution. In the Set Resources for Solution Running step, set the parameters that are described in the following table.
| Parameter | Description |
| --- | --- |
| Synchronization engine | The engine used for data synchronization. Default value: Default embedded engine. |
| Select an exclusive resource group for real-time tasks | The exclusive resource group used to run the real-time sync node generated by the sync solution. Select an exclusive resource group from the drop-down list. Note: Only exclusive resource groups for Data Integration can be used to run real-time sync nodes. For more information, see Create and use an exclusive resource group for Data Integration. |
| Real-time synchronization task name | The name of the real-time sync node. |
| Select scheduling Resource Group | The exclusive resource groups used to run the real-time sync node and batch sync node generated by the sync solution. Only exclusive resource groups for Data Integration can be used to run sync nodes for sync solutions. For more information, see Create and use an exclusive resource group for Data Integration. |
| Resource Groups for Full Batch Sync Nodes | |
| Maximum number of connections supported by source read | The maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source. Specify an appropriate number based on the resources of the source. |
| Offline task name rules | The name of the batch sync node that is used to synchronize the full data of the source. After a sync solution is configured, DataWorks first runs a batch sync node to synchronize full data, and then runs a real-time sync node to synchronize incremental data. |
- Click Complete Configuration. The sync solution is configured.
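The table-naming rules and the primary-key check described in the steps above can be pictured with a short Python sketch. It is an illustration only, under stated assumptions: the `SourceTable` type, the function names, and the modelling of Conversion Rule for Table Name as a plain substring replacement are hypothetical and are not part of DataWorks, which applies these rules internally through the wizard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTable:
    """Hypothetical description of a source table selected in the wizard."""
    name: str
    has_primary_key: bool


def destination_name(source_name, find="", replace_with="", prefix="", suffix=""):
    """Apply a conversion rule first, then add the prefix and suffix."""
    # Assumption: the conversion rule is modelled as a simple substring replacement.
    converted = source_name.replace(find, replace_with) if find else source_name
    return f"{prefix}{converted}{suffix}"


def plan_mappings(tables, **rules):
    """Split the selected tables into mapped tables and skipped tables."""
    mapped, skipped = {}, []
    for table in tables:
        if table.has_primary_key:
            mapped[table.name] = destination_name(table.name, **rules)
        else:
            # Tables without a primary key cannot be synchronized in real time.
            skipped.append(table.name)
    return mapped, skipped


if __name__ == "__main__":
    selected = [SourceTable("src_orders", True), SourceTable("src_audit", False)]
    mapped, skipped = plan_mappings(selected, find="src_", prefix="ods_", suffix="_rt")
    print(mapped)   # {'src_orders': 'ods_orders_rt'}
    print(skipped)  # ['src_audit']
```

Applying the conversion before the prefix and suffix mirrors the description of Rule for Destination Table name, which operates on the already converted name.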
Run the sync solution
On the Tasks page, find the configured sync solution and click Submit and Run in the Operation column to run the sync solution.
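When a sync solution runs, DataWorks first runs a batch sync node to synchronize full data and then runs a real-time sync node to synchronize incremental data. The following Python sketch is a toy model of that two-phase order. Everything in it is an assumption made for illustration: the function names, the `id` key column, and the event format are hypothetical, and the sketch does not reflect how DataWorks or MaxCompute actually store or merge changes.

```python
def run_full_sync(source_rows):
    """Batch phase: copy every existing source row, keyed by its primary key."""
    return {row["id"]: dict(row) for row in source_rows}


def apply_incremental(destination, change_events):
    """Real-time phase: apply change events in the order they arrive."""
    for operation, row in change_events:
        if operation in ("insert", "update"):
            destination[row["id"]] = dict(row)
        elif operation == "delete":
            destination.pop(row["id"], None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return destination


if __name__ == "__main__":
    # Phase 1: full data that exists in the source when the solution starts.
    destination = run_full_sync([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
    # Phase 2: changes that happen after the full copy.
    events = [
        ("update", {"id": 1, "v": "a2"}),
        ("delete", {"id": 2}),
        ("insert", {"id": 3, "v": "c"}),
    ]
    print(apply_incremental(destination, events))
```

In this toy model each change is matched to a row through its key, which is why the sketch assumes every row carries a primary key value.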
View the running status and result of the data synchronization nodes
- On the Task list page, find the solution that is run and choose More > Execution details in the Operation column. Then, you can view the running details of all nodes.
- Find a node whose running details you want to view and click Execution details in the Status column. In the dialog box that appears, click the provided link to go to the DataStudio page.
Manage the data synchronization solution
- View or edit the data synchronization solution.
On the Task list page, find the solution that you want to view or edit, and choose More > Task configuration in the Operation column. Note: You can edit a data synchronization solution only when it is in the Not running state. For a solution in any other state, Task configuration only lets you view information about the solution.
- Delete the data synchronization solution.
Find the solution that you want to delete and choose More > Delete in the Operation column. In the Delete message, click OK. Note: After you click OK, only the configuration record of the data synchronization solution is deleted. The generated synchronization nodes and tables are not affected.