After you configure data sources, network environments, and resource groups, you can create and run a data synchronization solution. This topic describes how to create a data synchronization solution and view the running status of the nodes generated by the solution.
Create a data synchronization solution
- Go to the Data Integration page, and then go to the Task list page. For more information, see Go to the Sync Solutions page.
- On the Task list page, click New task in the upper-right corner.
- In the New synchronization solution dialog box, click One-click real-time synchronization to MaxCompute.
- In the Set synchronization sources and rules step, configure basic information such as the solution name for the data synchronization solution. In the Basic configuration section, configure the parameters.
- Scheme name: The name of the data synchronization solution. The name can be a maximum of 50 characters in length.
- Description: The description of the data synchronization solution. The description can be a maximum of 50 characters in length.
- Destination task storage location: The Automatically establish workflow check box is selected by default. This indicates that DataWorks automatically creates a workflow named in the format of clone_database_Source data source name+to+Destination data source name in the Data Integration directory. All synchronization nodes generated by the data synchronization solution are placed in the directory of this workflow.
If you clear the Automatically establish workflow check box, select a directory from the Select Location drop-down list. All synchronization nodes generated by the data synchronization solution are placed in the specified directory.
- Select a source data source and configure synchronization rules.
- In the Data source section, specify the Type and Data source parameters.
Note: You can set Type only to MySQL, Oracle, or PolarDB.
- In the Select the source table for synchronization section, select the tables whose data you want to synchronize in the Source Table section, and click the icon to move the tables to the Selected Source table section. The Source Table section displays all the tables in the source data source. You can select all or some tables.
Notice: If a selected table does not have a primary key, the table cannot be synchronized in real time.
- In the Set synchronization rules section, click Add rule and select an option to configure the naming rules for destination tables. Supported options include Conversion Rule for Table Name and Rule for Destination Table name.
- Conversion Rule for Table Name: the rule used to convert the names of source tables to those of destination tables.
- Rule for Destination Table name: the rule used to add a prefix and a suffix to the converted names of destination tables.
- Click Next Step.
- Select the destination data source and configure formats for destination tables.
- In the Set Destination Table step, specify Target MaxCompute (ODPS) data source and Write mode.
- Click the icon next to MaxCompute (ODPS) time automatic partition settings. In the Edit dialog box, modify the partition settings for the destination tables. You can configure daily and hourly partitions.
- Click Refresh source table and MaxCompute (ODPS) Table mapping to configure the mappings between the source tables and destination MaxCompute tables.
- View the mapping progress, source tables, and mapped destination tables.
The following items are listed by serial number:
- 1: The mapping progress between the source and destination tables.
Note: The mapping may require a long time if a large number of source tables need to be synchronized.
- 2: The source of the destination table. Valid values: Create Table and Use existing Table.
- 3: The name of the destination table. The information that appears here varies based on the value that you selected from the drop-down list in the Table creation method column.
  - If you set Table creation method to Create Table, the name of the destination table that is automatically created appears. You can click the table name to view and modify the table creation statements.
  - If you set Table creation method to Use existing Table, you must select a table from the drop-down list in the MaxCompute (ODPS) Base Table name column.
- 4: An error message appears if the selected source table does not have a primary key. The synchronization can be performed only for source tables that have a primary key. Source tables without primary keys are ignored during the synchronization.
- Click Next Step.
- Configure resources required for the data synchronization solution. In the Run resource settings step, configure the parameters.
- Synchronization engine: The engine used for data synchronization. Default value: Default embedded engine.
- Select an exclusive resource group for real-time tasks: The exclusive resource group used to run the real-time synchronization node generated by the solution. Select an exclusive group from the drop-down list.
Note: Only exclusive resource groups for data integration can be used to run real-time synchronization nodes. For more information, see Create and use an exclusive resource group for Data Integration.
- Real-time synchronization task name: The name of the real-time synchronization node.
- Select scheduling Resource Group: The exclusive resource group used to run the real-time synchronization node and batch synchronization node generated by the data synchronization solution. Only exclusive resource groups for data integration can be used to run solutions. For more information, see Create and use an exclusive resource group for Data Integration.
- Resource Groups for Full Batch Sync Nodes
- Maximum number of connections supported by source read: The maximum number of Java Database Connectivity (JDBC) connections that are allowed for the source database. Specify an appropriate number based on the resources of the source database.
- Offline task name rules: The name of the batch synchronization node that is used to synchronize the full data of the source data source. After a data synchronization solution is created, DataWorks first generates a batch synchronization node to synchronize full data, and then generates a real-time synchronization node to synchronize incremental data.
- Click Complete configuration. The data synchronization solution is created.
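Before you fill in the wizard, it can help to check offline which source tables will actually be synchronized and what their destination tables will be called. The following Python sketch is illustrative only and does not call any DataWorks API. It assumes that a conversion rule behaves like a regular-expression replacement, which this topic does not confirm, and all names in it (SourceTable, check_length, destination_table_name, plan_mappings) are hypothetical. It applies the conversion first, then the prefix and suffix, and skips tables without a primary key, as described in the preceding steps.

```python
import re
from dataclasses import dataclass

MAX_NAME_LENGTH = 50  # limit stated in this topic for the solution name and description


@dataclass
class SourceTable:
    """Hypothetical description of a source table; not a DataWorks type."""
    name: str
    has_primary_key: bool


def check_length(label: str, value: str) -> None:
    """Raise ValueError if a name or description exceeds the documented limit."""
    if len(value) > MAX_NAME_LENGTH:
        raise ValueError(f"{label} exceeds {MAX_NAME_LENGTH} characters")


def destination_table_name(source_name: str, pattern: str, replacement: str,
                           prefix: str = "", suffix: str = "") -> str:
    """Convert the source name, then add the prefix and suffix.

    Assumption: the conversion rule is modeled as a regex replacement.
    """
    converted = re.sub(pattern, replacement, source_name)
    return f"{prefix}{converted}{suffix}"


def plan_mappings(tables, pattern, replacement, prefix="", suffix=""):
    """Return (mappings, skipped): destination names for tables that have a
    primary key, and the names of tables that would be ignored."""
    mappings = {}
    skipped = []
    for table in tables:
        if not table.has_primary_key:
            skipped.append(table.name)  # ignored during synchronization
            continue
        mappings[table.name] = destination_table_name(
            table.name, pattern, replacement, prefix, suffix
        )
    return mappings, skipped


if __name__ == "__main__":
    check_length("Scheme name", "mysql_to_maxcompute_demo")
    tables = [SourceTable("orders_2021", True), SourceTable("audit_log", False)]
    mappings, skipped = plan_mappings(tables, r"_\d{4}$", "", prefix="ods_", suffix="_rt")
    print(mappings)
    print(skipped)
```

In this example, orders_2021 is planned as ods_orders_rt, and audit_log is reported as skipped because it has no primary key. The prefix, suffix, and pattern are sample values, not defaults of the product.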
Run the data synchronization solution
On the Task list page, find the created data synchronization solution and click Submit execution in the Operation column to run the data synchronization solution.
View the running status and result of the data synchronization nodes
- On the Task list page, find the solution that you ran and choose More > Execution details in the Operation column. You can then view the running details of all nodes.
- Find a node whose running details you want to view and click Execution details in the Status column. In the dialog box that appears, click the provided link to go to the DataStudio page.
Manage the data synchronization solution
- View or edit the data synchronization solution.
On the Task list page, find the solution that you want to view or edit, and choose More > Task configuration in the Operation column.
Note: You can edit a data synchronization solution only when it is in the Not running state. If the solution is in any other state, clicking Task configuration only lets you view information about the solution.
- Delete the data synchronization solution.
Find the solution that you want to delete and choose More > Delete in the Operation column. In the Delete message, click OK.
Note: After you click OK, only the configuration record of the data synchronization solution is deleted. The generated synchronization nodes and tables are not affected.