Create a node to synchronize MaxCompute data with a few clicks - DataWorks

DataWorks allows you to create a node on the DataStudio page that synchronizes MaxCompute data to Hologres with a few clicks. After the data is synchronized to Hologres, you can run accelerated queries on the data of MaxCompute tables. This topic describes how to create and configure the node.

Background information

Before you synchronize MaxCompute data to Hologres with a few clicks, you must create an external table in Hologres. The Hologres external table is used to synchronize data from a source MaxCompute table to a Hologres internal table. The schema of the Hologres external table is the same as that of the source MaxCompute table. You can also use SQL statements to import data from MaxCompute to Hologres. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.

The performance of importing data from MaxCompute to Hologres based on SQL statements is higher than the performance of synchronizing data based on external tables. For more information about how to create a Hologres external table, see Create a node to synchronize schemas of MaxCompute tables with a few clicks.

Create a node to synchronize MaxCompute data to Hologres

Go to the DataStudio page.
1. Log on to the DataWorks console.
2. In the left-side navigation pane, click Workspaces.
3. In the top navigation bar, select the region where your workspace resides. Find your workspace and click DataStudio in the Actions column.
Create a workflow.
If you have an existing workflow, skip this step.
1. Move the pointer over the icon and select Create Workflow.
2. In the Create Workflow dialog box, configure the Workflow Name parameter.
3. Click Create.
Create a One-click MaxCompute data synchronization node.
1. Move the pointer over the icon and choose Create Node > Hologres > One-click MaxCompute data synchronization.
  You can also find the desired workflow, right-click the workflow, and then choose Create Node > Hologres > One-click MaxCompute data synchronization.
2. In the Create Node dialog box, configure the Name, Engine Instance, Node Type, and Path parameters.
3. Click Confirm. The configuration tab of the node appears.

Configure the node information.

On the configuration tab of the node, configure the information about the source MaxCompute table from which you want to synchronize data, the information about the destination table where you want to store the synchronized data, the data synchronization policy, and the SQL statement.

Configure the parameters in the MaxCompute Source table selection section.

The parameters that you configure in this section determine the source MaxCompute table from which you want to synchronize data. In this section, you must configure the information about the Hologres external table that maps to the source MaxCompute table. The following table describes the parameters.

Parameter	Description
Target connection	The name of the Hologres compute engine instance where the Hologres external table resides.
Target Library	The name of the database where the Hologres external table resides in the Hologres compute engine instance.
External table source	The source of the Hologres external table. The Hologres external table maps to the source MaxCompute table and is used to synchronize the data of the source MaxCompute table to a Hologres internal table. Valid values: (1) External table already exists: Select this option if the external table that you want to use already exists. If you select this option, you must specify the schema and name of the external table. (2) New external table: Select this option if no Hologres external table that maps to the source MaxCompute table exists. If you select this option, you must specify the server that is used by the external table, the name of the MaxCompute project to which the source MaxCompute table belongs, and the name of the source MaxCompute table. Note: You can use the `odps_server` server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.
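
For orientation, the New external table option asks for the same three inputs (server, MaxCompute project, and source table) that a foreign-table import statement takes. The following minimal Python sketch assembles such a statement. It assumes that the standard PostgreSQL `IMPORT FOREIGN SCHEMA` syntax applies. The function name `build_import_foreign_schema`, the project `my_mc_project`, the table `orders`, and the local schema `public` are hypothetical placeholders, and the output is not necessarily the statement that DataWorks generates.

```python
def build_import_foreign_schema(project, table, server="odps_server", local_schema="public"):
    """Build an IMPORT FOREIGN SCHEMA statement for a single MaxCompute table.

    Assumption: standard PostgreSQL syntax. All values are illustrative placeholders.
    """
    return (
        f"IMPORT FOREIGN SCHEMA {project} "
        f"LIMIT TO ({table}) "
        f"FROM SERVER {server} "
        f"INTO {local_schema};"
    )


# Hypothetical project and table names.
statement = build_import_foreign_schema("my_mc_project", "orders")
print(statement)
```

The identifiers are interpolated without quoting or validation, so the sketch is suitable only for illustrating how the three inputs fit together.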

Configure the parameters in the Target table settings section.

The parameters that you configure in this section are used to create a Hologres internal table where you want to store the synchronized data.

Parameter	Description
Target schema	The `schema` to which the Hologres internal table belongs.
Destination Table Name	The name of the Hologres internal table. If an internal table with the specified name already exists, Hologres processes the existing internal table based on the table type: (1) Non-partitioned table: Hologres deletes the existing internal table and creates a new internal table with the same name. (2) Partitioned table: Hologres does not delete the existing internal table. Hologres creates partitions in the table based on partition values and synchronizes data to the new partitions. Note: An error is reported if the schema of the created internal table is different from the schema of the existing internal table with the same name.
Target table description	The description of the Hologres internal table.
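
The handling of an existing destination table can be summarized as a small decision function. The following Python sketch only models the policies described for the Destination Table Name parameter and does not call any Hologres API. The function name `plan_destination_table` and the action strings are hypothetical. The sketch also assumes that the schema mismatch error applies to the partitioned case, because a non-partitioned table is deleted and recreated.

```python
def plan_destination_table(exists, partitioned, schema_matches=True):
    """Return the ordered actions implied by the destination table policies."""
    if not exists:
        # No name conflict: the internal table is simply created.
        return ["create internal table"]
    if not partitioned:
        # Non-partitioned: the existing table is deleted and recreated.
        return ["drop existing internal table", "create internal table"]
    if not schema_matches:
        # Assumption: the documented schema error applies to this case.
        raise ValueError("schema differs from the existing internal table")
    # Partitioned: the table is kept and new partitions receive the data.
    return [
        "keep existing internal table",
        "create partitions",
        "synchronize data to new partitions",
    ]


print(plan_destination_table(exists=True, partitioned=False))
print(plan_destination_table(exists=True, partitioned=True))
```

Note that the non-partitioned branch is destructive: data in the existing internal table is lost when the table is recreated.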

Configure the parameters in the Synchronization settings section.

The parameters that you configure in this section determine the policy that is used to synchronize MaxCompute data to Hologres.

Parameter	Description
Synchronization field	The fields in the source MaxCompute table from which you want to synchronize data.
Partition configuration	The partitions in the source MaxCompute table from which you want to synchronize data. Note: Hologres allows you to synchronize data only from level-1 partitions in the source MaxCompute table. If the source MaxCompute table contains multiple levels of partitions, you must specify the level-1 partition field of the source MaxCompute table for the destination table. Other partition fields in the source MaxCompute table are mapped to common fields in the destination table.
Index configuration	The index for the Hologres internal table that is used to store the synchronized MaxCompute data. You can query data based on the index. For more information about how to create an index, see Overview.
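
The partition mapping rule in the Partition configuration row can be illustrated with a short Python sketch: the level-1 partition field of the source table becomes the partition field of the destination table, and the remaining partition fields become common fields. The function name `split_partition_fields` and the field names `ds`, `hh`, and `region` are hypothetical.

```python
def split_partition_fields(source_partition_fields):
    """Split source partition fields into (destination partition field, common fields).

    The input list is ordered from the level-1 partition field downward.
    """
    if not source_partition_fields:
        # The source table is not partitioned.
        return None, []
    level_1, *others = source_partition_fields
    return level_1, others


# Hypothetical source table partitioned by ds, then hh, then region.
partition_field, common_fields = split_partition_fields(["ds", "hh", "region"])
print(partition_field)
print(common_fields)
```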

Generate an SQL script.
DataWorks generates the SQL statement that is used to run the current data synchronization node based on the synchronization configurations. You can go to the code editor of Hologres and run the data synchronization node in SQL mode.
Note
- You cannot edit the generated SQL script. If the synchronization configurations of the data synchronization node change, click Refresh to generate a new SQL statement.
- For more information about how to run the data synchronization node in SQL mode, see Import data from MaxCompute to Hologres by executing SQL statements.
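
To give a rough idea of how the synchronization configurations could translate into SQL, the following Python sketch builds an `INSERT INTO ... SELECT` statement from an external table, an internal table, a field list, and an optional level-1 partition filter. This is an assumption for illustration and not the script that DataWorks generates. The function name `build_sync_sql` and all table, field, and partition names are hypothetical.

```python
def build_sync_sql(external_table, internal_table, fields,
                   partition_field=None, partition_value=None):
    """Build an INSERT ... SELECT statement that copies the selected fields."""
    columns = ", ".join(fields)
    sql = (
        f"INSERT INTO {internal_table} ({columns}) "
        f"SELECT {columns} FROM {external_table}"
    )
    if partition_field is not None:
        # Restrict the copy to one level-1 partition of the source table.
        sql += f" WHERE {partition_field} = '{partition_value}'"
    return sql + ";"


# Hypothetical names: public.orders_ext is the external table, public.orders the internal table.
print(build_sync_sql("public.orders_ext", "public.orders", ["id", "amount"]))
print(build_sync_sql("public.orders_ext", "public.orders",
                     ["id", "amount", "ds"], "ds", "20240101"))
```

Values are interpolated directly into the statement, so this sketch is for reading only and must not be used with untrusted input.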

Configure scheduling properties for the node.
If you want the system to periodically run the node, you can click Properties in the right-side navigation pane to configure scheduling properties for the node based on your business requirements.
- Configure basic properties for the node. For more information, see Configure basic properties.
- Configure the scheduling cycle, rerun properties, and scheduling dependencies of the node. For more information, see Configure time properties and Configure same-cycle scheduling dependencies.
  Note: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
- Configure resource properties for the node. For more information, see Configure the resource property. If you want to access the data source over the Internet or a VPC, you must use the exclusive resource group for scheduling that is connected to the data source to run the node. For more information, see Establish a network connection between a resource group and a data source.
Save the node configurations and run the node.
1. In the top navigation bar of the configuration tab of the node, click the icon to save the node configurations.
2. In the top navigation bar of the configuration tab of the node, click the icon to synchronize the data of the source MaxCompute table.
If the data synchronization node is created in a workspace in standard mode, you must click Deploy in the top navigation bar to deploy the node to the production environment after you commit the node. For more information, see Deploy nodes.
View the node.
1. Click Operation Center in the upper-right corner of the configuration tab of the node to go to Operation Center in the production environment.
2. View the scheduled node. For more information, see View and manage auto triggered nodes.
To view more information about the node, click Operation Center in the top navigation bar of the DataStudio page. For more information, see Overview.

What to do next

After the data of the source MaxCompute table is synchronized, you can go to the Workspace Tables page in DataStudio to view the data details. For more information, see Manage tables. You can also log on to the Hologres console and query MaxCompute data by using HoloWeb. For more information, see HoloWeb.