DataWorks provides a one-click MaxCompute data synchronization node that synchronizes data from MaxCompute to a Hologres database, so that you can efficiently query MaxCompute table data in Hologres. This topic describes how to use the one-click MaxCompute data synchronization node.
Background
You can also import data from MaxCompute to Hologres directly by using SQL statements. This method typically offers better performance. For more information, see Import data from MaxCompute by using SQL statements.
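The SQL-based import generally copies rows from a foreign table, which maps the MaxCompute table into Hologres, into a Hologres internal table with an INSERT ... SELECT statement. The sketch below only assembles such a statement as a string. It assumes the foreign table already exists, and the function name and all table and column names are hypothetical. Refer to the SQL import topic for the exact syntax and options.

```python
def build_import_sql(target_table, foreign_table, columns, filter_condition=None):
    """Assemble an INSERT ... SELECT statement that copies rows from a foreign
    table (which maps a MaxCompute table) into a Hologres internal table.

    Identifiers and the filter are inserted verbatim; no quoting or escaping
    is done, so do not pass untrusted input.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    sql = (
        f"INSERT INTO {target_table} ({column_list})\n"
        f"SELECT {column_list} FROM {foreign_table}"
    )
    if filter_condition:
        # Only rows that match the optional filter are copied.
        sql += f"\nWHERE {filter_condition}"
    return sql + ";"


if __name__ == "__main__":
    print(build_import_sql(
        "public.sales",          # hypothetical Hologres internal table
        "public.sales_foreign",  # hypothetical foreign table for the MaxCompute table
        ["id", "amount", "ds"],
        "ds = '20240101'",
    ))
```

Listing the columns explicitly on both sides keeps the statement correct even if the internal table's column order differs from the foreign table's.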
Prerequisites
- A MaxCompute project and a Hologres instance are created.
- The MaxCompute project and the Hologres instance are bound to DataWorks as computing resources, and the connectivity test is successful.
Create a one-click MaxCompute data synchronization node
Before you configure the node, make sure that you have created a one-click MaxCompute data synchronization node.
Configure the synchronization node
Go to the editor page for the one-click MaxCompute data synchronization node to configure its parameters.
Select a source MaxCompute table
Configure the following parameters for the source table.
| Parameter | Description |
| --- | --- |
| Project | The name of the MaxCompute project that you created. |
| Schema | The schema of the MaxCompute project. |
| Table name | The name of the source MaxCompute table. |
| Filter condition | The system automatically generates a filter condition for the selected partitioned table. You can also adjust this condition as needed. Only data that meets the filter condition is synchronized. Note: The filter condition is the content that follows the |
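This topic does not show the exact filter that DataWorks generates for a partitioned table. The sketch below assumes a common shape for partition filters: one equality condition per partition column, joined with AND. The function name and the column names are hypothetical.

```python
def build_partition_filter(partition_spec):
    """Join partition column/value pairs into an equality filter,
    for example: ds = '20240101' AND region = 'hz'."""
    clauses = []
    for column, value in partition_spec.items():
        # Double any single quotes so each value stays one string literal.
        escaped = str(value).replace("'", "''")
        clauses.append(f"{column} = '{escaped}'")
    return " AND ".join(clauses)


if __name__ == "__main__":
    # A fixed partition value on a two-level partitioned table.
    print(build_partition_filter({"ds": "20240101", "region": "hz"}))
    # A ${...} variable, which is replaced by its script parameter value at run time.
    print(build_partition_filter({"ds": "${bizdate}"}))
```

Using a ${...} variable instead of a fixed date lets the same node pick a different partition on each scheduled run.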
Set a destination Hologres table
Configure the following parameters for the destination table.
| Parameter | Description |
| --- | --- |
| Instance | The destination Hologres instance. After you select a Hologres data source in the Select a data source section, the system automatically identifies the specific instance. Note: Click Pages for Managing Destination next to Select a data source to open the Holo console (instance monitoring), Slow Query, Active connection management, DB authorization, and User Management pages. |
| Database | The database in the destination Hologres instance. |
| Schema | The |
| Table name | The name of the internal table in Hologres. If a table with the same name already exists, the following policies apply: |
| Synchronization field | Select the fields to synchronize. |
| Partition configuration | Select the MaxCompute table partitions to synchronize. Note: Hologres currently supports synchronizing data only from first-level partitions. If a MaxCompute table has multiple partition levels, the system maps them to a single partition level in Hologres and automatically converts the extra partition fields into regular fields. |
| Index configuration | Create indexes on the Hologres internal table to accelerate queries on the synchronized data. For more information about how to create an index, see CREATE TABLE. |
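The partition note can be pictured as a simple column remapping. The sketch below assumes that the first-level MaxCompute partition column becomes the single Hologres partition key and that every deeper partition column is appended to the regular columns. The function name and column names are hypothetical, and the actual column order that DataWorks produces may differ.

```python
from typing import List, Optional, Tuple


def map_partitions(
    regular_columns: List[str], partition_columns: List[str]
) -> Tuple[Optional[str], List[str]]:
    """Map MaxCompute partition levels onto a single Hologres partition level.

    Assumption: the first-level partition column becomes the Hologres
    partition key, and deeper partition columns become regular columns.
    """
    if not partition_columns:
        # Non-partitioned source table: there is no partition key to map.
        return None, list(regular_columns)
    first_level, *extra_levels = partition_columns
    return first_level, list(regular_columns) + extra_levels


if __name__ == "__main__":
    # A MaxCompute table partitioned by ds (level 1) and region (level 2).
    print(map_partitions(["id", "amount"], ["ds", "region"]))
```

In this example, ds stays the partition key and region is kept as an ordinary column, so its values are still synchronized and queryable.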
Configure more parameters
| Parameter | Description |
| --- | --- |
| GUC parameter | The GUC parameters to set before you import data from MaxCompute. For a list of supported GUC parameters, see GUC parameters. Other SQL statements are not supported. |
| External server | The default value is |
| SQL script | |
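Because the GUC parameter field does not accept other SQL statements, a client-side pre-check can catch mistakes before you save the node. The hypothetical function below assumes one statement per line in the form SET name = value or SET name TO value, and rejects any other non-blank line. The parameter name in the usage line is only an illustration; see GUC parameters for the supported list.

```python
import re
from typing import List, Tuple

# Assumed accepted form: "SET name = value" or "SET name TO value", one per line.
_SET_STATEMENT = re.compile(
    r"^\s*SET\s+([A-Za-z_][A-Za-z0-9_.]*)\s*(?:=|\bTO\b)\s*(.+?)\s*;?\s*$",
    re.IGNORECASE,
)


def parse_guc_settings(script: str) -> List[Tuple[str, str]]:
    """Return (name, value) pairs, or raise ValueError for any non-blank
    line that is not a SET statement."""
    settings = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        match = _SET_STATEMENT.match(line)
        if match is None:
            raise ValueError(f"line {line_no} is not a SET statement: {line.strip()!r}")
        settings.append((match.group(1), match.group(2)))
    return settings


if __name__ == "__main__":
    print(parse_guc_settings("SET hg_experimental_query_batch_size = 1024;"))
```

The check is purely syntactic: it does not know which GUC names or values Hologres actually accepts.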
Test the synchronization node
To test the synchronization node, configure the test parameters based on your business requirements.
- Configure the node's debugging settings.
  In the Run Configuration section on the right side of the editor, configure the Compute Resource and Resource Group parameters. The following table describes these and the other run parameters.
  | Parameter | Description |
  | --- | --- |
  | Compute Resource | Select the Hologres computing resource that you have bound. |
  | Resource Group | Select the resource group that passed the connectivity test when you bound the Hologres computing resource. |
  | CUs for Scheduling | This node uses the default CU value. No change is needed. |
  | Script parameter | If you define a variable in the filter condition by using the ${Parameter Name} format, specify the Parameter name and Parameter Value of the variable in the Script Parameters section. When the task runs, the system replaces the variable with its specified value. For more information, see Node scheduling configuration. |
- To test the node, click Save and then click Run to start the synchronization task.
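The variable replacement described for script parameters can be pictured with Python's string.Template, which happens to use the same ${name} placeholder syntax. This is an analogy for illustration, not the substitution logic that DataWorks uses, and the function and parameter names are hypothetical.

```python
from string import Template
from typing import Mapping


def resolve_filter(filter_condition: str, script_parameters: Mapping[str, str]) -> str:
    """Replace each ${name} variable in filter_condition with its value.

    Raises KeyError when a variable has no value, which corresponds to a
    missing Parameter Value in the Script Parameters section.
    """
    return Template(filter_condition).substitute(script_parameters)


if __name__ == "__main__":
    print(resolve_filter("ds = '${bizdate}'", {"bizdate": "20240101"}))
```

Note that string.Template differs from DataWorks in some details: it also treats a bare $name as a placeholder, reads $$ as a literal $, and raises ValueError for a $ that does not start a valid placeholder.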
Next steps
- Node scheduling configuration: If the node needs to run periodically, configure its Scheduling Policy in the Scheduling Settings pane on the right side of the editor.
- Node publishing: To publish a task to the production environment, click the icon to start the publishing process. Nodes in the project directory are periodically scheduled only after they are published to the production environment.
- After the data is synchronized from MaxCompute, you can use HoloWeb to query the data in the Hologres table. For more information, see HoloWeb.
FAQ
- Error message: get table columns occurs Invalid name:xxx.
  Solution: Verify that the source project name is correct and does not contain spaces or other invalid characters.
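Stray spaces are easy to miss when a project name is pasted in. The hypothetical helper below lists the offending characters. It assumes that a valid project name contains only ASCII letters, digits, and underscores; confirm the actual naming rules in the MaxCompute documentation.

```python
import string
from typing import List

# Assumed naming rule for this sketch: ASCII letters, digits, and underscores only.
_ALLOWED = set(string.ascii_letters + string.digits + "_")


def find_invalid_characters(project_name: str) -> List[str]:
    """Return each character in project_name that falls outside the assumed
    allowed set, in order of appearance (spaces included)."""
    return [ch for ch in project_name if ch not in _ALLOWED]


if __name__ == "__main__":
    for name in ["my_project", "my_project ", "my-project"]:
        print(repr(name), find_invalid_characters(name))
```

An empty list means the name passes this check; it does not guarantee that the project exists or that you have access to it.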