DataWorks provides the one-click synchronization feature to help you efficiently synchronize data from MaxCompute to a Hologres database. This capability makes the data available for analysis in Hologres with high performance and low latency. This topic describes how to configure and use the feature.
Background information
As an alternative to one-click synchronization, you can import MaxCompute data directly into a Hologres database by using SQL statements. This method typically provides better performance. For more information, see Import data from MaxCompute by using SQL statements.
Prerequisites
A workspace directory is created. For more information, see Workspace directories.
A MaxCompute project and a Hologres instance are created.
The MaxCompute project and the Hologres instance are configured as computing resources in DataWorks, and the connectivity tests are successful.
Create the node
Create the one-click MaxCompute data synchronization node.
Configure the node
Go to the one-click MaxCompute data synchronization node editing page and configure the node.
Select a source MaxCompute table
Configure the related parameters based on the information about the source table that you want to synchronize.
Parameter | Description |
Project | The name of the MaxCompute project that you created. |
Schema | The schema of the MaxCompute project. This parameter is displayed only when tenant-level schema syntax is enabled. |
Table Name | The name of the source MaxCompute table that you want to synchronize. |
Filter Condition | The system automatically generates a filter condition based on the partitioned table that you use. You can also adjust the filter condition. Only the data that meets the filter condition is retained. Note A filter condition is the content that follows the |
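A filter condition on a partitioned table usually compares partition columns with literal values. The following Python sketch shows one way to assemble such a condition from partition column/value pairs. It assumes a hypothetical source table partitioned by `ds` and `region`. The helper `build_partition_filter` is illustrative only and is not part of DataWorks, and the condition that the node actually generates may look different.

```python
def build_partition_filter(partition_values: dict[str, str]) -> str:
    """Join partition column/value pairs into a single filter condition.

    Hypothetical helper for illustration; not a DataWorks API.
    """
    clauses = []
    for column, value in partition_values.items():
        # Double any single quotes so each value stays a valid SQL string literal.
        escaped = value.replace("'", "''")
        clauses.append(f"{column}='{escaped}'")
    # Only rows that satisfy every clause are retained.
    return " AND ".join(clauses)


# Example: a two-level partitioned table (ds, region), both names hypothetical.
print(build_partition_filter({"ds": "20240101", "region": "hangzhou"}))
# ds='20240101' AND region='hangzhou'
```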
Set a destination Hologres table
Configure the related parameters based on the information about the destination table to which you want to synchronize data.
Parameter | Description |
Instance | The destination Hologres instance. After you configure the Hologres data source in Connections, the system automatically identifies the instance. Note You can click Pages for Managing Destination next to Connections to go to the following pages: Holo console (instance monitoring), Slow Query, Active connection management, DB authorization, and User management. |
Database | The database of the destination Hologres instance. |
Schema | The schema of the destination Hologres instance. |
Table Name | The name of the Hologres internal table. If a table with this name already exists, Hologres processes the existing internal table based on the following policies.
|
Synchronization Field | Select the table fields that you want to synchronize. |
Partition Configurations | Select the partition in the source MaxCompute table from which you want to synchronize data. Note Hologres can receive data synchronized from a single-level partitioned MaxCompute table. If the source table has multiple partition levels, you must specify one partition field to use as the first-level partition in Hologres. All other partition fields are mapped to regular columns in the destination table. |
Index Configuration | Configure an index on the Hologres internal table to optimize queries on the synchronized MaxCompute data. For more information about how to create an index, see CREATE TABLE. |
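The partition mapping rule described for Partition Configurations can be expressed as a small function. The following Python sketch assumes a hypothetical MaxCompute table partitioned by `ds`, `region`, and `hh`. It returns the field that becomes the first-level Hologres partition and the fields that become regular columns. `PartitionMapping` and `map_partitions` are illustrative names, not DataWorks APIs.

```python
from dataclasses import dataclass


@dataclass
class PartitionMapping:
    partition_key: str          # becomes the first-level partition in Hologres
    regular_columns: list[str]  # remaining partition fields, stored as regular columns


def map_partitions(source_partitions: list[str], chosen: str) -> PartitionMapping:
    """Map multi-level MaxCompute partition fields to one Hologres partition key.

    Hypothetical helper that models the rule in the table above.
    """
    if chosen not in source_partitions:
        raise ValueError(f"{chosen!r} is not a partition field of the source table")
    # Keep the original order of the fields that are not chosen.
    others = [field for field in source_partitions if field != chosen]
    return PartitionMapping(partition_key=chosen, regular_columns=others)


mapping = map_partitions(["ds", "region", "hh"], chosen="ds")
print(mapping.partition_key)    # ds
print(mapping.regular_columns)  # ['region', 'hh']
```

Raising an error for an unknown field keeps the sketch strict. A single-level source table maps to itself with no extra regular columns.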
Configure other parameters
Parameter | Description |
GUC Parameter | The GUC parameters to set before MaxCompute data is imported. For more information about the supported GUC parameters, see GUC parameters. Only GUC parameter settings are supported. Other SQL statements are not supported. |
External Server | The default value is |
SQL Script |
|
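Because the GUC Parameter field accepts only GUC settings, you may want to check the content before pasting it. The following Python sketch is a rough pre-check that splits the text on semicolons and rejects anything that is not a `SET name = value` or `SET name TO value` statement. The pattern is an assumption made for illustration. It is not the validation that DataWorks applies, it does not verify that a parameter is in the supported list, and its naive split does not handle semicolons inside quoted values. `statement_timeout` is used only as an example parameter name.

```python
import re

# Assumed shape of one GUC setting: SET <name> = <value> or SET <name> TO <value>.
_SET_PATTERN = re.compile(
    r"^\s*set\s+[A-Za-z_][\w.]*\s*(=|\s+to\s+)\s*\S.*$", re.IGNORECASE
)


def split_guc_statements(script: str) -> list[str]:
    """Split text on ';' and reject any statement that is not a SET.

    Hypothetical pre-check; naive split, so ';' inside quoted values is not handled.
    """
    statements = [s.strip() for s in script.split(";") if s.strip()]
    for stmt in statements:
        if not _SET_PATTERN.match(stmt):
            raise ValueError(f"not a GUC setting: {stmt!r}")
    return statements


print(split_guc_statements("SET statement_timeout = '10min';"))
# ["SET statement_timeout = '10min'"]
```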
Test the node
Configure the debugging properties of the node based on your business requirements.
In the Debugging Configurations section on the right side of the data synchronization node editing page, you can configure Computing Resource and Resource Group. The following table describes the parameters.
Parameter | Description |
Computing Resource | Select the Hologres computing resource that you attached. |
Virtual Warehouse | Use the default value. |
Resource Group | Select the resource group that passed the connectivity test when you attached the Hologres computing resource. |
CUs for Computing | Use the default CU value. |
Script Parameter | If you define a variable in the filter condition in the format of ${Parameter name}, configure Parameter Name and Parameter Value in the Script Parameter section. When the task runs, the variable is dynamically replaced with the actual value. For more information, see Node scheduling. |
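The `${Parameter name}` placeholder syntax maps directly onto Python's `string.Template`. The following sketch uses it to show how a filter condition might look before and after a script parameter is replaced with its value. The parameter name `bizdate` and the value `20240101` are hypothetical, and the actual replacement is performed by DataWorks, not by this code.

```python
from string import Template


def resolve_filter(filter_condition: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders with the configured parameter values.

    Template.substitute() raises KeyError when a placeholder has no value,
    so a missing parameter is an explicit error in this sketch.
    """
    return Template(filter_condition).substitute(params)


print(resolve_filter("ds='${bizdate}'", {"bizdate": "20240101"}))
# ds='20240101'
```

Using `substitute` instead of `safe_substitute` is a deliberate choice here. An unresolved variable fails loudly instead of leaving a literal `${...}` in the condition.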
To test the node, click Save and Run to run the data synchronization task.
Next steps
Node scheduling: To periodically schedule and run a node in the project directory, set Scheduling Policies in Properties on the right side of the node and configure the related scheduling properties.
Node publishing: To publish a task to the production environment for execution, click the icon to start the publishing process. A node in the project directory is periodically scheduled only after the node is published to the production environment.
After MaxCompute data is synchronized, you can use HoloWeb to query the data in the Hologres table. For more information, see HoloWeb.
FAQ
Error message: get table columns occurs Invalid name:xxx
Solution: Check whether the project name that you configured for the source is correct, and whether it contains spaces or other invalid characters.
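To catch this problem before you run the node, you can check the project name for stray characters. The following Python sketch flags leading or trailing whitespace and any character other than ASCII letters, digits, and underscores. That character set is an assumption made for illustration. See the MaxCompute documentation for the authoritative project naming rules. `find_project_name_problems` is a hypothetical helper.

```python
def find_project_name_problems(name: str) -> list[str]:
    """Report characters in a project name that may cause an 'Invalid name' error.

    Assumption: a valid name contains only ASCII letters, digits, and underscores.
    """
    problems = []
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    # Collect each unexpected character once, in sorted order, for a stable report.
    bad = sorted({c for c in name.strip() if not (c.isascii() and (c.isalnum() or c == "_"))})
    if bad:
        problems.append("unexpected characters: " + " ".join(repr(c) for c in bad))
    return problems


print(find_project_name_problems("my_project"))  # []
print(find_project_name_problems(" my-project "))  # reports the whitespace and the '-'
```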