DataWorks allows you to create a node to synchronize data from a single MaxCompute table to Hologres. This can help you efficiently perform big data analysis and real-time queries. This topic describes how to configure a node to easily synchronize data from MaxCompute to Hologres and fully utilize the high-performance query capabilities of Hologres.
Background information
When you run a node to synchronize data from a MaxCompute internal table to a Hologres internal table, the data is first imported into a Hologres foreign table and then synchronized from the foreign table to a Hologres internal table. Data synchronization from MaxCompute to a Hologres foreign table is implemented by executing the IMPORT FOREIGN SCHEMA statement.
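The two-step flow described above can be sketched as the pair of SQL statements it implies. This is a minimal illustration, not the statements that the node actually generates: the function name, the `mc_foreign` schema, and the `if_table_exist 'update'` option are assumptions chosen for the example, and `odps_server` is assumed to be the external server in use.

```python
def build_sync_statements(mc_project, mc_table, holo_schema, holo_table,
                          foreign_schema="mc_foreign", server="odps_server"):
    """Return the two SQL statements behind the two-step flow, as strings.

    foreign_schema is a hypothetical Hologres schema that holds the
    foreign table; server is assumed to be the external server name.
    """
    # Step 1: map the MaxCompute table to a Hologres foreign table.
    import_stmt = (
        f"IMPORT FOREIGN SCHEMA {mc_project} LIMIT TO ({mc_table}) "
        f"FROM SERVER {server} INTO {foreign_schema} "
        f"OPTIONS (if_table_exist 'update');"
    )
    # Step 2: copy rows from the foreign table into the internal table.
    insert_stmt = (
        f"INSERT INTO {holo_schema}.{holo_table} "
        f"SELECT * FROM {foreign_schema}.{mc_table};"
    )
    return [import_stmt, insert_stmt]


for statement in build_sync_statements("my_project", "orders", "public", "orders"):
    print(statement)
```

Keeping the foreign table in a separate schema avoids a name clash when the foreign table and the internal table share a table name.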
Prerequisites
A MaxCompute project and a Hologres instance are created. For more information, see Create a MaxCompute project and Purchase a Hologres instance.
The MaxCompute project and Hologres instance are associated with the workspace as computing resources, and the computing resources have passed the network connectivity test. For more information, see Associate a computing resource.
Limits
You can create a Hologres foreign table and read data from the foreign table only if the related MaxCompute internal table exists.
Create a synchronization node
You must first create a node used to synchronize data to Hologres and go to the configuration tab of the node. For more information, see Create an auto triggered node.
Manage the Hologres data source
On the configuration tab of the synchronization node, you can perform the following operations to select and manage the Hologres data source:
From the Connections drop-down list, select the Hologres data source that was generated when you associated the Hologres instance with the workspace as a computing resource.
Click Pages for Managing Destination next to the drop-down list, and then select one of the following options to manage the Hologres instance that corresponds to the data source:
Holo console (instance monitoring): Allows you to manage the Hologres instance in the Hologres console.
Slow Query: Allows you to view and analyze the historical slow queries of the Hologres instance in a visualized manner.
Active connection management: Allows you to diagnose and manage connections in the Hologres instance.
DB authorization: Allows you to create databases in the Hologres instance or grant permissions on the databases created in the Hologres instance.
User management: Allows you to use the user management module of the Hologres console to add users to or delete users from the Hologres instance and grant permissions to users.
Configure the synchronization node
After you select the Hologres data source, you can configure the synchronization node by referring to the following instructions:
Configure settings related to the source
You can configure the source based on the following parameter descriptions.
Parameter | Description |
Source Object Type | The type of the object from which you want to synchronize data. The value of this parameter is fixed as |
Project | The name of the MaxCompute project from which you want to synchronize data. |
Schema | The name of the MaxCompute schema that you want to use. |
Table Name | The name of the table from which you want to synchronize data. |
Filter Condition | The condition that you want to use to filter data. The system automatically generates a filter condition based on the partitioned table that you use. You can also modify the filter condition based on your business requirements. Data that meets the filter condition will be retained. Note A filter condition is the content of the clause after |
Configure settings related to the destination
You can configure the destination based on the following parameter descriptions.
Parameter | Description | |
Instance | The name of the Hologres instance that you want to use. The system automatically matches the Hologres instance based on the Hologres data source that you select from the Connections drop-down list. | |
Database | The name of the Hologres database that you want to use. The system automatically matches the database based on the Hologres data source that you select from the Connections drop-down list. | |
Schema | The name of the Hologres schema to which the desired Hologres internal table belongs. | |
Table Name | The name of a Hologres internal table. You can configure this parameter based on your business requirements. If the table name that you specify already exists, the policy used to process the situation varies based on the table type. Note: If the schema of the new table is different from that of the existing table, an error is reported. | |
Fields | Synchronization Field | You can select the fields to which you want to write data and configure the data types of the fields in the Hologres internal table. |
Partition Configurations | You can configure the partition key column of the Hologres internal table based on your business requirements. | |
Index Configuration | You can create an index for the Hologres internal table that stores the synchronized MaxCompute data to facilitate subsequent data queries. For information about how to create indexes, see CREATE TABLE. | |
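The partition and index settings above correspond to properties of the Hologres internal table. As a rough sketch of what such a table definition can look like in Hologres SQL, the following builds a DDL string with a list partition key and a clustering key. The function name, table name, and column choices are hypothetical, and the statements that the node generates from the configuration tab may differ.

```python
def build_table_ddl(schema, table, columns, partition_key=None, clustering_key=None):
    """Return Hologres DDL for an internal table as one SQL string.

    columns is a list of (name, type) pairs. The partition key and
    clustering key are optional and purely illustrative here.
    """
    column_sql = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    create = f"CREATE TABLE {schema}.{table} ({column_sql})"
    if partition_key:
        # Hologres partitioned tables use list partitioning.
        create += f" PARTITION BY LIST ({partition_key})"
    statements = ["BEGIN;", create + ";"]
    if clustering_key:
        # Table properties are set in the same transaction as CREATE TABLE.
        statements.append(
            f"CALL set_table_property('{schema}.{table}', "
            f"'clustering_key', '{clustering_key}');"
        )
    statements.append("COMMIT;")
    return "\n".join(statements)


print(build_table_ddl("public", "orders",
                      [("id", "BIGINT"), ("ds", "TEXT")],
                      partition_key="ds", clustering_key="id"))
```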
Configure advanced parameters
You can configure GUC parameters and an external server in the Configure Advanced Settings section of the configuration tab of the synchronization node.
Parameter | Description |
GUC Parameters | You must configure specific GUC parameters for the synchronization node. For information about the supported GUC parameters, see GUC parameters. Other SQL statements are not supported. |
External Server | The default value is |
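Because only GUC parameters are accepted and other SQL statements are not supported, it can help to check the input before you paste it into the node. The sketch below assumes that GUC parameters are entered as semicolon-separated `SET name = value` statements, which is an assumption about the input format rather than documented behavior. The helper name is hypothetical, and `statement_timeout` is used only as an example of a parameter name.

```python
import re

# Matches "SET <name> = <value>" or "SET <name> TO <value>", case-insensitively.
SET_PATTERN = re.compile(
    r"^\s*SET\s+[A-Za-z_][\w.]*\s*(=|\s+TO\s+)\s*.+$", re.IGNORECASE
)


def non_set_statements(text):
    """Return the statements in text that are not SET statements.

    Splitting on ';' is a simplification: it would mis-split a quoted
    value that itself contains a semicolon.
    """
    statements = [part.strip() for part in text.split(";") if part.strip()]
    return [stmt for stmt in statements if not SET_PATTERN.match(stmt)]


rejected = non_set_statements("SET statement_timeout = '10min'; DROP TABLE t;")
print(rejected)
```

The check is syntactic only. Whether a given parameter is accepted still depends on the list of supported GUC parameters.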
Debug the synchronization node
To debug and run the synchronization node, configure debugging information based on your business requirements.
Configure properties for debugging the synchronization node.
You can click Debugging Configurations in the right-side navigation pane of the configuration tab of the synchronization node, and configure the following parameters.
Parameter | Description |
Computing Resource | Select the Hologres computing resource that is associated with the workspace. |
Resource Group | Select the resource group that has passed the connectivity test when you associate the Hologres computing resource with the workspace. |
CUs for Computing | Specify the number of CUs that you want to use to run the synchronization node. The default value is 0.25. |
Script Parameters | If you define variables in the ${Parameter name} format in the filter condition, you must configure the Parameter Name and Parameter Value parameters in the Script Parameters section. When the synchronization node is run, the variables are replaced with actual values. For more information, see Node scheduling. |
To debug and run the synchronization node, click Save and Run.
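The variable replacement described for Script Parameters can be illustrated with a small sketch. It only mimics the idea of substituting `${name}` placeholders in a filter condition and does not reproduce how DataWorks resolves variables internally. The function name, the `ds` column, and the `bizdate` variable are hypothetical.

```python
import re


def render_filter(condition, params):
    """Replace each ${name} placeholder in condition with its value from params."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            # Mirrors the requirement that every variable must be configured.
            raise KeyError(f"no value configured for variable: {name}")
        return str(params[name])

    return re.sub(r"\$\{(\w+)\}", substitute, condition)


print(render_filter("ds = '${bizdate}'", {"bizdate": "20240101"}))
# prints: ds = '20240101'
```

Raising an error for an unconfigured variable makes the missing Parameter Name and Parameter Value pair visible before the filter is used.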
What to do next
Node scheduling: If a node in a workspace directory needs to be periodically scheduled, you need to click Properties in the right-side navigation pane of the configuration tab of the node and configure the parameters in the Scheduling Policies section.
Node deployment: If you want to deploy a node to the production environment, you can click the icon in the top toolbar of the configuration tab of the node to initiate the deployment process. Nodes in a workspace directory can be periodically scheduled only after they are deployed to the production environment.
Additional information
Field type mismatch: If you encounter field type mismatch issues when you configure the synchronization node, the node fails. You must check whether the data types of fields in the Hologres table are correctly configured. For information about mappings between MaxCompute data types and Hologres data types, see Data type mappings between MaxCompute and Hologres.
Inconsistency between the data that is actually synchronized from a partition and the original data in the partition: You must check whether the filter condition is correctly configured in the source.
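For the field type mismatch case, a quick pre-check of the configured field types against the expected mappings can narrow down the failing column. The sketch below is illustrative only: the mapping table is a small subset written from memory and must be confirmed against Data type mappings between MaxCompute and Hologres, and the function and column names are hypothetical.

```python
# Illustrative subset of MaxCompute -> Hologres type mappings (an assumption;
# confirm every entry against the official mapping table before relying on it).
EXPECTED_HOLOGRES_TYPE = {
    "BIGINT": "BIGINT",
    "INT": "INTEGER",
    "DOUBLE": "DOUBLE PRECISION",
    "BOOLEAN": "BOOLEAN",
    "STRING": "TEXT",
    "DECIMAL": "NUMERIC",
    "DATE": "DATE",
    "DATETIME": "TIMESTAMPTZ",
}


def find_type_mismatches(source_columns, destination_columns):
    """Return (column, expected_type, actual_type) for each mismatched field.

    Both arguments map a column name to its declared type name.
    """
    mismatches = []
    for name, mc_type in source_columns.items():
        expected = EXPECTED_HOLOGRES_TYPE.get(mc_type.upper())
        actual = destination_columns.get(name)
        if expected is None or actual is None:
            continue  # unknown type or unselected field: nothing to compare
        if actual.upper() != expected:
            mismatches.append((name, expected, actual))
    return mismatches


source = {"id": "BIGINT", "name": "STRING", "amount": "DOUBLE"}
destination = {"id": "BIGINT", "name": "INTEGER", "amount": "DOUBLE PRECISION"}
print(find_type_mismatches(source, destination))
```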