DataWorks allows you to create a node that synchronizes data from a single Hologres table to MaxCompute, so that you can store large volumes of data in MaxCompute and process the data there. This topic describes how to configure such a synchronization node.
Prerequisites
A MaxCompute project and a Hologres instance are created. For more information, see Create a MaxCompute project and Purchase a Hologres instance.
The MaxCompute project and Hologres instance are associated with the workspace as computing resources, and the computing resources have passed the network connectivity test. For more information, see Associate a computing resource with a workspace (Participate in Public Preview of Data Studio turned on).
A node used to synchronize data to MaxCompute is created. For more information, see Create an auto triggered node.
Limits
Only data in internal databases in Hologres can be synchronized to MaxCompute.
For information about the limits on using Hologres external tables in MaxCompute, see Hologres external tables.
Data types supported by MaxCompute and those supported by Hologres are different. For information about the mappings between MaxCompute data types and Hologres data types, see Data type mappings between MaxCompute and Hologres.
Configure the synchronization node
Go to the configuration tab of the synchronization node and configure the synchronization node based on the following instructions:
Configure settings related to the source
You can configure the source based on the following parameter descriptions.
Parameter | Description |
Source Object Type | The type of object from which you want to synchronize data. The value of this parameter is fixed as |
Data Source | The Hologres computing resource from which you want to synchronize data. |
Instance | The ID of the Hologres instance. The system automatically obtains the value of this parameter, and the value cannot be changed. |
Database | The Hologres database from which you want to synchronize data. |
Schema | The Hologres schema from which you want to synchronize data. |
Table | The name of the table from which you want to synchronize data. |
Filter Conditions | The condition that you want to use to filter data. The system automatically generates a filter condition based on the partitioned table that you use. You can modify the filter condition based on your business requirements. Only data that meets the filter condition is retained. Note: A filter condition is the content of the clause after |
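The retention behavior of a filter condition can be illustrated with a minimal sketch. The sketch assumes that the condition behaves like a SQL predicate and uses an in-memory SQLite table as a stand-in for the Hologres source table. The table name src, the columns id and ds, and the sample values are hypothetical and are not part of DataWorks.

```python
import sqlite3


def rows_kept_by_filter(rows, filter_condition):
    """Return the rows that meet filter_condition.

    Assumption: the condition is applied like a SQL predicate. SQLite is
    only a local stand-in; it is not how DataWorks evaluates the filter.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical source table with a partition-like column named ds.
        conn.execute("CREATE TABLE src (id INTEGER, ds TEXT)")
        conn.executemany("INSERT INTO src VALUES (?, ?)", rows)
        query = "SELECT id, ds FROM src WHERE " + filter_condition + " ORDER BY id"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample_rows = [(1, "20240101"), (2, "20240102"), (3, "20240102")]
kept = rows_kept_by_filter(sample_rows, "ds = '20240102'")
print(kept)
```

Only the rows whose ds value matches the condition are returned, which mirrors how rows that do not meet the filter condition are excluded from synchronization.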
Configure settings related to the destination
You can configure the destination based on the following parameter descriptions.
Parameter | Description |
Data Source | The MaxCompute computing resource to which you want to write data. |
Project | The MaxCompute project that corresponds to the MaxCompute computing resource. The system automatically obtains the value of this parameter. |
Schema | The MaxCompute schema in which you want to store data. This parameter is required only if the schema feature is enabled for the MaxCompute project that you want to use. If the schema feature is not enabled for the MaxCompute project, this parameter is not displayed. For information about how to enable the schema feature, see Enable the schema feature. |
Table | The name of a MaxCompute internal table. You can configure this parameter based on your business requirements. |
Lifecycle | The lifecycle of the MaxCompute internal table. If the data in the table does not change within the specified period after the last update time, MaxCompute automatically reclaims the table. |
Fields: Synchronization Fields | You can select the fields that you want to synchronize and configure the data types of the fields in the MaxCompute internal table. |
Fields: Partition Configurations | You can configure the partition key column of the MaxCompute internal table based on your business requirements. You can select one of the following options to specify the source of data in the partition key column: |
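The Lifecycle rule can be sketched as a simple time comparison. This is a minimal illustration, not MaxCompute's actual reclamation logic. It assumes that the lifecycle is expressed in days and that a table becomes reclaimable once more than that many days have passed since its last update. The function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def is_reclaimable(last_update, lifecycle_days, now):
    """Return True if the table has been unchanged for longer than its lifecycle.

    Assumptions: the lifecycle is counted in days, and the boundary is
    exclusive (exactly lifecycle_days after the update is not yet reclaimable).
    """
    return now - last_update > timedelta(days=lifecycle_days)


last_update = datetime(2024, 1, 1)
still_kept = is_reclaimable(last_update, 30, datetime(2024, 1, 20))
expired = is_reclaimable(last_update, 30, datetime(2024, 2, 15))
print(still_kept, expired)
```

Any write to the table resets the last update time, so a table that is synchronized more often than its lifecycle period is never reclaimed under this model.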
Configure data synchronization settings
You can configure the following parameters in the Data Synchronization Settings section.
Parameter | Description |
Import Method | The method that you want to use to import data. Valid values:
|
Permissions to Access Hologres | The method that you want to use to access the Hologres instance. Valid values:
|
Location | During synchronization, the system automatically generates a MaxCompute table based on the Hologres external storage path. You can use the automatically generated storage path or configure the Hologres external storage path based on your business requirements. |
Debug the synchronization node
To debug and run the synchronization node, configure debugging information based on your business requirements.
Configure properties for debugging the synchronization node.
You can click Debugging Configurations in the right-side navigation pane of the configuration tab of the synchronization node, and configure the following parameters.
Parameter | Description |
Computing Resource | Select the MaxCompute computing resource that is associated with the workspace. |
Computing Quota | Select the computing quota generated when you created the MaxCompute project, or click Create Computing Quota displayed after you click the drop-down list to create a computing quota. For more information, see Manage quotas for computing resources in the MaxCompute console. |
Resource Group | Select the resource group that has passed the connectivity test when you associate the MaxCompute computing resource with the workspace. |
CUs for Computing | Retain the default value of this parameter. |
Script Parameters | If you define variables in the ${Parameter name} format in the filter condition, you must configure the Parameter Name and Parameter Value parameters in the Script Parameters section. When the synchronization node is run, the variables are replaced with actual values. For more information, see Node scheduling. |
To debug and run the synchronization node, click Save and Run.
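The variable replacement described for Script Parameters can be sketched as follows. This is a local illustration of substituting ${Parameter name} placeholders in a filter condition, not the DataWorks implementation. The function render_filter, the parameter name bizdate, and its value are hypothetical.

```python
import re

# Matches placeholders written in the ${Parameter name} format.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_filter(condition, params):
    """Replace each ${name} in condition with the value configured for name."""

    def replace(match):
        name = match.group(1)
        if name not in params:
            # A variable without a configured value is reported, not left in place.
            raise KeyError(f"no value configured for script parameter {name!r}")
        return params[name]

    return _PLACEHOLDER.sub(replace, condition)


# Hypothetical parameter name and value, as they might be entered in
# the Parameter Name and Parameter Value fields.
rendered = render_filter("ds = '${bizdate}'", {"bizdate": "20240102"})
print(rendered)
```

Raising an error for an unconfigured variable is a design choice of this sketch. It makes a missing Script Parameters entry visible before the condition is used.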
What to do next
Node scheduling: If you want the system to periodically schedule a node in a workspace directory, you need to click Properties in the right-side navigation pane of the configuration tab of the node and configure the parameters in the Scheduling Policies section.
Node deployment: If you want to deploy a node to the production environment for running, you can click the icon in the top toolbar of the configuration tab of the node to initiate a deployment process. Nodes in a workspace directory can be periodically scheduled only after they are deployed to the production environment.
Additional information
Field type mismatch: If the data types of fields do not match, the synchronization node fails. Check whether the data types of the fields in the MaxCompute table are correctly configured. For information about mappings between MaxCompute data types and Hologres data types, see Data type mappings between MaxCompute and Hologres.
Inconsistency between the data that is actually synchronized and the data in the partition that you want to synchronize: Check whether the filter condition is correctly configured for the source.
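A field type check of the kind described above can be sketched before you run the node. The mapping in this sketch is an illustrative, partial subset and is an assumption. Replace it with the entries from Data type mappings between MaxCompute and Hologres. The function name and the column definitions are hypothetical.

```python
# Assumption: illustrative, partial mapping from Hologres types to
# MaxCompute types. Verify every entry against the official mapping topic.
ILLUSTRATIVE_TYPE_MAP = {
    "TEXT": "STRING",
    "INT8": "BIGINT",
    "INT4": "INT",
    "FLOAT8": "DOUBLE",
    "BOOL": "BOOLEAN",
}


def find_type_mismatches(source_columns, destination_columns):
    """Return (column, hologres_type, expected, actual) for each mismatched field."""
    mismatches = []
    for name, holo_type in source_columns.items():
        expected = ILLUSTRATIVE_TYPE_MAP.get(holo_type.upper())
        actual = destination_columns.get(name)
        # Skip fields that the map or the destination does not cover
        # instead of guessing a result.
        if expected is None or actual is None:
            continue
        if expected != actual.upper():
            mismatches.append((name, holo_type, expected, actual))
    return mismatches


# Hypothetical column definitions for a source and a destination table.
source = {"id": "INT8", "name": "TEXT", "score": "FLOAT8"}
destination = {"id": "BIGINT", "name": "STRING", "score": "STRING"}
problems = find_type_mismatches(source, destination)
print(problems)
```

In this sample, only the score field is reported, because its destination type differs from the type that the illustrative map expects.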