Use a DataWorks data synchronization node to synchronize data from a single MaxCompute table into Hologres for real-time queries and big data analytics.
How it works
The synchronization follows a two-stage process:
DataWorks imports the MaxCompute Internal Table into a Hologres External Table using the IMPORT FOREIGN SCHEMA command.
Data is then synchronized from the External Table into a Hologres Internal Table.
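As a rough illustration of the two stages, the Python helper below builds the kind of SQL each stage corresponds to. It only assembles statement text and does not connect to any service. The server name `odps_server`, the `public` schema, the `_holo` suffix for the destination table, and the use of a plain `INSERT ... SELECT` for the second stage are assumptions made for this sketch, not necessarily what DataWorks generates.

```python
def build_sync_statements(mc_project, table, foreign_schema="public",
                          target_table=None, where=None):
    """Return (stage1_sql, stage2_sql) for the two-stage flow (illustrative only)."""
    # Hypothetical naming convention for the Hologres internal table.
    target_table = target_table or f"{table}_holo"

    # Stage 1: expose the MaxCompute table as a Hologres external (foreign) table.
    stage1 = (
        f"IMPORT FOREIGN SCHEMA {mc_project} LIMIT TO ({table}) "
        f"FROM SERVER odps_server INTO {foreign_schema};"
    )

    # Stage 2: copy rows from the external table into the internal table.
    stage2 = f"INSERT INTO {target_table} SELECT * FROM {foreign_schema}.{table}"
    if where:
        stage2 += f" WHERE {where}"
    stage2 += ";"
    return stage1, stage2


if __name__ == "__main__":
    for statement in build_sync_statements("my_project", "orders",
                                           where="ds = '20240101'"):
        print(statement)
```

Keeping the two statements separate mirrors the process above: the external table is only a mapping to the MaxCompute data, and the second statement is what actually moves rows into Hologres storage.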
Prerequisites
Before you begin, ensure that you have:
A MaxCompute project and a Hologres instance
Both resources bound to DataWorks as compute resources and passing the Connectivity Test
Limitations
An External Table can be created and read only while the MaxCompute source table exists.
Synchronize a MaxCompute table to Hologres
Before you start, note that the node configuration involves four sets of decisions:
Decision area | What you configure |
Destination data source | Which Hologres instance and database to write to |
Source table | Which MaxCompute project, schema, and table to read from, and any row filter |
Destination table | Target schema, table name, fields, partitioning, and index options |
Run configuration | Compute resource and scheduling parameters |
Step 1: Create a synchronization node
Create a synchronization node for Hologres and open its configuration page before proceeding.
Step 2: Select the destination data source
On the node configuration page:
From the Data Source dropdown, select the Hologres data source you have bound.
Click Destination Management to open the management dialog. From this dialog, you can:
Manage the Hologres instance in the HoloWeb console
View and analyze historical slow queries
Diagnose and manage connections to the instance
Add a database or grant database permissions
Add or remove users and manage permissions in HoloWeb
Step 3: Select the MaxCompute source table
Parameter | Description |
Source Object Type | The default value is |
Project | The MaxCompute project that contains the data to synchronize. |
Schema | The schema within the project. |
Table | The table to synchronize. |
Filter Condition | Filters which rows to synchronize—equivalent to the |
Step 4: Configure the Hologres destination table
Parameter | Description |
Instance | Populated automatically from the data source selected in step 2. |
Database | Populated automatically from the data source selected in step 2. |
Schema | The schema to which the Hologres Internal Table belongs. |
Table | The name for the Hologres Internal Table. See table naming behavior below. |
Synchronization Field (under Field) | The fields to include in the synchronization and their data types in the destination table. |
Partition Configuration | The partition key fields for the destination table. |
Index Configuration | Index settings for faster queries. See index options below. |
Table naming behavior
If a table with the same name already exists in Hologres, the behavior depends on the table type:
Table type | Same-name table exists | Behavior |
Non-partitioned | Yes | Hologres deletes the existing table and all its data, then creates a new table. |
Partitioned | Yes | Hologres keeps the existing table and data, and creates a new partition sub-table for the incoming partition value. |
An error occurs if the schema of the new table differs from that of the existing table.
For non-partitioned tables, the existing data is permanently deleted before the new table is created. Verify the target table name carefully before running the task.
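The naming rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not the actual DataWorks or Hologres logic. Two points are assumptions: that a table is simply created when no table with the same name exists, and that the schema-mismatch error applies to the partitioned case, where the existing table is kept.

```python
def plan_destination_table(is_partitioned, table_exists, schema_matches=True):
    """Describe what happens to the destination table under the rules above."""
    if not table_exists:
        # Assumption: with no name clash, the table is simply created.
        return "create table"
    if not is_partitioned:
        # Destructive branch: the existing data is permanently deleted.
        return "drop existing table and data, then create table"
    if not schema_matches:
        # Assumption: the schema check matters when the existing table is kept.
        raise ValueError("schema of the new table differs from the existing table")
    return "keep existing table and data, create partition sub-table"


if __name__ == "__main__":
    print(plan_destination_table(is_partitioned=False, table_exists=True))
    print(plan_destination_table(is_partitioned=True, table_exists=True))
```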
Index options
Under Index Configuration, set the following properties:
Option | Description |
The table storage format: row store, column store, or hybrid row-column store. Choose based on your query pattern. | |
Time to Live (TTL) (Seconds) | How long Hologres retains data before clearing it. The TTL starts from when data is first written. Default: Permanent. |
Binlog | Whether to enable Binlog for this table. See Subscribe to Hologres binlogs. |
Binlog Time to Live | Retention period for Binlog data. Default: Permanent. |
Search for specific fields and configure their properties. |
For full details on creating indexes, see CREATE TABLE.
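For readers who prefer to see these options as SQL, the sketch below builds `set_table_property` calls of the kind Hologres uses when creating a table. The mapping from the options in this dialog to the property names `orientation`, `time_to_live_in_seconds`, `binlog.level`, and `binlog.ttl` is an assumption for illustration. The helper name and its parameters are hypothetical, and the node may configure the table differently.

```python
def build_table_properties(table, orientation="column", ttl_seconds=None,
                           binlog=False, binlog_ttl_seconds=None):
    """Return CALL statements for the chosen index options (illustrative only)."""
    if orientation not in ("row", "column", "row,column"):
        raise ValueError(f"unsupported storage format: {orientation!r}")

    calls = [f"CALL set_table_property('{table}', 'orientation', '{orientation}');"]

    # Leaving ttl_seconds as None keeps the default (permanent) retention.
    if ttl_seconds is not None:
        calls.append(
            f"CALL set_table_property('{table}', 'time_to_live_in_seconds', '{ttl_seconds}');"
        )

    if binlog:
        calls.append(f"CALL set_table_property('{table}', 'binlog.level', 'replica');")
        # A Binlog TTL only makes sense when Binlog is enabled.
        if binlog_ttl_seconds is not None:
            calls.append(
                f"CALL set_table_property('{table}', 'binlog.ttl', '{binlog_ttl_seconds}');"
            )
    return calls


if __name__ == "__main__":
    for call in build_table_properties("orders_holo", ttl_seconds=86400, binlog=True):
        print(call)
```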
Step 5: Configure advanced settings
In the Advanced section:
Parameter | Description |
GUC Parameter | Configuration parameters required before importing data from MaxCompute. For supported parameters, see GUC parameters. Other SQL statements are not supported. |
External Server | The default value is |
Step 6: Run the synchronization node
In the Run Configuration tab, set the following:
Parameter | Description |
Compute Engine Instance | The Hologres compute resource you have bound. |
Resource Group | The resource group that passed the Connectivity Test when you bound the Hologres compute resource. |
Compute CU | The number of compute units (CUs) to allocate for the task. Default: 0.25. |
Parameter | If the filter condition contains ${ParameterName} variables, configure Parameter Name and Parameter Value here. The task replaces each variable with its actual value at runtime. For details, see Node scheduling configuration. |
Click Save, then click Run.
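The `${ParameterName}` replacement described in this step can be pictured with a few lines of Python. This is a minimal sketch of the substitution idea only. The function name, the `bizdate` parameter, and the choice to raise an error for an unconfigured variable are assumptions, not documented DataWorks behavior.

```python
import re


def render_filter(condition, params):
    """Replace each ${Name} in the filter condition with its configured value."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            # Assumption: an unconfigured variable is treated as an error.
            raise KeyError(f"no value configured for parameter {name!r}")
        return str(params[name])

    return re.sub(r"\$\{(\w+)\}", substitute, condition)


if __name__ == "__main__":
    print(render_filter("ds = '${bizdate}'", {"bizdate": "20240101"}))
```

Rendering the filter locally with a sample value is a quick way to check that the resulting condition selects the intended partition before running the node.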
What's next
Schedule the node: To run the synchronization on a recurring basis, set a Scheduling Policy in the Schedule tab. See Node scheduling configuration.
Deploy to production: Click the deploy icon to open the deployment dialog. After deployment, the node runs on the configured schedule in the production environment. See Deploy a node.
FAQ
The synchronization task fails with a field data type mismatch.
Check that the data types in the Hologres destination table match the corresponding fields in the MaxCompute source table. A mismatch at configuration time causes the task to fail at runtime.
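A check of this kind can be sketched in Python by comparing the two column lists before running the task. The type pairs in `EXPECTED_TYPES` are a small illustrative subset and an assumption of this sketch. Consult the official MaxCompute-to-Hologres type mapping for the authoritative list. The function and column names are hypothetical.

```python
# Illustrative subset of MaxCompute -> Hologres type pairs (assumption, not exhaustive).
EXPECTED_TYPES = {
    "BIGINT": "BIGINT",
    "STRING": "TEXT",
    "DOUBLE": "DOUBLE PRECISION",
    "BOOLEAN": "BOOLEAN",
}


def find_type_mismatches(source_cols, dest_cols, expected=EXPECTED_TYPES):
    """Return (column, source_type, dest_type) for every column that needs review."""
    mismatches = []
    for name, src_type in source_cols.items():
        want = expected.get(src_type.upper())
        have = dest_cols.get(name)
        # Flag missing columns, unknown source types, and differing types.
        if have is None or want is None or have.upper() != want:
            mismatches.append((name, src_type, have))
    return mismatches


if __name__ == "__main__":
    source = {"id": "BIGINT", "name": "STRING", "price": "DOUBLE"}
    dest = {"id": "BIGINT", "name": "TEXT", "price": "TEXT"}
    print(find_type_mismatches(source, dest))
```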
Data is inconsistent after synchronizing a single partition.
Check the filter condition for the source partition. An incorrect filter condition is the most common cause of partial or mismatched partition data.