Use the Schema Synchronization from MaxCompute node in DataStudio to batch-create Hologres external tables that mirror the schemas of your MaxCompute internal tables. Once created, those external tables let you run accelerated queries against MaxCompute data directly from Hologres, without manually writing IMPORT FOREIGN SCHEMA statements.
Hologres is an end-to-end real-time data warehousing service developed by Alibaba Cloud and is seamlessly connected to MaxCompute at the underlying layer.
This node supports query acceleration for MaxCompute internal tables only. MaxCompute external tables and views are not supported.
Prerequisites
Before you begin, ensure that you have:
Create and configure the node
The end-to-end process has three steps: create the node, configure it, and run it. The following steps were performed in the China (Shanghai) region; other regions follow the same flow.
Step 1: Create the node
- Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Development and O&M > Data Development, select the target workspace, and click Go to Data Development to open DataStudio.
- Create a workflow if you don't have one. Move the pointer over the icon and select Create Workflow. In the Create Workflow dialog box, set Workflow Name and click Create.
- Create a Schema Synchronization from MaxCompute node. Move the pointer over the icon and choose Create Node > Hologres > Schema Synchronization from MaxCompute. Alternatively, right-click an existing workflow and choose Create Node > Hologres > Schema Synchronization from MaxCompute. In the Create Node dialog box, set Name, Engine Instance, Node Type, and Path, then click Confirm. The configuration tab opens.
Step 2: Configure the node
On the configuration tab, fill in three sections.
Destination information
Specify the Hologres compute engine where the external tables will be stored.
| Parameter | Description |
|---|---|
| Destination Name | Name of the Hologres compute engine |
| Destination Database | Name of the database in the compute engine |
| Schema | Name of the schema in the database. Default: public |
Source (batch create tables based on the following data)
Specify the MaxCompute tables whose schemas you want to replicate.
| Parameter | Description |
|---|---|
| Type | Type of the source table. Only MaxCompute is supported. |
| Servers | Server where the source tables reside. Use the built-in odps_server server, which is based on postgres_fdw. |
| Source Project | Name of the MaxCompute project that contains the source tables |
| Select Tables for Query Acceleration | Tables to replicate. Choose All Tables in Database to replicate all tables in the project, or Selected Tables to pick specific tables by name. Fuzzy match is supported: enter a keyword to list all tables whose names contain it. |
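For orientation, the source settings above correspond roughly to the pieces of a hand-written IMPORT FOREIGN SCHEMA statement. The following Python sketch assembles such a statement from a project name, a destination schema, and a list of tables picked by keyword. The function names, the sample project and table names, and the case-insensitive matching are illustrative assumptions, not the node's actual implementation.

```python
import re

# Conservative check for unquoted SQL identifiers (an assumption for this sketch).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name):
    """Return the name unchanged if it is a simple identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def match_tables(all_tables, keyword):
    """Keep table names that contain the keyword (case-insensitive here)."""
    needle = keyword.lower()
    return [t for t in all_tables if needle in t.lower()]


def build_import_statement(source_project, target_schema="public",
                           tables=None, server="odps_server"):
    """Assemble an IMPORT FOREIGN SCHEMA statement as a string."""
    parts = [f"IMPORT FOREIGN SCHEMA {_check_ident(source_project)}"]
    if tables:
        # A LIMIT TO clause restricts the import to the selected tables.
        table_list = ", ".join(_check_ident(t) for t in tables)
        parts.append(f"LIMIT TO ({table_list})")
    parts.append(f"FROM SERVER {_check_ident(server)}")
    parts.append(f"INTO {_check_ident(target_schema)}")
    return " ".join(parts) + ";"


# Hypothetical project and table names, for illustration only.
selected = match_tables(["orders", "order_items", "customers"], "order")
print(build_import_statement("my_mc_project", "public", selected))
# IMPORT FOREIGN SCHEMA my_mc_project LIMIT TO (orders, order_items) FROM SERVER odps_server INTO public;
```

Omitting the `tables` argument drops the LIMIT TO clause, which mirrors the choice to replicate every table in the project.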
Advanced settings
Choose how to handle conflicts when creating external tables.
| Parameter | Options |
|---|---|
| Action for Table Name Conflicts | Ignore Conflicts and Continue Creating Tables |
| | Update and Change Names of Tables with Same Names |
| | Report Error and Create No Table |
| Data Type Not Supported | Report Error and Import Failed: The Hologres external table fails to be created. |
| | Ignore and Skip Unsupported Fields: The system skips fields whose data types are not supported and continues to create the Hologres external table. |
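If you were writing the statement by hand, these two settings would plausibly be expressed through the `if_table_exist` and `if_unsupported_type` options of IMPORT FOREIGN SCHEMA in Hologres. The sketch below maps the UI labels to option values. The mapping is an assumption inferred from the labels, not something this page states, so verify it against the Hologres syntax reference before relying on it.

```python
# Assumed mapping from UI labels to IMPORT FOREIGN SCHEMA option values.
TABLE_CONFLICT_VALUES = {
    "Ignore Conflicts and Continue Creating Tables": "ignore",
    "Update and Change Names of Tables with Same Names": "update",
    "Report Error and Create No Table": "error",
}

UNSUPPORTED_TYPE_VALUES = {
    "Report Error and Import Failed": "error",
    "Ignore and Skip Unsupported Fields": "skip",
}


def build_options_clause(table_conflict, unsupported_type):
    """Translate the two advanced settings into an OPTIONS clause string."""
    if table_conflict not in TABLE_CONFLICT_VALUES:
        raise ValueError(f"unknown conflict action: {table_conflict!r}")
    if unsupported_type not in UNSUPPORTED_TYPE_VALUES:
        raise ValueError(f"unknown unsupported-type action: {unsupported_type!r}")
    return (
        f"OPTIONS (if_table_exist '{TABLE_CONFLICT_VALUES[table_conflict]}', "
        f"if_unsupported_type '{UNSUPPORTED_TYPE_VALUES[unsupported_type]}')"
    )


print(build_options_clause(
    "Ignore Conflicts and Continue Creating Tables",
    "Ignore and Skip Unsupported Fields",
))
# OPTIONS (if_table_exist 'ignore', if_unsupported_type 'skip')
```

The resulting clause would be placed at the end of the IMPORT FOREIGN SCHEMA statement, before the terminating semicolon.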
Step 3: Save and run the node
- In the top navigation bar of the configuration tab, click the icon to save the configuration.
- Click the icon to create the Hologres external tables.
You must select a serverless resource group that is connected to the Hologres compute engine to run the node. For more information, see Network connectivity solutions.
What to do next
After the external tables are created:
- View the tables: Go to the Workspace Tables pane in DataStudio. For details, see Manage tables.
- Query MaxCompute data: Run Hologres commands to query data through the external tables. For details, see Create a foreign table in Hologres to accelerate queries on MaxCompute data.
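As a quick check that an external table is usable, you might preview a few rows from it. The sketch below only builds the query text; the schema and table names are hypothetical, and running the query requires a client connected to your Hologres instance, which is outside the scope of this example.

```python
import re

# Conservative check for unquoted SQL identifiers (an assumption for this sketch).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_preview_query(schema, table, limit=10):
    """Build a SELECT that previews the first rows of an external table."""
    for name in (schema, table):
        if not _IDENT.match(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {schema}.{table} LIMIT {limit};"


# "public" is the default destination schema; "orders" is a made-up table name.
print(build_preview_query("public", "orders", 5))
# SELECT * FROM public.orders LIMIT 5;
```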