This topic describes how to create metadata mappings and data synchronization tasks for Hologres based on MaxCompute.
Background information
In a traditional data warehouse architecture, upstream real-time or batch data is written to a data warehouse and analyzed using an online analytical processing (OLAP) engine. This process is shown in the upper part of the following figure. However, some scenarios require you to read Hologres data in MaxCompute, as shown in the lower part of the following figure. Examples of these scenarios include the following:
Real-time data exposure and archiving: Data must be quickly exposed from real-time data sources to business applications. After the real-time data warehousing requirements are met, the data is archived to the corresponding layers and subject areas of the enterprise-level data warehouse.
Business-first approach and data backflow: Business requirements are met first without unified processing through the data warehouse. However, after the data mart stabilizes, the data must flow back to the enterprise-level data warehouse and be integrated with the DWD and DWS layers.
The data access methods for these two scenarios are as follows:
Browsing data in the real-time data warehouse during the iteration and improvement of the data warehouse model.
Periodically archiving data from the real-time data warehouse or data mart to the enterprise-level data warehouse.
Writing data processed by the enterprise-level data warehouse to the Hologres ADS layer for business consumption.

Function introduction
This tutorial shows you how to build a metadata mapping pipeline from MaxCompute to Hologres with the following features:
Note: Data type mappings differ between MaxCompute and Hologres, and some Hologres data types cannot be synchronized to MaxCompute. For more information, see Data type mappings between MaxCompute and Hologres.
Schema-level metadata mapping: You can use RAM role authentication to read Hologres metadata and data in real time through external schemas for schema-level data access.
Single-table-level metadata mapping: You can select tables in a Hologres data catalog and have the system automatically create MaxCompute external tables that map to the Hologres tables with a single click.
Data synchronization: You can configure tables that require recurring synchronization as data synchronization tasks with a single click. This meets the requirement of periodically synchronizing data to the enterprise-level data warehouse.
Prerequisites
A MaxCompute project is created, and the schema feature is enabled.
A RAM role is created and configured with a trust policy.
Workflow for the business scenario
This tutorial is based on a real-time data warehouse built using Flink and Hologres. In the DWD layer of the Hologres real-time data warehouse, a process is added to map MaxCompute to Hologres schemas and tables and to synchronize data from tables in the DWD layer. The following figure shows the details of this process.
The following operations are supported:
Mapping to Hologres schemas using external schemas.
Mapping to Hologres tables using external tables.
Specifying Hologres tables for one-time or recurring data synchronization.
Prepare upstream data
Preparations
ApsaraDB RDS for MySQL instances, Hologres instances, and Flink instances must be in the same region and same zone and use the same VPC. You can use the network detection feature in the Realtime Compute for Apache Flink console to check whether the network is connected between the Flink workspace and the ApsaraDB RDS for MySQL and Hologres instances. For more information, see Perform network detection.
Create an ApsaraDB RDS for MySQL instance and prepare a data source. This tutorial uses DMS to log on to the ApsaraDB RDS for MySQL instance. The key parameter configurations are as follows:
Instance Name: test_dbs_rds
MySQL Database Name: test_dbs_mysqldb
Database Account: test_dbs_mysqluser
Account Type: Select Privileged Account.
Create a Hologres instance and a compute group. In this tutorial, the key parameter configurations are as follows:
Purchase a Hologres instance:
Product Type: Exclusive instance (pay-as-you-go)
Instance Type: Compute Group Type
Reserved Computing Resources Of Compute Group: 64 CU
Instance Name: test_dbs_holo
Create a database:
Database Name: order_dw
Default Schema: public
Add a compute group:
Compute group name: read_warehouse_1
Reserved computing resources: Set to 32 CU.
Create a Flink workspace and catalogs. The Flink workspace created in this tutorial is named test-dbs-flink.
Create a task in Flink to synchronize an entire RDS MySQL database (ODS layer tables)
Create a whole-database sync task for RDS for MySQL in Flink to synchronize data to the public schema of the Hologres database order_dw. This lets you use the read_warehouse_1 compute group of Hologres to query the operational data store (ODS) data.
You must enable binary logging for the ApsaraDB RDS for MySQL database in advance. You can run the show variables like "log_bin"; command in the ApsaraDB RDS for MySQL database to check whether binary logging is enabled. For more information, see MySQL server configurations.
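The binary logging check described above can be run directly in DMS before you start the synchronization job. The value returned for log_bin must be ON; if it is OFF, enable binary logging first:

```sql
-- Run on the ApsaraDB RDS for MySQL instance.
-- The Value column must show ON before the CDAS job is started.
SHOW VARIABLES LIKE 'log_bin';
```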
Log on to the Realtime Compute for Apache Flink console and select a region in the top-left corner.
Create a CDAS synchronization job named ODS.
On the page, create an SQL streaming job named ODS and copy the following code to the SQL editor.
CREATE DATABASE IF NOT EXISTS dw.order_dw -- The table_property.binlog.level parameter was set when the catalog was created. Therefore, binary logging is enabled for all tables created using CDAS.
AS DATABASE mysqlcatalog.test_dbs_mysqldb
INCLUDING all tables -- You can select the tables from the upstream database that you want to ingest into the data warehouse.
/*+ OPTIONS('server-id'='8001-8004') */ -- Specify the server-id range for the MySQL CDC source.
;
Note: By default, this example synchronizes data to the public schema of the order_dw database. You can also synchronize data to a specified schema in the destination Hologres database. For more information, see Use as a destination catalog for CDAS. After you specify a schema, the table name format also changes when you use the catalog. For more information, see Use a Hologres catalog.
If the schema of the source table changes, the structural changes become visible in the sink table only after data changes, such as deletions, insertions, or updates, occur in the source table.
Click Deploy in the upper-right corner to deploy the job.
In the navigation pane on the left, click . For the ODS job that you just deployed, click Start in the Actions column, select Stateless Start, and then click Start.
Load data to the compute group.
Table groups are used to store data in Hologres. When you use read_warehouse_1 to query data in a table group (order_dw_tg_default in this example) in the order_dw database, you must load order_dw_tg_default to the read_warehouse_1 compute group. This enables you to write data with the init_warehouse compute group and perform service queries with the read_warehouse_1 compute group.
On the HoloWeb developer page, click SQL Editor, confirm the instance name and database name, and then execute the following commands. For more information, see Create a new compute group instance. After the commands are successfully executed, you can see that the order_dw_tg_default table group has been loaded to the read_warehouse_1 compute group.
-- View the table groups in the current database.
SELECT tablegroup_name FROM hologres.hg_table_group_properties GROUP BY tablegroup_name;
-- Load a table group to the compute group.
CALL hg_table_group_load_to_warehouse ('order_dw.order_dw_tg_default', 'read_warehouse_1', 1);
-- View the status of loading the table group to the compute group.
SELECT * FROM hologres.hg_warehouse_table_groups;
In the upper-right corner, switch the compute group to read_warehouse_1 for subsequent query and analysis.

In HoloWeb, execute the following commands to view the data of the three tables synchronized from MySQL to Hologres.
-- Query data in the orders table.
SELECT * FROM orders;
-- Query data in the orders_pay table.
SELECT * FROM orders_pay;
-- Query data in the product_catalog table.
SELECT * FROM product_catalog;
Create a DWD layer table in Flink
Use the Flink Catalog feature to create a wide table named dwd_orders at the DWD layer in Hologres.
On the Query Script tab of the page, copy the following code to the query script, select the target segment, and then click Run to the left of the code line.
-- The fields of the wide table must be nullable because different streams write to the same sink table, and any column may contain null values.
CREATE TABLE dw.order_dw.dwd_orders (
    order_id bigint not null,
    order_user_id string,
    order_shop_id bigint,
    order_product_id bigint,
    order_product_catalog_name string,
    order_fee numeric(20,2),
    order_create_time timestamp,
    order_update_time timestamp,
    order_state int,
    pay_id bigint,
    pay_platform int comment 'platform 0: phone, 1: pc',
    pay_create_time timestamp,
    PRIMARY KEY(order_id) NOT ENFORCED
);
-- You can modify the properties of a Hologres physical table using a catalog.
ALTER TABLE dw.order_dw.dwd_orders SET (
    'table_property.binlog.ttl' = '604800' -- Change the timeout period of binary logging to one week.
);
Consume the binary logs of the orders and orders_pay tables at the ODS layer in real time.
On the page, create an SQL stream job named DWD, copy the following code to the SQL editor, and then Deploy and Start the job. This SQL job joins the orders table with the product_catalog table and writes the final results to the dwd_orders table to consolidate the data in real time.
BEGIN STATEMENT SET;
INSERT INTO dw.order_dw.dwd_orders (
    order_id, order_user_id, order_shop_id, order_product_id,
    order_fee, order_create_time, order_update_time, order_state,
    order_product_catalog_name
)
SELECT o.*, dim.catalog_name
FROM dw.order_dw.orders AS o
LEFT JOIN dw.order_dw.product_catalog FOR SYSTEM_TIME AS OF proctime() AS dim
ON o.product_id = dim.product_id;
INSERT INTO dw.order_dw.dwd_orders (pay_id, order_id, pay_platform, pay_create_time)
SELECT * FROM dw.order_dw.orders_pay;
END;
View the data in the dwd_orders wide table.
On the HoloWeb development page, connect to the Hologres instance and log on to the destination database. Then, execute the following command in the SQL editor.
SELECT * FROM dwd_orders;
Bind MaxCompute and Hologres computing resources in DataWorks
You can bind MaxCompute and Hologres computing resources to a DataWorks workspace and view the corresponding catalogs in Data Catalog.
Create a new version of a DataWorks workspace. For more information, see Create a workspace. In this tutorial, the workspace name is set to DBS_DW_TEST.
Add computing resources.
Bind MaxCompute and Hologres computing resources. For more information, see Bind a computing engine. The computing resource names set in this tutorial are as follows:
MaxCompute computing resource name: dbs_mc
Hologres computing resource name: order_dw
Log on to the DataWorks console and select a region in the top-left corner. In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio. On the DataStudio page, click the icon in the navigation pane on the left to go to Data Catalog, where you can view the attached MaxCompute projects and Hologres instances.
When you add a Hologres catalog in Data Catalog, different methods result in different behaviors, as described below:
Add using the DataWorks Data Source pattern:
Applicable scenario: Suitable for Data Integration scenarios. For example, you can periodically synchronize data from a specific table in Hologres to MaxCompute.
Description: A specific database is specified as a DataWorks data source, which carries the identity of the executor. You cannot query other databases without permission.
Note: If you bind a Hologres computing resource after you create a new version of a DataWorks workspace, a Hologres catalog is added to Data Catalog in DataWorks Data Source mode by default.
Add using the Hologres - instance pattern:
Applicable scenario: Suitable for scenarios that require browsing data across databases.
Description: You can use the identity carried by the computing resource to view databases for which you have permissions. This method facilitates browsing data across databases. You can create metadata mapping tasks, but cannot create synchronization tasks.

Hover the mouse pointer over the MaxCompute project name or Hologres instance name to view detailed information about the data source or instance.
Data source mode

Hologres - Instance mode

Create a MaxCompute external schema that maps to a Hologres schema
For tables mapped by an external schema, metadata and data are retrieved from Hologres in real time. You do not need to create tables that contain Data Definition Language (DDL) metadata information in MaxCompute. Changes in the Hologres source table structure or data can be detected and queried in real time.
Using this method, you can browse tables at the ODS and DWD layers of the real-time data warehouse, query data on demand, or prepare for subsequent recurring data synchronization.
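For reference, the schema-level mapping that the following procedure configures in the DataWorks UI can also be expressed in MaxCompute SQL. The statement below is a sketch only: the CREATE EXTERNAL SCHEMA syntax is an assumption based on the general MaxCompute external schema pattern, and the dbs_holo_external, order_dw, and public names come from this tutorial's setup, so verify the exact syntax against the MaxCompute external schema documentation before running it.

```sql
-- Sketch (syntax is an assumption; verify against the MaxCompute documentation):
-- map the public schema of the Hologres database order_dw into MaxCompute
-- through the dbs_holo_external foreign data source.
CREATE EXTERNAL SCHEMA IF NOT EXISTS public
WITH dbs_holo_external
ON 'order_dw.public';

-- Tables in the external schema can then be queried directly.
SET odps.namespace.schema=true;
SELECT * FROM public.dwd_orders;
```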
In the MaxCompute console, on the page, create a foreign data source that connects to a Hologres database. For more information, see Create a Hologres foreign server.
The external data source created in this tutorial is named dbs_holo_external.
Important: When you create a Hologres foreign data source:
For Authentication and Authorization Method, you must select Alibaba Cloud RAM Role.
You must set Host to a Hologres instance endpoint of the classic network type.
On the Hologres page, click Go To HoloWeb in the navigation pane on the left, click Security Center in the top menu bar, and then click User Management in the navigation pane on the left. Add the RAM role as a user of the Hologres instance to ensure that it has table permissions in Hologres. For more information, see User Management.

On the HoloWeb development page of Hologres, grant the Developer permission (SPM mode only) for the instance to the RAM role. For more information, see Manage databases.
Map the schema of the Hologres instance to a schema in the MaxCompute project and browse the tables in Hologres from MaxCompute.
Log on to the DataWorks console and select a region in the top-left corner.
In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio.
On the DataStudio page, click the icon in the navigation pane on the left to go to the Data Catalog.
Expand the Hologres data catalog, right-click the schema of the target instance (public in this tutorial), and select Metadata Mapping To MaxCompute.
On the Metadata Mapping To MaxCompute page, configure the parameters for the Hologres source and MaxCompute destination.
The key parameter configurations in this tutorial are as follows. You can keep the default values for other parameters.
Project Search Method: Select From DataWorks Data Source.
Data Source: Select the name of the MaxCompute computing resource that is bound to DataWorks. In this tutorial, it is dbs_mc.
External Schema Name: Specify the name of the external schema in MaxCompute to which the metadata from the source Hologres schema is mapped. In this tutorial, it is configured as public.
External Data Source: Select the name of the Hologres foreign data source that is created in MaxCompute. In this tutorial, it is dbs_holo_external.
Click Run above Hologres (Source).
After the task runs successfully, you can see the MaxCompute external schema that has the same name as the Hologres schema (public).
You can directly browse the tables in Hologres and query the data in MaxCompute using the following SQL commands.
SET odps.namespace.schema=true;
SELECT * FROM public.dwd_orders;
The result is as follows.
Important: If the schema-level mapping runs successfully but the mapped table names are not displayed in the MaxCompute directory of Data Catalog and "Query Failed" is displayed, check whether the permissions of the RAM role that you created are configured correctly. For more information, see Create a RAM role.
Create a MaxCompute external table that maps to a Hologres table
Unlike external schemas, external tables require you to create an external table in MaxCompute for each Hologres table that you want to map. External tables support two authentication methods: RAM role and dual signature.
RAM role: Supports cross-account role assumption. You must complete the following operations on the Hologres side:
Add the RAM role as a user of the Hologres instance. For more information, see User management.
Grant the Developer permission (SPM mode only) for the instance to the RAM role. For more information, see Manage databases.
Dual signature: Uses the identity of the current task executor for authentication. This means that the current user can access Hologres data through MaxCompute external tables using the same identity that has permissions for tables in Hologres. For more information, see Hologres external tables.
You can select some or all fields for mapping. For information about mapping rules, see the tblproperties parameter section in Parameters.
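For reference, the external table that the one-click mapping generates is conceptually similar to a manually created Hologres external table in MaxCompute. The DDL below is a hedged sketch based on this tutorial's dwd_orders table: the storage handler class, JDBC endpoint, and table properties are assumptions drawn from the general Hologres external table pattern, so treat the generated DDL or the Hologres external tables documentation as the authoritative form.

```sql
-- Sketch only: a MaxCompute external table mapping the Hologres table
-- public.dwd_orders. The endpoint and properties below are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS dwd_orders (
    order_id          BIGINT,
    order_user_id     STRING,
    order_fee         DECIMAL(20,2),
    order_create_time TIMESTAMP
    -- ... remaining columns of the Hologres table ...
)
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
LOCATION 'jdbc:postgresql://<hologres-endpoint>:<port>/order_dw?currentSchema=public&table=dwd_orders/'
TBLPROPERTIES (
    'mcfed.mapreduce.jdbc.driver.class' = 'org.postgresql.Driver',
    'odps.federation.jdbc.target.db.type' = 'holo'
);
```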
Procedure:
Log on to the DataWorks console and select a region in the top-left corner. In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio. On the DataStudio page, click the icon in the navigation pane on the left to go to the Data Catalog.
Expand the Hologres data catalog, right-click the dwd_orders table under the public schema of the target instance, and select Metadata Mapping To MaxCompute.
On the Metadata Mapping To MaxCompute page, configure the parameters for the Hologres source and MaxCompute destination.
The key parameter configurations in this tutorial are as follows. You can keep the default values for other parameters. For more information about the parameters, see Table-level metadata mapping.
Instance Search Method: Select From DataWorks Data Source.
Data Source: Select the name of the MaxCompute data source that is bound to DataWorks. In this tutorial, the name is dbs_mc.
Schema: Specify the MaxCompute schema in which the external table is created. In this tutorial, it is configured as default.
External Table: Specify the name of the external table to be created in MaxCompute. The source table data is mapped to this table. By default, the external table has the same name as the table in Hologres.
Note: Creating an external table is a one-time action and does not automatically refresh metadata. To refresh metadata, you must delete the current external table and manually create the metadata mapping again.
Permissions To Access MaxCompute External Table: Select Dual-signature.
Note: If you use the RAM role method, you must add users on the Hologres side and grant database permissions.
Lifecycle: Set the lifecycle of the table.
Click Run above Hologres (Source).
After the task runs successfully, the newly created external table is displayed under the MaxCompute schema in the navigation pane on the left.
You can use the following statements to query the data of this external table in MaxCompute.
SET odps.namespace.schema=true;
SELECT * FROM dwd_orders;
The result is as follows:

Create a recurring task to synchronize a Hologres table
To periodically archive DWD table data from the Hologres real-time data warehouse to an internal table in MaxCompute, you can use a data synchronization task and configure a recurring schedule.
Log on to the DataWorks console and select a region in the top-left corner. In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio. On the DataStudio page, click the icon in the navigation pane on the left to go to the Data Development page and create a project folder.
Click the icon in the navigation pane on the left, expand the Hologres data catalog on the Data Catalog page, right-click the dwd_orders table under the public schema of the target instance, and select Data Synchronization To MaxCompute.
In the Select Path Where You Want To Save Node dialog box, name the table in the data warehouse dwd_holo_orders and press Enter. On the configuration page for synchronizing data from Hologres to MaxCompute, configure the parameters for the Hologres source and the MaxCompute destination.
The key parameter configurations in this tutorial are as follows. For more information about the parameters, see Configure a synchronization node.
Data Source: Select the name of the MaxCompute data source that is bound to DataWorks. In this tutorial, it is dbs_mc.
Schema: Select the schema where you want to store the data.
Table: Specify the name of the MaxCompute internal table. In this tutorial, it is configured as dwd_holo_orders.
Lifecycle: Set the lifecycle of the table.
Import Method: Select the method for writing data to the MaxCompute internal table:
Overwrite: Deletes the original data and writes new data to the target table.
Append: Preserves existing data and appends new data to the target table.
Permissions To Access Hologres: You can choose one of the following methods to access the Hologres instance as needed:
Dual-signature access method: Performs Hologres permission verification using your current identity. You must make sure that the current identity has read permissions on the MaxCompute table and permissions on the Hologres source table that is mapped to the MaxCompute table. For information about access control in MaxCompute, see Data Lakehouse Solution 2.0 user guide. For information about access control in Hologres, see Permission management overview.
Access via RAM role: Authenticates access by specifying a RAM role. Grant the AliyunSTSAssumeRoleAccess policy to the RAM user. For more information, see RAM role authorization pattern. After the authorization is complete, configure the specified RAM role in the RamRole field.
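The Overwrite and Append import methods described above correspond to the two standard MaxCompute write semantics. The statements below illustrate the difference using this tutorial's dwd_holo_orders internal table and the public.dwd_orders external schema mapping; this is a sketch of the semantics only, since the synchronization task generates its own write logic.

```sql
SET odps.namespace.schema=true;

-- Overwrite: each run replaces the existing contents of the internal table.
INSERT OVERWRITE TABLE default.dwd_holo_orders
SELECT * FROM public.dwd_orders;

-- Append: each run adds new rows while keeping the existing data.
INSERT INTO TABLE default.dwd_holo_orders
SELECT * FROM public.dwd_orders;
```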
Click Scheduling Configuration on the right side of the page. On the Scheduling Configuration page, configure the workflow and scheduling cycle. For more information, see Configure node scheduling.
Click Run above Hologres (Source).
After the task runs successfully, the newly created internal table is displayed under MaxCompute in the navigation pane on the left. You can use the following SQL statements to query the data of this table in MaxCompute.
SET odps.namespace.schema=true;
SELECT * FROM default.dwd_holo_orders;
The result is as follows:
