
MaxCompute: Create metadata mappings and data synchronization for Hologres

Last Updated: Sep 11, 2025

This topic describes how to create metadata mappings and data synchronization tasks for Hologres based on MaxCompute.

Background information

In a traditional data warehouse architecture, upstream real-time or batch data is written to a data warehouse and analyzed using an online analytical processing (OLAP) engine. This process is shown in the upper part of the following figure. However, some scenarios require you to read Hologres data in MaxCompute, as shown in the lower part of the following figure. Examples of these scenarios include the following:

  • Real-time data exposure and archiving: Data must be quickly exposed from real-time data sources to business applications. After the real-time data warehousing requirements are met, the data is archived to the corresponding layers and subject areas of the enterprise-level data warehouse.

  • Business-first approach and data backflow: Business requirements are met first without unified processing through the data warehouse. However, after the data mart stabilizes, the data must flow back to the enterprise-level data warehouse and be integrated with the DWD and DWS layers.

The data access methods for these two scenarios are as follows:

  • Browsing data in the real-time data warehouse during the iteration and improvement of the data warehouse model.

  • Periodically archiving data from the real-time data warehouse or data mart to the enterprise-level data warehouse.

  • Writing data processed by the enterprise-level data warehouse to the Hologres ADS layer for business consumption.

image

Function introduction

This tutorial shows you how to build a metadata mapping pipeline from MaxCompute to Hologres with the following features:

Note

Data type mappings differ between MaxCompute and Hologres. Some Hologres data types cannot be synchronized to MaxCompute. For more information, see Data type mappings between MaxCompute and Hologres.

  • Schema-level metadata mapping: You can use RAM role authentication to read Hologres metadata and data in real time through external schemas for schema-level data access.

  • Single-table-level metadata mapping: You can select tables in a Hologres data catalog and have the system automatically create MaxCompute external tables that map to the Hologres tables with a single click.

  • Data synchronization: You can configure tables that require recurring synchronization as data synchronization tasks with a single click. This meets the requirement of periodically synchronizing data to the enterprise-level data warehouse.

Prerequisites

Workflow for the business scenario

This tutorial is based on a real-time data warehouse built using Flink and Hologres. In the DWD layer of the Hologres real-time data warehouse, a process is added to map MaxCompute to Hologres schemas and tables and to synchronize data from tables in the DWD layer. The following figure shows the details of this process.

image

Supported:

  • Mapping to Hologres schemas using external schemas.

  • Mapping to Hologres tables using external tables.

  • Specifying Hologres tables for one-time or recurring data synchronization.

Prepare upstream data

Preparations

Important

ApsaraDB RDS for MySQL instances, Hologres instances, and Flink instances must be in the same region and same zone and use the same VPC. You can use the network detection feature in the Realtime Compute for Apache Flink console to check whether the network is connected between the Flink workspace and the ApsaraDB RDS for MySQL and Hologres instances. For more information, see Perform network detection.

  • Create an ApsaraDB RDS for MySQL instance and prepare a data source. This tutorial uses DMS to log on to the ApsaraDB RDS for MySQL instance. The key parameter configurations are as follows:

    • Instance Name: test_dbs_rds

    • MySQL Database Name: test_dbs_mysqldb

    • Database Account: test_dbs_mysqluser

    • Account Type: Select Privileged Account.

  • Create a Hologres instance and a compute group. In this tutorial, the key parameter configurations are as follows:

    • Purchase a Hologres instance:

      • Product Type: Exclusive instance (pay-as-you-go)

      • Instance Type: Compute Group Type

      • Reserved Computing Resources Of Compute Group: 64 CU

      • Instance Name: test_dbs_holo

    • Create a database:

      • Database Name: order_dw

      • Default Schema: public

    • Add a compute group:

      • Compute group name: read_warehouse_1

      • Reserved computing resources: 32 CU

  • Create a Flink workspace and catalogs. The Flink workspace created in this tutorial is named test-dbs-flink.

Create a task in Flink to synchronize an entire RDS MySQL database (ODS layer tables)

Create a whole-database sync task for RDS for MySQL in Flink to synchronize data to the public schema of the Hologres database order_dw. This lets you use the read_warehouse_1 compute group of Hologres to query the operational data store (ODS) data.

Important

You must enable binary logging for the ApsaraDB RDS for MySQL database in advance. You can run the show variables like "log_bin"; command in the ApsaraDB RDS for MySQL database to check whether binary logging is enabled. For more information, see MySQL server configurations.
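The exact DDL of the MySQL source tables is not part of this tutorial. For reference, the following is a hypothetical sketch of the three source tables; the table and column names are inferred from the downstream Flink job (the dwd_orders wide table and the DWD join logic), and the types and constraints are assumptions that you should adjust to your actual data:

```sql
-- Hypothetical source tables in test_dbs_mysqldb. Column names are inferred
-- from the dwd_orders wide table and the DWD join in this tutorial.
CREATE TABLE orders (
  order_id          BIGINT NOT NULL PRIMARY KEY,
  order_user_id     VARCHAR(64),
  order_shop_id     BIGINT,
  order_product_id  BIGINT,
  order_fee         DECIMAL(20,2),
  order_create_time DATETIME,
  order_update_time DATETIME,
  order_state       INT
);

CREATE TABLE orders_pay (
  pay_id          BIGINT NOT NULL PRIMARY KEY,
  order_id        BIGINT,
  pay_platform    INT COMMENT 'platform 0: phone, 1: pc',
  pay_create_time DATETIME
);

CREATE TABLE product_catalog (
  product_id   BIGINT NOT NULL PRIMARY KEY,
  catalog_name VARCHAR(128)
);
```

After these tables exist and contain data, the whole-database sync job below ingests them into the public schema of the order_dw database.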

  1. Log on to the Realtime Compute for Apache Flink console and select a region in the top-left corner.

  2. Create a CDAS synchronization job named ODS.

    1. On the Data Development > ETL page, create an SQL streaming job named ODS and copy the following code to the SQL editor.

      CREATE DATABASE IF NOT EXISTS dw.order_dw   -- The table_property.binlog.level parameter was set when the catalog was created. Therefore, binary logging is enabled for all tables created using CDAS.
      AS DATABASE mysqlcatalog.test_dbs_mysqldb INCLUDING all tables -- You can select the tables from the upstream database that you want to ingest into the data warehouse.
      /*+ OPTIONS('server-id'='8001-8004') */ ;   -- Specify the server-id range for the MySQL CDC source.

      Note
      • By default, this example synchronizes data to the public schema of the order_dw database. You can also synchronize data to a specified schema in the destination Hologres database. For more information, see Use as a destination catalog for CDAS. After you specify a schema, the table name format also changes when you use the catalog. For more information, see Use a Hologres catalog.

      • Schema changes to the source table become visible in the sink table only after subsequent data changes, such as insertions, updates, or deletions, occur in the source table.

    2. Click Deploy in the upper-right corner to deploy the job.

    3. In the navigation pane on the left, click Operation Center > Deployments. For the ODS job that you just deployed, click Start in the Actions column, select Stateless Start, and then click Start.

  3. Load data to the compute group.

    Table groups are used to store data in Hologres. When you use read_warehouse_1 to query data in a table group (order_dw_tg_default in this example) in the order_dw database, you must load order_dw_tg_default to the read_warehouse_1 compute group. This enables you to write data with the init_warehouse compute group and perform service queries with the read_warehouse_1 compute group.

    On the HoloWeb development page, click SQL Editor, confirm the instance name and database name, and then execute the following commands. For more information, see Create a new compute group instance. After the commands are successfully executed, you can see that the order_dw_tg_default table group has been loaded to the read_warehouse_1 compute group.

    -- View the table groups in the current database.
    SELECT tablegroup_name FROM hologres.hg_table_group_properties GROUP BY tablegroup_name;
    
    -- Load a table group to the compute group.
    CALL hg_table_group_load_to_warehouse ('order_dw.order_dw_tg_default', 'read_warehouse_1', 1);
    
    -- View the status of loading the table group to the compute group.
    SELECT * FROM hologres.hg_warehouse_table_groups;
  4. In the upper-right corner, switch the compute group to read_warehouse_1 for subsequent query and analysis.

    image

  5. In HoloWeb, execute the following commands to view the data of the three tables synchronized from MySQL to Hologres.

    -- Query data in the orders table.
    SELECT * FROM orders;
    
    -- Query data in the orders_pay table.
    SELECT * FROM orders_pay;
    
    -- Query data in the product_catalog table.
    SELECT * FROM product_catalog;

    image

Create a DWD layer table in Flink

  1. Use the Flink Catalog feature to create a wide table named dwd_orders at the DWD layer in Hologres.

    On the Query Script tab of the Data Development > Data Query page, copy the following code to the query script, select the code snippet that you want to run, and then click Run to the left of the code.

    -- The fields of the wide table must be nullable because different streams write to the same sink table, and any column may contain null values.
    CREATE TABLE dw.order_dw.dwd_orders (
      order_id bigint not null,
      order_user_id string,
      order_shop_id bigint,
      order_product_id bigint,
      order_product_catalog_name string,
      order_fee numeric(20,2),
      order_create_time timestamp,
      order_update_time timestamp,
      order_state int,
      pay_id bigint,
      pay_platform int comment 'platform 0: phone, 1: pc', 
      pay_create_time timestamp,
      PRIMARY KEY(order_id) NOT ENFORCED
    );
    
    -- You can modify the properties of a Hologres physical table using a catalog.
    ALTER TABLE dw.order_dw.dwd_orders SET (
      'table_property.binlog.ttl' = '604800' --Change the timeout period of binary logging to one week.
    );
  2. Consume the binary logs of the orders and orders_pay tables at the ODS layer in real time.

    On the Data Development > ETL page, create an SQL streaming job named DWD, copy the following code to the SQL editor, and then Deploy and Start the job. This SQL job joins the orders table with the product_catalog table and writes the results, together with payment data from the orders_pay table, to the dwd_orders table to consolidate the data in real time.

    BEGIN STATEMENT SET;
    
    INSERT INTO dw.order_dw.dwd_orders 
     (
       order_id,
       order_user_id,
       order_shop_id,
       order_product_id,
       order_fee,
       order_create_time,
       order_update_time,
       order_state,
       order_product_catalog_name
     ) SELECT o.*, dim.catalog_name 
       FROM dw.order_dw.orders as o
       LEFT JOIN dw.order_dw.product_catalog FOR SYSTEM_TIME AS OF proctime() AS dim
       ON o.order_product_id = dim.product_id;
    
    INSERT INTO dw.order_dw.dwd_orders 
      (pay_id, order_id, pay_platform, pay_create_time)
       SELECT * FROM dw.order_dw.orders_pay;
    
    END;
  3. View the data in the dwd_orders wide table.

    On the HoloWeb development page, connect to the Hologres instance and log on to the destination database. Then, execute the following command in the SQL editor.

    SELECT * FROM dwd_orders;

    image
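With the wide table in place, you can also run ad-hoc analytical queries against it in HoloWeb using the read_warehouse_1 compute group. The following is an illustrative example; the column names come from the dwd_orders DDL above, but the aggregation itself is only a sketch:

```sql
-- Illustrative ad-hoc query on the DWD wide table:
-- order count and total fee per product catalog.
SELECT order_product_catalog_name,
       COUNT(*)       AS order_cnt,
       SUM(order_fee) AS total_fee
FROM dwd_orders
GROUP BY order_product_catalog_name
ORDER BY total_fee DESC;
```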

Bind MaxCompute and Hologres computing resources in DataWorks

You can bind MaxCompute and Hologres computing resources to a DataWorks workspace and view the corresponding catalogs in Data Catalog.

  1. Create a new version of a DataWorks workspace. For more information, see Create a workspace. In this tutorial, the workspace name is set to DBS_DW_TEST.

  2. Add computing resources.

    Bind MaxCompute and Hologres computing resources. For more information, see Bind a computing engine. The computing resource names set in this tutorial are as follows:

    • MaxCompute computing resource: dbs_mc

    • Hologres computing resource: order_dw

  3. Log on to the DataWorks console and select a region in the top-left corner. In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio. On the DataStudio page, click the image icon in the navigation pane on the left to go to Data Catalog, where you can view the attached MaxCompute projects and Hologres instances.

    When you add a Hologres catalog in Data Catalog, different methods result in different behaviors, as described in the following table:

    • Add using the DataWorks Data Source pattern

      • Applicable scenario: Data Integration scenarios. For example, you can periodically synchronize data from a specific table in Hologres to MaxCompute.

      • Description: A specific database is specified as a DataWorks data source, which carries the identity of the executor. You cannot query other databases without permission.

        Note

        If you bind a Hologres computing resource after you create a new version of a DataWorks workspace, a Hologres catalog is added to Data Catalog in DataWorks Data Source mode by default.

    • Add using the Hologres - instance pattern

      • Applicable scenario: Scenarios that require browsing data across databases.

      • Description: You can use the identity carried by the computing resource to view the databases for which you have permissions, which facilitates browsing data across databases. You can create metadata mapping tasks, but cannot create synchronization tasks.

    image

    Hover the mouse pointer over the MaxCompute project name or Hologres instance name to view detailed information about the data source or instance.

    Data source mode

    image

    Hologres - Instance mode

    image

Create a MaxCompute external schema that maps to a Hologres schema

For tables mapped by an external schema, metadata and data are retrieved from Hologres in real time. You do not need to create tables that contain Data Definition Language (DDL) metadata information in MaxCompute. Changes in the Hologres source table structure or data can be detected and queried in real time.

Using this method, you can browse tables at the ODS and DWD layers of the real-time data warehouse, query data on demand, or prepare for subsequent recurring data synchronization.

  1. In the MaxCompute console, on the Tenant Management > Foreign Data Source page, create a foreign data source that connects to a Hologres database. For more information, see Create a Hologres foreign server.

    The foreign data source created in this tutorial is named dbs_holo_external.

    Important

    When you create a Hologres foreign data source:

    • For Authentication and Authorization Method, you must select Alibaba Cloud RAM Role.

    • You must set Host to a Hologres instance endpoint of the classic network type.

  2. On the Hologres page, click Go To HoloWeb in the navigation pane on the left, click Security Center in the top menu bar, and then click User Management in the navigation pane on the left. Add the RAM role as a user of the Hologres instance to ensure that it has table permissions in Hologres. For more information, see User Management.

    image

  3. On the HoloWeb development page of Hologres, grant the Developer permission (SPM mode only) for the instance to the RAM role. For more information, see Manage databases.

  4. Map the schema of the Hologres instance to a schema in the MaxCompute project and browse the tables in Hologres from MaxCompute.

    1. Log on to the DataWorks console and select a region in the top-left corner.

    2. In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio.

    3. On the DataStudio page, click the image icon in the navigation pane on the left to go to the Data Catalog.

    4. Expand the Hologres data catalog, right-click the schema of the target instance (in this tutorial, public), and select Metadata Mapping To MaxCompute.

    5. On the Metadata Mapping To MaxCompute page, configure the parameters for the Hologres source and MaxCompute destination.

      The key parameter configurations in this tutorial are as follows. You can keep the default values for other parameters.

      • Project Search Method: Select From DataWorks Data Source.

      • Data Source: Select the name of the MaxCompute computing resource that is bound to DataWorks. In this tutorial, it is dbs_mc.

      • External Schema Name: Specify the name of the external schema in MaxCompute to which the metadata from the source Hologres schema is mapped. In this tutorial, it is set to public.

      • External Data Source: Select the name of the Hologres foreign data source that is created in MaxCompute. In this tutorial, it is dbs_holo_external.

    6. Click Run above Hologres (Source).

      After the task runs successfully, you can see the MaxCompute external schema that has the same name as the Hologres schema (public).

      image

      You can directly browse the tables in Hologres and query the data in MaxCompute using the following SQL commands.

      SET odps.namespace.schema=true;
      SELECT * FROM public.dwd_orders;

      The result is as follows.

      image

      Important

      If the schema-level mapping runs successfully but the mapped table names are not displayed in the MaxCompute directory of Data Catalog and "Query Failed" is displayed, check whether the permissions of the RAM role that you created are configured correctly. For more information, see Create a RAM role.
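Because the external schema exposes the Hologres tables directly, you can also materialize a one-off archive into a MaxCompute internal table with a single CTAS statement. The following is a sketch; the target table name dwd_orders_snapshot is hypothetical:

```sql
SET odps.namespace.schema=true;

-- Hypothetical one-off archive: copy the Hologres DWD table into a
-- MaxCompute internal table by reading through the mapped external schema.
CREATE TABLE IF NOT EXISTS default.dwd_orders_snapshot AS
SELECT * FROM public.dwd_orders;
```

For recurring archiving with scheduling, use the data synchronization task described later in this topic instead.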

Create a MaxCompute external table that maps to a Hologres table

Unlike external schemas, this method requires you to create MaxCompute external tables that map to the corresponding Hologres tables. External tables support two authentication methods: RAM role and dual signature.

  • RAM role: Supports cross-account role assumption. You must complete the following operations on the Hologres side:

    • Add the RAM role as a user of the Hologres instance. For more information, see User management.

    • Grant the Developer permission (SPM mode only) for the instance to the RAM role. For more information, see Manage databases.

  • Dual signature: Uses the identity of the current task executor for authentication. This means that the current user can access Hologres data through MaxCompute external tables using the same identity that has permissions for tables in Hologres. For more information, see Hologres external tables.

You can select some or all fields for mapping. For information about mapping rules, see the tblproperties parameter section in Parameters.
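The one-click mapping generates the external table DDL for you, but such a table can also be created manually in MaxCompute. The following is a sketch of what the DDL may look like under the dual-signature method, based on the documented Hologres external table syntax; the endpoint and port are placeholders, the column list is abridged, and the exact table properties may differ in your environment:

```sql
SET odps.namespace.schema=true;

-- Sketch of a manually created Hologres external table (dual signature).
-- Replace <endpoint>:<port> with the endpoint of your Hologres instance.
CREATE EXTERNAL TABLE IF NOT EXISTS default.dwd_orders
(
  order_id      BIGINT,
  order_user_id STRING,
  order_fee     DECIMAL(20,2)
  -- ... remaining columns mapped from the Hologres table
)
STORED BY 'com.aliyun.odps.jdbc.JdbcStorageHandler'
LOCATION 'jdbc:postgresql://<endpoint>:<port>/order_dw?currentSchema=public&useSSL=false'
TBLPROPERTIES (
  'mcfed.mapreduce.jdbc.driver.class' = 'org.postgresql.Driver',
  'odps.federation.jdbc.target.db.type' = 'holo'
);
```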

Procedure:

  1. Log on to the DataWorks console and select a region in the top-left corner. In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio. On the DataStudio page, click the image icon in the navigation pane on the left to go to the Data Catalog.

  2. Expand the Hologres data catalog, right-click the dwd_orders table under the public schema of the target instance, and select Metadata Mapping To MaxCompute.

  3. On the Metadata Mapping To MaxCompute page, configure the parameters for the Hologres source and MaxCompute destination.

    The key parameter configurations in this tutorial are as follows. You can keep the default values for other parameters. For more information about the parameters, see Table-level metadata mapping.

    • Instance Search Method: Select From DataWorks Data Source.

    • Data Source: Select the name of the MaxCompute data source that is bound to DataWorks. In this tutorial, it is dbs_mc.

    • Schema: Specify the MaxCompute schema in which the external table is created. In this tutorial, it is set to default.

    • External Table: Specify the name of the external table to be created in MaxCompute. The source table data is mapped to this table. By default, the external table has the same name as the table in Hologres.

      Note

      Creating an external table is a one-time action and does not automatically refresh metadata. To refresh metadata, delete the current external table and create the metadata mapping again.

    • Permissions To Access MaxCompute External Table: Select Dual-signature.

      Note

      If you use the RAM role method, you must add users on the Hologres side and grant database permissions.

    • Lifecycle: Set the lifecycle of the table.

  4. Click Run above Hologres (Source).

    After the task runs successfully, the newly created external table is displayed under the MaxCompute schema in the navigation pane on the left.

    image

    You can use the following statements to query the data of this external table in MaxCompute.

    SET odps.namespace.schema=true;
    SELECT * FROM dwd_orders;

    The result is as follows:

    image

Create a recurring task to synchronize a Hologres table

To periodically archive DWD table data from the Hologres real-time data warehouse to an internal table in MaxCompute, you can use a data synchronization task and configure a recurring schedule.

  1. Log on to the DataWorks console and select a region in the top-left corner. In the navigation pane on the left, select Data Development And O&M > Data Development. On the page that appears, select a workspace and click Enter DataStudio. On the DataStudio page, click the image icon in the navigation pane on the left to go to the Data Development page and create a project folder.

  2. Click the image icon in the navigation pane on the left, expand the Hologres data catalog on the Data Catalog page, right-click the dwd_orders table under the public schema of the target instance, and select Data Synchronization To MaxCompute.

  3. In the Select Path Where You Want To Save Node dialog box, name the table in the data warehouse dwd_holo_orders and press Enter. On the configuration page for synchronizing data from Hologres to MaxCompute, configure the parameters for the Hologres source and the MaxCompute destination.

    The key parameter configurations in this tutorial are as follows. For more information about the parameters, see Configure a synchronization node.

    • Data Source: Select the name of the MaxCompute data source that is bound to DataWorks. In this tutorial, it is dbs_mc.

    • Schema: Select the schema where you want to store the data.

    • Table: Specify the name of the MaxCompute internal table. In this tutorial, it is set to dwd_holo_orders.

    • Lifecycle: Set the lifecycle of the table.

    • Import Method: Select the method for writing data to the MaxCompute internal table:

      • Overwrite: Deletes the existing data and writes new data to the target table.

      • Append: Preserves existing data and appends new data to the target table.

    • Permissions To Access Hologres: Choose one of the following methods to access the Hologres instance as needed:

      • Dual-signature access method: Performs Hologres permission verification using your current identity. You must make sure that the current identity has the required permissions on the MaxCompute table and on the Hologres source table that is mapped to it. For information about access control in MaxCompute, see Data Lakehouse Solution 2.0 user guide. For information about access control in Hologres, see Permission management overview.

      • Access via RAM role: Authenticates access by specifying a RAM role. Grant the AliyunSTSAssumeRoleAccess policy to the RAM user. For more information, see RAM role authorization pattern. After the authorization is complete, configure the specified RAM role in the RamRole field.

  4. Click Scheduling Configuration on the right side of the page. On the Scheduling Configuration page, configure the workflow and scheduling cycle. For more information, see Configure node scheduling.

  5. Click Run above Hologres (Source).

    After the task runs successfully, the newly created internal table is displayed under MaxCompute in the navigation pane on the left. You can use the following SQL statements to query the data of this table in MaxCompute.

    SET odps.namespace.schema=true;
    SELECT * FROM default.dwd_holo_orders;

    The result is as follows:

    image
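The Overwrite and Append import methods correspond to MaxCompute's INSERT OVERWRITE and INSERT INTO semantics. As a sketch, manually reloading the same internal table would look like the following; the FROM clause assumes the public external schema created earlier in this topic, so adjust it to your own setup:

```sql
SET odps.namespace.schema=true;

-- Overwrite: replace all existing data in the internal table.
INSERT OVERWRITE TABLE default.dwd_holo_orders
SELECT * FROM public.dwd_orders;

-- Append: keep existing rows and add the new result set.
INSERT INTO TABLE default.dwd_holo_orders
SELECT * FROM public.dwd_orders;
```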