
Hologres: Import MaxCompute data to Hologres regularly with DataWorks

Last Updated: Mar 26, 2026

This tutorial shows you how to import a MaxCompute partitioned table into a Hologres internal partitioned table on a daily schedule, using a DataWorks Hologres SQL node. The scheduling parameter ${bizdate} drives which partition is loaded each run, so new partitions load automatically without changes to the node.

Prerequisites

Before you begin, make sure you have:

Usage notes

The Schema service must not be enabled at the tenant or project level in MaxCompute. For more information, see Schema operations.

How it works

The import pipeline runs in a DataWorks Hologres SQL node on a daily schedule:

  1. Refresh the external table metadata so Hologres sees the latest MaxCompute partition.

  2. Load the target partition's data into a temporary internal table.

  3. Swap the temporary table into the partitioned parent table as a child partition.

The partition key value (ds) is driven by the DataWorks scheduling parameter ${bizdate}, which the scheduler replaces with the business date (in yyyymmdd format) at runtime. For details on scheduling parameters, see Supported formats for scheduling parameters.
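As a concrete illustration, suppose a daily instance of this node runs on January 13, 2017. DataWorks substitutes the business date (by default, the day before the run date for daily tasks) for every occurrence of ${bizdate} before the SQL executes. The resolved statement below is hypothetical, shown only to make the substitution visible:

```sql
-- As written in the node (before substitution):
SELECT * FROM public.dwd_product_movie_basic_info WHERE ds = '${bizdate}';

-- What actually executes at runtime (illustrative; assumes a run on 2017-01-13):
SELECT * FROM public.dwd_product_movie_basic_info WHERE ds = '20170112';
```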

Data preparation

This tutorial uses the public_data.dwd_product_movie_basic_info partitioned table from the MaxCompute public dataset as the source. The table schema is:

-- DDL for the MaxCompute partitioned table
CREATE TABLE IF NOT EXISTS public_data.dwd_product_movie_basic_info(
  movie_name STRING COMMENT 'Movie name',
  director STRING COMMENT 'Director',
  scriptwriter STRING COMMENT 'Screenwriter',
  area STRING COMMENT 'Production region or country',
  actors STRING COMMENT 'Lead actors',
  `type` STRING COMMENT 'Type',
  movie_length STRING COMMENT 'Movie length',
  movie_date STRING COMMENT 'Release date',
  movie_language STRING COMMENT 'Language',
  imdb_url STRING COMMENT 'IMDb ID'
)
PARTITIONED BY (ds STRING) STORED AS ALIORC;

Set up the import pipeline

The following steps walk you through the full setup. Steps 1–3 are one-time preparation. Steps 4–6 configure and publish the scheduled job. Steps 7–8 verify the result.

  1. Prepare MaxCompute data: confirm the source partition exists and has the expected data.
  2. Create an external table in Hologres: map the MaxCompute source table so Hologres can read it.
  3. Create a managed table (internal table) in Hologres: create the destination partitioned table.
  4. Develop data for partitioned child tables: write the import SQL and choose a scenario (new or refresh).
  5. Configure scheduling: set the bizdate parameter and the daily run time.
  6. Publish the scheduling task: submit the node so the scheduler picks it up.
  7. Publish in Operation Center: trigger a historical load to verify the pipeline.
  8. View data: query Hologres to confirm the data loaded correctly.

Step 1: Prepare MaxCompute data

  1. Log on to the DataWorks console. In the target region, choose Data Analytics & Services > Data Analysis in the left navigation pane. Click Go to Data Analytics. In the left navigation pane, click the icon to open the SQL Query page.

  2. Enter the following SQL statement to verify data in the 20170112 partition, then click Run.

    SELECT * FROM public_data.dwd_product_movie_basic_info WHERE ds = '20170112';

    The query returns the movie records in the 20170112 partition.

Step 2: Create an external table in Hologres

Create an external table to map the MaxCompute source table. The field order and field types must match the MaxCompute table.

  1. Log on to the HoloWeb console SQL editor.

  2. Click the icon below the SQL Editor tab to open an Ad-hoc Query window. In the toolbar, select your Instance Name and Database.

  3. Enter the following statement and click Run. The IMPORT FOREIGN SCHEMA command creates an external table named dwd_product_movie_basic_info in Hologres that maps to the MaxCompute source table.

    IMPORT FOREIGN SCHEMA public_data LIMIT TO (dwd_product_movie_basic_info) FROM SERVER odps_server INTO public OPTIONS(if_table_exist 'update');
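To confirm the mapping works, you can optionally query the new external table from Hologres; the row count should match the source partition in MaxCompute. This is a quick sanity check, not a required step:

```sql
-- Optional: read the MaxCompute partition through the external table
SELECT COUNT(*) FROM "public".dwd_product_movie_basic_info WHERE ds = '20170112';
```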

Step 3: Create a managed table (internal table) in Hologres

Create a column-oriented partitioned internal table to store the imported data.

  1. On the HoloWeb development page, click Ad-hoc Query.

  2. Select your Instance Name and Database, enter the following statement, and click Run.

    This is a simplified example. Adjust the Data Definition Language (DDL) to match your schema requirements, and configure appropriate indexes to optimize query performance.
    BEGIN;
    CREATE TABLE "public"."holo_dwd_product_movie_basic_info" (
     "movie_name" TEXT,
     "director" TEXT,
     "scriptwriter" TEXT,
     "area" TEXT,
     "actors" TEXT,
     "type" TEXT,
     "movie_length" TEXT,
     "movie_date" TEXT,
     "movie_language" TEXT,
     "imdb_url" TEXT,
     "ds" TEXT
    )
    PARTITION BY LIST (ds);
    CALL SET_TABLE_PROPERTY('"public"."holo_dwd_product_movie_basic_info"', 'orientation', 'column');
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_name" IS 'Movie name';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."director" IS 'Director';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."scriptwriter" IS 'Screenwriter';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."area" IS 'Production region or country';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."actors" IS 'Lead actors';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."type" IS 'Type';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_length" IS 'Movie length';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_date" IS 'Release date';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_language" IS 'Language';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."imdb_url" IS 'IMDb ID';
    COMMIT;
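Because Hologres is PostgreSQL-compatible, you can optionally confirm that the parent table was created by querying the standard information_schema catalog (a sketch, assuming default catalog visibility):

```sql
-- Optional: confirm the partitioned parent table exists in the public schema
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_name = 'holo_dwd_product_movie_basic_info';
```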

Step 4: Develop data for partitioned child tables

  1. Log on to the DataWorks console and go to the DataStudio page. Create a Hologres SQL node. For more information, see Hologres SQL node.

  2. On the node editing page, enter the SQL for your import scenario. Hologres does not support writing data directly to a partitioned parent table. Instead, create a temporary partitioned child table, load data into it, and then attach it to the parent table as a child partition. The partition key value is controlled by ${bizdate}, which the DataWorks scheduler replaces at runtime.

    The shared steps (creating the temporary table, refreshing the external table metadata, and loading data) are identical for both scenarios; only the final attachment step differs.

    Shared SQL (run first in both scenarios)

    Important

    The imported data must match the partition key value. In this example, the partition key is ds. A mismatch causes an error.

    If the partition already exists, ATTACH PARTITION fails. Either use Scenario 2 to refresh it, or manually drop the existing child table before re-running.
    -- Create a temporary partitioned child table
    BEGIN;
    CREATE TABLE IF NOT EXISTS "public".tmp_holo_dwd_product_movie_basic_info_${bizdate} (
     "movie_name" TEXT,
     "director" TEXT,
     "scriptwriter" TEXT,
     "area" TEXT,
     "actors" TEXT,
     "type" TEXT,
     "movie_length" TEXT,
     "movie_date" TEXT,
     "movie_language" TEXT,
     "imdb_url" TEXT,
     "ds" TEXT
    );
    COMMIT;
    
    -- Refresh external table metadata
    IMPORT FOREIGN SCHEMA public_data LIMIT TO (dwd_product_movie_basic_info) FROM SERVER odps_server INTO public OPTIONS(if_table_exist 'update');
    
    -- Wait 30 seconds to let Hologres metadata cache sync before loading data
    SELECT pg_sleep(30);
    
    -- Load the target partition from MaxCompute into the temporary table
    INSERT INTO "public".tmp_holo_dwd_product_movie_basic_info_${bizdate}
    SELECT
        "movie_name",
        "director",
        "scriptwriter",
        "area",
        "actors",
        "type",
        "movie_length",
        "movie_date",
        "movie_language",
        "imdb_url",
        "ds"
    FROM "public".dwd_product_movie_basic_info
    WHERE ds='${bizdate}';

    After the shared SQL, append the attachment block for your scenario.

    Scenario 1: Import new partitioned data

    Use this when the target partition does not yet exist in Hologres.

    BEGIN;
    
    ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};
    
    -- Attach the child table to the parent partitioned table
    ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES IN ('${bizdate}');
    
    COMMIT;

    Scenario 2: Refresh historical partitioned data

    Use this when the target partition already exists and you need to reload it (for example, when upstream data is corrected). This scenario is idempotent: detaching, dropping, and re-attaching the child table ensures a clean reload even if the node is re-run after a failure.

    BEGIN;
    
    ALTER TABLE IF EXISTS holo_dwd_product_movie_basic_info DETACH PARTITION holo_dwd_product_movie_basic_info_${bizdate};
    
    DROP TABLE IF EXISTS holo_dwd_product_movie_basic_info_${bizdate};
    
    ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};
    
    -- Attach the refreshed child table to the parent partitioned table
    ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES IN ('${bizdate}');
    
    COMMIT;
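If you are unsure which scenario applies, you can check whether a child partition already exists before choosing. The sketch below assumes Hologres exposes the standard PostgreSQL pg_inherits catalog; if the target date appears in the result, use Scenario 2:

```sql
-- List existing child partitions of the parent table
SELECT child.relname AS partition_table
FROM pg_inherits i
JOIN pg_class parent ON parent.oid = i.inhparent
JOIN pg_class child  ON child.oid  = i.inhrelid
WHERE parent.relname = 'holo_dwd_product_movie_basic_info';
```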

Step 5: Configure scheduling

On the Hologres SQL node editing page, click Scheduling Configuration in the right-side pane.

Change only the following parameters. Keep all others at their default values.

Basic properties

Parameter: bizdate=${yyyymmdd}

${yyyymmdd} is a DataWorks built-in scheduling variable that resolves to the business date at runtime.

Time properties

Instance generation method: Generate immediately after publishing
Re-run property: Cannot re-run after successful execution; can re-run after failed execution
Scheduled time: 00:05

Scheduling dependencies

Set the root node as the scheduling dependency, or choose an existing parent node based on your business logic. Set Code Parsing to Yes, click Code Parsing to auto-parse the root node, then set Code Parsing back to No.

Step 6: Publish the scheduling task

  1. On the Hologres SQL node editing page, click the Save icon in the toolbar to save the node.

  2. Click the Submit icon to submit the node.

  3. In the Submit New Version dialog box, enter a description in the Change Description field.

  4. Click OK.

Step 7: Publish in Operation Center

  1. On the Hologres SQL editing page, click O&M on the far right of the toolbar.

  2. Go to the Operation Center page. In the left navigation pane, choose Scheduled Task O&M > Scheduled Tasks.

  3. On the Scheduled Tasks page, right-click the node and choose Backfill Data > Current Node.


  4. In the left navigation pane, choose O&M Assistant > Backfill Data to monitor the running tasks and their status.

Step 8: View data

After the task completes successfully, Hologres automatically creates a partitioned child table for the loaded partition.

  1. Go to the DataStudio page and create a Hologres SQL node. For more information, see Hologres SQL node.

  2. Enter the following queries to verify the data. Query the partitioned child table directly:

    SELECT * FROM holo_dwd_product_movie_basic_info_20170112;

    Query the total row count across the partitioned parent table:

    SELECT COUNT(*) FROM holo_dwd_product_movie_basic_info;
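To spot-check the load across every imported date, you can also group the parent table by the partition key; each ds value should report the same row count as the corresponding MaxCompute partition (a sketch):

```sql
-- Row count per imported partition
SELECT ds, COUNT(*) AS row_count
FROM holo_dwd_product_movie_basic_info
GROUP BY ds
ORDER BY ds;
```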