
Hologres: Import MaxCompute data to Hologres regularly with DataWorks

Last Updated: Mar 26, 2026

This tutorial shows you how to import a MaxCompute partitioned table into a Hologres internal partitioned table on a daily schedule, using a DataWorks Hologres SQL node. The scheduling parameter ${bizdate} drives which partition is loaded each run, so new partitions load automatically without changes to the node.

Prerequisites

Before you begin, make sure you have:

Usage notes

The Schema service must not be enabled at the tenant or project level in MaxCompute. For more information, see Schema operations.

How it works

The import pipeline runs in a DataWorks Hologres SQL node on a daily schedule:

  1. Refresh the external table metadata so Hologres sees the latest MaxCompute partition.

  2. Load the target partition's data into a temporary internal table.

  3. Swap the temporary table into the partitioned parent table as a child partition.

The partition key value (ds) is driven by the DataWorks scheduling parameter ${bizdate}, which the scheduler replaces with the business date (in yyyymmdd format) at runtime. For details on scheduling parameters, see Supported formats for scheduling parameters.
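As a concrete illustration, suppose a daily instance of this node runs on January 13, 2017. DataWorks substitutes the business date (by default, the day before the run date for daily tasks) for every occurrence of ${bizdate} before the SQL executes. The resolved statement below is hypothetical, shown only to make the substitution visible:

```sql
-- As written in the node (before substitution):
SELECT * FROM public.dwd_product_movie_basic_info WHERE ds = '${bizdate}';

-- What actually executes at runtime (illustrative; assumes a run on 2017-01-13):
SELECT * FROM public.dwd_product_movie_basic_info WHERE ds = '20170112';
```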

Data preparation

This tutorial uses the public_data.dwd_product_movie_basic_info partitioned table from the MaxCompute public dataset as the source. The table schema is:

-- DDL for the MaxCompute partitioned table
CREATE TABLE IF NOT EXISTS public_data.dwd_product_movie_basic_info(
  movie_name STRING COMMENT 'Movie name',
  director STRING COMMENT 'Director',
  scriptwriter STRING COMMENT 'Screenwriter',
  area STRING COMMENT 'Production region or country',
  actors STRING COMMENT 'Lead actors',
  `type` STRING COMMENT 'Type',
  movie_length STRING COMMENT 'Movie length',
  movie_date STRING COMMENT 'Release date',
  movie_language STRING COMMENT 'Language',
  imdb_url STRING COMMENT 'IMDb ID'
)
PARTITIONED BY (ds STRING) STORED AS ALIORC;

Set up the import pipeline

The following steps walk you through the full setup. Steps 1–3 are one-time preparation. Steps 4–6 configure and publish the scheduled job. Steps 7–8 verify the result.

  1. Prepare MaxCompute data: confirm the source partition exists and has the expected data.
  2. Create an external table in Hologres: map the MaxCompute source table so Hologres can read it.
  3. Create a managed table (internal table) in Hologres: create the destination partitioned table.
  4. Develop data for partitioned child tables: write the import SQL and choose a scenario (new or refresh).
  5. Configure scheduling: set the bizdate parameter and the daily run time.
  6. Publish the scheduling task: submit the node so the scheduler picks it up.
  7. Publish in Operation Center: trigger a historical load to verify the pipeline.
  8. View data: query Hologres to confirm the data loaded correctly.

Step 1: Prepare MaxCompute data

  1. Log on to the DataWorks console. In the target region, choose Data Analytics & Services > Data Analysis in the left navigation pane. Click Go to Data Analytics. In the left navigation pane, click the icon to open the SQL Query page.

  2. Enter the following SQL statement to verify data in the 20170112 partition, then click Run.

    SELECT * FROM public_data.dwd_product_movie_basic_info WHERE ds = '20170112';

    The query returns the movie records in the 20170112 partition.

Step 2: Create an external table in Hologres

Create an external table to map the MaxCompute source table. The field order and field types must match the MaxCompute table.

  1. Log on to the HoloWeb console SQL editor.

  2. Click the icon below the SQL Editor tab to open an Ad-hoc Query window. In the toolbar, select your Instance Name and Database.

  3. Enter the following statement and click Run. The IMPORT FOREIGN SCHEMA command creates an external table named dwd_product_movie_basic_info in Hologres that maps to the MaxCompute source table.

    IMPORT FOREIGN SCHEMA public_data LIMIT TO (dwd_product_movie_basic_info) FROM SERVER odps_server INTO public OPTIONS(if_table_exist 'update');
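To confirm the mapping works, you can optionally query the new external table from Hologres; the row count should match the source partition in MaxCompute. This is a quick sanity check, not a required step:

```sql
-- Optional: read the MaxCompute partition through the external table
SELECT COUNT(*) FROM "public".dwd_product_movie_basic_info WHERE ds = '20170112';
```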

Step 3: Create a managed table (internal table) in Hologres

Create a column-oriented partitioned internal table to store the imported data.

  1. On the HoloWeb development page, click Ad-hoc Query.

  2. Select your Instance Name and Database, enter the following statement, and click Run.

    This is a simplified example. Adjust the Data Definition Language (DDL) to match your schema requirements, and configure appropriate indexes to optimize query performance.
    BEGIN;
    CREATE TABLE "public"."holo_dwd_product_movie_basic_info" (
     "movie_name" TEXT,
     "director" TEXT,
     "scriptwriter" TEXT,
     "area" TEXT,
     "actors" TEXT,
     "type" TEXT,
     "movie_length" TEXT,
     "movie_date" TEXT,
     "movie_language" TEXT,
     "imdb_url" TEXT,
     "ds" TEXT
    )
    PARTITION BY LIST (ds);
    CALL SET_TABLE_PROPERTY('"public"."holo_dwd_product_movie_basic_info"', 'orientation', 'column');
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_name" IS 'Movie name';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."director" IS 'Director';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."scriptwriter" IS 'Screenwriter';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."area" IS 'Production region or country';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."actors" IS 'Lead actors';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."type" IS 'Type';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_length" IS 'Movie length';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_date" IS 'Release date';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."movie_language" IS 'Language';
    COMMENT ON COLUMN "public"."holo_dwd_product_movie_basic_info"."imdb_url" IS 'IMDb ID';
    COMMIT;
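Because Hologres is PostgreSQL-compatible, you can optionally confirm that the parent table was created by querying the standard information_schema catalog (a sketch, assuming default catalog visibility):

```sql
-- Optional: confirm the partitioned parent table exists in the public schema
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'public'
  AND table_name = 'holo_dwd_product_movie_basic_info';
```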

Step 4: Develop data for partitioned child tables

  1. Log on to the DataWorks console and go to the DataStudio page. Create a Hologres SQL node. For more information, see Hologres SQL node.

  2. On the node editing page, enter the SQL for your import scenario. Hologres does not support writing data directly to a partitioned parent table. Instead, create a temporary partitioned child table, load data into it, and then attach it to the parent table as a child partition. The partition key value is controlled by ${bizdate}, which the DataWorks scheduler replaces at runtime.

    The shared steps (creating the temporary table, refreshing the external table metadata, and loading data) are identical for both scenarios; only the final attachment step differs.

    Shared SQL (run first in both scenarios)

    Important

    The imported data must match the partition key value. In this example, the partition key is ds. A mismatch causes an error.

    If the partition already exists, ATTACH PARTITION fails. Either use Scenario 2 to refresh it, or manually drop the existing child table before re-running.
    -- Create a temporary partitioned child table
    BEGIN;
    CREATE TABLE IF NOT EXISTS "public".tmp_holo_dwd_product_movie_basic_info_${bizdate} (
     "movie_name" TEXT,
     "director" TEXT,
     "scriptwriter" TEXT,
     "area" TEXT,
     "actors" TEXT,
     "type" TEXT,
     "movie_length" TEXT,
     "movie_date" TEXT,
     "movie_language" TEXT,
     "imdb_url" TEXT,
     "ds" TEXT
    );
    COMMIT;
    
    -- Refresh external table metadata
    IMPORT FOREIGN SCHEMA public_data LIMIT TO (dwd_product_movie_basic_info) FROM SERVER odps_server INTO public OPTIONS(if_table_exist 'update');
    
    -- Wait 30 seconds to let Hologres metadata cache sync before loading data
    SELECT pg_sleep(30);
    
    -- Load the target partition from MaxCompute into the temporary table
    INSERT INTO "public".tmp_holo_dwd_product_movie_basic_info_${bizdate}
    SELECT
        "movie_name",
        "director",
        "scriptwriter",
        "area",
        "actors",
        "type",
        "movie_length",
        "movie_date",
        "movie_language",
        "imdb_url",
        "ds"
    FROM "public".dwd_product_movie_basic_info
    WHERE ds='${bizdate}';

    After the shared SQL, append the attachment block for your scenario.

    Scenario 1: Import new partitioned data

    Use this when the target partition does not yet exist in Hologres.

    BEGIN;
    
    ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};
    
    -- Attach the child table to the parent partitioned table
    ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES IN ('${bizdate}');
    
    COMMIT;

    Scenario 2: Refresh historical partitioned data

    Use this when the target partition already exists and you need to reload it (for example, when upstream data is corrected). This scenario is idempotent: detaching, dropping, and re-attaching the child table ensures a clean reload even if the node is re-run after a failure.

    BEGIN;
    
    ALTER TABLE IF EXISTS holo_dwd_product_movie_basic_info DETACH PARTITION holo_dwd_product_movie_basic_info_${bizdate};
    
    DROP TABLE IF EXISTS holo_dwd_product_movie_basic_info_${bizdate};
    
    ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};
    
    -- Attach the refreshed child table to the parent partitioned table
    ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES IN ('${bizdate}');
    
    COMMIT;
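If you are unsure which scenario applies, you can check whether a child partition already exists before choosing. The sketch below assumes Hologres exposes the standard PostgreSQL pg_inherits catalog; if the target date appears in the result, use Scenario 2:

```sql
-- List existing child partitions of the parent table
SELECT child.relname AS partition_table
FROM pg_inherits i
JOIN pg_class parent ON parent.oid = i.inhparent
JOIN pg_class child  ON child.oid  = i.inhrelid
WHERE parent.relname = 'holo_dwd_product_movie_basic_info';
```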

Step 5: Configure scheduling

On the Hologres SQL node editing page, click Scheduling Configuration in the right-side pane.

Change only the following parameters. Keep all others at their default values.

Basic properties

Parameter: bizdate=${yyyymmdd}

${yyyymmdd} is a DataWorks built-in scheduling variable that resolves to the business date at runtime.

Time properties

Instance generation method: Generate immediately after publishing
Re-run property: Cannot re-run after successful execution; can re-run after failed execution
Scheduled time: 00:05

Scheduling dependencies

Set the root node as the scheduling dependency, or choose an existing parent node based on your business logic. Set Code Parsing to Yes, click Code Parsing to auto-parse the root node, then set Code Parsing back to No.

Step 6: Publish the scheduling task

  1. On the Hologres SQL node editing page, click the Save icon in the toolbar to save the node.

  2. Click the Submit icon to submit the node.

  3. In the Submit New Version dialog box, enter a description in the Change Description field.

  4. Click OK.

Step 7: Publish in Operation Center

  1. On the Hologres SQL editing page, click O&M on the far right of the toolbar.

  2. Go to the Operation Center page. In the left navigation pane, choose Scheduled Task O&M > Scheduled Tasks.

  3. On the Scheduled Tasks page, right-click the node and choose Backfill Data > Current Node.


  4. In the left navigation pane, choose O&M Assistant > Backfill Data to monitor the running tasks and their status.

Step 8: View data

After the task completes successfully, Hologres automatically creates a partitioned child table for the loaded partition.

  1. Go to the DataStudio page and create a Hologres SQL node. For more information, see Hologres SQL node.

  2. Enter the following queries to verify the data. Query the partitioned child table directly:

    SELECT * FROM holo_dwd_product_movie_basic_info_20170112;

    Query the total row count across the partitioned parent table:

    SELECT COUNT(*) FROM holo_dwd_product_movie_basic_info;
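To spot-check the load across every imported date, you can also group the parent table by the partition key; each ds value should report the same row count as the corresponding MaxCompute partition (a sketch):

```sql
-- Row count per imported partition
SELECT ds, COUNT(*) AS row_count
FROM holo_dwd_product_movie_basic_info
GROUP BY ds
ORDER BY ds;
```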