How to use Hologres dynamic tables - DataWorks - Alibaba Cloud Documentation Center

The DataWorks data catalog integrates the Hologres dynamic table engine, providing a set of visual tools to manage dynamic tables, scheduling dependencies, and tasks. This helps you create and use Hologres dynamic tables in DataWorks.

Prerequisites

A workspace is created, the Use Data Studio (New Version) option is selected, and a resource group is attached to the workspace. For more information, see Create a workspace.
A Hologres data source is created. For more information, see Attach a Hologres computing resource.
A Hologres computing resource is attached to the workspace and has passed the connectivity test. For more information, see Attach a computing resource.

Limits

The Hologres instance must be V3.0.18 or later.
For more information about the limits of dynamic tables, see Dynamic Table supported features and limits.
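
To confirm the version requirement before you start, you can query the instance version from any SQL client connected to the instance. The following is a minimal sketch that assumes the `hg_version()` function described in the Hologres documentation is available on your instance; the exact format of the returned string may vary by release.

```sql
-- Returns a version string for the connected Hologres instance.
-- Compare the reported version with the V3.0.18 minimum stated above.
SELECT hg_version();
```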

Prepare data

Create a test table and add test data to your Hologres instance. The following code provides an example:

-- Row-oriented test table with binary logging enabled (binlog_ttl is in seconds).
CREATE TABLE tb_order (
    order_id   int PRIMARY KEY,
    title      VARCHAR(255),
    price      FLOAT,
    payment    FLOAT,
    order_time TIMESTAMP
) WITH (
    orientation = 'row',
    clustering_key = 'order_id',
    binlog_level = 'replica',
    binlog_ttl = '86400'
);

-- Sample rows: order_id, title, price, payment, order_time.
INSERT INTO tb_order SELECT 1252555, 'book', 12.36, 12.36, '2024-12-19 09:00:05';
INSERT INTO tb_order SELECT 1252556, 'pen', 22.36, 22.36, '2024-12-19 17:00:05';
INSERT INTO tb_order SELECT 1252557, 'role', 36.36, 36.36, '2024-12-19 17:10:05';

Note

When you test, replace the order_time values in the sample code with timestamps close to the current time. The sample query used to create the dynamic table filters on recent data, so rows with older timestamps may not be returned.
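
The sample dynamic table query later in this topic keeps only rows whose order_time falls within the past hour. As a quick check, the following sketch inserts one row stamped with the current time and flags which rows would pass that filter. The order_id 1252558 and the title 'notebook' are made-up test values, not part of the original sample data.

```sql
-- Add a row whose order_time is the current time (hypothetical test values).
INSERT INTO tb_order SELECT 1252558, 'notebook', 8.50, 8.50, NOW();

-- Show each row and whether it falls inside the one-hour window
-- used by the sample dynamic table query.
SELECT
    order_id,
    order_time,
    order_time >= NOW() - INTERVAL '1 hour' AS in_last_hour
FROM tb_order
ORDER BY order_time;
```

Rows with in_last_hour set to false would be excluded by the sample query's WHERE clause.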

Go to the data catalog

Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the region where your workspace resides. Find the workspace and choose Shortcuts > Data Studio in the Actions column.
In the left navigation pane, click the icon to go to the Data Catalog.

Create a dynamic table

In the Data Catalog area, find the target Hologres instance, expand the directories, and click the icon to the right of Dynamic Table to open the Create Dynamic Table page.
Note
- If you use a workspace in standard mode, a Development database instance and a Production database instance are listed under the Hologres Data Catalog. We recommend that you first create and test the Hologres dynamic table in the development database. After you confirm that the table works as expected, you can create it in the production environment.
- In a workspace in standard mode, dynamic tables that you create in the development database are not automatically synchronized to the production database. To query and use these tables in the production database, you must create the same Hologres dynamic tables in the production database.

On the Create Dynamic Table page, perform the following steps to create the dynamic table.

In the Basic Information section, configure the parameters.
Specify the Table Name and Description for the dynamic table.
In the Field Information section, configure the settings.
1. In the Field Information area on the SQL Development tab, add the following sample code to the editor.
```
SELECT 
    t.order_id,
    t.title,
    t.price,
    t.day_time,
    COUNT(*) AS order_num
FROM (
    SELECT 
        order_id,
        title,
        price,
        EXTRACT(DAY FROM order_time) AS day_time
    FROM tb_order
    WHERE order_time >= NOW() - INTERVAL '1 hour'
) AS t
GROUP BY 
    t.order_id,
    t.title,
    t.price,
    t.day_time;
```
2. After you add the code, click Precompile above the editor. The Hologres DPI engine compiles the SQL statement to check for errors and to determine which refresh modes the dynamic table supports.
3. After the precompilation succeeds, click the Field Details tab in the Field Information area to add a Description for the fields of the dynamic table.
Select a partition field.
After you configure the Field Information, select a partition field from the Partition Field drop-down list under Partition Field Information. If you do not select a field, the system creates a non-partitioned dynamic table by default. The data refresh methods and configurations are different for non-partitioned and partitioned dynamic tables:
- Data refresh for non-partitioned dynamic tables: Data in a non-partitioned dynamic table is automatically refreshed either by a DataWorks recurring schedule or by the Hologres engine.
- Data refresh for partitioned dynamic tables: The data refresh lifecycle includes the following phases: Pre-create partitions, Start partition refresh, End partition refresh, and Refresh partition data.
- The parameters for the data refresh policy are different for non-partitioned and partitioned dynamic tables. For more information, see Configure a data refresh policy.

In the Advanced Settings section, configure the parameters.

Configure the parameters for the dynamic table.

| Parameter | Description |
| --- | --- |
| Storage Mode | Hologres supports three storage modes: Column Store, Row Store, and Row-Column Hybrid Store. The default mode is Column Store. Column store is suitable for complex queries in Online Analytical Processing (OLAP) scenarios. Row store is suitable for key-value (KV) queries based on primary keys. Row-column hybrid store is suitable for scenarios that need both column store and row store access. For more information, see Table storage formats: column store, row store, and row-column hybrid store. |
| Table Group | Select the name of the `Table Group` that is generated when you create an internal table in the Hologres data source. For more information, see Manage table groups. |
| Storage Policy | Hologres supports two storage policies: Standard storage (Hot Storage) and Infrequent Access storage (Cold Storage). Hot storage, also known as all-SSD hot storage, is the default storage policy in Hologres. It meets the requirements for low-latency, high-performance data access and is the most effective and cost-efficient choice for most scenarios. Cold storage, also known as all-HDD cold storage, provides low-cost storage for very large datasets that are not sensitive to latency or are infrequently accessed. For more information, see Tiered storage. |
| Table Data Lifecycle | Set a custom maximum time to live for the dynamic table. |
| Binlog | Choose whether to enable (replica) or disable (none) Hologres binary logging for the table. By default, this feature is disabled. For more information, see Subscribe to Hologres binary logs. |
| Binlog Lifecycle | The maximum lifecycle of Hologres binary logs. You can configure this parameter only if you set the Binlog parameter to replica. For more information, see Subscribe to Hologres binary logs. |
| Field Properties | Set the field properties. This includes selecting the Distribution Column, Segment Column, Clustering Column, Bitmap Column, and Dictionary Encoding Column for a Field Name. Configure the properties based on the descriptions on the page. For more information, see Manage internal tables. |
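
For orientation, the settings above resemble the table properties that Hologres accepts in the WITH clause of CREATE TABLE, as in the test table created in Prepare data. The following sketch shows those properties on an ordinary table. It is an illustration only: the table name tb_order_props_demo is hypothetical, the mapping between each page setting and a property name is an assumption, and the page generates the actual definition of the dynamic table for you.

```sql
-- Hypothetical ordinary table that illustrates common Hologres table properties.
CREATE TABLE tb_order_props_demo (
    order_id   int PRIMARY KEY,
    title      VARCHAR(255),
    price      FLOAT,
    order_time TIMESTAMP NOT NULL
) WITH (
    orientation = 'column',                 -- assumed counterpart of Storage Mode
    distribution_key = 'order_id',          -- assumed counterpart of Distribution Column
    clustering_key = 'order_id',            -- assumed counterpart of Clustering Column
    event_time_column = 'order_time',       -- assumed counterpart of Segment Column
    bitmap_columns = 'title',               -- assumed counterpart of Bitmap Column
    dictionary_encoding_columns = 'title',  -- assumed counterpart of Dictionary Encoding Column
    time_to_live_in_seconds = '2592000',    -- assumed counterpart of Table Data Lifecycle (30 days)
    binlog_level = 'replica',               -- Binlog enabled
    binlog_ttl = '86400'                    -- Binlog Lifecycle, in seconds
);
```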

Publish the dynamic table.
On the right side of the page, click Refresh Policy. On the data refresh policy page, configure the data refresh policy and select your Schedule Resource Group under Dependency Configurations. To publish the new Hologres dynamic table, click Publish at the top of the page.
Note
When you create a partitioned dynamic table, the View Partition Example dialog box opens. Select Day Partition or Hour Partition according to your data refresh policy, and then click Publish.

View a dynamic table

After you publish a Hologres dynamic table, go to the Data Catalog area. Find the target Hologres instance under the Hologres type, expand the directories, and click Dynamic Table. In the list that appears, find the dynamic table that you created and click its name to view the details.

View the details.
On the Details tab, you can view the Table Fields and Partition Fields of the dynamic table.
View the basic information.
On the Basic Information tab, you can view the Basic Properties, Data Refresh Logic-SQL, Data Refresh Policy, and Advanced Properties of the dynamic table.
View the DDL information.
On the DDL tab, you can view and copy the DDL of the dynamic table.
View the output.
On the Output Information tab, you can view the output data from the dynamic table.
Note
- If you create a non-partitioned dynamic table, you can view the output information of the table.
- If you create a partitioned dynamic table, the primary table is a logical table and does not have output information. Click the icon to the left of the partitioned dynamic table to expand the child partitioned tables and view their output information.
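
Besides the Output Information tab, the output can also be spot-checked with an ordinary query, assuming the published dynamic table can be queried like any other Hologres table. In the sketch below, dt_order_summary and dt_order_daily_20241219 are hypothetical table names; the column list comes from the sample query in Create a dynamic table, and the child table name follows the `tableName_{yyyymmdd}` pattern described in Configure a data refresh policy.

```sql
-- Non-partitioned dynamic table (hypothetical name): query it directly.
SELECT order_id, title, price, day_time, order_num
FROM dt_order_summary
ORDER BY order_id
LIMIT 10;

-- Partitioned dynamic table: query one child partition (hypothetical name).
SELECT *
FROM dt_order_daily_20241219
LIMIT 10;
```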

Delete a dynamic table

To delete a Hologres dynamic table, find the target Hologres instance in the Data Catalog area. Expand the directories. Right-click the target dynamic table under Dynamic Table and select Delete. In the confirmation dialog box, click Confirm.

Important

Deleted tables cannot be restored. Proceed with caution.

Before deleting a Hologres dynamic table, DataWorks first attempts to delete the associated scheduling task. If the task has downstream dependencies, the deletion fails. In this case, go to Operation Center, manually remove the downstream dependencies, and then delete the table again.

Configure a data refresh policy

The parameters for the data refresh policy differ for non-partitioned and partitioned dynamic tables. Configure the policy as described in the following sections.

Configure a data refresh policy for a non-partitioned dynamic table

| Parameter | Sub-parameter | Description |
| --- | --- | --- |
| Refresh Mode | | The refresh mode. Valid values are Full Refresh (Full) and Incremental Refresh (Incremental). |
| Refresh Scheduling Mode | | The refresh scheduling mode. Valid values are Hologres Auto-refresh and DataWorks Recurring Schedule. |
| Hologres Auto-refresh | Automatically Refresh Data | Specifies whether to automatically refresh data. |
| DataWorks Recurring Schedule | Scheduling Cycle | Set the scheduling cycle to Day or Hour as needed. If you set the Scheduling Cycle to Day, set the specific Scheduling Time. If you set the Scheduling Cycle to Hour, you can configure the Start Time, Interval, and End Time parameters in the Specify Hour-based Interval section. To run the scheduled refresh at a specified hour, configure the Specify Hour and Specify Minute parameters in the Specify Hour section. |
| | Cron Expression | The cron expression for your custom scheduling cycle. |
| | Rerun Property | Set the rerun property. Valid values are Rerun Allowed For Both Success And Failure, Rerun Allowed For Failure Only, and Rerun Disallowed For Both Success And Failure. |

Configure a data refresh policy for a partitioned dynamic table

| Refresh policy lifecycle | Parameter | Description |
| --- | --- | --- |
| Pre-create Partitions | Partition Pre-creation Method | The default value is Create Partitions By DataWorks Scheduling. Partition creation is triggered by DataWorks dependency scheduling. |
| | Number Of Partitions To Pre-create | The number of child partitioned tables to create in advance. If a task is scheduled by day, the partitions for the next day are created on the current day. If a task is scheduled by hour, the partitions for the next hour are created in the current hour. |
| | Partition Unit | If a task is scheduled by Day, a daily partition is generated each day. The child table name is in the `tableName_{yyyymmdd}` format. If a task is scheduled by Hour, an hourly partition is generated each hour. The child table name is in the `tableName_{yyyymmddHH}` format. |
| | Partition Pre-creation Date | The date on which to create the partitioned tables. The child partitioned tables are created at the specified pre-creation time on the specified date. |
| | Partition Pre-creation Time | The time at which to create the partitioned tables. The child partitioned tables are created at the specified time. |
| Start Partition Refresh | Data Refresh Mode For Pre-created Partitions | First, precompile the SQL to get the available refresh mode settings. Full Refresh (Full) performs a full data refresh. Incremental Refresh (Incremental) refreshes new data at a minute-level frequency. |
| | Refresh Scheduling Mode | The default value is Hologres Auto-refresh. |
| | Automatically Refresh Data | Specifies whether to enable automatic data refresh. To enable automatic data refresh, configure the Data Refresh Start Time, Data Refresh Interval, Hologres Computing Resource, and Hologres Computing Resource Specifications parameters as prompted. |
| End Partition Refresh | When Partition Refresh Task Ends, Switch To | The refresh mode to switch to when the partition refresh task ends. The default value is Full Refresh (Full). |
| | End Partition Refresh Scheduling Mode | DataWorks schedules the end of the refresh. |
| | End Time | The delay after which the partition refresh task is switched to full refresh and then stopped. The delay is measured from the time when the creation of the dynamic table completes. |
| Refresh Partition Data | Refresh Data Of Partitions Whose Refresh Has Ended | Specifies whether to perform a one-time full refresh on a partition after its refresh task ends. This action fully refreshes the data in that partition. To refresh the data of partitions whose refresh tasks have ended, configure the Hologres Computing Resource, Hologres Computing Resource Specifications, Scheduling Cycle, Scheduling Time, and Rerun Property parameters as prompted. Note: We recommend that you refresh partition data only when the input data has changed. |
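
The child table naming patterns listed under Partition Unit can be previewed with standard date formatting. The following sketch builds the names that the `tableName_{yyyymmdd}` and `tableName_{yyyymmddHH}` formats would produce for the current and next period. The table names dt_order_daily and dt_order_hourly are hypothetical, and the sketch assumes that HH is a 24-hour value and that dates are evaluated in the session time zone.

```sql
-- Preview child partition names for hypothetical daily and hourly dynamic tables.
SELECT
    'dt_order_daily_'  || to_char(NOW(), 'YYYYMMDD')                          AS current_day_child,
    'dt_order_daily_'  || to_char(NOW() + INTERVAL '1 day', 'YYYYMMDD')       AS next_day_child,
    'dt_order_hourly_' || to_char(NOW(), 'YYYYMMDDHH24')                      AS current_hour_child,
    'dt_order_hourly_' || to_char(NOW() + INTERVAL '1 hour', 'YYYYMMDDHH24')  AS next_hour_child;
```

The next_day_child and next_hour_child columns correspond to the partitions that would be pre-created one period ahead, as described for Number Of Partitions To Pre-create.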