All Products
Search
Document Center

Hologres:CREATE DYNAMIC TABLE

Last Updated:Nov 29, 2024

This topic describes how to create a dynamic table.

Limits

For information about the limits of dynamic tables, see Application scope and limits of dynamic tables.

Syntax

Table creation syntax

Syntax for creating a dynamic table:

CREATE DYNAMIC TABLE [IF NOT EXISTS] <schema.tablename>(
[col_name],
[col_name]
  ) [PARTITION BY LIST (col_name)]
WITH (
    [refresh_mode='[full|incremental]',]
    [auto_refresh_enable='[true|false',]

  -- Parameters for incremental data refresh:
    [incremental_auto_refresh_schd_start_time='[immediate|<timestamptz>]',]
    [incremental_auto_refresh_interval='[<num> {minute|minutes|hour|hours]',]  
    [incremental_guc_hg_computing_resource='[ local | serverless]',]
    [incremental_guc_hg_experimental_serverless_computing_required_cores='<num>',]

   -- Parameters for full data refresh:
    [full_auto_refresh_schd_start_time='[immediate|<timestamptz>]',]
    [full_auto_refresh_interval='[<num> {minute|minutes|hour|hours]',] 
    [full_guc_hg_computing_resource='[ local | serverless]',]-- By default, the hg_full_refresh_computing_resource parameter is set to serverless. The parameter can be configured at the database level or left empty.
    [full_guc_hg_experimental_serverless_computing_required_cores='<num>',]
    
   -- Shared parameter. You can configure Grand Unified Configuration (GUC) parameters:
   [refresh_guc_<guc>='xxx]',] 
   
  -- Common properties of the dynamic table:
    [orientation = '[column | row | row,column]',]
    [table_group = '[tableGroupName]',]
    [distribution_key = 'columnName[,...]]',]
    [clustering_key = '[columnName{:asc]} [,...]]',]
    [event_time_column = '[columnName [,...]]',]
    [bitmap_columns = '[columnName [,...]]',]
    [dictionary_encoding_columns = '[columnName [,...]]',]
    [time_to_live_in_seconds = '<non_negative_literal>',]
    [storage_mode = '[hot | cold]']
    ) 
AS
<query> -- The definition of the query.

Parameters

Parameters related to refresh modes and resources

Parameter category

Parameter

Description

Required

Default value

Shared refresh parameters

refresh_mode

The refresh mode of data. Valid values: full and incremental.

If you leave this parameter empty, data is not refreshed.

No

No default value

auto_refresh_enable

Specifies whether to enable auto refresh. Valid values:

  • true: enables auto refresh.

  • false: disables auto refresh.

No

false

refresh_guc_<guc>

The GUC parameters that are configured for the refresh operation. For information about the supported GUC parameters, see GUC parameters.

No

No default value

Incremental data refresh

incremental_auto_refresh_schd_start_time

The start time of the incremental data refresh. Valid values:

  • immediate: The incremental data refresh is started immediately after a table is created. This is the default value.

  • <timestamptz>: the custom start time of the incremental data refresh. For example, if you set the parameter to 2024-08-24 1:00, a data refresh task starts at 01:00 on August 24, 2024.

No

immediate

incremental_auto_refresh_interval

The interval at which incremental data is refreshed. Supported units: minute, minutes, hour, and hours.

  • Valid values: [1 minute, 48 hours].

  • If you leave this parameter empty, the refresh operation is performed only once at the start time of the refresh operation.

No

No default value

incremental_guc_hg_computing_resource

The computing resources for the incremental data refresh. Valid values:

  • local: resources of the current instance.

  • serverless: serverless computing resources. You must check whether the instance supports serverless computing resources. For more information, see User guide on Serverless Computing.

Note

You can execute the ALTER DATABASE xxx SET incremental_guc_hg_computing_resource=xx statement to specify the computing resources for the incremental data refresh at the database level.

No

serverless

incremental_guc_hg_experimental_serverless_computing_required_cores

The number of serverless computing resources required for the incremental data refresh.

Note

The serverless computing resources available for instances of different specifications are limited. For more information, see User guide on Serverless Computing.

No

No default value

Full data refresh

full_auto_refresh_schd_start_time

The start time of the full data refresh. Valid values:

  • immediate: The full data refresh is started immediately after a table is created. This is the default value.

  • <timestamptz>: the custom start time of the full data refresh. For example, if you set the parameter to 2024-08-24 1:00, a data refresh task starts at 01:00 on August 24, 2024.

No

immediate

full_auto_refresh_interval

The interval at which full data is refreshed. Supported units: minute, minutes, hour, and hours.

  • Valid values: [1 minute, 48 hours].

  • If you leave this parameter empty, the refresh operation is performed only once at the start time of the refresh operation.

No

No default value

full_guc_hg_computing_resource

The computing resources for the full data refresh. Valid values:

  • local: resources of the current instance.

  • serverless: serverless computing resources. You must check whether the instance supports serverless computing resources. For more information, see User guide on Serverless Computing.

Note

You can execute the ALTER DATABASE xxx SET full_guc_hg_computing_resource=xx statement to specify the computing resources for the full data refresh at the database level.

No

serverless

full_guc_hg_experimental_serverless_computing_required_cores

The number of serverless computing resources required for the refresh.

Note

The serverless computing resources available for instances of different specifications are limited. For more information, see User guide on Serverless Computing.

No

No default value

Parameters for table properties

Parameter

Description

Required

Default value

full

incremental

col_name

The name of a field in the dynamic table.

You can explicitly specify the names of columns in a dynamic table. However, you cannot specify the properties and data types of the columns. The engine automatically infers the properties and data types of the columns.

Note

If you specify the properties and data types for columns in a dynamic table, the properties and data types of the columns inferred by the engine may become incorrect.

No

Name of the queried column

Name of the queried column

orientation

The storage orientation of the dynamic table. Valid values:

  • column: column-oriented storage

  • row: row-oriented storage

  • row,column: row-column hybrid storage

No

column

column

table_group

The table group to which the dynamic table belongs. By default, the default table group of the current database is used. For more information, see User guide of table groups and shard counts.

No

Name of the default table group

Name of the default table group

distribution_key

The distribution key of the dynamic table. For more information, see Distribution key.

No

No default value

No default value

clustering_key

The clustering key of the dynamic table. For more information, see Clustering key.

No

You can set the parameter to a custom value. A default inferred value is available.

You can set the parameter to a custom value. A default inferred value is available.

event_time_column

The event time columns of the dynamic table. For more information, see Event time column (segment key).

No

No default value

No default value

bitmap_columns

The bitmap columns of the dynamic table. For more information, see Bitmap index.

No

Fields of the TEXT type

Fields of the TEXT type

dictionary_encoding_columns

The dictionary encoding columns of the dynamic table. For more information, see Dictionary encoding.

No

Fields of the TEXT type

Fields of the TEXT type

time_to_live_in_seconds

The lifecycle of data in the dynamic table.

No

Permanent

Permanent

storage_mode

The storage mode of the dynamic table. Valid values:

  • hot

  • cold

Note

For more information about storage modes, see Tiered storage of hot data and cold data.

No

hot

hot

Query

A query can be generated for data in a dynamic table. The query types and base table types that are supported in a dynamic table vary based on the refresh mode of the dynamic table. For more information, see Application scope and limits of dynamic tables.

Full data refresh

If the refresh mode is full, all data is written to a dynamic table in a query. The full data refresh has the following strengths in contrast to the incremental data refresh:

  • More base table types are supported.

  • More query types and operator types are supported.

In full data refresh mode, more data is processed and more resources may be consumed. Therefore, this data refresh mode is suitable for scenarios such as periodic report check and periodic data backfill.

Note

For more information about the full data refresh mode, see the Full data refresh section in this topic.

Incremental data refresh

If the refresh mode is incremental, data changes in a base table can be identified and incremental data in the base table is written to a dynamic table in a query. In incremental data refresh mode, less data is processed and data processing timeliness is higher. In actual use cases, if you want to query data at the minute level, we recommend that you use the incremental data refresh mode. However, you must take note of the following items when you use this mode for a dynamic table:

  • You must enable binary logging for the base table of the dynamic table. However, if you want to perform a JOIN operation on a dimension table, you do not need to enable binary logging for the dimension table.

  • After binary logging is enabled for a base table, a specific amount of storage resources will be consumed. For information about the storage resources used by binary logs, see the Query the storage details of a table section of the "Query the storage sizes of tables and databases" topic.

  • After incremental data refresh is enabled, the system generates a state table in the backend to record the intermediate aggregate results. For information about the technical principles of a state table, see the Incremental refresh section of the "Overview" topic. The state table stores intermediate aggregate data, which consumes a specific amount of storage resources. For more information about storage resource usage, see Manage dynamic tables.

The incremental data refresh mode of Hologres supports incremental data refresh and full-and-incremental data refresh to meet different business requirements. For more information, see the following content.

Incremental data refresh

After you enable binary logging for an existing base table and create a dynamic table in incremental data refresh mode for the base table, the existing data in the base table is not synchronized to the dynamic table. Only the new data is synchronized to the dynamic table. In this case, data consistency issues may occur.

Full-and-incremental data refresh

A dynamic table in incremental data refresh mode also supports full-and-incremental data consumption. This indicates that the full data of a base table is consumed first, and then the incremental data of the base table is consumed. The full-and-incremental data refresh feature is controlled by a GUC parameter:

-- Enable full-and-incremental data consumption.
SET hg_experimental_enable_hybrid_incremental_mode = true;

Examples:

-- Full-and-incremental data consumption for a dynamic table in incremental data refresh mode
-- Create a base table for which binary logging is not enabled.
CREATE TABLE base_sales(
  day TEXT NOT NULL,
  hour INT,
  user_id BIGINT,
  ts TIMESTAMPTZ,
  amount FLOAT,
  pk text NOT NULL PRIMARY key
);

-- Import data to the base table.
INSERT INTO base_sales VALUES ('2024-08-29',1,222222,'2024-08-29 16:41:19.141528+08',5,'ddd');


-- Enable binary logging for the base table.
ALTER TABLE base_sales SET (binlog_level = replica);

-- Import incremental data to the base table.
INSERT INTO base_sales VALUES ('2024-08-29',2,3333,'2024-08-29 17:44:19.141528+08',100,'aaaaa');


-- Create a dynamic table in incremental data refresh mode.
CREATE DYNAMIC TABLE sales_incremental
  WITH (refresh_mode='incremental') 
AS 
  SELECT day, hour, SUM(amount), COUNT(1) 
    FROM base_sales 
  GROUP BY day, hour;

-- Configure a GUC parameter to enable full-and-incremental data consumption and refresh the dynamic table.
SET hg_experimental_enable_hybrid_incremental_mode = true;
REFRESH TABLE sales_incremental;

Compare data consistency:

  • Query the base table

    SELECT day, hour, SUM(amount), COUNT(1) 
        FROM base_sales 
      GROUP BY day, hour;

    Returned result:

    day	    hour	sum	count
    2024-08-29	1	5	1
    2024-08-29	2	100	1
  • Query the dynamic table

    SELECT * FROM sales_incremental;

    Returned result:

    day	    hour	sum	count
    2024-08-29	1	5	1
    2024-08-29	2	100	1

Examples

Example 1: Create a dynamic table in full data refresh mode and enable auto refresh for the dynamic table

Before you perform the following operations, refer to Import public datasets with a few clicks to import data from the public dataset tpch_10g to Hologres.

-- Create a dynamic table in full data refresh mode, enable immediate refresh for the dynamic table, and set the refresh interval to 1 hour. 
CREATE DYNAMIC TABLE test.thch_q1_full
  WITH (
    refresh_mode='full',
    auto_refresh_enable='true',
    full_auto_refresh_interval='1 hours',
    full_guc_hg_computing_resource='serverless',
    full_guc_hg_experimental_serverless_computing_required_cores='32'
       ) 
AS
  SELECT
        l_returnflag,
        l_linestatus,
        SUM(l_quantity) AS sum_qty,
        SUM(l_extendedprice) AS sum_base_price,
        SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
        SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
        AVG(l_quantity) AS avg_qty,
        AVG(l_extendedprice) AS avg_price,
        AVG(l_discount) AS avg_disc,
        COUNT(*) AS count_order
FROM
        hologres_dataset_tpch_100g.lineitem
WHERE
        l_shipdate <= DATE '1998-12-01' - INTERVAL '120' DAY
GROUP BY
        l_returnflag,
        l_linestatus;

Example 2: Create a dynamic table in incremental data refresh mode and specify the refresh start time for the dynamic table

Before you perform the following operations, refer to Import public datasets with a few clicks to import data from the public dataset tpch_10g to Hologres.

The following example shows how to create a dynamic table in incremental data refresh mode.

Note

Before you create a dynamic table in incremental data refresh mode, you must enable binary logging for the base table of the dynamic table. You do not need to enable binary logging for a dimension table.

-- Enable binary logging for the base table.
BEGIN;
CALL set_table_property('hologres_dataset_tpch_100g.lineitem', 'binlog.level', 'replica');
COMMIT;

-- Create a dynamic table in incremental data refresh mode, specify the refresh start time, and set the refresh interval to 3 minutes.
CREATE DYNAMIC TABLE public.tpch_q1_incremental 
WITH (
refresh_mode='incremental',
auto_refresh_enable='true',
incremental_auto_refresh_schd_start_time='2024-09-15 23:50:0',
incremental_auto_refresh_interval='3 minutes',
incremental_guc_hg_computing_resource='serverless',
incremental_guc_hg_experimental_serverless_computing_required_cores='30'
) AS SELECT
        l_returnflag,
        l_linestatus,
        COUNT(*) AS count_order
FROM
        hologres_dataset_tpch_100g.lineitem
WHERE
        l_shipdate <= DATE '1998-12-01' - INTERVAL '120' DAY
GROUP BY
        l_returnflag,
        l_linestatus
;

Example 3: Create a dynamic table in full data refresh mode for which a JOIN operation is performed on multiple tables

-- In a query, create a dynamic table for which a JOIN operation is performed on multiple tables, select full as the refresh mode, and set the refresh interval to 3 hours. 
CREATE DYNAMIC TABLE test.dt_q_full
  WITH (
    refresh_mode='full',
    auto_refresh_enable='true',
    full_auto_refresh_schd_start_time='immediate'
    full_auto_refresh_interval='3 hours',
    full_guc_hg_computing_resource='serverless',
    full_guc_hg_experimental_serverless_computing_required_cores='64'
  ) 
AS 
SELECT
        o_orderpriority,
        COUNT(*) AS order_count
FROM
        public.orders
WHERE
        o_orderdate >= DATE '1996-07-01'
        AND o_orderdate < DATE '1996-07-01' + INTERVAL '3' MONTH
        AND EXISTS (
                SELECT
                        *
                FROM
                        public.lineitem
                WHERE
                        l_orderkey = o_orderkey
                        AND l_commitdate < l_receiptdate
        )
GROUP BY
        o_orderpriority;

Example 4: Create a dynamic table in incremental data refresh mode for which a JOIN operation is performed on a dimension table

The following example shows how to create a dynamic table in incremental data refresh mode for which a JOIN operation is performed on a dimension table.

Note

Before you create a dynamic table in incremental data refresh mode, you must enable binary logging for the base table of the dynamic table. You do not need to enable binary logging for a dimension table.

In the semantics of a JOIN operation performed on a dimension table, each data record in a base table is associated with the latest version of data in the dimension table. This indicates that the JOIN operation is performed at the time when data is processed. Therefore, if the data in the dimension table is added, updated, or deleted after the JOIN operation is performed, the associated data remains unchanged.

-- Enable binary logging for the base table of the dynamic table. You do not need to enable binary logging for a dimension table.
BEGIN;
CALL set_table_property('hm.sale_detail ', 'binlog.level', 'replica');
COMMIT;


CREATE DYNAMIC TABLE dt_sales_incremental
  WITH (
    refresh_mode='incremental',
    auto_refresh_enable='true',
    incremental_auto_refresh_schd_start_time='2024-09-15 00:00:00'
    incremental_auto_refresh_interval='5 minutes',
    incremental_guc_hg_computing_resource='serverless',
    incremental_guc_hg_experimental_serverless_computing_required_cores='128') 
AS 
SELECT 
    sale_detail.app_id,
    sale_detail.uid,
    product,
    SUM(sale_detail.gmv) AS sum_gmv,
    sale_detail.order_time,
    user_info.province,
    user_info.city 
FROM hm.sale_detail 
INNER JOIN hm.user_info  FOR SYSTEM_TIME AS OF PROCTIME()
ON sale_detail.uid =user_info.uid
GROUP BY sale_detail.app_id,sale_detail.uid,sale_detail.product,sale_detail.order_time,user_info.province,user_info.city;

Example 5: Create a partitioned dynamic table

  1. Prepare a base table and data.

    -- Create a partitioned source table.
    CREATE TABLE base_sales(
      uid INT,
      opreate_time TIMESTAMPTZ,
      amount FLOAT,
      pk text NOT NULL,
      ds text,
      PRIMARY key(ds)
    ) PARTITION BY LIST (ds) ;
    
    CREATE TABLE base_sales_20240616 PARTITION OF base_sales FOR VALUES IN ('20240616');
    
    INSERT INTO base_sales_20240616 VALUES (1,'2024-06-16 16:08:25.387466+08','2','1','20240616');
  2. Create a parent partitioned dynamic table. Configure the definition of the query for the parent partitioned table and do not configure the refresh mode.

    CREATE DYNAMIC TABLE test.partition_dt_base_sales
    PARTITION BY LIST (ds)
    AS
    SELECT  
     public.RB_BUILD_AGG(uid),
     opreate_time,
     amount,
     pk,
     ds,
    COUNT(1)
    FROM    base_sales
    GROUP BY opreate_time ,amount,pk,ds;
  3. Manually create a child partitioned table and configure the refresh mode for the child partitioned table.

    -- Create a child partitioned dynamic table and set the refresh mode to full for the table. After the table is created, the table is immediately refreshed. The refresh interval is 30 minutes and the resources of the current instance are used to refresh the table. 
    CREATE DYNAMIC TABLE test.partition_dt_base_sales_20240616 PARTITION OF partition_dt_base_sales FOR VALUES IN ('20240616')
      WITH (
        refresh_mode='full',
        auto_refresh_enable='true',
        full_auto_refresh_schd_start_time='immediate',
        full_auto_refresh_interval='30 minutes'
           );

Next step: Manage dynamic tables

After you create a dynamic table, you can perform the following operations on the table: