Create high-performance internal tables in Hologres - Hologres

A table is the basic unit for storing and organizing data in Hologres. You can configure the storage format, table indexes, and properties to enable efficient real-time data processing and large-scale data analytics.

Quick start

Preparations

Purchase a Hologres instance.
You have created a database.
Connect to the target Hologres database. We recommend that you connect to Hologres using HoloWeb and run queries.

Create a table

We recommend that you use the CREATE TABLE WITH syntax, which is available in Hologres V2.1 and later. The following example creates a transaction details table that follows hierarchical naming conventions, includes various fields, and has comprehensive comments.

BEGIN;

-- Create a transaction details fact table (standard non-partitioned table)
-- Explicitly specify the public schema and use hierarchical naming (dwd_xxx) in the table name
CREATE TABLE IF NOT EXISTS public.dwd_trade_orders (
    order_id BIGINT NOT NULL,           -- Unique order ID
    shop_id INT NOT NULL,              -- Shop ID
    user_id TEXT NOT NULL,             -- User ID
    order_amount NUMERIC(12, 2) DEFAULT 0.00,       -- Order amount
    payment NUMERIC(12, 2) DEFAULT 0.00,         -- Amount paid
    payment_type INT DEFAULT 0,                     -- Payment type (0: unpaid, 1: Alipay, 2: WeChat, 3: credit card)
    is_delivered BOOLEAN DEFAULT false,             -- Whether shipped
    dt TEXT NOT NULL,                               -- Data timestamp
    order_time TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP, -- Order time
    PRIMARY KEY (order_id)                          -- Set primary key to ensure data uniqueness
)
WITH (
    orientation = 'column',                         -- Use column store: ideal for real-time analytics (OLAP) on massive datasets
    distribution_key = 'order_id',                  -- Set distribution key: data is sharded by order_id
    clustering_key = 'order_time:asc',              -- Set clustering key: data is sorted in ascending order by time within files
    event_time_column = 'order_time',               -- Set segment key: enables file-level pruning to quickly filter time ranges
    bitmap_columns = 'shop_id,payment_type,is_delivered', -- Set bitmap index: accelerates equality filters on low-cardinality columns
    dictionary_encoding_columns = 'user_id:auto'    -- Set dictionary encoding: speeds up aggregation and filtering on string columns
);

-- Add metadata comments
COMMENT ON TABLE public.dwd_trade_orders IS 'Base fact table for transaction order details';
COMMENT ON COLUMN public.dwd_trade_orders.order_id IS 'Unique order ID';
COMMENT ON COLUMN public.dwd_trade_orders.shop_id IS 'Unique shop ID';
COMMENT ON COLUMN public.dwd_trade_orders.user_id IS 'Purchaser user ID';
COMMENT ON COLUMN public.dwd_trade_orders.dt IS 'Data timestamp, format YYYYMMDD';
COMMENT ON COLUMN public.dwd_trade_orders.order_time IS 'Exact timestamp when the order was created';

COMMIT;

View table structure

You can run the following command to view the Data Definition Language (DDL) statement of the table:

SELECT hg_dump_script('public.dwd_trade_orders');

Insert data

Hologres supports standard Data Manipulation Language (DML) syntax. The following example inserts 10 rows of simulated production data into the table.

INSERT INTO public.dwd_trade_orders (order_id, shop_id, user_id, order_amount, payment, payment_type, is_delivered, dt, order_time) VALUES 
(50001, 101, 'U678', 299.00, 280.00, 1, true,  '20231101', '2023-11-01 10:00:01+08'),
(50002, 102, 'U992', 59.00,  59.00,  2, false, '20231101', '2023-11-01 10:05:12+08'),
(50003, 101, 'U441', 150.00, 145.00, 1, true,  '20231101', '2023-11-01 10:10:45+08'),
(50004, 105, 'U219', 888.00, 888.00, 3, true,  '20231101', '2023-11-01 10:20:11+08'),
(50005, 102, 'U883', 35.00,  30.00,  1, false, '20231101', '2023-11-01 10:32:00+08'),
(50006, 110, 'U007', 120.50, 120.50, 2, true,  '20231101', '2023-11-01 10:45:33+08'),
(50007, 101, 'U321', 210.00, 210.00, 1, true,  '20231101', '2023-11-01 11:02:19+08'),
(50008, 108, 'U556', 45.00,  45.00,  2, false, '20231101', '2023-11-01 11:15:04+08'),
(50009, 101, 'U112', 300.00, 290.00, 3, true,  '20231101', '2023-11-01 11:25:55+08'),
(50010, 105, 'U449', 99.90,  99.90,  1, true,  '20231101', '2023-11-01 11:40:22+08');

Query data

-- Calculate total payment per shop and sort by amount in descending order
SELECT 
    shop_id, 
    COUNT(1) as total_orders, 
    SUM(payment) as total_payment 
FROM public.dwd_trade_orders 
GROUP BY shop_id 
ORDER BY total_payment DESC;

The query returns the following result:

shop_id	total_orders	total_payment
105	2	987.90
101	4	925.00
110	1	120.50
102	1	59.00
108	1	45.00

Syntax reference

Table creation syntax

Hologres provides two syntaxes for setting table properties and adding comments:

Standard syntax (recommended for V2.1 and later)

You can use the WITH clause to define properties. This syntax is more compact and provides better performance. You can add comments after the CREATE TABLE statement. For a complete example, see Create a table.

BEGIN;

CREATE TABLE [ IF NOT EXISTS] [schema_name.]table_name ([
    { 
    column_name column_type [column_constraints, [...]]
    | table_constraints
    [,...]
    }
])
[WITH (
    property = 'value',
    [, ...]
)]
;
[COMMENT ON COLUMN <[schema_name.]tablename.column> IS '<value>';]
[COMMENT ON TABLE <[schema_name.]tablename> IS '<value>';]

COMMIT;

Compatibility syntax (supported in all versions)

You can use the CALL statement with set_table_property to set properties and the COMMENT statement to add comments. These statements must be in the same transaction block (BEGIN...COMMIT) as the CREATE TABLE statement.

BEGIN;

CREATE TABLE [ IF NOT EXISTS] [schema_name.]table_name ([
    { 
    column_name column_type [column_constraints, [...]]
    | table_constraints
    [,...]
    }
]);

CALL set_table_property('[schema_name.]<table_name>', '<property>', '<value>');
COMMENT ON COLUMN <[schema_name.]tablename.column> IS '<value>';
COMMENT ON TABLE <[schema_name.]tablename> IS '<value>';

COMMIT;

Table properties

Parameter	Description	Column-oriented table	Row-oriented table	Hybrid table	Recommended value	Modifiable after table creation?
orientation	Specifies the storage format of the table. For more information, see Table storage formats: column store, row store, and hybrid.	`column` (default) See Example: Real-time analytics on massive data (fact table).	`row` See Example: High-concurrency point lookup by primary key (dimension table).	`row,column` See Example: Mixed workload (hybrid storage).	column	No, you cannot. You must recreate the table to make modifications.
distribution_key	Specifies the distribution key, which determines the data distribution strategy. For more information, see Distribution key.	The primary key is used by default. You can change the key based on your business scenario.	The primary key is used by default.	The primary key is used by default.	A subset of the primary key. We recommend that you use only one column.
clustering_key	Specifies the clustering key, which improves query performance. For more information, see Clustering key.	Empty by default.	The primary key is used by default.	Empty by default.	We recommend that you use a maximum of one column in ascending order.
event_time_column	Specifies the segment key, which improves query performance. For more information, see Event time column (segment key).	The first non-null timestamp column is used by default.	Not supported.	The first non-null timestamp column is used by default.	We recommend that you use a timestamp column.
table_group	A logical storage concept that is used to manage the shard count. For more information, see Table group and shard count guide.	The `default table group` is used by default.			Use the default value.	No. To change the table group, you must recreate the table or perform resharding.
bitmap_columns	Specifies the bitmap index. For more information, see Bitmap index.	Use as needed.	Not supported.	Use as needed.	This is recommended for columns that are used in equality comparisons, with a general limit of 10 or fewer columns.	Yes. For more information, see ALTER TABLE.
dictionary_encoding_columns	Dictionary encoding.	Use as needed.	Not supported.	Use as needed.	We recommend using a maximum of 10 low-cardinality columns.	Yes. For more information, see ALTER TABLE.
time_to_live_in_seconds	Specifies the time-to-live (TTL) of data in the table, in seconds. The TTL is not precise. Do not use TTL to manage the data lifecycle in production environments. We recommend that you use partitioned tables. For more information, see CREATE PARTITION TABLE.	Use as needed.			Use the default value. You do not need to set this parameter.	Yes. For more information, see ALTER TABLE.

dictionary_encoding_columns

This property specifies the columns for which to enable dictionary encoding. Dictionary encoding maps column values to a dictionary and converts string comparisons into numeric comparisons, which accelerates GROUP BY and filter operations. By default, dictionary encoding is enabled for all TEXT columns in column-oriented tables. In Hologres V0.9 and later, dictionary encoding is automatically applied based on data characteristics.

CALL set_table_property('table_name', 'dictionary_encoding_columns', '[columnName{:[on|off|auto]}[,...]]');

time_to_live_in_seconds

This property specifies the time-to-live (TTL) value of data in the table. Unit: seconds. The syntax is as follows:

The TTL is calculated from the time when data is written, not the time when data is updated. If you do not set this property, the default TTL is 100 years. In Hologres V1.3.24 and later, the minimum TTL is one day (86,400 seconds). For more information about the usage, see SQL command reference.
The TTL is not precise. Expired data is deleted within a time range, not at a specific point in time. This may cause issues such as primary key duplication or inconsistent query results.

CALL set_table_property('table_name', 'time_to_live_in_seconds', '<non_negative_literal>');

Scenario examples

Example: Partitioned table by data timestamp

Scenario description: As your business grows, the daily order volume can reach tens or even hundreds of millions. If you use the single-table structure from the Quick start section (dwd_trade_orders), the system load may be high during large-scale data deletion, such as cleaning historical data, or during date-specific queries. This example shows how to upgrade the standard table to a partitioned table. The partitioned table inherits all field definitions and index settings from the base table but uses the PARTITION BY clause to physically isolate data by day. This lets you clean expired partitions in seconds using the DROP TABLE statement and enables precise partition pruning during queries. This is the recommended approach for managing ultra-large-scale real-time data in production environments.

BEGIN;

-- Upgraded version: Create parent partitioned table
-- Structure fully inherits from the quick start example, but the primary key must include the partition key dt
CREATE TABLE IF NOT EXISTS public.dwd_trade_orders_partitioned (
    order_id BIGINT NOT NULL,
    shop_id INT NOT NULL,
    user_id TEXT NOT NULL,
    order_amount NUMERIC(12, 2) DEFAULT 0.00,
    payment NUMERIC(12, 2) DEFAULT 0.00,
    payment_type INT DEFAULT 0,
    is_delivered BOOLEAN DEFAULT false,
    dt TEXT NOT NULL,                               -- Partition key
    order_time TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (order_id, dt)                      -- Composite primary key: includes business primary key and partition key
) 
PARTITION BY LIST (dt)                              -- Enable list partitioning
WITH (
    orientation = 'column',
    distribution_key = 'order_id',
    event_time_column = 'order_time',
    clustering_key = 'order_time:asc'
);

COMMIT;

-- 1. Create child table: Allocate physical storage space for a specific date
CREATE TABLE IF NOT EXISTS public.dwd_trade_orders_20231101 
PARTITION OF public.dwd_trade_orders_partitioned FOR VALUES IN ('20231101');

-- 2. Insert data: Application logic remains the same as the base table. Data is automatically routed to the 20231101 child table
INSERT INTO public.dwd_trade_orders_partitioned (order_id, shop_id, user_id, order_amount, payment, payment_type, is_delivered, dt, order_time) VALUES 
(50001, 101, 'U678', 299.00, 280.00, 1, true,  '20231101', '2023-11-01 10:00:01+08'),
(50002, 102, 'U992', 59.00,  59.00,  2, false, '20231101', '2023-11-01 10:05:12+08'),
(50003, 101, 'U441', 150.00, 145.00, 1, true,  '20231101', '2023-11-01 10:10:45+08'),
(50004, 105, 'U219', 888.00, 888.00, 3, true,  '20231101', '2023-11-01 10:20:11+08'),
(50005, 102, 'U883', 35.00,  30.00,  1, false, '20231101', '2023-11-01 10:32:00+08'),
(50006, 110, 'U007', 120.50, 120.50, 2, true,  '20231101', '2023-11-01 10:45:33+08'),
(50007, 101, 'U321', 210.00, 210.00, 1, true,  '20231101', '2023-11-01 11:02:19+08'),
(50008, 108, 'U556', 45.00,  45.00,  2, false, '20231101', '2023-11-01 11:15:04+08'),
(50009, 101, 'U112', 300.00, 290.00, 3, true,  '20231101', '2023-11-01 11:25:55+08'),
(50010, 105, 'U449', 99.90,  99.90,  1, true,  '20231101', '2023-11-01 11:40:22+08');

-- 3. Partition pruning: When querying with a dt condition, the system scans only the relevant child table, avoiding a full table scan
SELECT COUNT(*) FROM public.dwd_trade_orders_partitioned WHERE dt = '20231101';

-- 4. Fast cleanup: To delete expired data, drop the child table directly for instant space reclamation
-- DROP TABLE public.dwd_trade_orders_20231101;

Example: Real-time analytics on massive data (fact table)

Scenario description: This type of table is used for real-time dashboard aggregation. These tables typically hold massive amounts of data. The core requirement is to achieve high-performance statistical aggregation, such as calculating Gross Merchandise Volume (GMV) or order volume.

BEGIN;

-- Explicitly specify the public schema
CREATE TABLE IF NOT EXISTS public.dwd_order_summary (
    order_id BIGINT PRIMARY KEY,
    category_id INT NOT NULL,
    gmv NUMERIC(15, 2),
    order_time TIMESTAMPTZ NOT NULL
) WITH (
    orientation = 'column',             -- Storage format: column store. Best for massive aggregation analytics, offering high compression and efficient column scans
    distribution_key = 'order_id',      -- Distribution key: shards data by order_id to ensure even distribution; enables Local Join when joining other order tables
    event_time_column = 'order_time',   -- Segment key: usually set to a time field. The system segments files by this field to accelerate time-range filtering
    clustering_key = 'order_time:asc'   -- Clustering key: physically sorts data in ascending time order. Reduces disk I/O when querying data from "the last hour" or "a specific day"
);

-- Add metadata comments to support future data governance
COMMENT ON TABLE public.dwd_order_summary IS 'Order summary fact table';
COMMENT ON COLUMN public.dwd_order_summary.order_id IS 'Unique order ID';
COMMENT ON COLUMN public.dwd_order_summary.category_id IS 'Category ID';
COMMENT ON COLUMN public.dwd_order_summary.gmv IS 'Gross merchandise value';
COMMENT ON COLUMN public.dwd_order_summary.order_time IS 'Order time';

COMMIT;

Example: High-concurrency point lookup by primary key (dimension table)

Scenario description: This type of table is used to retrieve user personas by user_id in milliseconds. These tables are designed to provide ultra-low latency responses under high queries per second (QPS).

BEGIN;

-- Explicitly specify the public schema
CREATE TABLE IF NOT EXISTS public.dim_user_persona (
    user_id TEXT PRIMARY KEY,           -- Primary key: essential for point lookup scenarios
    user_level INT,
    persona_jsonb JSONB 
) WITH (
    orientation = 'row'                 -- Storage format: row store. Optimized specifically for primary key point lookups (Point Lookup), with millisecond response times
    -- Note: Row store tables automatically use the primary key as the distribution key and clustering key. No additional settings are needed.
);

-- Add metadata comments
COMMENT ON TABLE public.dim_user_persona IS 'User persona dimension table';
COMMENT ON COLUMN public.dim_user_persona.user_id IS 'Unique user ID';
COMMENT ON COLUMN public.dim_user_persona.user_level IS 'User level';
COMMENT ON COLUMN public.dim_user_persona.persona_jsonb IS 'User profile features in JSON format';

COMMIT;

Example: Mixed workload (hybrid storage)

Scenario description: This type of table is used in logistics after-sales systems. These systems require both analytical aggregation of logistics status and instant retrieval of logistics details by order ID.

BEGIN;

-- Explicitly specify the public schema
CREATE TABLE IF NOT EXISTS public.ads_shipping_info (
    order_id BIGINT PRIMARY KEY,
    shipping_status INT,
    receiver_address TEXT,
    update_time TIMESTAMPTZ
) WITH (
    orientation = 'row,column',         -- Storage format: hybrid. Combines millisecond point lookups (row store) and efficient aggregation analytics (column store)
    distribution_key = 'order_id',      -- Distribution key: determines data distribution logic across shards
    bitmap_columns = 'shipping_status'  -- Bitmap index: set on low-cardinality columns like status fields to greatly accelerate "status filter" queries
);

-- Add metadata comments
COMMENT ON TABLE public.ads_shipping_info IS 'Logistics status query application table';
COMMENT ON COLUMN public.ads_shipping_info.order_id IS 'Order ID';
COMMENT ON COLUMN public.ads_shipping_info.shipping_status IS 'Logistics status (1: pending shipment, 2: in transit, 3: delivered)';
COMMENT ON COLUMN public.ads_shipping_info.receiver_address IS 'Delivery address';

COMMIT;

Limits

Primary key limits

Composite primary key: You can set multiple fields as the primary key. All fields must be not nullable and must be defined in a single statement.

BEGIN;
CREATE TABLE public.test (
 "id" text NOT NULL,
 "ds" text NOT NULL,
PRIMARY KEY (id,ds)
);
CALL SET_TABLE_PROPERTY('public.test', 'orientation', 'column');
COMMIT;

Type limits: Complex data types such as Float, Double, Numeric, Array, JSON, and Date are not supported.
Modification limits: You cannot modify the primary key. To modify the primary key, you must recreate the table.
Storage dependency: Row-oriented tables and hybrid row-column tables must have a primary key. Column-oriented tables do not require a primary key.

Constraint support

Parameter	column_constraints	table_constraints
primary key	Supported	Support
not null	Support	-
null	Supported	-
unique	Not supported	Not supported
check	Not supported	Not supported
default	Supported	Not supported

Keyword, case sensitivity, and escaping limits

Naming rules: Column names cannot start with hg_. Schema names cannot start with holo_, hg_, or pg_.
Escaping requirements: Keywords, reserved words, system fields (such as ctid), case-sensitive names, special characters, and names starting with digits must be enclosed in double quotes ("").

Syntax improvements in V2.0:

Enable the legacy syntax:

-- Enable legacy syntax at session level
set hg_disable_parse_holo_property = on;
-- Enable legacy syntax at database level
alter database <db_name> set hg_disable_parse_holo_property = on;

Example of setting properties for escaped columns:

create table "TBL" (a int);
select relname from pg_class where relname = 'TBL';
insert into "TBL" values (-1977);
select * from "TBL";
------------------------------------------------------------------
-- Syntax for setting table properties on escaped columns starting from Hologres V2.0
begin;
create table tbl (c1 int not null);
call set_table_property('tbl', 'clustering_key', '"c1":asc'); 
commit;
-- Syntax for setting table properties on escaped columns before Hologres V2.0
begin;
create table tbl (c1 int not null);
call set_table_property('tbl', 'clustering_key', '"c1:asc"'); 
commit;
------------------------------------------------------------------
-- Syntax for setting multi-column properties (including uppercase) starting from Hologres V2.1
begin;
create table tbl ("C1" int not null, c2 text not null) with (clustering_key = '"C1",c2');
commit;
-- Syntax for setting multi-column properties (including uppercase) starting from Hologres V2.0
begin;
create table tbl ("C1" int not null, c2 text not null);
call set_table_property('tbl', 'clustering_key', '"C1",c2'); 
commit;
-- Syntax for setting multi-column properties (including uppercase) before Hologres V2.0
begin;
create table tbl ("C1" int not null, c2 text not null);
call set_table_property('tbl', 'clustering_key', '"C1,c2"'); 
commit;
------------------------------------------------------------------
create table "Tab_$A%*" (a int);
select relname from pg_class where relname = 'Tab_$A%*';
insert into "Tab_$A%*" values (-1977);
select * from "Tab_$A%*";
------------------------------------------------------------------
create table tbl ("2c" int not null);
insert into tbl values (3), (4);
select "2c" from tbl;

Table creation logic limits

Configuration	Specify IF NOT EXISTS	Do not specify IF NOT EXISTS
A table with the same name exists	Displays a NOTICE, skips the operation, and returns a success status.	An ERROR message is returned.
A table with the same name does not exist	The statement is successfully executed.	Successful response

Modification limits

Length limit: Table names cannot exceed 127 bytes.
The following items cannot be modified:
- You cannot change data types.
- You cannot reorder columns.
- You cannot change nullability constraints. For example, you cannot switch between not null and nullable.
- Storage layout properties: After a table is created, you cannot change the orientation, distribution_key, clustering_key, or event_time_column properties.
Supported modifications: You can change the bitmap_columns and dictionary_encoding_columns properties after a table is created.