All Products
Search
Document Center

Hologres:Best practices for Hologres table properties

Last Updated:Mar 26, 2026

Choosing the right Hologres table properties for your query patterns reduces the amount of data scanned, the number of files accessed, and the number of I/O operations — resulting in lower latency and higher queries per second (QPS). This guide maps six common query scenarios to the storage format, primary key, distribution key, clustering key, partitioning, and bitmap index settings that best fit each one.

Choose a storage format

Hologres supports three storage formats: row-oriented storage, column-oriented storage, and row-column hybrid storage. For details on each format, see Set up data orientation.

Use the following decision tree to select a format. If your workload is not yet well-defined, start with row-column hybrid storage — it balances trade-offs across the widest range of query patterns.

image..png

Property quick reference

Each table property targets a specific layer of query execution. Understanding what each one does helps you choose the right combination for your scenario.

Property What it does Best for
Primary key Enables Fixed Plan for direct row lookup Point queries with ultra-high QPS
Distribution key Co-locates rows on the same shard by key value; reduces cross-shard shuffle Aggregate queries; JOIN queries
Clustering key Physically sorts rows by key within each file; reduces I/O for range or equality scans Prefix scans; single-value filter queries
event_time_column (segment key) Sorts data within each file by this column; reduces files scanned for range predicates Time-based filter queries
Partitioning Prunes entire partitions before scanning Large tables with time-based access patterns
Bitmap index Maps each distinct value to matching rows; narrows scans for single-value filters Single-value filter queries

Configure table properties by query scenario

Once you have a storage format, align the remaining table properties with your query patterns. Each scenario below describes the recommended configuration, explains the mechanism, and shows measured performance results.

The examples in this section use a 64 compute unit (CU) Hologres instance and the TPC-H 100 GB dataset. The dataset contains two tables: the Orders table (where o_orderkey uniquely identifies an order) and the Lineitem table (where l_orderkey and l_linenumber together identify a line item). These tests are based on the TPC-H benchmarking methodology but do not meet all TPC-H requirements; results cannot be compared with published TPC-H benchmark results. If your table serves multiple query patterns with different filter or JOIN fields, configure properties based on your most frequent or most performance-sensitive query.

Scenario 1: Point queries with ultra-high QPS

Query pattern

You want to perform tens of thousands of point queries per second on the Orders table. You can specify the o_orderkey field to locate a row of data.

SELECT * FROM orders WHERE o_orderkey = ?;

Configuration

Set the filter field as the primary key. Hologres generates a Fixed Plan for primary-key lookups, bypassing the general query planning path and significantly reducing execution overhead. For details, see Accelerate SQL execution with Fixed Plan.

How it works

The Fixed Plan skips query optimization steps that are unnecessary for single-row lookups, routing requests directly to the target row. Without a primary key, the engine falls back to a full shard scan.

Performance results (500 concurrent queries, row-oriented or row-column hybrid table)

Configuration Avg QPS Avg latency
Primary key configured ~104,000 ~4 ms
No primary key ~16,000 ~30 ms

For the DDL statement, see Scenario 1 DDL.

Scenario 2: Prefix scans on a small dataset with high QPS

Query pattern

You want to perform queries with the following characteristics: tens of thousands of QPS need to be processed, queries are performed based on a field in the primary key, and the returned dataset contains several or dozens of data entries.

SELECT * FROM lineitem WHERE l_orderkey = ?;

This query retrieves all line items for one order — a prefix scan on a composite primary key.

Configuration

Apply all three of the following:

  1. Primary key ordering — Put the filter field first in the composite primary key. Use (l_orderkey, l_linenumber), not (l_linenumber, l_orderkey). This lets Hologres evaluate the equality predicate against the leading key without scanning unrelated rows.

  2. Distribution key — Set the filter field as the distribution key (l_orderkey). Data for each order is stored within a single shard, so the query touches exactly one shard instead of all shards.

  3. Clustering key (column-oriented or row-column hybrid storage only) — Set the filter field as the clustering key (l_orderkey). Within each file, rows are sorted by this key, reducing I/O when scanning a contiguous range.

With these properties in place, enable the Fixed Plan for prefix scans by setting the Grand Unified Configuration (GUC) parameter hg_experimental_enable_fixed_dispatcher_for_scan to on. For details, see Accelerate SQL execution with Fixed Plan.

How it works

The combination of correct primary key ordering, a co-located distribution key, and a clustering key means that all matching rows are in one shard, physically adjacent in storage, and directly addressed by the Fixed Plan.

Performance results (row-oriented or row-column hybrid table)

Configuration Concurrent queries Avg QPS Avg latency
All three properties configured 500 ~37,000 ~13 ms
Properties not configured 1 ~60 ~16 ms

For the DDL statement, see Scenario 2 DDL.

Scenario 3: Queries with time-based filter conditions

Query pattern

-- Original query
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) AS sum_qty,
    sum(l_extendedprice) AS sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    avg(l_quantity) AS avg_qty,
    avg(l_extendedprice) AS avg_price,
    avg(l_discount) AS avg_disc,
    count(*) AS count_order
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '120' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;

-- Modified query used in the performance test (narrowed time range to highlight the effect)
SELECT ... FROM lineitem
WHERE
    l_year='1992'               -- filters to a single partition
    AND l_shipdate <= date '1992-12-01'
...;

Configuration

  1. Partition by time — Add an l_year column and use it as the partition key to split the table by year. Determine whether to partition your table or only configure the event_time_column property based on your data volume and business requirements. Check the limits and usage notes in CREATE PARTITION TABLE before partitioning.

  2. `event_time_column` — Set l_shipdate as the event_time_column (segment key). This ensures that data entries in files in the shard are sorted in order based on the event_time_column property, which reduces the number of files to be scanned. For details, see Event Time Column (Segment Key).

How it works

Partitioning prunes entire partitions before the query reads any file. The event_time_column then prunes individual files within the remaining partition by sorting data within each file by the configured column.

Performance results (column-oriented table)

Configuration Files scanned
Partition + event_time_column configured as suggested 80 files in 1 partition
No partition; event_time_column set to a different field 320 files
Run EXPLAIN ANALYZE to inspect actual scan counts. The Partitions selected parameter shows how many partitions were accessed; the dop parameter shows how many files were scanned.

For the DDL statement, see Scenario 3 DDL.

Scenario 4: Queries with non-time single-value filter conditions

Query pattern

SELECT
    ...
FROM
    lineitem
WHERE
    l_shipmode IN ('FOB', 'AIR');

Configuration

  1. Clustering key — Set l_shipmode as the clustering key. Rows with the same l_shipmode value are stored consecutively in each file, so the engine reads a smaller, contiguous block instead of scattered rows across the file.

  2. Bitmap index — Configure l_shipmode as a bitmap column. The bitmap index maps each distinct value to the rows that contain it, letting the engine jump directly to matching rows without reading unrelated data.

How it works

The clustering key reduces I/O by grouping matching rows physically. The bitmap index then narrows the scan further by filtering at the block level before any row data is read.

Performance results (column-oriented table)

Configuration Rows read Query duration
Clustering key + bitmap index configured 170 million 0.71 s
Neither configured 600 million (full table) 2.41 s
Check the number of rows read via the read_rows parameter in slow query logs. See Get and analyze slow query logs. To confirm that the bitmap index is being applied, look for the Bitmap Filter keyword in the execution plan.

For the DDL statement, see Scenario 4 DDL.

Scenario 5: Aggregate queries on a single field

Query pattern

SELECT
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount))
FROM
    lineitem
GROUP BY
    l_suppkey;

Configuration

Set the GROUP BY field (l_suppkey) as the distribution key. When data is already distributed by the aggregation key, each shard computes its local aggregate independently — no data needs to move between shards.

How it works

Without a matching distribution key, the engine shuffles all rows to a set of reducer shards before aggregating. With a matching distribution key, this prevents a large amount of data from being shuffled across shards.

Performance results (column-oriented table)

Configuration Data shuffled Execution duration
l_suppkey as distribution key ~0.21 GB 2.30 s
Different field as distribution key ~8.16 GB 3.68 s
Check the volume of shuffled data via the shuffle_bytes parameter in slow query logs. See Get and analyze slow query logs.

For the DDL statement, see Scenario 5 DDL.

Scenario 6: JOIN queries across multiple tables

Query pattern

SELECT
    o_orderpriority,
    count(*) AS order_count
FROM
    orders
WHERE
    o_orderdate >= date '1996-07-01'
    AND o_orderdate < date '1996-07-01' + interval '3' month
    AND EXISTS (
        SELECT
            *
        FROM
            lineitem
        WHERE
            l_orderkey = o_orderkey    -- join condition
            AND l_commitdate < l_receiptdate)
GROUP BY
    o_orderpriority
ORDER BY
    o_orderpriority;

Configuration

Set the JOIN fields as distribution keys on both tables:

  • Lineitem: l_orderkey as the distribution key

  • Orders: o_orderkey as the distribution key

When both tables are distributed on the same key, matching rows from each table are already co-located on the same shard. The JOIN executes locally without shuffling either table across the network.

How it works

When tables are distributed on different keys, a large amount of data must be shuffled across shards to execute the JOIN. Aligning distribution keys with JOIN fields prevents this large-scale data shuffle and turns a network-bound operation into a local one.

Performance results (both tables column-oriented)

Configuration Data shuffled Execution duration
JOIN fields as distribution keys on both tables ~0.45 GB 2.19 s
Distribution keys not aligned with JOIN fields ~6.31 GB 5.55 s
Check the volume of shuffled data via the shuffle_bytes parameter in slow query logs. See Get and analyze slow query logs.

For the DDL statements, see Scenario 6 DDL.

(Optional) Assign tables to table groups

If your Hologres instance has more than 256 cores and serves a variety of workloads, configure multiple table groups and assign each table to the appropriate group at creation time. This isolates resource usage across workloads. For details, see Best practices for table group setup.

Appendix: DDL statements

Scenario 1 DDL

-- Create the Orders table with a primary key.
-- To create the table without a primary key, remove PRIMARY KEY from the O_ORDERKEY column.
DROP TABLE IF EXISTS orders;
BEGIN;
CREATE TABLE orders(
    O_ORDERKEY       BIGINT         NOT NULL PRIMARY KEY
    ,O_CUSTKEY       INT            NOT NULL
    ,O_ORDERSTATUS   TEXT           NOT NULL
    ,O_TOTALPRICE    DECIMAL(15,2)  NOT NULL
    ,O_ORDERDATE     TIMESTAMPTZ    NOT NULL
    ,O_ORDERPRIORITY TEXT           NOT NULL
    ,O_CLERK         TEXT           NOT NULL
    ,O_SHIPPRIORITY  INT            NOT NULL
    ,O_COMMENT       TEXT           NOT NULL
);
CALL SET_TABLE_PROPERTY('orders', 'orientation', 'row');
CALL SET_TABLE_PROPERTY('orders', 'clustering_key', 'o_orderkey');
CALL SET_TABLE_PROPERTY('orders', 'distribution_key', 'o_orderkey');
COMMIT;

Scenario 2 DDL

-- Create the Lineitem table with properties configured for prefix scan optimization.
DROP TABLE IF EXISTS lineitem;
BEGIN;
CREATE TABLE lineitem
(
    L_ORDERKEY      BIGINT      NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('lineitem', 'orientation', 'row');
-- CALL set_table_property('lineitem', 'clustering_key', 'L_ORDERKEY,L_SHIPDATE');
CALL set_table_property('lineitem', 'distribution_key', 'L_ORDERKEY');
COMMIT;

Scenario 3 DDL

-- Create a partitioned Lineitem table partitioned by year.
-- The non-partitioned version uses the same DDL as Scenario 2.
DROP TABLE IF EXISTS lineitem;
BEGIN;
CREATE TABLE lineitem
(
    L_ORDERKEY      BIGINT      NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    L_YEAR          TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER,L_YEAR)
)
PARTITION BY LIST (L_YEAR);
CALL set_table_property('lineitem', 'clustering_key', 'L_ORDERKEY,L_SHIPDATE');
CALL set_table_property('lineitem', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('lineitem', 'distribution_key', 'L_ORDERKEY');
COMMIT;

Scenario 4 DDL

-- Create the Lineitem table without clustering key or bitmap index (baseline).
-- For the optimized version, add the clustering key and bitmap_columns properties as shown in the comments.
DROP TABLE IF EXISTS lineitem;
BEGIN;
CREATE TABLE lineitem
(
    L_ORDERKEY      BIGINT      NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('lineitem', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('lineitem', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('lineitem', 'bitmap_columns', 'l_orderkey,l_partkey,l_suppkey,l_linenumber,l_returnflag,l_linestatus,l_shipinstruct,l_comment');
COMMIT;

Scenario 5 DDL

-- Create the Lineitem table with the GROUP BY field configured as the distribution key.
DROP TABLE IF EXISTS lineitem;
BEGIN;
CREATE TABLE lineitem
(
    L_ORDERKEY      BIGINT      NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER,L_SUPPKEY)
);
CALL set_table_property('lineitem', 'segment_key', 'L_COMMITDATE');
CALL set_table_property('lineitem', 'clustering_key', 'L_ORDERKEY,L_SHIPDATE');
CALL set_table_property('lineitem', 'distribution_key', 'L_SUPPKEY');
COMMIT;

Scenario 6 DDL

DROP TABLE IF EXISTS LINEITEM;

BEGIN;
CREATE TABLE LINEITEM
(
    L_ORDERKEY      BIGINT      NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
COMMIT;

DROP TABLE IF EXISTS ORDERS;

BEGIN;
CREATE TABLE ORDERS
(
    O_ORDERKEY      BIGINT      NOT NULL PRIMARY KEY,
    O_CUSTKEY       INT         NOT NULL,
    O_ORDERSTATUS   TEXT        NOT NULL,
    O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
    O_ORDERDATE     timestamptz NOT NULL,
    O_ORDERPRIORITY TEXT        NOT NULL,
    O_CLERK         TEXT        NOT NULL,
    O_SHIPPRIORITY  INT         NOT NULL,
    O_COMMENT       TEXT        NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
COMMIT;

References

For Data Definition Language (DDL) statements for Hologres internal tables, see: