Boost Query Performance with Hologres Global Secondary Indexes - Hologres

Queries that filter on non-primary key columns must scan all shards, limiting throughput. A global secondary index maps non-primary key column values to their shards, enabling shard pruning at query time. This reduces data reads and increases queries per second (QPS) for point queries.

How it works

In Hologres, data is distributed across shards based on the distribution key. A query that filters on the primary key or distribution key targets the correct shard directly. A query that filters on a non-primary key column scans all shards.

A global secondary index solves this problem. When you create a global secondary index on a column, the optimizer can perform shard pruning at query time, reading only the relevant shards instead of all shards. Unlike a primary key, a global secondary index does not require unique values in the indexed column.

The index inherits the storage format of its source table:

Row store source table produces a row store index.
Column store source table produces a column store index.
Row-column hybrid store source table produces a row-column hybrid store index.

When to use

Global secondary indexes are designed for high-frequency point queries that filter on a non-primary key column. Consider creating one when:

Your application frequently queries a table by a column that is not the primary key or distribution key.
The query returns a small subset of rows (high selectivity).
You need higher QPS for these queries.
The query columns can be fully covered by the index (all columns in the query are either index columns or INCLUDE columns).

Global secondary indexes are not suited for analytical scans that read large portions of a table. For those workloads, consider bitmap indexes, segment keys, or distribution keys.

Prerequisites

Before you begin, ensure that you have:

A Hologres instance running version 4.0 or later. To check or upgrade your version, see Upgrade an instance

Limitations

Supported data types

Index columns support only the following data types: TEXT, INTEGER, BIGINT, and VARCHAR.

Table requirements

Requirement	Description
Primary key	The source table must have a primary key.
TTL	The source table cannot have the `time_to_live_in_seconds` parameter set.
Table type	Only standard internal tables are supported. Physical and logical partitioned tables are not supported.

Index constraints

Constraint	Description
Maximum columns	A global secondary index supports a maximum of 512 columns (index columns and INCLUDE columns combined).
No duplicate columns	Index columns and INCLUDE columns cannot contain duplicate columns.
Immutable	A global secondary index cannot be modified after creation. Drop it and create a new one to make changes.
Same schema	The index is created in the same schema as the source table. You cannot specify a different schema.

Source table restrictions

Columns that are part of a global secondary index cannot be deleted or modified in the source table.
Table Group and resharding operations cannot be performed on a source table that has a global secondary index.

Storage

By default, global secondary indexes support only Standard storage (hot storage).

Create a global secondary index

Syntax

CREATE GLOBAL INDEX [ IF NOT EXISTS ] index_name
  ON [schema_name.]table_name (index_column_name [, ...])
  [ INCLUDE (include_column_name [, ...]) ];

Parameters

Parameter	Required	Description
`index_name`	Yes	Name of the global secondary index.
`schema_name`	No	Schema of the source table. Defaults to the default schema if not specified.
`table_name`	Yes	Name of the source table.
`index_column_name`	Yes	One or more columns to index. Set this to the filter column used for point queries on non-primary key columns.
`include_column_name`	No	Additional columns to store in the index. Include all columns that your queries need to avoid reading from the source table.

Usage notes

The CREATE GLOBAL INDEX statement blocks until the index is fully built and visible. For large tables, this may take a significant amount of time.
Index building affects write performance because it involves writing multiple copies of data. The impact increases with the data volume of the source table and the number of columns in the index.
The index is always created in the same schema as the source table. You cannot specify a schema in the CREATE GLOBAL INDEX statement.
A query uses the global secondary index only when all columns in the query are covered by the index (either as index columns or INCLUDE columns). If the query references any column not in the index, the optimizer falls back to scanning the source table.

Delete a global secondary index

Syntax

DROP INDEX [schema_name.]index_name;

Parameters

Parameter	Required	Description
`schema_name`	No	Schema of the global secondary index. Defaults to the default schema if not specified.
`index_name`	Yes	Name of the global secondary index to drop.

View global secondary indexes

List all global secondary indexes in the current database:

-- List all global secondary indexes with their source tables
SELECT
    n.nspname AS table_namespace,
    t.relname AS table_name,
    i.relname AS index_name
FROM
    pg_class t
JOIN
    pg_index ix ON t.oid = ix.indrelid
JOIN
    pg_class i ON i.oid = ix.indexrelid
JOIN
    pg_am am ON am.oid = i.relam
JOIN
    pg_namespace n ON n.oid = t.relnamespace
WHERE
    t.relkind = 'r'                -- Standard tables only
    AND am.amname = 'globalindex'; -- Global secondary indexes only

Check the storage size of a global secondary index:

-- Replace schema_name and global_index_name with your values
SELECT pg_relation_size('schema_name.global_index_name');

View the columns included in a global secondary index:

-- Replace global_index_name with the name of your index
SELECT pg_catalog.pg_get_indexdef('global_index_name'::regclass, 0, true);

Example: accelerate point queries

This example demonstrates how a global secondary index improves point query performance on the O_ORDERPRIORITY column of an orders table.

Step 1: Create the source table

The orders table has nine columns. O_ORDERKEY is the primary key.

Column	Type	Description
O_ORDERKEY	BIGINT	Order number (primary key)
O_CUSTKEY	INT	Customer ID (foreign key associated with the CUSTOMER table)
O_ORDERSTATUS	CHAR(1)	Order status ('F' = Finished, 'O' = Open, 'P' = Processing)
O_TOTALPRICE	DECIMAL(15,2)	Total order amount
O_ORDERDATE	DATE	Order creation date
O_ORDERPRIORITY	TEXT	Order priority ('1-URGENT', '2-HIGH', etc.)
O_CLERK	TEXT	Employee ID of the person who processed the order
O_SHIPPRIORITY	INT	Shipping priority (higher value means higher priority)
O_COMMENT	TEXT	Order comment

-- Create the orders table with row-column hybrid storage
CREATE TABLE ORDERS
(
    O_ORDERKEY      BIGINT        NOT NULL PRIMARY KEY,
    O_CUSTKEY       INT           NOT NULL,
    O_ORDERSTATUS   CHAR(1)       NOT NULL,
    O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
    O_ORDERDATE     DATE          NOT NULL,
    O_ORDERPRIORITY TEXT          NOT NULL,
    O_CLERK         TEXT          NOT NULL,
    O_SHIPPRIORITY  INT           NOT NULL,
    O_COMMENT       TEXT          NOT NULL
) WITH (
  orientation='row,column',                                        -- Row-column hybrid store
  segment_key='O_ORDERDATE',                                       -- Segment by order date
  distribution_key='O_ORDERKEY',                                   -- Distribute by primary key
  bitmap_columns='O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY',
  dictionary_encoding_columns='o_comment:off,o_orderpriority,o_clerk'
);

COMMENT ON TABLE ORDERS IS 'Primary order table that records basic order information and status';
COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order number (primary key)';
COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID (foreign key associated with the CUSTOMER table)';
COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status (''F'' = Finished, ''O'' = Open, ''P'' = Processing)';
COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total order amount';
COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order creation date';
COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority (''1-URGENT'', ''2-HIGH'', etc.)';
COMMENT ON COLUMN ORDERS.O_CLERK IS 'Employee ID of the person who processed the order';
COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority (higher value means higher priority)';
COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Order comment information';

Step 2: Check the execution plan without an index

Run a point query that filters by order priority:

-- Query all urgent orders
SELECT
  O_ORDERKEY,
  O_CUSTKEY,
  O_ORDERSTATUS,
  O_TOTALPRICE,
  O_ORDERDATE,
  O_ORDERPRIORITY,
  O_CLERK,
  O_SHIPPRIORITY,
  O_COMMENT
FROM ORDERS
WHERE O_ORDERPRIORITY = '1-URGENT';

Check the execution plan:

EXPLAIN
SELECT
  O_ORDERKEY,
  O_CUSTKEY,
  O_ORDERSTATUS,
  O_TOTALPRICE,
  O_ORDERDATE,
  O_ORDERPRIORITY,
  O_CLERK,
  O_SHIPPRIORITY,
  O_COMMENT
FROM ORDERS
WHERE O_ORDERPRIORITY = '1-URGENT';

The output shows the query uses a Bitmap Filter on the Clustering_index. This provides only limited performance improvement:

QUERY PLAN
Gather  (cost=0.00..1.00 rows=1 width=53)
  ->  Local Gather  (cost=0.00..1.00 rows=1 width=53)
        ->  Index Scan using Clustering_index on orders  (cost=0.00..1.00 rows=1 width=53)
              Bitmap Filter: (o_orderpriority = '1-URGENT'::text)
Query Queue: init_warehouse.default_queue
Optimizer: HQO version 4.0.0

Step 3: Create a global secondary index

Create an index on O_ORDERPRIORITY and include all other columns needed by the query:

-- Create a global secondary index on order priority
-- INCLUDE all columns that the query needs
CREATE GLOBAL INDEX idx_orders ON orders(O_ORDERPRIORITY)
INCLUDE (
  O_CUSTKEY,
  O_ORDERSTATUS,
  O_TOTALPRICE,
  O_ORDERDATE,
  O_CLERK,
  O_SHIPPRIORITY,
  O_COMMENT
);

Step 4: Verify the index is used

Run the same EXPLAIN query. The execution plan now shows the optimizer scans the global secondary index idx_orders instead of the source table, and applies shard pruning:

QUERY PLAN
Local Gather  (cost=0.00..1.76 rows=3035601 width=99)
  ->  Index Scan using Clustering_index on idx_orders  (cost=0.00..1.54 rows=3035601 width=99)
        Shard Prune: Eagerly
        Shards selected: 1 out of 20
        Cluster Filter: (o_orderpriority = '1-URGENT'::text)
Query Queue: init_warehouse.default_queue
Optimizer: HQO version 4.0.0

Key differences from the plan without the index:

Aspect	Without index	With index
Scan target	`orders` (source table)	`idx_orders` (global secondary index)
Shard pruning	None (scans all shards)	`Shard Prune: Eagerly`, 1 of 20 shards selected (95% fewer reads in this example)
Filter type	Bitmap Filter	Cluster Filter (index structure used directly)

Step 5: Optimize further with Fixed Plan

Enable Fixed Plan to increase QPS for point queries:

SET hg_experimental_enable_fixed_dispatcher_for_scan = true;

After you enable this setting, the point query uses a Fixed Plan, which reduces query planning overhead and further improves throughput.