Global secondary indexes (GSIs) - PolarDB - Alibaba Cloud Documentation Center

In a PolarDB-X distributed table, queries that don't include the partition key trigger a full scan across all data nodes. This collapses performance to the ceiling of a single node and eliminates linear scaling. A global secondary index (GSI) eliminates full partition scans by letting you define an independent partitioning scheme for an index — so any column can become an efficient query path.

How GSIs work

PolarDB-X stores table data across multiple data nodes, partitioned by a designated key. A query that includes the partition key is routed to the relevant partition directly. A query that doesn't include it fans out to every partition.

When a system has N data nodes, a partition-key query places an average load of 1/N on each node. A full partition scan places a load of 1 on every node — so the performance ceiling drops to that of a single node and the system loses linear scaling.

A GSI solves this by acting as a special partitioned table that duplicates specific columns from the primary table. It is horizontally partitioned by a user-defined key that is different from the primary table's partition key. When a query matches the GSI's partition key, PolarDB-X routes it to the relevant GSI partition, retrieves the matching rows, then fetches any additional columns from the primary table. This avoids the full partition scan.

PolarDB-X uses distributed transactions to maintain strong data consistency between the primary table and its GSIs.

GSIs also support:

Online DDL — create, modify, and drop GSIs without locking the table
Custom covering columns — store additional columns in the GSI to reduce or eliminate lookups on the primary table
Invisible indexes — test a GSI's effect on query plans without exposing it to the query optimizer, so you can validate index performance before enabling it in production

GSI types

GSI

A standard GSI partitions index data by a column other than the primary table's partition key. When a query matches the GSI's partition key, PolarDB-X routes the query to the relevant partition and looks up the primary table for any remaining columns.

Example: The user table is partitioned by user_id. To support efficient queries by name, create a GSI on name.

CREATE TABLE user(
  user_id bigint,
  name varchar(10),
  addr varchar(30),
  GLOBAL INDEX `g_i_name` (name) PARTITION BY HASH(name),
  PRIMARY KEY(user_id)
) PARTITION BY KEY(user_id);

With g_i_name in place, a query filtering by name routes to the matching partition of the index rather than scanning all partitions of user.

UGSI

A unique global secondary index (UGSI) has all the properties of a standard GSI and additionally enforces a global UNIQUE constraint across the entire distributed table.

Example: The user2 table is partitioned by user_id. To guarantee that phone is globally unique across all partitions, create a UGSI on phone.

CREATE TABLE user2(
  user_id bigint,
  phone varchar(20),
  addr varchar(30),
  UNIQUE GLOBAL INDEX `g_i_phone`(phone) PARTITION BY HASH(phone),
  PRIMARY KEY(user_id)
) PARTITION BY KEY(user_id);

Clustered GSI

By default, a clustered GSI stores all columns from the primary table — not just the index key and any covering columns. Because every column is present in the index, queries never need to look up the primary table, eliminating lookup overhead entirely.

The trade-off: a clustered GSI occupies the same amount of disk space as the primary table.

Example: The order_tbl table is partitioned by order_id. To support fast queries by user_id without any primary table lookups, create a clustered GSI on user_id.

CREATE TABLE order_tbl(
  order_id bigint,
  user_id bigint,
  addr varchar(30),
  info text,
  create_time datetime,
  CLUSTERED INDEX `cg_i_user`(user_id) PARTITION BY HASH(user_id),
  PRIMARY KEY(order_id)
) PARTITION BY KEY(order_id);

When querying by user_id, PolarDB-X routes the query to the matching partition of cg_i_user and reads all required data directly from the index.

When to use each type

Scenario	Recommended type
Query by a non-partition-key column; primary table lookup is acceptable	GSI
Query by a non-partition-key column; column must be globally unique	UGSI
Query by a non-partition-key column; eliminating lookup overhead is a priority	Clustered GSI

Performance

GSIs improve read performance and degrade write performance. The following Sysbench benchmarks illustrate the impact at scale.

Read performance

Table	Threads	select_random_ranges QPS	Avg latency (ms)	95th percentile (ms)	select_random_points QPS	Avg latency (ms)	95th percentile (ms)
Partitioned table	128	2,769.17	46.21	99.33	5,226.99	24.48	42.61
	256	3,415.64	144.97	144.97	5,476.76	46.73	82.96
	512	3,272.46	156.31	257.95	5,290.67	96.72	179.94
	1024	2,453.16	416.12	539.71	5,165.31	198.07	404.61
Partitioned table + GSI	128	9,662.11	13.24	25.28	22,584.89	5.66	9.73
	256	10,431.73	24.52	51.02	25,558.26	10.01	17.95
	512	15,634.51	32.72	73.13	27,116.56	18.86	39.65
	1024	22,948.76	44.53	108.68	32,509.87	31.43	73.13

With a GSI on the index column:

Range query QPS: 3,415.64 → 22,948.76 — 571% improvement
Point query QPS: 5,476.76 → 32,509.87 — 493% improvement

Write performance

Table	Threads	Write-only QPS	Avg latency (ms)	95th percentile (ms)	Read-write QPS	Avg latency (ms)	95th percentile (ms)
Partitioned table	128	86,548.12	8.87	10.27	113,655.28	22.52	26.2
	256	115,774.71	13.26	19.29	149,677.52	34.19	44.17
	512	143,928.94	20.51	34.95	14,555.16	70.28	112.67
	1024	153,501.7	39.53	70.55	132,150.69	131.58	287.38
Partitioned table + GSI	128	52,069.22	14.25	18.28	90,074.59	28.41	33.72
	256	66,250.79	23.17	32.53	114,420.32	44.73	57.87
	512	75,700.74	39.1	59.99	111,093.61	92.09	142.39
	1024	76,557.94	80.14	134.9	101,828.32	182.51	350.33