Beam stores data in columnar blocks and records the minimum and maximum values of each block. When you run a range query or equality filter, Beam skips blocks whose min/max range falls outside the filter condition, reducing I/O. Specify a sort key to control how data is physically ordered so that related values land in the same blocks and more blocks can be skipped per query. In most cases, sort keys also improve data compression ratios.
Beam supports two sort key types:
Compound sort key: sorts data by the listed columns in order. Queries that filter on a prefix of those columns benefit most—particularly when one column dominates filtering.
Interleaved sort key: assigns equal weight to each column so any listed column can drive block skipping. Queries that filter on different column combinations with similar frequency benefit most.
Prerequisites
Beam sorting optimization requires AnalyticDB for PostgreSQL V7.0 instances running V7.0.1.x or later. Interleaved sorting requires V7.0.4.0 or later. To check your instance's minor version, see View the minor engine version.
Choose a sort key type
Use the following table to match your query pattern to the recommended sort key type.
| Scenario | Recommended type |
|---|---|
| Queries almost always filter on one specific column or a fixed prefix of columns | Compound sort key |
| One column has high query frequency and high selectivity | Compound sort key |
| Queries filter on multiple columns interchangeably with similar frequency and selectivity, and no column is auto-increment | Interleaved sort key |
Additional guidance:
Compound sort key — column order matters: When all columns are queried at the same frequency, put low-cardinality columns first. Low cardinality means fewer distinct values, so the leading column groups more rows into the same block, enabling more blocks to be skipped.
Interleaved sort key — column selection: Select low-cardinality columns that appear in large data volumes. Low-cardinality interleaved columns produce more consistent block-level groupings across the table.
Auto-increment columns: Do not use auto-increment columns (such as date or timestamp columns) as interleaved sort key columns.
Performance comparison
The following table shows query times for a compound sort key versus an interleaved sort key on the lineorder_flat Beam table (1 TB, Star Schema Benchmark (SSB)). Sort key columns are LO_ORDERDATE and P_BRAND.
| Filter condition | Compound sort key (s) | Interleaved sort key (s) |
|---|---|---|
| Point queries based on the first column | 0.297 | 18.329 |
| 1% filtering rate based on the first column | 1.268 | 19.224 |
| 10% filtering rate based on the first column | 16.83 | 38.30 |
| 50% filtering rate based on the first column | 65.62 | 76.99 |
| Point queries based on the first column + point queries based on the second column | 0.288 | 5.29 |
| 1% filtering rate based on the first column + 1% filtering rate based on the second column | 7.36 | 6.46 |
| 10% filtering rate based on the first column + 10% filtering rate based on the second column | 91.73 | 26.70 |
| 50% filtering rate based on the first column + 50% filtering rate based on the second column | 376.22 | 87.82 |
| 50% filtering rate based on the first column + point queries based on the second column | 71.83 | 19.16 |
| 10% filtering rate based on the first column + 1% filtering rate based on the second column | 82.50 | 18.95 |
| 1% filtering rate based on the first column + 10% filtering rate based on the second column | 7.98 | 6.43 |
| Point queries based on the first column + 50% filtering rate based on the second column | 0.50 | 31.48 |
| Point queries based on the second column | 87.04 | 19.67 |
| 1% filtering rate based on the second column | 515.08 | 78.90 |
| 10% filtering rate based on the second column | 567.85 | 131.39 |
| 50% filtering rate based on the second column | 588.86 | 134.36 |
These results show the relative performance difference between the two sort key types. They do not represent the optimal performance of AnalyticDB for PostgreSQL on the SSB dataset.
Key takeaway: Use a compound sort key when queries filter primarily on the first column or on it with high selectivity. Use an interleaved sort key when queries filter on the second column alone or on multiple columns together.
Compound sort key
Define a compound sort key
A compound sort key sorts data by the listed columns in order. Queries that filter on a prefix of those columns—starting from the first—benefit most.
To create a Beam table with a compound sort key, use ORDER BY in the CREATE TABLE statement:
CREATE TABLE beam_example (
id integer,
name text,
ftime timestamp
)
USING beam
DISTRIBUTED BY (id)
ORDER BY(id);Maintain a compound sort key
After data is written, a background process sorts it automatically based on data volume and file count. To trigger sorting immediately, run OPTIMIZE:
OPTIMIZE beam_example;OPTIMIZE blocks DDL changes. DDL statements resume automatically after OPTIMIZE completes.
Interleaved sort key
An interleaved sort key assigns equal weight to every listed column. Queries that filter on any one of those columns—regardless of column order—can skip blocks effectively.
Define an interleaved sort key
To create a Beam table with an interleaved sort key, use ZORDER BY in the CREATE TABLE statement:
CREATE TABLE beam_example_interleaved (
id integer,
name text,
ftime timestamp,
region varchar,
age integer
)
USING beam
DISTRIBUTED BY (id)
ZORDER BY(name, region, age);An interleaved sort key must include 2 to 8 columns.
Maintain an interleaved sort key
After data is written, a background process sorts it automatically. As data grows, value distributions can shift, causing data skew that reduces block-skipping efficiency. Check for skew regularly and re-sort when needed.
Check skew for a specific table:
SELECT * FROM adbpg_toolkit.pg_get_interleaved_skew('beam_example_interleaved'::regclass)Sample output:
relid | colname | skew | suggestion
-------+----------+------+---------------
17139 | name | 0.46 |
17139 | region | 0.54 | NEED OPTIMIZE
17139 | OVER ALL | 0.54 | NEED OPTIMIZE
(3 rows)Check skew across all interleaved sorting tables in the current database:
SELECT * FROM adbpg_toolkit.pg_stat_interleaved_skew;Sample output:
relid | relname | colname | skew | suggestion
-------+--------------------------+----------+------+---------------
17139 | beam_example_interleaved | name | 0.46 |
17139 | beam_example_interleaved | region | 0.54 | NEED OPTIMIZE
17139 | beam_example_interleaved | OVER ALL | 0.54 | NEED OPTIMIZE
(3 rows)Output columns:
| Column | Description |
|---|---|
skew | Skew ratio of the interleaved sort key column |
suggestion | NEED OPTIMIZE means the column must be re-sorted |
OVER ALL (in colname) | Overall data skew across all interleaved sort key columns |
When the OVER ALL row shows NEED OPTIMIZE, re-sort the table:
OPTIMIZE beam_example_interleaved;OPTIMIZE blocks DDL changes. DDL statements resume automatically after OPTIMIZE completes.
Add or modify a sort key
After creating a table, you can add, change, or remove its sort key.
Adding or modifying a sort key locks the table—reads and writes are blocked until the operation completes. Modifying a sort key rewrites all table data. For large tables, this can take considerable time. Plan the operation accordingly.
| Operation | Behavior after modification |
|---|---|
| Modify a compound sort key | Table data is re-sorted immediately |
| Modify an interleaved sort key | Table data is not re-sorted immediately—run OPTIMIZE afterward |
Add or modify a compound sort key:
ALTER TABLE beam_example SET ORDER BY(id, name);Add or modify an interleaved sort key:
ALTER TABLE beam_example_interleaved SET ZORDER BY(name, region);Remove a sort key:
ALTER TABLE beam_example SET ORDER NONE;Related topics
Beam sorting optimization applies to AnalyticDB for PostgreSQL V7.0 instances only. For sort key configuration on V6.0 instances, see Sorting optimization.