All Products
Search
Document Center

Lindorm:Warehouse-mode columnar tables

Last Updated:May 22, 2026

A warehouse-mode columnar table is the OLAP resource group's built-in analytical table format. It uses columnar storage and an MPP execution engine for both bulk and real-time analytics. Data lives entirely inside the OLAP resource group — separate from lakehouse columnar data (lindorm_columnar) and wide-table engine data (lindorm_table) — and supports a full DDL/DML surface and high-throughput OLAP execution.

After connecting to the OLAP resource group, switch to the warehouse-mode catalog:

SET CATALOG default_catalog;

Table types

Pick a table type based on whether the data needs to be updated after it lands. The two types differ in how writes are processed and stored. The table type is fixed at creation and cannot be changed afterward. The engine sorts, processes, and stores incoming data according to the table type.

Duplicate Key Table

A Duplicate Key Table is the default table type. It uses an append-only write model — every row is preserved exactly as written. The engine performs no deduplication or aggregation, so multiple rows with the same sort key are allowed to coexist.

When to use

  • Append-only detail data that never needs updating — raw logs, behavioral events, instrumentation traces.

  • A single dataset that needs to be aggregated along many ad-hoc dimensions, with no fixed pre-aggregation pattern.

  • Full historical snapshots that must be queried at row granularity by arbitrary predicates.

When not to use: if your workload involves frequent updates or deletes (such as order status changes or user profile edits), use a Primary Key Table instead.

DDL

-- Example: user behavior event log
-- Sorted by (event_day, event_type) — speeds up filters on date and event type.
-- Hash-bucketed by user_id — co-locates a user's rows in one bucket, enabling local joins.
CREATE TABLE user_event_log (
    event_day   DATE        NOT NULL COMMENT "Event date",
    event_type  TINYINT     NOT NULL COMMENT "Event type (1=click, 2=impression, 3=conversion)",
    user_id     BIGINT      NOT NULL COMMENT "User ID",
    item_id     BIGINT               COMMENT "Content/item ID",
    session_id  VARCHAR(64)          COMMENT "Session ID",
    duration_ms INT                  COMMENT "Dwell time in milliseconds",
    province    VARCHAR(32)          COMMENT "Province",
    channel     VARCHAR(32)          COMMENT "Source channel"
)
DUPLICATE KEY(event_day, event_type)
PARTITION BY date_trunc('day', event_day)
DISTRIBUTED BY HASH(user_id);

Notes

  • ORDER BY is an equivalent alternative to DUPLICATE KEY for declaring the sort key. If both are present, DUPLICATE KEY is ignored. When using ORDER BY, place it after DISTRIBUTED BY.

  • If neither sort key nor bucket key is specified, the engine uses the first three columns as the sort key by default.

  • Bitmap and Bloom Filter indexes can be created on any column — there is no Key/Value column distinction.

  • Duplicate Key Tables do not support UPDATE or DELETE (use a Primary Key Table when you need them).

  • Two identical rows are stored as two independent rows — there is no automatic deduplication.

Storage model

All written rows are physically sorted by the sort key and stored in columnar files. Each record is independent and immutable in place. Background compaction merges small files and reclaims space, but never deduplicates or modifies row contents.

The strengths of this model are high write throughput and query flexibility: writes carry no merge or dedup overhead, making the table ideal for high-frequency append workloads; queries can aggregate freely along any dimension, with no pre-defined aggregation locking the schema.

Designing the sort key

The sort key is declared with DUPLICATE KEY or ORDER BY. It defines both the physical row order and the basis for the shortkey index.

Design principles:

  • Put the most frequently filtered column first. The shortkey index only matches a contiguous prefix of the sort key, so the leading column should be the one most often used in WHERE clauses.

  • Prefer low-cardinality columns in the prefix. Low-cardinality columns (date, event type, city) cluster identical values together, which combines well with the ZoneMap index to skip large amounts of I/O.

  • Avoid high-cardinality columns at the prefix. Columns like user_id or device_id are too dispersed for the shortkey index to be useful — use them as Bloom Filter columns or bucket keys instead.

  • Keep the sort key to three columns or fewer. The shortkey index stops extending after 36 bytes, so additional columns deliver diminishing returns.


Primary Key Table

A Primary Key Table enforces a unique, non-null constraint on the primary key columns. It is purpose-built for workloads that mix real-time updates with high-throughput ad-hoc queries.

When to use

  • Synchronizing full CDC streams (inserts/updates/deletes) from transactional databases like MySQL or PostgreSQL via Flink CDC and similar tools.

  • High-frequency UPSERT workloads — order-status tracking, real-time user-profile updates, logistics waybill state sync.

  • Multiple data streams writing to one wide table, each updating its own subset of columns (partial column update).

  • Concurrent complex analytical queries that must see the latest snapshot.

DDL

-- Example: real-time analytical table for e-commerce orders
-- Primary key columns (order_id, dt) lead the column list.
-- Partition column dt and bucket column order_id are both included in the primary key.
-- ORDER BY (dt, merchant_id) speeds up range scans by date and merchant.
CREATE TABLE orders (
    order_id       BIGINT         NOT NULL COMMENT "Order ID",
    dt             DATE           NOT NULL COMMENT "Order date",
    merchant_id    INT            NOT NULL COMMENT "Merchant ID",
    user_id        BIGINT         NOT NULL COMMENT "User ID",
    good_id        INT            NOT NULL COMMENT "Item ID",
    good_name      VARCHAR(128)   NOT NULL COMMENT "Item name",
    total_amount   DECIMAL(18, 2) NOT NULL COMMENT "Order amount",
    payment_amount DECIMAL(18, 2)          COMMENT "Paid amount",
    status         TINYINT        NOT NULL COMMENT "Order status",
    create_time    DATETIME       NOT NULL COMMENT "Created at",
    update_time    DATETIME                COMMENT "Last updated at"
)
PRIMARY KEY (order_id, dt)
PARTITION BY date_trunc('day', dt)
DISTRIBUTED BY HASH(order_id)
ORDER BY (dt, merchant_id);

Notes

  • Primary key columns are unique and non-null. DECIMAL is not allowed in the primary key. The total primary key length must not exceed 128 bytes.

  • Primary key columns must lead the column list with no other columns interleaved. Place all primary key columns at the very start of the CREATE TABLE column definitions; mixing non-primary-key columns in between causes the DDL to fail.

  • When using partitioning and hash bucketing, the primary key must contain the partition columns and the bucket columns so the engine can locate the exact partition and bucket to update.

  • Primary key values are immutable. UPDATE only rewrites non-primary-key columns. To change a primary key value, DELETE the row, then INSERT the new one.

  • The sort key (ORDER BY) is fully decoupled from the primary key — it can be any combination of columns. The sort key can be modified after table creation via ALTER TABLE tbl ORDER BY ..., but you cannot drop it or change the data type of a sort column.

  • Primary Key Tables support the full DML surface: INSERT, UPDATE (filter by either key or value columns), and DELETE (filter by either key or value columns).

  • Primary Key Tables only support hash bucketing. Random bucketing is not supported.

Write model: Delete+Insert

Primary Key Tables use a Delete+Insert write strategy, in contrast to the traditional Merge-On-Read model:

  • On write: The engine looks up the existing row through the primary key index, marks it as deleted, writes the new row, and updates the index.

  • On query: Data is already in its final state — no online multi-version merge is needed; the reader just streams the live rows.

As a result, Primary Key Tables deliver markedly better query latency than Merge-On-Read engines, especially under heavy update traffic where read amplification would otherwise dominate.

Primary key design principles

  • The primary key should uniquely identify a business entity. Pick a column or set of columns that pins down a single record across the whole table — for example, order_id or (user_id, event_id).

  • The primary key must include the partition and bucket columns. When using PARTITION BY and DISTRIBUTED BY HASH, the primary key must cover both — otherwise the engine cannot locate the exact partition and bucket for an update.

  • Keep the primary key compact. Total length must be at most 128 bytes; aim for three columns or fewer. Prefer compact types like INT and BIGINT over VARCHAR — too many primary key columns inflate index storage and maintenance cost.

  • Primary key columns cannot use DECIMAL. Use BIGINT, VARCHAR, or other supported types instead.

  • Primary key columns must be NOT NULL. Every primary key column has to be declared NOT NULL.

Sort key decoupled from primary key

PRIMARY KEY defines uniqueness; ORDER BY defines the physical row order. The two are fully decoupled and can be designed independently.

  • The sort key can be any combination of columns and does not have to include primary key columns — choose it based on the columns most frequently used in filters.

  • If you omit ORDER BY, the primary key columns are used as the sort key by default.

  • The sort key can be modified after table creation: ALTER TABLE tbl ORDER BY (col1, col2, ...). Dropping the sort key is not supported, and you cannot change the data type of a sort column.

Partial column update

Primary Key Tables support partial column update: each writer carries only the columns it owns, and the engine merges them by primary key. This is the foundation for filling a wide table from multiple data streams in real time.

-- Update only the status and update_time columns; other columns keep their current values.
INSERT INTO orders (order_id, status, update_time)
VALUES (10001, 3, '2024-03-01 10:30:00');
-- Columns not listed (total_amount, user_id, ...) remain unchanged.

Data distribution

The distribution strategy determines how data is physically laid out across the cluster. A well-designed partition and bucket scheme lets the engine prune irrelevant data at query time and spreads work evenly across compute resources.

Warehouse-mode columnar tables use a two-level layout:

Table
  └── Partition     ← Logical division — used for data lifecycle and query pruning.
        └── Bucket  ← Physical division — used for even distribution and parallel scan.

Partition

A partition splits the table into independent management units along the partition key. When a query filters on the partition key, the engine prunes irrelevant partitions and scans only the targets, drastically reducing I/O. Partitions are also the unit of lifecycle management — DROP PARTITION removes historical data far faster than DELETE.

Expression partition (recommended)

Defined with a function expression like date_trunc(). The engine creates partitions on the fly based on the time field in incoming data, so you don't have to predefine a partition list. This is the recommended choice for time-series workloads.

-- Daily partitions, automatically created
PARTITION BY date_trunc('day', event_time)

-- Monthly partitions, automatically created
PARTITION BY date_trunc('month', order_date)

Range partition

Splits data into contiguous numeric or date ranges. Suitable for time-series data managed by day, week, or month.

PARTITION BY RANGE(event_day) (
    PARTITION p202401 VALUES LESS THAN ("2024-02-01"),
    PARTITION p202402 VALUES LESS THAN ("2024-03-01"),
    PARTITION p202403 VALUES LESS THAN ("2024-04-01")
)

List partition

Splits data by an enumerated set of column values. Suitable for managing data by region, status, or other discrete categories.

PARTITION BY LIST(region) (
    PARTITION p_east   VALUES IN ("shanghai", "jiangsu", "zhejiang"),
    PARTITION p_north  VALUES IN ("beijing", "tianjin", "hebei"),
    PARTITION p_south  VALUES IN ("guangdong", "fujian", "hainan")
)
Note Partition keys must be a date type (DATE, DATETIME) or an integer type (INT, BIGINT, etc.).

Bucket

Within each partition, buckets distribute data evenly across physical storage units (Bucket / Tablet) so that scans can run in parallel.

Hash bucketing

Rows are routed by the hash of the bucket key, so rows with the same bucket key land in the same bucket. This accelerates equality predicates and same-key joins (a colocated join needs zero network shuffle). Choose a bucket key that is high-cardinality and frequently used in equality filters or join conditions to avoid skew.

-- Engine picks the bucket count automatically
DISTRIBUTED BY HASH(user_id)

-- Specify the bucket count manually
DISTRIBUTED BY HASH(order_id) BUCKETS 32

Random bucketing

Rows are randomly assigned to buckets without specifying a bucket key. Writes are evenly distributed, so this option is well suited to detail-data ingest with no fixed query pattern. Duplicate Key Tables default to random bucketing; Primary Key Tables do not support it.

DISTRIBUTED BY RANDOM
Best practice Aim for 100 MB to 1 GB of data per bucket. Buckets that are too large reduce parallelism; too many buckets inflate metadata and scheduling overhead. Likewise, avoid over-fine partitions (such as hourly partitions) — large numbers of small partitions add significant metadata pressure.

Combined example

-- Top-up detail table: daily expression partition, hash-bucketed by user_id
CREATE TABLE recharge_detail (
    id             BIGINT         NOT NULL COMMENT "Sequence ID",
    user_id        BIGINT         NOT NULL COMMENT "User ID",
    recharge_money DECIMAL(18, 2) NOT NULL COMMENT "Top-up amount",
    city           VARCHAR(32)    NOT NULL COMMENT "City",
    dt             DATE           NOT NULL COMMENT "Top-up date"
)
DUPLICATE KEY(id)
PARTITION BY date_trunc('day', dt)
DISTRIBUTED BY HASH(user_id);

Data types

Numeric types

Type

Bytes

Range

BOOLEAN

1

TRUE / FALSE

TINYINT

1

-128 to 127

SMALLINT

2

-32,768 to 32,767

INT

4

-2,147,483,648 to 2,147,483,647

BIGINT

8

-2⁶³ to 2⁶³-1

LARGEINT

16

-2¹²⁷ to 2¹²⁷-1, for very large integers (such as 128-bit device IDs)

FLOAT

4

Single-precision floating point, approximate

DOUBLE

8

Double-precision floating point, approximate

DECIMAL(M, D)

Variable

Exact decimal — M is total digits (max 38), D is digits after the decimal point. Recommended for financial and transactional data.

Note Numeric types do not support the UNSIGNED modifier.

String types

Type

Max length

Description

CHAR(N)

255 bytes

Fixed-length string, padded with spaces to length N.

VARCHAR(N)

1,048,576 bytes

Variable-length string, stored at the actual length.

STRING

Same as VARCHAR

Equivalent to a large-capacity VARCHAR. Suitable for unstructured text.

Date and time types

Type

Description

DATE

Date in YYYY-MM-DD format. Range: 0000-01-01 to 9999-12-31.

DATETIME

Date-time in YYYY-MM-DD HH:MM:SS[.ffffff] format, with microsecond precision.

OLAP-specific types

Type

Description

BITMAP

Bitmap type. Combined with BITMAP_UNION, performs exact distinct-count operations such as UV calculations.

HLL

HyperLogLog type. Combined with HLL_UNION, performs approximate distinct-count operations with about 1% error and very low memory cost.

BITMAP and HLL columns cannot be used as sort key columns or primary key columns.

Semi-structured types

Type

Description

JSON

Flexible JSON document. Path queries are supported through JSON_QUERY and the -> operator. Suitable for schemas that evolve over time.

ARRAY<T>

Array of homogeneous elements. Supports array functions such as array_agg, array_contains, and array_length.

MAP<K, V>

Key-value map. Suitable for storing dynamic attributes, tags, or other data with a non-fixed schema.

STRUCT<field: type>

Nested struct accessed by field name. Suitable for modeling deeply nested data.


Compute features

SQL analytical operators

Warehouse-mode columnar tables support standard SQL analytical syntax, covering the full range of OLAP query patterns.

Projection and filter

SELECT projects any subset of columns. Columns can be aliased with AS and used in arithmetic expressions or function calls. WHERE accepts the standard set of predicate combinations:

Predicate

Example

Equality

status = 1

Range

amount BETWEEN 100 AND 1000, dt >= '2024-01-01'

Set membership

city IN ('beijing', 'shanghai'), type NOT IN (3, 5)

Pattern match

name LIKE 'A%'

NULL check

pay_time IS NULL, remark IS NOT NULL

Regular expression

email REGEXP '^[a-zA-Z0-9]+'

Whenever possible, predicates are pushed down to the storage layer so that fewer rows are read from disk.

JOIN

Supported join types:

Join type

Description

INNER JOIN

Returns rows that match in both tables.

LEFT OUTER JOIN

Keeps all rows from the left table; fills NULL when no match exists on the right.

RIGHT OUTER JOIN

Keeps all rows from the right table; fills NULL when no match exists on the left.

FULL OUTER JOIN

Keeps all rows from both tables.

CROSS JOIN

Cartesian product.

LEFT SEMI JOIN

Returns left-table rows that have a match on the right; right-table columns are not returned.

LEFT ANTI JOIN

Returns left-table rows that have no match on the right.

The optimizer chooses the join strategy automatically based on table sizes and data distribution: broadcast a small table to every node (Broadcast Join), reshuffle data on the join key (Shuffle Join), or — when the bucket key matches the join key — run a colocated join with no network shuffle.

-- Example: join order details with user profiles
SELECT
    o.order_id,
    o.total_amount,
    u.user_level,
    u.city
FROM orders o
INNER JOIN user_profile u ON o.user_id = u.user_id
WHERE o.dt >= '2024-01-01'
  AND u.user_level IN ('gold', 'platinum');

Set operations

Operator

Description

UNION ALL

Combines results from multiple queries while keeping duplicates. Faster than UNION.

UNION

Combines results from multiple queries and deduplicates them.

INTERSECT

Returns the intersection of two result sets.

EXCEPT / MINUS

Returns rows in the first result set that do not appear in the second.

Aggregation (GROUP BY)

Standard aggregate functions are supported: COUNT, SUM, AVG, MIN, MAX, COUNT DISTINCT. Use HAVING to filter on aggregated results.

The following extended grouping syntax is also supported. Each form computes multiple aggregations in a single scan, so it outperforms running the queries separately:

-- GROUPING SETS: explicitly enumerate the dimension combinations
SELECT city, dt, SUM(amount) AS total
FROM orders
GROUP BY GROUPING SETS ((city, dt), (city), ());

-- ROLLUP: hierarchical totals (from finest grain to grand total)
-- Equivalent to GROUPING SETS ((city, dt), (city), ())
SELECT city, dt, SUM(amount) AS total
FROM orders
GROUP BY ROLLUP (city, dt);

-- CUBE: every combination of the dimensions
-- Equivalent to GROUPING SETS ((city, dt), (city), (dt), ())
SELECT city, dt, SUM(amount) AS total
FROM orders
GROUP BY CUBE (city, dt);

Sort and pagination

ORDER BY sorts on multiple columns (ASC or DESC); pair it with LIMIT and OFFSET for paginated reads.

SELECT order_id, total_amount
FROM orders
WHERE dt = '2024-03-01'
ORDER BY total_amount DESC, order_id ASC
LIMIT 20 OFFSET 40;

Subquery

The optimizer rewrites convertible subqueries into equivalent joins. Supported subquery forms:

Subquery type

Example

Scalar subquery

WHERE amount > (SELECT AVG(amount) FROM orders)

IN / NOT IN subquery

WHERE user_id IN (SELECT user_id FROM vip_users)

EXISTS / NOT EXISTS subquery

WHERE EXISTS (SELECT 1 FROM blacklist WHERE ...)

Correlated subquery

A subquery that references columns from the outer query and is evaluated row by row.

Window functions

Window functions compute aggregates over a row group while preserving the original rows — a core capability for OLAP workloads. Define a window with OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE ...).

Category

Common functions

Ranking

ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(n)

Aggregation

SUM(), AVG(), MIN(), MAX() (sliding window supported)

Offset access

LAG(col, n, default), LEAD(col, n, default)

First/last

FIRST_VALUE(col), LAST_VALUE(col)

-- Example: cumulative spend per user and daily ranking
SELECT
    user_id,
    order_id,
    dt,
    total_amount,
    SUM(total_amount) OVER (
        PARTITION BY user_id
        ORDER BY dt
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_amount,
    RANK() OVER (
        PARTITION BY dt
        ORDER BY total_amount DESC
    ) AS daily_rank
FROM orders;

-- Example: days between consecutive purchases for each user
SELECT
    user_id,
    order_id,
    dt,
    DATEDIFF(dt, LAG(dt, 1) OVER (PARTITION BY user_id ORDER BY dt)) AS days_since_last_order
FROM orders;

CASE WHEN

Both searched and simple CASE forms are supported. CASE WHEN can appear in SELECT, WHERE, ORDER BY, aggregate function arguments, and elsewhere.

SELECT
    order_id,
    total_amount,
    CASE
        WHEN total_amount >= 10000 THEN 'large'
        WHEN total_amount >= 1000  THEN 'medium'
        WHEN total_amount >= 100   THEN 'small'
        ELSE 'micro'
    END AS amount_tier,
    SUM(CASE WHEN status = 1 THEN total_amount ELSE 0 END) AS paid_amount
FROM orders
GROUP BY order_id, total_amount;

Indexes

Warehouse-mode columnar tables ship with a multi-layer index system. Each index targets a different query pattern, and they work together to skip data that the query does not need.

Shortkey index

Built automatically. The engine takes the prefix of the sort key (up to 36 bytes and three columns) every 1024 rows and stores it as a sparse index entry. When a query filters on a prefix of the sort key, a binary search locates the target data block — no manual index creation is needed.

Triggering condition: the filter must form a prefix of the sort key.

Sort key: (event_day, site_id, city_code)

✅ WHERE event_day = '2024-01-01'                       -- Hits the shortkey index
✅ WHERE event_day = '2024-01-01' AND site_id = 10      -- Hits the shortkey index
❌ WHERE city_code = 'BJ'                               -- Not a prefix; cannot use the index
Place the most frequently filtered column at the start of the sort key so it is covered first by the shortkey index.

ZoneMap index

Built automatically. For every column on every data block (Page, 64 KB by default), the engine maintains the minimum, maximum, and NULL existence statistics. During a range query, the engine compares the predicate against each Page's Min/Max and skips Pages that cannot match — pruning at the data-block level.

When it helps: range filters (BETWEEN, >, <, >=, <=) on ordered types such as numerics and dates.

Bloom Filter index

A Bloom Filter–based skip index, built per data block (Page). On write, the engine hashes column values from each Page into the filter; on query, the engine hashes the predicate value and skips Pages whose filter does not contain it, reducing I/O.

When to use: equality predicates (=, IN) on high-cardinality columns such as user IDs, order IDs, or device IDs. These columns have too many distinct values for Bitmap to be cost-effective.

Specify the index through PROPERTIES's bloom_filter_columns. After table creation, you can add or change it with ALTER TABLE.

-- Define at table creation
CREATE TABLE orders (
    order_id    BIGINT         NOT NULL,
    user_id     BIGINT         NOT NULL,
    merchant_id INT            NOT NULL,
    dt          DATE           NOT NULL,
    amount      DECIMAL(18, 2)
)
PRIMARY KEY(order_id)
DISTRIBUTED BY HASH(order_id)
PROPERTIES (
    "bloom_filter_columns" = "user_id, merchant_id"
);

-- Add or change after creation
ALTER TABLE orders SET ("bloom_filter_columns" = "user_id, merchant_id, dt");

-- Drop the Bloom Filter index
ALTER TABLE orders SET ("bloom_filter_columns" = "");

Notes:

  • TINYINT, FLOAT, DOUBLE, and DECIMAL columns are not supported.

  • Only equality predicates benefit. Range predicates (>, <, BETWEEN) cannot use it.

  • Not effective on low-cardinality columns — use the Bitmap index there instead.

Bitmap index

Builds a bitmap per distinct value (each bit corresponds to a row, 1 means the row contains that value). Multi-condition queries are evaluated by AND/OR/NOT operations on the bitmaps, very efficiently producing the result set. The internal storage is Roaring Bitmap, which can save up to 90% of the space compared with a classical bitmap.

When to use: equality predicates on medium-cardinality columns (10,000–100,000 distinct values is a good fit), and multi-column combination filters — for example, segmenting a user-profile table by age band, customer tier, and city in a single query.

Define a Bitmap index inside INDEX at table creation, or add it later with CREATE INDEX (an asynchronous Schema Change operation).

-- Create at table creation
CREATE TABLE user_profile (
    user_id      BIGINT      NOT NULL COMMENT "User ID",
    age_range    TINYINT              COMMENT "Age band (1-5)",
    gender       TINYINT              COMMENT "Gender (0=female, 1=male)",
    city_code    VARCHAR(20)          COMMENT "City code",
    user_level   TINYINT              COMMENT "Customer tier (1-5)",
    register_day DATE                 COMMENT "Registration date",
    INDEX idx_age_range  (age_range)  USING BITMAP COMMENT "Age-band index",
    INDEX idx_city_code  (city_code)  USING BITMAP COMMENT "City index",
    INDEX idx_user_level (user_level) USING BITMAP COMMENT "Tier index"
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id);

-- Add after creation
CREATE INDEX idx_gender ON user_profile (gender) USING BITMAP;

-- Inspect indexes
SHOW INDEX FROM user_profile;

-- Drop an index
DROP INDEX idx_gender ON user_profile;

Notes:

  • FLOAT, DOUBLE, BOOLEAN, and DECIMAL columns are not supported.

  • For very high cardinalities (more than 1 million distinct values), the bitmap storage inflates — switch to the Bloom Filter index.

  • If the index filters out only a small fraction of rows (for example, more than 50% of rows still match), the index hurts more than it helps — skip it.

Choosing an index

Query pattern

Recommended index

Equality or range predicate on the sort-key prefix

Shortkey index (automatic — nothing to create)

Range predicate on numeric or date columns

ZoneMap index (automatic — nothing to create)

Equality predicate on medium-cardinality columns; multi-column combination filters

Bitmap index

Equality predicate on high-cardinality (ID-like) columns

Bloom Filter index