Wide tables in Lindorm store data in row-oriented format, which is efficient for point lookups but slow for large-scale aggregations across millions of rows. A columnstore index stores the same data in columnar format inside the compute engine, enabling fast parallel scans, aggregations, and joins on massive wide tables. Typical use cases include device telemetry aggregation for Internet of Vehicles (IoV) and Internet of Things (IoT) workloads, e-commerce analytics, and logistics order statistics.
Prerequisites
Before you begin, ensure that you have:
The compute engine enabled. For more information, see Enable services.
LindormDFS enabled, at version 4.0.0 or later.
The wide table engine enabled, at version 2.5.0 or later.
Usage notes
Columnstore indexes are built asynchronously; synchronous builds are not supported. The initial build takes roughly 15 minutes; large data volumes or a high number of queued background tasks extend this.
Building an index generates read I/O on the wide table. If hot and cold data separation is enabled, cold storage throttling can slow index builds and cause write backpressure.
Each wide table supports exactly one columnstore index. If a previous creation attempt failed, drop the failed index before creating a new one.
TTL-expired rows in the wide table are not automatically purged from the columnstore index.
Enable the columnstore index feature
Log on to the Lindorm console.
In the upper-left corner, select the region where the instance is deployed.
On the Instances page, click the instance ID, or click View Instance Details in the Actions column.
In the left navigation pane, choose Wide Table Engine.
Click the Columnar Index tab, then click Activate Now.
In the dialog, click OK.
Quick start
This walkthrough creates a columnstore index on a wide table named my_tbl:
+------------+-------------+---------+----------------+
| TABLE_NAME | COLUMN_NAME | TYPE | IS_PRIMARY_KEY |
+------------+-------------+---------+----------------+
| my_tbl | pk0 | INT | true |
| my_tbl | pk1 | VARCHAR | true |
| my_tbl | pt_d | VARCHAR | true |
| my_tbl | col0 | INT | false |
| my_tbl | col1 | VARCHAR | false |
| my_tbl | json_col0 | JSON | false |
+------------+-------------+---------+----------------+
pk0 identifies each row and has high cardinality. pt_d is the data generation date; analysis typically runs per day.
Step 1: Create the index.
Choose the statement that matches how often your table schema changes:
Stable schema: use when my_tbl columns rarely change.
CREATE INDEX my_tbl_idx USING COLUMNAR ON my_tbl(*)
PARTITION BY ENUMERABLE (pt_d, bucket(128, pk0))
WITH (
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl',
`lindorm_columnar.user.syncer.lci.dynamicJsonColumns` = 'json_col0'
);
The index schema is fixed at creation time based on the current table schema and the structure of json_col0.
Evolving schema: add dynamicSchema = 'true' when columns may be added later.
CREATE INDEX my_tbl_idx USING COLUMNAR ON my_tbl(*)
PARTITION BY ENUMERABLE (pt_d, bucket(128, pk0))
WITH (
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl',
`lindorm_columnar.user.syncer.lci.dynamicJsonColumns` = 'json_col0',
`lindorm_columnar.user.syncer.lci.dynamicSchema` = 'true'
);
The index tracks subsequent schema changes and new JSON content automatically.
Step 2: Check the index status.
SHOW INDEX FROM my_tbl;
For a description of all output fields, see SHOW INDEX.
Step 3: Query with the index.
Add the _use_ldps_ and _columnar_index_ hints to route the query through the compute engine:
SELECT /*+ _use_ldps_(cg0), _columnar_index_ */ COUNT(*), SUM(col0)
FROM my_index_db.my_index_tbl
WHERE pt_d = '2024-01-15'
GROUP BY pk1;
Create a columnstore index
Parameters
| Parameter | Description |
|---|---|
| index_name | Name of the columnstore index. Accepts uppercase letters, lowercase letters, digits, and underscores. |
| table_name | Name of the wide table. |
| column_name [, ...] | Columns to include. Must include all primary key columns. To include all columns, use (*). Supported types: TINYINT, SMALLINT, INTEGER, BIGINT, LONG, FLOAT, DOUBLE, VARCHAR, BINARY, VARBINARY, BOOLEAN, DECIMAL, JSON, DATE, TIMESTAMP. |
| PARTITION BY ENUMERABLE(...) | Partitions index data using the enumeration algorithm for efficient filtering. Accepts zero or more regular partition expressions and exactly one bucket partition expression. All columns in both expression types must be primary key columns. |
| bucket_num | Number of bucket partitions. Must be less than 1024. The bucket index is calculated as hash(column_name) % bucket_num. Use a high-cardinality column to avoid data skew. |
| lindorm_columnar.user.index.database | Database that holds the columnstore index table. |
| lindorm_columnar.user.index.table | Name of the columnstore index table. |
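The effect of the bucket rule (hash(column_name) % bucket_num) on data skew can be sketched in Python. This is only an illustration: Lindorm's actual hash function is internal, and CRC32 merely stands in for it here; `bucket_index` is a hypothetical helper, not a Lindorm API.

```python
import zlib

def bucket_index(value: str, bucket_num: int) -> int:
    """Illustrative bucket assignment: hash(column_value) % bucket_num.
    Lindorm's real hash is internal; CRC32 is a stand-in."""
    return zlib.crc32(value.encode("utf-8")) % bucket_num

# A high-cardinality key (e.g. a device ID) spreads rows across all buckets:
buckets = {bucket_index(f"device-{i}", 128) for i in range(10_000)}
print(len(buckets))  # → 128: every bucket receives data

# A low-cardinality key collapses into a handful of buckets (data skew):
skewed = {bucket_index(region, 128) for region in ["north", "south", "east"]}
print(len(skewed))   # at most 3 of the 128 buckets ever receive data
```

This is why the bucket column should be high-cardinality: with only a few distinct values, most buckets stay empty and the loaded ones become hotspots.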
Partition sizing guidelines
Set each partition to between 50 MB and 512 MB:
Too small (< 50 MB): creates too many small files, degrading storage performance.
Too large (> 512 MB): reduces read/write throughput and causes data skew.
Avoid high-cardinality primary key columns as regular partition keys. Each distinct value creates a separate partition, generating many small files. Use a time or category field (such as a date) as the regular partition key, and reserve the high-cardinality column for the bucket expression.
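A quick back-of-the-envelope check for bucket_num follows from the sizing guideline above. The sketch below assumes a 256 MB mid-range target within the recommended 50-512 MB window; `suggest_bucket_num` is a hypothetical helper, not part of Lindorm.

```python
def suggest_bucket_num(daily_volume_mb: float, target_partition_mb: float = 256) -> int:
    """Rough bucket count so each daily partition lands in the recommended
    50-512 MB range. 256 MB is an assumed mid-range target."""
    return max(1, round(daily_volume_mb / target_partition_mb))

# 50 GB of data per day, partitioned by date plus bucket(id):
print(suggest_bucket_num(50 * 1024))  # → 200
```

The result must also stay below the bucket_num limit of 1024; for much larger daily volumes, raise the per-partition target instead of the bucket count.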
For example, if daily data volume is 50 GB and the primary key is (id, dt):
PARTITION BY ENUMERABLE (dt, bucket(200, id))
Example
CREATE INDEX my_tbl_idx USING COLUMNAR
ON my_tbl(pk0, pk1, pk2, col0, col1)
PARTITION BY ENUMERABLE (pk1, pk2, bucket(128, pk0))
WITH (
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl'
);
View a columnstore index
After creation, the index continuously syncs data from the wide table—first historical data, then incremental data. During incremental sync, the index typically lags behind the primary table by less than 1 hour.
SHOW INDEX FROM my_tbl;
For a description of all output fields, see SHOW INDEX.
Query with a columnstore index
Add the _use_ldps_ and _columnar_index_ hints to route your SELECT query through the compute engine, which uses the columnstore index to accelerate the query.
Without these hints, queries run against the wide table directly and do not use the columnstore index.
Aggregation
SELECT /*+ _use_ldps_(cg0), _columnar_index_ */ COUNT(*), SUM(col0), MIN(col0), MAX(col0)
FROM my_index_db.my_index_tbl
WHERE col0 > 100 AND col0 < 200 OR col0 > 500
GROUP BY pk1;
Sorting
SELECT /*+ _use_ldps_(cg0), _columnar_index_ */ pk0 + col0, pk1
FROM my_index_db.my_index_tbl
WHERE col0 > 100 AND col0 < 200 OR col0 > 500
ORDER BY pk1
LIMIT 100;
Joins across multiple wide tables
If you have columnstore indexes on multiple wide tables, join them through their index tables:
SELECT /*+ _use_ldps_(cg0), _columnar_index_ */ *
FROM my_index_db.my_index_tbl0 AS t0
JOIN my_index_db.my_index_tbl1 AS t1
ON t0.pk0 = t1.pk0
AND t0.pk1 = t1.pk1
LIMIT 100;
Delete a columnstore index
Use DROP INDEX to delete a columnstore index. For the full syntax and examples, see DROP INDEX.
Advanced usage
Complex partition expressions
When the primary key cannot be used directly as a partition key—for example, when it is a UNIX timestamp—use transformation functions in the partition expression.
Sample table my_ts_tbl:
+------------+-------------+---------+----------------+
| TABLE_NAME | COLUMN_NAME | TYPE | IS_PRIMARY_KEY |
+------------+-------------+---------+----------------+
| my_ts_tbl | id | INT | true |
| my_ts_tbl | ts | LONG | true |
| my_ts_tbl | col0 | VARCHAR | false |
| my_ts_tbl | col1 | INT | false |
+------------+-------------+---------+----------------+
Partition by day using the ts (UNIX timestamp) column:
CREATE INDEX my_ts_idx USING COLUMNAR ON my_ts_tbl(*)
PARTITION BY ENUMERABLE (
ifnull(substring(from_unixtime(ts), 0, 10), 'unknown') AS dt,
bucket(128, id)
)
WITH (
`lindorm_columnar.user.index.database` = 'my_ts_index_db',
`lindorm_columnar.user.index.table` = 'my_ts_index_tbl'
);
Query by day using the computed partition column dt:
SELECT /*+ _use_ldps_ */ COUNT(1)
FROM lindorm_columnar.my_ts_index_db.my_ts_index_tbl
WHERE dt = '2020-06-06';
Index incremental data only
To skip historical data and index only new writes, set lindorm_columnar.user.syncer.skip.fullsync = 'true':
CREATE INDEX my_tbl_idx USING COLUMNAR ON my_tbl(*)
PARTITION BY ENUMERABLE (pk1, pk2, bucket(128, pk0))
WITH (
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl',
`lindorm_columnar.user.syncer.skip.fullsync` = 'true'
);
Expand JSON columns
The columnstore index can flatten JSON columns into typed columns during synchronization. Two methods are available:
| Method | How it works | Best for |
|---|---|---|
| Static expansion | Define JSON paths and their types at index creation time | Known, stable JSON structure |
| Dynamic expansion (Public preview) | Infer paths and types from actual data | Evolving or unpredictable JSON structure |
Do not configure the same JSON column in both static and dynamic expansion.
Sample table and data:
+-------------+-------------+--------+----------------+
| TABLE_NAME | COLUMN_NAME | TYPE | IS_PRIMARY_KEY |
+-------------+-------------+--------+----------------+
| my_json_tbl | id | BIGINT | true |
| my_json_tbl | col1 | INT | false |
| my_json_tbl | json_col | JSON | false |
+-------------+-------------+--------+----------------+
UPSERT INTO my_json_tbl (id, col1, json_col)
VALUES (2, 2, '{"a": {"b": {"c": "hello,world", "d": 123}, "e": false}, "f": 3.14}');
Static expansion
Define a mapping rule that specifies each JSON path and its target type:
CREATE INDEX columnar_idx USING COLUMNAR ON my_json_tbl(*)
PARTITION BY ENUMERABLE (ifnull(id%16, 0) AS dt, bucket(16, id))
WITH (
`lindorm_columnar.user.syncer.lci.jsonMapping.json_col` = 'a.b.c VARCHAR, a.e BOOLEAN, f DOUBLE',
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl'
);
The mapping rule format is <json_path> <type> [, ...]:
json_path: dot-separated path to the JSON value, for example, a.b.c
type: one of BOOLEAN, BYTE, SHORT, INTEGER, LONG, FLOAT, DOUBLE, VARCHAR
To map multiple JSON columns, add one lindorm_columnar.user.syncer.lci.jsonMapping.<column> parameter per JSON column.
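Conceptually, synchronization applies each mapping rule to the JSON value and emits one typed column per path, named with the source column as a prefix. The Python sketch below illustrates that flattening with the sample row and mapping rule from this section; the parsing and the `flatten_static` helper are illustrative, not Lindorm internals.

```python
import json
from functools import reduce

def flatten_static(column_name: str, json_text: str, mapping: str) -> dict:
    """Apply a '<json_path> <type> [, ...]' mapping rule to one JSON value,
    producing '<column>.<path>' keys per the expanded-column naming rules."""
    doc = json.loads(json_text)
    out = {}
    for rule in mapping.split(","):
        path, _type = rule.strip().rsplit(" ", 1)
        # Walk the dot-separated path; missing keys yield None (NULL).
        value = reduce(lambda node, key: node.get(key) if isinstance(node, dict) else None,
                       path.split("."), doc)
        out[f"{column_name}.{path}"] = value
    return out

row = '{"a": {"b": {"c": "hello,world", "d": 123}, "e": false}, "f": 3.14}'
print(flatten_static("json_col", row, "a.b.c VARCHAR, a.e BOOLEAN, f DOUBLE"))
# → {'json_col.a.b.c': 'hello,world', 'json_col.a.e': False, 'json_col.f': 3.14}
```

Note that a.b.d is absent from the output: only paths listed in the mapping rule become columns, and the rest of the JSON document is dropped unless syncOriginalJsonContent is enabled (see below).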
Dynamic expansion (Public preview)
Specify which JSON columns to expand dynamically. The index infers types from actual data:
CREATE INDEX columnar_idx USING COLUMNAR ON my_json_tbl(*)
PARTITION BY ENUMERABLE (ifnull(id%16, 0) AS dt, bucket(16, id))
WITH (
`lindorm_columnar.user.syncer.lci.dynamicJsonColumns` = 'json_col',
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl'
);
Inferred types are limited to BOOLEAN, LONG, DOUBLE, and STRING. When a field contains multiple value types, the index uses STRING.
Dynamic expansion does not apply to historical data; it only processes incremental writes.
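The stated inference rule (BOOLEAN, LONG, DOUBLE, or STRING, with mixed types collapsing to STRING) can be sketched as follows; `infer_type` is a hypothetical illustration of that rule, not Lindorm's implementation.

```python
def infer_type(values) -> str:
    """Map observed JSON values to the inferred columnstore types.
    Mixed value types fall back to STRING."""
    def one(v):
        if isinstance(v, bool):   # check bool before int: True is an int in Python
            return "BOOLEAN"
        if isinstance(v, int):
            return "LONG"
        if isinstance(v, float):
            return "DOUBLE"
        return "STRING"
    kinds = {one(v) for v in values}
    return kinds.pop() if len(kinds) == 1 else "STRING"

print(infer_type([1, 2, 3]))      # → LONG
print(infer_type([True, False])) # → BOOLEAN
print(infer_type([1, "one"]))    # → STRING (mixed types)
```

Because a single non-numeric value demotes a field to STRING for all rows, keep value types consistent within a JSON field if you want numeric aggregations on the expanded column.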
Expanded column naming
When a JSON column is expanded (by either method), the original JSON column is not stored in the column-oriented table. Instead, the index creates a column for each expanded path, prefixed with the JSON column name.
Using 'a.b.c VARCHAR, a.e BOOLEAN' as the static mapping for json_col produces:
| Column name | Type | Value stored |
|---|---|---|
| json_col.a.b.c | STRING | The a.b.c value from json_col |
| json_col.a.e | BOOLEAN | The a.e value from json_col |
Sync original JSON fields
To also store the original JSON column, add syncOriginalJsonContent = 'true':
WITH (
`lindorm_columnar.user.syncer.lci.jsonMapping.json_col` = 'a.b.c VARCHAR, a.e BOOLEAN',
`lindorm_columnar.user.syncer.lci.json.syncOriginalJsonContent` = 'true',
...
)
This adds a json_col column of type STRING alongside the expanded columns.
Omit the JSON column name prefix from expanded column names
To omit the JSON column name prefix from expanded column names, add ignoreJsonMappingPrefix = 'true':
WITH (
`lindorm_columnar.user.syncer.lci.jsonMapping.json_col` = 'a.b.c VARCHAR, a.e BOOLEAN',
`lindorm_columnar.user.syncer.lci.json.ignoreJsonMappingPrefix` = 'true',
...
)
The column-oriented table then uses a.b.c and a.e instead of json_col.a.b.c and json_col.a.e.
If different JSON columns (for example, json_col1 and json_col2) contain the same mapping to expand and store the same path (for example, a.b.c), the columnstore index creation fails.
Dynamically track schema evolution (Public preview)
Set lindorm_columnar.user.syncer.lci.dynamicSchema = 'true' to keep the column-oriented table schema in sync with the primary table. When columns are added to the wide table, they are automatically added to the index:
CREATE INDEX my_tbl_idx USING COLUMNAR
ON my_tbl(*)
PARTITION BY ENUMERABLE (pt_d, bucket(128, pk0))
WITH (
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl',
`lindorm_columnar.user.syncer.lci.dynamicSchema` = 'true'
);
If the wide table uses dynamic columns or wildcard columns, those column types are also synced to the index when dynamicSchema = 'true'.
Add columns without reindexing (Public preview)
Use ALTER INDEX to add columns to an existing columnstore index without a full rebuild.
Sample table:
+-------------+-------------+---------+----------------+
| TABLE_NAME | COLUMN_NAME | TYPE | IS_PRIMARY_KEY |
+-------------+-------------+---------+----------------+
| my_json_tbl | id | BIGINT | true |
| my_json_tbl | col1 | INT | false |
| my_json_tbl | col2 | VARCHAR | false |
| my_json_tbl | json_col1 | JSON | false |
| my_json_tbl | json_col2 | JSON | false |
+-------------+-------------+---------+----------------+
Initial index creation (covers id, col1, and json_col1):
CREATE INDEX columnar_idx USING COLUMNAR ON my_json_tbl(id, col1, json_col1)
PARTITION BY ENUMERABLE (ifnull(id%16, 0) AS dt, bucket(16, id))
WITH (
`lindorm_columnar.user.syncer.lci.jsonMapping.json_col1` = 'a.b.c VARCHAR, a.e BOOLEAN',
`lindorm_columnar.user.index.database` = 'my_index_db',
`lindorm_columnar.user.index.table` = 'my_index_tbl'
);
Add a regular column:
ALTER INDEX IF EXISTS columnar_idx ON my_json_tbl ADD COLUMNS (col2);
Add a static JSON mapping for json_col2:
ALTER INDEX IF EXISTS columnar_idx ON my_json_tbl
ADD COLUMNS (
json_extract_long(json_col2, '$.key1'),
json_extract_boolean(json_col2, '$.key2'),
json_extract_double(json_col2, '$.key3.key4')
);
Supported JSON extraction functions: json_extract_boolean, json_extract_long, json_extract_double, json_extract_string.
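As a rough model of what a typed extraction such as json_extract_long(json_col2, '$.key1') yields, the sketch below walks a '$.k1.k2' path and coerces the result. It assumes that a missing path or a value of the wrong type yields NULL; the real functions' edge-case behavior may differ, and this helper is illustrative only.

```python
import json

def json_extract_long(json_text: str, path: str):
    """Sketch of typed JSONPath extraction: walk a '$.k1.k2' path and return
    the value as an int, or None (NULL) when the path is missing or the value
    is not an integer. Assumed semantics, not Lindorm's specification."""
    node = json.loads(json_text)
    for key in path.lstrip("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, int) and not isinstance(node, bool) else None

doc = '{"key1": 42, "key3": {"key4": 2.5}}'
print(json_extract_long(doc, "$.key1"))       # → 42
print(json_extract_long(doc, "$.key3.key4"))  # → None (value is a double, not a long)
```

Each json_extract_* call in ADD COLUMNS produces one new typed column in the index, analogous to one entry in a static jsonMapping rule.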
Billing
Creating a columnstore index incurs two types of charges:
Storage: the storage consumed by the columnstore index data.
Compute Units (CUs): the CUs used to synchronize data between the wide table and the index.
FAQ
Are there extra fees for creating a columnstore index?
Yes. Charges include storage costs for the index data and CU costs for synchronization between the wide table and the index.
Can the partition filter expression include non-primary key columns?
No. All columns in the partition filter expression—both regular partition expressions and the bucket expression—must be primary key columns of the wide table.
Can the bucket partition expression include calculation logic?
No. A bucket partition expression accepts only bucket_num and the bucket column name. Use transformation functions (such as from_unixtime) in regular partition expressions instead.
What happens when there are too many or too few partitions?
Too many partitions create large numbers of small files, which increases storage overhead and slows queries; keep each partition above 50 MB and bucket_num below 1024. Too few partitions limit read/write throughput and can cause data skew; keep each partition below 512 MB.
Can I query the columnstore index directly through the Lindorm compute engine?
Yes. Specify the index table name in your query and use the _use_ldps_ hint. For setup details, see Access columnstore data.
Contact Lindorm technical support (DingTalk ID: s0s3eg3) before modifying the columnstore index table directly.
Can I create multiple columnstore indexes for the same wide table?
No. Each wide table supports exactly one columnstore index.
Are TTL-expired rows automatically removed from the columnstore index?
No. TTL expiration in the wide table does not propagate to the columnstore index.
My index creation failed. Why does a new creation attempt also fail?
Each wide table supports at most one columnstore index, even when the index is in a failed state. Drop the failed index using DROP INDEX, then create a new one.