LindormTable stores petabytes of data in a distributed, LSM-Tree-based architecture. Because it partitions data by primary key ranges across nodes, the choices you make at table design time — primary key layout, index strategy, read/write patterns — directly determine query latency, write throughput, and whether hotspot nodes appear.
The core principle is query-driven design: identify your query patterns first, then design your primary key and indexes to serve them. This guide walks through those decisions so you can avoid the most common performance pitfalls.
How it works
Data model
LindormTable is a row-oriented storage engine. Each row consists of primary key columns and non-primary key columns.
The following CREATE TABLE statement creates an orders table used as the running example throughout this guide:
CREATE TABLE orders (
channel VARCHAR NOT NULL, -- Payment channel: alipay, wechat, unionpay
id VARCHAR NOT NULL, -- Order ID
ts TIMESTAMP NOT NULL, -- Payment timestamp
status VARCHAR, -- Order status
location VARCHAR, -- Payment location
PRIMARY KEY (channel, id, ts)
) WITH (DYNAMIC_COLUMNS='true'); -- Allow columns not declared in the schema

After the table is created, the data model looks like this:
In the table below, channel, id, and ts are the primary key columns; status, location, and the remaining columns are non-primary key columns.

| channel | id | ts | status | location | …… |
|---|---|---|---|---|---|
| alipay | a0001 | 1705786502000 | 0 | shanghai | …… |
| alipay | a0002 | 1705786502001 | 1 | beijing | …… |
| …… | …… | …… | …… | …… | …… |
| unionpay | u0001 | 1705786502056 | 0 | hangzhou | …… |
| unionpay | u0002 | 1705786502068 | 1 | nanjing | …… |
| …… | …… | …… | …… | …… | …… |
| wechat | w0001 | 1705786502056 | 0 | shanghai | …… |
| wechat | w0002 | 1705786502068 | 0 | shanghai | …… |
| …… | …… | …… | …… | …… | …… |
Primary key columns have three fixed properties:
Immutable schema. You cannot add, remove, or reorder primary key columns, or change their data types after the table is created. Design the primary key carefully upfront.
Unique row identifier. The combination of all primary key columns uniquely identifies each row.
Clustered index. Data is physically sorted by the primary key columns in left-to-right order. LindormTable matches queries from the leftmost primary key column — similar to a composite index in MySQL.
Because data is clustered by primary key, queries that specify the leftmost column are efficient. If a range condition appears on any primary key column, LindormTable cannot use subsequent columns as an index filter and must scan all matching rows:
-- Scans all rows where channel = 'alipay' AND id > 'a0089',
-- then filters for ts = 1705786502068.
SELECT * FROM orders WHERE channel = 'alipay' AND id > 'a0089' AND ts = 1705786502068;

Omitting the leftmost column forces a full table scan even if other primary key columns are specified:
-- Full table scan — id is not the leftmost primary key column.
SELECT * FROM orders WHERE id = 'a0089';

If most queries filter by id, either make id the leftmost primary key column or create a secondary index on it.
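The left-to-right matching rule can be sketched as a small client-side helper. This is an illustrative model, not Lindorm internals: `usable_prefix` and its condition encoding are hypothetical names introduced here, assuming the orders table's three-column key.

```python
# Illustrative sketch (not Lindorm internals): how far a query's conditions
# can narrow a left-to-right clustered key. Matching stops at the first
# missing column, and a range condition is the last column that can filter.
PRIMARY_KEY = ["channel", "id", "ts"]

def usable_prefix(conditions):
    """conditions maps column name -> 'eq' or 'range'."""
    used = []
    for col in PRIMARY_KEY:
        cond = conditions.get(col)
        if cond is None:
            break              # gap in the prefix: later columns cannot filter
        used.append(col)
        if cond == "range":
            break              # columns after a range condition cannot filter
    return used

# channel = 'alipay' AND id > 'a0089' AND ts = ... -> only channel and id filter
print(usable_prefix({"channel": "eq", "id": "range", "ts": "eq"}))  # ['channel', 'id']
# id = 'a0089' alone -> no usable prefix: full table scan
print(usable_prefix({"id": "eq"}))  # []
```

The same model explains why the range query on id above still has to scan every row matching the channel prefix: the ts condition falls after the range and cannot narrow the scan.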
Non-primary key columns behave differently from relational databases:
Columns can be written without being declared in the schema in advance (see Dynamic columns and Wildcard columns).
Each row must have at least one non-primary key column value — you cannot write a row containing only primary key values.
Querying a non-primary key column without an index triggers a full table scan. LindormTable rejects such queries by default. Either narrow the scan with a primary key range or create an index:
-- Rejected by default — full table scan on a non-indexed column.
SELECT * FROM orders WHERE location = 'shanghai';
-- Accepted — narrows the scan to rows where channel = 'alipay'.
SELECT * FROM orders WHERE location = 'shanghai' AND channel = 'alipay';

Data distribution
LindormTable partitions table data into regions based on primary key ranges and distributes them across nodes. All regions are contiguous ranges that together cover the full primary key space.
When a region grows beyond 8 GB (default), or when LindormTable detects a hotspot in that region, it splits the region into two sub-regions and redistributes them across nodes for load balancing. For example, the row {alipay, a100999, 1705786502068} is initially stored in Region_3. If that region exceeds the threshold, it splits into two sub-regions placed on separate nodes.
Because rows with the same primary key prefix can span multiple regions (for example, alipay rows across Region_1, Region_2, and Region_3), LindormTable handles splitting automatically. You do not need to use PARTITION BY when creating a table.
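The split behavior described above can be modeled in a few lines. This is a toy model under stated assumptions, not Lindorm's actual algorithm: the region dict shape and `maybe_split` are hypothetical, and real split points are chosen by the engine.

```python
# Toy model of region splitting (not Lindorm's actual algorithm): a region
# covering a key range splits into two sub-regions once it exceeds the
# size threshold (8 GB by default), halving the load on each.
THRESHOLD_GB = 8

def maybe_split(region):
    """region: dict with 'start', 'end', 'split_point', 'size_gb'."""
    if region["size_gb"] <= THRESHOLD_GB:
        return [region]
    half = region["size_gb"] / 2
    return [
        {"start": region["start"], "end": region["split_point"], "size_gb": half},
        {"start": region["split_point"], "end": region["end"], "size_gb": half},
    ]

r = {"start": "a0000", "end": "z9999", "split_point": "m0000", "size_gb": 10}
subs = maybe_split(r)
print(len(subs))  # 2
```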
Hotspot types
A hotspot occurs when read or write traffic concentrates on a single region or node, causing that node to bottleneck the entire cluster. LindormTable has three hotspot patterns:
Single-row hotspot. A single row receives disproportionate reads or writes. All requests for that row always route to the same node. The only remedy is scaling up that node — horizontal scale-out does not help. Prevent this with a well-distributed primary key design.
Ranged hotspot. Traffic concentrates on a narrow key range, all within one region. LindormTable can detect and split such regions automatically. For better prevention, design a primary key whose values spread evenly. For example, prefix id values with a hash (hash(id)+id) so that similar IDs scatter across the key space.

Write hotspot from an incremental primary key. When the first primary key column is monotonically increasing (for example, a timestamp or auto-increment ID), all new writes land at the high end of the key space. Even after a region splits, writes continue going to the upper sub-region only, so region splitting cannot fix this pattern.
Do not use an incremental column as the first primary key column. This is the most common cause of write hotspots in LindormTable and cannot be resolved by splitting regions.
Some databases such as Cassandra use a hash of the incremental column as a partition key to distribute data. If you migrate data from such a system to Lindorm, the incremental primary key still causes write hotspots in LindormTable.
Primary key design
The primary key determines data distribution, sort order, and — when no secondary indexes exist — query performance. Design the primary key around your query patterns.
Key rules:
Keep it compact. Include only the columns needed to uniquely identify a row. Avoid columns that store JSON blobs or long strings.
Make the first column non-incremental. The first primary key column must have values that are spread across the key space. In the orders table, channel works well as the first column because its values (alipay, wechat, unionpay) spread writes across different regions, while the incremental id column comes second.

Handle incremental columns with salting or reversing. Avoid including incremental columns in the primary key where possible. If a column with incremental values must be part of the primary key, do not place it first. To break the sequential ordering, use one of these techniques:

| Technique | How it works | Trade-off |
|---|---|---|
| Hashing | Prefix the value with its hash: hash(id)+id | Distributes writes evenly; row order is not preserved |
| Reversing | Reverse the value: reverse(id) | Randomizes the leading digits; row order is not preserved |

Design for your read patterns. If no secondary indexes exist, all filtering goes through the primary key. Put the columns you filter on most, especially with equivalent conditions, earlier in the key.
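Both techniques can be applied client-side before writing. The sketch below assumes string IDs; `salted_key` and `reversed_key` are hypothetical helper names, not Lindorm built-ins, and you must apply the same transformation when reading a row back by ID.

```python
# Client-side sketch of the two key transformations (assumed helpers,
# not Lindorm built-ins). The same transformation must be applied on reads.
import hashlib

def salted_key(order_id: str) -> str:
    # Hashing: prefix the value with a short, stable hash so that
    # sequential IDs scatter across the key space.
    prefix = hashlib.md5(order_id.encode()).hexdigest()[:4]
    return f"{prefix}_{order_id}"

def reversed_key(order_id: str) -> str:
    # Reversing: randomizes the leading characters of incremental IDs.
    return order_id[::-1]

print(salted_key("a0001"))   # a 4-character hash prefix, then '_a0001'
print(reversed_key("a0001")) # '1000a'
```

Either way, rows that were adjacent under the raw ID are no longer adjacent, so range scans over consecutive IDs become scattered point lookups; accept this trade-off only when write distribution matters more than ordered reads.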
For more guidance, see Design primary keys for Lindorm wide tables.
Index table design
When your queries cannot be satisfied by the primary key alone, create a secondary index on the columns you need to filter by. For more information, see Secondary indexes.
A secondary index is an index table whose primary key is the indexed column. Apply the same design rules as the primary key: matching is left-to-right, and skipping the leftmost column forces a full scan. For example, if a composite secondary index covers columns (a, b, c), querying only WHERE b = xx scans the entire table.
Secondary indexes add write overhead — every write to the base table also writes to each index table. Keep the number of secondary indexes small. Index columns cannot be modified after creation; to change an index, drop it and create a new one.
For queries that combine many columns or need flexible filter combinations, use LindormSearch instead:
| Index type | When to use | Dependencies | Real-time visibility |
|---|---|---|---|
| Secondary index | Queries with a stable filter pattern | None | Yes |
| Search index | Online queries with many indexed columns or flexible combined conditions | LindormSearch nodes required; search index feature must be enabled | Data syncs to LindormSearch with a short delay. See Overview. |
Data writing
Write data in a pattern consistent with your primary key design so that writes spread across nodes rather than concentrating on one.
Batch writes are more efficient than single-row writes because they reduce Remote Procedure Call (RPC) count, letting the server process multiple rows in one operation. Keep each batch under 2 MB total. Larger batches risk exhausting server memory or triggering a full garbage collection, which degrades service stability.
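A batching helper on the client side keeps each request under the guideline. This is an illustrative sketch, not a Lindorm API: `batch_rows` and the per-row size estimator are assumptions introduced here.

```python
# Illustrative batching helper (assumed, not a Lindorm API): group rows into
# batches whose estimated encoded size stays under the 2 MB guideline, so one
# RPC carries many rows without risking an oversized request.
MAX_BATCH_BYTES = 2 * 1024 * 1024

def batch_rows(rows, row_size):
    """rows: iterable of rows; row_size: function estimating a row's bytes."""
    batch, batch_bytes = [], 0
    for row in rows:
        size = row_size(row)
        if batch and batch_bytes + size > MAX_BATCH_BYTES:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += size
    if batch:
        yield batch

rows = [{"id": f"a{i:04d}"} for i in range(10000)]
batches = list(batch_rows(rows, lambda r: 512))  # assume ~512 bytes per row
print(len(batches))  # 3
```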
LindormTable supports writes through either the ApsaraDB for HBase API or SQL:
Tables created with SQL (with typed columns such as INT or LONG) must be accessed via SQL. Writing to them with the ApsaraDB for HBase API can produce unreadable data when you query those columns via SQL.
Tables created with the ApsaraDB for HBase API can be accessed by SQL through column mapping. See Use SQL statements to access an HBase table.
Pre-split regions
By default, LindormTable starts with one region per table and splits automatically as data arrives. If you plan to write a large volume of data immediately after table creation — especially with Bulkload — pre-split the table to distribute the initial write load:
SQL or ApsaraDB for HBase API writes: Set the initial region count to number of nodes × 4. Adjust based on your actual workload.

Bulkload imports: Set the initial region count to size of data to import / 8. This distributes the imported data so that the initial regions do not need to split during import.
Make sure the pre-split ranges match your data distribution. If your key distribution does not align with the split points, data can cluster in a few regions even with many pre-splits.
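The two sizing formulas above work out as follows. The formulas are the document's; the helper functions themselves are hypothetical names used only for illustration.

```python
# Sizing sketch for the pre-split guidelines. The formulas come from the
# guidance above; the helpers are illustrative, not a Lindorm API.
import math

REGION_SIZE_GB = 8  # default region size threshold

def initial_regions_for_writes(node_count: int) -> int:
    # SQL / ApsaraDB for HBase API writes: number of nodes x 4
    return node_count * 4

def initial_regions_for_bulkload(import_size_gb: float) -> int:
    # Bulkload: size of data to import / 8, so the initial regions
    # do not need to split during the import
    return math.ceil(import_size_gb / REGION_SIZE_GB)

print(initial_regions_for_writes(8))      # 32 regions for an 8-node cluster
print(initial_regions_for_bulkload(200))  # 25 regions for a 200 GB import
```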
For syntax details, see CREATE TABLE.
Data querying
Single-value queries
A single-value query specifies equivalent conditions on all primary key columns, identifying one or more specific rows. Examples:
-- Single row
SELECT * FROM orders WHERE channel = 'alipay' AND id = 'a0001' AND ts = 1705786502000;
-- Multiple rows: ts has multiple values — returns multiple rows
SELECT * FROM orders WHERE channel = 'alipay' AND id = 'a0001' AND ts IN (1705786502000, 1705786502222, 1705786502333);
-- Cartesian product: 3 id × 3 ts = 9 rows scanned
SELECT * FROM orders WHERE channel = 'alipay' AND id IN ('a0001', 'a0002', 'a0003') AND ts IN (1705786502000, 1705786502222, 1705786502333);
-- Cartesian product: 3 channel × 3 id × 3 ts = 27 rows scanned
SELECT * FROM orders WHERE channel IN ('alipay', 'wechat', 'unionpay') AND id IN ('a0001', 'a0002', 'a0003') AND ts IN (1705786502000, 1705786502222, 1705786502333);

LindormTable returns all results of a single-value query at once. A large number of IN conditions compounds quickly (the rows scanned equal the Cartesian product of all IN sets), risking server memory exhaustion or garbage collection pauses.
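The scanned-row math for the IN examples above is just the product of the IN-set sizes, one factor per primary key column (an equivalent condition counts as a set of size 1). A quick sketch:

```python
# Rows scanned by a single-value query = Cartesian product of IN-set sizes
# across the primary key columns (an '=' condition is an IN set of size 1).
from math import prod

def rows_scanned(*in_set_sizes):
    return prod(in_set_sizes)

print(rows_scanned(1, 1, 3))  # 3  -- one channel, one id, three ts values
print(rows_scanned(1, 3, 3))  # 9
print(rows_scanned(3, 3, 3))  # 27
```

Three IN lists of ten values each already mean 1,000 rows per query, halfway to the default limit, so keep IN sets small.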
The default limit is 2,000 rows per single-value query. Exceeding this returns:
Multi Get Plan query too many rows in one select

To raise this limit while keeping memory usage safe, contact technical support (DingTalk ID: s0s3eg3).
Range queries
A range query omits one or more primary key columns from its conditions. LindormTable streams results back without loading them all into memory, so range queries have no row count limit. You do not need LIMIT or cursor-based pagination.
-- Full table scan
SELECT * FROM orders;
-- Range scan: all rows where channel = 'alipay'
SELECT * FROM orders WHERE channel = 'alipay';
-- Range scan: all rows where channel = 'alipay' AND id = '1705786502001'
SELECT * FROM orders WHERE channel = 'alipay' AND id = '1705786502001';

ORDER BY
Data in LindormTable is physically sorted by the primary key, so SELECT results come back in primary key order even without an ORDER BY clause. If the query hits an index table, results come back in index key order instead.
For efficient sorting:
Include the leftmost primary key column in ORDER BY. Sorting by non-leftmost or non-primary key columns requires a large in-memory sort.

For descending sort (ORDER BY ... DESC), add the DESC keyword to the column definition at table creation time. This stores data in descending order and avoids a runtime reversal.
For details, see Use ORDER BY in large result sets.
COUNT queries
LindormTable is built on an LSM-Tree structure that stores multiple versions and delete markers. The row count is not stored in table metadata, so SELECT COUNT(*) requires scanning all data. On large tables this is slow.
For an estimated row count, see Count the number of rows in a table.
If a WHERE clause is specified, scan time depends on how many rows match. For example:
SELECT COUNT(*) FROM orders WHERE channel = 'alipay';

This scans only the rows where channel = 'alipay'.
Data deletion
Deleting a row does not remove it immediately — LindormTable adds a delete marker and physically removes the data during the next compaction. Until then, both the row and its delete marker are visible to the storage engine but filtered from query results.
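The delete-marker lifecycle can be modeled with a store and a tombstone set. This is a toy model of the behavior described above, not Lindorm internals; the names are hypothetical.

```python
# Toy model (not Lindorm internals) of the delete-marker lifecycle: a delete
# writes a tombstone, queries filter tombstoned rows, and compaction
# physically removes both the row and its marker.
store = {}        # key -> value (data physically on disk)
tombstones = set()

def delete(key):
    tombstones.add(key)            # marker only; data stays on disk

def get(key):
    if key in tombstones:
        return None                # filtered from query results
    return store.get(key)

def compact():
    for key in tombstones:
        store.pop(key, None)       # physical removal happens here
    tombstones.clear()

store["a0001"] = "shanghai"
delete("a0001")
print(get("a0001"))        # None -- invisible to queries, though still stored
print("a0001" in store)    # True -- physically present until compaction
compact()
print("a0001" in store)    # False
```

The model also shows why heavy deletion hurts reads: every query must check accumulated markers until a compaction clears them.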
A high rate of deletions accumulates delete markers and degrades query performance. Where possible, use the TTL attribute instead of explicit deletes to let LindormTable expire old data automatically. If large-scale deletions are unavoidable, shorten the compaction interval to clear markers faster. See ALTER TABLE for how to configure TTL and compaction settings.
A shorter compaction interval increases system load. Set it based on your actual deletion volume and system capacity.
Updating rows in a table with secondary indexes triggers a delete-then-insert in each index table. Heavy update workloads generate delete markers in index tables, degrading index query performance. Shorten the compaction interval to keep index tables clean.
What's next
For troubleshooting unexpected query results caused by LSM-Tree versioning, timestamps, or TTL behavior, see Common causes for unexpected query results.
Monitor CPU, network, disk capacity, file counts, region sizes, and compaction backlogs to avoid resource exhaustion. For quota and limit details, see Quotas and limits.
For SQL error resolution, see FAQ.