LindormTable is a distributed data engine in which data is distributed based on primary keys. If the primary key of a table contains multiple columns, LindormTable uses the columns in a left-to-right order to query data. A poorly designed primary key concentrates reads and writes on a small set of partitions, creating hot spots that degrade performance. This topic covers the principles for designing effective primary keys and provides examples for common workloads.
Key concepts
GET vs. SCAN queries
Primary key design determines which query methods are available.
| Query method | How it works | Requirement |
|---|---|---|
| GET | Point lookup by primary key | All primary key columns must be specified with explicit values |
| SCAN | Range scan over the primary key | The range must be specified on the first primary key column; full table scans are rejected by default |
Example queries:
```sql
-- GET: retrieve a single order
SELECT * FROM table WHERE userid='abc' AND orderid=123;

-- SCAN: retrieve a range of orders
SELECT * FROM table WHERE userid='abc' AND orderid>123 AND orderid<456;

-- SCAN with reverse order: retrieve the most recent orders first
SELECT * FROM table WHERE userid='abc' AND orderid>123 AND orderid<456 ORDER BY orderid DESC;
```
For queries that cannot be expressed with GET or SCAN, use an index table or a secondary index. For more information, see SELECT.
Design principles
Uniqueness
Different row versions share the same primary key. By default, a GET query returns only the latest version. Design primary keys to be unique unless you intentionally use multi-versioning.
A primary key can be a single column or a composite of multiple columns:
- `[userid]` — one record per user
- `[userid][orderid]` — multiple records per user, one per order
Column count and value length
Keep primary keys compact to reduce storage costs and improve write performance.
Column count: Use 1–3 primary key columns.
Value length: Prefer fixed-length values such as long integers. For variable-length values, keep each column under 2 KB.
First-column distribution
Data stored in Lindorm is distributed based on primary keys. If the primary key of a table contains multiple columns, data is distributed based on the columns in a left-to-right order. When values in the first primary key column are skewed or monotonically increasing, writes pile up on a single partition.
The following table shows whether common first-column types produce uniform or skewed distribution:
| First primary key column | Distribution | Suitable? |
|---|---|---|
| User ID (high cardinality, random) | Uniform | Yes |
| Device ID (many devices, even access) | Uniform | Yes |
| Timestamp or auto-increment value | Sequential — all writes go to one partition | No |
| Order type or status code (few possible values) | Skewed — most writes go to a few partitions | No |
| Column with a shared prefix | Skewed — similar values land in the same partition | No |
If you must use a timestamp or auto-increment column as the first primary key column, apply a hash prefix to distribute writes across partitions. See Avoid hot spots.
Avoid hot spots
When the first primary key column is monotonically increasing or has low cardinality, use one of the following techniques to distribute writes evenly. Each technique has a different trade-off for reads.
Hash prefix
Prepend a hash of the original key to scatter writes across partitions. Create a derived column pk1 using:
```
pk1 = hash(pk).substring(0, 4) + pk
```
The first four characters of the hash act as a random prefix. Writes spread evenly; reads use pk1 directly.
Primary key: [pk1][...]
Trade-off: To query by the original pk, compute the hash prefix at query time to construct pk1.
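The prefix computation can be sketched in Java. The document does not specify which hash function to use, so this sketch assumes CRC32; `toStoredKey` is a hypothetical helper name, not part of Lindorm's API.

```java
import java.util.zip.CRC32;
import java.nio.charset.StandardCharsets;

public class HashPrefixKey {
    // Build the stored key pk1 = hash(pk).substring(0, 4) + pk.
    // CRC32 is an illustrative choice; any stable hash works.
    public static String toStoredKey(String pk) {
        CRC32 crc = new CRC32();
        crc.update(pk.getBytes(StandardCharsets.UTF_8));
        // Format the checksum as zero-padded hex and keep the first 4 characters.
        String hex = String.format("%08x", crc.getValue());
        return hex.substring(0, 4) + pk;
    }

    public static void main(String[] args) {
        // The prefix is deterministic, so a reader can recompute it
        // from the original pk at query time to reconstruct pk1.
        System.out.println(toStoredKey("user-0001"));
        System.out.println(toStoredKey("user-0002"));
    }
}
```

Because the hash is deterministic, a point read recomputes the same prefix from the original key; only the write distribution changes.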
MD5 prefix
A common variant of hash prefix using MD5:
Primary key: [md5(userid).substring(0,4)][userid][orderid]
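A minimal Java sketch of the MD5 variant, assuming the key parts are simply concatenated; `md5Prefix` and `storedKey` are illustrative names, not Lindorm APIs.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.nio.charset.StandardCharsets;

public class Md5PrefixKey {
    // Compute md5(userid).substring(0, 4) for use as the partition prefix.
    public static String md5Prefix(String userid) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(userid.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 4);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assemble [md5(userid).substring(0,4)][userid][orderid];
    // plain concatenation here stands in for the real key encoding.
    public static String storedKey(String userid, long orderid) {
        return md5Prefix(userid) + userid + orderid;
    }
}
```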
Reverse index
Reverse the string representation of the key to break up sequential prefixes:
Primary key: [reverse(userid)][orderid]
Trade-off: Range scans on userid become less natural. Works best when exact-match lookups dominate.
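The reversal itself is a one-liner; this sketch shows why it helps: two sequential IDs share a long leading prefix, but their reversed forms diverge immediately.

```java
public class ReverseKey {
    // Reverse the string form of the key so that sequential IDs,
    // which share a leading prefix, no longer land in the same partition.
    public static String reverse(String userid) {
        return new StringBuilder(userid).reverse().toString();
    }

    public static void main(String[] args) {
        // "100200" and "100201" share the prefix "10020",
        // but their reversed forms start with different characters.
        System.out.println(reverse("100200")); // "002001"
        System.out.println(reverse("100201")); // "102001"
    }
}
```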
Modulo bucket
Divide writes into a fixed number of buckets using modulo arithmetic. Assign each row to a bucket based on a monotonically increasing value:
```
long bucket = timestamp % numBuckets
```
Primary key: [bucket][timestamp][hostname][log-event]
Trade-off: To query all data for a time range, issue one query per bucket and merge the results.
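A sketch of the bucket assignment and the fan-out read it implies. `scanRanges` is a hypothetical helper that only enumerates the per-bucket ranges a query layer would issue and merge; it is not a Lindorm API.

```java
public class BucketKey {
    // Assign a row to one of numBuckets partitions by its timestamp,
    // as in: long bucket = timestamp % numBuckets.
    public static long bucket(long timestamp, int numBuckets) {
        return timestamp % numBuckets;
    }

    // A time-range read must fan out: one range scan per bucket,
    // with results merged by the caller. This just lists the ranges.
    public static String[] scanRanges(long start, long end, int numBuckets) {
        String[] ranges = new String[numBuckets];
        for (int b = 0; b < numBuckets; b++) {
            ranges[b] = "[" + b + "][" + start + " .. " + end + "]";
        }
        return ranges;
    }
}
```

Keeping `numBuckets` fixed is important: changing it remaps every timestamp to a different bucket, so existing rows would no longer be found.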
Random salt
Append a random number to spread writes across partitions:
Primary key: [userid][orderid][random(100)]
Trade-off: Reads for a single [userid][orderid] require scanning across multiple salt values. Use this pattern only when write distribution is more critical than read simplicity.
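A sketch of the salted write path, assuming a salt range of [0, 100) as in the example; the `|` separator and the `write` helper name are illustrative, not the real key encoding.

```java
import java.util.concurrent.ThreadLocalRandom;

public class SaltedKey {
    // Append a random salt in [0, 100) to spread writes across partitions,
    // as in [userid][orderid][random(100)]. A read for one (userid, orderid)
    // must then scan all 100 possible salt values.
    public static String write(String userid, long orderid) {
        int salt = ThreadLocalRandom.current().nextInt(100);
        return userid + "|" + orderid + "|" + salt;
    }
}
```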
Simplify primary keys
Reducing key size speeds up writes and lowers storage costs. Two common approaches:
- Replace STRING with LONG or INT. Example: '2015122410' → Long(2015122410).
- Replace long names with short codes. Example: 'taobao' → 'tb'.
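Both simplifications can be sketched in Java; the short-code map below is illustrative, and in practice the mapping would live in application code or a lookup table.

```java
import java.util.Map;

public class CompactKey {
    // Replace a STRING date with a LONG: '2015122410' -> 2015122410L.
    public static long dateToLong(String date) {
        return Long.parseLong(date);
    }

    // Replace long names with short codes: 'taobao' -> 'tb'.
    // The code table here is a hypothetical example.
    private static final Map<String, String> CODES = Map.of("taobao", "tb");

    public static String shortCode(String name) {
        return CODES.getOrDefault(name, name);
    }
}
```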
Common designs
Log and time series data
| Query goal | Primary key design |
|---|---|
| Query all records for a metric over a time range | [hostname][log-event][timestamp] |
| Retrieve the most recent records for a metric | [hostname][log-event][timestamp DESC] |
| Query time data with high write volume (hot timestamp) | long bucket = timestamp % numBuckets; [bucket][timestamp][hostname][log-event] |
For "most recent records" queries, define timestamp DESC in the primary key so that new records sort to the top. A SCAN then returns the latest records without additional sorting.
For high-volume time series where a single timestamp column would create a hot partition, the modulo bucket pattern distributes writes. To read all records in a time range, query each bucket separately and merge the results.
Transaction data
Transaction workloads typically require multiple access patterns — by seller, by buyer, and by order ID. Design separate tables for each access pattern:
| Access pattern | Table | Primary key design |
|---|---|---|
| Seller's transaction records by time | Seller table | [seller_id][timestamp][order_number] |
| Buyer's transaction records by time | Buyer table | [buyer_id][timestamp][order_number] |
| Order lookup by order ID | Order table | [order_number] |
When a query must combine data from more than one access pattern, join the three tables.