
Lindorm: Best practices for wide table design

Last Updated: Sep 04, 2024

The Lindorm wide table engine (LindormTable) allows you to store petabytes of data. Data in LindormTable is partitioned into ranges based on the primary key and is evenly distributed across nodes. In addition, Lindorm supports SQL syntax and indexes to deliver an experience akin to relational databases. However, Lindorm wide tables are built on a distributed NoSQL architecture that uses LSM trees, which distinguishes them from traditional relational databases. Therefore, this topic introduces basic principles such as the data model and data distribution pattern of Lindorm to help you use Lindorm wide tables more effectively and avoid the performance and hotspot issues caused by inappropriate modeling in your business.

Data model

LindormTable is a row-oriented storage engine. You can execute the following SQL statement in LindormTable to create a sample table named orders:

CREATE TABLE orders (
    channel VARCHAR NOT NULL,       -- The payment channel, such as Alipay or WeChat.
    id VARCHAR NOT NULL,            -- The order ID.
    ts TIMESTAMP NOT NULL,          -- The time when the order is paid.
    status VARCHAR,                 -- The order status.
    location VARCHAR,               -- The location where the order is paid.
    PRIMARY KEY(channel, id, ts)    -- The primary key of the table contains the channel, id, and ts columns.
) WITH (DYNAMIC_COLUMNS='true');    -- Enable dynamic columns so that you can write to non-primary key columns that are not defined in the table schema in advance.

After the table is created, LindormTable builds the data model shown in the following table.

Primary key columns                 | Non-primary key columns
channel    id      ts               | status   location   ……
------------------------------------|------------------------------
alipay     a0001   1705786502000    | 0        shanghai   ……
alipay     a0002   1705786502001    | 1        beijing    ……
……         ……      ……               | ……       ……         ……
unionpay   u0001   1705786502056    | 0        hangzhou   ……
unionpay   u0002   1705786502068    | 1        nanjing    ……
……         ……      ……               | ……       ……         ……
wechat     w0001   1705786502056    | 0        shanghai   ……
wechat     w0002   1705786502068    | 0        shanghai   ……
……         ……      ……               | ……       ……         ……

Primary key columns

Columns in a row can be categorized into primary key columns and non-primary key columns. The primary key of a table can contain multiple columns and has the following characteristics:

  • The schema of the primary key cannot be modified: The primary key of a Lindorm wide table is determined when you create the table. After the table is created, you cannot add or delete primary key columns, or change their data types or order. Therefore, design the primary key carefully before you create the table, because it significantly affects the request-handling performance of the table.

  • The primary key is unique in a table: All primary key columns of a table form a unique row key in the table. Therefore, you can identify a unique row in a table based on a complete primary key.

  • The primary key is used as a clustered index: In a Lindorm wide table, data is clustered based on the order of the primary key columns. For example, in the sample table orders, rows with the same value in the channel column are clustered. These rows are further clustered based on their values in the id column. Then, rows with the same value in the id column are clustered based on their values in the ts column.

    This way, the primary key serves as a clustered index, which improves query efficiency. Similar to MySQL, Lindorm matches query conditions against the primary key columns from left to right. The more consecutive equality conditions a query specifies on the leftmost primary key columns, the less data needs to be scanned. Therefore, specify the column that is most frequently used in query conditions as the leftmost primary key column. Note that after a range condition is specified on a primary key column, equality conditions on the primary key columns to its right can no longer narrow the scan. In this case, LindormTable must scan and filter a large amount of data to find the queried rows.

    For example, you can execute the following statement to query data in the sample table orders:

    -- LindormTable scans all rows whose value in the channel column is alipay and whose value in the id column is greater than a0089 to locate the rows whose value in the ts column is 1705786502068.
    SELECT * FROM orders WHERE channel='alipay' AND id > 'a0089' AND ts = 1705786502068;

    If the leftmost primary key column is not specified in the query condition, LindormTable scans all rows in the table even if other primary key columns are specified in the condition. The following statement provides an example:

    -- LindormTable scans all rows in the table.
    SELECT * FROM orders WHERE id = 'a0089';

    In the preceding statement, a query condition is specified for the primary key column id, which is not the leftmost primary key column. Therefore, LindormTable scans all rows to locate the rows whose value in the id column is a0089, which is inefficient. If most of the queries in your business contain conditions on the id column, we recommend that you specify the id column as the leftmost primary key column or create a secondary index on the id column to accelerate the queries.
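    For comparison, the following query (a sketch based on the same sample table) specifies equality conditions that cover the leftmost primary key columns in order, so LindormTable can locate the matching rows without a full table scan:

    -- Efficient: the equality conditions match the leftmost primary key columns channel and id in order.
    SELECT * FROM orders WHERE channel = 'alipay' AND id = 'a0089';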

Non-primary key columns

Lindorm allows you to define non-primary key columns dynamically instead of declaring them in the table schema. You can directly write data to a non-primary key column regardless of whether it is defined in the table, which is similar to the usage of columns in HBase. For more information, see Dynamic columns. In addition, you can use wildcards to define non-primary key columns. For example, if you define a wildcard column named *_str whose data type is STRING, you can write data to any STRING column whose name ends with _str. For more information, see Wildcard columns.
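For example, because the sample table orders is created with DYNAMIC_COLUMNS='true', you can write to a column that is not defined in the schema. The following statement is a minimal sketch: the coupon column is hypothetical, and depending on your configuration dynamic column values may need to be written as binary (see Dynamic columns):

-- Write to the dynamic column coupon (hypothetical) without defining it in the schema first.
UPSERT INTO orders (channel, id, ts, status, coupon) VALUES ('alipay', 'a0003', 1705786502002, '0', 'c2024');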

Lindorm allows you to update data in non-primary key columns. When you write data to a table, you do not need to specify values for all non-primary key columns. However, you must specify a value for at least one non-primary key column, because a row cannot consist of only the primary key. Data in a Lindorm wide table is stored in primary key order. If a query specifies conditions only on non-primary key columns that are not indexed, the entire table must be scanned to locate the queried data, and Lindorm rejects such queries by default. Therefore, to query data in non-primary key columns more efficiently, specify a range condition on the primary key or create indexes on the non-primary key columns.

-- Conditions are specified only on non-primary key columns. All data in the table is scanned.
SELECT * FROM orders WHERE location = 'shanghai';

-- A condition is also specified on the leftmost primary key column. Only rows whose value in the channel column is alipay are scanned to locate the rows whose value in the location column is shanghai.
SELECT * FROM orders WHERE location = 'shanghai' AND channel = 'alipay';

Data distribution

Lindorm is a distributed database service. Data in a Lindorm wide table is partitioned into regions based on the ranges of primary key values and is stored across all nodes of a Lindorm instance. The following figure shows how data is partitioned in Lindorm.

(Figure: data in a table is partitioned into regions based on primary key ranges and distributed across nodes.)

In Lindorm, data in a table is partitioned into regions. Each region stores a contiguous range of the table's data, and all regions together cover the entire primary key space of the table. For example, if you write a row whose primary key is {alipay, a100999, 1705786502068}, the row is stored in Region_3. When the size of data in Region_3 exceeds a threshold (8 GB by default) or when frequently read or written hotspot data is detected in the region, LindormTable splits the region into two sub-regions that respectively cover the upper and lower halves of the original key range. LindormTable then distributes the two sub-regions to different nodes based on workloads for load balancing.

(Figure: a region is split into two sub-regions, which are distributed to different nodes.)

Rows with the same prefix may be stored in different regions. For example, if a region cannot completely store all rows whose value in the channel column is alipay, these rows may be stored in different regions. As shown in the preceding figure, rows whose value in the channel column is alipay are separately stored in Region_1, Region_2, and Region_3. When you write data to a Lindorm wide table, LindormTable automatically splits regions based on the written data. This way, you do not need to worry about hotspot data that is clustered in specific ranges, nor do you need to use the PARTITION BY clause to split a table into regions when you create the table.

Pre-split regions

Lindorm supports automatic region splitting. Therefore, you do not need to specify the range of regions when you create a Lindorm wide table. LindormTable automatically splits regions for a table when data is written to the table. If you need to distribute data to specific regions in a table, you can specify the number of pre-split regions when you create the table. LindormTable stores the specified regions across multiple nodes to distribute read and write requests. Note that the number of regions that you specify when you create a table is only the number of initial regions. LindormTable further splits the initial regions when data is written to the table. For more information about how to specify pre-split regions when you create a table, see CREATE TABLE.

If you want to write large amounts of data or use Bulkload to import data to a table immediately after it is created, we recommend that you specify pre-split regions based on the required data distribution to prevent a single node from being overloaded.

  • If you plan to use SQL or the HBase API to write data, you can set the number of pre-split regions to Number of nodes x 4 when you create a table. The optimal number of initial regions varies with the scenario. We recommend that you set the number of pre-split regions based on your business requirements.

  • If you use Bulkload to import data to a table, we recommend that you set the number of pre-split regions to Size of the data to be imported in GB / 8. This way, the imported data is distributed across all initial regions without triggering region splitting.
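For example, based on the preceding guidelines, a 4-node instance that is written to by using SQL would be pre-split into 4 x 4 = 16 initial regions, and a 64 GB Bulkload import would be pre-split into 64 / 8 = 8 initial regions so that each initial region receives about 8 GB of data, which is the default split threshold.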

In addition, make sure that the ranges of the pre-split regions conform to the data writing pattern. Otherwise, the written data may be clustered in only a few regions. In this case, the performance of the table may be degraded even if you specify a large number of pre-split regions.

Hotspot data

A distributed system is designed to evenly distribute requests across regions so that the system can be horizontally scaled to handle more requests and data. When a small piece of data in the system is frequently accessed, the data becomes hotspot data. In this case, the node on which the hotspot data is stored becomes the performance bottleneck of the entire distributed system.

  • Single-row hotspot data: When a single row of data is frequently read or written, this row becomes single-row hotspot data. Requests for the same row are always routed to the same node, so the performance of this node bottlenecks the entire system. You can resolve this issue only by scaling up the node specification, not by adding more nodes. To avoid single-row hotspot data in a wide table, you must design an appropriate schema and primary key when you create the table.

  • Ranged hotspot data: When data within a small range is frequently read or written, the data becomes ranged hotspot data. A Lindorm wide table is partitioned based on the ranges of its primary key, so data within a small range is usually stored in the same region, and requests for that range are routed to the same node, which bottlenecks the system. Lindorm can identify hotspot data within a small range and automatically distribute it across nodes by splitting regions. However, we recommend that you design a more dispersed primary key to avoid ranged hotspot data in the first place. For example, in the sample table orders, you can use the HASH function to convert the values in the id column into hashes and use the hashed column in the primary key to avoid ranged hotspot data.

  • Write hotspots caused by an incremental primary key: Write hotspots may occur if the primary key of a table is incremental. A Lindorm wide table is partitioned based on the ranges of its primary key. Therefore, even if a region is split into sub-regions, subsequently written data always lands in the sub-region that covers the larger key range. For example, in the sample table orders, the primary key column id is incremental, and a large number of data records that meet the channel=alipay condition are written to the table. In this case, even if Region_3 is split into Region_3_a and Region_3_b, data is written only to Region_3_b. Due to the system architecture, write hotspots caused by an incremental primary key cannot be completely resolved in Lindorm. Therefore, we recommend that you do not specify an incremental column as the first primary key column, and avoid including incremental columns in the primary key.

    Important

    Some databases or systems, such as Cassandra, partition data by the hash of the primary key, so rows with incremental primary keys are still evenly distributed across partitions. However, if you migrate such a schema to Lindorm, which partitions data by primary key range, the incremental primary key column incurs write hotspots.

Primary key design

The design of the primary key is critical for a Lindorm wide table because the primary key determines how data is distributed and sorted in the table. Therefore, to properly allocate resources for and efficiently use a wide table, you must design an appropriate primary key for the table. For more recommendations on how to design primary keys, see Design primary keys for Lindorm wide tables.

Take note of the following items when you design the primary key of a table:

  • On the premise of uniquely identifying each row in the table, the primary key should contain as little data as possible. Do not include columns that store JSON data or web page content in the primary key.

  • The values in the first primary key column must be dispersed. If the first primary key column contains incremental values such as order IDs, you can prefix the values with their hashes (hash(id)+id) or reverse the values (reverse(id)), as shown in the sketch after this list.

  • The first primary key column cannot contain incremental values. Otherwise, write hotspots significantly bottleneck the write performance of the table. If a primary key column must contain incremental values, do not specify it as the first primary key column. For example, the first primary key column in the sample table orders is channel, which is not incremental. This way, rows with the same value in the channel column are further distributed across regions based on the values of the incremental column id, which avoids hotspot data.

  • If no index tables are created for a table, the query performance of the table is mostly determined by the table primary key. Therefore, you must consider the pattern in which the table is queried when you define the table primary key.
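The following DDL is a minimal sketch of the hash(id)+id approach mentioned above. The id_hash column is illustrative and not part of the original sample schema, and the sketch assumes that the client computes the hash prefix before each write:

CREATE TABLE orders_dispersed (
    id_hash VARCHAR NOT NULL,     -- Dispersed prefix, for example the first characters of hash(id), computed by the client.
    id VARCHAR NOT NULL,          -- The incremental order ID.
    ts TIMESTAMP NOT NULL,
    status VARCHAR,
    location VARCHAR,
    PRIMARY KEY(id_hash, id, ts)  -- The dispersed id_hash column comes first, so consecutive IDs are spread across regions.
);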

Index table design

If the primary key cannot meet the query requirements in your business, you can create a secondary index for the column that you want to query. For more information, see Secondary indexes.

A secondary index is an index table that uses the indexed column as the primary key. Therefore, the column for which a secondary index is created must be designed in the same manner as the primary key.

Queries on indexed columns are matched from the leftmost column of the secondary index. For example, assume that a composite secondary index is created on the a, b, and c columns. If conditions are specified only on the b and c columns in a query (such as SELECT * FROM tablename WHERE b=xx;), the query scans all data in the table.
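The following statements sketch this behavior. The syntax follows the CREATE INDEX statement described in Secondary indexes, and the table and column names are illustrative:

-- Create a composite secondary index on the a, b, and c columns.
CREATE INDEX idx_abc ON tablename (a, b, c);

-- Uses the index: the conditions cover the leftmost index column a.
SELECT * FROM tablename WHERE a = 'x' AND b = 'y';

-- Does not use the index: the leftmost index column a is missing, so all data in the table is scanned.
SELECT * FROM tablename WHERE b = 'y';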

If you write data to a base table for which a secondary index is created, the data is written to both the base table and the index table. This introduces additional write operations and may degrade the write performance of the base table. Therefore, to ensure the read and write performance of a table, do not create a large number of secondary indexes for the table. In addition, the columns in a secondary index cannot be modified. If a created secondary index does not meet your requirements, you can only delete it and create another one. Therefore, secondary indexes are applicable to queries with a pattern that does not frequently change. In scenarios where you need to use indexes and combined conditions to query data in a table that contains a large number of columns, we recommend that you use the search index provided by Lindorm.

The following table describes the two types of indexes provided by Lindorm.

Index type      | Scenario                                                                                      | Dependency                                                                  | Real-time visible
----------------|-----------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|------------------
Secondary index | Queries with a pattern that does not frequently change.                                       | None                                                                        | Yes
Search index    | Online queries that involve a large number of indexed columns and combined query conditions. | You must purchase LindormSearch nodes and enable the search index feature.  | Data needs to be synchronized to LindormSearch, which causes index latency. For more information, see Overview.

Data writing

You must write data to a wide table in a pattern that conforms to the design of the table primary key so that the written data is evenly distributed across nodes and performance bottlenecks are avoided. You can write multiple rows to a table in a batch. Batch writing is more efficient than single-row writing because it reduces the number of remote procedure calls (RPCs) and allows the server to process multiple rows at a time, which utilizes the throughput more efficiently. However, do not write too many rows in a single batch. Otherwise, the memory of the server may be exhausted or a full garbage collection may be triggered, which degrades the service stability of Lindorm. We recommend that you write no more than 2 MB of data in total in a batch.
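For example, with the SQL interface you can batch several rows into one statement. The following statement is a sketch against the sample table orders; whether multi-row VALUES is supported depends on your Lindorm version, and client SDKs also provide batch interfaces that achieve the same effect:

-- One request carries multiple rows, which reduces RPC overhead. Keep each batch under about 2 MB.
UPSERT INTO orders (channel, id, ts, status, location) VALUES
    ('alipay', 'a1001', 1705786503000, '0', 'shanghai'),
    ('wechat', 'w1001', 1705786503001, '1', 'beijing'),
    ('unionpay', 'u1001', 1705786503002, '0', 'hangzhou');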

You can use either the ApsaraDB for HBase API or SQL to write data to a Lindorm wide table. If a table is created by using SQL and the data type of each column (such as INT or LONG) is defined, you may not be able to query data from the columns by using SQL after you write data to these columns by using the ApsaraDB for HBase API, because the HBase API writes raw bytes without type information. Therefore, access tables created by using SQL only by using SQL. In contrast, tables created by using the ApsaraDB for HBase API can also be accessed by using SQL: you can configure column mappings to modify the table schema. For more information, see Use SQL statements to access an HBase table.

Data query

Single-value query and range query

Two types of queries are commonly performed on Lindorm wide tables: single-value queries and range queries.

Single-value query

If you specify conditions on all primary key columns in a query, the query is a single-value query. Examples:

SELECT * FROM orders WHERE channel='alipay' AND id='a0001' AND ts=1705786502000;
SELECT * FROM orders WHERE channel='alipay' AND id='a0001' AND ts IN (1705786502000, 1705786502222, 1705786502333);
SELECT * FROM orders WHERE channel='alipay' AND id IN ('a0001', 'a0002', 'a0003') AND ts IN (1705786502000, 1705786502222, 1705786502333);
SELECT * FROM orders WHERE channel IN ('alipay', 'wechat', 'unionpay') AND id IN ('a0001', 'a0002', 'a0003') AND ts IN (1705786502000, 1705786502222, 1705786502333);

In the preceding examples, the first statement is a single-value query on a single row, and the other statements are single-value queries on multiple rows. In the second statement, multiple values are specified for the ts column, so multiple rows are returned. In the third statement, IN conditions with three values each are specified on the id and ts primary key columns, so the statement queries 9 (3 x 3) rows. Similarly, the fourth statement queries 27 (3 x 3 x 3) rows. The more IN conditions you specify in a single-value query, the more combinations the Cartesian product of the condition sets produces, and the more rows are requested and queried. When you perform single-value queries on multiple rows in a Lindorm wide table, all query results are returned at a time. Therefore, querying more rows requires Lindorm to scan more data and return a larger result set, which may exhaust the memory of the server or trigger a full garbage collection. Do not perform single-value queries on a large number of rows at the same time, and reduce the number of IN conditions and their combinations in the queries.

By default, you can perform up to 2,000 single-value queries at a time in Lindorm. If the limit is exceeded, the Multi Get Plan query too many rows in one select error is returned. If you need to perform more single-value queries at a time and can make sure that the queries do not exhaust the memory of your Lindorm instance, contact technical support (DingTalk ID: s0s3eg3) to increase the limit.

Range query

If a query includes no conditions on the primary key columns of a table, or includes conditions on only some of the primary key columns, the query is a range query. Examples:

SELECT * FROM orders;
SELECT * FROM orders WHERE channel='alipay';
SELECT * FROM orders WHERE channel='alipay' AND id='a0001';
SELECT * FROM orders WHERE channel='alipay' AND id IN ('a0001', 'a0002', 'a0003');
SELECT * FROM orders WHERE channel IN ('alipay', 'wechat', 'unionpay') AND id IN ('a0001', 'a0002', 'a0003');

The preceding statements are all range queries because they do not include conditions on all primary key columns of the table.

Lindorm returns the results of a range query as a stream, so the memory is not exhausted even if the query scans all data in the table. Therefore, the number of rows that a range query can return at a time is not limited, and you do not need to add LIMIT or OFFSET clauses to the query statement to return results by page or based on a cursor.

ORDER BY query

Data in a Lindorm wide table is sorted by the primary key. Therefore, query results are returned in primary key order even if you do not specify an ORDER BY clause in the SELECT statement. However, if the conditions in a query hit an index table, the results are returned in the order of the index table, which is the order of the columns on which the index is created.

If you specify multiple columns in the ORDER BY clause, the result is sorted by the columns from left to right. To use the clustered index efficiently, include the leftmost primary key column in the ORDER BY clause. If you specify only non-leftmost primary key columns or non-primary key columns in the ORDER BY clause, a large amount of computation is required to sort the returned results. Therefore, we recommend that you sort the result by the leftmost primary key column or create a secondary index on the non-primary key column that you want to sort by. In addition, if you want to sort query results in descending order (ORDER BY ... DESC), we recommend that you add the DESC keyword to the statement used to create the table. This way, data is stored in descending order, which improves the performance of such queries. For more information about how to use the ORDER BY clause, see Use ORDER BY in large result sets.
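The following queries against the sample table orders sketch these rules:

-- Results are returned in primary key order (channel, id, ts) even without an ORDER BY clause.
SELECT * FROM orders WHERE channel = 'alipay';

-- Efficient: the sort order matches the storage order of the leftmost primary key columns.
SELECT * FROM orders WHERE channel = 'alipay' ORDER BY channel, id;

-- Expensive: location is a non-primary key column, so the result set must be sorted after it is read.
SELECT * FROM orders WHERE channel = 'alipay' ORDER BY location;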

COUNT query

Lindorm is a NoSQL database service built on the LSM-Tree storage structure. In this structure, multiple versions of data and delete markers are stored because data may be modified or deleted multiple times. Therefore, the underlying storage of Lindorm does not maintain the number of rows of each table in the table metadata. To accurately query the number of rows in a table, Lindorm must scan all data in the table. Therefore, counting the rows of a table that contains a large amount of data takes a long time. If you need only an estimate of the number of rows in a table, see Count the number of rows in a table.

If conditions are specified in a COUNT query, the duration of the query depends on the amount of data that must be scanned under the specified conditions. For example, if you execute the SELECT count(*) FROM table WHERE channel = 'alipay'; statement, the number of rows whose value in the channel column is alipay is counted.

Data deletion

When you delete a row from a Lindorm wide table, Lindorm does not immediately delete the row but adds a delete marker to it. The row is not permanently deleted until the COMPACTION operation is performed. Before the row is permanently deleted, both the row and its delete marker are still read during queries, although Lindorm filters them out of the query results. Therefore, if you perform a large number of delete operations, more delete markers are generated and the query performance may be degraded. We recommend that you configure the TTL attribute of a table to regularly clear data that you no longer require instead of deleting the data directly. If you must perform a large number of delete operations in your business, you can specify a short interval for the COMPACTION operation to clear deleted data and delete markers more quickly. For more information about how to configure the TTL attribute and the COMPACTION interval, see ALTER TABLE.
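The following statement sketches the TTL approach. The attribute name and unit here are illustrative; see the ALTER TABLE topic for the exact syntax:

-- Illustrative: expire rows after 30 days so that expired data is purged during compaction instead of being deleted explicitly.
ALTER TABLE orders SET 'TTL' = '2592000';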

If you update data in a table for which secondary indexes are created, Lindorm first deletes the corresponding rows from the secondary index tables and then inserts the updated data into them. Therefore, if you update a large amount of data in the base table, a large number of delete markers are also generated in the secondary index tables, which may degrade the query performance of the index tables. In this case, you can also specify a short interval for the COMPACTION operation to clear delete markers more quickly.

Important

A shorter interval for the COMPACTION operation increases the load on the system. Configure the interval based on your actual requirements.

Additional information

  • LindormTable is built upon the LSM-Tree storage structure. If you are not familiar with its inherent features such as versioning, timestamps, and TTL, the results of write and query operations may not meet your expectations due to improper usage. In this case, see Common causes for unexpected query results.

  • When you use Lindorm wide tables, monitor the usage of your instance resources, such as CPU, network, and disk capacity, to avoid performance degradation caused by insufficient resources. Also pay attention to the number of files on the server, the size of each region, and whether COMPACTION operations have a backlog, to prevent reads and writes from exceeding the limits of Lindorm. For more information about the limits of Lindorm, see Quotas and limits.

  • For more information about how to resolve common errors, see FAQ.