Lindorm provides two efficient and easy-to-use indexes: secondary indexes and search indexes. These indexes are suitable for different scenarios, such as non-primary key matching, multi-dimensional queries, . You can use the CREATE INDEX syntax to specify the index type and add index properties.
Engine and version
The CREATE INDEX syntax applies only to LindormTable. No version limits apply.
To create a search index or a columnstore index using the CREATE INDEX syntax, you must use Lindorm SQL V2.6.1 or later. To view the version of Lindorm SQL, see SQL version guide.
Important notes
Index building for secondary indexes, columnstore indexes, and search indexes requires data lookups, which generate read operations. If you enabled the hot and cold data separation feature for your instance, monitor the throttling of cold storage (storage-optimized cloud storage). Throttled read operations on cold storage directly affect the efficiency of index building. This can cause backpressure on write operations.
Syntax
create_index_statement ::= CREATE INDEX [IF NOT EXISTS] [ index_identifier ]
[ USING index_method_definition ]
ON table_identifier '(' index_key_expression ')'
[ INCLUDE include_expression]
[ PARTITION BY partition_definition ]
[ { ASYNC | SYNC} ]
[ WITH '(' index_options ')' ]
index_method_definition ::= { KV | SEARCH | COLUMNAR }
index_key_expression ::= '('
index_key_definition
|wildcard_string_literal
')'
index_key_definition ::= {
column_identifier [ DESC ]
| column_identifier'(' column_options ')'
| function_expression
}
[ (',' index_key_definition)* ]
column_options ::= '('
option_definition (',' option_definition )*
')'
function_expression ::= function_identifier
'('
column_identifer ( ',' column_identifer )*
')'
option_definition ::= option_identifer '=' string_literal
include_expression ::= '('
column_identifier( ',' column_identifier )*
')'
partition_definition ::= {
{RANGE TIME}
'('
column_identifer
')' [ PARTITIONS number_literal ]
|
HASH '('
column_identifer
( ',' column_identifer )*
')' [ PARTITIONS number_literal ]
|
ENUMERABLE
'('
column_identifer
( ',' column_identifer )*
index_options ::= '('
option_definition (',' option_definition )*
')'Differences
LindormTable supports two types of indexes that can be created with the CREATE INDEX statement: secondary indexes and search indexes.
The supported syntax elements vary by index type. The following table describes the differences.
Syntax element | Secondary index | Search index |
✔ | 0 | |
✔ | 0 | |
0 | ✖ | |
✖ | 0 | |
Index building mode (ASYNC|SYNC) Important Only LindormTable 2.6.3 and later support the | 0 | 0 |
0 | 0 |
Usage notes
You can create a maximum of 3 secondary indexes, 1 search index, for each wide table.
Index type (index_method_definition)
When you create an index, you can use the USING keyword to specify the index type. LindormTable supports the following three index types:
Parameter | Index type | Description |
KV | Secondary index | If you do not explicitly specify an index type in the Important An instance can have a maximum of eight concurrent secondary index building tasks. If you try to create a secondary index when eight tasks are already running, the statement fails. |
SEARCH | Search index | A full-text search index based on a search engine. It is mainly used for complex multi-dimensional query scenarios, such as tokenization, fuzzy queries, aggregation and analysis, and sorting with pagination. For more information, see Introduction to search indexes. Search indexes support all basic data types except DATE, TIME, and DECIMAL. For more information about data types, see Basic data types. Important
|
Index key expression (index_key_expression)
You can define one or more columns as index keys. An index with multiple index keys is called a composite index.
Index key definition (index_key_definition)
If the index type is a search index, you can add properties to the index keys. If the index type is a secondary index, you can specify a function expression as the index key.
Search index key properties (option_definition)
When you add an index column using the ALTER INDEX statement, you can also specify properties for the index key. For example, c3(type=text,analyzer=ik) creates an index on column c3 and specifies that the ik tokenizer is used for this column.
The following table describes the index key properties supported by search indexes.
Index key property | Type | Description |
indexed | STRING | Specifies whether to create an index for the specified column in the index key. An inverted index is created for string type columns. A numeric index, such as a BKD-Tree index, is created for numeric type fields.
|
rowStored | STRING | Specifies whether to store the raw data.
|
columnStored | STRING | Specifies whether to use column store to accelerate sorting and analysis.
|
type | STRING | For tokenization scenarios, set the type parameter to text for the tokenized field. In other cases, the data type is the same as the wide table data type by default. Important The type parameter must be used with the analyzer parameter. |
analyzer | STRING | Specifies the list of tokenizers. Valid values:
Important The analyzer parameter must be used with the type parameter. |
mapping | STRING | Custom index key properties, represented as a JSON object in a string. This is compatible with the Elasticsearch index key property syntax. Important
|
Secondary index function expression (function_expression)
When you create a secondary index, you can specify a function expression as the index key. Five function expressions are supported: Z-ORDER, S2, CAST, MD5, and SHA256.
ImportantOnly LindormTable 2.6.7.5 and later supports the use of MD5 or SHA256 function expressions as index keys. If you cannot upgrade your LindormTable engine in the console, contact Lindorm technical support on DingTalk at s0s3eg3.
Z-ORDER function: Creates a spatio-temporal secondary index for one or more columns of a spatio-temporal data type. The syntax is as follows:
Z-ORDER '(' column_identifier ( ',' column_identifer )* ')'The data type of column_identifier must be a spatio-temporal data type. For more information about spatio-temporal indexes, see Spatio-temporal indexes.
CAST function: Creates an index on the result of a data type conversion for a column. The syntax is as follows:
CAST(column_identifier AS type)type is a data type. For more information, see Basic data types.
S2 function: Creates an S2 grid secondary index for a column of a spatio-temporal data type. The syntax is as follows:
S2 '(' column_identifier, level ')'The data type of column_identifier must be POLYGON or MULTIPOLYGON. The value of level ranges from 1 to 30. For more information, see S2 index function.
MD5 function: Creates an index on the MD5 encoded value of a VARCHAR column. The syntax is as follows:
MD5(column_identifier)For more information about the MD5 function, see MD5 function.
SHA256 function: Creates an index on the SHA256 encoded value of a VARCHAR column. The syntax is as follows:
SHA256(column_identifier)For more information about the SHA256 function, see SHA256 function.
Wildcard constant (wildcard_string_literal)
Only search indexes support the wildcard constant (*).
The wildcard constant (*) is used to build an index on all columns. For example, CREATE INDEX IF NOT EXISTS idx5 USING SEARCH ON test(*).
After the statement is executed, newly added columns are not automatically added to the search index or columnstore index. You must add them manually using the
ALTER INDEXstatement.Dynamic columns are not included in the index. For more information, see Dynamic columns.
Included columns (include_expression)
Included columns, also known as a covering index, are columns from the primary table that are included in the index table. This feature avoids the need to look up the primary table after a query hits the index, which improves query performance.
For secondary indexes, you can use the WITH keyword to add table properties to include dynamic columns. For more information, see Secondary indexes.
Index partition (partition_definition)
Only search indexes support index partitioning.
Index partitioning is an index management policy. The server-side automatically splits and stores data. When you query data, the system automatically performs partition pruning.
Search indexes support RANGE partitioning and HASH partitioning. For more information, see Partitioned indexes.
Index building mode (ASYNC|SYNC)
When you use the CREATE INDEX statement to create an index, you can specify the index building mode using the ASYNC or SYNC keyword.
ASYNC: Creates the index asynchronously. After the
CREATE INDEXstatement is executed, an index building task starts, and the statement returns immediately, regardless of whether the index is successfully built.SYNC: Creates the index synchronously. After the
CREATE INDEXstatement is executed, an index building task starts immediately. TheCREATE INDEXstatement returns a result only after the index building task is complete.
The following table describes the building modes supported by the two index types.
Index building mode | Secondary index | Search index |
ASYNC Important Starting from LindormTable 2.6.1, the default index building mode for the | 0 | 0 |
SYNC Important Only LindormTable 2.6.3 and later support synchronous index building. | 0 | ✔ |
Index properties (index_options)
When you create an index using the CREATE INDEX statement, you can use the WITH keyword to specify index properties. The following index properties are supported.
Secondary index
Property | Type | Description |
COMPRESSION | STRING | The compression algorithm for the index table. Supported algorithms include the following:
|
INDEX_COVERED_TYPE | STRING | The following values are available for the index redundancy method:
You can also use the |
STARTKEY | STRING | The start key of the index table. Important You cannot set a start key for a timestamp column or a column of a spatial data type. |
ENDKEY | STRING | The end key of the index table. Important You cannot set an end key for a timestamp column or a column of a spatial data type. |
NUMREGIONS | INTEGER | The number of pre-partitions for the index table. Important You cannot set the number of pre-partitions for a timestamp column or a column of a spatial data type. |
Search index
Property | Type | Description |
indexState | STRING | The state of the search index. Valid values:
|
numShards | INTEGER | Specifies the number of shards. The default value is twice the number of search nodes. Important The data volume of a single shard is recommended to be between 30 million and 100 million rows. The storage size is recommended to be between 30 GB and 50 GB. If the data volume of a shard is very large, the read and write performance of the search index may decrease. In addition, a single shard has a limited data capacity. Exceeding 2 billion rows affects system stability. Therefore, plan the number of index shards before you create a search index for production. If your data contains a time attribute field and the data volume increases significantly over time (such as order data or log data), we recommend that you create a time-partitioned index. |
RANGE_TIME_PARTITION_START | INTEGER | The number of days before the index creation operation to start creating partitions. This applies to scenarios with historical data. If the timestamp of historical data is earlier than the partition start time, an error occurs. Note This parameter is required when you create a partitioned index. |
RANGE_TIME_PARTITION_INTERVAL | INTEGER | The interval in days for creating new partitions. For example, Note This parameter is required when you create a partitioned index. |
RANGE_TIME_PARTITION_TTL | INTEGER | The number of days to retain partition data. For example, Note This parameter is required when you create a partitioned index. |
RANGE_TIME_PARTITION_MAX_OVERLAP | INTEGER | If the data being written has a future timestamp, this parameter specifies the maximum allowed time interval from the current time, in days. If not specified, data from one day in the future is allowed by default. |
RANGE_TIME_PARTITION_FIELD_TIMEUNIT | LONG | The unit of the time partition field specified by the business. The default unit is milliseconds (ms).
|
RANGE_TIME_PARTITION_CHS | INTEGER | The boundary for hot and cold data. Specifies the age of data to be transferred to cold storage. The default unit is seconds (s). For example, Note If this parameter is not set, hot and cold data separation is not enabled. Data is stored only in hot storage by default. |
INDEX_SETTINGS | STRING | Custom index properties, represented as a JSON object in a string. This is compatible with the Elasticsearch index settings syntax. Important The INDEX_SETTINGS parameter applies only to Elasticsearch-compatible versions. |
SOURCE_SETTINGS | STRING | Properties for the raw data storage policy of search index columns. This is a JSON object in a string and is compatible with the _source settings syntax of Elasticsearch. By default, a search index does not store the raw data of index columns. The following parameters are supported:
Important The
Note Configure the SOURCE_SETTINGS parameter only when you need to perform data queries through the search engine visualization user interface. You do not need to configure this parameter for regular use cases. |
Examples
Assume that the primary table `test` is created using the following statement:
CREATE TABLE test (
p1 VARCHAR NOT NULL,
p2 INTEGER NOT NULL,
c1 BIGINT,
c2 DOUBLE,
c3 VARCHAR,
c4 TIMESTAMP,
c5 GEOMETRY(POINT),
PRIMARY KEY(p1, p2)
) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');Secondary index
Asynchronously build an index
If you do not specify a building mode, the index is built asynchronously by default.
CREATE INDEX idx1 ON test(c1 desc) include(c3,c4) WITH (COMPRESSION='ZSTD');Result Verification
Run SHOW INDEX FROM test to check whether the index is created.
Create a composite index
Create a composite index on columns c1, c2, and c3 synchronously.
CREATE INDEX idx1 ON test(c1, c2, c3) include(c4) SYNC WITH ( COMPRESSION ='ZSTD');Verifying the results
Run SHOW INDEX FROM test to check whether the index is created.
Create a spatio-temporal secondary index
The following example shows how to create a spatio-temporal secondary index. For more information, see Spatio-temporal indexes.
CREATE INDEX idx ON roads (Z-ORDER(g1));
CREATE INDEX idt ON roads (Z-ORDER(g1,t));Include all columns
CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE ='COVERED_ALL_COLUMNS_IN_SCHEMA');Verifying the results
Run SHOW INDEX FROM test to check whether the index is created.
Include all dynamic columns
CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE='COVERED_DYNAMIC_COLUMNS');Verify the result
Run SHOW INDEX FROM test to check whether the index is created.
Set the number of pre-partitions for the index table
Set the number of pre-partitions for the index table to 32.
CREATE INDEX idx1 ON test(c4 desc) include(c5,c6) WITH (NUMREGIONS ='32');Verify the result
Run SHOW INDEX FROM test to check whether the index is created.
Specify the start and end keys for the index table
When you create a secondary index, you can specify the start and end keys and the number of pre-partitions for the index table. This divides the index table into 32 pre-partitions between 11111111 and 9999999.
CREATE INDEX idx1 ON test(c3 desc) include(c5,c6) WITH (NUMREGIONS ='32', STARTKEY ='11111111', ENDKEY = '9999999');Verify the results
Run SHOW INDEX FROM test to check whether the index is created.
Create a Z-ORDER secondary index
Create a Z-ORDER secondary index for the polygon data type column c5.
CREATE INDEX idx1 ON test(Z-ORDER(c5));Verifying the result
Run SHOW INDEX FROM test to check whether the index is created.
Create a grid secondary index
Create a grid secondary index for the polygon type column c5. Currently, only the asynchronous creation mode is supported.
CREATE INDEX idx1 ON test(S2(c5, 10));Build the grid secondary index.
BUILD INDEX s2_idx ON test;
Verifying the results
Run SHOW INDEX FROM test to check whether the index is created.
Convert the data type of a specified column
Convert the data type of column c3 to INTEGER and then create a secondary index.
CREATE INDEX idx1 ON test(CAST(c3 AS INTEGER));Verifying the result
Run SHOW INDEX FROM test to check whether the index is created.
Search index
Asynchronously build an index
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(p1, p2, c1, c2, c3);Verifying the results
Run SHOW INDEX FROM test to check whether the index is created.
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Create an index for all columns
Create a search index for all columns. If no specific column properties are specified, the default values are used.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*');Verify the result
Run SHOW INDEX FROM test to check whether the index is created.
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Add index key properties
Add non-custom index key properties
Create a search index for all columns. If the index includes column c3, set its properties to type=text, analyzer=ik, and indexed=true.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(type=text,analyzer=ik,indexed=true));Verify the result
Run
SHOW INDEX FROM testto check whether the index is created.NoteYou can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Add custom index key properties
Assume that the table schema is as follows:
CREATE TABLE test ( p1 VARCHAR NOT NULL, p2 INTEGER NOT NULL, c1 BIGINT, c2 DOUBLE, c3 VARCHAR, c4 TIMESTAMP, PRIMARY KEY(p1, p2) ) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');Create a search index for all columns, where the index column c3 is specified as type text and the tokenizer is ik_max_word.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(mapping='{ "type": "text", "analyzer": "ik_max_word" }'));ImportantThe mapping parameter applies only to Elasticsearch-compatible versions.
If the mapping parameter is used, all other parameters for this index key are invalid.
Verifying the result
Run
SHOW INDEX FROM testto check whether the index is created.NoteYou can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Set the index state
Specify columns and some column properties, and set the search index state to ACTIVE.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,numShards=4);Verifying the results
Run SHOW INDEX FROM test to check whether the index is created.
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Set custom index properties
Create a search index, set the search index state to ACTIVE, and use custom settings to specify the index compression method as ZSTD and the refresh interval as 10 seconds.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,INDEX_SETTINGS='{
"index": {
"codec": "zstd",
"refresh_interval": "10s"
}
}');Verify the result
Run SHOW INDEX FROM test to check whether the index is created.
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Set time partitions
Partition by the time column c4, starting from 30 days ago, with automatic partitioning every 7 days. By default, partition data for 90 days is retained.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test (c1, c2, c3, c4)
PARTITION BY RANGE TIME(c4) PARTITIONS 16
WITH (indexState=ACTIVE, RANGE_TIME_PARTITION_START='30', RANGE_TIME_PARTITION_INTERVAL='7', RANGE_TIME_PARTITION_TTL='90', RANGE_TIME_PARTITION_MAX_OVERLAP='90');Verification
Run SHOW INDEX FROM test to check whether the index is created.
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Set storage for raw data
Search indexes are used only for data filtering and do not store the raw data of index columns by default. To directly access the search engine when you use a search index, you can set the index to store all or part of the raw data of the index columns.
Set storage for the raw data of all index columns
Create a search index that includes columns
c1,c2,c3, andc4, and store the raw data of columnsc1,c2,c3, andc4.CREATE INDEX idx2 USING SEARCH ON test(c1, c2, c3, c4) WITH (SOURCE_SETTINGS=' { "enabled": true } ');Verify the results
Run
SHOW INDEX FROM testto check whether the index is created.NoteYou can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Set storage for the raw data of some index columns
Create a search index that includes columns
c1,c2,c3, andc4, and store the raw data of columnsc2,c3, andc4.CREATE INDEX idx2 USING SEARCH ON test(c1, c2, c3, c4) WITH (SOURCE_SETTINGS=' { "includes": ["c*"], "excludes": ["c1"] } ');Verifying the result
Run
SHOW INDEX FROM testto check whether the index is created.NoteYou can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.