Lindorm provides two efficient and easy-to-use indexes: secondary indexes and search indexes. These indexes are used for non-primary key matching, multi-dimensional queries. The CREATE INDEX statement lets you specify the index type and add index properties.
Engine and version
The CREATE INDEX syntax applies only to LindormTable. No version limits apply to the base syntax.
To use the CREATE INDEX syntax to create a search index or a columnstore index, you must use Lindorm SQL 2.6.1 or later. For more information about how to check the version of Lindorm SQL, see SQL version guide.
Precautions
Index building for secondary indexes, columnstore indexes, and search indexes requires reading data from the primary table, which generates read operations. If you enabled the hot and cold data separation feature for your instance, you must monitor the throttling of cold storage (storage-optimized cloud storage). If read operations on cold storage are throttled, the efficiency of index building is affected and may cause backpressure on write operations.
Syntax
create_index_statement ::= CREATE INDEX [IF NOT EXISTS] [ index_identifier ]
[ USING index_method_definition ]
ON table_identifier '(' index_key_expression ')'
[ INCLUDE include_expression]
[ PARTITION BY partition_definition ]
[ { ASYNC | SYNC} ]
[ WITH '(' index_options ')' ]
index_method_definition ::= { KV | SEARCH | COLUMNAR }
index_key_expression ::= '('
index_key_definition
|wildcard_string_literal
')'
index_key_definition ::= {
column_identifier [ DESC ]
| column_identifier'(' column_options ')'
| function_expression
}
[ (',' index_key_definition)* ]
column_options ::= '('
option_definition (',' option_definition )*
')'
function_expression ::= function_identifier
'('
column_identifer ( ',' column_identifer )*
')'
option_definition ::= option_identifer '=' string_literal
include_expression ::= '('
column_identifier( ',' column_identifier )*
')'
partition_definition ::= {
{RANGE TIME}
'('
column_identifer
')' [ PARTITIONS number_literal ]
|
HASH '('
column_identifer
( ',' column_identifer )*
')' [ PARTITIONS number_literal ]
|
ENUMERABLE
'('
column_identifer
( ',' column_identifer )*
index_options ::= '('
option_definition (',' option_definition )*
')'Differences
LindormTable supports creating two types of indexes using the CREATE INDEX statement: secondary indexes and search indexes.
Different indexes support different syntax elements, as shown in the following table:
Syntax element | Secondary index | Search index |
〇 | 〇 | |
〇 | 〇 | |
〇 | ✖️ | |
✖️ | 〇 | |
Index building mode (ASYNC|SYNC) Important The | 〇 | 〇 |
〇 | 〇 |
Usage notes
You can create a maximum of 3 secondary indexes, 1 search index for each wide table.
Index type (index_method_definition)
When you create an index, you can use the USING keyword to specify the index type. LindormTable supports the following three types of indexes:
Parameter | Index type | Description |
KV | Secondary index | When the Important An instance can have a maximum of eight secondary index building tasks running at the same time. If eight tasks are already running, the statement to create another secondary index fails. |
SEARCH | Search index | A full-text search index that is based on a search engine. It is mainly used for complex multi-dimensional queries and supports scenarios such as tokenization, fuzzy queries, aggregate analytics, sorting, and pagination. For more information, see Introduction to search indexes. Search indexes support all basic data types except DATE, TIME, and DECIMAL. For more information about data types, see Basic data types. Important
|
Index key expression (index_key_expression)
You can define one or more columns as index keys. An index that consists of multiple index keys is also called a composite index.
Index key definition (index_key_definition)
If the index type is search index, you can add properties to the index keys. If the index type is secondary index, you can specify the index key as a function expression.
Search index key properties (option_definition)
You can specify properties for an index key when you create an index or when you use the ALTER INDEX statement to add an index column. For example, c3(type=text,analyzer=ik) creates an index for column c3 and specifies the ik tokenizer for this column.
The following table describes the index key properties supported in search indexes.
Index key property | Type | Description |
indexed | STRING | Specifies whether to create an index for the specified column in the index key. An inverted index is created for string type columns. A numeric index, such as a BKD-Tree index, is created for numeric type fields.
|
rowStored | STRING | Specifies whether to store the raw data.
|
columnStored | STRING | Specifies whether to enable column store to accelerate sorting and analytics.
|
type | STRING | In tokenization scenarios, you must set the type parameter to text for the tokenized field. In other cases, the type is the same as the data type in the wide table by default. Important The type parameter must be used with the analyzer parameter. |
analyzer | STRING | The list of tokenizers. Valid values:
Important The analyzer parameter must be used with the type parameter. |
mapping | STRING | Custom index key properties, which are a JSON object represented as a string. This parameter is compatible with the Elasticsearch index key property syntax. Important
|
Secondary index function expression (function_expression)
When you create a secondary index, you can specify the index key as a function expression. Five function expressions are supported: Z-ORDER, S2, CAST, MD5, and SHA256.
ImportantOnly LindormTable 2.6.7.5 and later supports specifying the index key as an MD5 or SHA256 function expression when you create an index. If you cannot upgrade your LindormTable engine in the console, contact Lindorm technical support (DingTalk ID: s0s3eg3).
Z-ORDER function: Creates a spatio-temporal secondary index for one or more columns that have spatio-temporal data types. The syntax is as follows:
Z-ORDER '(' column_identifier ( ',' column_identifer )* ')'The data type of column_identifier must be a spatio-temporal data type. For more information about spatio-temporal indexes, see Spatio-temporal indexes.
CAST function: Creates an index on the result of a data type conversion for a column. The syntax is as follows:
CAST(column_identifier AS type)type is the data type. For more information, see Basic data types.
S2 function: Creates an S2 grid secondary index for a spatio-temporal data type column. The syntax is as follows:
S2 '(' column_identifier, level ')'The data type of column_identifier must be POLYGON or MULTIPOLYGON. The value of level ranges from 1 to 30. For more information, see S2 index functions.
MD5 function: Creates an index for the MD5 encoded value of a VARCHAR column. The syntax is as follows:
MD5(column_identifier)For more information about the MD5 function, see MD5 function.
SHA256 function: Creates an index for the SHA256 encoded value of a VARCHAR column. The syntax is as follows:
SHA256(column_identifier)For more information about the SHA256 function, see SHA256 function.
Wildcard constant (wildcard_string_literal)
Only search indexes support the wildcard character (*).
The wildcard character (*) is used to build an index on all columns. For example, CREATE INDEX IF NOT EXISTS idx5 USING SEARCH ON test(*);.
After the statement is executed, columns that are added later are not automatically added to the search index or columnstore index. You must add them using the
ALTER INDEXstatement.Dynamic columns are not included in the index. For more information, see Dynamic columns.
Included columns (include_expression)
Included columns are columns from the primary table that are stored in the index table. This feature is also known as an included index or a covering index. It improves query performance by preventing the need to read the primary table after a query hits the index.
For secondary indexes, you can use the WITH keyword to add table properties to include dynamic columns. For more information, see Secondary indexes.
Index partition (partition_definition)
Only search indexes support index partitioning.
Index partitioning is an index management policy where the server automatically splits and stores data in separate partitions. When you query data, the system automatically performs partition pruning to improve query efficiency.
Search indexes support RANGE partitioning and HASH partitioning. For more information, see Partitioned indexes.
Index building mode (ASYNC|SYNC)
When you use the CREATE INDEX statement to create an index, you can use the ASYNC or SYNC keyword to specify the index building mode.
ASYNC: Builds the index asynchronously. After the
CREATE INDEXstatement is executed, the index building task starts immediately. TheCREATE INDEXstatement returns immediately, regardless of whether the index is successfully built.SYNC: Builds the index synchronously. After the
CREATE INDEXstatement is executed, the index building task starts immediately. TheCREATE INDEXstatement returns only after the index is built.
The following table describes the building modes supported by the two types of indexes.
Index building mode | Secondary index | Search index |
ASYNC Important Starting from LindormTable 2.6.1, the index is built asynchronously by default after you execute the | 〇 | 〇 |
SYNC Important Synchronous index building is supported only in LindormTable 2.6.3 and later. | 〇 | 〇 |
Index properties (index_options)
When you use the CREATE INDEX statement to create an index, you can use the WITH keyword to specify index properties. The supported index properties are as follows.
Secondary indexes
Property | Type | Description |
COMPRESSION | STRING | The compression algorithm for the index table. Supported algorithms include the following:
|
INDEX_COVERED_TYPE | STRING | The following values are valid for the index redundancy method:
You can also use the |
STARTKEY | STRING | The start key of the index table. Important You cannot set a start key for a timestamp column or a column of a spatio-temporal data type. |
ENDKEY | STRING | The end key of the index table. Important You cannot set an end key for a timestamp column or a column of a spatio-temporal data type. |
NUMREGIONS | INTEGER | The number of pre-partitions for the index table. Important You cannot set the number of pre-partitions for a timestamp column or a column of a spatio-temporal data type. |
Search indexes
Property | Type | Description |
indexState | STRING | The state of the search index. Valid values:
|
numShards | INTEGER | Specifies the number of shards. The default value is twice the number of search nodes. Important We recommend that the data volume of a single shard be controlled between 30 million and 100 million rows, and the storage size be controlled between 30 GB and 50 GB. If the data volume of a shard is very large, the read and write performance of the search index may decrease. In addition, a single shard can carry a limited amount of data. If the data exceeds 2 billion rows, system stability is affected. Therefore, we recommend that you plan the number of index shards before you create a search index for production. If your data contains a time attribute field and the data volume increases significantly over time, such as order data or log data, we recommend that you create a time-partitioned index. |
RANGE_TIME_PARTITION_START | INTEGER | Specifies the number of days before the index creation operation to start creating partitions. This applies to scenarios with historical data. If the timestamp of historical data is earlier than the start time of partitioning, an error is reported. Note You must specify this parameter when you create a partitioned index. |
RANGE_TIME_PARTITION_INTERVAL | INTEGER | Specifies the interval in days at which to create a new partition. For example, Note You must specify this parameter when you create a partitioned index. |
RANGE_TIME_PARTITION_TTL | INTEGER | Specifies the number of days to retain partition data. For example, Note You must specify this parameter when you create a partitioned index. |
RANGE_TIME_PARTITION_MAX_OVERLAP | INTEGER | If the data to be written has a future timestamp, this parameter specifies the maximum allowed time interval from the current time, in days. If this parameter is not specified, data for the next day is allowed to be written by default. |
RANGE_TIME_PARTITION_FIELD_TIMEUNIT | LONG | Specifies the unit of the time partition field. The default unit is milliseconds (ms).
|
RANGE_TIME_PARTITION_CHS | INTEGER | The boundary for hot and cold data. Specifies how long ago data should be transferred to cold storage. The default unit is seconds (s). For example, Note If this parameter is not set, hot and cold data separation is not enabled. Data is stored in hot storage by default. |
INDEX_SETTINGS | STRING | Custom index properties, which are a JSON object represented as a string. This parameter is compatible with the Elasticsearch index settings syntax. Important The INDEX_SETTINGS parameter applies only to Elasticsearch-compatible search engines. |
SOURCE_SETTINGS | STRING | The properties of the raw data storage policy for search index columns. It is a JSON object represented as a string and is compatible with the Elasticsearch _source settings syntax. By default, a search index does not save the raw data of index columns. The following parameters are supported:
Important The
Note Configure the SOURCE_SETTINGS parameter only when you need to perform data queries from the search engine visualization user interface. You do not need to configure this parameter in regular use cases. |
Examples
Assume that the following statement is used to create the primary table `test`:
CREATE TABLE test (
p1 VARCHAR NOT NULL,
p2 INTEGER NOT NULL,
c1 BIGINT,
c2 DOUBLE,
c3 VARCHAR,
c4 TIMESTAMP,
c5 GEOMETRY(POINT),
PRIMARY KEY(p1, p2)
) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');Secondary indexes
Build an index asynchronously
If you do not specify a building mode, the index is built asynchronously by default.
CREATE INDEX idx1 ON test(c1 desc) include(c3,c4) WITH (COMPRESSION='ZSTD');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Create a composite index
Synchronously create a composite index for columns c1, c2, and c3.
CREATE INDEX idx1 ON test(c1, c2, c3) include(c4) SYNC WITH ( COMPRESSION ='ZSTD');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Include all columns
CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE ='COVERED_ALL_COLUMNS_IN_SCHEMA');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Include all dynamic columns
CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE='COVERED_DYNAMIC_COLUMNS');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Set the number of pre-partitions for the index table
Set the number of pre-partitions for the index table to 32.
CREATE INDEX idx1 ON test(c4 desc) include(c5,c6) WITH (NUMREGIONS ='32');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Specify the start and end keys for the index table
You can specify the start and end keys and the number of pre-partitions for the index table when you create a secondary index. This example splits the index table into 32 pre-partitions between 11111111 and 9999999.
CREATE INDEX idx1 ON test(c3 desc) include(c5,c6) WITH (NUMREGIONS ='32', STARTKEY ='11111111', ENDKEY = '9999999');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Create a Z-ORDER secondary index
Create a Z-ORDER secondary index for the polygon data type column c5.
CREATE INDEX idx1 ON test(Z-ORDER(c5));Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Create a grid secondary index
Create a grid secondary index for the polygon type column c5. Currently, only asynchronous creation mode is supported.
CREATE INDEX idx1 ON test(S2(c5, 10));Build the grid secondary index.
BUILD INDEX s2_idx ON test;
Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Convert the data type of a specified column
Convert the data type of column c3 to INTEGER and then create a secondary index.
CREATE INDEX idx1 ON test(CAST(c3 AS INTEGER));Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
Search indexes
Build an index asynchronously
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(p1, p2, c1, c2, c3);Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
You can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Create an index for all columns
Create a search index for all columns. If you do not specify column properties, the default values are used.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
You can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Add index key properties
Add non-custom index key properties
Create a search index for all columns. For the index column c3, the properties are set to type=text, analyzer=ik, and indexed=true.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(type=text,analyzer=ik,indexed=true));Verify the result
You can execute
SHOW INDEX FROM test;to check whether the index is created.NoteYou can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Add custom index key properties
Assume the table schema is as follows:
CREATE TABLE test ( p1 VARCHAR NOT NULL, p2 INTEGER NOT NULL, c1 BIGINT, c2 DOUBLE, c3 VARCHAR, c4 TIMESTAMP, PRIMARY KEY(p1, p2) ) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');Create a search index for all columns. For index column c3, specify the type as text and the tokenizer as ik_max_word.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(mapping='{ "type": "text", "analyzer": "ik_max_word" }'));ImportantThe mapping parameter applies only to Elasticsearch-compatible search engines.
If you use the mapping parameter, all other parameters for this index key become invalid.
Verify the result
You can execute
SHOW INDEX FROM test;to check whether the index is created.NoteYou can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Set the index state
Specify columns, specify some column properties, and set the search index state to ACTIVE.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,numShards=4);Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
You can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Set custom index properties
Create a search index, set the search index state to ACTIVE, and use custom settings to specify the index compression method as ZSTD and the refresh interval as 10 seconds.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,INDEX_SETTINGS='{
"index": {
"codec": "zstd",
"refresh_interval": "10s"
}
}');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
You can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Set a time partition
Partition the index by the time column c4. This example starts partitioning from 30 days ago, creates a new partition every 7 days, and retains partition data for the default period of 90 days.
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test (c1, c2, c3, c4)
PARTITION BY RANGE TIME(c4) PARTITIONS 16
WITH (indexState=ACTIVE, RANGE_TIME_PARTITION_START='30', RANGE_TIME_PARTITION_INTERVAL='7', RANGE_TIME_PARTITION_TTL='90', RANGE_TIME_PARTITION_MAX_OVERLAP='90');Verify the result
You can execute SHOW INDEX FROM test; to check whether the index is created.
You can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Set raw data storage
By default, a search index is used only for data filtering and does not store the raw data of index columns. To retrieve raw data directly from the search engine without accessing the primary table, you can configure the index to store all or part of the raw data of the index columns.
Store the raw data of all index columns
You can create a search index for the
c1,c2,c3, andc4columns, and store the raw data of thec1,c2,c3, andc4columns.CREATE INDEX idx2 USING SEARCH ON test(c1, c2, c3, c4) WITH (SOURCE_SETTINGS=' { "enabled": true } ');Verify the result
You can execute
SHOW INDEX FROM test;to check whether the index is created.NoteYou can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.
Store the raw data of some index columns
Create a search index that contains columns
c1,c2,c3, andc4, and store the raw data of columnsc2,c3, andc4.CREATE INDEX idx2 USING SEARCH ON test(c1, c2, c3, c4) WITH (SOURCE_SETTINGS=' { "includes": ["c*"], "excludes": ["c1"] } ');Verify the result
You can execute
SHOW INDEX FROM test;to check whether the index is created.NoteYou can also view the index building progress in the LTS console. For more information, see View the progress of a full build for a search index.