Create an index - Lindorm - Alibaba Cloud Documentation Center

Lindorm provides two efficient and easy-to-use indexes: secondary indexes and search indexes. These indexes are suitable for different scenarios, such as non-primary key matching, multi-dimensional queries, . You can use the CREATE INDEX syntax to specify the index type and add index properties.

Engine and version

The CREATE INDEX syntax applies only to LindormTable. No version limits apply.

Important

To create a search index or a columnstore index using the CREATE INDEX syntax, you must use Lindorm SQL V2.6.1 or later. To view the version of Lindorm SQL, see SQL version guide.

Important notes

Index building for secondary indexes, columnstore indexes, and search indexes requires data lookups, which generate read operations. If you enabled the hot and cold data separation feature for your instance, monitor the throttling of cold storage (storage-optimized cloud storage). Throttled read operations on cold storage directly affect the efficiency of index building. This can cause backpressure on write operations.

Syntax

create_index_statement   ::=  CREATE INDEX [IF NOT EXISTS] [ index_identifier ]
                              [ USING index_method_definition ]
                              ON table_identifier '(' index_key_expression ')'
                              [ INCLUDE include_expression]
                              [ PARTITION BY partition_definition ]
                              [ { ASYNC | SYNC} ]
                              [ WITH '(' index_options  ')' ]
index_method_definition  ::=  { KV | SEARCH | COLUMNAR }
index_key_expression     ::=  '('
                                  index_key_definition 
                                  |wildcard_string_literal                                 
                              ')'

index_key_definition     ::= {
                                  column_identifier [ DESC ]
                                  | column_identifier'(' column_options ')'
                                  | function_expression
                             }
                             [ (',' index_key_definition)* ]
column_options    ::=  '(' 
                           option_definition (',' option_definition )* 
                       ')'
function_expression      ::=  function_identifier 
                              '(' 
                                  column_identifer ( ',' column_identifer )* 
                              ')'
option_definition        ::=  option_identifer '=' string_literal
include_expression       ::=  '(' 
                                  column_identifier( ',' column_identifier )*
                              ')'
partition_definition     ::= {
                                {RANGE TIME} 
                                     '(' 
                                          column_identifer 
                                     ')' [ PARTITIONS number_literal ]
                                |
                                HASH '(' 
                                          column_identifer 
                                          ( ',' column_identifer )*
                                     ')' [ PARTITIONS number_literal ]
                                |
                                ENUMERABLE 
                                     '(' 
                                          column_identifer 
                                          ( ',' column_identifer )*
index_options            ::=  '(' 
                                  option_definition (',' option_definition )* 
                              ')'

Differences

LindormTable supports two types of indexes that can be created with the CREATE INDEX statement: secondary indexes and search indexes.

The supported syntax elements vary by index type. The following table describes the differences.

Syntax element	Secondary index	Search index
Index type (index_method_definition)	✔	0
Index key expression (index_key_expression)	✔	0
Included columns (include_expression)	0	✖
Index partition (partition_definition)	✖	0
Index building mode (ASYNC\|SYNC) Important Only LindormTable 2.6.3 and later support the `SYNC` building mode.	0	0
Index properties (index_options)	0	0

Usage notes

You can create a maximum of 3 secondary indexes, 1 search index, for each wide table.

Index type (index_method_definition)

When you create an index, you can use the USING keyword to specify the index type. LindormTable supports the following three index types:

Parameter

Index type

Description

Secondary index

If you do not explicitly specify an index type in the CREATE INDEX statement, a secondary index is created by default. For more information about secondary indexes, see Secondary indexes.

Important

An instance can have a maximum of eight concurrent secondary index building tasks. If you try to create a secondary index when eight tasks are already running, the statement fails.

Search index

A full-text search index based on a search engine. It is mainly used for complex multi-dimensional query scenarios, such as tokenization, fuzzy queries, aggregation and analysis, and sorting with pagination. For more information, see Introduction to search indexes.

Search indexes support all basic data types except DATE, TIME, and DECIMAL. For more information about data types, see Basic data types.

Important

Before creating a search index, enable the search index feature. Fees are charged for search nodes and Lindorm Tunnel Service (LTS) nodes. To enable this feature, see Enable search indexes.
The index key of a search index must contain a non-primary key column.
Lindorm instances have no limit on the number of search index building tasks.

Index key expression (index_key_expression)

You can define one or more columns as index keys. An index with multiple index keys is called a composite index.

Index key definition (index_key_definition)

If the index type is a search index, you can add properties to the index keys. If the index type is a secondary index, you can specify a function expression as the index key.

Search index key properties (option_definition)

When you add an index column using the ALTER INDEX statement, you can also specify properties for the index key. For example, c3(type=text,analyzer=ik) creates an index on column c3 and specifies that the ik tokenizer is used for this column.

The following table describes the index key properties supported by search indexes.

Index key property	Type	Description
indexed	STRING	Specifies whether to create an index for the specified column in the index key. An inverted index is created for string type columns. A numeric index, such as a BKD-Tree index, is created for numeric type fields. true: (Default) Yes. false: An index is not created.
rowStored	STRING	Specifies whether to store the raw data. true: Stores the raw data. false: The default value. The raw data is not stored.
columnStored	STRING	Specifies whether to use column store to accelerate sorting and analysis. true: The default value. Column store is used. false: Column store is not used.
type	STRING	For tokenization scenarios, set the type parameter to text for the tokenized field. In other cases, the data type is the same as the wide table data type by default. Important The type parameter must be used with the analyzer parameter.
analyzer	STRING	Specifies the list of tokenizers. Valid values: standard english ik whitespace comma Important The analyzer parameter must be used with the type parameter.
mapping	STRING	Custom index key properties, represented as a JSON object in a string. This is compatible with the Elasticsearch index key property syntax. Important The mapping parameter applies only to Elasticsearch-compatible search engine versions. If the mapping parameter is used, all other parameters for this index key become invalid.

Secondary index function expression (function_expression)

When you create a secondary index, you can specify a function expression as the index key. Five function expressions are supported: Z-ORDER, S2, CAST, MD5, and SHA256.
Important
Only LindormTable 2.6.7.5 and later supports the use of MD5 or SHA256 function expressions as index keys. If you cannot upgrade your LindormTable engine in the console, contact Lindorm technical support on DingTalk at s0s3eg3.
- Z-ORDER function: Creates a spatio-temporal secondary index for one or more columns of a spatio-temporal data type. The syntax is as follows:
```
Z-ORDER '(' column_identifier ( ',' column_identifer )*  ')'
```
  The data type of column_identifier must be a spatio-temporal data type. For more information about spatio-temporal indexes, see Spatio-temporal indexes.
- CAST function: Creates an index on the result of a data type conversion for a column. The syntax is as follows:
```
CAST(column_identifier AS type)
```
  type is a data type. For more information, see Basic data types.
- S2 function: Creates an S2 grid secondary index for a column of a spatio-temporal data type. The syntax is as follows:
```
S2 '(' column_identifier, level ')'
```
  The data type of column_identifier must be POLYGON or MULTIPOLYGON. The value of level ranges from 1 to 30. For more information, see S2 index function.
- MD5 function: Creates an index on the MD5 encoded value of a VARCHAR column. The syntax is as follows:
```
MD5(column_identifier)
```
  For more information about the MD5 function, see MD5 function.
- SHA256 function: Creates an index on the SHA256 encoded value of a VARCHAR column. The syntax is as follows:
```
SHA256(column_identifier)
```
  For more information about the SHA256 function, see SHA256 function.

Wildcard constant (wildcard_string_literal)

Only search indexes support the wildcard constant (*).

The wildcard constant (*) is used to build an index on all columns. For example, CREATE INDEX IF NOT EXISTS idx5 USING SEARCH ON test(*).

Important

After the statement is executed, newly added columns are not automatically added to the search index or columnstore index. You must add them manually using the ALTER INDEX statement.
Dynamic columns are not included in the index. For more information, see Dynamic columns.

Included columns (include_expression)

Included columns, also known as a covering index, are columns from the primary table that are included in the index table. This feature avoids the need to look up the primary table after a query hits the index, which improves query performance.

Note

For secondary indexes, you can use the WITH keyword to add table properties to include dynamic columns. For more information, see Secondary indexes.

Index partition (partition_definition)

Only search indexes support index partitioning.

Index partitioning is an index management policy. The server-side automatically splits and stores data. When you query data, the system automatically performs partition pruning.

Search indexes support RANGE partitioning and HASH partitioning. For more information, see Partitioned indexes.

Index building mode (ASYNC|SYNC)

When you use the CREATE INDEX statement to create an index, you can specify the index building mode using the ASYNC or SYNC keyword.

ASYNC: Creates the index asynchronously. After the CREATE INDEX statement is executed, an index building task starts, and the statement returns immediately, regardless of whether the index is successfully built.
SYNC: Creates the index synchronously. After the CREATE INDEX statement is executed, an index building task starts immediately. The CREATE INDEX statement returns a result only after the index building task is complete.

The following table describes the building modes supported by the two index types.

Index building mode	Secondary index	Search index
ASYNC Important Starting from LindormTable 2.6.1, the default index building mode for the `CREATE INDEX` statement is asynchronous.	0	0
SYNC Important Only LindormTable 2.6.3 and later support synchronous index building.	0	✔

Index properties (index_options)

When you create an index using the CREATE INDEX statement, you can use the WITH keyword to specify index properties. The following index properties are supported.

Secondary index

Property	Type	Description
COMPRESSION	STRING	The compression algorithm for the index table. Supported algorithms include the following: SNAPPY ZSTD LZ4
INDEX_COVERED_TYPE	STRING	The following values are available for the index redundancy method: COVERED_ALL_COLUMNS_IN_SCHEMA: Includes all predefined non-primary key columns in the table schema. COVERED_DYNAMIC_COLUMNS: Includes all predefined non-primary key columns and dynamic columns in the table schema. Note If you specify INDEX_COVERED_TYPE, you do not need to specify the INCLUDE clause in the statement. Before including dynamic columns, make sure that the dynamic column feature is enabled. For more information, see Dynamic columns. You can also use the `INCLUDE` keyword to include specified columns.
STARTKEY	STRING	The start key of the index table. Important You cannot set a start key for a timestamp column or a column of a spatial data type.
ENDKEY	STRING	The end key of the index table. Important You cannot set an end key for a timestamp column or a column of a spatial data type.
NUMREGIONS	INTEGER	The number of pre-partitions for the index table. Important You cannot set the number of pre-partitions for a timestamp column or a column of a spatial data type.

Search index

Property	Type	Description
indexState	STRING	The state of the search index. Valid values: ACTIVE: The index is available. INACTIVE: The index is unavailable. DISABLED: The index is disabled.
numShards	INTEGER	Specifies the number of shards. The default value is twice the number of search nodes. Important The data volume of a single shard is recommended to be between 30 million and 100 million rows. The storage size is recommended to be between 30 GB and 50 GB. If the data volume of a shard is very large, the read and write performance of the search index may decrease. In addition, a single shard has a limited data capacity. Exceeding 2 billion rows affects system stability. Therefore, plan the number of index shards before you create a search index for production. If your data contains a time attribute field and the data volume increases significantly over time (such as order data or log data), we recommend that you create a time-partitioned index.
RANGE_TIME_PARTITION_START	INTEGER	The number of days before the index creation operation to start creating partitions. This applies to scenarios with historical data. If the timestamp of historical data is earlier than the partition start time, an error occurs. Note This parameter is required when you create a partitioned index.
RANGE_TIME_PARTITION_INTERVAL	INTEGER	The interval in days for creating new partitions. For example, `RANGE_TIME_PARTITION_INTERVAL='7'` indicates that a new partition is created every week. Note This parameter is required when you create a partitioned index.
RANGE_TIME_PARTITION_TTL	INTEGER	The number of days to retain partition data. For example, `RANGE_TIME_PARTITION_TTL='180'` indicates that partition data from the last six months is retained. Historical partitions are automatically cleared. If this parameter is not specified, data is never cleared. Note This parameter is required when you create a partitioned index.
RANGE_TIME_PARTITION_MAX_OVERLAP	INTEGER	If the data being written has a future timestamp, this parameter specifies the maximum allowed time interval from the current time, in days. If not specified, data from one day in the future is allowed by default.
RANGE_TIME_PARTITION_FIELD_TIMEUNIT	LONG	The unit of the time partition field specified by the business. The default unit is milliseconds (ms). If the unit of the partition field is set to seconds (s), the number must be 10 digits long. If the unit of the partition field is set to milliseconds (ms), the number must be 13 digits long.
RANGE_TIME_PARTITION_CHS	INTEGER	The boundary for hot and cold data. Specifies the age of data to be transferred to cold storage. The default unit is seconds (s). For example, `RANGE_TIME_PARTITION_CHS='864000'` indicates that data older than 10 days is archived to cold storage. Note If this parameter is not set, hot and cold data separation is not enabled. Data is stored only in hot storage by default.
INDEX_SETTINGS	STRING	Custom index properties, represented as a JSON object in a string. This is compatible with the Elasticsearch index settings syntax. Important The INDEX_SETTINGS parameter applies only to Elasticsearch-compatible versions.
SOURCE_SETTINGS	STRING	Properties for the raw data storage policy of search index columns. This is a JSON object in a string and is compatible with the _source settings syntax of Elasticsearch. By default, a search index does not store the raw data of index columns. The following parameters are supported: `enabled`: The value is a Boolean type. `true` indicates that the raw data of all index columns is stored. `false` indicates that the raw data of all index columns is not stored. Important The `enabled` parameter cannot be used with the `includes` or `excludes` parameters. `includes`: The value is a string array. Specifies a list of index column names for which to store raw data. You can use the `` wildcard character to match multiple columns. `excludes`: The value is a string array. Specifies a list of index column names for which not to store raw data. You can use the `` wildcard character to match multiple columns. Note Configure the SOURCE_SETTINGS parameter only when you need to perform data queries through the search engine visualization user interface. You do not need to configure this parameter for regular use cases.

Examples

Assume that the primary table `test` is created using the following statement:

CREATE TABLE test (
  p1 VARCHAR NOT NULL,
  p2 INTEGER NOT NULL,
  c1 BIGINT,
  c2 DOUBLE,
  c3 VARCHAR,
  c4 TIMESTAMP,
  c5 GEOMETRY(POINT),
  PRIMARY KEY(p1, p2)
) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');

Secondary index

Asynchronously build an index

If you do not specify a building mode, the index is built asynchronously by default.

CREATE INDEX idx1 ON test(c1 desc) include(c3,c4) WITH (COMPRESSION='ZSTD');

Result Verification

Run SHOW INDEX FROM test to check whether the index is created.

Create a composite index

Create a composite index on columns c1, c2, and c3 synchronously.

CREATE INDEX idx1 ON test(c1, c2, c3) include(c4) SYNC WITH ( COMPRESSION ='ZSTD');

Verifying the results

Run SHOW INDEX FROM test to check whether the index is created.

Create a spatio-temporal secondary index

The following example shows how to create a spatio-temporal secondary index. For more information, see Spatio-temporal indexes.

CREATE INDEX idx ON roads (Z-ORDER(g1));
CREATE INDEX idt ON roads (Z-ORDER(g1,t));

Include all columns

CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE ='COVERED_ALL_COLUMNS_IN_SCHEMA');

Verifying the results

Run SHOW INDEX FROM test to check whether the index is created.

Include all dynamic columns

CREATE INDEX idx1 ON test(c4 desc) WITH (INDEX_COVERED_TYPE='COVERED_DYNAMIC_COLUMNS');

Verify the result

Run SHOW INDEX FROM test to check whether the index is created.

Set the number of pre-partitions for the index table

Set the number of pre-partitions for the index table to 32.

CREATE INDEX idx1 ON test(c4 desc) include(c5,c6)  WITH (NUMREGIONS ='32');

Verify the result

Run SHOW INDEX FROM test to check whether the index is created.

Specify the start and end keys for the index table

When you create a secondary index, you can specify the start and end keys and the number of pre-partitions for the index table. This divides the index table into 32 pre-partitions between 11111111 and 9999999.

CREATE INDEX idx1 ON test(c3 desc) include(c5,c6) WITH (NUMREGIONS ='32', STARTKEY ='11111111', ENDKEY = '9999999');

Verify the results

Run SHOW INDEX FROM test to check whether the index is created.

Create a Z-ORDER secondary index

Create a Z-ORDER secondary index for the polygon data type column c5.

CREATE INDEX idx1 ON test(Z-ORDER(c5));

Verifying the result

Run SHOW INDEX FROM test to check whether the index is created.

Create a grid secondary index

Create a grid secondary index for the polygon type column c5. Currently, only the asynchronous creation mode is supported.
```
CREATE INDEX idx1 ON test(S2(c5, 10));
```
Build the grid secondary index.
```
BUILD INDEX s2_idx ON test;
```

Verifying the results

Run SHOW INDEX FROM test to check whether the index is created.

Convert the data type of a specified column

Convert the data type of column c3 to INTEGER and then create a secondary index.

CREATE INDEX idx1 ON test(CAST(c3 AS INTEGER));

Verifying the result

Run SHOW INDEX FROM test to check whether the index is created.

Search index

Asynchronously build an index

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(p1, p2, c1, c2, c3);

Verifying the results

Run SHOW INDEX FROM test to check whether the index is created.

Note

You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.

Create an index for all columns

Create a search index for all columns. If no specific column properties are specified, the default values are used.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*');

Verify the result

Run SHOW INDEX FROM test to check whether the index is created.

Note

You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.

Add index key properties

Add non-custom index key properties
Create a search index for all columns. If the index includes column c3, set its properties to type=text, analyzer=ik, and indexed=true.
```
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(type=text,analyzer=ik,indexed=true));
```
Verify the result
Run SHOW INDEX FROM test to check whether the index is created.
Note
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Add custom index key properties
Assume that the table schema is as follows:
```
CREATE TABLE test (
  p1 VARCHAR NOT NULL,
  p2 INTEGER NOT NULL,
  c1 BIGINT,
  c2 DOUBLE,
  c3 VARCHAR,
  c4 TIMESTAMP,
  PRIMARY KEY(p1, p2)
) WITH (CONSISTENCY = 'strong', MUTABILITY='MUTABLE_LATEST');
```
Create a search index for all columns, where the index column c3 is specified as type text and the tokenizer is ik_max_word.
```
CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test('*',c3(mapping='{
  "type": "text",
  "analyzer": "ik_max_word"
}'));
```
Important
- The mapping parameter applies only to Elasticsearch-compatible versions.
- If the mapping parameter is used, all other parameters for this index key are invalid.
Verifying the result
Run SHOW INDEX FROM test to check whether the index is created.
Note
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.

Set the index state

Specify columns and some column properties, and set the search index state to ACTIVE.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,numShards=4);

Verifying the results

Run SHOW INDEX FROM test to check whether the index is created.

Note

You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.

Set custom index properties

Create a search index, set the search index state to ACTIVE, and use custom settings to specify the index compression method as ZSTD and the refresh interval as 10 seconds.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test(c1, c3(type=text,analyzer=ik)) WITH (indexState=ACTIVE,INDEX_SETTINGS='{
  "index": {
    "codec": "zstd",
    "refresh_interval": "10s"
   }
}');

Verify the result

Run SHOW INDEX FROM test to check whether the index is created.

Note

You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.

Set time partitions

Partition by the time column c4, starting from 30 days ago, with automatic partitioning every 7 days. By default, partition data for 90 days is retained.

CREATE INDEX IF NOT EXISTS idx2 USING SEARCH ON test (c1, c2, c3, c4)
PARTITION BY RANGE TIME(c4) PARTITIONS 16
WITH (indexState=ACTIVE, RANGE_TIME_PARTITION_START='30', RANGE_TIME_PARTITION_INTERVAL='7', RANGE_TIME_PARTITION_TTL='90', RANGE_TIME_PARTITION_MAX_OVERLAP='90');

Verification

Run SHOW INDEX FROM test to check whether the index is created.

Note

You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.

Set storage for raw data

Search indexes are used only for data filtering and do not store the raw data of index columns by default. To directly access the search engine when you use a search index, you can set the index to store all or part of the raw data of the index columns.

Set storage for the raw data of all index columns
Create a search index that includes columns c1, c2, c3, and c4, and store the raw data of columns c1, c2, c3, and c4.
```
CREATE INDEX idx2 USING SEARCH ON test(c1, c2, c3, c4) WITH (SOURCE_SETTINGS='
{
  "enabled": true
}  
');
```
Verify the results
Run SHOW INDEX FROM test to check whether the index is created.
Note
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.
Set storage for the raw data of some index columns
Create a search index that includes columns c1, c2, c3, and c4, and store the raw data of columns c2, c3, and c4.
```
CREATE INDEX idx2 USING SEARCH ON test(c1, c2, c3, c4) WITH (SOURCE_SETTINGS='
{
  "includes": ["c*"],
  "excludes": ["c1"]
}  
');
```
Verifying the result
Run SHOW INDEX FROM test to check whether the index is created.
Note
You can also view the index building progress in the Lindorm console. For more information, see View the full build progress of a search index.