Use the CreateSearchIndex method to create a search index for a data table. A data table can have multiple search indexes. When you create a search index, add the fields that you want to query to the index. You can also configure advanced options, such as custom routing keys and pre-sorting.
Prerequisites
- The Tablestore client is initialized. For more information, see Initialize Tablestore Client.
- A data table is created and meets the following conditions:
  - The maximum number of versions is set to 1.
  - The time to live (TTL) is set to -1, or the table has updates disabled.
Usage notes
- When you create a search index, the data type of each field in the search index must match the data type of the corresponding field in the data table.
- To set a specific TTL for a search index (a value other than -1), you must disable the UpdateRow write feature for the data table. The TTL of the search index must be less than or equal to the TTL of the data table. For more information, see Lifecycle management.
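The TTL rules above can be expressed as a small pre-flight check. The following is a minimal sketch in plain Python and is not part of the Tablestore SDK: the function name `validate_index_ttl` and its parameters are hypothetical, and it assumes that TTL values are given in seconds and that -1 means the data never expires.

```python
def validate_index_ttl(index_ttl, table_ttl, table_allows_update):
    """Return a list of problems with a proposed search index TTL.

    Hypothetical helper, not an SDK call. -1 is assumed to mean "never expires".
    """
    problems = []
    if index_ttl == -1:
        # The rules in this section apply only to a specific TTL.
        return problems
    if table_allows_update:
        problems.append("disable UpdateRow writes on the data table first")
    if table_ttl != -1 and index_ttl > table_ttl:
        problems.append("index TTL must be <= data table TTL")
    return problems


# A one-hour index TTL on a table with a one-day TTL and updates disabled.
print(validate_index_ttl(3600, 86400, table_allows_update=False))
```

An empty list means that the proposed TTL satisfies both rules under these assumptions.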
Parameters
When you create a search index, you must specify the table name (table_name), the search index name (index_name), and the index schema (schema). The schema includes field schemas (field_schemas), index settings (index_setting), and index pre-sorting settings (index_sort). The following table lists these parameters.
| Parameter | Description |
| --- | --- |
| table_name | The name of the data table. |
| index_name | The name of the search index. |
| field_schemas | A list of field_schema objects, one for each field in the index. As the examples in this topic show, a field_schema specifies the field name, the field type, and options such as index, enable_sort_and_agg, analyzer, analyzer_parameter, is_array, sub_field_schemas, enable_highlighting, and vector_options. |
| index_setting | The index settings, which include the routing_fields setting. routing_fields (Optional): Custom routing fields. You can select some primary key columns as routing fields. Typically, you only need to set one. If you set multiple routing fields, the system concatenates their values into a single value. When index data is written, the system determines the data partition based on the values of the routing fields. Records with the same routing field values are distributed to the same data partition. |
| index_sort | The index pre-sorting settings, which include the sorters setting. If you do not configure this parameter, data is sorted by primary key by default. Note: Search indexes that contain Nested fields do not support index pre-sorting (index_sort). sorters (Required): The pre-sorting method for the index. Supported methods: sorting by primary key and sorting by field value. For more information, see Sorting and pagination. |
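To make the routing behavior described for routing_fields concrete, the following sketch maps rows to partitions from their concatenated routing field values. It is illustrative only: the hash function, the separator, the partition count, and the name `pick_partition` are assumptions made for this example and do not describe how Tablestore computes partitions internally.

```python
import hashlib


def pick_partition(row, routing_fields, partition_count):
    """Illustrative only: map a row to a partition by its routing field values."""
    # Concatenate the routing field values into a single routing value.
    routing_value = "|".join(str(row[name]) for name in routing_fields)
    # Hash the routing value and reduce it to a partition number (assumed scheme).
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


rows = [
    {"PK1": "user_a", "PK2": 1},
    {"PK1": "user_a", "PK2": 2},
    {"PK1": "user_b", "PK2": 1},
]
partitions = [pick_partition(row, ["PK1"], 4) for row in rows]
print(partitions)
```

The first two rows share the same PK1 value, so they always land in the same partition. This is the property that routing fields provide.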
Examples
Specify analyzers when creating a search index
The following sample code creates a search index with analyzers configured. The search index contains six fields: k (Keyword), t (Text), g (Geopoint), ka (Keyword array), la (Long array), and n (Nested). The n field has three sub-fields: nk (Keyword), nl (Long), and nt (Text).
from tablestore import (
    AnalyzerType, FieldSchema, FieldType, FuzzyAnalyzerParameter, IndexSetting,
    PrimaryKeySort, SearchIndexMeta, Sort, SortOrder, SplitAnalyzerParameter,
)

def create_search_index(client):
    # A Keyword field. Create an index and enable sorting and statistical aggregation.
    field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True)
    # A Text field. Create an index and use single-word tokenization.
    field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.SINGLEWORD)
    # Alternative: a Text field that uses fuzzy tokenization.
    # field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.FUZZY, analyzer_parameter=FuzzyAnalyzerParameter(1, 6))
    # Alternative: a Text field that uses a custom separator (a comma) for tokenization.
    # field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.SPLIT, analyzer_parameter=SplitAnalyzerParameter(","))
    # A Geopoint field. Create an index.
    field_c = FieldSchema('g', FieldType.GEOPOINT, index=True)
    # A Keyword array field. Create an index.
    field_d = FieldSchema('ka', FieldType.KEYWORD, index=True, is_array=True)
    # A Long array field. Create an index.
    field_e = FieldSchema('la', FieldType.LONG, index=True, is_array=True)
    # A Nested field that includes three sub-fields: nk (Keyword), nl (Long), and nt (Text).
    field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
        FieldSchema('nk', FieldType.KEYWORD, index=True),
        FieldSchema('nl', FieldType.LONG, index=True),
        FieldSchema('nt', FieldType.TEXT, index=True),
    ])
    fields = [field_a, field_b, field_c, field_d, field_e, field_n]
    # Use the primary key column PK1 as the routing field.
    index_setting = IndexSetting(routing_fields=['PK1'])
    # When a search index contains a Nested field, you cannot set index pre-sorting.
    index_sort = None
    # index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])
    index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
    client.create_search_index('<TABLE_NAME>', '<SEARCH_INDEX_NAME>', index_meta)
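To show what the split and fuzzy analyzers in the preceding sample do to a Text value, here is a rough sketch in plain Python. It is not the service's implementation: the function names are hypothetical, and it assumes that the two arguments of FuzzyAnalyzerParameter(1, 6) are the minimum and maximum token length in characters. The real analyzers may handle case, whitespace, and punctuation differently.

```python
def split_tokens(text, delimiter=","):
    """Tokenize by a custom delimiter, dropping empty tokens (assumed behavior)."""
    return [token for token in text.split(delimiter) if token]


def ngram_tokens(text, min_chars=1, max_chars=6):
    """Emit every substring whose length is between min_chars and max_chars."""
    tokens = []
    for start in range(len(text)):
        for length in range(min_chars, max_chars + 1):
            if start + length > len(text):
                break
            tokens.append(text[start:start + length])
    return tokens


print(split_tokens("red,green,blue"))
print(ngram_tokens("abc", min_chars=1, max_chars=2))
```

Under these assumptions, the split approach suits delimiter-separated values such as tag lists, while the n-gram approach trades a larger index for substring-style matching.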
Create a search index and configure vector fields
The following sample code creates a search index with vector fields configured. The search index contains three fields: col_keyword (Keyword), col_long (Long), and col_vector (Vector). The vector field has four dimensions and uses dot product as its distance metric.
from tablestore import (
    FieldSchema, FieldType, SearchIndexMeta,
    VectorDataType, VectorMetricType, VectorOptions,
)

def create_search_index(client):
    index_meta = SearchIndexMeta([
        FieldSchema('col_keyword', FieldType.KEYWORD, index=True, enable_sort_and_agg=True),  # String type
        FieldSchema('col_long', FieldType.LONG, index=True),  # Numeric type
        FieldSchema('col_vector', FieldType.VECTOR,  # Vector type
                    vector_options=VectorOptions(
                        data_type=VectorDataType.VD_FLOAT_32,
                        dimension=4,  # The vector dimension is 4.
                        metric_type=VectorMetricType.VM_DOT_PRODUCT  # The similarity algorithm is dot product.
                    )),
    ])
    # Replace the placeholders with the names of your data table and search index.
    client.create_search_index('<TABLE_NAME>', '<SEARCH_INDEX_NAME>', index_meta)
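For readers unfamiliar with the dot product metric configured above, the following sketch computes it for four-dimensional vectors and ranks two stored vectors against a query. The vectors and row names are made up for illustration, and the score that the service actually returns may be normalized or transformed. This example shows only the raw metric.

```python
def dot_product(a, b):
    """Return the dot product of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 4-dimensional vectors, matching dimension=4 in the index schema.
query = [0.5, 0.5, 0.0, 1.0]
stored = {
    "row1": [1.0, 0.0, 0.0, 1.0],
    "row2": [0.0, 1.0, 1.0, 0.0],
}

# A larger dot product means the stored vector is more similar to the query.
ranked = sorted(stored, key=lambda name: dot_product(query, stored[name]), reverse=True)
print(ranked)
```

The dimension check reflects why the dimension is fixed in the schema: vectors of different lengths cannot be compared with this metric.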
Enable summary and highlighting when creating a search index
The following sample code creates a search index with summary and highlighting enabled. The search index contains three fields: k (Keyword), t (Text), and n (Nested). The n field has three sub-fields: nk (Keyword), nl (Long), and nt (Text). Summary and highlighting are enabled for the t field and the nt sub-field of the n field.
from tablestore import (
    AnalyzerType, FieldSchema, FieldType, IndexSetting,
    PrimaryKeySort, SearchIndexMeta, Sort, SortOrder,
)

def create_search_index0905(client):
    # A Keyword field. Create an index and enable sorting and statistical aggregation.
    field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True)
    # A Text field. Create an index, use single-word tokenization, and enable summary and highlighting for the field.
    field_b = FieldSchema('t', FieldType.TEXT, index=True, analyzer=AnalyzerType.SINGLEWORD,
                          enable_highlighting=True)
    # A Nested field that includes three sub-fields: nk (Keyword), nl (Long), and nt (Text).
    # Summary and highlighting are enabled for the nt sub-field.
    field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
        FieldSchema('nk', FieldType.KEYWORD, index=True),
        FieldSchema('nl', FieldType.LONG, index=True),
        FieldSchema('nt', FieldType.TEXT, index=True, enable_highlighting=True),
    ])
    fields = [field_a, field_b, field_n]
    # Use the primary key column id as the routing field.
    index_setting = IndexSetting(routing_fields=['id'])
    # When a search index contains a Nested field, you cannot set index pre-sorting.
    index_sort = None
    # index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])
    index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
    client.create_search_index('pythontest', 'pythontest_0905', index_meta)
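As a rough picture of what highlighting produces at query time, the sketch below wraps each occurrence of a query term in a pair of tags. It is a plain-Python illustration, not the service's highlighter: the function name `highlight` is hypothetical, and the `<em>` tags are an assumption chosen for the example.

```python
import re


def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap each case-insensitive occurrence of term in the given tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda match: pre_tag + match.group(0) + post_tag, text)


print(highlight("Tablestore search index", "search"))
```

In the sample above, only fields created with enable_highlighting=True (t and n.nt) are set up for this feature.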
FAQ
- Differences between range queries using the GetRange and Search operations
- Data cannot be found using the Search operation of a search index
- Does Tablestore support queries similar to IN and BETWEEN...AND in relational databases?
- The "field:xx must enable enable_sort_and_agg" error occurs when you use a search index
References
- After you create a search index, you can query data across multiple dimensions by using various query types, such as term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, geo query, Boolean query, vector search, nested query, and column existence query.
- When you query data, you can perform sorting and pagination, highlighting, or collapse (deduplication) operations on the result set.
- After you create a search index, you can manage it by performing operations such as lifecycle management, dynamically modifying the schema, listing search indexes, querying search index descriptions, and deleting a search index.
- To perform data analytics such as calculating the maximum value, minimum value, sum, or row count, you can use the aggregation feature or the SQL query feature.
- To export data quickly without requiring a specific order for the result set, you can use the parallel scan feature.