You can call the CreateSearchIndex operation to create one or more search indexes for a data table. When you create a search index, you add the fields that you want to query to the search index and can configure advanced settings, such as custom routing fields and presorting.
Prerequisites
An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
A data table is created with the max_version parameter set to 1, and the time_to_live parameter of the data table meets one of the following conditions. For more information, see Create a data table.
The time_to_live parameter of the data table is set to -1, which specifies that the data in the data table never expires.
The time_to_live parameter of the data table is set to a value other than -1, and updates to the data table are disabled.
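The prerequisites above reduce to a small rule: max_version must be 1, and either the data never expires or updates are disabled. The following pure-Python sketch encodes that rule so it can be checked before you call CreateSearchIndex. TableSettings and can_create_search_index are hypothetical names used only for illustration; they are not part of the Tablestore SDK.

```python
from dataclasses import dataclass


@dataclass
class TableSettings:
    """Hypothetical container for the data table settings that matter here."""
    max_version: int
    time_to_live: int   # In seconds; -1 means the data never expires.
    allow_update: bool  # Whether update operations are allowed on the table.


def can_create_search_index(settings: TableSettings) -> bool:
    """Return True if the table meets the search index prerequisites."""
    if settings.max_version != 1:
        return False
    # Either the data never expires, or updates to the table must be disabled.
    return settings.time_to_live == -1 or not settings.allow_update
```

For example, a table with max_version 1 and time_to_live -1 qualifies, while a table with a one-day TTL qualifies only if updates are disabled.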
Usage notes
The data types of fields in a search index must match the data types of fields in the data table for which the search index is created. For more information, see Data type mappings.
To specify a value other than -1 for the time_to_live parameter of a search index, you must disable the UpdateRow operation on the data table for which the search index is created. The value of the time_to_live parameter for the search index must be less than or equal to the value of the time_to_live parameter for the data table. For more information, see TTL of search indexes.
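The TTL rules above can be expressed as a single check. The following sketch is a hypothetical helper, not an SDK call. It assumes that a time_to_live of -1 can be treated as an infinitely long retention period when comparing the search index TTL with the data table TTL.

```python
NEVER_EXPIRE = -1  # A time_to_live of -1 means the data never expires.


def _as_seconds(ttl: int) -> float:
    """Treat -1 (never expire) as an infinitely long retention period (assumption)."""
    return float("inf") if ttl == NEVER_EXPIRE else float(ttl)


def is_valid_index_ttl(index_ttl: int, table_ttl: int, table_allows_update: bool) -> bool:
    """Check a search index TTL against the data table settings."""
    if index_ttl != NEVER_EXPIRE and table_allows_update:
        # A finite search index TTL requires UpdateRow to be disabled on the table.
        return False
    # The search index must not retain data longer than the data table does.
    return _as_seconds(index_ttl) <= _as_seconds(table_ttl)
```

Under this reading, a one-day search index TTL is valid on a table whose data never expires only after updates are disabled, and a search index that never expires is invalid on a table with a finite TTL.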
Parameters
When you create a search index, you must specify the table_name, index_name, and schema parameters. The schema parameter consists of the field_schemas, index_setting, and index_sort parameters. In the Python SDK, schema is passed as a SearchIndexMeta object. The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| table_name | The name of the data table. |
| index_name | The name of the search index. |
| field_schemas | The list of field schemas. In the examples in this topic, each field schema is a FieldSchema object that can include the following parameters: the field name, field_type, index (whether to index the field), enable_sort_and_agg (whether to enable sorting and aggregation for the field), store (whether to store the field value in the search index), is_array (whether the field is an array), analyzer and analyzer_parameter (the analyzer settings of a Text field), sub_field_schemas (the subfields of a Nested field), and vector_options (the settings of a Vector field). |
| index_setting | The settings of the search index, including routing_fields. routing_fields: optional. This parameter specifies custom routing fields. You can specify multiple primary key columns as routing fields. Tablestore distributes the data that is written to a search index across partitions based on the routing fields, and rows that have the same routing field values are written to the same partition. |
| index_sort | The presorting settings of the search index, including the sorters parameter. If you do not specify index_sort, rows are presorted by primary key. Note: If the search index contains a field whose field_type is Nested, you cannot specify index_sort. sorters: required. This parameter specifies the presorting method of the search index. PrimaryKeySort and FieldSort are supported. For more information, see Perform sorting and paging. |
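To illustrate what routing_fields means, the following sketch assigns rows to partitions using only their routing field values, so rows that share those values always land together. The MD5-based hashing and the partition_for helper are hypothetical and chosen only for illustration; Tablestore's internal routing algorithm is not described in this topic.

```python
import hashlib


def partition_for(row_pk: dict, routing_fields: list, partition_count: int) -> int:
    """Map a row to a partition using only its routing field values.

    Hypothetical hashing scheme for illustration; not Tablestore's algorithm.
    """
    # Join the routing field values with a separator that is unlikely to appear in them.
    key = "\x00".join(str(row_pk[name]) for name in routing_fields)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


rows = [
    {"PK1": "user_1", "PK2": 1},
    {"PK1": "user_1", "PK2": 2},
    {"PK1": "user_2", "PK2": 1},
]
partitions = [partition_for(row, ["PK1"], 8) for row in rows]
```

Because only PK1 is a routing field in this sketch, the first two rows always map to the same partition regardless of PK2.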
Examples
Create a search index and specify the analyzer type
The following sample code shows how to create a search index and specify an analyzer type. In this example, the search index consists of the following fields: the k field of the Keyword type, the t field of the Text type, the g field of the Geo-point type, the ka field of the array Keyword type, the la field of the array Long type, and the n field of the Nested type. The n field consists of the following subfields: the nk field of the Keyword type, the nl field of the Long type, and the nt field of the Text type.
```python
# Import the Tablestore Python SDK classes used below.
from tablestore import *


def create_search_index(client):
    # Create an index on a Keyword field and enable sorting and aggregation for it.
    field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True, store=True)
    # Create an index on a Text field and use single-word tokenization.
    field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.SINGLEWORD)
    # Alternative: use fuzzy tokenization for the Text field.
    # field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.FUZZY, analyzer_parameter=FuzzyAnalyzerParameter(1, 6))
    # Alternative: use delimiter tokenization with a comma as the delimiter.
    # field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.SPLIT, analyzer_parameter=SplitAnalyzerParameter(","))
    # Create an index on a Geo-point field.
    field_c = FieldSchema('g', FieldType.GEOPOINT, index=True, store=True)
    # Create an index on an array Keyword field.
    field_d = FieldSchema('ka', FieldType.KEYWORD, index=True, is_array=True, store=True)
    # Create an index on an array Long field.
    field_e = FieldSchema('la', FieldType.LONG, index=True, is_array=True, store=True)
    # The Nested field n has three subfields: nk (Keyword), nl (Long), and nt (Text).
    field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
        FieldSchema('nk', FieldType.KEYWORD, index=True, store=True),
        FieldSchema('nl', FieldType.LONG, index=True, store=True),
        FieldSchema('nt', FieldType.TEXT, index=True, store=True),
    ])

    fields = [field_a, field_b, field_c, field_d, field_e, field_n]
    # Route rows by the primary key column PK1 (replace it with a primary key column of your table).
    index_setting = IndexSetting(routing_fields=['PK1'])
    # If the search index contains fields of the Nested type, presorting cannot be configured.
    index_sort = None
    # Without Nested fields, you could presort by primary key instead:
    # index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])
    index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
    client.create_search_index('<TABLE_NAME>', '<SEARCH_INDEX_NAME>', index_meta)
```
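The data in the data table must match the field types of this search index (see Data type mappings). Array, Nested, and Geo-point fields are written to the data table as strings. The following sketch builds attribute column values for the example schema as (name, value) pairs. It assumes these encodings: array and Nested fields as JSON array strings, and Geo-point values as "latitude,longitude". build_attribute_columns is a hypothetical helper, not an SDK function.

```python
import json


def build_attribute_columns(keyword, text, lat, lon, keywords, longs, nested_items):
    """Build attribute column values for the example schema (k, t, g, ka, la, n).

    Assumed encodings: array and Nested fields are JSON array strings,
    and Geo-point values are "latitude,longitude" strings.
    """
    return [
        ('k', keyword),
        ('t', text),
        ('g', '%s,%s' % (lat, lon)),
        ('ka', json.dumps(keywords)),
        ('la', json.dumps(longs)),
        # Each Nested element is a JSON object whose keys are the subfield names.
        ('n', json.dumps(nested_items)),
    ]


columns = dict(build_attribute_columns(
    'tag', 'hello world', 30.1, 120.2,
    ['a', 'b'], [1, 2],
    [{'nk': 'x', 'nl': 1, 'nt': 'some text'}],
))
```

Here columns['g'] is '30.1,120.2' and columns['ka'] is '["a", "b"]'. You would write such values with a write operation such as PutRow before querying them through the search index.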
Create a search index and configure vector fields
The following sample code shows how to create a search index and configure a vector field. In this example, the search index consists of the following fields: the col_keyword field of the Keyword type, the col_long field of the Long type, and the col_vector field of the Vector type. The dot product algorithm is used to measure the distance between vectors.
```python
# Import the Tablestore Python SDK classes used below.
from tablestore import *


def create_search_index(client):
    index_meta = SearchIndexMeta([
        # A Keyword field with sorting and aggregation enabled.
        FieldSchema('col_keyword', FieldType.KEYWORD, index=True, enable_sort_and_agg=True, store=True),
        # A Long field.
        FieldSchema('col_long', FieldType.LONG, index=True, store=True),
        # A Vector field.
        FieldSchema('col_vector', FieldType.VECTOR,
                    vector_options=VectorOptions(
                        data_type=VectorDataType.VD_FLOAT_32,
                        # Number of dimensions of the vector: 4.
                        dimension=4,
                        # Distance measurement algorithm: dot product.
                        metric_type=VectorMetricType.VM_DOT_PRODUCT,
                    )),
    ])
    client.create_search_index('<TABLE_NAME>', '<SEARCH_INDEX_NAME>', index_meta)
```
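To show what the vector settings above imply, the following sketch prepares a 4-dimensional value for col_vector and computes the dot product that the VM_DOT_PRODUCT metric is based on. It assumes that the vector column in the data table holds a JSON-style array string of floats. encode_vector and dot_product are hypothetical helpers, not SDK functions.

```python
import json


def encode_vector(values, dimension=4):
    """Serialize a vector as a JSON-style array string (assumed storage format)."""
    if len(values) != dimension:
        raise ValueError("expected %d values, got %d" % (dimension, len(values)))
    return json.dumps([float(v) for v in values])


def dot_product(a, b):
    """Dot product of two equal-length vectors; a larger value means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


encoded = encode_vector([1, 5.1, 4.7, 0.08])
score = dot_product([1, 2, 3, 4], [0.5, 0.5, 0.5, 0.5])
```

The dimension check mirrors dimension=4 in the index definition: a value with a different number of elements would not match the Vector field.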
References
After you create a search index, you can use the query methods provided by the search index to query data from multiple dimensions based on your business requirements. When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, geo query, Boolean query, KNN vector query, nested query, and exists query.
If you call the Search operation to query data, you can sort or paginate the rows that meet the query conditions by using the sorting and paging features. For more information, see Sorting and paging.
If you call the Search operation to query data, you can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
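As an illustration of the collapse (distinct) semantics, the following client-side sketch keeps only the first row for each distinct value of a column and preserves the original order. The collapse helper is hypothetical; the actual feature runs on the server as part of the Search operation, and its exact rules are described in Collapse (distinct).

```python
def collapse(rows, column):
    """Keep only the first row for each distinct value of `column`, preserving order."""
    seen = set()
    result = []
    for row in rows:
        value = row[column]
        if value not in seen:
            seen.add(value)
            result.append(row)
    return result


rows = [
    {"id": 1, "shop": "A"},
    {"id": 2, "shop": "A"},
    {"id": 3, "shop": "B"},
]
collapsed = collapse(rows, "shop")
```

After collapsing on shop, only the rows with id 1 and id 3 remain, so each shop appears once.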
You can specify the TTL for a search index to delete historical data in the search index or extend the retention period of data in the search index. For more information, see TTL of search indexes.
If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.
If you want to add, update, or remove indexed columns in a search index, you can dynamically modify the schema of the search index. For more information, see Dynamically modify the schema of a search index.
You can call the ListSearchIndex operation to query all search indexes that are created for a data table. For more information, see List search indexes.
You can call the DescribeSearchIndex operation to query the description of a search index. For example, you can query the field information and search index configurations. For more information, see Query the description of a search index.
If you no longer need to use a search index, you can delete the search index. For more information, see Delete search indexes.