
Tablestore:Create a search index

Last Updated:Apr 30, 2025

You can call the CreateSearchIndex operation to create a search index for a data table. When you create a search index, you can add the fields that you want to query to the search index and configure advanced settings for the search index. For example, you can configure the routing key and presorting settings.

Prerequisites

  • A client is initialized. For more information, see Initialize a Tablestore client.

  • A data table that meets the following conditions is created. For more information, see Create a data table.

    • The max versions parameter is set to 1.

    • The time to live (TTL) is set to -1 or updates on the data table are prohibited.
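The two table conditions above can be expressed as a simple check. This is a minimal sketch: DataTableOptions and can_create_search_index are hypothetical names used only for illustration and are not part of Tablestore SDK for Python. In practice, you would read these settings from the options of the data table.

```python
from dataclasses import dataclass


@dataclass
class DataTableOptions:
    """Hypothetical stand-in for the settings of a data table (not an SDK class)."""
    max_versions: int
    time_to_live: int   # In seconds; -1 means that data never expires.
    allow_update: bool  # False means that updates on the data table are prohibited.


def can_create_search_index(opts: DataTableOptions) -> bool:
    """Return True if the data table meets the prerequisites listed above."""
    if opts.max_versions != 1:
        return False
    # The TTL must be -1, or updates on the data table must be prohibited.
    return opts.time_to_live == -1 or not opts.allow_update
```

For example, a table with max versions 1, a TTL of 86400 seconds, and updates allowed does not qualify until updates are prohibited.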

Usage notes

  • The data types of the fields in a search index must match the data types of the fields in the data table for which the search index is created. For more information, see Data types.

  • To set the time_to_live parameter of a search index to a value other than -1, make sure that the UpdateRow operation is prohibited on the data table for which the search index is created. The value of the time_to_live parameter of the search index must be less than or equal to the value of the time_to_live parameter of the data table. For more information, see Specify the TTL of a search index.
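The TTL rules above can be sketched as a validation function. This sketch assumes that -1 means "never expire" for both the data table and the search index, so it is treated as infinity when the two values are compared. The function name is hypothetical and does not correspond to an SDK call.

```python
def is_valid_index_ttl(index_ttl: int, table_ttl: int, table_allows_update: bool) -> bool:
    """Check the search index TTL rules described above (-1 means never expire)."""
    never = float("inf")
    idx = never if index_ttl == -1 else index_ttl
    tbl = never if table_ttl == -1 else table_ttl
    # A search index TTL other than -1 requires UpdateRow to be prohibited on the table.
    if index_ttl != -1 and table_allows_update:
        return False
    # The search index must not keep data longer than the data table does.
    return idx <= tbl
```

Under this reading, a search index that never expires cannot be attached to a data table whose TTL is finite.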

Parameters

When you create a search index, you must specify the table name (table_name), the search index name (index_name), and the index schema. In Tablestore SDK for Python, the schema is a SearchIndexMeta object in which you configure the field_schemas, index_setting, and index_sort parameters. The following table describes these parameters.

Parameter

Description

table_name

The name of the table.

index_name

The name of the search index.

field_schemas

The list of field schemas. In each field schema, configure the following parameters:

  • field_name (required): the name of the field in the search index. The value is used as a column name. Type: String.

    A field in a search index can be a primary key column or an attribute column.

  • field_type (required): the type of the field. Specify the type in the FieldType.XXX format. For more information, see Data types.

  • is_array (optional): specifies whether the value is an array. Type: Boolean.

    If you set this parameter to true, the field stores data as an array. Data written to the field must be a JSON array. Example: ["a","b","c"].

    Values of Nested fields are always arrays. If you set the field_type parameter to Nested, you do not need to configure this parameter.

  • index (optional): specifies whether to enable indexing for the field. Type: Boolean.

    Default value: True. If you set this parameter to True, Tablestore builds an inverted index or a spatio-temporal index for the field, depending on the field type. If you set this parameter to False, the field is not indexed.

  • analyzer (optional): the type of the analyzer. You can configure this parameter only if you set the field_type parameter to Text. If you do not configure this parameter for a Text field, single-word tokenization is used by default. Some analyzers, such as fuzzy tokenization and delimiter tokenization, also take an analyzer_parameter. For more information, see Tokenization.

  • enable_sort_and_agg (optional): specifies whether to enable sorting and aggregation for the field. Type: Boolean.

    Sorting can be enabled only for fields for which the enable_sort_and_agg parameter is set to True. For more information, see Sorting and paging.

    Important

    Nested fields do not support the sorting and aggregation feature. The subfields of Nested fields support the sorting and aggregation feature.

  • store (optional): specifies whether to store the value of the field in the search index. Type: Boolean.

    If you set the store parameter to true, you can read the value of the field from the search index without the need to query the data table. This improves query performance.

  • sub_field_schemas (optional): the list of field schemas for subfields. If the field is a Nested field, you must configure this parameter to specify the index types of subfields in the Nested field.

  • is_virtual_field (optional): specifies whether the field is a virtual column. Type: Boolean. Default value: False. If you set this parameter to True, the field is a virtual column. For more information, see Virtual columns.

  • source_field_name (optional): the name of the source field to which the virtual column is mapped in the data table. Type: String.

    Important

    If you set the is_virtual_field parameter to true, you must configure this parameter.

  • date_formats (optional): the format of dates. Type: String. For more information, see Date data type.

    Important

    If you set the field_type parameter to Date, you must configure this parameter.

  • enable_highlighting (optional): specifies whether to enable the highlight feature for the field. Type: Boolean. Default value: False. A value of False specifies that the highlight feature is disabled for the field. If you set this parameter to True, you can use the highlight feature. Only Text fields support the highlight feature. For more information, see Highlight the query results.

    Important

    Tablestore SDK for Python V6.0.0 or later supports the highlight feature.

  • vector_options (optional): the properties of Vector fields. If you set the field_type parameter to Vector, you must configure this parameter. A Vector field contains the following properties:

    • data_type: the vector type. Only float32 is supported. If you want to use other vector types, submit a ticket.

    • dimension: the vector dimension. The maximum number of dimensions for a Vector field is 2,048.

    • metric_type: the algorithm that you want to use to measure the distance between vectors. Valid values: euclidean, cosine, and dot_product.

      • euclidean: the Euclidean distance algorithm that measures the shortest path between two vectors in a multi-dimensional space. For better performance, the Euclidean distance algorithm in Tablestore does not perform the final square root calculation. A greater value that is obtained by using the Euclidean distance algorithm indicates a higher similarity between two vectors.

      • cosine: the cosine similarity algorithm that calculates the cosine of the angle between two vectors in a vector space. A greater value that is obtained by using the cosine similarity algorithm indicates a higher similarity between two vectors. In most cases, the algorithm is used to calculate the similarity between text data.

      • dot_product: the dot product algorithm that multiplies the corresponding coordinates of two vectors of the same dimension and adds the products. A greater value that is obtained by using the dot product algorithm indicates a higher similarity between two vectors.

      For more information, see Distance measurement algorithms for vectors.
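The three metric_type options can be illustrated with plain Python. This is a sketch of the underlying math only, not Tablestore's implementation. The squared Euclidean function returns a raw distance, where a smaller value means the vectors are closer; how Tablestore converts that distance into a score in which a greater value means higher similarity is not modeled here. The function names are illustrative.

```python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance without the final square root; smaller means closer."""
    _check_dims(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Multiply corresponding coordinates and add the products."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); greater means more similar."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / norm
```

For vectors that are normalized to unit length, dot_product and cosine_similarity return the same value.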

index_setting

The settings of the search index, including the settings of the routing_fields parameter.

routing_fields (optional): the custom routing fields. You can specify multiple primary key columns as the routing fields. In most cases, you need to specify only one routing field. If you specify multiple routing fields, the system concatenates the values of the routing fields into one value as the partition key.

Tablestore distributes data that is written to a search index across different partitions based on the specified routing fields. The data that has the same routing field values is distributed to the same partition.
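The routing behavior above can be pictured as "concatenate the routing field values, then hash the result to a partition". The following sketch is a hypothetical illustration only: the separator, the SHA-256 hash, and the partition count are assumptions and are not Tablestore's real routing algorithm. It shows why rows that share routing field values always land in the same partition.

```python
import hashlib
from typing import Mapping, Sequence


def route_to_partition(primary_key: Mapping[str, object],
                       routing_fields: Sequence[str],
                       partition_count: int) -> int:
    """Hypothetical routing illustration; NOT Tablestore's actual algorithm."""
    # Concatenate the routing field values into one key (separator is assumed).
    key = "|".join(str(primary_key[name]) for name in routing_fields)
    # Hash the key and map it to a partition; equal keys map to the same partition.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

With routing_fields=['PK1'], two rows that differ only in PK2 are routed to the same partition, which is what keeps queries that filter on PK1 confined to fewer partitions.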

index_sort

The presorting settings of the search index, including the settings of the sorters parameter. If you do not configure the index_sort parameter, data in the search index is presorted by primary key.

Note

If the search index contains Nested fields, you cannot configure the index_sort parameter.

sorters (required): the presorting method of the search index. Valid values: PrimaryKeySort and FieldSort. For more information, see Sorting and paging.

  • PrimaryKeySort: sorts data by primary key. If you set the sorters parameter to PrimaryKeySort, you must configure the following parameter:

    sort_order: the sorting order. Data can be sorted in ascending or descending order. By default, data is sorted in ascending order.

  • FieldSort: sorts data by field value. If you set the sorters parameter to FieldSort, you must configure the following parameters:

    Only fields for which indexing is enabled and the enable_sort_and_agg parameter is set to true can be presorted.

    • field_name: the name of the field that is used to sort data.

    • sort_order: the sort order. Data can be sorted in ascending or descending order. By default, data is sorted in ascending order.

    • sort_mode: the sorting method that is used if the field contains multiple values.
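The effect of FieldSort with sort_mode on a multi-valued (array) field can be sketched in plain Python. This sketch assumes the MIN, MAX, and AVG sort modes described in Sorting and paging, represents them as plain strings, and does not model how ties are broken. The helper names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def reduce_values(values: Sequence[float], sort_mode: str) -> float:
    """Reduce the values of a multi-valued field to one sort key."""
    if sort_mode == "MIN":
        return min(values)
    if sort_mode == "MAX":
        return max(values)
    if sort_mode == "AVG":
        return mean(values)
    raise ValueError(f"unsupported sort_mode: {sort_mode}")


def presort(rows: List[Dict], field_name: str, sort_mode: str,
            sort_order: str = "ASC") -> List[Dict]:
    """Order rows by one array field; ascending by default, as described above."""
    return sorted(
        rows,
        key=lambda row: reduce_values(row[field_name], sort_mode),
        reverse=(sort_order == "DESC"),
    )
```

For example, with rows whose la values are [5, 1], [3, 4], and [2, 9], MIN sorts them by 1, 3, and 2, while MAX sorts them by 5, 4, and 9, so the same data can be presorted in different orders.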

Examples

Create a search index with an analyzer type specified

The following sample code shows how to create a search index with a specified analyzer type. In this example, the search index consists of the following fields: the k field of the Keyword type, the t field of the Text type, the g field of the Geo-point type, the ka field of the array Keyword type, the la field of the array Long type, and the n field of the Nested type. The n field consists of the following subfields: the nk field of the Keyword type, the nl field of the Long type, and the nt field of the Text type.

from tablestore import *

def create_search_index(client):
    # Create an index on the Keyword field and enable sorting and aggregation for the field.
    field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True, store=True)
    # Create an index on the Text field and set the analyzer type to single-word tokenization. 
    field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.SINGLEWORD)
    # Create an index on the Text field and set the analyzer type to fuzzy tokenization. 
    #field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.FUZZY,analyzer_parameter=FuzzyAnalyzerParameter(1, 6))
    # Create an index on the Text field and set the analyzer type to delimiter tokenization. 
    #field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.SPLIT, analyzer_parameter = SplitAnalyzerParameter(","))
    # Create an index on the Geo-point field. 
    field_c = FieldSchema('g', FieldType.GEOPOINT, index=True, store=True)
    # Create an index on the array Keyword field. 
    field_d = FieldSchema('ka', FieldType.KEYWORD, index=True, is_array=True, store=True)
    # Create an index on the array Long field. 
    field_e = FieldSchema('la', FieldType.LONG, index=True, is_array=True, store=True)

    # The Nested field consists of three subfields: the nk subfield of the Keyword type, the nl subfield of the Long type, and the nt subfield of the Text type. 
    field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
        FieldSchema('nk', FieldType.KEYWORD, index=True, store=True),
        FieldSchema('nl', FieldType.LONG, index=True, store=True),
        FieldSchema('nt', FieldType.TEXT, index=True, store=True),
    ])

    fields = [field_a, field_b, field_c, field_d, field_e, field_n]

    index_setting = IndexSetting(routing_fields=['PK1'])  # PK1 must be a primary key column of the data table.
    index_sort = None # If the search index contains Nested fields, you cannot configure presorting for the search index.
    #index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])
    index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
    client.create_search_index('<TABLE_NAME>', '<SEARCH_INDEX_NAME>', index_meta)
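After the index is created, the data table needs values in formats that the indexed fields accept. The following sketch builds attribute column values for the fields in the preceding example as (name, value) tuples, which is the shape we understand Row in Tablestore SDK for Python to accept for attribute columns. Array and Nested values are written as JSON array strings, as described for the is_array parameter. The "latitude,longitude" string for the Geo-point field, the helper name, and the sample values are assumptions for illustration; verify them against Data types.

```python
import json


def build_attribute_columns():
    """Hypothetical helper: sample attribute column values for the index above."""
    return [
        ("k", "hangzhou"),
        ("t", "Tablestore search index"),
        ("g", "30.26,120.19"),  # Geo-point as "latitude,longitude" (assumed format).
        ("ka", json.dumps(["a", "b", "c"])),  # Array Keyword field: JSON array string.
        ("la", json.dumps([1, 2, 3])),  # Array Long field: JSON array string.
        # Nested field: a JSON array of objects whose keys match the subfields.
        ("n", json.dumps([{"nk": "key1", "nl": 1, "nt": "text one"}])),
    ]
```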

Create a search index that contains Vector fields

The following sample code shows how to create a search index that contains a Vector field. In this example, the search index consists of the following fields: the col_keyword field of the Keyword type, the col_long field of the Long type, and the col_vector field of the Vector type. The col_vector field has 4 dimensions and uses the dot product algorithm to measure the distance between vectors.

from tablestore import *

def create_search_index(client, table_name, index_name):
    index_meta = SearchIndexMeta([
        FieldSchema('col_keyword', FieldType.KEYWORD, index=True, enable_sort_and_agg=True, store=True),  # The Keyword type.
        FieldSchema('col_long', FieldType.LONG, index=True, store=True),  # The Long type.
        FieldSchema('col_vector', FieldType.VECTOR,  # The Vector type.
                    vector_options=VectorOptions(
                        data_type=VectorDataType.VD_FLOAT_32,
                        dimension=4,  # Number of vector dimensions: 4.
                        metric_type=VectorMetricType.VM_DOT_PRODUCT  # Distance measurement algorithm: dot product.
                    )),
    ])
    client.create_search_index(table_name, index_name, index_meta)
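Vectors written to col_vector must have the number of dimensions configured in VectorOptions. The following sketch checks the dimension and serializes a vector as a JSON array string of floats. Storing vectors in this string form, the constant, and the helper name are assumptions for illustration. Because only float32 is supported, values are likely to be stored with float32 precision even though Python floats are 64-bit.

```python
import json
from typing import Sequence

VECTOR_DIMENSION = 4  # Must match dimension=4 in the VectorOptions of the index.


def encode_vector(values: Sequence[float], dimension: int = VECTOR_DIMENSION) -> str:
    """Hypothetical helper: serialize a vector as a JSON array string of floats."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    return json.dumps([float(v) for v in values])
```

Checking the dimension on the client side surfaces a mismatch before the row is written.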

Create a search index with the highlight feature enabled

The following sample code shows how to create a search index with the highlight feature enabled. In this example, the search index consists of the following fields: the k field of the Keyword type, the t field of the Text type, and the n field of the Nested type. The n field consists of the following subfields: the nk field of the Keyword type, the nl field of the Long type, and the nt field of the Text type. The highlight feature is enabled for the t field and the nt subfield, both of which are of the Text type.

from tablestore import *

def create_search_index_with_highlighting(client):
    # Create an index on the Keyword field and enable sorting and aggregation for the field.
    field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True, store=True)
    # Create an index on the Text field, set the analyzer type of the field to single-word tokenization, and enable the highlight feature for the field. 
    field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.SINGLEWORD,
                        enable_highlighting=True)

    # Create an index on the Nested field that consists of the following subfields: the nk field of the Keyword type, the nl field of the Long type, and the nt field of the Text type. Enable the highlight feature for the nt subfield of the Text type. 
    field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
        FieldSchema('nk', FieldType.KEYWORD, index=True, store=True),
        FieldSchema('nl', FieldType.LONG, index=True, store=True),
        FieldSchema('nt', FieldType.TEXT, index=True, store=True, enable_highlighting=True),
    ])

    fields = [field_a, field_b, field_n]

    index_setting = IndexSetting(routing_fields=['id'])
    index_sort = None # If the search index contains Nested fields, you cannot configure presorting for the search index.
    # index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])
    index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
    client.create_search_index('pythontest', 'pythontest_0905', index_meta)
