You can call the CreateSearchIndex operation to create one or more search indexes for a data table.

Prerequisites

  • An OTSClient instance is initialized. For more information, see Initialization.
  • A data table is created with time_to_live set to -1 and max_versions set to 1.

Parameters

When you create a search index, you must configure the table_name, index_name, and schema parameters. You must also configure the field_schemas, index_setting, and index_sort parameters in schema. The following table describes the parameters.

Parameter Description
table_name The name of the data table.
index_name The name of the search index.
field_schemas The list of field schemas. You can configure the following parameters for each field schema:
  • field_name: required. This parameter specifies the name of the field in the search index. The value is used as the column name. Type: String.

    A field in a search index can be a primary key column or an attribute column.

  • field_type: required. This parameter specifies the type of the field. Use FieldType.XXX to set the type. For more information, see Data type mappings.
  • is_array: optional. This parameter specifies whether the value is an array. Type: Boolean.

    If you set this parameter to true, the column stores data as an array. Data written to the column must be a JSON array. Example: ["a","b","c"].

    A value of the Nested type is already an array. If you set field_type to Nested, you do not need to configure this parameter.

  • index: optional. This parameter specifies whether to enable indexing for the column. Type: Boolean.

    Default value: true. A value of true indicates that Tablestore builds an inverted index or a spatio-temporal index for the column. A value of false indicates that Tablestore does not index the column.

  • analyzer: optional. This parameter specifies the type of the analyzer that you want to use. You can configure this parameter only when field_type is set to Text. If you do not configure this parameter, single-word tokenization is used by default. For more information about tokenization, see Tokenization.
  • enable_sort_and_agg: optional. This parameter specifies whether to enable sorting and aggregation. Type: Boolean.
    Sorting can be performed only for fields for which enable_sort_and_agg is set to true. For more information about sorting, see Sorting and pagination.
    Important Fields of the Nested type do not support sorting and aggregation, but subcolumns of fields of the Nested type support sorting and aggregation.
  • store: optional. This parameter specifies whether to store the value of the field in the search index. Type: Boolean.

    If you set store to true, you can read the value of the field from the search index without querying the data table. This improves query performance.

  • sub_field_schemas: optional. This parameter specifies the list of field schemas for subfields. If the column is a Nested column, you must configure this parameter to specify the index types of subcolumns in the Nested column.
  • is_virtual_field: optional. This parameter specifies whether the field is a virtual column. Type: Boolean. Default value: false. Configure this parameter only when you use virtual columns. For more information about virtual columns, see Virtual columns.
  • source_field_name: optional in general. This parameter specifies the name of the source field in the data table to which the virtual column is mapped. Type: String.
    Important: This parameter is required when is_virtual_field is set to true.
  • date_formats: optional in general. This parameter specifies the format of dates. Type: String. For more information, see Types of date data.
    Important: This parameter is required when the field type is DATE.
index_setting The settings of the search index, including routing_fields.

routing_fields: optional. This parameter specifies custom routing fields. You can specify one or more primary key columns as routing fields. Tablestore distributes the data that is written to a search index across partitions based on the values of the routing fields. Rows that have the same routing field values are written to the same partition.

index_sort The presorting settings of the search index, including sorters. If you do not configure index_sort, data is sorted by primary key by default.
Note: Search indexes that contain fields of the Nested type do not support presorting. Skip the presorting settings for such indexes.
sorters: required. This parameter specifies the presorting method for the search index. PrimaryKeySort and FieldSort are supported. For more information about sorting, see Sorting and pagination.
  • PrimaryKeySort: Data is sorted by primary key. You can configure the following parameter for PrimaryKeySort:

    sort_order: the sort order. Data can be sorted in ascending or descending order. By default, data is sorted in ascending order.

  • FieldSort: Data is sorted by field value. Only fields for which indexing is enabled and enable_sort_and_agg is set to true can be presorted. You can configure the following parameters for FieldSort:

    • field_name: the name of the field that is used to sort data.
    • sort_order: the sort order. Data can be sorted in ascending or descending order. By default, data is sorted in ascending order.
    • sort_mode: the sorting method that is used when the field contains multiple values.
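To illustrate what sort_mode means for a field that holds multiple values, the following minimal sketch orders in-memory rows by one representative value per row. It is plain Python and does not call the Tablestore SDK. The mode names min, max, and avg and the helper presort are assumptions chosen for illustration; the actual presorting behavior is determined by the service.

```python
from statistics import mean

# Hypothetical mode names; each one reduces a list of values to a single sort key.
SORT_MODES = {"min": min, "max": max, "avg": mean}

def presort(rows, field_name, descending=False, sort_mode="min"):
    """Return rows ordered by one representative value of a multi-value field."""
    pick = SORT_MODES[sort_mode]
    return sorted(rows, key=lambda row: pick(row[field_name]), reverse=descending)

rows = [
    {"pk": "a", "la": [5, 9]},
    {"pk": "b", "la": [1, 20]},
    {"pk": "c", "la": [7, 8]},
]

# The same rows come out in a different order depending on the mode.
by_min = [row["pk"] for row in presort(rows, "la", sort_mode="min")]
by_max = [row["pk"] for row in presort(rows, "la", sort_mode="max")]
print(by_min, by_max)
```

The sketch shows why sort_mode matters only for multi-value fields: with a single value per row, every mode yields the same key.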

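The effect of the routing_fields parameter described earlier can be pictured with a small sketch: rows that share a routing field value always land in the same partition. The hash function (CRC32), the partition count of 4, and the helper name partition_for are assumptions for illustration only. This is not a description of how Tablestore assigns partitions internally.

```python
import zlib

def partition_for(routing_value, partition_count):
    """Map a routing field value to a partition index (illustrative hash routing)."""
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % partition_count

rows = [
    {"PK1": "user_1", "PK2": 1},
    {"PK1": "user_2", "PK2": 2},
    {"PK1": "user_1", "PK2": 3},
]

# Route on PK1 only: the first and third rows share a routing value,
# so they are assigned to the same partition.
assignments = [partition_for(row["PK1"], 4) for row in rows]
print(assignments)
```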
Examples

The following example shows how to specify analyzer types when you create a search index. The search index contains six fields: k (Keyword type), t (Text type), g (Geopoint type), ka (Keyword array type), la (Long array type), and n (Nested type). The n field contains three subfields: nk (Keyword type), nl (Long type), and nt (Text type).

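# This example assumes that client (an initialized OTSClient instance), table_name,
# and index_name are already defined. For more information, see Prerequisites.
from tablestore import *

# Create an index on the field of the Keyword type and enable sorting and aggregation for the field.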
field_a = FieldSchema('k', FieldType.KEYWORD, index=True, enable_sort_and_agg=True, store=True)
# Create an index on the field of the Text type and set the analyzer type of the field to single-word tokenization. 
field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.SINGLEWORD)
# Alternative: create an index on the field of the Text type and set the analyzer type of the field to fuzzy tokenization.
#field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.FUZZY, analyzer_parameter=FuzzyAnalyzerParameter(1, 6))
# Alternative: create an index on the field of the Text type, set the analyzer type of the field to delimiter tokenization, and specify commas (,) as the delimiter.
#field_b = FieldSchema('t', FieldType.TEXT, index=True, store=True, analyzer=AnalyzerType.SPLIT, analyzer_parameter=SplitAnalyzerParameter(","))
# Create an index on the field of the Geopoint type. 
field_c = FieldSchema('g', FieldType.GEOPOINT, index=True, store=True)
# Create an index on the field of the Keyword array type. 
field_d = FieldSchema('ka', FieldType.KEYWORD, index=True, is_array=True, store=True)
# Create an index on the field of the Long array type. 
field_e = FieldSchema('la', FieldType.LONG, index=True, is_array=True, store=True)

# Specify a Nested field that contains three subfields: nk (Keyword type), nl (Long type), and nt (Text type).
field_n = FieldSchema('n', FieldType.NESTED, sub_field_schemas=[
    FieldSchema('nk', FieldType.KEYWORD, index=True, store=True),
    FieldSchema('nl', FieldType.LONG, index=True, store=True),
    FieldSchema('nt', FieldType.TEXT, index=True, store=True),
])

fields = [field_a, field_b, field_c, field_d, field_e, field_n]

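# Use PK1 as the routing field. Routing fields must be primary key columns of the data table.
index_setting = IndexSetting(routing_fields=['PK1'])
# Presorting cannot be configured for a search index that contains fields of the Nested type.
index_sort = None
# For a search index without fields of the Nested type, you can presort data by primary key:
#index_sort = Sort(sorters=[PrimaryKeySort(SortOrder.ASC)])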
index_meta = SearchIndexMeta(fields, index_setting=index_setting, index_sort=index_sort)
client.create_search_index(table_name, index_name, index_meta)
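
Because the ka and la fields are declared with is_array=True, data written to those columns must be a JSON array. The following minimal sketch prepares such values with the standard json module. The helper name to_array_column and the sample values are hypothetical, and the actual write to the data table is omitted.

```python
import json

def to_array_column(values):
    """Serialize a Python list as a compact JSON array string for an is_array column."""
    if not isinstance(values, (list, tuple)):
        raise TypeError("an is_array column expects a list of values")
    return json.dumps(list(values), separators=(",", ":"))

# Values for the Keyword array field (ka) and the Long array field (la).
ka_value = to_array_column(["a", "b", "c"])
la_value = to_array_column([1, 2, 3])
print(ka_value)
print(la_value)
```

Rejecting non-list input early keeps a scalar from being written to a column that the search index expects to parse as an array.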