All Products
Search
Document Center

Tablestore:Create a search index

Last Updated:Aug 26, 2024

You can create one or more search indexes for a table in the Tablestore console or by using the Tablestore CLI or Tablestore SDKs.

Optimal method to create search indexes

We recommend that you determine the number of search indexes that you want to create for a data table based on your query requirements.

If you have a data table that contains the id, name, age, city, and sex fields, you can create search indexes to query data by name, age, or city by using one of the following methods:

  • Method 1: Create a search index for each field

    If you use this method, you must create the following search indexes: name_index, age_index, and city_index.

    • To query students by city, use the city_index search index. To query students by age, use the age_index search index.

    • However, if you want to query students who are younger than 12 years old and live in City A, this method does not work.

    The implementation of this method is similar to that of the global secondary index feature. However, this method is not cost-effective. We recommend that you use Method 2 to create search indexes.

  • Method 2: Create one search index for multiple fields

    In this method, a search index named student_index is created. The search index contains the following fields: name, age, and city.

    • To query students by city, query the city field in the student_index search index. To query students by age, query the age field in the student_index search index.

    • To query students who are younger than 12 years old and live in City A, query the age and city fields in the student_index search index.

    This method makes full use of the advantages of search indexes and is more cost-effective. We recommend that you use this method to create a search index.

Limits

  • Timeliness of search index creation

    After you create a search index, you must wait for a few seconds before you can use the search index. During this period, you can write data to the data table, but you cannot query the metadata of the index or query data by using the index.

  • When you create a search index, make sure that the types of fields in the search index match the types of fields in the data table for which the search index is created. For more information, see Data type mappings.

  • Number of search indexes

    For more information, see Search index limits.

Parameters

When you create a search index, you must configure the tableName, indexName, and indexSchema parameters. You must also configure the fieldSchemas, indexSetting, and indexSort parameters in the indexSchema parameter. The following table describes the parameters.

Parameter

Description

tableName

The name of the data table.

indexName

The name of the search index.

fieldSchemas

The list of field schemas. In each field schema, configure the following parameters:

  • fieldName: This parameter is required and specifies the name of the field in the search index. The value is used as a column name. Type: String.

    A field in a search index can be a primary key column or an attribute column of the data table for which the search index is created.

  • fieldType: This parameter is required and specifies the type of the field. Specify the type in the FieldType.XXX format. For more information, see Data type mappings.

  • array: This parameter is optional and specifies whether the value is an array. Type: Boolean.

    If you set this parameter to true, the column stores data as an array. Data written to the column must be a JSON array. Example: ["a","b","c"].

    The values of fields of the Nested type are arrays. If you set fieldType to Nested, skip this parameter.

  • analyzer: This parameter is optional and specifies the type of the analyzer that you want to use. If you set the fieldType parameter to Text, you can configure this parameter. If you do not configure this parameter, the default analyzer type single-word tokenization is used. For more information, see Tokenization.

  • isVirtualField: This parameter is optional and specifies whether the field is a virtual column. Type: Boolean. Default value: false. You must set this parameter to true for virtual columns. For more information, see Virtual columns.

  • sourceFieldName: This parameter is optional and specifies the name of the source field to which the virtual column is mapped in the data table. Type: String. If you set the isVirtualField parameter to true, you must configure this parameter.

  • dateFormats: This parameter is optional and specifies the format of dates. Type: String. If you set the fieldType parameter to Date, you must configure this parameter. For more information, see Date data type.

  • vectorOptions: This parameter is optional and specifies the properties of vector fields. If you set the fieldType parameter to Vector, you must configure this parameter. You can use the following parameters to specify the properties of vector fields:

    • dataType: the type of vector data. Only float32 is supported. If you want to use other types of vector data, submit a ticket.

    • dimension: the number of dimensions of the vector. For information about the limits on the number of dimensions of a vector, see Search index limits.

    • vectorOptions: This parameter is optional and specifies the properties of vector fields. If you set the fieldType parameter to Vector, you must configure this parameter. You can use the following parameters to specify the properties of vector fields:

      • dataType: the type of vector data. Only float32 is supported. If you want to use other types of vector data, submit a ticket.

      • dimension: the dimension of the vector. For information about the limits on the dimension of a vector, see Search index limits.

      • metricType: the algorithm that you want to use to measure the distance between vectors. Valid values: euclidean, cosine, and dot_product.

        • euclidean: the Euclidean distance algorithm that measures the shortest path between two vectors in a multi-dimensional space. The Euclidean distance algorithm in Tablestore does not perform the final square root calculation to achieve better performance. A greater value that is obtained by using the Euclidean distance algorithm indicates a higher similarity between two vectors.

        • cosine: the cosine similarity algorithm that calculates the cosine of the angle between two vectors in a vector space. A greater value that is obtained by using the cosine similarity algorithm indicates a higher similarity between two vectors. In most cases, the algorithm is used to calculate the similarity between text data.

        • dot_product: the dot product algorithm that multiplies the corresponding coordinates of two vectors of the same dimension and adds the products. A greater value that is obtained by using the dot product algorithm indicates a higher similarity between two vectors.

        For more information, see Appendix: distance measurement algorithms for vectors.

indexSetting

The settings of the search index, including routingFields.

routingFields: This parameter is optional and specifies custom routing fields. You can specify specific primary key columns as routing fields. Tablestore distributes data that is written to a search index across different partitions based on the specified routing fields. The data whose routing field values are the same is distributed to the same partition.

indexSort

The presorting settings of the search index, including sorters. If the indexSort parameter is left empty, field values are sorted by primary key.

Note

You can skip the presorting settings for search indexes that contain fields of the Nested type.

sorters: This parameter is optional and specifies the presorting method for the search index. PrimaryKeySort and FieldSort are supported. For more information, see Perform sorting and paging.

timeToLive

This parameter is optional and specifies the retention period of data in the search index. Unit: seconds. Default value: -1.

If the retention period of data exceeds the value of the timeToLive parameter, the data expires. Tablestore automatically deletes expired data.

The value of this parameter must be greater than or equal to 86400. A value of 86400 specifies one day. You can also set this parameter to -1, which specifies that data never expires.

For information about how to use the time to live (TTL) feature for search indexes, see Specify the TTL of a search index.

Usage notes

  • To perform full-text search, you can set the type of the field in a search index to Text and specify the type of the analyzer for the Text field. Then, you can use match query or match phrase query to query data. For information about tokenization, see Tokenization.

  • You can map columns in data tables to virtual columns in search indexes to query new fields and the data of new field types without the need to modify the storage schema and the data in the tables. For more information, see Virtual columns.

  • You can map columns of the String or Integer type in data tables to fields of the Date type in search indexes. This way, you can query data of the Date type. For more information, see Date data type.

  • If you want to perform an approximate nearest neighbor search based on vectors, you can set the type of the field to Vector in a search index and use the k-nearest neighbor (KNN) vector query feature to query data. For more information, see Overview.

  • If you want to store data as arrays in a search index, you can store the data in columns of the String type in the data table, and set the type of the mapped fields in the search index to Array.

    When you write data to the data table, the values of the columns to which the fields of the Array type in the search index is mapped must be JSON arrays. Example: ["a","b","c"].

    Note

    The fields of search indexes support the Array type. However, the columns of data tables do not support the Array type. For more information, see ARRAY and Nested field types.

  • If you want to use a field to store data that has a hierarchical relationship or one-to-many relationship, you can set the type of the field in the search index to Nested, and use nested query to query data.

    When you write data to the data table, the values of the columns to which the fields of the Nested type in the search index is mapped must be JSON arrays. Example: [{"tagName":"tag1", "score":0.8}, {"tagName":"tag2", "score":0.2}].

  • If the data that you want to store in the data table is geographical location information, you can set the type of the fields in the search index to Geo-point, and use geo query, geo-bounding box query, or geo-polygon query to query data.

  • By default, the system uses the partition key as the routing key. You can specify custom routing fields to use other primary key columns as the routing key. This reduces the query latency because the partitions scanned by Tablestore are narrowed down. For more information, see How do I use routing fields?

  • By default, if you use a search index to query data, the query results are returned and sorted in ascending order of the primary key. You can specify a presorting method to sort the query results based on the values of specific columns or in descending order of the primary key. For more information, see Index presorting.

  • You can use the TTL feature to automatically delete historical data in a search index. This reduces storage usage and storage costs. For more information, see Specify the TTL of a search index.

Methods

You can create a search index in the Tablestore console or by using the Tablestore CLI or Tablestore SDKs.

Before you create a search index, make sure that the following preparations are complete:

Use the Tablestore console

You can create a search index in the Tablestore console.

  1. Go to the Indexes tab.

    1. Log on to the Tablestore console.

    2. In the top navigation bar, select a resource group and a region.

    3. On the Overview page, click the name of the instance that you want to manage or click Manage Instance in the Actions column of the instance.

    4. On the Tables tab of the Instance Details tab, click the name of the data table for which you want to create a search index or click Indexes in the Actions column of the data table.

  2. On the Indexes tab, click Create Search Index.

  3. In the Create Index dialog box, configure the parameters of the search index.

    image.png

    1. Retain the default index name that is generated by the system or enter a custom index name based on your business requirements.

    2. Configure the schema of the search index.

      Important

      The values of the Field Name and Field Type parameters must match those of the data table. For more information about the mappings between the field types in data tables and those in search indexes, see Data type mappings.

      • If you set the Schema Generation Type parameter to Manual, specify the names and types for the fields. Specify whether to turn on Array for each field.

      • If you set the Schema Generation Type parameter to Auto Generate, the system automatically uses the primary key columns and attribute columns of the data table as index fields. You can select field types and specify whether to turn on Array based on your business requirements.

      Note

      To optimize performance in some cases, you can use virtual columns. For more information, see Virtual columns.

    3. If you want to configure advanced settings, such as the Routing Key, Time to Live, and Pre-sorting parameters, turn on Advanced Settings. The following table describes the parameters.

      Parameter

      Description

      Routing Key

      The custom routing fields. You can select one or more primary key columns as the routing fields. Tablestore calculates the distribution of the index data based on the values of the routing fields. The records with the same values of the routing fields are distributed to the same data partition.

      Time to Live

      The retention period of data in the search index. The default value is -1, which indicates that the data never expires. The value of this parameter must be greater than or equal to 86400. A value of 86400 indicates one day. You can also set this parameter to -1.

      Specify a value that is greater than or equal to 86400 if you want the system to automatically clear the historical data in the search index. If the retention period exceeds the TTL value, the data expires and Tablestore automatically deletes expired data.

      Pre-sorting

      The default order in which the data is returned.

      Valid values: Default and Custom. A value of Default indicates that the data is sorted based on the primary key. A value of Custom indicates that the data is sorted based on the field that you specify. Configure the Pre-sorting parameter based on your business requirements.

      Important

      Search indexes that contain fields of the Nested type do not support index presorting.

  4. Click OK.

    After the search index is created, click Index Details in the Actions column of the search index. You can view the information about the search index, such as the metering information and index fields.

Use the Tablestore CLI

You can run the create_search_index command by using the Tablestore CLI to create a search index. For more information, see Search index.

  1. Run the create_search_index command to create a search index named search_index.

    create_search_index -n search_index
  2. Enter the index schema as prompted:

    The index schema includes the settings of the search index (IndexSetting), the list of field schemas (FieldSchemas), and the presorting settings for the search index (IndexSort). For more information about the schema of a search index, see Create a search index.

     {
    
        "IndexSetting": {
            "RoutingFields": null
        },
        "FieldSchemas": [
            {
                "FieldName": "gid",
                "FieldType": "LONG",
                "Index": true,
                "EnableSortAndAgg": true,
                "Store": true,
                "IsArray": false,
                "IsVirtualField": false
            },
            {
                "FieldName": "uid",
                "FieldType": "LONG",
                "Index": true,
                "EnableSortAndAgg": true,
                "Store": true,
                "IsArray": false,
                "IsVirtualField": false
            },
            {
                "FieldName": "col2",
                "FieldType": "LONG",
                "Index": true,
                "EnableSortAndAgg": true,
                "Store": true,
                "IsArray": false,
                "IsVirtualField": false
            },
            {
                "FieldName": "col3",
                "FieldType": "TEXT",
                "Index": true,
                "Analyzer": "single_word",
                "AnalyzerParameter": {
                    "CaseSensitive": true,
                    "DelimitWord": null
                },
                "EnableSortAndAgg": false,
                "Store": true,
                "IsArray": false,
                "IsVirtualField": false
            },
            {
                "FieldName": "col1",
                "FieldType": "KEYWORD",
                "Index": true,
                "EnableSortAndAgg": true,
                "Store": true,
                "IsArray": false,
                "IsVirtualField": false
            },
            {
                "FieldName": "col3V",
                "FieldType": "LONG",
                "Index": true,
                "EnableSortAndAgg": true,
                "Store": true,
                "IsArray": false,
                "IsVirtualField": true,
                "SourceFieldNames": [
                    "col3"
                ]
            }
        ]
    }

Use Tablestore SDKs

You can create a search index by using the following Tablestore SDKs: Tablestore SDK for Java, Tablestore SDK for Go, Tablestore SDK for Python, Tablestore SDK for Node.js, Tablestore SDK for .NET, and Tablestore SDK for PHP. In this example, Tablestore SDK for Java is used to create a search index.

Create a search index by using the default configurations

The following sample code provides an example on how to create a search index by using the default configurations. In this example, the search index consists of the following columns: the Col_Keyword column of the Keyword type, the Col_Long column of the Long type, and the Col_Vector column of the Vector type. The data in the search index is presorted based on the primary key of the data table and data in the search index never expires.

private static void createSearchIndex(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>"); 
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Specify the names and types of the fields. 
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG)));
            // Set the type of the field to Vector. 
            new FieldSchema("Col_Vector", FieldType.VECTOR).setIndex(true)
                    // Set the number of dimensions for the vector to 4 and the distance measurement algorithm for vectors to the dot product algorithm. 
                    .setVectorOptions(new VectorOptions(VectorDataType.FLOAT_32, 4, VectorMetricType.DOT_PRODUCT))
    ));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index. 
    client.createSearchIndex(request); 
}

Create a search index with the indexSort parameter specified

The following sample code provides an example on how to create a search index with the indexSort parameter specified. In this example, the search index consists of the following columns: the Col_Keyword column of the String type, the Col_Long column of the Long type, the Col_Text column of the Text type, and the Timestamp column of the Long type. The data in the search index is presorted based on the Timestamp column.

private static void createSearchIndexWithIndexSort(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>"); 
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            new FieldSchema("Col_Keyword", FieldType.KEYWORD),
            new FieldSchema("Col_Long", FieldType.LONG),
            new FieldSchema("Col_Text", FieldType.TEXT),
            new FieldSchema("Timestamp", FieldType.LONG)
                    .setEnableSortAndAgg(true)));
    // Presort data based on the Timestamp column. 
    indexSchema.setIndexSort(new Sort(
            Arrays.<Sort.Sorter>asList(new FieldSort("Timestamp", SortOrder.ASC))));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index. 
    client.createSearchIndex(request);
}

Create a search index with the TTL specified

Important

Make sure that update operations on the data table are disabled.

The following sample code provides an example on how to create a search index with the TTL specified. In this example, the search index consists of the following columns: the Col_Keyword column of the String type and the Col_Long column of the Long type. The TTL of the search index is seven days.

// Use Tablestore SDK for Java V5.12.0 or later to create a search index. 
public static void createIndexWithTTL(SyncClient client) {
    int days = 7;
    CreateSearchIndexRequest createRequest = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    createRequest.setTableName("<TABLE_NAME>");
    // Specify the name of the search index. 
    createRequest.setIndexName("<SEARCH_INDEX_NAME>");
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Specify the names and types of the fields. 
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG)));
    createRequest.setIndexSchema(indexSchema);
    // Specify the TTL for the search index. 
    createRequest.setTimeToLiveInDays(days);
    // Call the client to create the search index. 
    client.createSearchIndex(createRequest);
}

Create a search index with virtual columns specified

The following sample code provides an example on how to create a search index that consists of the Col_Keyword and Col_Long columns. Each of the columns has a virtual column. The Col_Keyword_Virtual_Long column is mapped to the Col_Keyword column in the data table, and the Col_Long_Virtual_Keyword column is mapped to the Col_Long column in the data table.

private static void createSearchIndex(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Specify the name of the data table. 
    request.setTableName("<TABLE_NAME>"); 
    // Specify the name of the search index. 
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
        // Specify the name and type of the field. 
        new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
        // Specify the name and type of the field. 
        new FieldSchema("Col_Keyword_Virtual_Long", FieldType.LONG) 
             // Specify whether the field is a virtual column. 
            .setVirtualField(true) 
             // Specify the name of the source field to which the virtual column is mapped in the data table. 
            .setSourceFieldName("Col_Keyword"), 
        new FieldSchema("Col_Long", FieldType.LONG),
        new FieldSchema("Col_Long_Virtual_Keyword", FieldType.KEYWORD)
            .setVirtualField(true)
            .setSourceFieldName("Col_Long")));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index. 
    client.createSearchIndex(request); 
}

What to do next

After you create a search index, you can use the search index to query, analyze, and export data.

Operation

Available features

Query data

Select a query method based on your business requirement.

When you call the Search operation to query data, you can sort or paginate the rows that meet the query conditions. For more information, see Perform sorting and paging.

Analyze data

Aggregation

Export data

Parallel scan

References

After you create a search index, you can perform operations on the search index based on your business requirements.

  • You can specify the TTL for a search index to delete historical data in the search index or extend the retention period of data in the search index. For more information, see Configure the TTL of a search index.

  • You can dynamically modify the schema of a search index to add, update, or remove index columns in the search index. For more information, see Dynamically modify the schema of a search index.

  • You can call the ListSearchIndex operation to query all search indexes that are created for a data table. For more information, see Query search indexes.

  • You can call the DescribeSearchIndex operation to query the description of a search index, such as the field information and search index configurations. For more information, see Query the description of a search index.

  • You can delete a search index that you no longer require. For more information, see Delete a search index.

  • You can also use the SQL query feature or compute engines, such as MaxCompute, Spark, Hive, HadoopMR, Function Compute, Flink, and PrestoDB, to compute and analyze data in tables. For more information, see SQL query and Overview.

Appendix: distance measurement algorithms for vectors

The following table describes the algorithms that you can use to measure the distance for vectors. A greater value that is obtained by using an algorithm indicates a higher similarity.

MetricType

Formula

Performance

Description

Euclidean distance

image

Relatively high

Measures the shortest path between two vectors in a multi-dimensional space. The Euclidean distance algorithm in Tablestore does not perform the final square root calculation to achieve better performance. A greater value that is obtained by using the Euclidean distance algorithm indicates a higher similarity between two vectors.

Dot product

image

Highest

Multiplies the corresponding coordinates of two vectors of the same dimension and adds the products. A greater value that is obtained by using the dot product algorithm indicates a higher similarity between two vectors.

Float32 vectors must be normalized before they are written to tables. For example, you can use the L2 norm to normalize Float32 vectors. If you do not normalize Float32 vectors before they are written to tables, you may encounter issues such as inaccurate query results, slow construction of vector indexes, and poor query performance.

Cosine similarity

image

Relatively low

Calculates the cosine of the angle between two vectors in a vector space. A greater value that is obtained by using the cosine similarity algorithm indicates a higher similarity between two vectors. In most cases, the algorithm is used to calculate the similarity between text data.

If 0 is used as a divisor, the cosine similarity cannot be calculated because 0 cannot be used as a divisor. Therefore, the sum of squares of Float32 vectors cannot be 0.

It is complex to calculate the cosine similarity. We recommend that you normalize vectors before you write data to tables and use the dot product algorithm to measure the distance between vectors.

The following sample code provides an example on how to normalize vectors:

  public static float[] l2normalize(float[] v, boolean throwOnZero) {
    double squareSum = 0.0f;
    int dim = v.length;
    for (float x : v) {
      squareSum += x * x;
    }
    if (squareSum == 0) {
      if (throwOnZero) {
        throw new IllegalArgumentException("can't normalize a zero-length vector");
      } else {
        return v;
      }
    }
    double length = Math.sqrt(squareSum);
    for (int i = 0; i < dim; i++) {
      v[i] /= length;
    }
    return v;
  }