Create a search index by using Tablestore SDK for Java

Use the CreateSearchIndex method to create a search index for a data table. A data table supports multiple search indexes. When you create a search index, add the fields that you want to query to the index. You can also configure advanced options, such as routing fields and presorting.

Prerequisites

Initialize a Tablestore client. For more information, see Initialize a Tablestore client.
Create a data table that meets the following conditions. For more information, see Create a data table.
- The max versions must be 1.
- The time to live (TTL) is -1 or updates on the data table are disabled.

Usage notes

The data types of the fields in a search index must match the data types of the fields in the data table. For more information, see Data types.
To set a time to live (TTL) value other than -1 for a search index, you must disable the UpdateRow operation for the data table. The TTL value of the search index must be less than or equal to the TTL value of the data table. For more information, see Lifecycle management.
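
The TTL rules above can be checked locally before a request is sent. The following is a minimal sketch, not part of Tablestore SDK for Java: the class SearchIndexTtlSketch and its methods are hypothetical names introduced for illustration. It assumes that a data table TTL of -1 places no upper bound on the search index TTL, and it uses the minimum TTL of 86400 seconds that is described for the timeToLive parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical local pre-check; not part of Tablestore SDK for Java. */
final class SearchIndexTtlSketch {
    /** A TTL of -1 means that the data never expires. */
    static final long NEVER_EXPIRES = -1L;
    /** Smallest TTL other than -1: 86400 seconds (one day). */
    static final long MIN_TTL_SECONDS = TimeUnit.DAYS.toSeconds(1);

    private SearchIndexTtlSketch() {}

    /** Converts whole days to seconds. */
    static long daysToSeconds(int days) {
        return TimeUnit.DAYS.toSeconds(days);
    }

    /**
     * Lists the rules that a search index TTL would violate.
     * An index TTL of -1 returns an empty list because the rules
     * checked here apply only to TTL values other than -1.
     */
    static List<String> violations(long indexTtl, long tableTtl, boolean tableUpdatesDisabled) {
        List<String> problems = new ArrayList<>();
        if (indexTtl == NEVER_EXPIRES) {
            return problems;
        }
        if (indexTtl < MIN_TTL_SECONDS) {
            problems.add("index TTL must be -1 or at least " + MIN_TTL_SECONDS + " seconds");
        }
        if (!tableUpdatesDisabled) {
            problems.add("updates to the data table must be disabled");
        }
        // Assumption: a table TTL of -1 places no upper bound on the index TTL.
        if (tableTtl != NEVER_EXPIRES && indexTtl > tableTtl) {
            problems.add("index TTL must not exceed the data table TTL");
        }
        return problems;
    }

    public static void main(String[] args) {
        long sevenDays = daysToSeconds(7);
        System.out.println(sevenDays);                                  // prints 604800
        System.out.println(violations(sevenDays, NEVER_EXPIRES, true)); // prints []
        System.out.println(violations(3600, NEVER_EXPIRES, false));
    }
}
```

A check like this only catches configuration mistakes early. The service remains the authority on whether a TTL value is accepted.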

API

public class CreateSearchIndexRequest implements Request {
    /** The name of the data table. */
    private String tableName;
    /** The name of the search index. */
    private String indexName;
    /** The schema of the search index. */
    private IndexSchema indexSchema;
    /**
     * You do not need to set this parameter in most cases.
     * Set this parameter using the setter method only when you dynamically modify the search index schema. This parameter specifies the name of the source search index for reindexing.
     */
    private String sourceIndexName;
    /** The TTL for index data, in seconds. After you create the search index, you can call the UpdateSearchIndex operation to dynamically change this parameter. */
    private Integer timeToLive;
}

public class IndexSchema implements Jsonizable {
    /** The settings of the index. */
    private IndexSetting indexSetting;
    /** The settings for all fields in the index. */
    private List<FieldSchema> fieldSchemas;
    /** The custom presorting method for the index. */
    private Sort indexSort;
}

Parameters

When you create a search index, you must specify the data table name (tableName), search index name (indexName), and index schema (indexSchema). The indexSchema includes field schemas (fieldSchemas), index settings (indexSetting), and index presorting settings (indexSort). The following table describes the parameters.

Parameter	Description
tableName	The name of the data table.
indexName	The name of the search index.
fieldSchemas	A list of index fields. Each fieldSchema contains the following parameters:
- fieldName (Required): The name of the field to be indexed, which is the column name. Type: String. A field in a search index can be a primary key column or an attribute column.
- fieldType (Required): The data type of the field. Specify the type in the FieldType.XXX format.
  Note: To store and query data with multilayer logical relationships, you can use the Nested type to store data. To store and query JSON-formatted data, you can store the data as strings in the data table. Then, use the array and Nested types in the search index to flexibly query the JSON data. For applications that require geo-queries, you can use the Geo-point field type to store data.
- index (Optional): Specifies whether to create an index for the field. Type: Boolean. The default value is true, which means an inverted index or a spatial index is created for the column. If set to false, no index is created for this column.
- enableHighlighting (Optional): Specifies whether to enable the summary and highlighting feature. Type: Boolean. The default value is false. To use the summary and highlighting feature, set this parameter to true. Only fields of the Text type support this feature.
- analyzer (Optional): The tokenizer type. You can set this parameter for fields of the Text type. If you do not set this parameter, single-word tokenization is used by default.
- analyzerParameter (Optional): The parameter settings for the tokenizer. Set the parameters based on the tokenizer type. This parameter is required if you set the analyzer parameter.
- enableSortAndAgg (Optional): Specifies whether to enable sorting and aggregation. Type: Boolean. The default value is true. You can sort only on fields where enableSortAndAgg is set to true.
  Important: Fields of the Text type do not support sorting and aggregation. To sort or aggregate a Text field, you can use a virtual column of the Keyword type. For more information, see Virtual columns.
- isArray (Optional): Specifies whether the field is an array. Type: Boolean. If set to true, the column is an array. Data written to the column must be in the JSON array format, such as `["a","b","c"]`. Because the Nested type is an array, you do not need to set this parameter when fieldType is Nested.
- subFieldSchemas (Optional): For a field of the Nested type, use this parameter to set the index types for the sub-columns. The type is a list of FieldSchema.
- isVirtualField (Optional): Specifies whether the field is a virtual column. Type: Boolean. The default value is false. To use a virtual column, set this parameter to true.
- sourceFieldName (Optional): The name of the source field in the data table. Type: String. This parameter is required if isVirtualField is set to true.
- dateFormats (Optional): The date format. Type: String. This parameter is required for fields of the Date type. For more information, see Date and time types.
- vectorOptions (Optional): The properties of the vector field. This parameter is required for fields of the Vector type. It includes the following parameters:
  - dataType: The data type of the vector. Currently, only float32 is supported. For other data type requirements, submit a ticket.
  - dimension: The number of vector dimensions. The maximum number of dimensions for a Vector field is 4,096.
  - metricType: The algorithm to measure the distance between vectors. Supported algorithms: Euclidean distance (euclidean), cosine similarity (cosine), and dot product (dot_product). For information about how to select a distance metric algorithm, see Distance metric algorithms.
    - Euclidean distance (euclidean): The straight-line distance between two vectors in a multidimensional space. For performance reasons, the Euclidean distance algorithm in Tablestore does not perform the final square root calculation. A larger Euclidean distance score indicates a higher similarity between two vectors.
    - Cosine similarity (cosine): The cosine of the angle between two vectors in a vector space. A higher cosine similarity score indicates a higher similarity between two vectors. This is often used to calculate the similarity of text data.
    - Dot product (dot_product): Multiplies the corresponding coordinates of two vectors of the same dimension and then adds the results. A higher dot product score indicates a higher similarity between two vectors.
- jsonType (Optional): The index type for JSON data. Valid values: OBJECT and NESTED. This parameter is required when the field type is JSON.
indexSetting	Index settings, which include the routingFields setting.
- routingFields (Optional): The custom routing fields. You can select some primary key columns as routing fields. In most cases, you only need to set one. If you set multiple routing keys, the system concatenates their values into a single value. When you write data to the index, the system uses the routing field values to determine the data distribution. Records with the same routing field values are indexed into the same data partition.
indexSort	Index presorting settings, which include the sorters setting. If you do not set this, data is sorted by primary key by default.
  Note: The indexSort parameter is not supported for indexes that contain Nested types. No presorting is performed.
- sorters (Optional): A list of presorting methods for the index. You can sort by primary key or by field value. For more information about sorting, see Sorting and pagination.
  - PrimaryKeySort sorts data by primary key. It includes the following setting:
    - order: The sort order. You can sort in ascending or descending order. The default is ascending (SortOrder.ASC).
  - FieldSort sorts data by field value. Only fields that are indexed and have sorting and aggregation enabled can be used for presorting. It includes the following settings:
    - fieldName: The name of the field to sort by.
    - order: The sort order. You can sort in ascending or descending order. The default is ascending (SortOrder.ASC).
    - mode: The sorting method to use when a field has multiple values.
sourceIndexName	Optional. You do not need to set this parameter in most cases. Set this parameter using the setter method only when you dynamically modify the search index schema. This parameter specifies the name of the source search index for reindexing.
timeToLive	Optional. The time to live (TTL), which is the data retention period in seconds. The default value is -1, which means the data never expires. The value must be at least 86400 (one day) or -1. If the data retention period exceeds the specified TTL, the system automatically deletes the expired data. For more information about how to use the lifecycle of a search index, see Lifecycle management.
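
To make the three metricType options described for vectorOptions more concrete, the following plain-Java sketch computes the underlying quantities: the dot product, the Euclidean distance without the final square root, and the cosine of the angle between two vectors. The class VectorMetricsSketch is a hypothetical name introduced for illustration. It is not Tablestore's internal implementation, and it does not reproduce how Tablestore turns these quantities into query scores.

```java
/** Illustrative only; not part of Tablestore SDK for Java. */
final class VectorMetricsSketch {
    private VectorMetricsSketch() {}

    private static void requireSameDimension(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
    }

    /** Multiplies corresponding coordinates and adds the results. */
    static double dotProduct(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean distance without the final square root. */
    static double squaredEuclidean(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Cosine of the angle between two non-zero vectors. */
    static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dotProduct(a, b) / norms;
    }

    public static void main(String[] args) {
        // Two 4-dimensional float vectors, matching the dimension used in the examples.
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {1f, 0f, 0f, 1f};
        System.out.println(dotProduct(a, b));       // prints 5.0
        System.out.println(squaredEuclidean(a, b)); // prints 22.0
        System.out.println(cosine(a, b));
    }
}
```

The dot product depends on vector length as well as direction, whereas the cosine depends on direction only, which is one reason the choice of metric matters when vectors are not normalized.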

Examples

Create a search index with default configurations

The following example shows how to create a search index. The index contains three columns: Col_Keyword (KEYWORD type), Col_Long (LONG type), and Col_Vector (VECTOR type). The data is presorted by the primary key of the data table and never expires.

private static void createSearchIndex(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Set the data table name.
    request.setTableName("<TABLE_NAME>"); 
    // Set the search index name.
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Set the field name and type.
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG),
            // Set the vector type.
            new FieldSchema("Col_Vector", FieldType.VECTOR).setIndex(true)
                    // The vector dimension is 4, and the similarity algorithm is dot product.
                    .setVectorOptions(new VectorOptions(VectorDataType.FLOAT_32, 4, VectorMetricType.DOT_PRODUCT))
    ));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index.
    client.createSearchIndex(request); 
}

Create a search index and specify IndexSort

The following example shows how to create a search index. The index contains four columns: Col_Keyword (KEYWORD type), Col_Long (LONG type), Col_Text (TEXT type), and Timestamp (LONG type). The data is presorted by the Timestamp column.

private static void createSearchIndexWithIndexSort(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Set the data table name.
    request.setTableName("<TABLE_NAME>"); 
    // Set the search index name.
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            new FieldSchema("Col_Keyword", FieldType.KEYWORD),
            new FieldSchema("Col_Long", FieldType.LONG),
            new FieldSchema("Col_Text", FieldType.TEXT),
            new FieldSchema("Timestamp", FieldType.LONG)
                    .setEnableSortAndAgg(true)));
    // Set presorting by the Timestamp column.
    indexSchema.setIndexSort(new Sort(
            Arrays.<Sort.Sorter>asList(new FieldSort("Timestamp", SortOrder.ASC))));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index.
    client.createSearchIndex(request);
}
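
The presorting constraints from the Parameters section (the sort field must be indexed and have sorting and aggregation enabled, Text fields cannot be sorted, and indexSort is not supported when the index contains Nested fields) can be checked before the request is built. The following is a minimal sketch under stated assumptions: PresortCheckSketch, its Field class, and its Type enum are simplified stand-ins introduced for illustration, not SDK classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical local pre-check; these types are simplified stand-ins, not SDK classes. */
final class PresortCheckSketch {
    enum Type { KEYWORD, LONG, TEXT, NESTED }

    /** Simplified stand-in for one field definition in an index schema. */
    static final class Field {
        final String name;
        final Type type;
        final boolean index;
        final boolean enableSortAndAgg;

        Field(String name, Type type, boolean index, boolean enableSortAndAgg) {
            this.name = name;
            this.type = type;
            this.index = index;
            this.enableSortAndAgg = enableSortAndAgg;
        }
    }

    private PresortCheckSketch() {}

    /** Returns an empty Optional if sortField can be presorted, or the reason why it cannot. */
    static Optional<String> whyNotPresortable(List<Field> fields, String sortField) {
        Field target = null;
        for (Field field : fields) {
            if (field.type == Type.NESTED) {
                return Optional.of("indexSort is not supported for indexes that contain Nested fields");
            }
            if (field.name.equals(sortField)) {
                target = field;
            }
        }
        if (target == null) {
            return Optional.of("field is not part of the index: " + sortField);
        }
        if (target.type == Type.TEXT) {
            return Optional.of("Text fields do not support sorting and aggregation");
        }
        if (!target.index) {
            return Optional.of("field is not indexed: " + sortField);
        }
        if (!target.enableSortAndAgg) {
            return Optional.of("sorting and aggregation are disabled for: " + sortField);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Mirrors the four fields of the IndexSort example.
        List<Field> fields = Arrays.asList(
                new Field("Col_Keyword", Type.KEYWORD, true, true),
                new Field("Col_Long", Type.LONG, true, true),
                new Field("Col_Text", Type.TEXT, true, false),
                new Field("Timestamp", Type.LONG, true, true));
        System.out.println(whyNotPresortable(fields, "Timestamp").isPresent()); // prints false
        System.out.println(whyNotPresortable(fields, "Col_Text").orElse("ok"));
    }
}
```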

Create a search index and set the lifecycle

Important

Make sure that updates to the data table are disabled.

The following example shows how to create a search index. The index contains two columns: Col_Keyword (KEYWORD type) and Col_Long (LONG type). The lifecycle of the search index is set to 7 days.

// Use Tablestore SDK for Java 5.12.0 or later.
public static void createIndexWithTTL(SyncClient client) {
    int days = 7;
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Set the data table name.
    request.setTableName("<TABLE_NAME>");
    // Set the search index name.
    request.setIndexName("<SEARCH_INDEX_NAME>");
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Set the field name and type.
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG)));
    request.setIndexSchema(indexSchema);
    // Set the TTL for the search index.
    request.setTimeToLiveInDays(days);
    // Call the client to create the search index.
    client.createSearchIndex(request);
}

Create a search index and specify virtual columns

The following example shows how to create a search index that contains two columns: Col_Keyword (KEYWORD type) and Col_Long (LONG type). The index also includes two virtual columns: Col_Keyword_Virtual_Long (LONG type), which is mapped to the Col_Keyword column in the data table, and Col_Long_Virtual_Keyword (KEYWORD type), which is mapped to the Col_Long column in the data table.

private static void createSearchIndexWithVirtualColumns(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Set the data table name.
    request.setTableName("<TABLE_NAME>"); 
    // Set the search index name.
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
        // Set the field name and type.
        new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
        // Define a virtual column of the LONG type.
        new FieldSchema("Col_Keyword_Virtual_Long", FieldType.LONG)
            // Mark the field as a virtual column.
            .setVirtualField(true)
            // Map the virtual column to its source field in the data table.
            .setSourceFieldName("Col_Keyword"),
        new FieldSchema("Col_Long", FieldType.LONG),
        new FieldSchema("Col_Long_Virtual_Keyword", FieldType.KEYWORD)
            .setVirtualField(true)
            .setSourceFieldName("Col_Long")));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index.
    client.createSearchIndex(request); 
}

Enable summary and highlighting when you create a search index

The following example shows how to create a search index. The index contains three columns: Col_Keyword (KEYWORD type), Col_Long (LONG type), and Col_Text (TEXT type). The summary and highlighting feature is enabled for the Col_Text column.

private static void createSearchIndexWithHighlighting(SyncClient client) {
    CreateSearchIndexRequest request = new CreateSearchIndexRequest();
    // Set the data table name.
    request.setTableName("<TABLE_NAME>"); 
    // Set the search index name.
    request.setIndexName("<SEARCH_INDEX_NAME>"); 
    IndexSchema indexSchema = new IndexSchema();
    indexSchema.setFieldSchemas(Arrays.asList(
            // Set the field name and type.
            new FieldSchema("Col_Keyword", FieldType.KEYWORD), 
            new FieldSchema("Col_Long", FieldType.LONG),
            // Enable the summary and highlighting feature for the field.
            new FieldSchema("Col_Text", FieldType.TEXT).setIndex(true).setEnableHighlighting(true)
    ));
    request.setIndexSchema(indexSchema);
    // Call the client to create the search index.
    client.createSearchIndex(request); 
}

References

After you create a search index, you can select an appropriate query type to perform multi-dimensional data queries. Search index query types include Term query, Terms query, Match all query, Match query, Match phrase query, Prefix query, Suffix query, Range query, Wildcard query, Geo-query, Boolean query, Vector search, Nested query, and Exists query.
When you query data, you can perform sorting and pagination, highlighting, or collapse (deduplication) operations on the result set.
After you create a search index, you can manage it as required. Management operations include dynamically modifying a schema, lifecycle management, listing search indexes, querying search index descriptions, and deleting a search index.
To perform data analytics, such as finding the maximum or minimum value, calculating a sum, or counting rows, you can use the aggregation feature or the SQL query feature.
To quickly export data without preserving the order of the entire result set, you can use the parallel scan feature.