Use search indexes to manage and query data - Tablestore

Search indexes are built on inverted indexes and column stores, and support multi-dimensional queries and statistical analysis in big data scenarios. This topic describes how to use search indexes with Tablestore SDK for Python.

Manage indexes

The following table describes the management operations supported by search indexes.

Operation	Description
Create a search index	Create a search index for a data table.
List search indexes	Query search indexes that are created for a table.
Update the configurations of a search index	Update the time to live (TTL) of a search index.
Query the description of a search index	Query the description of a search index, including the information about the fields in the search index and configurations of the search index.
Delete a search index	Delete a search index that you no longer need.
Specify the TTL of a search index	Delete historical data or extend the retention period of data in a search index based on business requirements.

Data types

In addition to basic data types, such as Long, Double, Boolean, Keyword, Text, Date, Geopoint, and Vector, search indexes support two special data types: Array and Nested. For more information, see Data types.
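
The following minimal sketch shows how values for Array and Nested fields might be prepared before a row is written. It assumes that such values are stored in the data table as JSON-formatted strings; confirm this in the Data types topic before you rely on it. The helper names (`array_value`, `nested_value`) and the column names are hypothetical and are not part of Tablestore SDK for Python.

```python
import json
from typing import Any, Dict, List, Sequence


def array_value(items: Sequence[Any]) -> str:
    """Serialize a list of basic values (assumption: Array fields hold a JSON array string)."""
    return json.dumps(list(items))


def nested_value(children: List[Dict[str, Any]]) -> str:
    """Serialize child rows (assumption: Nested fields hold a JSON array of objects)."""
    return json.dumps(children)


# Hypothetical attribute columns for one row.
row_columns = {
    "tags": array_value(["database", "search"]),
    "reviews": nested_value([
        {"user": "u1", "score": 5},
        {"user": "u2", "score": 3},
    ]),
}
```

Each entry in `reviews` corresponds to one child row of the Nested field, which is the unit that a nested query evaluates.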

Data query

The following table describes the query types supported by search indexes. Select a query type based on your business requirements.

Feature	Query type	Description
Basic query	Match all query	This query matches all rows in a table. Use it to obtain the total number of rows in the table or to return an arbitrary set of rows.
	Term query	This query uses exact matches to retrieve data from a data table. A term query is similar to a query based on string matching.
	Terms query	This query allows you to specify multiple keywords. A row is returned if the column value matches at least one of the keywords. Terms queries can be used in the same manner as the IN operator in SQL statements.
	Prefix query	This query retrieves rows in which the value of the specified field starts with the specified prefix.
	Range query	This query retrieves data that falls within the specified range.
	Wildcard query	This query retrieves data that matches a string containing wildcard characters.
	Exists query	This query, also called a NULL query or NULL-value query, is used on sparse data to determine whether a specific column exists in a row.
	Collapse (distinct)	This query collapses the result set on a specified column so that each distinct value of that column appears only once in the returned results, which keeps the results diverse.
	Geo query	Geo queries are classified into three types. Geo-distance query: returns the rows in which the value of the specified field falls within a circular area defined by a central point and a radius. Geo-bounding box query: returns the rows in which the value of the specified field falls within a rectangular area. Geo-polygon query: returns the rows in which the value of the specified field falls within a polygonal area.
	Nested query	This query retrieves the data in the child rows of nested fields.
Data processing	Sorting and paging	You can predefine a sorting method when you create a search index, or specify a sorting method when you use the search index to query data. The rows that meet the query conditions are returned in the predefined or specified order. If the response contains a large number of rows, you can page through the results by using the limit and offset parameters or by using tokens.
	Aggregation	Aggregation operations return the minimum value, maximum value, sum, average, count, distinct count, and percentile statistics, as well as the rows in each group. Results can be grouped by field value, range, geographical location, filter, histogram, or date histogram, and aggregation operations can be nested. You can combine multiple aggregation operations for complex queries.
Query based on a combination of subquery conditions	Boolean query	This query retrieves data from a data table based on a combination of subqueries. Tablestore returns the rows that satisfy the combined subquery conditions.
Full-text search	Match query	This query uses approximate matches to retrieve data from a data table. The query string is tokenized, and rows whose field values contain matching tokens are returned.
	Match phrase query	This query is similar to a match query, except that a match phrase query evaluates the positions of tokens. A row meets the query condition only if the order and positions of the tokens in the row match the specified order and positions.
	Highlight the query results	You can configure highlight parameters to highlight the query strings in the segments of the rows that meet the query conditions.
Vector search	KNN vector query	You can use the k-nearest neighbor (KNN) vector query feature to perform an approximate nearest neighbor search based on vectors. This way, you can find the data items in a large-scale dataset that are most similar to the query vector.
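
To make the matching rules in the table concrete, the following plain-Python sketch models several query types as predicates over in-memory rows. It illustrates the semantics only and does not call Tablestore SDK for Python. The sample rows, the function names, whitespace tokenization, and the OR behavior of `match` are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Hypothetical sample rows. A missing key models a column that is absent in a sparse row.
ROWS: List[Row] = [
    {"id": 1, "city": "hangzhou", "price": 30, "title": "red tea cup"},
    {"id": 2, "city": "shanghai", "price": 55, "title": "tea cup red"},
    {"id": 3, "city": "hangzhou", "price": 80},
    {"id": 4, "city": "shenzhen", "price": 12, "title": "green cup"},
]


def match_all() -> Predicate:
    return lambda row: True


def term(field: str, value: Any) -> Predicate:
    # Exact match on the whole column value.
    return lambda row: field in row and row[field] == value


def terms(field: str, values: Sequence[Any]) -> Predicate:
    # Matches if the value equals at least one keyword, like SQL IN.
    allowed = set(values)
    return lambda row: field in row and row[field] in allowed


def prefix(field: str, head: str) -> Predicate:
    return lambda row: field in row and str(row[field]).startswith(head)


def range_query(field: str, gte: Any = None, lt: Any = None) -> Predicate:
    def pred(row: Row) -> bool:
        if field not in row:
            return False
        value = row[field]
        return (gte is None or value >= gte) and (lt is None or value < lt)
    return pred


def exists(field: str) -> Predicate:
    return lambda row: field in row


def match(field: str, text: str) -> Predicate:
    # Assumption: at least one query token must appear in the field.
    wanted = set(text.split())
    return lambda row: bool(wanted & set(str(row.get(field, "")).split()))


def match_phrase(field: str, phrase: str) -> Predicate:
    # Tokens must appear adjacent and in the same order as in the phrase.
    want = phrase.split()
    def pred(row: Row) -> bool:
        tokens = str(row.get(field, "")).split()
        last_start = len(tokens) - len(want)
        return any(tokens[i:i + len(want)] == want for i in range(last_start + 1))
    return pred


def bool_query(must: Sequence[Predicate] = (), must_not: Sequence[Predicate] = (),
               should: Sequence[Predicate] = ()) -> Predicate:
    def pred(row: Row) -> bool:
        if not all(q(row) for q in must):
            return False
        if any(q(row) for q in must_not):
            return False
        return not should or any(q(row) for q in should)
    return pred


def search(query: Predicate, offset: int = 0, limit: int = 10) -> List[int]:
    """Return the ids of matching rows, paged with offset and limit."""
    hits = [row for row in ROWS if query(row)]
    return [row["id"] for row in hits[offset:offset + limit]]
```

For example, `search(match("title", "red tea"))` returns rows 1 and 2 because both contain at least one of the tokens, whereas `search(match_phrase("title", "red tea"))` returns only row 1, where the tokens are adjacent and in the specified order.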

Data export

If the order of query results does not matter, you can use the parallel scan feature to export the matching data efficiently. For more information, see Parallel scan.
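
The idea behind a parallel scan can be sketched in plain Python: the data is divided into partitions, each worker pages through its own partition, and the partial results are merged without any ordering guarantee. This is a conceptual illustration only. The function names, the partitioning scheme, and the integer paging token are hypothetical and do not reflect the Tablestore API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def scan_partition(rows: List[Row], predicate: Callable[[Row], bool],
                   parallel_id: int, max_parallel: int, page_size: int = 3) -> List[Row]:
    """Scan one partition page by page (stand-in for a single scan worker)."""
    partition = rows[parallel_id::max_parallel]
    matched: List[Row] = []
    token = 0  # hypothetical paging token: index of the next unread row
    while token < len(partition):
        page = partition[token:token + page_size]
        matched.extend(row for row in page if predicate(row))
        token += page_size
    return matched


def parallel_scan(rows: List[Row], predicate: Callable[[Row], bool],
                  max_parallel: int = 3) -> List[Row]:
    """Run one worker per partition and merge the results; global order is not preserved."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [
            pool.submit(scan_partition, rows, predicate, worker_id, max_parallel)
            for worker_id in range(max_parallel)
        ]
        merged: List[Row] = []
        for future in futures:
            merged.extend(future.result())
    return merged


# Hypothetical data set: export every row with an even id.
data = [{"id": i} for i in range(20)]
exported = parallel_scan(data, lambda row: row["id"] % 2 == 0)
```

Because each worker returns its own partition's matches, `exported` contains every matching row exactly once, but not in id order. This is why the approach suits export jobs that do not depend on sorting.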