Search indexes use inverted indexes and column stores to support multi-dimensional queries and statistical analysis over large datasets. This topic describes how to use the search index feature with Tablestore SDK for Python.
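To make the role of the inverted index concrete, the following is a minimal, self-contained sketch in plain Python. It is a conceptual illustration only and does not use Tablestore SDK for Python. The function name `build_inverted_index` and the sample rows are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(rows, field):
    """Map each token of `field` to the set of row IDs that contain it."""
    index = defaultdict(set)
    for row_id, row in rows.items():
        for token in str(row.get(field, "")).lower().split():
            index[token].add(row_id)
    return index

# Hypothetical sample data, keyed by row ID.
rows = {
    1: {"title": "red running shoes"},
    2: {"title": "blue running jacket"},
    3: {"title": "red wool scarf"},
}
title_index = build_inverted_index(rows, "title")

# A lookup is a dictionary access instead of a scan over every row.
red_rows = sorted(title_index["red"])
# Combining conditions is a set intersection of the posting lists.
red_running_rows = sorted(title_index["red"] & title_index["running"])
```

Because each token points directly at the rows that contain it, conditions on several fields can be combined without reading the whole table, which is the idea behind multi-dimensional queries.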
Manage indexes
The following table describes the management operations supported by search indexes.
Operation | Description |
Create | Create a search index for a data table. |
List | Query the search indexes that are created for a table. |
Update | Update the time to live (TTL) of a search index. |
Describe | Query the description of a search index, including information about the fields in the search index and the configurations of the search index. |
Delete | Delete a search index that you no longer need. |
TTL management | Delete historical data or extend the retention period of data in a search index based on business requirements. |
Data types
In addition to basic data types, such as Long, Double, Boolean, Keyword, Text, Date, Geopoint, and Vector, search indexes support two special data types: Array and Nested. For more information, see Data types.
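The following sketch shows what Array and Nested values might look like before they are written to a row. It assumes that both types are serialized as JSON strings in the data table column. Verify that assumption against the Data types topic. The column names `tags` and `reviews` are hypothetical.

```python
import json

# Array: several values of the same basic type in one column.
tags = ["sale", "outdoor"]

# Nested: a list of child rows, each with its own fields.
reviews = [
    {"user": "alice", "score": 5},
    {"user": "bob", "score": 3},
]

# Assumption: both types are stored as JSON strings in the data table.
row_columns = {
    "tags": json.dumps(tags),
    "reviews": json.dumps(reviews),
}

# Reading the values back is a plain JSON decode.
decoded_scores = [child["score"] for child in json.loads(row_columns["reviews"])]
```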
Data query
The following table describes the query types supported by search indexes. Select a query type based on your business requirements.
Feature | Query type | Description |
Basic query | Match all query | This query matches all rows in a table. Use it to query the total number of rows in the table or to return multiple random rows. |
Basic query | Term query | This query uses exact matches to retrieve data from a data table. A term query is similar to a query based on string matching. |
Basic query | Terms query | This query allows you to specify multiple keywords. A row is returned if the column value matches at least one of the keywords. Terms queries can be used in the same manner as the IN operator in SQL statements. |
Basic query | Prefix query | This query retrieves data that contains the specified prefix from a data table. |
Basic query | Range query | This query retrieves data that falls within the specified range. |
Basic query | Wildcard query | This query retrieves data that matches a string containing wildcard characters. |
Basic query | Exists query | This query is also called a NULL query or NULL-value query. It is used on sparse data to determine whether a column exists in a row. |
Basic query | Collapse (distinct) | This query collapses the result set based on a specific column so that data of the specified type appears only once in the returned results. This keeps the result types diverse. |
Basic query | Geo query | Geo queries are classified into the following types: geo-distance query, geo-bounding box query, and geo-polygon query. |
Basic query | Nested query | This query retrieves the data in the child rows of nested fields. |
Data processing | Sorting and paging | You can predefine a sorting method when you create a search index or specify a sorting method when you use the search index to query data. The rows that meet the query conditions are returned in the order that you predefined or specified. If the response includes a large number of rows, you can locate data by configuring the limit and offset parameters or by using tokens. |
Data processing | Aggregation | You can perform aggregation operations to obtain the minimum value, maximum value, sum, average, count and distinct count of rows, percentile statistics, and the rows in each group. You can also group results by field value, range, geographical location, filter, histogram, or date histogram, and perform nested queries. You can combine multiple aggregation operations for complex queries. |
Query based on a combination of subquery conditions | Boolean query | This query retrieves data from a data table based on a combination of subqueries. Tablestore returns the rows that match the subqueries. |
Full-text search | Match query | This query uses approximate matches to retrieve data from a data table. |
Full-text search | Match phrase query | This query is similar to a match query, except that a match phrase query evaluates the positions of tokens. A row meets the query condition only if the order and positions of the tokens in the row match the specified order and positions. |
Full-text search | Highlighting | You can configure highlight parameters to highlight the query strings in the segments of the rows that meet the query conditions. |
Vector search | KNN vector query | You can use the k-nearest neighbor (KNN) vector query feature to perform an approximate nearest neighbor search based on vectors. This finds the data items in a large-scale dataset that are most similar to the vector that you want to query. |
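The matching rules of several basic queries in the table above can be sketched as plain Python predicates over in-memory rows. This is a conceptual model only, not the Tablestore SDK for Python API. All function names (`term`, `terms`, `prefix`, `in_range`, `exists`, `all_of`, `search`) and the sample rows are hypothetical, and the range is assumed to be closed at both ends.

```python
rows = [
    {"id": 1, "city": "hangzhou", "price": 30},
    {"id": 2, "city": "shanghai", "price": 55},
    {"id": 3, "city": "hangzhou", "price": 80},
    {"id": 4, "price": 12},  # sparse row: no "city" column
]

def term(field, value):
    """Exact match on a single value."""
    return lambda row: row.get(field) == value

def terms(field, values):
    """Match at least one of several values, like SQL IN."""
    return lambda row: row.get(field) in values

def prefix(field, text):
    """Match values that start with the given prefix."""
    return lambda row: field in row and str(row[field]).startswith(text)

def in_range(field, low, high):
    """Match values in the closed range [low, high]."""
    return lambda row: field in row and low <= row[field] <= high

def exists(field):
    """Match rows in which the column is present."""
    return lambda row: field in row

def all_of(*conditions):
    """Combine subqueries: every condition must match."""
    return lambda row: all(condition(row) for condition in conditions)

def search(rows, condition, offset=0, limit=10):
    """Filter rows, then page through the matches with offset and limit."""
    matched = [row for row in rows if condition(row)]
    return matched[offset:offset + limit]

def ids(result):
    return [row["id"] for row in result]

term_ids = ids(search(rows, term("city", "hangzhou")))
terms_ids = ids(search(rows, terms("city", {"hangzhou", "shanghai"})))
prefix_ids = ids(search(rows, prefix("city", "shang")))
range_ids = ids(search(rows, in_range("price", 20, 60)))
exists_ids = ids(search(rows, exists("city")))
combined_ids = ids(search(rows, all_of(term("city", "hangzhou"), in_range("price", 50, 100))))
second_page_ids = ids(search(rows, exists("city"), offset=1, limit=1))
```

In this model, `all_of` plays the role of a combination of subquery conditions, and the `offset` and `limit` arguments of `search` mirror paging by limit and offset.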
Data export
If the order of query results does not matter, you can use the parallel scan feature to retrieve query results more efficiently. For more information, see Parallel scan.
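The trade-off behind parallel scan, which gives up result ordering in exchange for throughput, can be sketched with the standard library. This is a conceptual illustration and not the Tablestore SDK for Python API. The names `scan_slice` and `parallel_scan` and the way rows are split into slices are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_slice(rows, condition, slice_id, slice_count):
    """Scan only the rows that belong to one slice (every slice_count-th row)."""
    return [row for row in rows[slice_id::slice_count] if condition(row)]

def parallel_scan(rows, condition, slice_count=4):
    """Scan all slices concurrently and merge the partial results."""
    with ThreadPoolExecutor(max_workers=slice_count) as pool:
        futures = [
            pool.submit(scan_slice, rows, condition, slice_id, slice_count)
            for slice_id in range(slice_count)
        ]
        results = []
        for future in futures:
            results.extend(future.result())
    return results

# Hypothetical sample data.
rows = [{"id": i, "price": i * 10} for i in range(1, 21)]
cheap = parallel_scan(rows, lambda row: row["price"] <= 50)

# The merged order follows the slices, not the original row order,
# so sort explicitly if a stable order is needed.
cheap_ids = sorted(row["id"] for row in cheap)
```

Each slice is scanned independently, so the merged result contains every matching row but in no meaningful order.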