What is a search index? - Tablestore - Alibaba Cloud Documentation Center

Search indexes are used for multi-dimensional data queries and statistical analysis in big data scenarios based on inverted indexes and column stores. Tablestore provides the search index feature to meet your data analysis requirements such as obtaining extreme values, counting rows, and grouping data. If your business requires multi-dimensional queries such as queries based on non-primary key columns, Boolean queries, and fuzzy queries, you can create a search index based on the fields that you need. Then, you can query and analyze the data by using the search index.

Background information

Search indexes can solve complex query problems in big data scenarios. Other systems such as databases and search engines can also solve data query problems. The following figure shows the differences among Tablestore, databases, and search engines.

Tablestore can provide all features of databases and search engines except JOIN operations, transactions, and relevance of search results. Tablestore also has high data reliability of databases and supports advanced queries of search engines. Therefore, Tablestore can be used to replace the common architecture that consists of databases and search engines. If you do not need JOIN operations, transactions, or relevance of search results, we recommend that you use the search index feature of Tablestore.

fig_20220325

Introduction

Search indexes are used for multi-dimensional data queries and statistical analysis in big data scenarios based on inverted indexes and column stores. Search indexes support various query features, including query based on non-primary key columns, full-text query, prefix query, fuzzy query, Boolean query, nested query, and geo query. Search indexes also support multiple aggregation operations. You can perform aggregation operations to obtain the maximum and minimum values, count and distinct count of rows, sums, averages, and percentiles, group results by specified conditions, and display data as histograms.

The following figure shows how inverted indexes and column stores are used for search indexes, as well as the structure of a multi-dimensional spatial index.

Compared with indexes of traditional database services such as MySQL, the search index feature of Tablestore is not subject to the leftmost matching principle. Therefore, the search index feature can be used in more scenarios. In most cases, only one search index is required for a data table. For example, a data table about student information contains the following columns: student name, ID, gender, grade, class, and home address. If you want to query the data in the data table by specifying a combination of conditions, such as querying students named Zhang San in Grade Three, querying male students who live within one kilometer of the school, and querying students in Class Two, Grade Three from the specified residential community, you can create a search index and add these columns to the search index.

Aside from queries based on primary key columns in data tables, Tablestore provides the following two index schemas for accelerated data queries: secondary index and search index. The following table describes the differences among the three types of query methods.

Query method	Description	Scenario
Query based on primary key columns in data tables	A data table is similar to a large map. Data tables support queries based only on primary key columns.	Primary key-based queries are suitable for scenarios in which the values of all primary key columns or the primary key prefix can be determined.
Query by using a secondary index	You can create one or more index tables for a data table and perform queries by using the primary key columns of the index tables.	Secondary indexes are suitable for scenarios in which the columns to be queried can be determined, the number of columns to be queried is small, and the values of all primary key columns or the primary key prefix can be determined.
Query by using a search index	Search indexes use inverted indexes, Bkd-trees, and column stores for various query scenarios.	Search indexes are suitable for all query and analysis scenarios in which queries based on the primary key columns and secondary indexes of data tables cannot meet your business requirements. For example, you can perform non-primary key column conditional queries, Boolean queries, relational queries, full-text queries, geo queries, prefix queries, fuzzy queries, nested queries, and NULL queries and aggregation operations by using search indexes.

Scenarios

Search indexes can be widely used in various application systems for data query and analysis. The following table describes partial scenarios in which search indexes can be used.

Application system	Scenario
E-commerce platform	You can use search indexes on e-commerce platforms to classify products and filter attributes. This facilitates product search and filtering for customers.
Social media application	You can use search indexes in social media applications to query the follower and friend connections between users and implement the recommendation and matching features based on the interests of users.
Log analysis	You can use search indexes to query logs based on conditions such as keywords and time ranges. This allows you to identify issues in an efficient manner and analyze log data.
LinkAnalytics (LA)	You can use search indexes to query and analyze device data in Internet of things (IoT) scenarios. For example, you can filter device data and collect statistics based on device types and geographic locations.
Application performance monitoring	You can use search indexes to aggregate and query metric data, which is essential for monitoring application performance. For example, you can filter and aggregate data by time range and application name.
Location-based service	You can use search indexes to query geographic locations and search for nearby points of interest such as stores, attractions, and services.
Text search engine	You can use search indexes for full-text queries and relevance-based sorts in a text search engine. This way, you can search for information such as documents and articles in an efficient manner.

Features

Search indexes support the following core query features: query based on primary key columns or non-primary key columns, Boolean query, geo query, full-text query, fuzzy query, NULL query, and nested query. For more information, see Features.

Limits

For more information, see Search index limits.

Usage notes

Important

To create a search index for a data table, you do not need to configure predefined columns in the data table.

Applicable model
Search indexes are applicable only to the Wide Column model.
Index synchronization
After a search index is created for a data table, data is written to the data table first. After the data is written to the data table, a success message is returned. When the system writes data to the data table, another asynchronous thread reads the newly written data from the data table and writes the data to the search index. The write performance of Tablestore is not affected when data is being asynchronously synchronized from a data table to a search index.
In most cases, the latency for synchronizing data to a search index is within 3 seconds. You can view the latency in real time in the Tablestore console.

Time to live (TTL)

If the UpdateRow operation is disabled for a data table, you can use the TTL feature of the search index that is created for the data table. For more information, see TTL of search indexes.

If you want to retain data only for a period of time and the time field does not need to be updated, you can implement the TTL feature by partitioning a data table into several data tables based on the time field. The following table describes the principle, rule, and benefits of table partitioning by time.

Item	Table partitioning by time
Principle	Partition a data table based on fixed periods of time, such as daily, weekly, monthly, or annually. Then, create a search index for each partitioned table. This way, you can retain tables for the specified periods of time based on your business requirements. For example, to retain data for a six-month period, you can store the data for each month in a data table. Label the data tables in sequence from table_1 to table_6 and create a search index for each data table. Make sure that each data table and its search index exclusively store data for their respective month. To implement the TTL feature, you need to only delete data tables and their associated search indexes that are retained for longer than six months. When you query data by using a search index, you need to only query a specific data table if the data table contains all the data that falls within your desired time range. If the data that falls within your desired time range spans multiple data tables, you need to query all these data tables and merge the query results.
Rule	The size of a single search index can be up to 50 billion rows. To ensure the optimal query performance, we recommend that you limit the size of a single search index to 20 billion rows or fewer.
Benefits	You can adjust the data storage duration based on the number of data tables retained. Query performance is directly proportional to data volumes. After a data table is partitioned into multiple data tables, the data volume of each data table has an upper limit. This ensures better query performance and avoids high query latencies or timeouts.

Max versions
You cannot create a search index for a data table for which you have specified the max versions parameter.
You can specify the timestamp when you write data to a column that allows only a single version. If you write data with a greater version number first and then write data with a smaller version number, the data with the greater version number may be overwritten by data with the smaller version number.
The results of the Search and ParallelScan requests may not include the timestamp attribute.

API operations

Tablestore provides API operations to manage search indexes and implement the features of search indexes. You can call the Search or ParallelScan operation to implement the features of search indexes. Most features that are provided by the two API operations are the same. However, to improve the performance and throughput, the ParallelScan operation does not provide some features of the Search operation.

Category	Operation	Description
Management operations	CreateSearchIndex	Creates a search index.
	DescribeSearchIndex	Queries the details about a search index.
	ListSearchIndex	Queries a list of search indexes.
	DeleteSearchIndex	Deletes a search index.
Query operations	Search	Supports all features of search indexes. You can call the Search operation to query data by using all supported query features and analyze data by performing sorting and aggregation operations. The query results are returned based on the specified order. Query features: query based on non-primary key columns, full-text query, prefix query, exists query, fuzzy query, Boolean query, nested query, and geo query. Collapse (distinct) Sorting Aggregation Match all query
Query operations	ParallelScan	Exports data in parallel. You can call the ParallelScan operation to query data by using all supported query features. However, to ensure faster retrieval of query results, the ParallelScan operation does not support analysis features such as sorting and aggregation. The ParallelScan operation offers superior query performance compared to the Search operation. When the ParallelScan operation processes a single query request at a time, its throughput is five times that of the Search operation. Query features: query based on non-primary key columns, full-text query, prefix query, exists query, fuzzy query, Boolean query, nested query, and geo query. Multiple concurrent queries in a single request If you call this operation, you must call the ComputeSplits operation to query the maximum number of parallel scan tasks that are supported by a single ParallelScan request.

Procedure

Step	Operation	Description
1	Create a search index for a data table	After you create a search index for a data table, you can query data in the data table based on the fields that are used to create the search index.
2	Use the search index to query data	The following query features are provided: match all query, match query, match phrase query, term query, terms query, prefix query, range query, wildcard query, Boolean query, nested query, geo-distance query, geo-bounding box query, geo-polygon query, exists query, and collapse (distinct). Use query features based on your business requirements. If you use a search index to query data, the field values can be tokenized into multiple tokens based on the tokenization method that you specify. The rows that meet the query conditions can be returned by order and page based on the sorting and paging methods that you specify. For more information, see Tokenization and Sorting and paging.
3	Use the search index to analyze data	To analyze data in the data table by using the search index, you can perform aggregation operations to obtain the minimum value, maximum value, sum, average value, count and distinct count of rows, and percentile statistics. You can also perform aggregation operations to group results by field value, range, geographical location, filter, histogram, or date histogram.
4	Use the search index to export data	If you do not have requirements on the order of query results, you can use the parallel scan feature to obtain query results in an efficient manner.

Usage method

You can use search indexes by using the Tablestore console, Tablestore CLI, and Tablestore SDKs.

Billing

The billable items of a search index include the read throughput, data storage size, and outbound traffic over the Internet of the search index. The read throughput comprises the reserved read throughput and the additional read throughput that exceeds the reserved read throughput. You are charged for the additional read throughput on a pay-as-you-go basis. For more information, see Billable items of search indexes.

FAQ

References

Tablestore allows you to query and analyze data by using the SQL query feature. For more information, see Overview.
Note
You can also use compute engines such as MaxCompute, Spark, Hive, HadoopMR, Function Compute, and Realtime Compute for Apache Flink to analyze data in Tablestore. For more information, see Overview.
Tablestore provides examples of using search indexes in the following scenarios: e-commerce orders, store searches, geo-fence, and intelligent metadata. For more information, visit the Scenario Demos page in the Tablestore console.