Search index provides multiple efficient index structures that help resolve complex query problems in big data scenarios.

Tables in Tablestore are typical distributed NoSQL data structures. They support storing, reading, and writing large-scale data such as monitoring data and log data. Originally, Tablestore supported only queries based on primary keys, such as reading a single row or reading the rows within a specified primary key range. Other types of queries, such as queries based on non-primary key columns and bool queries, were not available.

To resolve this issue, Tablestore introduced the search index feature. Search index supports multiple types of queries based on inverted indexes and column-oriented storage, including but not limited to:

  • Query based on non-primary key columns
  • Bool query
  • Full-text search
  • Query by geographical location
  • Query by prefix
  • Fuzzy query
  • Nested query
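
As a rough illustration of why an inverted index makes queries on non-primary key columns and bool queries cheap, the following sketch builds a toy inverted index in plain Python. It is not Tablestore code: the sample rows, the column names, and the `build_inverted_index` helper are assumptions made for this example only.

```python
from collections import defaultdict

def build_inverted_index(rows, column):
    """Map each lowercase token found in `column` to the set of primary keys containing it."""
    index = defaultdict(set)
    for pk, row in rows.items():
        for token in str(row.get(column, "")).lower().split():
            index[token].add(pk)
    return index

# Hypothetical sample rows keyed by primary key.
rows = {
    "pk1": {"title": "disk latency alert", "level": "warn"},
    "pk2": {"title": "disk full", "level": "error"},
    "pk3": {"title": "network latency alert", "level": "error"},
}

title_index = build_inverted_index(rows, "title")
level_index = build_inverted_index(rows, "level")

# Bool query (AND): title contains "latency" and level is "error".
must = title_index["latency"] & level_index["error"]

# Bool query (OR plus NOT): title contains "disk" or "network", but level is not "warn".
should_not = (title_index["disk"] | title_index["network"]) - level_index["warn"]

print(sorted(must))        # rows that satisfy the AND query
print(sorted(should_not))  # rows that satisfy the OR/NOT query
```

Each condition becomes a set lookup, and the bool operators become set intersections, unions, and differences, so no scan over the base table is needed.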

Index differences

Aside from primary key-based queries on the base table, Tablestore provides two index schemas that accelerate queries: global secondary index and search index. The following table compares the base table with these two index types.

Index type: Table
  Description: A table is similar to a big map. Tables support only queries based on primary keys.
  Scenario:
    • You can specify the complete primary key.
    • You can specify the prefix of the primary key.

Index type: Global secondary index
  Description: You can create one or more global secondary indexes and issue query requests against them. This way, you can query data based on the primary key columns of these indexes.
  Scenario:
    • You can determine the required columns in advance, and only a few columns are required.
    • You can specify the complete primary key or the prefix of the primary key.

Index type: Search index
  Description: Search index uses inverted indexes, Bkd-trees, and column-oriented storage for various query scenarios.
  Scenario: All query and analysis scenarios that the table and the global secondary index do not support.
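
To make the "big map" comparison concrete, the sketch below models a table as rows sorted by primary key. It shows a lookup by complete primary key, a range scan by primary key prefix, and a toy secondary index that re-keys the rows by another column. The two-column primary key, the sample data, and all function names are hypothetical. The code only illustrates the access patterns and does not use the Tablestore SDK.

```python
import bisect

# Hypothetical table: rows sorted by a two-column primary key (user_id, order_id).
table = sorted([
    (("u1", "o1"), {"city": "Hangzhou", "amount": 30}),
    (("u1", "o2"), {"city": "Beijing", "amount": 12}),
    (("u2", "o1"), {"city": "Hangzhou", "amount": 75}),
], key=lambda item: item[0])
keys = [pk for pk, _ in table]

def get_row(pk):
    """Complete primary key: a single point lookup."""
    i = bisect.bisect_left(keys, pk)
    return table[i][1] if i < len(keys) and keys[i] == pk else None

def get_range_by_prefix(first_pk_col):
    """Primary key prefix: a contiguous range scan over the sorted keys."""
    start = bisect.bisect_left(keys, (first_pk_col,))
    matched = []
    for pk, row in table[start:]:
        if pk[0] != first_pk_col:
            break
        matched.append((pk, row))
    return matched

def build_secondary_index(column):
    """Toy secondary index: re-key the rows so that `column` leads the primary key."""
    return sorted((row[column],) + pk for pk, row in table)

city_index = build_secondary_index("city")

print(get_row(("u1", "o2")))
print([pk for pk, _ in get_range_by_prefix("u1")])
print(city_index)
```

Queries that cannot be expressed as a point lookup or a prefix range on one of these sorted key orders are the cases that the comparison above assigns to search index.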

APIs

Search index provides the following APIs:
  • Common query API: Search
  • Data export API: ParallelScan
The two APIs share most functions. However, to improve performance and throughput, the ParallelScan API omits some functions that the Search API provides. The following table compares the two APIs.

API: Search
  Description: An API that supports all functions of search index.
  Supported features:
    • Query
      • Query based on non-primary key columns
      • Bool query
      • Query by geographical location
      • Full-text search
      • Fuzzy query
      • Query by prefix
      • Nested query
    • Deduplication
    • Sort
    • Aggregation
    • Row count
  Performance: None.

API: ComputeSplits + ParallelScan
  Description: An API used to export data for multiple concurrent queries. This API supports the query functions of search index but not analysis functions such as sorting and aggregation. Compared with the Search API, the ParallelScan API provides better performance.
  Supported features:
    • Query
      • Query based on non-primary key columns
      • Bool query
      • Query by geographical location
      • Full-text search
      • Fuzzy query
      • Query by prefix
      • Nested query
    • Multiple concurrent queries in a single request
  Performance: The throughput of multiple queries in a single request is five times that of the Search API.
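
The export pattern behind ComputeSplits and ParallelScan can be pictured as a fan-out: first decide how many parallel scans to run, then let each scan read its own share of the matching rows, with no global sort or aggregation. The following is a toy model in plain Python under that assumption. `compute_splits`, `scan_split`, and `parallel_export` are hypothetical stand-ins, not the real API signatures, and the modulo-based partitioning is an arbitrary choice for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical indexed rows; in a real deployment these live on the server side.
ROWS = [{"id": i, "level": "error" if i % 3 == 0 else "info"} for i in range(30)]

def compute_splits(max_parallel):
    """Toy stand-in for ComputeSplits: decide how many parallel scans to run."""
    return min(max_parallel, 4)

def scan_split(split_id, total_splits, predicate):
    """Toy stand-in for one ParallelScan task: read only the rows assigned to this split."""
    return [row for i, row in enumerate(ROWS)
            if i % total_splits == split_id and predicate(row)]

def parallel_export(predicate, max_parallel=8):
    total = compute_splits(max_parallel)
    with ThreadPoolExecutor(max_workers=total) as pool:
        parts = list(pool.map(lambda s: scan_split(s, total, predicate), range(total)))
    # Parts are concatenated as they are; there is no global sort or aggregation.
    return [row for part in parts for row in part]

exported = parallel_export(lambda row: row["level"] == "error")
print(len(exported), "rows exported")
```

Because each split is scanned independently, the combined result is complete but not globally ordered, which matches the absence of sorting and aggregation in the comparison above.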

Precautions

  • Index synchronization

    If you have created a search index for a table, data is written to the table first. When the write is successful, a success message is immediately returned. At the same time, another asynchronous thread reads the newly written data from the table and writes the data to the search index. This is an asynchronous process.

    The asynchronous data synchronization between a table and its search index does not affect the write performance of Tablestore. The indexing latency is typically a matter of seconds and in most cases is within 10 seconds. You can view the indexing latency in the Tablestore console in real time.

  • TTL

    You cannot create a search index for a table that has the time to live (TTL) parameter set.

  • max versions

    You cannot create a search index in a table where you have specified the max versions parameter.

    You can specify a custom timestamp whenever you write data to an attribute column that allows only a single version. If you first write data with a larger version number and then write data with a smaller version number, the index entry for the larger version number may be overwritten by the index entry for the smaller version number.
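
The first and last precautions can be illustrated together with a small simulation. It is a simplified model, not Tablestore's implementation. It assumes that the table keeps the single version with the largest timestamp, and that a background thread replays writes into the index in the order they were written. `ToyTable` and its methods are hypothetical names.

```python
import queue
import threading

class ToyTable:
    """Toy model of a single-version table with an asynchronously updated index."""

    def __init__(self):
        self.rows = {}    # pk -> (timestamp, value); only the largest timestamp is kept
        self.index = {}   # pk -> value as seen through the toy search index
        self._pending = queue.Queue()
        threading.Thread(target=self._sync_loop, daemon=True).start()

    def put(self, pk, value, timestamp):
        current = self.rows.get(pk)
        if current is None or timestamp >= current[0]:
            self.rows[pk] = (timestamp, value)
        self._pending.put((pk, value))  # queue the write for indexing, do not wait
        return "OK"                     # success is returned before the index is updated

    def get(self, pk):
        return self.rows[pk][1]

    def _sync_loop(self):
        while True:
            pk, value = self._pending.get()
            self.index[pk] = value      # applied in write order; the timestamp is ignored
            self._pending.task_done()

    def wait_until_indexed(self):
        self._pending.join()

table = ToyTable()
table.put("row1", "written first", timestamp=200)   # larger version number first
table.put("row1", "written second", timestamp=100)  # smaller version number second
table.wait_until_indexed()

table_value = table.get("row1")    # the table keeps the larger timestamp
index_value = table.index["row1"]  # the toy index keeps the last write it processed
print(table_value, "|", index_value)
```

In this model the table and the index end up disagreeing about the row, which is the kind of overwrite the last precaution warns about. Writing versions in increasing timestamp order avoids the mismatch in the simulation.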

Features

Search index can solve complex query problems in big data scenarios. Other systems such as databases and search engines can also solve data query problems. The following figure shows the differences between Tablestore and databases and search engines.

[Figure: Differences between Tablestore and databases and search engines]

Tablestore provides all features of databases and search engines except for join operations, transactions, and relevance of search results. Tablestore also offers the high data reliability of databases and supports the advanced queries of search engines. Therefore, Tablestore can replace the common architecture that combines a database with a search engine. If you do not need join operations, transactions, or relevance of search results, we recommend that you use the search index feature of Tablestore.