This topic describes queries based on primary keys, global secondary indexes, and search indexes in detail.

Primary key-based queries

Primary key-based queries use the GetRow and GetRange operations. To query data in attribute columns, you can apply the filter feature to the rows that these operations read. However, the filter is evaluated against every row in the scanned range, so performance degrades when the range contains a large amount of data or the entire table is scanned. In actual business scenarios, primary key-based queries alone often cannot meet all query requirements. Tablestore provides global secondary indexes and search indexes to address these deficiencies. This topic analyzes the differences between global secondary indexes and search indexes to help you choose the one that suits your business.
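The cost of filtering can be illustrated with a small in-memory model. The following Python sketch is not the Tablestore SDK: the table contents and the helper names get_row and get_range are hypothetical, chosen only to mirror the GetRow and GetRange operations. It assumes rows are kept sorted by primary key and reports how many rows a filtered range read has to scan.

```python
from bisect import bisect_left

# Toy base table (hypothetical data): rows sorted by primary key (user_id, order_id).
# Each row is (primary_key, attribute_columns).
ROWS = sorted(
    [
        (("u1", 1), {"status": "paid", "amount": 30}),
        (("u1", 2), {"status": "open", "amount": 12}),
        (("u2", 1), {"status": "paid", "amount": 75}),
        (("u2", 2), {"status": "paid", "amount": 8}),
        (("u3", 1), {"status": "open", "amount": 40}),
    ],
    key=lambda row: row[0],
)
KEYS = [pk for pk, _ in ROWS]


def get_row(pk):
    """Point lookup by full primary key (GetRow-like). Returns None if absent."""
    i = bisect_left(KEYS, pk)
    if i < len(KEYS) and KEYS[i] == pk:
        return ROWS[i]
    return None


def get_range(start_pk=None, end_pk=None, row_filter=None):
    """Range read over [start_pk, end_pk) (GetRange-like).

    A bound of None means "unbounded". The filter is applied to every row
    in the range, so the scan cost is the size of the range, not the number
    of matches. Returns (matching_rows, rows_scanned).
    """
    lo = 0 if start_pk is None else bisect_left(KEYS, start_pk)
    hi = len(KEYS) if end_pk is None else bisect_left(KEYS, end_pk)
    scanned = ROWS[lo:hi]
    if row_filter is None:
        return scanned, len(scanned)
    return [row for row in scanned if row_filter(row[1])], len(scanned)


# Narrow range plus filter: only the rows of user "u2" are scanned.
narrow, narrow_scanned = get_range(("u2", 0), ("u3", 0), lambda a: a["status"] == "paid")

# Attribute-only condition: no primary key bound helps, so the whole table is scanned.
wide, wide_scanned = get_range(row_filter=lambda a: a["amount"] > 35)
```

In this model, the second query scans all five rows to return two of them. The same pattern on a large table is what makes attribute-only filtering slow.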

Global secondary index-based queries

After a global secondary index is created for a base table, an index table is created for the base table. The model of an index table is the same as that of the base table. The index table provides another sorting method for the base table: data is laid out according to predefined query conditions, which improves the efficiency of data queries. You can query the index table by primary key, by primary key range, or by primary key prefix range. To ensure the uniqueness and consistency of primary key columns between the index table and base table, the global secondary index adds the primary key columns of the base table to the index table.
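As a rough illustration of this layout, the following Python sketch builds an index table from a toy base table. It is a conceptual model only, not Tablestore code: the data and the helper names build_index and query_prefix are hypothetical. Each index key is the indexed attribute value followed by the base table's primary key columns, which keeps index keys unique and allows a prefix range read.

```python
from bisect import bisect_left

# Toy base table (hypothetical data): primary key (user_id, order_id) -> attribute columns.
BASE = {
    ("u1", 1): {"status": "paid"},
    ("u1", 2): {"status": "open"},
    ("u2", 1): {"status": "paid"},
    ("u3", 1): {"status": "open"},
}


def build_index(base, indexed_column):
    """Build a sorted index table.

    Index primary key = (indexed column value, *base primary key columns).
    Appending the base primary key keeps every index key unique even when
    many rows share the same indexed value.
    """
    return sorted((attrs[indexed_column],) + pk for pk, attrs in base.items())


def query_prefix(index, value):
    """Prefix range read: return base primary keys whose indexed column equals value."""
    start = bisect_left(index, (value,))  # first key with this prefix
    result = []
    for key in index[start:]:
        if key[0] != value:  # left the prefix range, stop scanning
            break
        result.append(key[1:])
    return result


status_index = build_index(BASE, "status")
paid_orders = query_prefix(status_index, "paid")
```

Because the index is sorted by status first, the lookup reads only the contiguous run of "paid" keys instead of filtering the whole base table.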

Search index-based queries

A search index is built on inverted indexes and column-oriented storage. It supports multiple query methods such as Boolean query, wildcard query, geo-distance query, and full-text search, so you can combine multiple query conditions to query data in multiple dimensions.
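To make the inverted index idea concrete, the following Python sketch maps each term to the set of rows that contain it and answers Boolean queries with set operations. It is a simplified teaching model with hypothetical data and function names, and it assumes whitespace tokenization. It does not reflect how Tablestore implements search indexes internally.

```python
from collections import defaultdict

# Toy rows (hypothetical data): row id -> text column.
DOCS = {
    1: "red running shoes",
    2: "blue running jacket",
    3: "red rain jacket",
}


def build_inverted_index(docs):
    """Map each lowercase term to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(row_id)
    return index


def bool_and(index, *terms):
    """Rows that contain every term (intersection of posting sets)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def bool_or(index, *terms):
    """Rows that contain at least one term (union of posting sets)."""
    return set().union(*(index.get(term, set()) for term in terms))


inverted = build_inverted_index(DOCS)
red_jackets = bool_and(inverted, "red", "jacket")
shoes_or_rain = bool_or(inverted, "shoes", "rain")
```

A single index of this kind serves many different term combinations, which is why one search index can cover query patterns that would otherwise need several index tables.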

Determine whether to use an index

  • An index is not required
    • There is no need to create an index if primary key-based queries can meet your business requirements.
    • If you filter data within a specified primary key range, use the filter feature when the amount of data within the range is small or the queries per second (QPS) is low. In this case, you do not need to create an index.
    • If you perform complex queries that have low QPS and are not latency-sensitive, you can use Data Lake Analytics (DLA) to access Tablestore and use SQL statements to query data in Tablestore.
  • Use a global secondary index or a search index

    A global secondary index is an index table that is similar to a base table. It provides another data distribution that you query in the same way as a base table, by primary key. Each index supports one type of query condition. You can predefine query conditions of this type to improve query efficiency. An index table and its base table can support the same amount of data. Additionally, consider hashing when you design the primary key of a global secondary index.

    A search index contains a combination of schemas. Each column supports schemas such as inverted indexes. You can sort query results based on one column. One search index can support multiple query methods, so you do not need to create multiple search indexes for different query methods. Compared with global secondary indexes, search indexes also support query methods such as Boolean query, fuzzy query, full-text search, and geo query. Search indexes are less efficient than global secondary indexes when data is read in a fixed order. The query efficiency of a search index depends on the size of the entire table (the length of the inverted indexes). When the data volume reaches 10 million rows or more, we recommend that you use RoutingKey to shard data. You can also use RoutingKey to reduce the amount of data involved in a query.
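    The effect of routing on the amount of data involved in a query can be sketched as follows. This Python model is an assumption-laden illustration, not Tablestore behavior: the partition count, the CRC32-based placement, and the function names are all hypothetical. It shows only the general idea that a query carrying a routing value needs to examine one partition instead of all of them.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for this sketch


def partition_of(routing_value):
    """Map a routing value to a partition with a stable hash (CRC32)."""
    return zlib.crc32(routing_value.encode("utf-8")) % NUM_PARTITIONS


def build_partitions(rows, routing_column):
    """Place each row in the partition chosen by its routing column value."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        partitions[partition_of(row[routing_column])].append(row)
    return partitions


def search(partitions, predicate, routing_value=None):
    """Return (matching_rows, rows_examined).

    Without a routing value every partition is examined. With one, only the
    partition that can hold matching rows is examined.
    """
    if routing_value is None:
        targets = partitions
    else:
        targets = [partitions[partition_of(routing_value)]]
    examined = sum(len(partition) for partition in targets)
    hits = [row for partition in targets for row in partition if predicate(row)]
    return hits, examined


# Hypothetical data: 20 rows spread over 5 users, routed by the "user" column.
rows = [{"user": f"u{i % 5}", "n": i} for i in range(20)]
partitions = build_partitions(rows, "user")

hits_all, examined_all = search(partitions, lambda r: r["user"] == "u3")
hits_routed, examined_routed = search(partitions, lambda r: r["user"] == "u3", "u3")
```

    Both calls return the same rows, but the routed call examines only the rows in one partition, so it never examines more rows than the unrouted call.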