Index optimization - AnalyticDB - Alibaba Cloud Documentation Center

AnalyticDB for PostgreSQL supports B-tree, bitmap, BRIN, GiST, and GIN indexes, but does not support hash indexes. Each type of indexes uses a specific algorithm and is suitable for specific query types. The B-tree index is the default index type that is ideal for the most common scenarios.

Note

AnalyticDB for PostgreSQL in Serverless mode allows you to create only B-tree indexes.

In most online transaction processing (OLTP) databases, indexes can significantly improve query efficiency. However, indexes must be chosen with caution in distributed data warehousing services such as AnalyticDB for PostgreSQL. In most scenarios, AnalyticDB for PostgreSQL is ideal for rapid sequential scans and can use sparse indexes to reduce I/O operations. AnalyticDB for PostgreSQL distributes data evenly across all compute nodes that each scan only data of their own. If an AnalyticDB for PostgreSQL instance has a large number of compute nodes, each node scans only a small amount of data. For BI report queries that return a large amount of data, indexes cannot accelerate queries.

When you use AnalyticDB for PostgreSQL, we recommend that you first try to execute queries without indexes. Indexes are more suitable for OLTP scenarios where only a single row or a small amount of data is returned. Indexes also incur additional database overheads such as more storage, write amplification, and index maintenance during data updates. Therefore, we recommend that you create indexes only when they can provide higher query efficiency than full table scans.

Note

AnalyticDB for PostgreSQL allows unique indexes only if index key columns are identical to or are a superset of distribution key columns. Append-optimized (AO) tables do not support unique indexes. Unique indexes can only be used within a single partition. They cannot be used across different partitions in a partitioned table.

Before you create indexes, take note of the following items:

Measure your query workload: Indexes can significantly improve the performance of queries that return a single row or a small dataset, such as transaction processing queries.
Determine compressed tables or range query scenarios where sparse indexes are suitable: Indexes can improve the query efficiency of compressed AO or append-optimized column-oriented (AOCO) tables. In range queries, sparse indexes can avoid loading of invalid data and optimize I/O performance. Less data loading means less data decompression, which reduces the CPU utilization during the decompression.
Do not create indexes on frequently updated columns: If data is frequently updated on specific columns, their indexes are also frequently updated and their data update performance significantly degrades.
Create B-tree indexes only on columns that have a high index selection rate: The index selection rate is an important metric for you to choose the index type. We recommend that you create B-tree indexes only on columns that have a high index selection rate. For example, if your table contains 1,000 rows of data and the column to be indexed contains 800 distinct values, the index selection rate for this column is 800/1000 = 0.8. B-tree indexes are suitable in this scenario. If the column to be indexed has unique values, we recommend that you select B-tree indexes.
Create bitmap indexes on columns that have a low index selection rate: Bitmap indexes are ideal for AnalyticDB for PostgreSQL in scenarios where a column has 1,000 to 100,000 distinct values. If sort keys are also used to aggregate data, bitmap indexes can significantly accelerate queries.
Create indexes on frequently joined columns: Indexes can be used together with JOIN operations to improve query performance.
Create indexes on columns that have frequent conditional queries: We recommend that you create indexes on columns that are frequently referenced in WHERE clauses.
Do not create multiple indexes on the same column: Multiple prefix indexes of the same type on the same column do not improve query performance and may cause additional overheads on write and update operations.
Use sort keys or CLUSTER statements to improve index efficiency: The efficiency of most indexes, especially BRIN and sparse indexes, is affected by the physical distribution of data. Appropriate physical distribution of data based on indexes can improve index performance. You can use compound or interleaved sort keys to distribute data in a more appropriate manner. If B-tree indexes are created for a row-oriented table, we recommend that you execute CLUSTER statements to implement physical sorting.
Create sparse indexes to handle large amounts of data: If your table contains large amounts of data and if you intend to retrieve less than 50% of data by using the <, <=, =, >=, and > operators, you can create sparse indexes to reduce the amount of invalid data that is loaded. Sparse index examples include the BRIN index and the metadata feature of AOCO tables.
Before you load large amounts of data to a table, we recommend that you delete indexes from the table. After data is loaded, you can recreate indexes in the table. This method is faster than updating indexes.
Do not create bitmap indexes on frequently updated tables: Bitmap indexes are ideal for data warehousing scenarios where data is queried but not updated. They are not applicable to OLTP scenarios where data is modified by large numbers of concurrent transactions.
Use appropriate expression indexes: An index key column can be a column of the table, or a function or scalar expression computed from one or more columns of the table. This feature helps obtain fast access to tables based on the computation results. For example, to optimize the SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith'; statement, you can execute the CREATE INDEX people_names ON people ((first_name || ' ' || last_name)); statement to create an expression index.