AnalyticDB for PostgreSQL supports B-tree, bitmap, BRIN, GiST, and GIN indexes, but does not support hash indexes. Each type of indexes uses a specific algorithm and is applicable to specific query types. The B-tree index is the default index type that is ideal for the most common scenarios.
In most traditional transaction processing databases, indexes can significantly improve query efficiency. However, indexes must be chosen with caution in distributed data warehousing services such as AnalyticDB for PostgreSQL. In most scenarios, AnalyticDB for PostgreSQL is applicable to rapid sequential scans and can use sparse indexes to perform I/O operations that reduce data. AnalyticDB for PostgreSQL distributes data evenly across all compute nodes that each scan only data of their own. If an AnalyticDB for PostgreSQL instance has a large number of compute nodes, each node scans only a small amount of data. For BI report query scenarios where a large dataset is returned, indexes cannot accelerate queries.
When you use AnalyticDB for PostgreSQL, we recommend that you first try to execute queries without indexes. Indexes are more suitable for transaction processing scenarios where only a single row or a small dataset is returned. Indexes also incur additional database overheads such as more storage, write amplification, and index maintenance during data updates. Therefore, we recommend that you create indexes only when they can provide higher query efficiency than full table scans.
Before you create indexes, take note of the following items:
- Measure your query workload: Indexes can significantly improve the performance of queries that return a single row or a small dataset, such as transaction processing queries.
- Determine compressed tables or range query scenarios where sparse indexes are suitable: Indexes can improve the query efficiency of compressed AO or append-optimized column-oriented (AOCO) tables. In range queries, sparse indexes can avoid loading of invalid data and optimize I/O performance. Less data loading means less data decompression, which reduces the CPU utilization during the decompression.
- Do not create indexes on frequently updated columns: If data is frequently updated on specific columns, their indexes are also frequently updated and their data update performance significantly degrades.
- Create B-tree indexes only on columns that have a high index selection rate: The index selection rate is an important metric for you to choose the index type. We recommend that you create B-tree indexes only on columns that have a high index selection rate. For example, if your table contains 1,000 rows of data and the column to be indexed contains 800 distinct values, the index selection rate for this column is 800/1000 = 0.8. B-tree indexes are suitable in this scenario. If the column to be indexed has unique values, B-tree indexes are the best choice.
- Create bitmap indexes on columns that have a low index selection rate: Bitmap indexes are ideal for AnalyticDB for PostgreSQL in scenarios where a column has 1,000 to 100,000 distinct values. If sort keys are also used to aggregate data, bitmap indexes can significantly accelerate queries.
- Create indexes on frequently joined columns: Indexes can be used together with JOIN operations to improve query performance.
- Create indexes on columns that have frequent conditional queries: We recommend that you create indexes on columns that are frequently referenced in WHERE clauses.
- Do not create multiple indexes on the same column: Multiple prefix indexes of the same type on the same column do not improve query performance and may cause additional overheads on write and update operations.
- Use sort keys or CLUSTER statements to improve index efficiency: The efficiency of most indexes, especially BRIN and sparse indexes, is affected by the physical distribution of data. Appropriate physical distribution of data based on indexes can improve index performance. You can use compound or interleaved sort keys to distribute data in a more appropriate manner. If B-tree indexes are created for a row-oriented table, we recommend that you execute CLUSTER statements to implement physical sorting.
- Create sparse indexes in large dataset scenarios: If your table contains large amounts of data and if you intend to retrieve less than 50% of data by using the <, <=, =, >=, and > operators, you can create sparse indexes to reduce the amount of invalid data that is loaded. Sparse index examples include the BRIN index and the metadata feature of AOCO tables.
- Delete indexes before you load large amounts of data: Before you load large amounts of data to a table, we recommend that you delete indexes from the table. After data is loaded, you can recreate indexes in the table. This method is faster than updating indexes.
- Do not create bitmap indexes on frequently updated tables: Bitmap indexes are ideal for data warehousing scenarios where data is queried but not updated. They are not applicable to online transaction processing (OLTP) scenarios where data is modified by large numbers of concurrent transactions.
- Use appropriate expression indexes: An index column can be a column of the table, or a function or scalar expression
computed from one or more columns of the table. This feature helps obtain fast access
to tables based on the computation results. For example, to optimize queries of
SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';, you can execute
CREATE INDEX people_names ON people ((first_name || ' ' || last_name));to create an expression index.