Condition Index (Partial Index) in PostgreSQL and Use of Certain Indexes in Bypass

By digoal

Background

create table aa (id int, c1 int, c2 int, c3 int[], c4 timestamp);  
  
insert into aa select generate_series(1,10000000), random()*100, random()*1000, array[(random()*100)::int, (random()*1000)::int], clock_timestamp();  
  
create index idx_c3 on aa using gin (c3);  
  
create index idx_c4 on aa (c4);  
  
explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;

The statistics are not generated, so the database selects an incorrect execution plan (for example, the GIN index is not used) and uses the c4 sorting index, resulting in a slowdown.

postgres=> select * from pg_stats where attname='c4' and tablename='aa';  
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_  
freqs | elem_count_histogram   
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------  
------+----------------------  
(0 rows)  
  
postgres=> select * from pg_stats where attname='c3' and tablename='aa';  
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_  
freqs | elem_count_histogram   
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------  
------+----------------------  
(0 rows)  
  
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;  
  
                                                                   QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=2039.27..2102.98 rows=10 width=52) (actual time=2382.421..2382.421 rows=0 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=130422 read=27325  
   I/O Timings: read=116.959  
   ->  Index Scan Backward using idx_c4 on public.aa  (cost=0.43..318568.03 rows=50000 width=52) (actual time=2382.418..2382.419 rows=0 loops=1)  
         Output: id, c1, c2, c3, c4  
         Filter: (aa.c3 @> '{-1}'::integer[])  
         Rows Removed by Filter: 10000000  
         Buffers: shared hit=130422 read=27325  
         I/O Timings: read=116.959  
 Planning Time: 2.304 ms  
 Execution Time: 2382.443 ms  
(12 rows)

After statistics are generated, it is perfect:

postgres=> vacuum analyze aa;  
VACUUM  
  
-- 查询pg_stats已经有统计信息了  
  
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;  
                                                          QUERY PLAN                                                             
-------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=2857.15..2857.17 rows=10 width=49) (actual time=0.017..0.017 rows=0 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=3  
   ->  Sort  (cost=2856.35..2862.60 rows=2500 width=49) (actual time=0.016..0.016 rows=0 loops=1)  
         Output: id, c1, c2, c3, c4  
         Sort Key: aa.c4 DESC  
         Sort Method: quicksort  Memory: 25kB  
         Buffers: shared hit=3  
         ->  Bitmap Heap Scan on public.aa  (cost=28.17..2739.27 rows=2500 width=49) (actual time=0.013..0.013 rows=0 loops=1)  
               Output: id, c1, c2, c3, c4  
               Recheck Cond: (aa.c3 @> '{-1}'::integer[])  
               Buffers: shared hit=3  
               ->  Bitmap Index Scan on idx_c3  (cost=0.00..27.55 rows=2500 width=0) (actual time=0.011..0.011 rows=0 loops=1)  
                     Index Cond: (aa.c3 @> '{-1}'::integer[])  
                     Buffers: shared hit=3  
 Planning Time: 0.197 ms  
 Execution Time: 0.045 ms  
(17 rows)

There is also a more powerful method by forcibly using the index that orders by c4. In other words, forcibly execute a where condition in the c4 index. Then, common queries without the where condition cannot use this index.

create table aa (id int, c1 int, c2 int, c3 int[], c4 timestamp);  
  
insert into aa select generate_series(1,10000000), random()*100, random()*1000, array[(random()*100)::int, (random()*1000)::int], clock_timestamp();  
  
create index idx_c3 on aa using gin (c3);  
  
create index idx_c4 on aa (c4) where c4 >= '1970-01-01';   
  
-- 即使不收集统计信息, 也不使用c4索引了  
  
explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;   
  
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;   
                                                                   QUERY PLAN                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=43941.49..43942.69 rows=10 width=52) (actual time=22.029..22.029 rows=0 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=3  
   ->  Gather Merge  (cost=43903.17..49889.91 rows=50000 width=52) (actual time=22.027..27.849 rows=0 loops=1)  
         Output: id, c1, c2, c3, c4  
         Workers Planned: 4  
         Workers Launched: 4  
         Buffers: shared hit=31  
         ->  Sort  (cost=42903.12..42934.37 rows=12500 width=52) (actual time=0.070..0.070 rows=0 loops=5)  
               Output: id, c1, c2, c3, c4  
               Sort Key: aa.c4 DESC  
               Sort Method: quicksort  Memory: 25kB  
               Worker 0:  Sort Method: quicksort  Memory: 25kB  
               Worker 1:  Sort Method: quicksort  Memory: 25kB  
               Worker 2:  Sort Method: quicksort  Memory: 25kB  
               Worker 3:  Sort Method: quicksort  Memory: 25kB  
               Buffers: shared hit=31  
               Worker 0: actual time=0.047..0.047 rows=0 loops=1  
                 Buffers: shared hit=7  
               Worker 1: actual time=0.050..0.050 rows=0 loops=1  
                 Buffers: shared hit=7  
               Worker 2: actual time=0.049..0.050 rows=0 loops=1  
                 Buffers: shared hit=7  
               Worker 3: actual time=0.058..0.058 rows=0 loops=1  
                 Buffers: shared hit=7  
               ->  Parallel Bitmap Heap Scan on public.aa  (cost=409.50..42317.72 rows=12500 width=52) (actual time=0.040..0.040 rows=0 loops=5)  
                     Output: id, c1, c2, c3, c4  
                     Recheck Cond: (aa.c3 @> '{-1}'::integer[])  
                     Buffers: shared hit=3  
                     Worker 0: actual time=0.013..0.013 rows=0 loops=1  
                     Worker 1: actual time=0.013..0.013 rows=0 loops=1  
                     Worker 2: actual time=0.012..0.012 rows=0 loops=1  
                     Worker 3: actual time=0.020..0.020 rows=0 loops=1  
                     ->  Bitmap Index Scan on idx_c3  (cost=0.00..397.00 rows=50000 width=0) (actual time=0.008..0.008 rows=0 loops=1)  
                           Index Cond: (aa.c3 @> '{-1}'::integer[])  
                           Buffers: shared hit=3  
 Planning Time: 0.162 ms  
 Execution Time: 27.878 ms  
(38 rows)  
  
-- 带where条件的可以走索引, 不带不走  
explain (analyze,verbose,timing,costs,buffers) select * from aa where c4 >= '1970-01-01' order by c4 desc limit 10 offset 320;   
  
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c4 >= '1970-01-01' order by c4 desc limit 10 offset 320;   
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=17.09..17.61 rows=10 width=52) (actual time=0.107..0.110 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=5 read=4  
   I/O Timings: read=0.011  
   ->  Index Scan Backward using idx_c4 on public.aa  (cost=0.43..173457.83 rows=3333333 width=52) (actual time=0.021..0.084 rows=330 loops=1)  
         Output: id, c1, c2, c3, c4  
         Buffers: shared hit=5 read=4  
         I/O Timings: read=0.011  
 Planning Time: 0.163 ms  
 Execution Time: 0.124 ms  
(10 rows)  
  
explain (analyze,verbose,timing,costs,buffers) select * from aa order by c4 desc limit 10 offset 320;   
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa order by c4 desc limit 10 offset 320;   
                                                                    QUERY PLAN                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=217794.43..217795.63 rows=10 width=49) (actual time=686.640..686.644 rows=10 loops=1)  
   Output: id, c1, c2, c3, c4  
   Buffers: shared hit=19346  
   ->  Gather Merge  (cost=217755.73..1426997.81 rows=9999950 width=49) (actual time=686.541..706.945 rows=330 loops=1)  
         Output: id, c1, c2, c3, c4  
         Workers Planned: 5  
         Workers Launched: 5  
         Buffers: shared hit=103313  
         ->  Sort  (cost=216755.65..221755.63 rows=1999990 width=49) (actual time=669.873..669.897 rows=291 loops=6)  
               Output: id, c1, c2, c3, c4  
               Sort Key: aa.c4 DESC  
               Sort Method: top-N heapsort  Memory: 117kB  
               Worker 0:  Sort Method: top-N heapsort  Memory: 117kB  
               Worker 1:  Sort Method: top-N heapsort  Memory: 117kB  
               Worker 2:  Sort Method: top-N heapsort  Memory: 117kB  
               Worker 3:  Sort Method: top-N heapsort  Memory: 117kB  
               Worker 4:  Sort Method: top-N heapsort  Memory: 117kB  
               Buffers: shared hit=103313  
               Worker 0: actual time=653.603..653.630 rows=330 loops=1  
                 Buffers: shared hit=16563  
               Worker 1: actual time=662.200..662.228 rows=330 loops=1  
                 Buffers: shared hit=16725  
               Worker 2: actual time=667.136..667.163 rows=330 loops=1  
                 Buffers: shared hit=16757  
               Worker 3: actual time=672.997..673.024 rows=330 loops=1  
                 Buffers: shared hit=16895  
               Worker 4: actual time=676.989..677.017 rows=330 loops=1  
                 Buffers: shared hit=17027  
               ->  Parallel Seq Scan on public.aa  (cost=0.00..123092.90 rows=1999990 width=49) (actual time=0.014..266.190 rows=1666667 loops=6)  
                     Output: id, c1, c2, c3, c4  
                     Buffers: shared hit=103093  
                     Worker 0: actual time=0.011..265.481 rows=1602052 loops=1  
                       Buffers: shared hit=16516  
                     Worker 1: actual time=0.014..269.871 rows=1617766 loops=1  
                       Buffers: shared hit=16678  
                     Worker 2: actual time=0.013..273.440 rows=1620849 loops=1  
                       Buffers: shared hit=16710  
                     Worker 3: actual time=0.012..275.377 rows=1634256 loops=1  
                       Buffers: shared hit=16848  
                     Worker 4: actual time=0.026..276.290 rows=1648515 loops=1  
                       Buffers: shared hit=16995  
 Planning Time: 0.161 ms  
 Execution Time: 707.004 ms  
(43 rows)

Community

Condition Index (Partial Index) in PostgreSQL and Use of Certain Indexes in Bypass

Background

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

Vector Retrieval Service for Milvus

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL