By digoal
create table aa (id int, c1 int, c2 int, c3 int[], c4 timestamp);
insert into aa select generate_series(1,10000000), random()*100, random()*1000, array[(random()*100)::int, (random()*1000)::int], clock_timestamp();
create index idx_c3 on aa using gin (c3);
create index idx_c4 on aa (c4);
explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
The statistics are not generated, so the database selects an incorrect execution plan (for example, the GIN index is not used) and uses the c4 sorting index, resulting in a slowdown.
postgres=> select * from pg_stats where attname='c4' and tablename='aa';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_
freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------
------+----------------------
(0 rows)
postgres=> select * from pg_stats where attname='c3' and tablename='aa';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_
freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------
------+----------------------
(0 rows)
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2039.27..2102.98 rows=10 width=52) (actual time=2382.421..2382.421 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=130422 read=27325
I/O Timings: read=116.959
-> Index Scan Backward using idx_c4 on public.aa (cost=0.43..318568.03 rows=50000 width=52) (actual time=2382.418..2382.419 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Filter: (aa.c3 @> '{-1}'::integer[])
Rows Removed by Filter: 10000000
Buffers: shared hit=130422 read=27325
I/O Timings: read=116.959
Planning Time: 2.304 ms
Execution Time: 2382.443 ms
(12 rows)
After statistics are generated, it is perfect:
postgres=> vacuum analyze aa;
VACUUM
-- 查询pg_stats已经有统计信息了
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2857.15..2857.17 rows=10 width=49) (actual time=0.017..0.017 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=3
-> Sort (cost=2856.35..2862.60 rows=2500 width=49) (actual time=0.016..0.016 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Sort Key: aa.c4 DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=3
-> Bitmap Heap Scan on public.aa (cost=28.17..2739.27 rows=2500 width=49) (actual time=0.013..0.013 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Recheck Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
-> Bitmap Index Scan on idx_c3 (cost=0.00..27.55 rows=2500 width=0) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
Planning Time: 0.197 ms
Execution Time: 0.045 ms
(17 rows)
There is also a more powerful method by forcibly using the index that orders by c4. In other words, forcibly execute a where condition in the c4 index. Then, common queries without the where condition cannot use this index.
create table aa (id int, c1 int, c2 int, c3 int[], c4 timestamp);
insert into aa select generate_series(1,10000000), random()*100, random()*1000, array[(random()*100)::int, (random()*1000)::int], clock_timestamp();
create index idx_c3 on aa using gin (c3);
create index idx_c4 on aa (c4) where c4 >= '1970-01-01';
-- 即使不收集统计信息, 也不使用c4索引了
explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43941.49..43942.69 rows=10 width=52) (actual time=22.029..22.029 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=3
-> Gather Merge (cost=43903.17..49889.91 rows=50000 width=52) (actual time=22.027..27.849 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=31
-> Sort (cost=42903.12..42934.37 rows=12500 width=52) (actual time=0.070..0.070 rows=0 loops=5)
Output: id, c1, c2, c3, c4
Sort Key: aa.c4 DESC
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Worker 2: Sort Method: quicksort Memory: 25kB
Worker 3: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=31
Worker 0: actual time=0.047..0.047 rows=0 loops=1
Buffers: shared hit=7
Worker 1: actual time=0.050..0.050 rows=0 loops=1
Buffers: shared hit=7
Worker 2: actual time=0.049..0.050 rows=0 loops=1
Buffers: shared hit=7
Worker 3: actual time=0.058..0.058 rows=0 loops=1
Buffers: shared hit=7
-> Parallel Bitmap Heap Scan on public.aa (cost=409.50..42317.72 rows=12500 width=52) (actual time=0.040..0.040 rows=0 loops=5)
Output: id, c1, c2, c3, c4
Recheck Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
Worker 0: actual time=0.013..0.013 rows=0 loops=1
Worker 1: actual time=0.013..0.013 rows=0 loops=1
Worker 2: actual time=0.012..0.012 rows=0 loops=1
Worker 3: actual time=0.020..0.020 rows=0 loops=1
-> Bitmap Index Scan on idx_c3 (cost=0.00..397.00 rows=50000 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
Planning Time: 0.162 ms
Execution Time: 27.878 ms
(38 rows)
-- 带where条件的可以走索引, 不带不走
explain (analyze,verbose,timing,costs,buffers) select * from aa where c4 >= '1970-01-01' order by c4 desc limit 10 offset 320;
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c4 >= '1970-01-01' order by c4 desc limit 10 offset 320;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=17.09..17.61 rows=10 width=52) (actual time=0.107..0.110 rows=10 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=5 read=4
I/O Timings: read=0.011
-> Index Scan Backward using idx_c4 on public.aa (cost=0.43..173457.83 rows=3333333 width=52) (actual time=0.021..0.084 rows=330 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=5 read=4
I/O Timings: read=0.011
Planning Time: 0.163 ms
Execution Time: 0.124 ms
(10 rows)
explain (analyze,verbose,timing,costs,buffers) select * from aa order by c4 desc limit 10 offset 320;
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa order by c4 desc limit 10 offset 320;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=217794.43..217795.63 rows=10 width=49) (actual time=686.640..686.644 rows=10 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=19346
-> Gather Merge (cost=217755.73..1426997.81 rows=9999950 width=49) (actual time=686.541..706.945 rows=330 loops=1)
Output: id, c1, c2, c3, c4
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=103313
-> Sort (cost=216755.65..221755.63 rows=1999990 width=49) (actual time=669.873..669.897 rows=291 loops=6)
Output: id, c1, c2, c3, c4
Sort Key: aa.c4 DESC
Sort Method: top-N heapsort Memory: 117kB
Worker 0: Sort Method: top-N heapsort Memory: 117kB
Worker 1: Sort Method: top-N heapsort Memory: 117kB
Worker 2: Sort Method: top-N heapsort Memory: 117kB
Worker 3: Sort Method: top-N heapsort Memory: 117kB
Worker 4: Sort Method: top-N heapsort Memory: 117kB
Buffers: shared hit=103313
Worker 0: actual time=653.603..653.630 rows=330 loops=1
Buffers: shared hit=16563
Worker 1: actual time=662.200..662.228 rows=330 loops=1
Buffers: shared hit=16725
Worker 2: actual time=667.136..667.163 rows=330 loops=1
Buffers: shared hit=16757
Worker 3: actual time=672.997..673.024 rows=330 loops=1
Buffers: shared hit=16895
Worker 4: actual time=676.989..677.017 rows=330 loops=1
Buffers: shared hit=17027
-> Parallel Seq Scan on public.aa (cost=0.00..123092.90 rows=1999990 width=49) (actual time=0.014..266.190 rows=1666667 loops=6)
Output: id, c1, c2, c3, c4
Buffers: shared hit=103093
Worker 0: actual time=0.011..265.481 rows=1602052 loops=1
Buffers: shared hit=16516
Worker 1: actual time=0.014..269.871 rows=1617766 loops=1
Buffers: shared hit=16678
Worker 2: actual time=0.013..273.440 rows=1620849 loops=1
Buffers: shared hit=16710
Worker 3: actual time=0.012..275.377 rows=1634256 loops=1
Buffers: shared hit=16848
Worker 4: actual time=0.026..276.290 rows=1648515 loops=1
Buffers: shared hit=16995
Planning Time: 0.161 ms
Execution Time: 707.004 ms
(43 rows)
Alibaba Clouder - January 9, 2018
digoal - September 6, 2019
ApsaraDB - August 8, 2023
digoal - June 26, 2019
ApsaraDB - July 4, 2022
digoal - June 26, 2019
Vector Retrieval Service for Milvus
A cloud-native vector search engine that is 100% compatible with open-source Milvus, extensively optimized in performance, stability, availability, and management capabilities.
Learn More
PolarDB for PostgreSQL
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn More
AnalyticDB for PostgreSQL
An online MPP warehousing service based on the Greenplum Database open source program
Learn More
ApsaraDB RDS for PostgreSQL
An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreMore Posts by digoal