By digoal
The goal of optimization is to minimize computing, IO, filter, recheck, etc. For example, the following query is difficult to optimize at first glance:
select * from tbl where ts between x and y
order by z limit x;
Because ts and z are two fields, index (ts,z) or index (z,ts) cannot be used to completely perform indexing.
First, let’s understand the background. z is pk. z and ts have linear correlation (positive correlation or negative correlation), and both have directionality, such as increasing or decreasing. Therefore, this query can be modified.
select * from tbl where ts between x and y
and z?
order by z limit x;
n records are queried each time, and all records meeting the conditions of ts between x and y are successively filtered and processed.
Optimization Ideas:
1. Supplement the condition of z as the initial condition, so there is no need to filter too much.
select min(z), max(z) into id1,id2 from tbl where ts between x and y;
select * from tbl where ts between x and y
and z>=id1 and z<=id2
order by z desc limit x;
min(z) into id2
2. Each time a batch of records is found, the id is passed in as a new condition, so there is no need to filter too much every time.
select * from tbl where ts between x and y and z<id2 order by z desc limit x;
min(z) into id2
create table tbl (id int primary key, info text, crt_time timestamp);
insert into tbl select generate_series(1,100000000), md5(random()::text) , clock_timestamp();
Original SQL statement:
postgres=# select min(crt_time), max(crt_time) from tbl;
min | max
----------------------------+----------------------------
2020-07-10 14:04:46.600332 | 2020-07-10 14:08:06.737822
(1 row)
select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
order by id limit 100;
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
order by id limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..9.22 rows=100 width=45) (actual time=4003.046..4003.071 rows=100 loops=1)
Output: id, info, crt_time
Buffers: shared hit=288650 read=63193
-> Index Scan using tbl_pkey on public.tbl (cost=0.57..3208769.85 rows=37078282 width=45) (actual time=4003.045..4003.062 rows=100 loops=1)
Output: id, info, crt_time
Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))
Rows Removed by Filter: 29130419
Buffers: shared hit=288650 read=63193
Planning Time: 0.093 ms
Execution Time: 4003.096 ms
(10 rows)
-- 大量filter
Optimize the initial SQL:
select min(id),max(id) from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00';
min | max
----------+----------
29130420 | 66423729
(1 row)
select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id>=29130420 and id<=66423729
order by id desc limit 100;
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id>=29130420 and id<=66423729
order by id desc limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..10.57 rows=100 width=45) (actual time=0.022..0.051 rows=100 loops=1)
Output: id, info, crt_time
Buffers: shared hit=6
-> Index Scan Backward using tbl_pkey on public.tbl (cost=0.57..1374883.82 rows=13745329 width=45) (actual time=0.021..0.043 rows=100 loops=1)
Output: id, info, crt_time
Index Cond: ((tbl.id >= 29130420) AND (tbl.id <= 66423729))
Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))
Buffers: shared hit=6
Planning Time: 0.121 ms
Execution Time: 0.068 ms
(10 rows)
Optimize SQL statements for batch query:
select * from tbl where
crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id<66423630
order by id desc limit 100;
select * from tbl where
crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id<66423530
order by id desc limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..9.90 rows=100 width=45) (actual time=0.042..0.074 rows=100 loops=1)
Output: id, info, crt_time
Buffers: shared hit=8
-> Index Scan Backward using tbl_pkey on public.tbl (cost=0.57..2298576.97 rows=24640916 width=45) (actual time=0.041..0.065 rows=100 loops=1)
Output: id, info, crt_time
Index Cond: (tbl.id < 66423530)
Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))
Buffers: shared hit=8
Planning Time: 0.107 ms
Execution Time: 0.097 ms
(10 rows)
It has 40000 times better performance.
Use Mixed Storage of Rows and Columns in PostgreSQL Zedstore
ApsaraDB - May 18, 2022
Alibaba Cloud MaxCompute - February 18, 2024
digoal - December 23, 2020
digoal - December 11, 2019
zhuodao - July 30, 2020
digoal - January 19, 2021
PolarDB for PostgreSQL
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn More
AnalyticDB for PostgreSQL
An online MPP warehousing service based on the Greenplum Database open source program
Learn More
ApsaraDB RDS for PostgreSQL
An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn More
Database for FinTech Solution
Leverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by digoal