# Community

Blog Events Webinars Tutorials Forum
×
Community Blog PostgreSQL Optimized Case – Where a Field Range Order by B Field Sort Limit X

# PostgreSQL Optimized Case – Where a Field Range Order by B Field Sort Limit X

This short article explains a PostgreSQL optimization case.

By digoal

## Background Information

The goal of optimization is to minimize computing, IO, filter, recheck, etc. For example, the following query is difficult to optimize at first glance:

``````select * from tbl where ts between x and y
order by z limit x;  ``````
1. The query condition of the ts field is a wide range with a large scale, which gives many results. A large number of sort records are displayed sorting by z.
2. Records that meet the ts criteria are returned in the z-field, and qualified results are ranked behind. Therefore, a large number of filters are required.

Because ts and z are two fields, index (ts,z) or index (z,ts) cannot be used to completely perform indexing.

## Optimization

First, let’s understand the background. z is pk. z and ts have linear correlation (positive correlation or negative correlation), and both have directionality, such as increasing or decreasing. Therefore, this query can be modified.

``````select * from tbl where ts between x and y
and z?
order by z limit x;``````

n records are queried each time, and all records meeting the conditions of `ts between x and y` are successively filtered and processed.

Optimization Ideas:

1.  Supplement the condition of z as the initial condition, so there is no need to filter too much.

``````select min(z), max(z) into id1,id2 from tbl where ts between x and y;

select * from tbl where ts between x and y
and z>=id1 and z<=id2
order by z desc limit x;

min(z) into id2  ``````

2.  Each time a batch of records is found, the id is passed in as a new condition, so there is no need to filter too much every time.

``````select * from tbl where ts between x and y and z<id2 order by z desc limit x;

min(z) into id2  ``````

## Example

``````create table tbl (id int primary key, info text, crt_time timestamp);
insert into tbl select generate_series(1,100000000), md5(random()::text) , clock_timestamp();  ``````

Original SQL statement:

``````postgres=# select min(crt_time), max(crt_time) from tbl;
min             |            max
----------------------------+----------------------------
2020-07-10 14:04:46.600332 | 2020-07-10 14:08:06.737822
(1 row)

select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
order by id limit 100;

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
order by id limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.57..9.22 rows=100 width=45) (actual time=4003.046..4003.071 rows=100 loops=1)
Output: id, info, crt_time
->  Index Scan using tbl_pkey on public.tbl  (cost=0.57..3208769.85 rows=37078282 width=45) (actual time=4003.045..4003.062 rows=100 loops=1)
Output: id, info, crt_time
Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))
Rows Removed by Filter: 29130419
Planning Time: 0.093 ms
Execution Time: 4003.096 ms
(10 rows)

-- 大量filter  ``````

Optimize the initial SQL:

``````select min(id),max(id) from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00';
min    |   max
----------+----------
29130420 | 66423729
(1 row)

select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id>=29130420 and id<=66423729
order by id desc limit 100;

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id>=29130420 and id<=66423729
order by id desc limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.57..10.57 rows=100 width=45) (actual time=0.022..0.051 rows=100 loops=1)
Output: id, info, crt_time
Buffers: shared hit=6
->  Index Scan Backward using tbl_pkey on public.tbl  (cost=0.57..1374883.82 rows=13745329 width=45) (actual time=0.021..0.043 rows=100 loops=1)
Output: id, info, crt_time
Index Cond: ((tbl.id >= 29130420) AND (tbl.id <= 66423729))
Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))
Buffers: shared hit=6
Planning Time: 0.121 ms
Execution Time: 0.068 ms
(10 rows)
``````

Optimize SQL statements for batch query:

``````select * from tbl where
crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id<66423630
order by id desc limit 100;

select * from tbl where
crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'
and id<66423530
order by id desc limit 100;   ``````
``````                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.57..9.90 rows=100 width=45) (actual time=0.042..0.074 rows=100 loops=1)
Output: id, info, crt_time
Buffers: shared hit=8
->  Index Scan Backward using tbl_pkey on public.tbl  (cost=0.57..2298576.97 rows=24640916 width=45) (actual time=0.041..0.065 rows=100 loops=1)
Output: id, info, crt_time
Index Cond: (tbl.id < 66423530)
Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))
Buffers: shared hit=8
Planning Time: 0.107 ms
Execution Time: 0.097 ms
(10 rows)  ``````

It has 40000 times better performance.

0 0 0
Share on

# digoal

262 posts | 23 followers

# digoal

262 posts | 23 followers

# Related Products

• ## AnalyticDB for PostgreSQL

An online MPP warehousing service based on the Greenplum Database open source program

• ## ApsaraDB RDS for PostgreSQL

An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities

• ## Database for FinTech Solution

Leverage cloud-native database solutions dedicated for FinTech.