
An Example of PostgreSQL join+order by limit Optimization

This article illustrates the key role of data distribution and scanning methods for optimization.

By digoal

Background

Data organization plus the scanning method determines how much data must be filtered. Reaching the performance limit means using an index to position rows precisely, so that no filtering is needed at all.

Create a table with the following data distribution:

gid=1 has 100,000 rows, gid=2 has 100,000 rows, ..., gid=10 has 100,000 rows, stored contiguously. crt_time is written in increasing order from the beginning of the table to the end, so the rows with gid=10 sit at the end of the table.

Then, we need to find the rows with gid = 9 or 10, sorted by crt_time.

This example mainly shows that the scanning method determines how much data is filtered: performance reaches its limit when the index positions the rows precisely and no filtering is needed at all.

The optimized method used in this example is a merge sort over ordered index scans (Merge Append).

Examples are listed below:

Write data to the tbl table and create a composite index on (gid, crt_time):

create unlogged table tbl (gid int, crt_time timestamp, info text, v numeric);  
insert into tbl select g, clock_timestamp(), md5(random()::text), random()*1000 from generate_series(1,10) g, generate_series(1,100000) t;  
create index idx_tbl_1 on tbl (gid,crt_time);  
  
postgres=# select * from tbl limit 10;  
 gid |          crt_time          |               info               |        v           
-----+----------------------------+----------------------------------+------------------  
   1 | 2022-08-26 10:33:40.167693 | dca2a67612cfe0502a984690088f9d4e | 926.480833187289  
   1 | 2022-08-26 10:33:40.169892 | bfd8a1c94b4c8d7f7da639c005462f1e | 553.004215673141  
   1 | 2022-08-26 10:33:40.169905 | 7d9ecdd22f261b361402620f48a5ffde | 992.857001112929  
   1 | 2022-08-26 10:33:40.169907 | 4c646b30e9f2a1442f20689179d7dbdc | 882.250459518574  
   1 | 2022-08-26 10:33:40.169909 | b9efd4224ed1e3f5a7db32eaa10e7c21 | 348.748854964515  
   1 | 2022-08-26 10:33:40.169911 | 09c7dcd2846b54c7f623010a9499de84 | 569.219270037151  
   1 | 2022-08-26 10:33:40.169913 | 955f1f35934c79a8ca3558c68a39377b | 11.7484035042552  
   1 | 2022-08-26 10:33:40.169915 | b660a864dab7a28e409e92d2aa769e7a | 719.183417035431  
   1 | 2022-08-26 10:33:40.169916 | 14c4c30150371a5a9bbc10373c45680c | 809.989509768999  
   1 | 2022-08-26 10:33:40.169918 | 00532483f3e44f2dcde934fb5e2c960f | 753.914120371643  
(10 rows)  

The gid values to be queried are stored in the tbl_gid table.

postgres=# create table tbl_gid (gid int);  
CREATE TABLE  
postgres=# insert into tbl_gid values (9),(10);  
INSERT 0 2  

The query statement is listed below:

postgres=# explain (analyze,verbose,timing,costs,buffers) select tbl.* from tbl join tbl_gid on tbl.gid=tbl_gid.gid order by tbl.crt_time limit 10;  
                                                                        QUERY PLAN                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=543741.56..543741.59 rows=10 width=56) (actual time=332.189..332.197 rows=10 loops=1)  
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
   Buffers: shared hit=15206  
   ->  Sort  (cost=543741.56..575616.56 rows=12750000 width=56) (actual time=332.186..332.189 rows=10 loops=1)  
         Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
         Sort Key: tbl.crt_time  
         Sort Method: top-N heapsort  Memory: 26kB  
         Buffers: shared hit=15206  
         ->  Merge Join  (cost=142.03..224991.56 rows=12750000 width=56) (actual time=202.986..305.187 rows=200000 loops=1)  
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
               Merge Cond: (tbl.gid = tbl_gid.gid)  
               Buffers: shared hit=15203  
               ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.42..31099.96 rows=1000000 width=56) (actual time=0.038..176.019 rows=1000000 loops=1)  
                     Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
                     Buffers: shared hit=15198  
               ->  Sort  (cost=179.78..186.16 rows=2550 width=4) (actual time=0.053..14.009 rows=100001 loops=1)  
                     Output: tbl_gid.gid  
                     Sort Key: tbl_gid.gid  
                     Sort Method: quicksort  Memory: 25kB  
                     Buffers: shared hit=5  
                     ->  Seq Scan on public.tbl_gid  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.011..0.011 rows=2 loops=1)  
                           Output: tbl_gid.gid  
                           Buffers: shared hit=1  
 Planning:  
   Buffers: shared hit=199  
 Planning Time: 1.688 ms  
 Execution Time: 332.400 ms  
(27 rows)  

In this merge join, the outer input is the large table and the inner input is the small table, and the large table needs no explicit sort because the index already supplies gid order. This looks clever but is time-consuming: the entire large table is scanned, and 200,000 joined rows are sorted just to return 10.

postgres=# set enable_mergejoin =off;  
SET  
postgres=# explain (analyze,verbose,timing,costs,buffers) select tbl.* from tbl join tbl_gid on tbl.gid=tbl_gid.gid order by tbl.crt_time limit 10;  
                                                               QUERY PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=788931.38..788931.40 rows=10 width=56) (actual time=236.632..236.637 rows=10 loops=1)  
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
   Buffers: shared hit=11365  
   ->  Sort  (cost=788931.38..820806.38 rows=12750000 width=56) (actual time=236.631..236.633 rows=10 loops=1)  
         Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
         Sort Key: tbl.crt_time  
         Sort Method: top-N heapsort  Memory: 26kB  
         Buffers: shared hit=11365  
         ->  Hash Join  (cost=67.38..470181.38 rows=12750000 width=56) (actual time=151.584..211.768 rows=200000 loops=1)  
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
               Hash Cond: (tbl.gid = tbl_gid.gid)  
               Buffers: shared hit=11365  
               ->  Seq Scan on public.tbl  (cost=0.00..21364.00 rows=1000000 width=56) (actual time=0.023..79.634 rows=1000000 loops=1)  
                     Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
                     Buffers: shared hit=11364  
               ->  Hash  (cost=35.50..35.50 rows=2550 width=4) (actual time=0.011..0.012 rows=2 loops=1)  
                     Output: tbl_gid.gid  
                     Buckets: 4096  Batches: 1  Memory Usage: 33kB  
                     Buffers: shared hit=1  
                     ->  Seq Scan on public.tbl_gid  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.007 rows=2 loops=1)  
                           Output: tbl_gid.gid  
                           Buffers: shared hit=1  
 Planning Time: 0.173 ms  
 Execution Time: 236.682 ms  
(24 rows)  
  
  
postgres=# set enable_hashjoin =off;  
SET  

With both merge join and hash join disabled, the query becomes faster still.

Moreover, in the nested loop the optimizer chose the large table as the inner table, where the index filters efficiently. PostgreSQL handles this well; in MySQL, you would have to write STRAIGHT_JOIN to force the large table to be the inner table.

postgres=# explain (analyze,verbose,timing,costs,buffers) select tbl.* from tbl join tbl_gid on tbl.gid=tbl_gid.gid order by tbl.crt_time limit 10;  
                                                                      QUERY PLAN                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=7349107.95..7349107.98 rows=10 width=56) (actual time=96.025..96.028 rows=10 loops=1)  
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
   Buffers: shared hit=3048  
   ->  Sort  (cost=7349107.95..7380982.95 rows=12750000 width=56) (actual time=96.023..96.025 rows=10 loops=1)  
         Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
         Sort Key: tbl.crt_time  
         Sort Method: top-N heapsort  Memory: 26kB  
         Buffers: shared hit=3048  
         ->  Nested Loop  (cost=0.42..7030357.95 rows=12750000 width=56) (actual time=0.090..67.554 rows=200000 loops=1)  
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
               Buffers: shared hit=3048  
               ->  Seq Scan on public.tbl_gid  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.030..0.031 rows=2 loops=1)  
                     Output: tbl_gid.gid  
                     Buffers: shared hit=1  
               ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.42..1756.99 rows=100000 width=56) (actual time=0.032..17.225 rows=100000 loops=2)  
                     Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
                     Index Cond: (tbl.gid = tbl_gid.gid)  
                     Buffers: shared hit=3047  
 Planning Time: 0.219 ms  
 Execution Time: 96.072 ms  
(20 rows)  

The plan above obtains each gid from tbl_gid and then looks up all records with that gid in tbl through the index.

You may ask: why not scan the data directly in crt_time index order, match gid along the way, and stop once the limit is reached? As mentioned earlier, the data is linearly distributed. Scanning in crt_time order, you must filter out 800,000 useless rows before reaching the first rows with gid=9 or 10.
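For illustration, here is a sketch of what that alternative would look like. The index idx_tbl_2 is hypothetical and is not created anywhere above:

```sql
-- Hypothetical alternative (not used above): scan in crt_time order and
-- filter on gid. With this data distribution, the scan must pass over all
-- ~800,000 rows of gid 1..8 before the first gid=9 row appears.
create index idx_tbl_2 on tbl (crt_time);

select *
from tbl
where gid in (9, 10)   -- filter applied to every row visited
order by crt_time      -- order satisfied by idx_tbl_2, no sort needed
limit 10;              -- but reached only after ~800,000 filtered rows
```

The index removes the sort, but not the filtering, which is exactly the cost the rewrite below eliminates.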

The real optimization is to eliminate filtering. So, use union all instead of a join, and let a merge sort (Merge Append) return the result.

If the optimizer supports skip scan in the future, this manual rewrite may no longer be needed.

  • A small step for the optimizer may solve a big problem in some industry scenarios. This is the charm of open source: where there is demand, there is momentum for change, and development constantly moves in the direction of user needs.

Generate the SQL dynamically:

do language plpgsql $$  
declare  
  sql text := '';  
  i int;  
  u text := ' union all ';  
begin  
  sql := 'select * from ';  
  for i in select tbl_gid.gid from tbl_gid loop  
    sql := sql || format (' (select * from tbl where gid=%s order by crt_time) ', i);  
    sql := sql || u;  
  end loop;  
  sql := rtrim(sql, u) || ' order by crt_time limit 10;';  
  raise notice '%', sql;  
end;  
$$;  

The execution time drops from about 100 milliseconds to less than 1 millisecond:

select * from
(select * from tbl where gid=9 order by crt_time ) -- No limit is needed inside because PG supports merge append.
union all
(select * from tbl where gid=10 order by crt_time ) -- No limit is needed inside because PG supports merge append.
order by crt_time limit 10;
  
                                                                      QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.86..1.80 rows=10 width=56) (actual time=0.036..0.045 rows=10 loops=1)  
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
   Buffers: shared hit=9  
   ->  Merge Append  (cost=0.86..18461.52 rows=197233 width=56) (actual time=0.035..0.042 rows=10 loops=1)  
         Sort Key: tbl.crt_time  
         Buffers: shared hit=9  
         ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.42..8257.79 rows=99100 width=56) (actual time=0.022..0.026 rows=10 loops=1)  
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v  
               Index Cond: (tbl.gid = 9)  
               Buffers: shared hit=5  
         ->  Index Scan using idx_tbl_1 on public.tbl tbl_1  (cost=0.42..8231.38 rows=98133 width=56) (actual time=0.011..0.011 rows=1 loops=1)  
               Output: tbl_1.gid, tbl_1.crt_time, tbl_1.info, tbl_1.v  
               Index Cond: (tbl_1.gid = 10)  
               Buffers: shared hit=4  
 Planning Time: 0.165 ms  
 Execution Time: 0.069 ms  
(16 rows)  

Wrapped in a function, this is easy to use:

create or replace function get_tbl_from_tbl_gid() returns TABLE(gid int,crt_time timestamp, info text, v numeric) as $$  
declare  
  sql text := '';  
  i int;  
  u text := ' union all ';  
begin  
  sql := 'select * from ';  
  for i in select tbl_gid.gid from tbl_gid loop  
    sql := sql || format (' (select * from tbl where gid=%s order by crt_time) ', i);  
    sql := sql || u;  
  end loop;  
  sql := rtrim(sql, u) || ' order by crt_time limit 10;';  
  return query execute sql ;   
end;   
$$ language plpgsql strict;   
  
  
  
  
postgres=# select * from get_tbl_from_tbl_gid();  
 gid |          crt_time          |               info               |        v           
-----+----------------------------+----------------------------------+------------------  
   9 | 2022-08-26 10:33:41.986023 | 9ae6e7501235704db4a06a2583fa3869 | 934.268270867721  
   9 | 2022-08-26 10:33:41.986025 | 2f7b856bf6118a2f705550e50046db1a | 444.519415857999  
   9 | 2022-08-26 10:33:41.986027 | 7bbfcba972224fbd1feaedae7f468391 | 224.418366498535  
   9 | 2022-08-26 10:33:41.986028 | ac93a0fdd1e753bec5d102b92a829510 |  977.99573846894  
   9 | 2022-08-26 10:33:41.98603  | 1fa1c6aac66c12b80c8c35a6032be0e7 | 741.469261203189  
   9 | 2022-08-26 10:33:41.986032 | 65c4e99b0fd2d818159f504cc3238e1b | 434.032166357292  
   9 | 2022-08-26 10:33:41.986033 | c3a0e0f437145b2c0ccbf569e713d457 | 107.788739293836  
   9 | 2022-08-26 10:33:41.986035 | d0a9af284773e7b3516ea2d3753afb0d | 718.387729379674  
   9 | 2022-08-26 10:33:41.986037 | 2c8a1ae7d26b896563b5bcd89f0d0782 |  254.24026094129  
   9 | 2022-08-26 10:33:41.986058 | 551224cd6c0359c6076f2ea78e01576f | 80.4301336515814  
(10 rows)  
  
Time: 0.724 ms  

Will you encounter such a data distribution scenario in reality?

Typical examples: sensors, feed data, monitoring data, streaming data, and time series data.

When the data is sparse (for example, a record is written only when an alert is triggered), there are two timestamps: the alert time and the write time. If the gap between a sensor's alert time and its arrival time is large, the same problem can occur when searching by alert time.
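A minimal sketch of such a schema, assuming illustrative table and column names that are not from the original:

```sql
-- Illustrative only: sensor alerts written when an alert fires. Rows are
-- stored roughly in write-time order, but queries filter by sensor and
-- sort by alert_time, reproducing the gid/crt_time situation above.
create table sensor_alert (
  sensor_id  int,                                   -- plays the role of gid
  alert_time timestamp,                             -- when the alert fired
  write_time timestamp default clock_timestamp(),   -- when the row arrived
  payload    text
);
create index idx_sensor_alert on sensor_alert (sensor_id, alert_time);

-- Same shape as the optimized query above: per-sensor ordered index
-- scans combined by Merge Append, with no filtering.
(select * from sensor_alert where sensor_id = 9  order by alert_time)
union all
(select * from sensor_alert where sensor_id = 10 order by alert_time)
order by alert_time limit 10;
```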
