By digoal
Data organization plus scanning method determines how much data has to be filtered. The performance limit is reached when an index positions the scan exactly at the target data, so that no filtering is needed at all.
Create a table with the following data distribution:
gid=1 has 100,000 rows, gid=2 has 100,000 rows, ..., gid=10 has 100,000 rows.
The distribution is continuous: crt_time is written in ascending order from the beginning to the end of the table, so the rows with gid=10 sit at the end.
We then need to find the rows with gid=9 or gid=10 and sort them by crt_time.
This example shows how the scanning method determines how much data is filtered, and that the performance limit (accurate index positioning) requires no filtering at all.
The method used in this example is merge sort: merging pre-sorted streams, which PostgreSQL exposes as the Merge Append plan node.
Examples are listed below:
Write data to the tbl table and create a composite index on (gid, crt_time):
create unlogged table tbl (gid int, crt_time timestamp, info text, v numeric);
insert into tbl select g, clock_timestamp(), md5(random()::text), random()*1000 from generate_series(1,10) g, generate_series(1,100000) t;
create index idx_tbl_1 on tbl (gid,crt_time);
postgres=# select * from tbl limit 10;
 gid |          crt_time          |               info               |        v
-----+----------------------------+----------------------------------+------------------
   1 | 2022-08-26 10:33:40.167693 | dca2a67612cfe0502a984690088f9d4e | 926.480833187289
   1 | 2022-08-26 10:33:40.169892 | bfd8a1c94b4c8d7f7da639c005462f1e | 553.004215673141
   1 | 2022-08-26 10:33:40.169905 | 7d9ecdd22f261b361402620f48a5ffde | 992.857001112929
   1 | 2022-08-26 10:33:40.169907 | 4c646b30e9f2a1442f20689179d7dbdc | 882.250459518574
   1 | 2022-08-26 10:33:40.169909 | b9efd4224ed1e3f5a7db32eaa10e7c21 | 348.748854964515
   1 | 2022-08-26 10:33:40.169911 | 09c7dcd2846b54c7f623010a9499de84 | 569.219270037151
   1 | 2022-08-26 10:33:40.169913 | 955f1f35934c79a8ca3558c68a39377b | 11.7484035042552
   1 | 2022-08-26 10:33:40.169915 | b660a864dab7a28e409e92d2aa769e7a | 719.183417035431
   1 | 2022-08-26 10:33:40.169916 | 14c4c30150371a5a9bbc10373c45680c | 809.989509768999
   1 | 2022-08-26 10:33:40.169918 | 00532483f3e44f2dcde934fb5e2c960f | 753.914120371643
(10 rows)
The gid values to be queried are stored in the tbl_gid table:
postgres=# create table tbl_gid (gid int);
CREATE TABLE
postgres=# insert into tbl_gid values (9),(10);
INSERT 0 2
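Note: tbl_gid is never analyzed in this walkthrough, which is why the plans below estimate 2,550 rows for it (the planner's default guess for a tiny, never-analyzed table) instead of 2. If accurate estimates were wanted, one statement would fix it:
analyze tbl_gid;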
The query statement is listed below:
postgres=# explain (analyze,verbose,timing,costs,buffers) select tbl.* from tbl join tbl_gid on tbl.gid=tbl_gid.gid order by tbl.crt_time limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=543741.56..543741.59 rows=10 width=56) (actual time=332.189..332.197 rows=10 loops=1)
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
   Buffers: shared hit=15206
   ->  Sort  (cost=543741.56..575616.56 rows=12750000 width=56) (actual time=332.186..332.189 rows=10 loops=1)
         Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
         Sort Key: tbl.crt_time
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=15206
         ->  Merge Join  (cost=142.03..224991.56 rows=12750000 width=56) (actual time=202.986..305.187 rows=200000 loops=1)
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
               Merge Cond: (tbl.gid = tbl_gid.gid)
               Buffers: shared hit=15203
               ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.42..31099.96 rows=1000000 width=56) (actual time=0.038..176.019 rows=1000000 loops=1)
                     Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
                     Buffers: shared hit=15198
               ->  Sort  (cost=179.78..186.16 rows=2550 width=4) (actual time=0.053..14.009 rows=100001 loops=1)
                     Output: tbl_gid.gid
                     Sort Key: tbl_gid.gid
                     Sort Method: quicksort  Memory: 25kB
                     Buffers: shared hit=5
                     ->  Seq Scan on public.tbl_gid  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.011..0.011 rows=2 loops=1)
                           Output: tbl_gid.gid
                           Buffers: shared hit=1
 Planning:
   Buffers: shared hit=199
 Planning Time: 1.688 ms
 Execution Time: 332.400 ms
(27 rows)
In this plan the large table is the outer input of the merge join and is read in index order, so it needs no explicit sort, while the small table is sorted. The approach looks smart but is time-consuming, because all 1,000,000 index entries are scanned. Disable merge join and try again:
postgres=# set enable_mergejoin =off;
SET
postgres=# explain (analyze,verbose,timing,costs,buffers) select tbl.* from tbl join tbl_gid on tbl.gid=tbl_gid.gid order by tbl.crt_time limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=788931.38..788931.40 rows=10 width=56) (actual time=236.632..236.637 rows=10 loops=1)
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
   Buffers: shared hit=11365
   ->  Sort  (cost=788931.38..820806.38 rows=12750000 width=56) (actual time=236.631..236.633 rows=10 loops=1)
         Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
         Sort Key: tbl.crt_time
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=11365
         ->  Hash Join  (cost=67.38..470181.38 rows=12750000 width=56) (actual time=151.584..211.768 rows=200000 loops=1)
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
               Hash Cond: (tbl.gid = tbl_gid.gid)
               Buffers: shared hit=11365
               ->  Seq Scan on public.tbl  (cost=0.00..21364.00 rows=1000000 width=56) (actual time=0.023..79.634 rows=1000000 loops=1)
                     Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
                     Buffers: shared hit=11364
               ->  Hash  (cost=35.50..35.50 rows=2550 width=4) (actual time=0.011..0.012 rows=2 loops=1)
                     Output: tbl_gid.gid
                     Buckets: 4096  Batches: 1  Memory Usage: 33kB
                     Buffers: shared hit=1
                     ->  Seq Scan on public.tbl_gid  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.007 rows=2 loops=1)
                           Output: tbl_gid.gid
                           Buffers: shared hit=1
 Planning Time: 0.173 ms
 Execution Time: 236.682 ms
(24 rows)
postgres=# set enable_hashjoin =off;
SET
With both hash join and merge join disabled, the query becomes faster still.
Moreover, the optimizer chooses the large table as the inner table of the nested loop, probed through the index, so only the matching rows are read and filtering performance is better. PostgreSQL gets this right on its own; in MySQL you would have to write STRAIGHT_JOIN to pin the large table as the inner table.
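For reference, a hypothetical sketch of that MySQL hint (assumed syntax, not run as part of this test): STRAIGHT_JOIN forces the left table to be read first, which makes tbl the probed inner table.
-- hypothetical MySQL equivalent, not part of the PostgreSQL test above
select tbl.*
from tbl_gid straight_join tbl on tbl.gid = tbl_gid.gid
order by tbl.crt_time limit 10;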
postgres=# explain (analyze,verbose,timing,costs,buffers) select tbl.* from tbl join tbl_gid on tbl.gid=tbl_gid.gid order by tbl.crt_time limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7349107.95..7349107.98 rows=10 width=56) (actual time=96.025..96.028 rows=10 loops=1)
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
   Buffers: shared hit=3048
   ->  Sort  (cost=7349107.95..7380982.95 rows=12750000 width=56) (actual time=96.023..96.025 rows=10 loops=1)
         Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
         Sort Key: tbl.crt_time
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=3048
         ->  Nested Loop  (cost=0.42..7030357.95 rows=12750000 width=56) (actual time=0.090..67.554 rows=200000 loops=1)
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
               Buffers: shared hit=3048
               ->  Seq Scan on public.tbl_gid  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.030..0.031 rows=2 loops=1)
                     Output: tbl_gid.gid
                     Buffers: shared hit=1
               ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.42..1756.99 rows=100000 width=56) (actual time=0.032..17.225 rows=100000 loops=2)
                     Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
                     Index Cond: (tbl.gid = tbl_gid.gid)
                     Buffers: shared hit=3047
 Planning Time: 0.219 ms
 Execution Time: 96.072 ms
(20 rows)
The data access method above takes each gid from tbl_gid and fetches all records with that gid from tbl through the index.
You may ask: why not scan in crt_time index order, pick out the matching gid values, and stop at the limit? As mentioned earlier, the data is linearly distributed: in crt_time order, by the time you reach the rows with gid=9 or 10, you have already filtered out 800,000 useless rows.
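A minimal sketch of that slower approach, assuming an extra index on crt_time alone (idx_tbl_2 is hypothetical and is not created in the test above): the planner could then walk idx_tbl_2 in time order, discarding every row with gid 1 through 8 before the first match appears.
create index idx_tbl_2 on tbl (crt_time);  -- hypothetical index, not part of the test above
select * from tbl where gid in (9,10) order by crt_time limit 10;  -- must read and discard ~800,000 rows first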
The real optimization is to eliminate filtering altogether. Use union all instead of a join, let each branch read its gid directly in crt_time order, and merge the pre-sorted branches to produce the result.
If the optimizer supports index skip scan in the future, this manual rewrite may no longer be needed.
Generate SQL:
do language plpgsql $$
declare
  sql text := '';
  i int;
  u text := ' union all ';
begin
  sql := 'select * from ';
  -- one pre-sorted branch per gid stored in tbl_gid
  for i in select tbl_gid.gid from tbl_gid loop
    sql := sql || format(' (select * from tbl where gid=%s order by crt_time) ', i);
    sql := sql || u;
  end loop;
  -- rtrim strips trailing characters from the set in u, removing the final
  -- ' union all ' and stopping at the closing parenthesis of the last branch
  sql := rtrim(sql, u) || ' order by crt_time limit 10;';
  raise notice '%', sql;
end;
$$;
Execution time drops from about 100 milliseconds to less than 1 millisecond:
postgres=# explain (analyze,verbose,timing,costs,buffers)
select * from
(select * from tbl where gid=9 order by crt_time ) -- No limit is needed inside because PG supports merge append.
union all
(select * from tbl where gid=10 order by crt_time ) -- No limit is needed inside because PG supports merge append.
order by crt_time limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.86..1.80 rows=10 width=56) (actual time=0.036..0.045 rows=10 loops=1)
   Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
   Buffers: shared hit=9
   ->  Merge Append  (cost=0.86..18461.52 rows=197233 width=56) (actual time=0.035..0.042 rows=10 loops=1)
         Sort Key: tbl.crt_time
         Buffers: shared hit=9
         ->  Index Scan using idx_tbl_1 on public.tbl  (cost=0.42..8257.79 rows=99100 width=56) (actual time=0.022..0.026 rows=10 loops=1)
               Output: tbl.gid, tbl.crt_time, tbl.info, tbl.v
               Index Cond: (tbl.gid = 9)
               Buffers: shared hit=5
         ->  Index Scan using idx_tbl_1 on public.tbl tbl_1  (cost=0.42..8231.38 rows=98133 width=56) (actual time=0.011..0.011 rows=1 loops=1)
               Output: tbl_1.gid, tbl_1.crt_time, tbl_1.info, tbl_1.v
               Index Cond: (tbl_1.gid = 10)
               Buffers: shared hit=4
 Planning Time: 0.165 ms
 Execution Time: 0.069 ms
(16 rows)
Each Merge Append branch reads only a few rows from the index (shared hit=9 in total), which is why the query finishes in well under a millisecond. Wrapped in a function, the rewrite is easy to call:
create or replace function get_tbl_from_tbl_gid() returns TABLE(gid int, crt_time timestamp, info text, v numeric) as $$
declare
  sql text := '';
  i int;
  u text := ' union all ';
begin
  sql := 'select * from ';
  -- one pre-sorted branch per gid stored in tbl_gid
  for i in select tbl_gid.gid from tbl_gid loop
    sql := sql || format(' (select * from tbl where gid=%s order by crt_time) ', i);
    sql := sql || u;
  end loop;
  -- strip the trailing ' union all ', then append the final sort and limit
  sql := rtrim(sql, u) || ' order by crt_time limit 10;';
  return query execute sql;
end;
$$ language plpgsql strict;
postgres=# select * from get_tbl_from_tbl_gid();
 gid |          crt_time          |               info               |        v
-----+----------------------------+----------------------------------+------------------
   9 | 2022-08-26 10:33:41.986023 | 9ae6e7501235704db4a06a2583fa3869 | 934.268270867721
   9 | 2022-08-26 10:33:41.986025 | 2f7b856bf6118a2f705550e50046db1a | 444.519415857999
   9 | 2022-08-26 10:33:41.986027 | 7bbfcba972224fbd1feaedae7f468391 | 224.418366498535
   9 | 2022-08-26 10:33:41.986028 | ac93a0fdd1e753bec5d102b92a829510 |  977.99573846894
   9 | 2022-08-26 10:33:41.98603  | 1fa1c6aac66c12b80c8c35a6032be0e7 | 741.469261203189
   9 | 2022-08-26 10:33:41.986032 | 65c4e99b0fd2d818159f504cc3238e1b | 434.032166357292
   9 | 2022-08-26 10:33:41.986033 | c3a0e0f437145b2c0ccbf569e713d457 | 107.788739293836
   9 | 2022-08-26 10:33:41.986035 | d0a9af284773e7b3516ea2d3753afb0d | 718.387729379674
   9 | 2022-08-26 10:33:41.986037 | 2c8a1ae7d26b896563b5bcd89f0d0782 |  254.24026094129
   9 | 2022-08-26 10:33:41.986058 | 551224cd6c0359c6076f2ea78e01576f | 80.4301336515814
(10 rows)
Time: 0.724 ms
Will you encounter such a data distribution in reality?
Yes: sensor data, feed data, monitoring data, streaming data, and time-series data.
When the data is discrete (for example, a row is written only when an alert fires), there are two timestamps: the alert time and the write time. If the gap between a sensor's alert time and its arrival time is large, searching by alert time runs into exactly this problem.
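A minimal sketch of the same rewrite for that scenario, using a hypothetical sensor_alert table (the table, column, and index names are assumptions, not part of the test above):
-- hypothetical schema for discrete alert data
create table sensor_alert (sensor_id int, alert_time timestamp, payload text);
create index idx_sensor_alert_1 on sensor_alert (sensor_id, alert_time);
-- the same union all + Merge Append pattern, one pre-sorted branch per sensor
select * from
(select * from sensor_alert where sensor_id = 9 order by alert_time)
union all
(select * from sensor_alert where sensor_id = 10 order by alert_time)
order by alert_time limit 10;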