PostgreSQL Convergence Filtering Optimization for Segment SQL with Multiple Range Conditions

By digoal

Background

When there are multiple range conditions, the database can only use one range condition. Other range conditions cannot converge accurately based on composite indexes.

The following ts and col1 are converted into range conditions:

create unlogged table tbl (id int, col1 text, ts timestamp);  
insert into tbl select random()*1000000, random()::text, now()+(random()*1000||' hour')::interval from generate_series(1,10000000);  
  
select * from tbl   
where ts >= $1 and ts <= $2   
and col1 like '0.1%';   
  
select * from tbl   
where ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00'  
and col1 like '0.1%';

Normally, we will create the following indexes for the database to select one index:

If lc_collate=C, you can use:

postgres=# create index idx_tbl_2 on tbl(col1);  
CREATE INDEX  
Time: 5713.425 ms (00:05.713)  
  
postgres=# create index idx_tbl_3 on tbl(ts);  
CREATE INDEX  
Time: 4155.860 ms (00:04.156)

Even if you create a composite index of ts and col1, it doesn’t work. As explained earlier, the second range field cannot be filtered through the index.

The query performance is listed below:

postgres=# explain                                                                                                                                                                                        select count(*) from tbl                                                                                                                                                                                  where col1 like '0.1%'                                                                                                                                                                                    and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';  
                                                                               QUERY PLAN                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=114785.25..114785.26 rows=1 width=8)  
   ->  Index Scan using idx_tbl_2 on tbl  (cost=0.56..114353.77 rows=172590 width=0)  
         Index Cond: ((col1 >= '0.1'::text) AND (col1 < '0.2'::text))  
         Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))  
(4 rows)  
  
Time: 0.674 ms

postgres=#                                                                                                                                                                                                select count(*) from tbl                                                                                                                                                                                  where col1 like '0.1%'                                                                                                                                                                                    and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';  
 count    
--------  
 169344  
(1 row)  
  
Time: 4087.236 ms (00:04.087)

Optimization Method

Time is stepped and can be segmented by day to create an expression index. The purpose of this is to eliminate a range so the index skip scan can be used to accelerate in the leading query, and the second range becomes an accurate index to converge.

create or replace function im_to_char (timestamp,text) returns text as $$  
  select to_char($1,$2);  
$$ language sql strict immutable parallel safe;  
  
create or replace function im_to_char (timestamptz,text) returns text as $$  
  select to_char($1,$2);  
$$ language sql strict immutable parallel safe;  
  
create or replace function im_to_char (date,text) returns text as $$  
  select to_char($1,$2);  
$$ language sql strict immutable parallel safe;

create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1 text_pattern_ops);

If lc_collate=C, you can use:

create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1);

SQL rewrite is listed below:

select * from tbl   
where im_to_char(ts,'yyyymmdd') = any(array(  
  select im_to_char(ts::date+i,'yyyymmdd') from generate_series(0,$2::date-$1::date+1) i  
  ))  
and col1 like '0.1%'  
and ts >= $1 and ts <= $2;

Get SQL like this:

select count(*) from tbl   
where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])  
and col1 like '0.1%'  
and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';

As seen from the execution plan, this composite index has been used, and index cond is used for the like and segmentation conditions we want. Filter the exact time and like in the filter:

postgres=# explain select count(*) from tbl                                                                                                                                                               where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])                                                                     and col1 like '0.1%'                                                                                                                                                                                      and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';  
                                                                                               QUERY PLAN                                                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=36081.60..36081.61 rows=1 width=8)  
   ->  Index Scan using idx_t_1 on tbl  (cost=0.56..36064.35 rows=6900 width=0)  
         Index Cond: ((im_to_char(ts, 'yyyymmdd'::text) = ANY ('{20210521,20210522,20210523,20210524,20210525,20210526,20210527,20210528}'::text[])) AND (col1 >= '0.1'::text) AND (col1 < '0.2'::text))  
         Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))  
(4 rows)  
  
Time: 0.883 ms

Time consumption is reduced significantly.

postgres=# select count(*) from tbl   
postgres-# where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])  
postgres-# and col1 like '0.1%'  
postgres-# and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';  
 count    
--------  
 169344  
(1 row)  
  
Time: 680.602 ms

Community

PostgreSQL Convergence Filtering Optimization for Segment SQL with Multiple Range Conditions

Background

Optimization Method

Read previous post:

Read next post:

digoal

You may also like

Comments

Dikky Ryan Pratama June 23, 2023 at 8:46 am

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for SQL Server

ApsaraDB RDS for PostgreSQL