×
Community Blog Range Types in PostgreSQL and GiST Indexes

Range Types in PostgreSQL and GiST Indexes

This article describes the range types introduced in PostgreSQL 9.2 and indexes for range types that can significantly improve query performance.

By Digoal

Recently, a friend in the community of Alibaba Cloud developers complained that their database did not show very good performance when processing thousands of concurrent IP address library queries in MySQL. So I asked him for his IP address library data, SQL query statements, and the table structure in MySQL. I told him that I was going to transfer the data into PostgreSQL and perform stress testing to see the query performance in PostgreSQL.

This issue and its related requirement - the need to process thousands of concurrent IP address library queries in MySQL - is also common in many business scenarios, such as calculations involving age ranges, income ranges, frequent activity ranges, geo location chunks, geometric chunks, and line segments. In reality, all these scenarios have the same description dimension. It's scope. With the development of the Internet of Things (IoT), this type of query will become more and more common in the years to come.

However, a lack of good indexing mechanisms will lead to heavy CPU usage and performance bottlenecks. This article describes the range types introduced in PostgreSQL 9.2 and indexes for range types that can significantly improve query performance.

Some Preliminary Considerations

For a range type to be considered a range type, it must contain a subtype. For example, the range of int4 is called int4range. Next, to use int4range to represent 1, 2, 3, 4, and 5, we can use '(0,6)'::int4range, '[1,6)'::int4range, '[1,5]'::int4range or '(0,5]'::int4range.

postgres=# select '(0,6)'::int4range;  
 int4range   
-----------  
 [1,6)  
(1 row)  
  
postgres=# select '[1,6)'::int4range;  
 int4range   
-----------  
 [1,6)  
(1 row)  
  
postgres=# select '[1,5]'::int4range;  
 int4range   
-----------  
 [1,6)  
(1 row)  
  
postgres=# select '(0,5]'::int4range;  
 int4range   
-----------  
 [1,6)  
(1 row)  

Next, sparse range types use [) to represent a range of elements. In this, the bracket indicates inclusion and a parenthesis indicates exclusion. Sparse range types must have canonical functions defined, which converts the storage format to [). And for a sparse range type, we know what the previous value and the following value of a specific value. For example, the value 1 in an int range is preceded by 0 and followed by 2. If this is a numeric range, we do not the value before 1 (0.9999999999... till infinite) and the value after 1 (1.00000000... 1). However, continuous range types store accurate elements, for example:

postgres=# select '(0,5]'::numrange;  
 numrange   
----------  
 (0,5]  
(1 row)  
  
postgres=# select '[0,5]'::numrange;  
 numrange   
----------  
 [0,5]  
(1 row)  
  
postgres=# select '[0,5)'::numrange;  
 numrange   
----------  
 [0,5)  
(1 row)  
  
postgres=# select '(0,5)'::numrange;  
 numrange   
----------  
 (0,5)  
(1 row)  

Some Features in PostgreSQL

PostgreSQL provides several useful features for range types, such as the inclusion, exclusion, and intersection. Specifically, PostgreSQL comes with the following built-in range types:

INT4RANGE — Range of INTEGER  
INT8RANGE — Range of BIGINT  
NUMRANGE — Range of NUMERIC  
TSRANGE — Range of TIMESTAMP WITHOUT TIME ZONE  
TSTZRANGE — Range of TIMESTAMP WITH TIME ZONE  
DATERANGE — Range of DATE  

The following can be found in the system table.

digoal=# select oid, typname from pg_type where typname ~ 'range';  
  oid  |  typname     
-------+------------  
  3904 | int4range  
  3905 | _int4range  
  3906 | numrange  
  3907 | _numrange  
  3908 | tsrange  
  3909 | _tsrange  
  3910 | tstzrange  
  3911 | _tstzrange  
  3912 | daterange  
  3913 | _daterange  
  3926 | int8range  
  3927 | _int8range  
  3831 | anyrange  
 11026 | pg_range  

Let's see which functions are related to anyrange .

digoal=# select proname,proargtypes from pg_proc where proargtypes::text ~ '3831';  
         proname         |     proargtypes        
-------------------------+----------------------  
 anyrange_out            | 3831  
 range_out               | 3831  
 range_send              | 3831  
 lower                   | 3831   -- the lower bound of a range. Sparse types and continuous types are different in this regard. See the example section  
 upper                   | 3831  -- the upper bound of a range. Sparse types and continuous types are different in this regard. See the example section  
 isempty                 | 3831  -- indicates if a range does not contain any elements  
 lower_inc               | 3831  -- indicates that the lower bound is inclusive. Sparse types and continuous types are different in this regard. See the example section  
 upper_inc               | 3831  -- indicates that the upper bound is inclusive. Sparse types and continuous types are different in this regard. See the example section  
 lower_inf               | 3831  -- indicates that the lower bound is an infinitely small value. (Note that this infinite value is never the value of the subtype of the range. It means that the lower bound is not defined. See the example section.)  
 upper_inf               | 3831  -- indicates that the upper bound is an infinitely large value. (Note that this infinite value is never the value of the subtype of the range. It means that the upper bound is not defined. See the example section.)  
 range_eq                | 3831 3831  
 range_ne                | 3831 3831  
 range_overlaps          | 3831 3831  
 range_contains_elem     | 3831 2283  
 range_contains          | 3831 3831  
 elem_contained_by_range | 2283 3831  
 range_contained_by      | 3831 3831  
 range_adjacent          | 3831 3831  
 range_before            | 3831 3831  
 range_after             | 3831 3831  
 range_overleft          | 3831 3831  
 range_overright         | 3831 3831  
 range_union             | 3831 3831  
 range_intersect         | 3831 3831  
 range_minus             | 3831 3831  
 range_cmp               | 3831 3831  
 range_lt                | 3831 3831  
 range_le                | 3831 3831  
 range_ge                | 3831 3831  
 range_gt                | 3831 3831  
 range_gist_consistent   | 2281 3831 23 26 2281  
 range_gist_same         | 3831 3831 2281  
 hash_range              | 3831  

The following are the related operators:

postgres=# select oprname from pg_operator where oprleft=3831 or oprright=3831;  
 oprname   
---------  
 =  
 <>  
 <  
 <=  
 >=  
 >  
 &&  
 @>  
 @>  
 <@  
 <@  
 <<  
 >>  
 &<  
 &>  
 -|-  
 +  
 -  
 *  

Examples of the Built-in Range Types

To understand the built-in range types and how they work. Follow these steps:

1.  Create a test table.

digoal=# CREATE TABLE reservation ( room int, during TSRANGE );  

2.  Insert a range of test data of the subtype timestamp.

digoal=# INSERT INTO reservation VALUES  
digoal-#   ( 1108, '[2010-01-01 14:30, 2010-01-01 15:30)' );  
INSERT 0 1  

Note that @> determines the inclusivity.

digoal=# SELECT int4range(10, 20) @> 3;  
 ? column?   
----------  
 f  
(1 row)  

Also note that && judges specifies whether two ranges have overlaps.

digoal=# SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0);  
 ? column?   
----------  
 t  
(1 row)  

Note that upper finds the upper bound, and that int8range is also a sparse range.

digoal=# SELECT upper(int8range(15, 25));  
 upper   
-------  
    25  
(1 row)  

3.  Find the upper bound of the continuous range numrange. The result is consistent with that of the previous range.

digoal=# SELECT upper(numrange(15, 25));  
 upper   
-------  
    25  
(1 row)  

To put it simply, the output of a sparse range is an exclusive upper bound. In other words, they are the values of the upper and lower bounds in the [) range pattern.

digoal=# SELECT upper('(15,25)'::int8range);  
 upper   
-------  
    25  
(1 row)  
  
digoal=# SELECT upper('(15,25]'::int8range);  
 upper   
-------  
    26  
(1 row)  

For continuous ranges, the output depends on the range input.

digoal=# SELECT upper('(15,25]'::numrange);  
 upper   
-------  
    25  
(1 row)  
  
digoal=# SELECT upper('(15,25)'::numrange);  
 upper   
-------  
    25  
(1 row)  

An askerisk (*) returns the output of the intersection of two ranges.

digoal=# SELECT int4range(10, 20) * int4range(15, 25);  
 ? column?   
----------  
 [15,20)  
(1 row)  

The isempty function indicates if a range is empty.

digoal=# SELECT isempty(numrange(1, 5));  
 isempty   
---------  
 f  
(1 row)  

Infinitely Large Elements

The following examples are used to explain infinitely large elements. Below, the following is a time range from now to an infinitely large value.

digoal=# SELECT '(now,)'::tsrange;  
             tsrange               
---------------------------------  
 ("2012-05-17 16:32:43.055233",)  
(1 row)  

And the following, here, is a time range from an infinitely small value to an infinitely large value.

digoal=# SELECT '(,)'::tsrange;  
 tsrange   
---------  
 (,)  
(1 row)  

Last, the following shows a time range from an infinitely value to now.

digoal=# SELECT '(,now)'::tsrange;  
             tsrange               
---------------------------------  
 (,"2012-05-17 16:32:55.800172")  
(1 row)  

Range Value Patterns

Range values entered must follow one of the example formats below:

(lower-bound,upper-bound)  
(lower-bound,upper-bound]  
[lower-bound,upper-bound)  
[lower-bound,upper-bound]  
empty  

For example, the following range is empty. It does not contain any elements.

digoal=# SELECT 'empty'::tsrange;  
 tsrange   
---------  
 empty  
(1 row)  

Each range type has a constructor function with the same name as the range type. The constructor function can accept three arguments: lower bound, upper bound, and boundary pattern (with one of the strings "()", "(]", "[)", or "[]"). For example, the constructor function of the int4range type is also called int4range.

digoal=# select int4range(1,2,'()');  
 int4range   
-----------  
 empty  
(1 row)  
  
digoal=# select int4range(1,2,'(]');  
 int4range   
-----------  
 [2,3)  
(1 row)  
  
digoal=# select int4range(null,2,'(]');  
 int4range   
-----------  
 (,3)  
(1 row)  

Customize Range Types

Defining a new range type will also create a constructor function with the same name as that new range type. Consider the following example:

digoal=# create type iprange as range (subtype=inet);  
CREATE TYPE  
  
digoal=# select iprange('1.1.1.1'::inet,null);  
  iprange     
------------  
 [1.1.1.1,)  
(1 row)  

Let's see the example syntax that creates a range type:

CREATE TYPE name AS RANGE (  
    SUBTYPE = subtype  
    [ , SUBTYPE_OPCLASS = subtype_operator_class ]  
    [ , COLLATION = collation ]  
    [ , CANONICAL = canonical_function ]  
    [ , SUBTYPE_DIFF = subtype_diff_function ]  
)  

In the above syntax, subtype_diff_function improves the GiST index for better query performance, and canonical_function defines sparse range types. Next, we can use a GiST index on a range type to accelerate queries in some specific scenarios.

A GiST index can accelerate queries involving these range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &>  

Note that B-tree or hash indexes are not suitable for range types.

digoal=# CREATE INDEX reservation_idx ON reservation USING gist (during);  
CREATE INDEX  
digoal=# \d reservation  
  Table "public.reservation"  
 Column |  Type   | Modifiers   
--------+---------+-----------  
 room   | integer |   
 during | tsrange |   
Indexes:  
    "reservation_idx" gist (during)  
digoal=# insert into reservation values (1,'(,now)'::tsrange);  
INSERT 0 1  
  
digoal=# select * from reservation ;  
 room |                    during                       
------+-----------------------------------------------  
 1108 | ["2010-01-01 14:30:00","2010-01-01 15:30:00")  
    1 | (,"2012-05-17 16:49:13.40783")  
(2 rows)  
  
digoal=# explain select * from reservation where during @> '[now,now]'::tsrange;  
                                          QUERY PLAN                                            
----------------------------------------------------------------------------------------------  
 Seq Scan on reservation  (cost=0.00.. 1.01 rows=1 width=36)  
   Filter: (during @> '["2012-05-17 16:50:18.794268","2012-05-17 16:50:18.794268"]'::tsrange)  
(2 rows)  

This example has a too small number of records and the query is performed without using the index. Next we can force a query with the index. Consider the following code.

digoal=# set enable_seqscan=off;  
SET  
digoal=# explain select * from reservation where during @> '[now,now]'::tsrange;  
                                            QUERY PLAN                                              
--------------------------------------------------------------------------------------------------  
 Index Scan using reservation_idx on reservation  (cost=0.00.. 8.27 rows=1 width=36)  
   Index Cond: (during @> '["2012-05-17 16:50:59.716661","2012-05-17 16:50:59.716661"]'::tsrange)  
(2 rows)  

The EXCLUDE Constraint

The EXCLUDE constraint is also mentioned in the PostgreSQL DBA2000 training material. The following is an example of the EXCLUDE constraint in a range:

digoal=# delete from reservation ;  
DELETE 2  

The following constraint prevents any overlapping time values from existing in the during field at the same time.

digoal=# ALTER TABLE reservation  
digoal-#   ADD EXCLUDE USING gist (during WITH &&);  
NOTICE:  ALTER TABLE / ADD EXCLUDE will create implicit index "reservation_during_excl" for table "reservation"  
ALTER TABLE  
  
digoal=# INSERT INTO reservation VALUES  
  ( 1108, '[2010-01-01 11:30, 2010-01-01 13:00)' );  
INSERT 0 1  

The insert operation is a failure due to the time conflicts (or overlaps), indicating that the constraint is effective.

digoal=# INSERT INTO reservation VALUES  
  ( 1108, '[2010-01-01 11:45, 2010-01-01 15:45)' );  
ERROR:  conflicting key value violates exclusion constraint "reservation_during_excl"  
DETAIL:  Key (during)=(["2010-01-01 11:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 13:00:00")).  
STATEMENT:  INSERT INTO reservation VALUES  
          ( 1108, '[2010-01-01 11:45, 2010-01-01 15:45)' );  
ERROR:  conflicting key value violates exclusion constraint "reservation_during_excl"  
DETAIL:  Key (during)=(["2010-01-01 11:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 13:00:00")).  

You can also install the btree_gist module to enhance the EXCLUDE constraint. Because currently I have not installed the btree_gist module, I have failed to use the GiST index on the column of the int range type.

A GiST index cannot be created on the column of the int range type. The following SQL statement returns an error:

digoal=# ALTER TABLE reservation         
  ADD EXCLUDE USING gist (room WITH =, during WITH &&);  
ERROR:  data type integer has no default operator class for access method "gist"  
HINT:  You must specify an operator class for the index or define a default operator class for the data type.  
STATEMENT:  ALTER TABLE reservation  
          ADD EXCLUDE USING gist (room WITH =, during WITH &&);  
ERROR:  data type integer has no default operator class for access method "gist"  
HINT:  You must specify an operator class for the index or define a default operator class for the data type. 

Retry the preceding SQL statement after the btree_gist module is loaded. The statement runs successfully.

digoal=# create extension btree_gist;  
CREATE EXTENSION  
digoal=# ALTER TABLE reservation       
  ADD EXCLUDE USING gist (room WITH =, during WITH &&);  
NOTICE:  ALTER TABLE / ADD EXCLUDE will create implicit index "reservation_room_during_excl" for table "reservation"  
ALTER TABLE  

The constraint rejects records with the same room number and overlapping during field values. Consider the following example:

digoal=# CREATE TABLE room_reservation  
digoal-# (  
digoal(#   room TEXT,  
digoal(#   during TSRANGE,  
digoal(#   EXCLUDE USING gist (room WITH =, during WITH &&)  
digoal(# );  
NOTICE:  CREATE TABLE / EXCLUDE will create implicit index "room_reservation_room_during_excl" for table "room_reservation"  
CREATE TABLE  
  
digoal=# INSERT INTO room_reservation VALUES  
digoal-#   ( '123A', '[2010-01-01 14:00, 2010-01-01 15:00)' );  
INSERT 0 1  
  
digoal=# INSERT INTO room_reservation VALUES  
digoal-#   ( '123A', '[2010-01-01 14:30, 2010-01-01 15:30)' );  
ERROR:  conflicting key value violates exclusion constraint "room_reservation_room_during_excl"  
DETAIL:  Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).  
STATEMENT:  INSERT INTO room_reservation VALUES  
          ( '123A', '[2010-01-01 14:30, 2010-01-01 15:30)' );  
ERROR:  conflicting key value violates exclusion constraint "room_reservation_room_during_excl"  
DETAIL:  Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).  
  
digoal=# INSERT INTO room_reservation VALUES  
digoal-#   ( '123B', '[2010-01-01 14:30, 2010-01-01 15:30)' );  
INSERT 0 1  

btree_gist supports the following types:

int2, int4, int8, float4, float8, numeric, timestamp with time zone, timestamp without time zone, time with time zone, time without time zone, date, interval, oid, money, char, varchar, text, bytea, bit, varbit, macaddr, inet, and cidr.  

An Application Scenario of Range Types

Consider the following scenario, you want to use an IP address to locate the region where that specific IP address is. Now, assume that you will use iprange to store IPs and their corresponding region names. And then assume that you need to find the region of an IP address that a user has submitted. To do this, you need to additionally follow these steps:

1.  Create an iprange.

digoal=# create type iprange as range (subtype=inet);  
CREATE TYPE  

2.  Create a test table.

digoal=# create table ip_info (id serial primary key,iprange iprange,location text);  
NOTICE:  CREATE TABLE will create implicit sequence "ip_info_id_seq" for serial column "ip_info.id"  
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "ip_info_pkey" for table "ip_info"  
CREATE TABLE  

3.  Create an EXCLUDE constraint. Note that I first need to install the btree_gist module because the GiST index used here is of type Text. Otherwise, the creation would fail.

digoal=# alter table ip_info add constraint ck_exclude_iprange exclude using gist(location with =, iprange with &&);  
NOTICE:  ALTER TABLE / ADD EXCLUDE will create implicit index "ck_exclude_iprange" for table "ip_info"  
ALTER TABLE  

4.  Insert test data:

digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.0'::inet,'192.168.1.10'::inet,'[]'),'Beijing');  
INSERT 0 1  
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.11'::inet,'192.168.1.20'::inet,'[]'),'Shanghai');  
INSERT 0 1  
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.21'::inet,'192.168.1.30'::inet,'[]'),'Nanjing');  
INSERT 0 1  
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.31'::inet,'192.168.1.40'::inet,'[]'),'Hangzhou');  
INSERT 0 1  
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.41'::inet,'192.168.1.50'::inet,'[]'),'Nanchang');  
INSERT 0 1  
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.51'::inet,'192.168.1.60'::inet,'[]'),'Guangzhou');  
INSERT 0 1  
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.61'::inet,'192.168.1.70'::inet,'[]'),'Chongqing');  
INSERT 0 1  
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.71'::inet,'192.168.1.80'::inet,'[]'),'Hong Kong');  
INSERT 0 1  

5.  View the current table structure.

digoal=# \d ip_info  
                          Table "public.ip_info"  
  Column  |  Type   |                      Modifiers                         
----------+---------+------------------------------------------------------  
 id       | integer | not null default nextval('ip_info_id_seq'::regclass)  
 iprange  | iprange |   
 location | text    |   
Indexes:  
    "ip_info_pkey" PRIMARY KEY, btree (id)  
    "ck_exclude_iprange" EXCLUDE USING gist (location WITH =, iprange WITH &&)  

6.  Perform a test query.

digoal=# select * from ip_info where iprange @> '192.168.1.1'::inet;  
 id |          iprange           | location   
----+----------------------------+----------  
  1 | [192.168.1.0,192.168.1.10] | 北京  
(1 row)  

7.  View the query execution plan.

digoal=# explain select * from ip_info where iprange @> '192.168.1.1'::inet;  
                                    QUERY PLAN                                       
-----------------------------------------------------------------------------------  
 Index Scan using ck_exclude_iprange on ip_info  (cost=0.00.. 8.27 rows=1 width=68)  
   Index Cond: (iprange @> '192.168.1.1'::inet)  
(2 rows)  

References

Additional links

You can use features similar to thoses in the temporal module in versions of PostgreSQL earlier than version 9.1. For more information, see the following blogs:

0 0 0
Share on

digoal

277 posts | 24 followers

You may also like

Comments

digoal

277 posts | 24 followers

Related Products