×
Community Blog PostGIS Long Lat Geometry Distance Search Tuning Using the GiST KNN Function

PostGIS Long Lat Geometry Distance Search Tuning Using the GiST KNN Function

This article discusses accelerating retrieval of coordinate points in PostGIS using GiST indexes and demonstrates it using example code.

By Digoal

Background

It is very common to retrieve nearby points in spatial data on PostgreSQL. For example, you can take latitude and longitude as a coordinate point and retrieve other points within 1 km from this point.

Recently, some readers asked about how to optimize this process.

The current version supports this feature, and you do not need to use subqueries.

Demo

This article shows how to use GiST indexes to accelerate retrieval in PostGIS.

Creating a Test Table:

create table cust_jw        
(        
 dz varchar(300),        
 jwd geometry        
);      

The test data is obtained from the internet.

insert into cust_jw values ('杭州', ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163));          
insert into cust_jw values ('北京', ST_Transform(ST_GeomFromText('POINT(116.46 39.92)', 4326), 2163));          
insert into cust_jw values ('南京', ST_Transform(ST_GeomFromText('POINT(118.78 32.04)', 4326), 2163));          
insert into cust_jw values ('南宁', ST_Transform(ST_GeomFromText('POINT(108.33 22.84)', 4326), 2163));          
insert into cust_jw values ('贵阳', ST_Transform(ST_GeomFromText('POINT(106.71 26.57)', 4326), 2163));          
insert into cust_jw values ('南昌', ST_Transform(ST_GeomFromText('POINT(115.89 28.68)', 4326), 2163));          
insert into cust_jw values ('余杭', ST_Transform(ST_GeomFromText('POINT(120.3 30.43)', 4326), 2163));     

Creating a GiST Index:

create index idx_cust_jw_1 on cust_jw using gist(jwd);          

This indexing method supports distance sorting (<->) and intersection (&&) between two geometries.

For more information, see system tables such as pg_amop, pg_am, pg_operator, and pg_opfamily.

The following SQL statement queries the straight distance (in meters) from Beijing to Hangzhou:

SELECT ST_Distance(          
ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163),          
ST_Transform(ST_GeomFromText('POINT(116.46 39.92)', 4326), 2163)          
);          
   st_distance              
------------------          
 1256521.71432098          
(1 row)          

The following SQL statement queries the coordinates in the table that are 20 km away from the point ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163). For more information about how to use the functions, see the PostGIS document.

digoal=# select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw where jwd && ST_Buffer(ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163), 20000, 10);          
  dz  |                        jwd                         |   st_distance              
------+----------------------------------------------------+------------------          
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0          
 余杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612          
(2 rows)          
Time: 1.335 ms      

As mentioned earlier, this index-based access method supports the && and <-> operators.

digoal=# explain select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw where jwd && ST_Buffer(ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163), 20000, 10);                                                                                                                   
                          QUERY PLAN                                                                                                          
                                                                                                                                              
                                                                                                                                              
                                                                                                                                              
                                                                                                                                              
                                                                         
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
------------------------------------------------------------------------------------------------------------------------------------          
---------------------------------------------------------------          
 Index Scan using idx_cust_jw_1 on cust_jw  (cost=0.14..3.41 rows=1 width=548)          
   Index Cond: (jwd && '01030000207308000001000000290000004C94087DD53B54C173AA7759E8FB5D411122F50B133C54C154295A2DDAF85D41D751B134CA          
3C54C1F4F2B643DFF55D41B6BBAE74F63D54C10FB6A0650AF35D41CDDC4767903F54C1D331586C6DF05D4124855AF48D4154C14B9BC9D018EE5D41AC1BE98FE24354          
C1F4F2B6431BEC5D41E89F31897F4654C1DDD11D5181EA5D41CDDC4767544954C1FE67201155E95D412D13EB504F4C54C1383864E89DE85D414C94087D5D4F54C173          
AA775960E85D416B1526A96B5254C1383864E89DE85D41CB4BC992665554C1FE67201155E95D41B088DF703B5854C1DDD11D5181EA5D41EC0C286AD85A54C1F4F2B6          
431BEC5D4174A3B6052D5D54C14B9BC9D018EE5D41CB4BC9922A5F54C1D331586C6DF05D41E26C6285C46054C10FB6A0650AF35D41C1D65FC5F06154C1F4F2B643DF          
F55D4187061CEEA76254C154295A2DDAF85D414C94087DE56254C173AA7759E8FB5D4187061CEEA76254C1922B9585F6FE5D41C1D65FC5F06154C1F261386FF1015E          
41E26C6285C46054C1D79E4E4DC6045E41CB4BC9922A5F54C11323974663075E4174A3B6052D5D54C19BB925E2B7095E41EC0C286AD85A54C1F261386FB50B5E41B0          
88DF703B5854C10983D1614F0D5E41CB4BC992665554C1E8ECCEA17B0E5E416B1526A96B5254C1AE1C8BCA320F5E414C94087D5D4F54C173AA7759700F5E412D13EB          
504F4C54C1AE1C8BCA320F5E41CDDC4767544954C1E8ECCEA17B0E5E41E89F31897F4654C10983D1614F0D5E41AC1BE98FE24354C1F261386FB50B5E4124855AF48D          
4154C19BB925E2B7095E41CDDC4767903F54C11323974663075E41B6BBAE74F63D54C1D79E4E4DC6045E41D751B134CA3C54C1F261386FF1015E411122F50B133C54          
C1922B9585F6FE5D414C94087DD53B54C173AA7759E8FB5D41'::geometry)          
(2 rows)          
Time: 1.218 ms          

The following SQL statement enables sorting by distance.

digoal=# select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163);          
  dz  |                        jwd                         |   st_distance              
------+----------------------------------------------------+------------------          
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0          
 余杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612          
 南京 | 0101000020730800000FFE5AD1D62653C16F4F972A10635E41 | 321491.591341196          
 南昌 | 010100002073080000B2744BA1FE5253C10975D1494AA25F41 | 503843.306221247          
 北京 | 0101000020730800006EBBB0F1AB0E4FC17207C71D44525E41 | 1256521.71432098          
 南宁 | 01010000207308000030806B3882F451C18E3F38DCBB686141 |  1409624.7420143          
 贵阳 | 01010000207308000082EA89026EE14FC1D6A3AD6E9E786141 | 1732521.31784296          
(7 rows)          
Time: 0.598 ms    

The following method forcibly enables index-based sorting.

digoal=# set enable_seqscan=off;          
SET          
Time: 0.109 ms          
digoal=# explain select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163);          
                                      QUERY PLAN                                                
--------------------------------------------------------------------------------------          
 Index Scan using idx_cust_jw_1 on cust_jw  (cost=0.14..54.44 rows=140 width=548)          
   Order By: (jwd <-> '0101000020730800004C94087D5D4F54C173AA7759E8FB5D41'::geometry)          
(2 rows)          

The following method is used for further optimization, especially when points are dense.

digoal=# select * from (select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) AS dist from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163) limit 1000) t where dist<15000;          
  dz  |                        jwd                         |       dist                 
------+----------------------------------------------------+------------------          
 杭州 | 0101000020730800004C94087D5D4F54C173AA7759E8FB5D41 |                0          
 余杭 | 0101000020730800000E6E5A20494854C121FC688DA9EF5D41 | 14483.9823187612          
(2 rows)          
Time: 0.634 ms          

Ultimate Optimization

Use a cursor for further optimization to reduce data scanning to a specified limit (provided that an index is used in SQL Order By in the FOR loop).

digoal=# do language plpgsql $$          
declare          
  v_rec record;          
  v_limit int := 1000;          
begin          
  set local enable_seqscan=off;  -- 强制索引, 因为扫描行数够就退出.          
  for v_rec in select *,ST_Distance(jwd, ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163)) AS dist from cust_jw order by jwd <-> ST_Transform(ST_GeomFromText('POINT(120.19 30.26)', 4326), 2163) loop          
    if v_limit <=0 then           
      raise notice '已经取足数据';          
      return;          
    end if;          
    if v_rec.dist > 20000 then           
      raise notice '满足条件的点已输出完毕';          
      return;          
    else          
      raise notice 'do someting, v_rec:%', v_rec;          
    end if;          
    v_limit := v_limit -1;          
  end loop;          
end;          
$$;          
NOTICE:  do someting, v_rec:(杭州,0101000020730800004C94087D5D4F54C173AA7759E8FB5D41,0)          
NOTICE:  do someting, v_rec:(余杭,0101000020730800000E6E5A20494854C121FC688DA9EF5D41,14483.9823187612)          
NOTICE:  满足条件的点已输出完毕          
DO          

This method scans at most one more line than the required result.

Code sample for optimizing the performance by using functions:

select * from 
(
select *,
  ST_Distance (ST_Transform ($1, 26986), ST_Transform (jwd, 26986) ) as dist 
  from cust_jw 
  where dz='杭州'
  order by ST_Transform (pos, 26986) <-> ST_Transform ($1, 26986) limit 200
) t
where
distance2Center <= 2000.0;


create or replace function ff(geometry, float8, int) returns setof record as $$                                                        
declare
  v_rec record;
  v_limit int := $3;
begin
  set local enable_seqscan=off;   -- 强制索引, 扫描行数够就退出.
  for v_rec in 
    select *,
    ST_Distance ( ST_Transform ($1, 26986), ST_Transform (jwd, 26986) ) as dist 
    from cust_jw 
    order by ST_Transform (jwd, 26986) <-> ST_Transform ($1, 26986)
  loop
    if v_limit <=0 then
      raise notice '已经取足数据';
      return;
    end if;
    if v_rec.dz='杭州' and v_rec.dist > $2 then
      raise notice '满足条件的点已输出完毕';
      return;
    elsif v_rec.dz='杭州' then
      raise notice 'do someting, v_rec:%', v_rec;
      return next v_rec;
    else
      NULL;
    end if;
    v_limit := v_limit -1;
  end loop;
end;
$$ language plpgsql strict volatile;
  
select * from ff(ST_GeomFromText ('POINT(114.111618652344 28.332331814237)', 4326),2000.0,1) as t(dz varchar,jwd geometry,dist float8);

SRID

To convert the SRID, use an expression index, for example, ST_Transform (pos, 26986).

References

1) http://www.ximizi.com/jingweidu.php
2) http://postgis.net/docs/manual-2.0/ST_Distance_Sphere.html
3) http://postgis.net/docs/manual-2.0/ST_Buffer.html
4) http://postgis.net/docs/manual-2.0/ST_Transform.html
5) http://postgis.net/docs/manual-2.0/ST_GeomFromText.html
6) http://postgis.net/docs/manual-2.0/geometry_distance_centroid.html

0 0 0
Share on

digoal

224 posts | 16 followers

You may also like

Comments