×
Community Blog GIS-based Grid Operations of New Retail Merchants

GIS-based Grid Operations of New Retail Merchants

In today's retail industry, data must be used to its greatest value if online and offline business are to be integrated and inventories optimized.

Background

Jack Ma once said,

"The era of pure E-commerce will soon come to an end. In the next ten or twenty years, there won't be E-commerce, just new retail. That is to say that online and offline business has to be integrated with logistics to create a new retail model."

Here, online business refers to cloud platforms and offline business refers to stores or manufacturers. New logistics will eliminate inventories to reduce the volume of stock-ups.

The disappearance of E-commerce platforms means the decentralization of existing E-commerce platforms, where merchants will possess their own independent platforms and no longer settle their business on large-scale platforms like Tmall, JD.com, and Amazon. For example, if everyone owns a store on an E-commerce platform for centralized sales, the sales are restricted.

Data must be used to its greatest value if online and offline business are to be integrated and inventories eliminated. For example, it can be used to predict sales volume ahead of time. Connecting online and offline business also brings new challenges to data operation, and leads to gridded operation based on geographic locations.

1

Requirements

1.  Support quick data search based on geographic locations (GIS)

2.  Support analysis and exploration of massive sales data

Architecture design

2

1.  Sales data usually enters the Alibaba Cloud HybridDB for PostgreSQL database concurrently by means of OSS.

2.  The RDS PostgreSQL is responsible for online transaction processing and selecting merchants through gridded operation.

3.  The ETL program is responsible for data scheduling.

4.  The BI application is connected to HDB for PG and PG. Its drive and syntax are also compatible with PostgreSQL.

5.  HybridDB for PostgreSQL provides basic high-availability and backup functions, and also provides a function for one-click storage resizing. Users do not need to worry about performance stress from future data increases.

6.  HDB PG and RDS PG can transparently access (read/write) OSS data using an OSS_EXT external table plugin. The OSS provides massive shared storage, which allows RDS PG and HDB PG to share data. The OSS can also be used as an external source of massive data and is concurrently imported to an Express Connect of HDB PG. The OSS can also be used as cold data storage for RDS PG and HDB PG.

DEMO and Performance

Gridded Merchant Search

Built-in Geometry Type, Gridded Merchant Search, Testing

Using built-in polygon, box, circle and point types, GiST spatial indexes, KNN sorting operator <->, and operator @>.

1.  Construct 100 million pieces of merchant geographic location data

postgres=# create table pos(id int, pos point);    
CREATE TABLE    
postgres=# insert into pos select generate_series(1,100000000), point(5000-random()*10000, 5000-random()*10000);    
INSERT 0 10000000    
postgres=# select * from pos limit 10;    
 id |                  pos                      
----+---------------------------------------    
  1 | (603.396683000028,3740.25050085038)    
  2 | (4177.6926163584,4295.85348349065)    
  3 | (-2897.50102907419,4393.90230923891)    
  4 | (-2756.50105439126,2930.08491862565)    
  5 | (-1679.21951506287,-2329.10942286253)    
  6 | (2323.99420812726,-4727.32939757407)    
  7 | (-1572.33712729067,-3614.81220461428)    
  8 | (-1383.57343617827,312.93470878154)    
  9 | (-2942.08695180714,4876.54477357864)    
 10 | (-2387.8013016656,-141.320424154401)    
(10 rows)    

2.  Create spatial indexes

postgres=# create index idx_pos on pos using gist(pos);    

3.  Create a query optimization function

Input a polygon and return the merchants that fall into its bounds.

select * from pos where polygon('((10,2),(-10,-100),(0,10))') @> pos;    

If a spatial query with other conditions is required, a spatial composite partial index can be used, for example

create index idx_pos on pos using gist(pos) where shard condition 1;    
...    
create index idx_pos on pos using gist(pos) where shard condition n;    

4.  Spatial index performance verification, where a grid-based query of 100 million pieces of data takes about 0.8 ms.

postgres=# explain (analyze,verbose,timing,costs,buffers)     
    
select * from pos where polygon('((10,2),(-10,-100),(0,10))') @> pos;    
    
                                                            QUERY PLAN                                                                 
-----------------------------------------------------------------------------------------------------------------------------------    
 Index Scan using idx_pos on postgres.pos  (cost=0.42..123470.72 rows=100000 width=20) (actual time=0.099..0.737 rows=618 loops=1)    
   Output: id, pos    
   Index Cond: ('((10,2),(-10,-100),(0,10))'::polygon @> pos.pos)    
   Buffers: shared hit=660    
 Planning time: 0.031 ms    
 Execution time: 0.778 ms    
(6 rows)    
    
postgres=# select * from pos where polygon('((10,2),(-10,-100),(0,10))') @> pos;    
    id    |                   pos                        
----------+------------------------------------------    
 14028137 | (-9.47874505072832,-94.8515953496099)    
 43891480 | (-9.1992225497961,-92.9797394201159)    
  1247175 | (-0.888188369572163,-28.0744722113013)    
  4631961 | (-0.548232346773148,-31.1226723715663)    
  5458615 | (-1.67813152074814,-29.4832326471806)    
  6057261 | (-0.965241342782974,-24.8730508610606)    
  ......    
 72818882 | (-0.214213505387306,-38.5544309392571)    
 84374336 | (-0.350810587406158,-38.3379962295294)    
 93014418 | (1.69238075613976,-38.5063700377941)    
 94375565 | (-0.0325776636600494,-43.1329058483243)    
(618 rows)

PostGIS Spatial Database, Gridded Merchant Search, Testing

In practice, longitude and latitude values are usually stored in a PostGIS spacial database. We will use the built-in polygon types during testing for the sake of convenience.

There are two gridded merchant search functions that require the use of PostGIS

http://postgis.net/docs/manual-2.3/ST_Within.html

1.  ST_within

ST_Within - Returns true if the geometry A is completely inside geometry B

boolean ST_Within(geometry A, geometry B);

Returns TRUE if geometry A is completely inside geometry B. For this function to make sense, the source geometries must both be of the same coordinate projection, having the same SRID. It is a given that if ST_Within(A,B) is true and ST_Within(B,A) is true, then the two geometries are considered spatially equal.

This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries. To avoid index use, use the function _ST_Within.

-- a circle within a circle  
SELECT ST_Within(smallc,smallc) As smallinsmall,  
    ST_Within(smallc, bigc) As smallinbig,  
    ST_Within(bigc,smallc) As biginsmall,  
    ST_Within(ST_Union(smallc, bigc), bigc) as unioninbig,  
    ST_Within(bigc, ST_Union(smallc, bigc)) as biginunion,  
    ST_Equals(bigc, ST_Union(smallc, bigc)) as bigisunion  
FROM  
(  
SELECT ST_Buffer(ST_GeomFromText('POINT(50 50)'), 20) As smallc,  
    ST_Buffer(ST_GeomFromText('POINT(50 50)'), 40) As bigc) As foo;  
-- Result  
 smallinsmall | smallinbig | biginsmall | unioninbig | biginunion | bigisunion  
--------------+------------+------------+------------+------------+------------  
 t            | t          | f          | t          | t          | t  
(1 row)  

2.  ST_Contains

ST_Contains - Returns true if and only if no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A.

boolean ST_Contains(geometry geomA, geometry geomB);

Returns TRUE if geometry B is completely inside geometry A. For this function to make sense, the source geometries must both be of the same coordinate projection, having the same SRID. ST_Contains is the inverse of ST_Within. So ST_Contains(A,B) implies ST_Within(B,A) except in the case of invalid geometries where the result is always false regardless or not defined.

This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries. To avoid index use, use the function _ST_Contains.

-- A circle within a circle  
SELECT ST_Contains(smallc, bigc) As smallcontainsbig,  
       ST_Contains(bigc,smallc) As bigcontainssmall,  
       ST_Contains(bigc, ST_Union(smallc, bigc)) as bigcontainsunion,  
       ST_Equals(bigc, ST_Union(smallc, bigc)) as bigisunion,  
       ST_Covers(bigc, ST_ExteriorRing(bigc)) As bigcoversexterior,  
       ST_Contains(bigc, ST_ExteriorRing(bigc)) As bigcontainsexterior  
FROM (SELECT ST_Buffer(ST_GeomFromText('POINT(1 2)'), 10) As smallc,  
             ST_Buffer(ST_GeomFromText('POINT(1 2)'), 20) As bigc) As foo;  
  
-- Result  
  smallcontainsbig | bigcontainssmall | bigcontainsunion | bigisunion | bigcoversexterior | bigcontainsexterior  
------------------+------------------+------------------+------------+-------------------+---------------------  
 f                | t                | t                | t          | t        | f  
  
-- Example demonstrating difference between contains and contains properly  
SELECT ST_GeometryType(geomA) As geomtype, ST_Contains(geomA,geomA) AS acontainsa, ST_ContainsProperly(geomA, geomA) AS acontainspropa,  
   ST_Contains(geomA, ST_Boundary(geomA)) As acontainsba, ST_ContainsProperly(geomA, ST_Boundary(geomA)) As acontainspropba  
FROM (VALUES ( ST_Buffer(ST_Point(1,1), 5,1) ),  
             ( ST_MakeLine(ST_Point(1,1), ST_Point(-1,-1) ) ),  
             ( ST_Point(1,1) )  
      ) As foo(geomA);  
  
  geomtype    | acontainsa | acontainspropa | acontainsba | acontainspropba  
--------------+------------+----------------+-------------+-----------------  
ST_Polygon    | t          | f              | f           | f  
ST_LineString | t          | f              | f           | f  
ST_Point      | t          | t              | f           | f  

3

4

Meanwhile, we often also need GiST spatial indexes. KNN sorting operator <->, functions to determine the center of an excircle, and a data construction function ST_PointFromText may also be necessary. For details, see the postgis manual

http://postgis.net/docs/manual-2.3/reference.html

1.  Create a table and spatial indexes

postgres=# create table pos(id int, pos geometry);  
CREATE TABLE  
postgres=# create index idx_pos on pos using gist(pos);  

2.  Construct 100 million pieces of test data

postgres=# insert into pos select generate_series(1,100000000), ST_PointFromText('POINT('||180-random()*180||' '||90-random()*90||')', 4326);  
INSERT 0 100000000  
  
postgres=# select id,st_astext(pos) from pos limit 10;  
 id |                 st_astext                   
----+-------------------------------------------  
  1 | POINT(33.1504055019468 0.432478752918541)  
  2 | POINT(21.6662147920579 76.3528884295374)  
  3 | POINT(23.3734973240644 72.9692681785673)  
  4 | POINT(24.6665199659765 8.37537375278771)  
  5 | POINT(42.0769318845123 56.6206424776465)  
  6 | POINT(151.37722584419 81.2602719990537)  
  7 | POINT(137.438789913431 52.9424488730729)  
  8 | POINT(36.6096187848598 87.09903978277)  
  9 | POINT(3.17187242209911 50.9399658115581)  
 10 | POINT(177.163629597053 49.6609620703384)  
(10 rows)  

3.  Gridded merchant search and query

select id,st_astext(pos) from pos where st_within(  
  pos,   
  ST_PolygonFromText('POLYGON((10 10, 20 10, 15 15, 10 10))', 4326)   
);  
  
   id   |                st_astext                   
--------+------------------------------------------  
 117850 | POINT(19.6388734783977 10.0914861587808)  
 447534 | POINT(19.6453922521323 10.0930827856064)  
 735712 | POINT(19.5879830047488 10.1306327059865)  
 828120 | POINT(19.6418435219675 10.054949526675)  
 965836 | POINT(19.5668494608253 10.052738590166)  
  45480 | POINT(18.0746335722506 10.0232297228649)  
  65043 | POINT(19.3460685387254 10.1494021341205)  
......  
 981674 | POINT(16.9359557982534 10.033694235608)  
 998555 | POINT(15.9493325371295 10.1035685883835)  
 999472 | POINT(14.3828116636723 10.001640371047)  
(1536 rows)  
Time: 11.678 ms  

Execute the plan that uses spatial indexes and also involves partial filtering (a process similar to one that I will mention later. Essentially an excircle is generated and results are output according to distance. Points that fall outside the polygon are filtered out.)

postgres=# explain (analyze,verbose,timing,costs,buffers) select id,st_astext(pos) from pos where st_within(  
pos,   
ST_PolygonFromText('POLYGON((10 10, 20 10, 15 15, 10 10))', 4326)   
);  
                                                                                                   QUERY PLAN                                                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Index Scan using idx_pos on public.pos  (cost=0.29..1219.62 rows=333 width=36) (actual time=0.143..12.563 rows=1536 loops=1)  
   Output: id, st_astext(pos)  
   Index Cond: ('0103000020E6100000010000000400000000000000000024400000000000002440000000000000344000000000000024400000000000002E400000000000002E4000000000000024400000000000002440'::geometry ~ pos.pos)  
   Filter: _st_contains('0103000020E6100000010000000400000000000000000024400000000000002440000000000000344000000000000024400000000000002E400000000000002E4000000000000024400000000000002440'::geometry, pos.pos)  
   Rows Removed by Filter: 1611  
   Buffers: shared hit=3151  
 Planning time: 0.394 ms  
 Execution time: 12.688 ms  
(8 rows)  

4.  Query breakdown

4.1 Find the smallest possible circle that contains the polygon (in fact a polygon contains multiple segments)

select st_astext(ST_MinimumBoundingCircle('POLYGON((10 10, 20 10, 15 15, 10 10))'::text));  
  
POLYGON((20 10,19.9973229373818 9.83640458589112,19.989294616193 9.67298435384929,19.975923633361 9.5099142983522,19.9572243068691 9.34736903889974,19.9332166604244 9.18552263302706,19.9039264020162 9.02454838991936,19.8693848963867 8.8  
6461868482814,19.8296291314453 8.7059047744874,19.784701678661 8.54857661372769,19.7346506474755 8.39280267348419,19.6795296337866 8.23874976039383,19.6193976625564 8.08658283817455,19.5543191246059 7.93646485097803,19.4843637076634 7.78  
8556548905,19.4096063217418 7.64301631587001,19.3301270189222 7.5,19.2460109076329 7.35966074674816,19.1573480615127 7.22214883490199,19.0642334229581 7.08761151566099,18.9667667014562 6.9561928549564,18.8650522668137 6.82803357918178,18  
.7591990373949 6.70327092449966,18.6493203634892 6.58203848988565,18.5355339059327 6.46446609406727,18.4179615101144 6.35067963651082,18.2967290755003 6.24080096260512,18.1719664208182 6.13494773318632,18.0438071450436 6.03323329854383,1  
7.912388484339 5.93576657704193,17.777851165098 5.84265193848728,17.6403392532518 5.75398909236711,17.5 5.66987298107781,17.35698368413 5.59039367825823,17.211443451095 5.51563629233656,17.063535149022 5.44568087539412,16.9134171618255 5  
.38060233744357,16.7612502396062 5.32047036621337,16.6071973265158 5.26534935252447,16.4514233862723 5.21529832133896,16.2940952255126 5.17037086855466,16.1353813151719 5.13061510361333,15.9754516100806 5.09607359798385,15.814477366973 5  
.06678333957561,15.6526309611003 5.04277569313095,15.4900857016478 5.02407636663902,15.3270156461507 5.01070538380698,15.1635954141089 5.00267706261817,15 5,14.8364045858911 5.00267706261817,14.6729843538493 5.01070538380698,14.509914298  
3522 5.02407636663902,14.3473690388998 5.04277569313095,14.1855226330271 5.0667833395756,14.0245483899194 5.09607359798385,13.8646186848281 5.13061510361333,13.7059047744874 5.17037086855466,13.5485766137277 5.21529832133895,13.392802673  
4842 5.26534935252447,13.2387497603938 5.32047036621337,13.0865828381746 5.38060233744356,12.936464850978 5.44568087539412,12.788556548905 5.51563629233655,12.64301631587 5.59039367825822,12.5 5.6698729810778,12.3596607467482 5.753989092  
3671,12.222148834902 5.84265193848727,12.087611515661 5.93576657704192,11.9561928549564 6.03323329854382,11.8280335791818 6.13494773318631,11.7032709244997 6.24080096260511,11.5820384898856 6.35067963651082,11.4644660940673 6.46446609406  
726,11.3506796365108 6.58203848988564,11.2408009626051 6.70327092449966,11.1349477331863 6.82803357918177,11.0332332985438 6.9561928549564,10.9357665770419 7.08761151566099,10.8426519384873 7.22214883490199,10.7539890923671 7.35966074674  
817,10.6698729810778 7.50000000000001,10.5903936782582 7.64301631587002,10.5156362923366 7.788556548905,10.4456808753941 7.93646485097804,10.3806023374436 8.08658283817456,10.3204703662134 8.23874976039384,10.2653493525245 8.392802673484  
2,10.215298321339 8.5485766137277,10.1703708685547 8.70590477448741,10.1306151036133 8.86461868482815,10.0960735979838 9.02454838991937,10.0667833395756 9.18552263302707,10.0427756931309 9.34736903889976,10.024076366639 9.50991429835222,  
10.010705383807 9.6729843538493,10.0026770626182 9.83640458589114,10 10,10.0026770626182 10.1635954141089,10.010705383807 10.3270156461507,10.024076366639 10.4900857016478,10.042775693131 10.6526309611003,10.0667833395756 10.814477366973  
,10.0960735979839 10.9754516100807,10.1306151036133 11.1353813151719,10.1703708685547 11.2940952255126,10.215298321339 11.4514233862723,10.2653493525245 11.6071973265158,10.3204703662134 11.7612502396062,10.3806023374436 11.9134171618255  
,10.4456808753941 12.063535149022,10.5156362923366 12.211443451095,10.5903936782582 12.35698368413,10.6698729810778 12.5,10.7539890923671 12.6403392532519,10.8426519384873 12.777851165098,10.9357665770419 12.912388484339,11.0332332985438  
 13.0438071450436,11.1349477331863 13.1719664208183,11.2408009626051 13.2967290755004,11.3506796365109 13.4179615101144,11.4644660940673 13.5355339059328,11.5820384898857 13.6493203634892,11.7032709244997 13.7591990373949,11.828033579181  
8 13.8650522668137,11.9561928549564 13.9667667014562,12.087611515661 14.0642334229581,12.222148834902 14.1573480615128,12.3596607467482 14.2460109076329,12.5 14.3301270189222,12.6430163158701 14.4096063217418,12.788556548905 14.484363707  
6635,12.9364648509781 14.5543191246059,13.0865828381746 14.6193976625565,13.2387497603939 14.6795296337866,13.3928026734842 14.7346506474755,13.5485766137277 14.7847016786611,13.7059047744875 14.8296291314454,13.8646186848282 14.86938489  
63867,14.0245483899194 14.9039264020162,14.1855226330271 14.9332166604244,14.3473690388998 14.9572243068691,14.5099142983523 14.975923633361,14.6729843538493 14.989294616193,14.8364045858912 14.9973229373818,15.0000000000001 15,15.163595  
4141089 14.9973229373818,15.3270156461508 14.989294616193,15.4900857016479 14.975923633361,15.6526309611003 14.957224306869,15.814477366973 14.9332166604244,15.9754516100807 14.9039264020161,16.1353813151719 14.8693848963867,16.294095225  
5127 14.8296291314453,16.4514233862724 14.784701678661,16.6071973265159 14.7346506474755,16.7612502396062 14.6795296337866,16.9134171618255 14.6193976625564,17.063535149022 14.5543191246058,17.2114434510951 14.4843637076634,17.3569836841  
301 14.4096063217417,17.5000000000001 14.3301270189222,17.6403392532519 14.2460109076329,17.7778511650981 14.1573480615127,17.9123884843391 14.064233422958,18.0438071450437 13.9667667014561,18.1719664208183 13.8650522668136,18.2967290755  
004 13.7591990373948,18.4179615101144 13.6493203634891,18.5355339059328 13.5355339059327,18.6493203634892 13.4179615101143,18.7591990373949 13.2967290755003,18.8650522668137 13.1719664208182,18.9667667014562 13.0438071450435,19.064233422  
9581 12.9123884843389,19.1573480615128 12.7778511650979,19.2460109076329 12.6403392532518,19.3301270189222 12.4999999999999,19.4096063217418 12.3569836841299,19.4843637076635 12.2114434510949,19.5543191246059 12.0635351490219,19.61939766  
25565 11.9134171618254,19.6795296337867 11.7612502396061,19.7346506474756 11.6071973265157,19.7847016786611 11.4514233862722,19.8296291314454 11.2940952255125,19.8693848963867 11.1353813151718,19.9039264020162 10.9754516100805,19.9332166  
604244 10.8144773669728,19.9572243068691 10.6526309611002,19.975923633361 10.4900857016477,19.989294616193 10.3270156461506,19.9973229373818 10.1635954141088,20 10))  

4.2 Find the center of the smallest circle containing the polygon

select st_astext(ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text));  
  
         st_astext            
----------------------------  
 POINT(15 11.6666666666667)  
(1 row)  

4.3 Solve the center and radius of the smallest circle containing the polygon (function introduced in PostGIS 2.3)

SELECT ST_AsText(center), radius FROM ST_MinimumBoundingRadius('POLYGON((26426 65078,26531 65242,26075 65136,26096 65427,26426 65078))');  
  
                st_astext                 |      radius  
------------------------------------------+------------------  
 POINT(26284.8418027133 65267.1145090825) | 247.436045591407  

4.4 The earlier version can be used to find the radius of the circle. The following are the steps to breaking it down

1. Excircle  
  
ST_MinimumBoundingCircle('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)  
  
2. The contour of the excircle  
  
ST_Boundary(ST_MinimumBoundingCircle('POLYGON((10 10, 20 10, 15 15, 10 10))'::text))  
  
3. The center of the excircle  
  
ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)  
  
4. The point closest to the center of the excircle on the contour of the excircle  
  
ST_ClosestPoint(ST_Boundary(ST_MinimumBoundingCircle('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)), ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text))  
  
5. The point closest to the center of the excircle on the contour of the excircle <-> The distance (that is, radius) of the center of the circle  
  
ST_Distance(  
  ST_ClosestPoint(ST_Boundary(ST_MinimumBoundingCircle('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)), ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text))  
  ,  
  ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)  
)  

4.5 Distance sorting, the size of the cut-off radius, and filtering out points not within the polygon

postgres=# select id,st_astext(pos),pos<->ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) as dis from pos order by pos <-> ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) limit 10;  
   id   |                st_astext                 |        dis           
--------+------------------------------------------+--------------------  
 690722 | POINT(15.0562715157866 11.711938586086)  | 0.0722219442478186  
  65270 | POINT(15.1074255164713 11.6899066697806) |  0.109910986215585  
 731760 | POINT(14.8607909493148 11.6321958834305) |  0.143413762872333  
 312402 | POINT(15.1385483611375 11.7407551081851) |  0.157113676140352  
 498870 | POINT(14.8981332499534 11.7865430982783) |   0.15731333481142  
 508068 | POINT(14.9112858809531 11.5105070360005) |  0.179598833059301  
 357440 | POINT(15.1979697681963 11.6899668937549) |  0.199335886908595  
 802618 | POINT(14.7933903057128 11.6609365912154) |  0.206689158005409  
 857485 | POINT(15.0578639935702 11.4632821781561) |  0.211455356859305  
 395057 | POINT(15.2343154605478 11.6581913502887) |  0.234469065270423  
(10 rows)  
Time: 0.750 ms  
  
postgres=# explain select id,st_astext(pos),pos<->ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) as dis from pos order by pos <-> ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) limit 10;  
                                     QUERY PLAN                                       
------------------------------------------------------------------------------------  
 Limit  (cost=0.29..0.66 rows=10 width=36)  
   ->  Index Scan using idx_pos on pos  (cost=0.29..37485.29 rows=1000000 width=36)  
         Order By: (pos <-> '01010000000000000000002E405655555555552740'::geometry)  
(3 rows)  
  
Time: 0.663 ms  
  
select id, st_astext(pos), pos<->ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) as dis   
from pos   
where   
pos<->ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)   
<=   
ST_Distance(  
  ST_ClosestPoint(ST_Boundary(ST_MinimumBoundingCircle('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)), ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text))  
  ,  
  ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)  
)  
and  
st_within(  
  pos,   
  ST_PolygonFromText('POLYGON((10 10, 20 10, 15 15, 10 10))', 4326)   
)  
order by pos <-> ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text)   
;  
  
   id   |                st_astext                 |        dis           
--------+------------------------------------------+--------------------  
 690722 | POINT(15.0562715157866 11.711938586086)  | 0.0722219442478186  
  65270 | POINT(15.1074255164713 11.6899066697806) |  0.109910986215585  
 731760 | POINT(14.8607909493148 11.6321958834305) |  0.143413762872333  
 ......  
 307780 | POINT(18.1456628255546 10.7256762916222) |   3.28339251039349  
 220569 | POINT(18.2607466075569 11.2290304061025) |   3.28998316913906  
 651843 | POINT(18.1355669908226 10.6703097978607) |   3.29006167141813  
 333919 | POINT(11.7259097937495 11.2818037485704) |   3.29663215368952  
 653102 | POINT(18.2955473475158 11.3890661671758) |    3.3072185623031  
 143163 | POINT(18.2968946546316 11.3691051676869) |   3.31029507214487  
 128755 | POINT(12.0493835303932 10.1579119032249) |   3.31398258174282  
 803774 | POINT(11.9626270607114 10.3377026785165) |   3.31538507246702  
 477386 | POINT(18.3217689581215 11.5706447605044) |   3.32315726274045  
 636124 | POINT(18.2356625888497 10.8860507654026) |   3.32849567354729  
 800873 | POINT(18.2074238732457 10.7637690240517) |   3.33208531471695  
 443193 | POINT(18.3322164136916 11.6595554212108) |   3.33222575678116  
(1360 rows)  
  
Time: 16.899 ms  

Compared to original methods

postgres=# select id,st_astext(pos),pos <-> ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) as dist  from pos where st_within(  
  pos,   
  ST_PolygonFromText('POLYGON((10 10, 20 10, 15 15, 10 10))', 4326)   
) order by pos <-> ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) ;  
   id   |                st_astext                 |        dist          
--------+------------------------------------------+--------------------  
 690722 | POINT(15.0562715157866 11.711938586086)  | 0.0722219442478186  
  65270 | POINT(15.1074255164713 11.6899066697806) |  0.109910986215585  
 731760 | POINT(14.8607909493148 11.6321958834305) |  0.143413762872333  
 312402 | POINT(15.1385483611375 11.7407551081851) |  0.157113676140352  
 498870 | POINT(14.8981332499534 11.7865430982783) |   0.15731333481142  
。。。。。。  
 874126 | POINT(19.5105647295713 10.1697491202503) |   4.75246676154538  
 205780 | POINT(19.5122638251632 10.1721187261865) |   4.75333557456583  
 470466 | POINT(10.4844846390188 10.0222505908459) |   4.80562042343902  
  46089 | POINT(10.3699134103954 10.2971605863422) |   4.82837948383337  
 368116 | POINT(10.3925226721913 10.206622495316)  |   4.83327750755081  
 735712 | POINT(19.5879830047488 10.1306327059865) |   4.83828273014706  
 965836 | POINT(19.5668494608253 10.052738590166)  |   4.84364228928311  
 890979 | POINT(19.6381107252091 10.1740973582491) |   4.87235434260042  
 117850 | POINT(19.6388734783977 10.0914861587808) |   4.89901466522263  
 447534 | POINT(19.6453922521323 10.0930827856064) |    4.9046758233415  
 223530 | POINT(10.3186767641455 10.1891682296991) |   4.90895020589323  
 828120 | POINT(19.6418435219675 10.054949526675)  |   4.91368787446771  
 500541 | POINT(19.6874961443245 10.17231578473)   |   4.91992923779193  
(1536 rows)  
Time: 14.855 ms  

The quantity of records is slightly different due to distortion caused by the fact that when a circle is generated through conversion, what it creates is actually a polygon with 48 segment lines. For details, see the ST_MinimumBoundingCircle function

PostGIS Test for HybridDB for PostgreSQL

postgres=# create table pos(id int, pos geometry);  
postgres=# insert into pos select id, ST_PointFromText('POINT('||180-random()*180||' '||90-random()*90||')', 4326) from generate_series(1,1000000) t(id);  
postgres=# create index idx_pos on pos using gist(pos);  

GPDB does not currently support KNN SORT and KNN MERGE SORT for GIST indexes.

So we can see an extra row of external nodes.

But it doesn't matter as long as there aren't too many results, otherwise the sort will reach a bottleneck, because st_within still can use the spatial indexes.

postgres=# explain analyze select id,st_astext(pos),pos <-> ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) as dist  from pos where st_within(  
  pos,   
  ST_PolygonFromText('POLYGON((10 10, 20 10, 15 15, 10 10))', 4326)   
) order by pos <-> ST_Centroid('POLYGON((10 10, 20 10, 15 15, 10 10))'::text) ;  
                                                                                                       QUERY PLAN                                                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 2:1  (slice1; segments: 2)  (cost=600.91..600.92 rows=3 width=36)  
   Merge Key: dist  
   Rows out:  1563 rows at destination with 9.586 ms to first row, 10 ms to end, start offset by 0.223 ms.  
   ->  Sort  (cost=600.91..600.92 rows=2 width=36)  
         Sort Key: dist  
         Rows out:  Avg 781.5 rows x 2 workers.  Max 814 rows (seg0) with 7.203 ms to first row, 7.271 ms to end, start offset by 1.709 ms.  
         Executor memory:  145K bytes avg, 145K bytes max (seg0).  
         Work_mem used:  145K bytes avg, 145K bytes max (seg0). Workfile: (0 spilling, 0 reused)  
         ->  Index Scan using idx_pos on pos  (cost=0.00..600.89 rows=2 width=36)  
               Index Cond: pos && '0103000020E6100000010000000400000000000000000024400000000000002440000000000000344000000000000024400000000000002E400000000000002E4000000000000024400000000000002440'::geometry  
               Filter: _st_contains('0103000020E6100000010000000400000000000000000024400000000000002440000000000000344000000000000024400000000000002E400000000000002E4000000000000024400000000000002440'::geometry, pos)  
               Rows out:  Avg 781.5 rows x 2 workers.  Max 814 rows (seg0) with 0.144 ms to first row, 6.741 ms to end, start offset by 1.721 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 203K bytes.  
   (slice1)    Executor memory: 442K bytes avg x 2 workers, 442K bytes max (seg0).  Work_mem: 145K bytes max.  
 Statement statistics:  
   Memory used: 2047000K bytes  
 Settings:  effective_cache_size=8GB; enable_bitmapscan=off; enable_seqscan=off; gp_statistics_use_fkeys=on  
 Optimizer status: legacy query optimizer  
 Total runtime: 10.533 ms  
(20 rows)  

Data Analysis Performance

How is data analysis performance?

Here is a set of standalone for 1 terabyte of TPC-H test data. HybridDB for PostgreSQL is an MPP distributed database and can linearly improve the performance by increasing nodes.

5

Technical Points

1.  Spatial indexes: GiST indexes are spatial indexes that are unique to PostgreSQL. They support precise distance index searches, and can return results with distance sorting. It has also become the first choice of many scientific research institutions and spatial services due to its excellent performance, .

2.  KNN query: output records according to distance from near to far

3.  OSS external table: a function added to Alibaba Cloud RDS PG and HDB PG, which is connected to a cloud massive OSS and can transparently read/write files in the OSS in the form of an external table. Each thread can achieve a read/write bandwidth of 30 MB/s. This allows use to increase throughput by utilizing concurrency.

4.  ETL: an ETL program on the cloud or owned by the use Any program can be used as long as it supports OSS object connections using a protocol supported by PostgreSQL.

5.  MADlib: an open source machine learning library that supports most learning libraries and is capable of implementing machine learning using RDS PG and HDB PG SQL interfaces.
MADlib supports many mining modes such as Classification, Regression, Clustering, Topic Modeling, Association Rule Mining, Descriptive Statistics, and Validation.

6

http://madlib.incubator.apache.org/product.html

Madlib Manual

6.  Geometric knowledge

Incircle of a polygon, circle(polygon)

7

BOX and excircle of a polygon, circle(box(polygon))

8

The geometric functions of PostgreSQL are as follows

https://www.postgresql.org/docs/9.6/static/functions-geometry.html

The geometric functions of PostGIS are as follows

http://postgis.net/docs/manual-2.3/reference.html

7.  The polygon search mentioned in the performance testing above is a PG 10 test. If you encounter performance problems concerning spatial indexes in older versions, you can use the following methods to optimize the operation.

First, convert the polygon into a BOX, then calculate the excircle of the BOX, return records according to KNN index order, and meanwhile filter data contained in the polygon.

create or replace function ff(polygon) returns setof record as 
$$
                                                            
declare    
  v_rec record;    
  cir circle := circle(box($1)); -- diffusion boundary    
  dist float8 := radius(circle(box($1))); -- to find the radius of the outer circle of the polygon    
  centrid point := point(circle(box($1))); -- to find the center point of the outer circle of the polygon    
begin    
  set local enable_seqscan=off; -- enforce spatial indexes and KNN searches    
  for v_rec in     
    select * from pos order by pos <-> centrid     
  loop    
    if not cir @> v_rec.pos then    
      return;    
    elsif ($1 @> v_rec.pos) then    
      return next v_rec;    
    end if;    
  end loop;    
  return;    
end;    

$$
 language plpgsql strict volatile;    
postgres=# select * from ff(polygon('((10,2),(-10,-100),(0,10))')) as t(id int, pos point);    
    id    |                   pos                        
----------+------------------------------------------    
 36646218 | (-0.0167591497302055,-45.0508715584874)    
 42498944 | (0.139414332807064,-44.4842409342527)    
 83455402 | (-0.350065529346466,-44.2021945491433)    
 ......    
 10828319 | (2.18123663216829,7.54482112824917)    
 70772435 | (2.13983003050089,8.06822907179594)    
 79346114 | (2.12917104363441,8.25083814561367)    
(618 rows)    

Cloud Products

Alibaba Cloud RDS PostgreSQL

Alibaba Cloud HybridDB for PostgreSQL

Alibaba Cloud OSS

Summary

In the new retail industry, data must be used to its greatest value if online and offline business are to be integrated and inventories eliminated. For example, it can be used to predict sales volume ahead of time. Connecting online and offline business also brings new challenges to data operation, and leads to gridded operation based on geographic locations.

Databases are required to be equipped with:

1.  Support for quick data search based on geographic locations (GIS)

2.  Support for analysis and exploration of massive sales data

9

Through Alibaba Cloud RDS PostgreSQL, HybridDB for PostgreSQL, and OSS, a response can be returned in 1 ms for 100 million pieces of geographic location data, and requirements can be met for every step in the data chain, from analysis to mining.

1.  Sales data usually enters the Alibaba Cloud HybridDB for PostgreSQL database concurrently by means of OSS.

2.  The RDS PostgreSQL is responsible for online transaction processing and selecting merchants through gridded operation.

3.  The ETL program is responsible for data scheduling.

4.  The BI application is connected to HDB for PG and PG. Its drive and syntax are also compatible with PostgreSQL.

5.  HybridDB for PostgreSQL provides basic high-availability and backup functions, and also provides a function for one-click storage resizing. Users do not need to worry about performance stress from future data increases.

6.  HDB PG and RDS PG can transparently access (read/write) OSS data using an OSS_EXT external table plugin. The OSS provides massive shared storage, which allows RDS PG and HDB PG to share data. The OSS can also be used as an external source of massive data and is concurrently imported to an Express Connect of HDB PG. The OSS can also be used as cold data storage for RDS PG and HDB PG.

7.  For spatial data, in addition to using the GiST index, we also provided a more cost-effective index, the BRIN index, which is excellent for filtering data structured by st_geohash.

References

https://www.postgresql.org/docs/9.6/static/functions-geometry.html

http://postgis.net/docs/manual-2.3/reference.html

0 0 0
Share on

digoal

281 posts | 24 followers

You may also like

Comments