By Digoal
The logistics industry has a very high demand for geolocation data processing. For example:
1) The locations of couriers and delivery trucks need to be tracked in real time. This requires a high write performance of the database.
2) For parcels requiring same-day delivery, nearby couriers need to be recalled in real time based on parcel locations.
3) Massive real-time location information is generated and needs to be stored for data analysis. Therefore, cost-effective storage is required, such as Object Storage Service (OSS). Besides, the storage needs to be integrated with the database or analytic database products.
Alibaba Cloud's PostgreSQL + AnalyticDB for PostgreSQL + OSS solution can meet such requirements. The solution details are as follows.
The following describes this solution in the logistics distribution scenario as an example.
Couriers: millions
Courier location update interval: 5 seconds
A courier works from 07:00 to 19:00 (12 hours) every day.
8640 records are generated for each courier every day.
8.64 billion records are generated for all couriers over the entire network every day.
1) Draw the tracks of couriers in real time.
2) Recall couriers in real time.
3) Deliver parcels requiring same-day delivery.
The hash operation divides the primary table into 128 partition tables by courier ID.
Without partitioning, a single table is capable of storing 8.64 billion records. However, it is time-consuming to export the records to OSS whenever required.
On the other hand, it is convenient for expansion.
create table cainiao (
uid int, -- 快递员ID
pos point, -- 快递员位置
crt_date date, -- 日期
crt_time time(0) -- 时间
);
do language plpgsql $$
declare
sql text;
begin
for i in 0..127
loop
sql := format( 'create table %I (like cainiao)' , 'cainiao_'||i );
execute sql;
end loop;
end;
$$;
A sub table is created on a daily basis throughout the week. This aims to facilitate maintenance, and the tables are directly truncated after being exported to OSS. There are seven sub tables in total.
do language plpgsql $$
declare
sql text;
begin
for i in 0..127
loop
for x in 0..6
loop
sql := format( 'create table %I (like cainiao)' , 'cainiao_'||i||'_'||x );
execute sql;
end loop;
end loop;
end;
$$;
ApsaraDB RDS for PostgreSQL provides the oss_ext plug-in for writing data into OSS. ApsaraDB RDS for PostgreSQL can also read data from OSS using external tables, which is transparent to users.
For more information, see https://www.alibabacloud.com/help/doc-detail/44461.htm
PostgreSQL 10.0 has built-in partition tables. Therefore, read/write operations are directly performed in the primary table for the foregoing partitions.
For PostgreSQL 9.5 or later versions, we recommend that you use the pg_pathman plug-in for table partitioning.
Sample table partitioning:
create table cainiao (
uid int,
pos point,
crt_date date,
crt_time time(0)
)
PARTITION BY RANGE(crt_time);
do language plpgsql $$
declare
sql text;
begin
for i in 0..23
loop
if i<>23 then
sql := format( 'create table %I PARTITION OF cainiao FOR VALUES FROM (%L) TO (%L)' , 'cainiao_'||lpad(i::text, 2, '0') , (lpad(i::text, 2, '0')||':00:00') , (lpad((i+1)::text, 2, '0')||':00:00') );
else
sql := format( 'create table %I PARTITION OF cainiao FOR VALUES FROM (%L) TO (unbounded)' , 'cainiao_'||lpad(i::text, 2, '0') , (lpad(i::text, 2, '0')||':00:00') );
end if;
execute sql;
end loop;
end;
$$;
postgres=# \d+ cainiao
Table "public.cainiao"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------------------+-----------+----------+---------+---------+--------------+-------------
uid | integer | | | | plain | |
pos | point | | | | plain | |
crt_date | date | | | | plain | |
crt_time | time(0) without time zone | | not null | | plain | |
Partition key: RANGE (crt_time)
Partitions: cainiao_00 FOR VALUES FROM ('00:00:00') TO ('01:00:00'),
cainiao_01 FOR VALUES FROM ('01:00:00') TO ('02:00:00'),
cainiao_02 FOR VALUES FROM ('02:00:00') TO ('03:00:00'),
cainiao_03 FOR VALUES FROM ('03:00:00') TO ('04:00:00'),
cainiao_04 FOR VALUES FROM ('04:00:00') TO ('05:00:00'),
cainiao_05 FOR VALUES FROM ('05:00:00') TO ('06:00:00'),
cainiao_06 FOR VALUES FROM ('06:00:00') TO ('07:00:00'),
cainiao_07 FOR VALUES FROM ('07:00:00') TO ('08:00:00'),
cainiao_08 FOR VALUES FROM ('08:00:00') TO ('09:00:00'),
cainiao_09 FOR VALUES FROM ('09:00:00') TO ('10:00:00'),
cainiao_10 FOR VALUES FROM ('10:00:00') TO ('11:00:00'),
cainiao_11 FOR VALUES FROM ('11:00:00') TO ('12:00:00'),
cainiao_12 FOR VALUES FROM ('12:00:00') TO ('13:00:00'),
cainiao_13 FOR VALUES FROM ('13:00:00') TO ('14:00:00'),
cainiao_14 FOR VALUES FROM ('14:00:00') TO ('15:00:00'),
cainiao_15 FOR VALUES FROM ('15:00:00') TO ('16:00:00'),
cainiao_16 FOR VALUES FROM ('16:00:00') TO ('17:00:00'),
cainiao_17 FOR VALUES FROM ('17:00:00') TO ('18:00:00'),
cainiao_18 FOR VALUES FROM ('18:00:00') TO ('19:00:00'),
cainiao_19 FOR VALUES FROM ('19:00:00') TO ('20:00:00'),
cainiao_20 FOR VALUES FROM ('20:00:00') TO ('21:00:00'),
cainiao_21 FOR VALUES FROM ('21:00:00') TO ('22:00:00'),
cainiao_22 FOR VALUES FROM ('22:00:00') TO ('23:00:00'),
cainiao_23 FOR VALUES FROM ('23:00:00') TO (UNBOUNDED)
The real-time location table records the real-time locations (last recorded locations) of couriers.
Since the location data of couriers is reported continually, data in the real-time location table need not be stored. Therefore, unlogged tables can be used.
Note: If the locations of couriers cannot be reported in real time, use tables that are not unlogged.
create unlogged table cainiao_trace_realtime (
uid int primary key, -- 快递员ID
pos point, -- 快递员位置
crt_date date, -- 日期
crt_time time(0) -- 时间
);
Use the location field to create a GiST spatial index.
create index idx_cainiao_trace_realtime_pos on cainiao_trace_realtime using gist (pos);
A trigger can be set to update courier locations in real time. After the real-time location of a courier is uploaded, the courier's last location is updated automatically.
Note: If the real-time location table cainiao_trace_realtime is a table that is not unlogged, we recommend that you do not use a trigger to update locations because performing the write and update operations concurrently increases the response time. It is recommended that the application perform the insert and update operations asynchronously. For example, after movement tracks of a courier are reported, batch insert operations and one update operation are performed separately.
batch insert: insert into cainiao values (),(),(),....; update 最终状态: update cainiao_trace_realtime set xx=xx where uid=xx;
Advantages:
1) The insert and update operations are performed asynchronously.
2) Batch insert operations are performed.
3) The overall response time is reduced.
create or replace function ins_cainiao() returns trigger as $$
declare
begin
insert into cainiao_trace_realtime(uid,pos,crt_date,crt_time)
values (NEW.uid, NEW.pos, NEW.crt_date, NEW.crt_time)
on conflict (uid) do update set pos=excluded.pos,crt_date=excluded.crt_date,crt_time=excluded.crt_time;
return null;
end;
$$ language plpgsql strict;
Add a trigger to the base table.
do language plpgsql $$
declare
sql text;
begin
for i in 0..127
loop
for x in 0..6
loop
sql := format( 'create trigger tg after insert on %I for each row execute procedure ins_cainiao()', 'cainiao_'||i||'_'||x );
execute sql;
end loop;
end loop;
end;
$$;
A sample trigger is as follows:
postgres=# \d+ cainiao_0_0
Table "public.cainiao_0_0"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+---------------------------+-----------+----------+---------+---------+--------------+-------------
uid | integer | | | | plain | |
pos | point | | | | plain | |
crt_date | date | | | | plain | |
crt_time | time(0) without time zone | | | | plain | |
Triggers:
tg AFTER INSERT ON cainiao_0_0 FOR EACH ROW EXECUTE PROCEDURE ins_cainiao()
Note:
1) This article assumes that the application will splice the courier UID and the time field to obtain the base table's name. Otherwise, you need to use the table partitioning feature of PostgreSQL. The performance may be compromised if a partition table is used instead of the base table.
2) In this article, points are used instead of geometries because it is easier to construct data using points, which facilitates the test. In fact, point and geometry are both geographic location types that are applicable to similar scenarios. Performance indicators of points and geometries can also be cross-referenced.
Perform the test by simulating the real-time upload of courier tracks and real-time updates to couriers' latest locations.
The pgbench test script is as follows:
vi test.sql
\set uid random(1,1000000)
\set x random(-500000,500000)
\set y random(-500000,500000)
insert into cainiao_0_2 values (:uid, point(:x,:y), now()::date, now()::time);
The test lasts for 300 seconds.
numactl --physcpubind=0-31 pgbench -M prepared -n -r -P 1 -f ./test.sql -c 32 -j 32 -T 300
174,000 records are written per second, and a single request's delay is 0.18 milliseconds.
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 300 s
number of transactions actually processed: 52270642
latency average = 0.184 ms
latency stddev = 2.732 ms
tps = 174234.709260 (including connections establishing)
tps = 174236.529998 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set uid random(1,1000000)
0.000 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.182 insert into cainiao_0_2 values (:uid, point(:x,:y), now()::date, now()::time);
For example, when the number of parcels requiring same-day delivery reaches a threshold or a scheduled time point arrives, nearby couriers need to be recalled to pick up the parcels.
Alternatively, when a customer attempts to send a parcel requiring same-day delivery, a nearby courier needs to be notified to pick up the parcel.
Stress test case: Select a random point to recall 100 nearest couriers within a radius of 20,000 meters.
Sample SQL statements:
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cainiao_trace_realtime where circle '((0,0),20000)' @> pos order by pos <-> point '(0,0)' limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.41..112.45 rows=100 width=40) (actual time=0.096..0.342 rows=100 loops=1)
Output: uid, pos, crt_date, crt_time, ((pos <-> '(0,0)'::point))
Buffers: shared hit=126
-> Index Scan using idx_cainiao_trace_realtime_pos on public.cainiao_trace_realtime (cost=0.41..1167.86 rows=1042 width=40) (actual time=0.094..0.330 rows=100 loops=1)
Output: uid, pos, crt_date, crt_time, (pos <-> '(0,0)'::point)
Index Cond: ('<(0,0),20000>'::circle @> cainiao_trace_realtime.pos)
Order By: (cainiao_trace_realtime.pos <-> '(0,0)'::point)
Buffers: shared hit=126
Planning time: 0.098 ms
Execution time: 0.377 ms
(10 rows)
The pgbench test script is as follows:
vi test1.sql
\set x random(-500000,500000)
\set y random(-500000,500000)
select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
The test lasts for 300 seconds.
numactl --physcpubind=32-63 pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 32 -j 32 -T 300
60,000 recall requests are processed per second, and the delay of a single request is 0.53 milliseconds.
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 300 s
number of transactions actually processed: 18087765
latency average = 0.531 ms
latency stddev = 0.103 ms
tps = 60292.169523 (including connections establishing)
tps = 60292.786291 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.529 select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
Note: If only one courier is to be recalled, the performance can reach 280,000 TPS.
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 300 s
number of transactions actually processed: 84257925
latency average = 0.114 ms
latency stddev = 0.033 ms
tps = 280858.872643 (including connections establishing)
tps = 280862.101773 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.112 select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 1;
Perform a stress test by simulating the insertion of courier tracks and random recall of couriers.
INSERT TPS: 125,000, with a response time of 0.25 milliseconds
QUERY TPS: 21,700, with a response time of 1.47 milliseconds
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 100 s
number of transactions actually processed: 12508112
latency average = 0.256 ms
latency stddev = 1.266 ms
tps = 125072.868788 (including connections establishing)
tps = 125080.518685 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set uid random(1,1000000)
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.253 insert into cainiao_16 values (:uid, point(:x,:y), now()::date, now()::time);
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 100 s
number of transactions actually processed: 2174422
latency average = 1.472 ms
latency stddev = 0.455 ms
tps = 21743.641754 (including connections establishing)
tps = 21744.366018 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
1.469 select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
To minimize the response time, separate the real-time courier location table from the track details table and use the application to update couriers' real-time locations.
The real-time location table can be stored in the same database as the track details table or another database.
We recommend storing the real-time location table in another database, as this table is used independently in small transactions, such as update and query transactions.
However, the track details table is mainly used in large transactions such as insertion, range analysis, data merge, and daily track query.
For this reason, separate the track details table and the real-time location table.
After the tables are separated, the track details can be stored in an unlogged table or a common table.
The following tests the system performance after separating the tables.
Use a pgbench script to update courier locations and query 100 couriers nearest to a random point.
postgres=# \d cainiao_trace_realtime
Table "public.cainiao_trace_realtime"
Column | Type | Collation | Nullable | Default
----------+---------------------------+-----------+----------+---------
uid | integer | | not null |
pos | point | | |
crt_date | date | | |
crt_time | time(0) without time zone | | |
Indexes:
"cainiao_trace_realtime_pkey" PRIMARY KEY, btree (uid)
"idx_cainiao_trace_realtime_pos" gist (pos)
postgres=# select count(*),min(uid),max(uid) from cainiao_trace_realtime ;
count | min | max
---------+-----+---------
1000000 | 1 | 1000000
(1 row)
vi test1.sql
\set uid 1 1000000
\set x random(-500000,500000)
\set y random(-500000,500000)
insert into cainiao_trace_realtime (uid,pos) values (:uid, point(:x,:y)) on conflict (uid) do update set pos=excluded.pos;
vi test2.sql
\set x random(-500000,500000)
\set y random(-500000,500000)
select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
UPDATE TPS: 180,000, with a response time of 0.17 milliseconds
numactl --physcpubind=0-31 pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 32 -j 32 -T 300
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 300 s
number of transactions actually processed: 54283976
latency average = 0.177 ms
latency stddev = 2.241 ms
tps = 180943.029385 (including connections establishing)
tps = 180945.072949 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set uid random(1,1000000)
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.175 insert into cainiao_trace_realtime (uid,pos) values (:uid, point(:x,:y)) on conflict (uid) do update set pos=excluded.pos;
QUERY TPS: 52,000, with a response time of 0.61 milliseconds
numactl --physcpubind=0-31 pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 32 -j 32 -T 100
transaction type: ./test2.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 100 s
number of transactions actually processed: 5195710
latency average = 0.616 ms
latency stddev = 0.154 ms
tps = 51956.132043 (including connections establishing)
tps = 51957.754525 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.614 select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
UPDATE TPS: 130,000, with a response time of 0.24 milliseconds
QUERY TPS: 18,000, with a response time of 1.78 milliseconds
numactl --physcpubind=0-31 pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 32 -j 32 -T 100
transaction type: ./test1.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 100 s
number of transactions actually processed: 13057629
latency average = 0.245 ms
latency stddev = 0.805 ms
tps = 130575.023251 (including connections establishing)
tps = 130582.436319 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set uid random(1,1000000)
0.001 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
0.242 insert into cainiao_trace_realtime (uid,pos) values (:uid, point(:x,:y)) on conflict (uid) do update set pos=excluded.pos;
numactl --physcpubind=32-63 pgbench -M prepared -n -r -P 1 -f ./test2.sql -c 32 -j 32 -T 100
transaction type: ./test2.sql
scaling factor: 1
query mode: prepared
number of clients: 32
number of threads: 32
duration: 100 s
number of transactions actually processed: 1791580
latency average = 1.786 ms
latency stddev = 2.128 ms
tps = 17915.362549 (including connections establishing)
tps = 17916.037454 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.002 \set x random(-500000,500000)
0.000 \set y random(-500000,500000)
1.784 select * from cainiao_trace_realtime where circle(point(:x,:y),20000) @> pos order by pos <-> point(:x,:y) limit 100;
Preceding sections describe that seven sub tables are used for storing real-time track data (one sub table created per day throughout a week), allowing time for writing the track data into OSS to facilitate maintenance.
Data generated six days ago can be written into OSS every day.
ApsaraDB RDS for PostgreSQL provides the oss_ext plug-in for writing data into OSS. ApsaraDB RDS for PostgreSQL can also read data from OSS by using external tables which is transparent to users.
For more information, see https://www.alibabacloud.com/help/doc-detail/44461.htm
8640 track records may be generated for a single courier.
PostgreSQL supports the following data types: JSON, HStore (KV), array, and associative array. Merging each courier's daily tracks into one record can greatly improve the efficiency of querying tracks by courier.
Sample track merging:
create type trace as (pos point, crt_time time);
create table cainiao_trace_agg (crt_date date, uid int, trace_arr trace[], primary key(crt_date,uid));
insert into cainiao_trace_agg (crt_date , uid , trace_arr )
select crt_date, uid, array_agg( (pos,crt_time)::trace ) from cainiao_0_2 group by crt_date, uid;
Let's compare the performance of querying the tracks of a courier before and after track merging.
Before the merging (B-tree index), the query takes 8 milliseconds.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cainiao_0_2 where uid=340054;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx on public.cainiao_0_2 (cost=0.57..193.61 rows=194 width=32) (actual time=0.033..7.711 rows=7904 loops=1)
Output: uid, pos, crt_date, crt_time
Index Cond: (cainiao_0_2.uid = 340054)
Buffers: shared hit=7720
Planning time: 0.090 ms
Execution time: 8.198 ms
(6 rows)
After the merging, the query takes 0.033 milliseconds.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cainiao_trace_agg where crt_date='2017-04-18' and uid=340054;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using cainiao_trace_agg_pkey on public.cainiao_trace_agg (cost=0.42..2.44 rows=1 width=978) (actual time=0.016..0.017 rows=1 loops=1)
Output: crt_date, uid, trace_arr
Index Cond: ((cainiao_trace_agg.crt_date = '2017-04-18'::date) AND (cainiao_trace_agg.uid = 340054))
Buffers: shared hit=4
Planning time: 0.098 ms
Execution time: 0.033 ms
(6 rows)
1) This article described the database design and model stress testing based on the following two common requirements in a logistics tracking system: to track couriers in real time and recall nearby couriers in real time.
2) PostgreSQL works with OSS to implement hot and cold data separation. Historical tracks are written and stored in OSS, which can be shared to AnalyticDB for PostgreSQL for real-time data mining and analysis.
3) A single machine supports track writing at 180,000 TPS, recalling 100 nearest couriers at 60,000 TPS, and recalling one nearest courier at 280,000 TPS.
4) The inheritance feature of PostgreSQL helps manage partition tables better. For example, you can query all partitions of Monday from the primary table of these partitions. You can also query all-time segment data of a certain module value from the related primary table.
PostgreSQL: Optimizing Single-column Conditions-based Queries
Alibaba Clouder - December 12, 2017
digoal - October 23, 2018
Hologres - June 16, 2022
Alibaba Clouder - January 17, 2018
Alibaba Clouder - September 23, 2020
digoal - January 22, 2021
An online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAlibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn all-in-one service for log-type data
Learn MoreAnalyticDB for MySQL is a real-time data warehousing service that can process petabytes of data with high concurrency and low latency.
Learn MoreMore Posts by digoal