# Community

Blog Events Webinars Tutorials Forum
×
Community Blog PostgreSQL Independent Event Correlation Analysis - User-Car Fitting

# PostgreSQL Independent Event Correlation Analysis - User-Car Fitting

In this article, we look at how we can create a user-car fitting model in PostgreSQL to map spatio-temporal data with drivers, passengers and vehicles.

## Background

Independent event correlation analysis is very interesting. For example, the "Passing By" function points of some apps can indicate that you've been in a similar location with a different person at a different time. Independent event correlation analysis can also be used for public opinion analysis, sales portfolio for commodities, and user-car fitting.

User-car fitting in the safety system processes spatio-temporal data for fitting drivers, passengers and vehicles.

## User-Car Fitting

1.  Create a table structure

User

``````create table u_pos (
id int8,
uid int8,
crt_time timestamp,
pos geometry
);  ``````

Car

``````create table c_pos (
id int8,
car_id int8,
crt_time timestamp,
pos geometry
);  ``````

2.  Generate test data

Taking Hangzhou as an example, the latitude and longitude ranges are as follows:

``The east longitude is 118°21′-120°30′, and the north latitude is 29°11′-30°33′. The calculated east longitude is 118.35°-120.5° and north latitude is 29.183°-30.55°.  ``

The assumed value of the active number:

``````10 million users, and 10 million cars.

The number of user tracks is 1 billion a day.

The number of car tracks is 100 million a day.  ``````

2.1. Write the activity location data of the user, partition it by day and keep it for one year.

``````for ((i=1;i<=32;i++))
do
nohup psql -c "insert into u_pos select id, random()*10000000, '2017-10-01'::date + ((id*2.7648)||' ms')::interval, st_setsrid(st_makepoint(118.35+random()*2.15, 29.183+random()*1.367), 4326) from generate_series(1,31250000) t(id);" >/dev/null 2>&1 &
done  ``````

The BRIN index, which is most commonly used in time-series data, is adopted.

``create index idx_u_pos_1 on u_pos using brin(crt_time);  ``

Create an index of user + time.

``create index idx_u_pos_2 on u_pos using btree(uid, crt_time);  ``

2.2. Write the activity location data of the car, partition it by day and keep it for one year.

``````for ((i=1;i<=32;i++))
do
nohup psql -c "insert into c_pos select id, random()*10000000, '2017-10-01'::date + ((id*27.648)||' ms')::interval, st_setsrid(st_makepoint(118.35+random()*2.15, 29.183+random()*1.367), 4326) from generate_series(1,3125000) t(id);" >/dev/null 2>&1 &
done  ``````

The BRIN index, which is most commonly used in time-series data, is adopted.

``create index idx_c_pos_1 on c_pos using brin(crt_time);  ``

Create an index of car + time.

``create index idx_c_pos_2 on c_pos using btree(car_id, crt_time);  ``

3.  Find the fitting of people and cars in a certain time interval

3.1. During the driving of the car, N points are captured and the time and position are returned.

``select pos, crt_time from c_pos where car_id=? and crt_time between ? and ?;  ``

Return the intersection of users near the N points in the corresponding time interval

``````create or replace function merge_car_u(
v_car_id int8, -- car ID
s_time timestamp, -- the start time of the search range
e_time timestamp, -- the end time of the search range
ts_range interval, -- corresponding to each car track point: the time difference between the time when the target person appears and the time when the car appears (how much scaling has occurred)
pos_range float8 -- corresponding to each car track point: the distance between the target user and the car
) returns int8[] as
\$\$

declare
res int8[];
tmp int8[];
v_pos geometry;
v_crt_time timestamp;
i int := 0;
begin
for v_pos, v_crt_time in select pos, crt_time from c_pos where car_id=v_car_id and crt_time between s_time and e_time  -- find track points
loop
select array_agg(uid) into tmp from u_pos where crt_time between v_crt_time-ts_range and v_crt_time+ts_range and (v_pos <-> pos) < pos_range;  -- find the ID of the corresponding target
if (i <> 0) then
select array_agg(unnest) into res from (select unnest(res) intersect select unnest(tmp)) t;  -- find intersections
else
res := tmp;
end if;
i := i+1;
end loop;
return res;
end;

\$\$
language plpgsql strict;  ``````

Example:

``````postgres=# select * from merge_car_u(1, '2017-10-01 01:00:00', '2017-10-01 04:00:00', '10 s', 0.004);
merge_car_u
-----------------------------------
{5481974,5958009,3682524,1313466}
(1 row)

Time: 232.960 ms  ``````

3.2. During the movement of the user, N points are captured and the time and position are returned.

Return the intersection of cars near the N points in the corresponding time interval

``````create or replace function merge_u_car(
v_uid int8, -- user ID
s_time timestamp, -- the start time of the search range
e_time timestamp, -- the end time of the search range
ts_range interval, -- corresponding to each user track point: the time difference between the time when the target car appears and the time when the user appears (how much scaling has occurred)
pos_range float8 -- corresponding to each user track point: the distance between the target car and the user
) returns int8[] as
\$\$

declare
res int8[];
tmp int8[];
v_pos geometry;
v_crt_time timestamp;
i int := 0;
begin
for v_pos, v_crt_time in select pos, crt_time from u_pos where uid=v_uid and crt_time between s_time and e_time  -- find track points
loop
select array_agg(car_id) into tmp from c_pos where crt_time between v_crt_time-ts_range and v_crt_time+ts_range and (v_pos <-> pos) < pos_range;  -- find the ID of the corresponding target
if (i <> 0) then
select array_agg(unnest) into res from (select unnest(res) intersect select unnest(tmp)) t;  -- find intersections
else
res := tmp;
end if;
i := i+1;
end loop;
return res;
end;

\$\$
language plpgsql strict;  ``````

Example:

``````postgres=# select * from merge_u_car(100, '2017-10-01 01:00:00', '2017-10-01 02:00:00', '100 s', 0.2);
merge_u_car
-----------------------------------------------------------------------------------------------------------
{6214562,6180159,4534165,7824219,6826437,3020910,1463798,2939986,5786345,7233751,2856178,1719127,7763683}
(1 row)

Time: 96.986 ms  ``````

## Summary

1.  Optimization ideas of storage and index.

"Time truncation + spatial sorting" storage

For example:

``(YYYY-MM-DD HH24:MI), (geohash)  ``

After the storage is modified, the BTree or BRIN index of the structure above is created.

When searching for records that appear near a certain point at a certain time, parallel search can be performed, and the number of data blocks searched is relatively small because of dense storage.

2.  Other requirements: complement of missing positions. In some cases, the location information of the cars or users may not be collected, such as when passing through congested roads or during dead angles of the collector.

In the case of a vacancy in location acquisition, pgrouting, together with the road network information, is used to generate several paths to complete the missing points. And, estimate the time to obtain points and the time passing through these points.

3.  Other requirements: correction of abnormal positions.

4.  Fitting performance, and day partitions. 10 million users, and 10 million cars. The number of user tracks is 1 billion a day. The number of car tracks is 100 million a day.

Fitting response of millisecond level can be achieved.

0 0 0
Share on

# digoal

255 posts | 18 followers

# digoal

255 posts | 18 followers

# Related Products

• ## Database Overview

Fully managed and less trouble database services

• ## AnalyticDB for PostgreSQL

An online MPP warehousing service based on the Greenplum Database open source program

• ## ApsaraDB RDS for PostgreSQL

An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities

• ## Database for FinTech Solution

Leverage cloud-native database solutions dedicated for FinTech.