PostgreSQL Independent Event Correlation Analysis - User-Car Fitting

In this article, we look at how we can create a user-car fitting model in PostgreSQL to map spatio-temporal data with drivers, passengers and vehicles.

Background

Independent event correlation analysis is very interesting. For example, the "Passing By" function points of some apps can indicate that you've been in a similar location with a different person at a different time. Independent event correlation analysis can also be used for public opinion analysis, sales portfolio for commodities, and user-car fitting.

User-car fitting in the safety system processes spatio-temporal data for fitting drivers, passengers and vehicles.

User-Car Fitting

1. Create a table structure

User

create table u_pos (  
  id int8,  
  uid int8,  
  crt_time timestamp,  
  pos geometry  
);

Car

create table c_pos (  
  id int8,  
  car_id int8,  
  crt_time timestamp,  
  pos geometry  
);

2. Generate test data

Taking Hangzhou as an example, the latitude and longitude ranges are as follows:

The east longitude is 118°21′-120°30′, and the north latitude is 29°11′-30°33′. The calculated east longitude is 118.35°-120.5° and north latitude is 29.183°-30.55°.

The assumed value of the active number:

10 million users, and 10 million cars.  
  
The number of user tracks is 1 billion a day.  
  
The number of car tracks is 100 million a day.

2.1. Write the activity location data of the user, partition it by day and keep it for one year.

for ((i=1;i<=32;i++))  
do  
nohup psql -c "insert into u_pos select id, random()*10000000, '2017-10-01'::date + ((id*2.7648)||' ms')::interval, st_setsrid(st_makepoint(118.35+random()*2.15, 29.183+random()*1.367), 4326) from generate_series(1,31250000) t(id);" >/dev/null 2>&1 &  
done

The BRIN index, which is most commonly used in time-series data, is adopted.

create index idx_u_pos_1 on u_pos using brin(crt_time);

Create an index of user + time.

create index idx_u_pos_2 on u_pos using btree(uid, crt_time);

2.2. Write the activity location data of the car, partition it by day and keep it for one year.

for ((i=1;i<=32;i++))  
do  
nohup psql -c "insert into c_pos select id, random()*10000000, '2017-10-01'::date + ((id*27.648)||' ms')::interval, st_setsrid(st_makepoint(118.35+random()*2.15, 29.183+random()*1.367), 4326) from generate_series(1,3125000) t(id);" >/dev/null 2>&1 &  
done

The BRIN index, which is most commonly used in time-series data, is adopted.

create index idx_c_pos_1 on c_pos using brin(crt_time);

Create an index of car + time.

create index idx_c_pos_2 on c_pos using btree(car_id, crt_time);

3. Find the fitting of people and cars in a certain time interval

3.1. During the driving of the car, N points are captured and the time and position are returned.

select pos, crt_time from c_pos where car_id=? and crt_time between ? and ?;

Return the intersection of users near the N points in the corresponding time interval

create or replace function merge_car_u(  
  v_car_id int8, -- car ID  
  s_time timestamp, -- the start time of the search range  
  e_time timestamp, -- the end time of the search range  
  ts_range interval, -- corresponding to each car track point: the time difference between the time when the target person appears and the time when the car appears (how much scaling has occurred)  
  pos_range float8 -- corresponding to each car track point: the distance between the target user and the car  
) returns int8[] as 
$$
  
declare  
  res int8[];  
  tmp int8[];  
  v_pos geometry;  
  v_crt_time timestamp;  
  i int := 0;  
begin  
  for v_pos, v_crt_time in select pos, crt_time from c_pos where car_id=v_car_id and crt_time between s_time and e_time  -- find track points  
  loop  
    select array_agg(uid) into tmp from u_pos where crt_time between v_crt_time-ts_range and v_crt_time+ts_range and (v_pos <-> pos) < pos_range;  -- find the ID of the corresponding target  
    if (i <> 0) then  
      select array_agg(unnest) into res from (select unnest(res) intersect select unnest(tmp)) t;  -- find intersections  
    else  
      res := tmp;  
    end if;  
    i := i+1;  
  end loop;  
  return res;  
end;  

$$
 language plpgsql strict;

Example:

postgres=# select * from merge_car_u(1, '2017-10-01 01:00:00', '2017-10-01 04:00:00', '10 s', 0.004);  
            merge_car_u              
-----------------------------------  
 {5481974,5958009,3682524,1313466}  
(1 row)  
  
Time: 232.960 ms

3.2. During the movement of the user, N points are captured and the time and position are returned.

Return the intersection of cars near the N points in the corresponding time interval

create or replace function merge_u_car(  
  v_uid int8, -- user ID  
  s_time timestamp, -- the start time of the search range  
  e_time timestamp, -- the end time of the search range  
  ts_range interval, -- corresponding to each user track point: the time difference between the time when the target car appears and the time when the user appears (how much scaling has occurred)  
  pos_range float8 -- corresponding to each user track point: the distance between the target car and the user  
) returns int8[] as 
$$
  
declare  
  res int8[];  
  tmp int8[];  
  v_pos geometry;  
  v_crt_time timestamp;  
  i int := 0;  
begin  
  for v_pos, v_crt_time in select pos, crt_time from u_pos where uid=v_uid and crt_time between s_time and e_time  -- find track points  
  loop  
    select array_agg(car_id) into tmp from c_pos where crt_time between v_crt_time-ts_range and v_crt_time+ts_range and (v_pos <-> pos) < pos_range;  -- find the ID of the corresponding target  
    if (i <> 0) then  
      select array_agg(unnest) into res from (select unnest(res) intersect select unnest(tmp)) t;  -- find intersections  
    else  
      res := tmp;  
    end if;  
    i := i+1;  
  end loop;  
  return res;  
end;  

$$
 language plpgsql strict;

Example:

postgres=# select * from merge_u_car(100, '2017-10-01 01:00:00', '2017-10-01 02:00:00', '100 s', 0.2);  
                                                merge_u_car                                                  
-----------------------------------------------------------------------------------------------------------  
 {6214562,6180159,4534165,7824219,6826437,3020910,1463798,2939986,5786345,7233751,2856178,1719127,7763683}  
(1 row)  
  
Time: 96.986 ms

Summary

1. Optimization ideas of storage and index.

"Time truncation + spatial sorting" storage

For example:

(YYYY-MM-DD HH24:MI), (geohash)

After the storage is modified, the BTree or BRIN index of the structure above is created.

When searching for records that appear near a certain point at a certain time, parallel search can be performed, and the number of data blocks searched is relatively small because of dense storage.

2. Other requirements: complement of missing positions. In some cases, the location information of the cars or users may not be collected, such as when passing through congested roads or during dead angles of the collector.

In the case of a vacancy in location acquisition, pgrouting, together with the road network information, is used to generate several paths to complete the missing points. And, estimate the time to obtain points and the time passing through these points.

3. Other requirements: correction of abnormal positions.

4. Fitting performance, and day partitions. 10 million users, and 10 million cars. The number of user tracks is 1 billion a day. The number of car tracks is 100 million a day.

Fitting response of millisecond level can be achieved.

Community

PostgreSQL Independent Event Correlation Analysis - User-Car Fitting

Background

User-Car Fitting

Summary

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL

Database for FinTech Solution