
Creating a Real-Time User Profile Recommendation System with PostgreSQL (2)

In this article, we will create a real-time user profile recommendation system based on Alibaba Cloud RDS for PostgreSQL and its varbitx extension.

By Digoal

User profiles are already widely applied in application operations and marketing. A popular solution is to tag users and then discover target users based on tag combinations.

A profile system may use wide tables on a distributed system. The wide table stores the tags, with each column representing one tag. In fact, this design is not necessarily the best or the only one. This document introduces an alternative design based on PostgreSQL and discusses its efficiency.

Implementation Details for Bit Solution 2

ApsaraDB RDS for PostgreSQL's New Bit Operation Functions

This document will refer to some useful new functions of ApsaraDB RDS for PostgreSQL.

get_bit (varbit, int, int) returns varbit
  Obtain N bits starting from the specified position and return them as varbit.
  For example, get_bit('111110000011', 3, 5) returns 11000.

set_bit_array (varbit, int, int, int[]) returns varbit 
  Set the bits at the specified positions to 0|1, filling any part beyond the original length with 0|1.  
  For example, set_bit_array('111100001111', 0, 1, array[1,15]) returns 1011000011111110.

bit_count (varbit, int, int, int) returns int
  Starting from the Nth bit, count the number of 0|1 bits among the following N bits; if the range exceeds the length, only the existing bits are counted.  
  For example, bit_count('1111000011110000', 1, 5, 4) returns 1 (0001).

bit_count (varbit, int) returns int 
  Count the number of 1|0 bits in the entire bit string.  
  For example, bit_count('1111000011110000', 1) returns 8.

bit_fill (int, int) returns varbit 
  Generate a bit string of the specified length filled with 0s or 1s.
  For example, bit_fill(0,10) returns '0000000000'.

bit_rand (int, int, float) returns varbit  
  Generate a random bit string of the specified length with the given proportion of 1s (or 0s).  
  For example, bit_rand(10, 1, 0.3) may return '0101000001'.  

bit_posite (varbit, int, boolean) returns int[]  
  Return the positions of the 1|0 bits, with subscripts starting from 0; in ascending order if true, descending if false.    
  For example, bit_posite('11110010011', 1, true) returns [0,1,2,3,6,9,10]  
       bit_posite('11110010011', 1, false) returns [10,9,6,3,2,1,0]

bit_posite (varbit, int, int, boolean) returns int[]  
  Same as above, but stop once N positions have been returned.  
  For example, bit_posite('11110010011', 1, 3, true) returns [0,1,2]  
       bit_posite('11110010011', 1, 3, false) returns [10,9,6]  

get_bit_2 (varbit, int, int) returns int  
  Return the bit at the specified position (subscripts start from 0); when the position exceeds the bit string length, return the 0 or 1 given as the third argument.  
  For example, get_bit_2('111110000011', 100, 0) returns 0 (100 exceeds the length, so the user-specified default 0 is returned).  

For the bit operation functions built into the database, refer to the source code:

src/backend/utils/adt/varbit.c
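As a rough illustration, the semantics of a few of these helpers can be mimicked in plain Python over '0'/'1' strings. This is a sketch for intuition only; the real functions operate on varbit values inside the database, and the function names below are illustrative.

```python
# Sketches of the varbitx-style helpers over '0'/'1' strings.
# Subscripts start at 0, matching the document's convention.

def get_bit_n(bits, start, n):
    """Like get_bit(varbit, int, int): take n bits from position start."""
    return bits[start:start + n]

def bit_count(bits, target, start=0, n=None):
    """Like bit_count: count target (0 or 1) bits, optionally in a window."""
    window = bits[start:] if n is None else bits[start:start + n]
    return window.count(str(target))

def bit_posite(bits, target, limit=None, forward=True):
    """Like bit_posite: positions of target bits, ascending or descending."""
    pos = [i for i, b in enumerate(bits) if b == str(target)]
    if not forward:
        pos.reverse()
    return pos if limit is None else pos[:limit]
```

Running these against the examples above reproduces the documented results, e.g. `get_bit_n('111110000011', 3, 5)` yields `'11000'`.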

Table Structure Design

Store users in bits

userid int8 indicates that there may be over 4 billion users, so int8 is used.

rowid int indicates that a single APPID may not have more than 2 billion users. It auto-increments from 0, matching the bit subscripts.

appid int indicates that there will be no more than 4 billion APPIDs.

In the dictionary table, rowid determines the MAP sequence; it can be returned using a window query.

drop table IF EXISTS t_userid_dic;

create table IF NOT EXISTS t_userid_dic(appid int not null, rowid int not null, userid int8 not null, unique (appid,userid), unique (appid,rowid));

The insert function for the user dictionary table, which generates gap-free, sequential ROWIDs.

create or replace function f_uniq(i_appid int, i_userid int8) returns int as 
$$

declare
  newid int;
  i int := 0;
  res int;
  stack1 text;
  stack2 text;
  stack3 text;
  stack4 text;
  stack5 text;
  stack6 text;
  stack7 text;
  stack8 text;
  stack9 text;
  stack10 text;
begin
  loop 
    if i>0 then 
      perform pg_sleep(random());
    else
      i := i+1;
    end if;

    -- Obtain the existing maximum ID+1 (i.e., the ID to be inserted)
    select max(rowid)+1 into newid from t_userid_dic where appid=i_appid;
    if newid is not null then
      -- Obtain AD LOCK (multiply by appid. The algorithm can be improved to avoid conflict between IDs) 
      if pg_try_advisory_xact_lock(i_appid::int8 * newid) then
        -- Insert
    insert into t_userid_dic (appid, rowid, userid) values (i_appid, newid, i_userid);
        -- Return the UID
    return newid;
      else
    -- Continue looping if no AD LOCK is obtained
    continue;
      end if;
    else
      -- Indicate that this is the first record, and obtain the LOCK with AD=0
      if pg_try_advisory_xact_lock(0 * i_appid::int8) then
    insert into t_userid_dic (appid, rowid, userid) values (i_appid, 0, i_userid);
        return 0;
      else
    continue;
      end if;
    end if;
  end loop;
  
  exception 
  -- Only ignore unique violation errors and continue to report other errors 
  when SQLSTATE '23505' then 
    /*
    -- Reference https://www.postgresql.org/docs/9.6/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
    GET STACKED DIAGNOSTICS stack1 = RETURNED_SQLSTATE,
                            stack2 = COLUMN_NAME,
                            stack3 = CONSTRAINT_NAME,
                            stack4 = PG_DATATYPE_NAME,
                            stack5 = MESSAGE_TEXT,
                            stack6 = TABLE_NAME,
                            stack7 = SCHEMA_NAME,
                            stack8 = PG_EXCEPTION_DETAIL,
                            stack9 = PG_EXCEPTION_HINT,
                            stack10 = PG_EXCEPTION_CONTEXT;
    -- If a PK conflict occurs due to being in a transient state, continue calling (note: if appid and userid are duplicated, an endless loop will occur, which is obviously not recommended)
    --  select f_uniq(i_appid, i_userid) into res;
    raise notice 'RETURNED_SQLSTATE, %', stack1;
    raise notice 'COLUMN_NAME, %', stack2;
    raise notice 'CONSTRAINT_NAME, %', stack3;
    raise notice 'PG_DATATYPE_NAME, %', stack4;
    raise notice 'MESSAGE_TEXT, %', stack5;
    raise notice 'TABLE_NAME, %', stack6;
    raise notice 'SCHEMA_NAME, %', stack7;
    raise notice 'PG_EXCEPTION_DETAIL, %', stack8;
    raise notice 'PG_EXCEPTION_HINT, %', stack9;
    raise notice 'PG_EXCEPTION_CONTEXT, %', stack10;
    -- Fail to insert, and return NULL
    */
    return null;
end;

$$
 language plpgsql strict;

If NULL is returned by the call above, the insertion failed. This may be caused by a unique violation, in which case you can retry on the application side.
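The allocation scheme (read max rowid + 1, take an advisory lock on the candidate, insert under the unique constraint, retry on conflict) can be simulated in memory. A hedged sketch: the dictionary, mutex, and lock set below merely stand in for the table, the unique index, and pg_try_advisory_xact_lock.

```python
import threading

# In-memory stand-in for t_userid_dic: (appid, rowid) -> userid
dic = {}
mutex = threading.Lock()   # guards both structures below
ad_locks = set()           # stands in for pg_try_advisory_xact_lock

def f_uniq(appid, userid):
    """Allocate the next gap-free rowid for appid, retrying on races."""
    while True:
        rows = [r for (a, r) in dic if a == appid]
        newid = max(rows) + 1 if rows else 0
        key = (appid, newid)
        with mutex:
            if key in ad_locks:        # another session holds this candidate
                continue               # loop and recompute, like the SQL loop
            ad_locks.add(key)
        try:
            with mutex:
                if key in dic:         # "unique violation": lost the race
                    continue
                dic[key] = userid
            return newid
        finally:
            with mutex:
                ad_locks.discard(key)
```

Sequential calls for one appid return 0, 1, 2, ... with no gaps, which is the property the stress test below verifies against the real function.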

Stress test the function above to check whether gap-free insertion is achieved. The RAISE NOTICE output can be disabled during stress testing.

$ vi test.sql

\set appid random(1,1000)
\set userid random(1,2000000000)
select f_uniq(:appid, :userid);

Concurrency of 164  
$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 164 -j 164 -T 10
progress: 10.0 s, 85720.5 tps, lat 1.979 ms stddev 34.808
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 164
number of threads: 164
duration: 10 s
number of transactions actually processed: 827654
latency average = 2.039 ms
latency stddev = 35.674 ms
tps = 75435.422933 (including connections establishing)
tps = 75483.813182 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set appid random(1,1000)
         0.000  \set userid random(1,2000000000)
         2.034  select f_uniq(:appid, :userid);

Verification

postgres=# select count(*) from t_userid_dic ;
 count  
--------
 825815
(1 row)

postgres=# select appid,count(*),max(rowid) from t_userid_dic group by 1;
 appid | count | max 
-------+-------+-----
   251 |   857 | 857
   106 |   854 | 854
   681 |   816 | 816
   120 |   826 | 826
   285 |   815 | 815
   866 |   778 | 778
   264 |   873 | 873
......

postgres=# select * from (select appid,count(*),max(rowid) from t_userid_dic group by 1) t where max<>count;
 appid | count | max 
-------+-------+-----
(0 rows)

This completely satisfies the requirements for insert speed and gap-free allocation.

Generate 100 million test users (APPID=1) for the subsequent tests.

truncate t_userid_dic;

insert into t_userid_dic select 1, generate_series(1,100000000), generate_series(1,100000000);

Update the table in real time

To improve write performance, data is written to this table in real time and incrementally merged into the TAG table in the background.

drop table IF EXISTS t_user_tags;

create table IF NOT EXISTS t_user_tags(id serial8 primary key, appid int, userid int8, tag int, ins boolean, dic boolean default false);  

create index idx_t_user_tags_id on t_user_tags(id) where dic is false;

-- ins = true indicates to add a tag, and = false indicates to delete a tag.  
  
-- dic = true indicates the USERID designed for this record has already been merged to the user dictionary table  

-- There is room to optimize this table's design. For example, it is better to rotate tables to clear out data, e.g., one table per day, each preserved for 31 days.  

Generate 15 million rows of test data (APPID=1, random USERIDs drawn from a range of 2 billion, tagid range 1-10,000 for additions and 1-5,000 for deletions)

insert into t_user_tags (appid,userid,tag,ins) select 1, 2000000000*random(),10000*random(),true from generate_series(1,10000000);
insert into t_user_tags (appid,userid,tag,ins) select 1, 2000000000*random(),5000*random(),false from generate_series(1,5000000);

The tag + userids bitmap table is critical, as it will be queried frequently. Data is incrementally merged into this table from t_user_tags.

drop table IF EXISTS t_tags;

create table IF NOT EXISTS t_tags(id serial primary key, appid int, tag int, userids varbit);
-- id can be used for advisory lock and updated concurrently

create unique index idx_t_tags_uk on t_tags(tag,appid);
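Conceptually, the dictionary plus bitmap design works like the following sketch, where Python ints stand in for varbit values (bit N corresponds to rowid N). The names here are illustrative, not part of the schema.

```python
# Sketch of the dictionary + bitmap model: each user gets a gap-free rowid,
# and each tag is a bitmap whose bit rowid is set when the user has the tag.

user_dic = {}      # userid -> rowid (stand-in for t_userid_dic)
tags = {}          # tag -> int bitmap (stand-in for t_tags.userids)

def rowid_for(userid):
    """Assign the next sequential rowid on first sight of a userid."""
    return user_dic.setdefault(userid, len(user_dic))

def add_tag(userid, tag):
    tags[tag] = tags.get(tag, 0) | (1 << rowid_for(userid))

def del_tag(userid, tag):
    tags[tag] = tags.get(tag, 0) & ~(1 << rowid_for(userid))
```

With 100 million users, each tag's bitmap occupies about 12.5 MB, which is why the real design merges changes in batches rather than updating t_tags per event.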

Generate 10,000 rows of TAG test data, where each TAG contains the bits of 100 million users, for the subsequent tests.

-- Generate random bits  
CREATE OR REPLACE FUNCTION public.randbit(integer)
 RETURNS bit varying
 LANGUAGE plpgsql
 STRICT
AS $function$
declare
  res varbit;
begin
  select (string_agg(mod((2*random())::int,2)::text,''))::varbit into res from generate_series(1,$1);
  if res is not null then 
    return res;
  else
    return ''::varbit;
  end if;
end;
$function$;

create sequence seq;

-- Insert 10,000 records in parallel
$ vi test.sql

insert into t_tags(appid,tag,userids) select 1,nextval('seq'::regclass),randbit(100000000);

$ pgbench -M simple -n -r -f ./test.sql -c 50 -j 50 -t 200

Test Query Performance for Combinations of TAGs

This measures the performance of delineating and returning a user group when querying a combination of TAGs.

The testing is simple: Include all, include none, and include any.

1.  Users that include all of the tags

userids (bitand) userids  

The result is users with bit 1

The following is the test SQL

-- Obtain the maximum BIT length 
with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (?,?,...) ) 
select appid,userid from t_userid_dic, tmp 
where appid = ? 
and rowid = any 
(
  ( 
    -- Positively retrieve 10,000 users where bit=1. Use false if you need reverse retrieval (data from the nearest user will be retrieved) 
    -- start counting rowid from 0, which matches the initial position of the bit subscript 
    select bit_posite(res, 1, 10000, true) from 
    (
      select t1.userids & t2.userids & t3.userids & t4.userids AS res -- & ......
       from
           -- Complete BITs according to the maximum length 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t1 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t2 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t3 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t4 
          -- ......
    ) t
  )::int[]
)
;
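The intersection logic of the query above can be sketched as follows: AND the tag bitmaps, then translate the positions of the 1 bits (rowids) back to userids through the dictionary, stopping at the requested limit. Python ints serve as bitmaps and the names are illustrative.

```python
# "Include all tags": AND the bitmaps, then map set-bit positions
# (rowids) back to userids, stopping after `limit` users.

def users_with_all(tags, dic_rev, tag_ids, limit=10000):
    res = tags[tag_ids[0]]
    for t in tag_ids[1:]:
        res &= tags[t]          # mirrors t1.userids & t2.userids & ...
    out, pos = [], 0
    while res and len(out) < limit:
        if res & 1:
            out.append(dic_rev[pos])   # rowid -> userid lookup
        res >>= 1
        pos += 1
    return out
```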

Performance data

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (226833, 226830, 226836, 226834) ) 
select appid,userid from t_userid_dic, tmp 
where appid = 1  
and rowid = any  
(
  ( 
    select bit_posite(res, 1, 10000, true) from 
    (
      select t1.userids & t2.userids & t3.userids & t4.userids as res
       from
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226833 ) t1 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226830 ) t2 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226836 ) t3 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226834 ) t4 
    ) t
  )::int[]
)
;

2.  Users that do not include the tags

userids (bitor) userids  

The result is users with bit 0

The following is the test SQL

-- Obtain the maximum BIT length 
with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (?,?,...) ) 
select appid,userid from t_userid_dic, tmp 
where appid = ? 
and rowid = any 
(
  ( 
    -- Positively retrieve 10,000 users where bit=0. Use false if you need reverse retrieval (data from the nearest user will be retrieved)  
    select bit_posite(res, 0, 10000, true) from 
    (
      select t1.userids | t2.userids | t3.userids | t4.userids AS res -- | ......
       from
           -- Complete BITs according to the maximum length 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t1 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t2 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t3 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t4 
          -- ......
    ) t
  )::int[]
)
;

Performance data

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (226833, 226830, 226836, 226834) ) 
select appid,userid from t_userid_dic, tmp 
where appid = 1  
and rowid = any  
(
  ( 
    select bit_posite(res, 0, 10000, true) from 
    (
      select t1.userids | t2.userids | t3.userids | t4.userids as res
       from
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226833 ) t1 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226830 ) t2 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226836 ) t3 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226834 ) t4 
    ) t
  )::int[]
)
;

3.  Include any of the TAGs

userids (bitor) userids  

The result is users with bit 1

The following is the test SQL

-- Obtain the maximum BIT length 
with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (?,?,...) ) 
select appid,userid from t_userid_dic, tmp 
where appid = ? 
and rowid = any 
(
  ( 
    -- Positively retrieve 10,000 users where bit=1. Use false if you need reverse retrieval (data from the nearest user will be retrieved)  
    select bit_posite(res, 1, 10000, true) from 
    (
      select t1.userids | t2.userids | t3.userids | t4.userids AS res -- | ......
       from
           -- Complete BITs according to the maximum length 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t1 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t2 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t3 , 
         ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = ? ) t4 
          -- ......
    ) t
  )::int[]
)
;

Performance data

with tmp as ( select max(bit_length(userids)) maxlen from t_tags where tag in (226833, 226830, 226836, 226834) ) 
select appid,userid from t_userid_dic, tmp 
where appid = 1  
and rowid = any  
(
  ( 
    select bit_posite(res, 1, 10000, true) from 
    (
      select t1.userids | t2.userids | t3.userids | t4.userids as res
       from
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226833 ) t1 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226830 ) t2 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226836 ) t3 , 
          ( select set_bit_array(userids, 0, 0, array[tmp.maxlen-1]) userids from t_tags,tmp where tag = 226834 ) t4 
    ) t
  )::int[]
)
;
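Both OR-based queries reduce to one pass, sketched below: OR the bitmaps once, then the 1 bits give "include any" and the 0 bits within the dictionary's user count give "include none" (illustrative sketch with Python ints as bitmaps).

```python
# "Include any" / "include none" from one OR pass over the tag bitmaps.
# nusers bounds the scan, since bits beyond the dictionary are meaningless.

def users_any_none(tags, dic_rev, tag_ids, nusers):
    ored = 0
    for t in tag_ids:
        ored |= tags[t]         # mirrors t1.userids | t2.userids | ...
    any_u  = [dic_rev[i] for i in range(nusers) if (ored >> i) & 1]
    none_u = [dic_rev[i] for i in range(nusers) if not (ored >> i) & 1]
    return any_u, none_u
```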

Advanced Functions

  • Combined with bit_posite, multiple users can be forward or reverse retrieved (for example, reversely retrieve 10,000 users. This is useful if, for example, there are 1 million results but the promotion only needs 10,000 users, especially if they must be 10,000 new users).
  • Combined with get_bit, a segment of bits can be extracted, and the result can then be easily obtained.
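For instance, reverse retrieval amounts to scanning set bits from the high end of the bit string, since the highest rowids belong to the newest users (a sketch over a '0'/'1' string; the helper name is illustrative):

```python
# Reverse retrieval: the highest rowids are the newest users, so scanning
# 1 bits from the high end yields the most recent signups first.

def newest_tagged(bits, limit):
    return [i for i in range(len(bits) - 1, -1, -1)
            if bits[i] == '1'][:limit]
```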

Test Performance When Adding Data

This refers to the performance of adding data to the t_user_tags table.

postgres=# \d+ t_user_tags
                                             Table "public.t_user_tags"
 Column |  Type   |                        Modifiers                         | Storage | Stats target | Description 
--------+---------+----------------------------------------------------------+---------+--------------+-------------
 id     | bigint  | not null default nextval('t_user_tags_id_seq'::regclass) | plain   |              | 
 appid  | integer |                                                          | plain   |              | 
 userid | bigint  |                                                          | plain   |              | 
 tag    | integer |                                                          | plain   |              | 
 ins    | boolean |                                                          | plain   |              | 
 dic    | boolean | default false                                            | plain   |              | 
Indexes:
    "t_user_tags_pkey" PRIMARY KEY, btree (id)
    "idx_t_user_tags_id" btree (id) WHERE dic IS FALSE

The test is as follows

$ vi test.sql

\set appid random(1,1000) 
\set userid random(1,2000000000) 
\set new_tag random(1,10000) 
\set old_tag random(8001,10000) 
insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :new_tag, true); 
insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :old_tag, false); 


$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 128 -j 128 -T 120


postgres=# select count(*) from t_user_tags;
  count   
----------
 14721724
(1 row)

Performance data (about 122,000 single-statement operations per second, covering both adding and deleting TAGs)

Update actions are split into two kinds of records, additions and deletions. Do not merge them into the same record.

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 7360862
latency average = 2.085 ms
latency stddev = 1.678 ms
tps = 61326.338528 (including connections establishing)
tps = 61329.196790 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set appid random(1,1000) 
         0.000  \set userid random(1,2000000000) 
         0.000  \set new_tag random(1,10000) 
         0.000  \set old_tag random(8001,10000) 
         1.042  insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :new_tag, true);
         1.037  insert into t_user_tags (appid,userid,tag,ins) values (:appid, :userid, :old_tag, false);

Test Performance When Merging Data

Merging data involves three steps

  1. Update the user dictionary table t_userid_dic;
  2. Obtain and delete t_user_tags records in batch;
  3. Merge label data into t_tags.

The above three actions can be completed in one transaction.

Considering that the userids field in the t_tags table contains 100 million bits (about 12.5 MB), updating a single record may take a long time, so the operation should run in parallel, with each TAG computed in parallel.

Dictionary update is wrapped into the following function

Retrieve data from the t_user_tags table, update the data dictionary, and mark the rows as ready for merging.

This operation does not need to run in parallel; serial execution in an infinite loop in a background process is sufficient.

create or replace function update_dict(v_rows int) returns void as 
$$

declare
  min_id int; -- Boundary id
  f_uniq_res int8;
begin
  if v_rows<1 then
    raise notice 'v_rows must >=1';
    return;
  end if;
  
  -- Find the minimum boundary id of the unmerged records. Return if there is no record
  -- (an aggregate always returns a row, so test for NULL rather than NOT FOUND)
  select min(id) into min_id from t_user_tags where dic=false;
  if min_id is null then
    raise notice 'no data';
    return;
  end if;

  -- Insert the temporary data that needs to be processed into the array
  -- No error will be reported if f_uniq fails. Here f_uniq needs to be changed so that it will not process UK conflicts, but will process all other errors. Otherwise, if t_user_tags is changed, the USER may not be properly added to the dictionary.  
  with tmp as (update t_user_tags t set dic=true where id>=min_id and id<=min_id+v_rows returning *)
  select count(*) into f_uniq_res from (select f_uniq(appid,userid) from (select appid,userid from tmp group by 1,2) t) t;
end;

$$
 language plpgsql;

Since batch operations generate a large number of advisory locks, max_locks_per_transaction should be increased and the database parameters adjusted accordingly

max_locks_per_transaction=40960

Verification

postgres=# select update_dict(200000);
 update_dict 
-------------
 
(1 row)
Time: 8986.175 ms

Execute a few times

Time: 9395.991 ms
Time: 10798.631 ms
Time: 10726.547 ms
Time: 10620.055 ms

This means that the system processes about 20,000 records per second.

Verify the accuracy of the dictionary update

postgres=# select count(*) from t_userid_dic ;
  count  
---------
 1399501
(1 row)
Time: 110.656 ms

postgres=# select count(*) from (select appid,userid from t_user_tags where dic=true group by 1,2) t;
  count  
---------
 1399501
(1 row)
Time: 2721.264 ms

postgres=# select * from t_userid_dic order by appid,rowid limit 10;
 appid | rowid |   userid   
-------+-------+------------
     1 |     0 | 1802787010
     1 |     1 | 1342147584
     1 |     2 | 1560458710
     1 |     3 | 1478701081
     1 |     4 | 1826138023
     1 |     5 |  182295180
     1 |     6 | 1736227913
     1 |     7 |  512247294
     1 |     8 |  686842950
     1 |     9 | 1940486738
(10 rows)

postgres=# select min(rowid),max(rowid),count(*),appid from t_userid_dic group by appid;
 min |   max   |  count  | appid 
-----+---------+---------+-------
   1 | 1399501 | 1399501 |     1
(1 row)
Time: 369.562 ms

Although this operation is not necessarily run in parallel, it must be safe under parallel execution, so next we verify that it is.

$ vi test.sql

select update_dict(1000);

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100 

Verify that the parallel results are secure and reliable

postgres=# select count(*) from t_userid_dic ;
  count  
---------
 1533534
(1 row)

postgres=# select count(*) from (select appid,userid from t_user_tags where dic=true group by 1,2) t;
  count  
---------
 1533534
(1 row)

postgres=# select * from t_userid_dic order by appid,rowid limit 10;
 appid | rowid |   userid   
-------+-------+------------
     1 |     0 | 1802787010
     1 |     1 | 1342147584
     1 |     2 | 1560458710
     1 |     3 | 1478701081
     1 |     4 | 1826138023
     1 |     5 |  182295180
     1 |     6 | 1736227913
     1 |     7 |  512247294
     1 |     8 |  686842950
     1 |     9 | 1940486738
(10 rows)

postgres=# select min(rowid),max(rowid),count(*),appid from t_userid_dic group by appid;
 min |   max   |  count  | appid 
-----+---------+---------+-------
   1 | 1533534 | 1533534 |     1
(1 row)

Data merging is wrapped into the following function

Preface: If you need to process dictionary updates and merges with a single function, make sure to run it at the repeatable read isolation level to guarantee that the processed dictionary data is consistent with the merged data.

We updated the dictionary above, so next we can merge data from t_user_tags into t_tags.

Considering that it may be slow to update T_TAGS, you should increase the degree of parallelism as much as possible so that you can process multiple TAGs at once.

-- Do not apply an APPID mode and an APPID+tag mode to the same APPID in parallel processes.

create or replace function merge_tags(
  v_appid int,      -- Input APPID to be processed 
  v_tag int,        -- Input tags to be processed, input null to process all of the APPID's tags, and make sure you don't use a strict function, otherwise NULL will be returned for a NULL parameter
  v_rows int        -- Input how many records will be processed, i.e., the amount to be merged  
) returns void as 
$$

declare
  min_id int;       -- Boundary id
  ad_lockid int8;   -- Be able to effectively avoid conflicts during parallel operations on different APPIDs

  -- Loop
  i_tag int;
  i_userid_del int8[];
  i_userid_add int8[];

  -- Convert userid[] into ROWID[]
  i_rowid_del int[];
  i_rowid_add int[];

  -- Temporary storage, used for debugging
  i_userids_del varbit := '0'::varbit;
  i_userids_add varbit := '0'::varbit;
begin
  -- Calculate ad_lockid
  if v_tag is not null then
    ad_lockid := (v_appid+1)^2 + (v_tag+1)^2;
  else
    ad_lockid := (v_appid+1)^2;
  end if;
  
  -- Obtain parallel protection lock
  if not pg_try_advisory_xact_lock(ad_lockid) then
    raise notice 'cann''t parallel merge same tag with ad_lockid: %', ad_lockid;
    return;
  end if;

  -- The number of rows processed
  if v_rows<1 then
    raise notice 'v_rows must >=1';
    return;
  end if;

  -- Determine whether to process a single tag 
  if v_tag is not null then 
    -- raise notice 'v_tag: %', v_tag;
    -- Process a single tag 
    
    -- Find the minimum boundary id for this APPID and tag. Return if there is no record
    -- (an aggregate always returns a row, so test for NULL rather than NOT FOUND)
    select min(id) into min_id from t_user_tags where dic=true and appid=v_appid and tag=v_tag;
    if min_id is null then
      raise notice 'no data for appid:% , tag:% ', v_appid, v_tag;
      return;
    end if;
    
    -- Obtain real-time data, delete real-time data, aggregate real-time data, and merge to TAGs
    -- It is recommended to add delete|update limit syntax to RDS PostgreSQL, and discard with id>=min_id for update
    -- there is a bug where the merger ignores the sequence in which TAGs are added or deleted. For example, if the user of a certain APPID adds a TAG and then deletes it, the final merger will still include the TAG.
    -- A window function can solve this problem: amend the query of array_agg, merge data before aggregation, and use ins, row_number() over (partition by appid,userid,tag order by id desc) rn .... where rn=1, taking the last ins value for each appid, userid, and tag  
    for i_tag, i_rowid_del, i_rowid_add in 
      with tmp as (select * from t_user_tags t where dic=true and appid=v_appid and tag=v_tag and id>=min_id order by id limit v_rows for update), 
           tmp0 as (select * from ( select *, row_number() over (partition by appid,userid,tag order by id desc) as rn from tmp ) as ss where ss.rn=1),  -- For the same appid and userid, use window to retrieve the last state
           tmp1 as (delete from t_user_tags t where exists (select 1 from tmp where tmp.id=t.id)) 
       select t1.tag, array_remove(array_agg(case when not t1.ins then t2.rowid else null end), null) code_del, array_remove(array_agg(case when t1.ins then t2.rowid else null end), null) code_add 
         from tmp0 t1 join t_userid_dic t2 on (t1.userid=t2.userid and t2.appid=v_appid) group by t1.tag   
           -- select tag, array_agg(case when not ins then userid else null end) userid_del, array_agg(case when ins then userid else null end) userid_add from tmp group by tag 
    loop 
      -- Determine whether there is a TAG; if yes, update; otherwise, insert
      perform 1 from t_tags where appid=v_appid and tag=i_tag;
      if found then
        update t_tags set userids = set_bit_array( set_bit_array(userids, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit where appid=v_appid and tag=i_tag;
      else
        insert into t_tags(appid, tag, userids) values (v_appid, i_tag, set_bit_array( set_bit_array('0'::varbit, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit);
      end if;
    end loop;

  else
    -- Process all the tags
    
    -- Find the minimum boundary id for this APPID. Return directly if there is no record
    -- (an aggregate always returns a row, so test for NULL rather than NOT FOUND)
    select min(id) into min_id from t_user_tags where dic=true and appid=v_appid;
    if min_id is null then
      raise notice 'no data for appid:%', v_appid;
      return;
    end if;
  
    -- Obtain real-time data, delete real-time data, aggregate real-time data, and merge to TAGs
    -- there is a bug where the merger ignores the sequence in which TAGs are added or deleted. For example, if the user of a certain APPID adds a TAG and then deletes it, the final merger will still include the TAG.
    -- A window function can solve this problem: amend the query of array_agg, merge data before aggregation, and use ins, row_number() over (partition by appid,userid,tag order by id desc) rn .... where rn=1, taking the last ins value for each appid, userid, and tag
    for i_tag, i_rowid_del, i_rowid_add in 
      with tmp as (select * from t_user_tags t where dic=true and appid=v_appid and id>=min_id order by id limit v_rows for update), 
           tmp0 as (select * from ( select *, row_number() over (partition by appid,userid,tag order by id desc) as rn from tmp ) as ss where ss.rn=1),  -- For the same appid and userid, use window to retrieve the last state
           tmp1 as (delete from t_user_tags t where exists (select 1 from tmp where tmp.id=t.id)) 
           select t1.tag, array_remove(array_agg(case when not t1.ins then t2.rowid else null end), null) code_del, array_remove(array_agg(case when t1.ins then t2.rowid else null end), null) code_add 
         from tmp0 t1 join t_userid_dic t2 on (t1.userid=t2.userid and t2.appid=v_appid) group by t1.tag   
       -- select tag, array_agg(case when not ins then userid else null end) userid_del, array_agg(case when ins then userid else null end) userid_add from tmp group by tag
    loop
      -- execute format('select coalesce(array_agg(rowid), array[]::int[]) from t_userid_dic where appid=%L and userid = any (%L)', v_appid, array_remove(i_userid_del, null) ) into i_rowid_del;
      -- execute format('select coalesce(array_agg(rowid), array[]::int[]) from t_userid_dic where appid=%L and userid = any (%L)', v_appid, array_remove(i_userid_add, null) ) into i_rowid_add;


      -- Determine whether there is a TAG; if yes, update; otherwise, insert
      perform 1 from t_tags where appid=v_appid and tag=i_tag;
      if found then
        update t_tags set userids = set_bit_array( set_bit_array(userids, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit where appid=v_appid and tag=i_tag;
      else
        insert into t_tags(appid, tag, userids) values (v_appid, i_tag, set_bit_array( set_bit_array('0'::varbit, 0, 0, i_rowid_del), 1, 0, i_rowid_add )::text::varbit);
      end if;
    end loop;

  end if;

end;

$$
 language plpgsql;
-- Do not use strict
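The merge semantics above can be modeled outside the database. The following is an illustrative Python sketch (function and variable names are assumptions, not the actual plpgsql): only the last add/delete event per (userid, tag) counts, then deletions clear the user's bit and additions set it, zero-filling the bitmap the way set_bit_array does.

```python
# Toy model of the merge: bitmaps maps tag -> list of 0/1 bits (one bit per
# dictionary rowid); dic maps userid -> rowid; events is the ordered list of
# (userid, tag, ins) rows pending in t_user_tags.
def merge_events(bitmaps, dic, events):
    last = {}
    for userid, tag, ins in events:          # later events override earlier ones,
        last[(userid, tag)] = ins            # mirroring the rn=1 window trick
    for (userid, tag), ins in last.items():
        rowid = dic[userid]
        bits = bitmaps.setdefault(tag, [])
        if len(bits) <= rowid:               # zero-fill, like set_bit_array
            bits.extend([0] * (rowid + 1 - len(bits)))
        bits[rowid] = 1 if ins else 0        # add sets the bit, delete clears it
    return bitmaps

bm = merge_events({}, {'u1': 0, 'u2': 1},
                  [('u1', 't', True), ('u2', 't', True), ('u1', 't', False)])
# u1 added tag 't' and then deleted it, so only u2's bit remains set
```

Running the example leaves bm['t'] as [0, 1]: the add-then-delete pair for u1 correctly cancels out, which is exactly the case the naive merge gets wrong.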

Speed test

$ vi test.sql

\set tag random(1,10000)
select merge_tags(1,:tag,10000);

$ pgbench -M prepared -n -r -P 1 -f ./test.sql -c 128 -j 128 -T 100

Merges about 150,000 records per second.

The following verifies that the merged result is consistent with the source data that was merged.

Consistency check:

truncate t_tags;


select update_dict(100000);



create table check_merge1 as select tag,count(*) cnt from (select tag,userid from t_user_tags where dic=true order by id limit 10000) t group by tag;



select merge_tags(1,null,10000);
...... (repeated until all pending rows have been merged)
select merge_tags(1,null,10000);


create table check_merge2 as select tag, bit_count(userids,1) cnt from t_tags; 


postgres=# select t1.*,t2.* from check_merge1 t1 full outer join check_merge2 t2 on (t1.tag=t2.tag and t1.cnt=t2.cnt) where t1.* is null or t2.* is null;
 tag | cnt | tag | cnt 
-----+-----+-----+-----
(0 rows)
Time: 5.133 ms

Zero rows means the check passed; if any row is returned, the merge produced an inconsistent result.
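The idea behind the check can be modeled simply: the popcount of each tag's bitmap must equal the number of distinct users holding that tag in the source rows. A toy Python model (names are illustrative, and it assumes no add-then-delete pairs, as in check_merge1):

```python
# bit_count(userids, 1) counts the 1 bits; here that is just a sum
def popcount(bits):
    return sum(bits)

# source: tag -> set of userids from t_user_tags
# bitmaps: tag -> bit list, where bit i belongs to the user with rowid i
source = {'tag_a': {'u1', 'u2'}, 'tag_b': {'u2'}}
bitmaps = {'tag_a': [1, 1], 'tag_b': [0, 1]}

# any tag whose popcount disagrees with the source count is a merge error;
# an empty list corresponds to the "0 rows" result of the full outer join
mismatches = [t for t in source if popcount(bitmaps[t]) != len(source[t])]
```

With consistent data, mismatches is empty, matching the "(0 rows)" output above.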

Query which TAGs belong to a user

Find the rowid corresponding to the userid, and determine whether the tag exists according to the bit at the userid's rowid.

select tag from (select tag, get_bit(t1.userids, t2.rowid-1) bt from t_tags t1, (select rowid from t_userid_dic where userid=?) t2) t where bt=1;

Example; pay attention to the bit alignment: get_bit is 0-indexed, so the query uses rowid-1 (alternatively, the get_bit function could be improved to handle positions beyond the end of the bit string).

postgres=# \set FETCH_COUNT 1
postgres=# select tag from (select tag, get_bit(t1.userids, t2.rowid-1) bt from t_tags t1, (select rowid from t_userid_dic where userid=100000) t2) t where bt=1;
  tag   
--------
 226813
 226824
 226818
 226810
 226782
 226790
 226792
 226787
 226803
 226826
(10 rows)
Time: 152.636 ms

Querying all of the TAGs belonging to a single user is a heavy operation; when there are many TAGs and users, it is recommended to run it with parallel processing.
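A minimal Python model of this lookup (names are illustrative): translate the userid to its rowid through the dictionary, then keep every tag whose bitmap has a 1 at that position.

```python
# bitmaps: tag -> bit list; dic: userid -> rowid (0-indexed here, so no
# rowid-1 adjustment is needed, unlike the SQL above)
def tags_of_user(bitmaps, dic, userid):
    rowid = dic[userid]                      # the t_userid_dic lookup
    return sorted(tag for tag, bits in bitmaps.items()
                  if rowid < len(bits) and bits[rowid] == 1)

bitmaps = {'t1': [1, 0, 1], 't2': [0, 1, 1], 't3': [0, 0, 0]}
print(tags_of_user(bitmaps, {'u3': 2}, 'u3'))   # -> ['t1', 't2']
```

Like the SQL, this scans every tag's bitmap, which is why the real query benefits from a parallel sequential scan over t_tags.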

Configuring parallelism parameters

postgres=# show parallel_tuple_cost;
 parallel_tuple_cost 
---------------------
 0
postgres=# show parallel_setup_cost ;
 parallel_setup_cost 
---------------------
 0
postgres=# show max_parallel_workers_per_gather ;
 max_parallel_workers_per_gather 
---------------------------------
 27
postgres=# show max_worker_processes ;
 max_worker_processes 
----------------------
 128
postgres=# show force_parallel_mode ;
 force_parallel_mode 
---------------------
 on
postgres=# alter table t_tags set (parallel_workers=27);
ALTER TABLE

Processed in parallel, each tag averages about 0.76 ms (roughly 7.6 seconds for 10,000 tags, as the plan below shows).

If the result is returned through a cursor, the first matching tags can be obtained quickly.

postgres=# explain (analyze,verbose,costs,buffers,timing) select array_agg(tag) from t_tags where get_bit(userids,10000)=1;
                                                              QUERY PLAN                                                              
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1047.68..1047.69 rows=1 width=32) (actual time=7176.745..7176.745 rows=1 loops=1)
   Output: array_agg(tag)
   Buffers: shared hit=15912565
   ->  Gather  (cost=0.00..1047.55 rows=50 width=4) (actual time=8.940..7175.486 rows=4957 loops=1)
         Output: tag
         Workers Planned: 27
         Workers Launched: 27
         Buffers: shared hit=15912565
         ->  Parallel Seq Scan on public.t_tags  (cost=0.00..1047.55 rows=2 width=4) (actual time=51.974..6023.333 rows=177 loops=28)
               Output: tag
               Filter: (get_bit((t_tags.userids)::"bit", 10000) = 1)
               Rows Removed by Filter: 180
               Buffers: shared hit=15909973
               Worker 0: actual time=24.376..5791.799 rows=158 loops=1
                 Buffers: shared hit=528366
               Worker 1: actual time=23.143..6749.264 rows=198 loops=1
                 Buffers: shared hit=632954
               Worker 2: actual time=55.201..6400.872 rows=186 loops=1
                 Buffers: shared hit=604388
               Worker 3: actual time=23.456..5351.070 rows=152 loops=1
                 Buffers: shared hit=482151
               Worker 4: actual time=71.248..6179.161 rows=181 loops=1
                 Buffers: shared hit=580237
               Worker 5: actual time=124.402..5395.424 rows=140 loops=1
                 Buffers: shared hit=493010
               Worker 6: actual time=111.926..6013.077 rows=168 loops=1
                 Buffers: shared hit=553851
               Worker 7: actual time=24.483..7170.148 rows=202 loops=1
                 Buffers: shared hit=677578
               Worker 8: actual time=23.284..5448.081 rows=139 loops=1
                 Buffers: shared hit=487985
               Worker 9: actual time=54.677..7057.927 rows=233 loops=1
                 Buffers: shared hit=666715
               Worker 10: actual time=73.070..6615.151 rows=177 loops=1
                 Buffers: shared hit=622393
               Worker 11: actual time=25.978..5977.110 rows=182 loops=1
                 Buffers: shared hit=552329
               Worker 12: actual time=22.975..5366.569 rows=150 loops=1
                 Buffers: shared hit=480447
               Worker 13: actual time=76.756..6940.743 rows=201 loops=1
                 Buffers: shared hit=655799
               Worker 14: actual time=54.590..5362.862 rows=161 loops=1
                 Buffers: shared hit=482488
               Worker 15: actual time=106.099..5454.446 rows=153 loops=1
                 Buffers: shared hit=494638
               Worker 16: actual time=53.649..6048.233 rows=165 loops=1
                 Buffers: shared hit=553771
               Worker 17: actual time=23.089..5810.984 rows=160 loops=1
                 Buffers: shared hit=532711
               Worker 18: actual time=55.039..5981.338 rows=165 loops=1
                 Buffers: shared hit=542380
               Worker 19: actual time=24.163..6187.498 rows=182 loops=1
                 Buffers: shared hit=571046
               Worker 20: actual time=23.965..6119.395 rows=194 loops=1
                 Buffers: shared hit=566214
               Worker 21: actual time=106.038..6238.629 rows=187 loops=1
                 Buffers: shared hit=582724
               Worker 22: actual time=54.568..6488.311 rows=183 loops=1
                 Buffers: shared hit=613989
               Worker 23: actual time=24.021..5368.295 rows=152 loops=1
                 Buffers: shared hit=488385
               Worker 24: actual time=53.327..5658.396 rows=178 loops=1
                 Buffers: shared hit=515591
               Worker 25: actual time=23.201..5358.615 rows=142 loops=1
                 Buffers: shared hit=483975
               Worker 26: actual time=109.940..5560.662 rows=163 loops=1
                 Buffers: shared hit=505844
 Planning time: 0.081 ms
 Execution time: 7637.509 ms
(69 rows)
Time: 7638.100 ms

postgres=# select tag from t_tags where get_bit(userids,10000)=1;
  tag   
--------
 226813
 226824
Cancel request sent
ERROR:  canceling statement due to user request
Time: 17.521 ms

Profile: the bottleneck of get_bit is memcpy, but this can be improved through PG kernel optimization  
             3647.00 34.2% memcpy                        /lib64/libc-2.12.so      

Sharding

A large APPID is sharded by user segment; the shard key is APPID + segment.

If an APPID includes 10,000 TAGs, 100 million users only occupies 120 GB.

Redistribution is usually needed only when skew appears. PostgreSQL natively supports data sharding through postgres_fdw. Note that the batch of TAGs for a single APPID must be located on one node.
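The segment mapping can be sketched as follows; SEG_SIZE and the modular arithmetic are illustrative assumptions, not part of the original design:

```python
# Split the global dictionary rowid into (segment, offset): each segment
# holds SEG_SIZE users, so each shard stores bitmaps of at most SEG_SIZE
# bits per tag instead of one bit per user in the whole APPID.
SEG_SIZE = 10_000_000   # users per segment; an illustrative choice

def shard_of(rowid):
    return rowid // SEG_SIZE, rowid % SEG_SIZE   # (segment/node, bit position)

print(shard_of(25_000_000))   # -> (2, 5000000)
```

All TAG bitmaps of one (APPID, segment) pair stay together on one node, so a tag-combination query only merges per-segment results.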

Dictionary Translation

Dictionary translation obtains dictionary values from bit positions. Assume that the dictionary key is imei+id, where id is a gapless auto-incrementing ID. How can we obtain the corresponding imei from a bit position?

create table imei_dict(
  id int primary key,
  imei text
);
select imei from imei_dict where id = any (bit_posite(....));

A cursor can also be used here to improve the time to first result.

This SQL is fast. It uses index scanning, and only takes 380 milliseconds to query 1 million records from 100 million.
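A Python sketch of the translation step (the exact semantics of bit_posite are assumed here: that it returns the 1-based positions of the set bits, matching the gapless dictionary id):

```python
# Extract the 1-based positions of the set bits, then map those ids to
# imei values, mirroring: select imei from imei_dict where id = any (...)
def bit_positions(bits):
    return [i + 1 for i, b in enumerate(bits) if b == 1]

imei_dict = {1: '86001', 2: '86002', 3: '86003', 4: '86004'}  # toy data
ids = bit_positions([1, 0, 1, 1])        # -> [1, 3, 4]
imeis = [imei_dict[i] for i in ids]
```

In the database the second step is the primary-key lookup on imei_dict, which is why the real query uses an index scan and stays fast even over 100 million rows.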
