This topic describes how to use ApsaraDB RDS for PostgreSQL 12 to implement precision marketing on target users in real time.

Background information

Real-time precision marketing is required in almost all industries, such as the Internet, gaming, and education. Real-time precision marketing allows you to generate and filter user profiles to rapidly locate target users in different industries.

  • E-commerce industry: Merchants push advertisements or deliver suitable marketing activities to target users based on user characteristics.
  • Gaming industry: Merchants present gifts to players based on player characteristics to improve player activation.
  • Education industry: Teachers push targeted exercises to students based on student characteristics to help students learn.
  • Search, portal, and video website industries: Merchants push specific contents to users based on user concerns.

These industries have the same pain points:

  • The huge volume of data requires a large number of computations.
  • The large number of user tags and fields consume large amounts of storage space.
  • The number of fields in a database may exceed the limit. Typically, a database can only contain up to 1,000 fields.
  • A database that supports inverted indexes can use an array instead of multiple fields to store tags. However, not all databases support inverted indexes.
  • The use of an array instead of multiple fields to store tags and the support for inverted indexes require large amounts of storage space.
  • A wide range of combinations are used for selection conditions. The use of a single index for each field without a fixed index also requires large amounts of storage space.
  • High performance is required for quick responses of real-time marketing.
  • Real-time updates of user data and profiles are required to select users with precision. For example, assume that a user browses mobile phones and makes an order at that night. If merchants update user data the next day, their selected target user will be imprecise.

Common services such as ApsaraDB RDS for MySQL have limited resources and cannot handle target user selection in real time.

You can use one of the following solutions to implement precision marketing in real time based on ApsaraDB RDS for PostgreSQL.

Before you begin

Solution 1

Note Solution 1 is supported for both PostgreSQL and MySQL.

The following table schema is used:

KEY: the user ID
Tag 1:
Tag 2:
... Tag N:

The following index is used:

One index for each tag field

The following search method is used:

Combination of AND, OR, and NOT where Tag a and Tag b and ...

Disadvantages

  • The large number of user tags and fields consume large amounts of storage space.
  • The number of fields in a database may exceed the limit. Typically, a database can only contain up to 1,000 fields. To solve this limit problem, you can use a many-to-many structure to maintain a single record for each tag.
  • A wide range of combinations are used for selection conditions. The use of a single index for each field without a fixed index also requires large amounts of storage space.
  • A large amount of data must be updated when new group tags are added.
  • The query performance is poor.

Procedure

  1. Create a group table, with each record representing a group of people. Example:
    create table t_tag_dict (
    tag int primary key,   -- Tag (group) ID info text,  -- Group description crt_time timestamp  -- Time
    ); 
  2. Create 100,000 group tags. Example:
    insert into t_tag_dict values (1, 'Male', now());
    insert into t_tag_dict values (2, 'Female', now());
    insert into t_tag_dict values (3, 'Older than 24 years old', now());
    -- ...
     insert into t_tag_dict 
    select generate_series(4,100000), md5(random()::text), clock_timestamp();
  3. Create a user profile table, with each record of a user representing a tag of the user. Example:
    create table t_user_tag (
    uid int8,   -- User ID tag int,            -- Tag (group) of the user mod_time timestamp,     -- Time
    primary key (tag,uid)
    );
  4. Set 64 random tags for each of the 10 million male and 10 million female users to generate a total of 1.28 billion records. Example:
    create or replace function gen_rand_tag(int,int) returns setof int as
    $$
      select case when random() > 0.5 then 1::int else 2::int end as tag
      union all
      select ceil(random()*$1)::int as tag from generate_series(1,$2);
    $$ language sql strict volatile;
     insert into t_user_tag
    select uid, gen_rand_tag(100000,63) as tag, clock_timestamp() 
    from generate_series(1,20000000) as uid on conflict (uid,tag) do nothing;
     -- You can also use the following method to import tags: create sequence seq;
     vi test.sql
    insert into t_user_tag
    select uid, gen_rand_tag(100000,63) as tag, clock_timestamp() 
    from nextval('seq'::regclass) as uid 
    on conflict(tag,uid) do nothing;
     pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 400000
  5. Query users who match with tags 1 and 3. Example:
    1. Number of users select count(*) from 
    (
    select uid from t_user_tag where tag=1 
    intersect
    select uid from t_user_tag where tag=3
    ) t;
    -- Time: 1494.789 ms (00:01.495)
     2. User IDs select uid from t_user_tag where tag=1 
    intersect
    select uid from t_user_tag where tag=3;
    -- Time: 3246.184 ms (00:03.246)
  6. Query users who match with tags 1, 3, 10, or 200. Example:
    1. Number of users select count(*) from 
    (
    select uid from t_user_tag where tag=1 
    union
    select uid from t_user_tag where tag=3
    union
    select uid from t_user_tag where tag=10
    union
    select uid from t_user_tag where tag=200
    ) t;
    -- Time: 3577.714 ms (00:03.578)
     2. User IDs select uid from t_user_tag where tag=1 
    union
    select uid from t_user_tag where tag=3
    union
    select uid from t_user_tag where tag=10
    union
    select uid from t_user_tag where tag=200;
    -- Time: 5682.458 ms (00:05.682)

Solution 2

Note Solution 2 is supported only for PostgreSQL. MySQL does not support arrays or inverted indexes.

The following table schema is used:

KEY: the user ID VALUES: the tag array

The following index is used:

Tag array field: generalized inverted index (GIN)

The following search method is used:

AND, OR, and NOT
where VALUES @> array[Tags] -- AND
where VALUES && array[Tags] -- OR where not VALUES @> array[Tags] -- NOT

Disadvantages

  • A database that supports inverted indexes can use an array instead of multiple fields to store tags. However, not all databases support inverted indexes.
  • The use of an array instead of multiple fields to store tags and the support for inverted indexes require large amounts of storage space.
  • A large amount of data must be updated when new group tags are added.

Procedure

  1. Create a group table, with each record representing a group of people. Example:
    create table t_tag_dict (
    tag int primary key,   -- Tag (group) ID info text,  -- Group description crt_time timestamp  -- Time
    ); 
  2. Create 100,000 group tags. Example:
    insert into t_tag_dict values (1, 'Male', now());
    insert into t_tag_dict values (2, 'Female', now());
    insert into t_tag_dict values (3, 'Older than 24 years old', now());
    -- ...
     insert into t_tag_dict 
    select generate_series(4,100000), md5(random()::text), clock_timestamp();
  3. Create a user profile table, with the single record of each user representing an array of tags of the user. Example:
    create table t_user_tags (
    uid int8 primary key,   -- User ID tags int[],            -- Array of user tags (groups) mod_time timestamp     -- Time
    ); 
  4. Create a function to generate a random array of tags. Example:
    create or replace function gen_rand_tags(int,int) returns int[] as $$
      select array_agg(ceil(random()*$1)::int) from generate_series(1,$2);
    $$ language sql strict;
  5. Randomly select eight from 100,000 tags. Example:
    select gen_rand_tags(100000, 8);
    
                       gen_rand_tags                   
    ---------------------------------------------------
     {43494,46038,74102,25308,99129,40893,33653,29690}
    (1 row)
  6. Set 64 random tags for each of the 10 million male and 10 million female users. Example:
    insert into t_user_tags 
    select generate_series(1,10000000), 
    array_append(gen_rand_tags(100000, 63),1), now();
     insert into t_user_tags 
    select generate_series(10000001,20000000), 
    array_append(gen_rand_tags(100000, 63),2), now();
  7. Create an inverted index for the group tag fields. Example:
    create index idx_t_user_tags_1 on t_user_tags using gin (tags);
  8. Query users who match with tags 1 and 3. Example:
    1. Number of users select count(uid) from t_user_tags where tags @> array[1,3];
     2. User IDs select uid from t_user_tags where tags @> array[1,3];
  9. Query users who match with tags 1, 3, 10, or 200. Example:
    1. Number of users select count(uid) from t_user_tags where tags && array[1,3,10,200];
     2. User IDs select uid from t_user_tags where tags && array[1,3,10,200];

Solution 3

Note Solution 3 is supported only for PostgreSQL. MySQL does not support bitmaps.

Solution 3 uses the roaringbitmap plug-in to quickly query data. For more information, see Use the roaringbitmap plug-in.

The following table schema is used:

KEY: the tag ID
VALUES: the user bitmap

The following index is used:

Tag ID field: B-tree index

The following search method is used:

Aggregate bitmap: AND, OR, and NOT and_agg(bitmaps) where KEY in (Tags) -- AND
or_agg(bitmaps) where KEY in (Tags) -- OR
except(bitmap1,bitmap2) -- NOT
  • Advantages
    • Only a small amount of space is required to store tables.
    • Only a small amount of space is required to store indexes. Only one B-tree index is required to store tags. Typically, the number of tags is less than one million.
    • When a new group tag is added, only a single group bitmap record needs to be added and updates do not consume large amounts of data.
    • The query performance is excellent.
  • Disadvantages
    • The maximum length of a bitmap is 1 GB. If the number of users exceeds the maximum length, use an offset as follows:
      offset0_bitmap, offset1gb_bitmap, ... 
    • User IDs must be numbers and consecutive numbers are recommended. A mapping table is required if user IDs are not numbers.

Procedure

Note
  • When a user ID exceeds 4 billion (INT4), you can use an offset to convert the user ID. For more information about the conversion method, see Troubleshooting for UID overflow.
  • For more information about how to use roaringbitmap, see pg_roaringbitmap.
  1. Install the roaringbitmap plug-in. Example:
    create extension roaringbitmap;
  2. Create a bitmap table that contains user tags. Example:
    create table t_tag_users (
      tagid int primary key,   -- User tag (group) ID  uid_offset int,          -- Convert the user ID from INT8 to INT4.  userbits roaringbitmap,     -- Bitmap of user IDs  mod_time timestamp       -- Time
    );
  3. Insert data to generate a bitmap table that contains user ID tags. Example:
    insert into t_tag_users 
    select tagid, uid_offset, rb_build_agg(uid::int) as userbits from 
    (
    select 
      unnest(tags) as tagid, 
      (uid / (2^31)::int8) as uid_offset, 
      mod(uid, (2^31)::int8) as uid 
    from t_user_tags 
    ) t 
    group by tagid, uid_offset; 
  4. Query users who match with tags 1 and 3. Example:
    1. Number of users select sum(ub) from 
    (
    select uid_offset,rb_and_cardinality_agg(userbits) as ub 
    from t_tag_users 
    where tagid in (1,3) 
    group by uid_offset
    ) t;
     2. User IDs select uid_offset,rb_and_agg(userbits) as ub 
    from t_tag_users 
    where tagid in (1,3) 
    group by uid_offset;
  5. Query users who match with tags 1, 3, 10, or 200. Example:
    1. Number of users select sum(ub) from 
    (
    select uid_offset,rb_or_cardinality_agg(userbits) as ub 
    from t_tag_users 
    where tagid in (1,3,10,200) 
    group by uid_offset
    ) t;
     2. User IDs select uid_offset,rb_or_agg(userbits) as ub 
    from t_tag_users 
    where tagid in (1,3,10,200) 
    group by uid_offset;

Comparison between solutions

Item Solution 1

(MySQL and PostgreSQL)

Solution 2

(PostgreSQL)

Solution 3

(PostgreSQL)

Improvement of solution 3 than solution 1
Time used for AND queries of user selection 1.5 seconds 0.042 seconds 0.0015 seconds 99900%
Time used for OR queries of user selection 3.6 seconds 3 seconds 0.0017 seconds 211665%
Space usage (table) 63,488 MB 3,126 MB 1,390 MB 4467%
Space usage (index) 62,464 MB 3,139 MB 2 MB 3123100%
Time used for index creation N/A 20 minutes Extremely fast (about 0 seconds) N/A
Note ApsaraDB RDS for MySQL 8.0 and ApsaraDB RDS for PostgreSQL 12 instances used in these solutions have the following specifications: 8-core CPU, 32 GB memory, and Enhanced SSDs of 1,500 GB.

Summary

ApsaraDB RDS for PostgreSQL 12 allows you to use the roaringbitmap plug-in to generate, compress, and parse bitmap data. You can perform bitwise aggregate operations such as AND, OR, NOT, and XOR to implement precision marketing on target users in real time when ten millions of tags are used for hundreds of millions of users.

Compared with the MySQL-based solution, the PostgreSQL-based solution is more cost-effective.

Video tutorial

Real-time precision marketing

Promotions

CNY 9.9 trial for new users and 50% off upgrade for old users