This topic describes how to use ApsaraDB RDS for PostgreSQL 12 to implement precision marketing on target users in real time.
Prerequisites
Background information
Real-time precision marketing is required in almost all industries, such as the Internet, gaming, and education. Real-time precision marketing allows you to generate and filter user profiles to rapidly locate target users in different industries.
- E-commerce industry: Merchants push advertisements or deliver suitable marketing activities to target users based on user characteristics.
- Gaming industry: Merchants present gifts to players based on player characteristics to improve player activation.
- Education industry: Teachers push targeted exercises to students based on student characteristics to help students learn.
- Search, portal, and video website industries: Merchants push specific contents to users based on user concerns.
These industries have the same pain points:
- The huge volume of data requires a large number of computations.
- The large number of user tags and fields consume large amounts of storage space.
- The number of fields in a database may exceed the limit. Typically, a database can only contain up to 1,000 fields.
- A database that supports inverted indexes can use an array instead of multiple fields to store tags. However, not all databases support inverted indexes.
- The use of an array instead of multiple fields to store tags and the support for inverted indexes require large amounts of storage space.
- A wide range of combinations are used for selection conditions. The use of a single index for each field without a fixed index also requires large amounts of storage space.
- High performance is required for quick responses of real-time marketing.
- Real-time updates of user data and profiles are required to select users with precision. For example, assume that a user browses mobile phones and makes an order at that night. If merchants update user data the next day, their selected target user will be imprecise.
Common services such as ApsaraDB RDS for MySQL have limited resources and cannot handle target user selection in real time.
You can choose one of the following solutions to implement precision marketing in real time based on ApsaraDB RDS for PostgreSQL.
Solution 1
The following table schema is used:
KEY: the user ID
Tag 1:
Tag 2:
... Tag N:
The following index is used:
One index for each tag field
The following search method is used:
Combination of AND, OR, and NOT where Tag a and Tag b and ...
Disadvantages
- The large number of user tags and fields consume large amounts of storage space.
- The number of fields in a database may exceed the limit. Typically, a database can only contain up to 1,000 fields. To solve this limit problem, you can use a many-to-many structure to maintain a single record for each tag.
- A wide range of combinations are used for selection conditions. The use of a single index for each field without a fixed index also requires large amounts of storage space.
- A large amount of data must be updated when new group tags are added.
- The query performance is poor.
Procedure
Solution 2
The following table schema is used:
KEY: the user ID VALUES: the tag array
The following index is used:
Tag array field: generalized inverted index (GIN)
The following search method is used:
AND, OR, and NOT
where VALUES @> array[Tags] -- AND
where VALUES && array[Tags] -- OR where not VALUES @> array[Tags] -- NOT
Disadvantages
- A database that supports inverted indexes can use an array instead of multiple fields to store tags. However, not all databases support inverted indexes.
- The use of an array instead of multiple fields to store tags and the support for inverted indexes require large amounts of storage space.
- A large amount of data must be updated when new group tags are added.
Procedure
Solution 3
Solution 3 uses the roaringbitmap plug-in to quickly query data. For more information, see Use the roaringbitmap plug-in.
The following table schema is used:
KEY: the tag ID
VALUES: the user bitmap
The following index is used:
Tag ID field: B-tree index
The following search method is used:
Aggregate bitmap: AND, OR, and NOT and_agg(bitmaps) where KEY in (Tags) -- AND
or_agg(bitmaps) where KEY in (Tags) -- OR
except(bitmap1,bitmap2) -- NOT
- Advantages
- Only a small amount of space is required to store tables.
- Only a small amount of space is required to store indexes. Only one B-tree index is required to store tags. Typically, the number of tags is less than one million.
- When a new group tag is added, only a single group bitmap record needs to be added and updates do not consume large amounts of data.
- The query performance is excellent.
- Disadvantages
- The maximum length of a bitmap is 1 GB. If the number of users exceeds the maximum
length, use an offset as follows:
offset0_bitmap, offset1gb_bitmap, ...
- User IDs must be numbers and consecutive numbers are recommended. A mapping table is required if user IDs are not numbers.
- The maximum length of a bitmap is 1 GB. If the number of users exceeds the maximum
length, use an offset as follows:
Procedure
- When a user ID exceeds 4 billion (INT4), you can use an offset to convert the user ID. For more information about the conversion method, see Troubleshooting for UID overflow.
- For more information about how to use roaringbitmap, see pg_roaringbitmap.
Comparison between solutions
Item | Solution 1
(MySQL and PostgreSQL) |
Solution 2
(PostgreSQL) |
Solution 3
(PostgreSQL) |
Improvement of solution 3 than solution 1 |
---|---|---|---|---|
Time used for AND queries of user selection | 1.5 seconds | 0.042 seconds | 0.0015 seconds | 99900% |
Time used for OR queries of user selection | 3.6 seconds | 3 seconds | 0.0017 seconds | 211665% |
Space usage (table) | 63,488 MB | 3,126 MB | 1,390 MB | 4467% |
Space usage (index) | 62,464 MB | 3,139 MB | 2 MB | 3123100% |
Time used for index creation | N/A | 20 minutes | Extremely fast (about 0 seconds) | N/A |
Summary
ApsaraDB RDS for PostgreSQL 12 allows you to use the roaringbitmap plug-in to generate, compress, and parse bitmap data. You can perform bitwise aggregate operations such as AND, OR, NOT, and XOR to implement precision marketing on target users in real time when ten millions of tags are used for hundreds of millions of users.
Compared with the MySQL-based solution, the PostgreSQL-based solution is more cost-effective.
Video tutorial
Promotions
CNY 9.9 trial for new users and 50% off upgrade for old users