All Products
Search
Document Center

ApsaraDB RDS:Real-time precision marketing (Audience Segmentation)

Last Updated:Mar 30, 2026

Tag-based user group selection is a core requirement across e-commerce, gaming, education, and content platforms. At scale — tens of millions of users, hundreds of thousands of tags — standard database approaches fail on storage, query speed, or both. This topic compares three PostgreSQL schema models for querying users by tag combinations and explains when to use each approach.

Prerequisites

Before you begin, make sure you have:

Solution 3 requires the roaringbitmap extension and PostgreSQL 12 or later. The examples in this topic use PostgreSQL 12. For extension details, see Use the roaringbitmap extension.

Background

Precision marketing systems need to query users by any combination of tags in real time — identifying, for example, all male users over 24 who browsed mobile phones in the last 24 hours but have not yet placed an order. The challenge is doing this quickly across tens of millions of users with tens of thousands of tags.

Common pain points:

  • Tag counts grow into the hundreds of thousands, pushing against database field limits (typically 1,000 fields per table).

  • Storing tags as arrays requires Generalized Inverted Index (GIN) support, which not all databases provide and which consumes significant storage.

  • Query condition combinations are unpredictable, so static indexes cannot cover all cases.

  • User profile data must stay near real-time — a user who placed an order last night should not receive ads for that same product today.

This topic describes three solutions built on PostgreSQL that address these pain points with different trade-offs.

Choose a solution

Solution 1 Solution 2 Solution 3
Database support PostgreSQL and MySQL PostgreSQL only PostgreSQL only
Schema model One row per user per tag One row per user, tag array One row per tag, user bitmap
Index type B-tree, one per tag field GIN on tag array B-tree on tag ID
AND query speed 1.5 s 0.042 s 0.0015 s
OR query speed 3.6 s 3 s 0.0017 s
Table storage 63,488 MB 3,126 MB 1,390 MB
Index storage 62,464 MB 3,139 MB 2 MB
Best for Simple setups or MySQL constraints Moderate scale with PostgreSQL Large-scale, real-time requirements
Test environment: RDS instances running MySQL 8.0 and PostgreSQL 12, with 8 CPU cores, 32 GB of memory, and an Enhanced SSD (ESSD) of 1,500 GB. Data set: 20 million users, 100,000 tags, 64 tags per user (1.28 billion total tag records).

Use Solution 3 for production systems with millions of users and real-time query requirements. Use Solution 1 if you must support MySQL. Use Solution 2 as a middle ground when storage is more constrained than query speed.

Solution 1: One row per user per tag

This solution works with both PostgreSQL and MySQL.

Schema overview

KEY:    user ID
VALUES: one row per tag (tag 1, tag 2, ... tag N)
  • Index: B-tree index on each tag field

  • Search: WHERE tag = a AND tag = b (with INTERSECT/UNION)

Disadvantages

  • High storage usage: one index per tag field across 20 million users produces 62,464 MB of index storage alone.

  • Adding a new tag requires inserting rows for every user in the group.

  • Query performance degrades with more tags in the filter condition.

Procedure

  1. Create a tag dictionary table, with one row per tag:

    CREATE TABLE t_tag_dict (
      tag      int PRIMARY KEY,   -- Tag ID
      info     text,              -- Tag description
      crt_time timestamp          -- Creation time
    );
  2. Insert 100,000 tags:

    INSERT INTO t_tag_dict VALUES (1, 'Male', now());
    INSERT INTO t_tag_dict VALUES (2, 'Female', now());
    INSERT INTO t_tag_dict VALUES (3, 'Older than 24 years old', now());
    -- ... additional tags ...
    
    INSERT INTO t_tag_dict
    SELECT generate_series(4, 100000), md5(random()::text), clock_timestamp();
  3. Create a user profile table, with one row per user per tag:

    CREATE TABLE t_user_tag (
      uid      int8,       -- User ID
      tag      int,        -- Tag ID
      mod_time timestamp,  -- Last modified
      PRIMARY KEY (tag, uid)
    );
  4. Populate 20 million users with 64 random tags each (1.28 billion rows total):

    CREATE OR REPLACE FUNCTION gen_rand_tag(int, int) RETURNS SETOF int AS
    $$
      SELECT CASE WHEN random() > 0.5 THEN 1::int ELSE 2::int END AS tag
      UNION ALL
      SELECT ceil(random() * $1)::int AS tag FROM generate_series(1, $2);
    $$ LANGUAGE sql STRICT VOLATILE;
    
    INSERT INTO t_user_tag
    SELECT uid, gen_rand_tag(100000, 63) AS tag, clock_timestamp()
    FROM generate_series(1, 20000000) AS uid
    ON CONFLICT (uid, tag) DO NOTHING;
  5. Query users matching tags 1 AND 3:

    -- Count matching users
    SELECT count(*) FROM (
      SELECT uid FROM t_user_tag WHERE tag = 1
      INTERSECT
      SELECT uid FROM t_user_tag WHERE tag = 3
    ) t;
    -- Time: 1,494 ms
    
    -- Get user IDs
    SELECT uid FROM t_user_tag WHERE tag = 1
    INTERSECT
    SELECT uid FROM t_user_tag WHERE tag = 3;
    -- Time: 3,246 ms
  6. Query users matching tag 1 OR 3 OR 10 OR 200:

    -- Count matching users
    SELECT count(*) FROM (
      SELECT uid FROM t_user_tag WHERE tag = 1
      UNION SELECT uid FROM t_user_tag WHERE tag = 3
      UNION SELECT uid FROM t_user_tag WHERE tag = 10
      UNION SELECT uid FROM t_user_tag WHERE tag = 200
    ) t;
    -- Time: 3,578 ms
    
    -- Get user IDs
    SELECT uid FROM t_user_tag WHERE tag = 1
    UNION SELECT uid FROM t_user_tag WHERE tag = 3
    UNION SELECT uid FROM t_user_tag WHERE tag = 10
    UNION SELECT uid FROM t_user_tag WHERE tag = 200;
    -- Time: 5,682 ms

Solution 2: Tag arrays with GIN indexes

This solution requires PostgreSQL. MySQL does not support arrays or GIN indexes.

Schema overview

KEY:    user ID
VALUES: array of tag IDs
  • Index: GIN on the tag array field

  • Search operators: @> (AND/contains), && (OR/overlaps), NOT @> (NOT)

Disadvantages

  • GIN indexes consume substantially more storage than B-tree indexes for large data sets.

  • Adding tags to a user requires updating their row, which can be expensive at scale.

  • GIN writes use a deferred fastupdate mechanism: updates accumulate in a pending list and are flushed in batches. When the pending list reaches gin_pending_list_limit (default 4 MB), a flush occurs during a write operation, causing periodic latency spikes. For real-time user profile update workloads, monitor this behavior and consider tuning gin_pending_list_limit or disabling fastupdate if write latency consistency is critical.

Procedure

  1. Create a tag dictionary table (same as Solution 1):

    CREATE TABLE t_tag_dict (
      tag      int PRIMARY KEY,
      info     text,
      crt_time timestamp
    );
  2. Insert 100,000 tags:

    INSERT INTO t_tag_dict VALUES (1, 'Male', now());
    INSERT INTO t_tag_dict VALUES (2, 'Female', now());
    INSERT INTO t_tag_dict VALUES (3, 'Older than 24 years old', now());
    
    INSERT INTO t_tag_dict
    SELECT generate_series(4, 100000), md5(random()::text), clock_timestamp();
  3. Create a user profile table with one row per user and a tag array:

    CREATE TABLE t_user_tags (
      uid      int8 PRIMARY KEY,  -- User ID
      tags     int[],             -- Array of tag IDs
      mod_time timestamp
    );
  4. Create a helper function to generate random tag arrays:

    CREATE OR REPLACE FUNCTION gen_rand_tags(int, int) RETURNS int[] AS $$
      SELECT array_agg(ceil(random() * $1)::int) FROM generate_series(1, $2);
    $$ LANGUAGE sql STRICT;
  5. Tag 20 million users with 64 random tags each (10 million male, 10 million female):

    INSERT INTO t_user_tags
    SELECT generate_series(1, 10000000),
      array_append(gen_rand_tags(100000, 63), 1), now();
    
    INSERT INTO t_user_tags
    SELECT generate_series(10000001, 20000000),
      array_append(gen_rand_tags(100000, 63), 2), now();
  6. Create a GIN index on the tag array:

    CREATE INDEX idx_t_user_tags_1 ON t_user_tags USING gin (tags);
    -- Index creation time: ~20 minutes
  7. Query users matching tags 1 AND 3:

    -- Count matching users
    SELECT count(uid) FROM t_user_tags WHERE tags @> ARRAY[1, 3];
    
    -- Get user IDs
    SELECT uid FROM t_user_tags WHERE tags @> ARRAY[1, 3];
  8. Query users matching tag 1 OR 3 OR 10 OR 200:

    -- Count matching users
    SELECT count(uid) FROM t_user_tags WHERE tags && ARRAY[1, 3, 10, 200];
    
    -- Get user IDs
    SELECT uid FROM t_user_tags WHERE tags && ARRAY[1, 3, 10, 200];

Solution 3: Roaring bitmap aggregation

This solution requires PostgreSQL with the roaringbitmap extension. MySQL does not support this extension.

How it works

The roaring bitmap model inverts the schema: instead of storing which tags a user has, it stores — for each tag — a compressed bitmap of all users who carry that tag. Think of a roaringbitmap as a very large bit string where each bit position represents a user ID; if bit *n* is set, user *n* has the tag.

The Roaring Bitmap algorithm divides 32-bit integers into 2^16 chunks based on the high 16 bits, and stores the low 16 bits in a container. Two container types are used automatically: an array container for sparse data (fewer than 4,096 integers) and a bitmap container for dense data. This adaptive structure is why the index shrinks from 62,464 MB in Solution 1 to just 2 MB here, and why AND/OR queries complete in under 2 milliseconds.

Schema overview

KEY:    tag ID
VALUES: compressed bitmap of user IDs
  • Index: B-tree on tag ID

  • Operations: rb_and_agg / rb_and_cardinality_agg (AND), rb_or_agg / rb_or_cardinality_agg (OR), rb_build_agg (build bitmap)

Advantages

  • Minimal storage: one B-tree index with one entry per tag (at most a few hundred thousand entries).

  • Adding a tag to a user group only requires inserting or updating one bitmap row — no mass row updates.

  • AND or OR queries run in under 2 milliseconds on 20 million users.

Limitations

  • User IDs must be integers. If your system uses non-integer IDs, create a mapping table.

  • The standard roaringbitmap type supports 32-bit integers (up to ~4 billion user IDs). If your user IDs exceed 4 billion, use the uid_offset approach described below.

Procedure

For more information about the roaringbitmap extension, see pg_roaringbitmap. For handling UID overflow, see Troubleshooting for UID overflow.
  1. Install the roaringbitmap extension:

    CREATE EXTENSION roaringbitmap;
  2. Create a bitmap table that maps each tag to the set of users who have it:

    CREATE TABLE t_tag_users (
      tagid      int PRIMARY KEY,   -- Tag ID
      uid_offset int,               -- Offset bucket (converts INT8 user IDs to INT4 range)
      userbits   roaringbitmap,     -- Compressed bitmap of user IDs
      mod_time   timestamp
    );
  3. Populate the bitmap table from the t_user_tags table created in Solution 2:

    INSERT INTO t_tag_users
    SELECT tagid, uid_offset, rb_build_agg(uid::int) AS userbits
    FROM (
      SELECT
        unnest(tags) AS tagid,
        (uid / (2^31)::int8) AS uid_offset,   -- High bits as offset bucket
        mod(uid, (2^31)::int8) AS uid          -- Low bits as bitmap position
      FROM t_user_tags
    ) t
    GROUP BY tagid, uid_offset;
  4. Query users matching tags 1 AND 3:

    -- Count matching users
    SELECT sum(ub) FROM (
      SELECT uid_offset, rb_and_cardinality_agg(userbits) AS ub
      FROM t_tag_users
      WHERE tagid IN (1, 3)
      GROUP BY uid_offset
    ) t;
    -- Time: 1.5 ms
    
    -- Get the bitmap result (user IDs reconstructed from offset + bitmap)
    SELECT uid_offset, rb_and_agg(userbits) AS ub
    FROM t_tag_users
    WHERE tagid IN (1, 3)
    GROUP BY uid_offset;
  5. Query users matching tag 1 OR 3 OR 10 OR 200:

    -- Count matching users
    SELECT sum(ub) FROM (
      SELECT uid_offset, rb_or_cardinality_agg(userbits) AS ub
      FROM t_tag_users
      WHERE tagid IN (1, 3, 10, 200)
      GROUP BY uid_offset
    ) t;
    -- Time: 1.7 ms
    
    -- Get the bitmap result
    SELECT uid_offset, rb_or_agg(userbits) AS ub
    FROM t_tag_users
    WHERE tagid IN (1, 3, 10, 200)
    GROUP BY uid_offset;

Summary

ApsaraDB RDS for PostgreSQL 12 and later supports the roaringbitmap extension, which enables efficient bitmap generation, compression, and aggregation with AND, OR, NOT, and XOR operations. For precision marketing systems with hundreds of millions of users and tens of millions of tags, Solution 3 delivers query times under 2 milliseconds and index storage under 2 MB — meeting both the speed and scale requirements that simpler schema models cannot satisfy.

What's next