All Products
Search
Document Center

AnalyticDB:Funnel and retention functions

Last Updated:Mar 28, 2026

AnalyticDB for MySQL provides four built-in functions for funnel and retention analysis: window_funnel, retention, retention_range_count, and retention_range_sum. Use these functions to measure conversion rates across user journey steps and track how many users return over time.

Prerequisites

Before you begin, ensure that you have:

  • An AnalyticDB for MySQL cluster running minor version 3.1.6.0 or later

Note

To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

Background

Funnel analysis measures how users move through a defined sequence of steps — for example, from browsing a product to completing a purchase. At each step, some users drop off; the conversion rate shows how many make it through. This technique is widely used in traffic analysis and product goal conversion analysis.

AnalyticDB for MySQL supports the following functions:

FunctionPurpose
window_funnelCounts how far a user progressed through a defined event sequence within a sliding time window
retentionChecks whether each user satisfies a set of date-based conditions, returning a binary array
retention_range_countRecords per-user retention status across specified intervals, returning a 2-dimensional array
retention_range_sumAggregates retention_range_count output across all users to compute daily retention rates

Test dataset

The examples in this topic use User Behavior Data from Taobao for Recommendation from Tianchi Lab. The dataset contains four behavior types:

BehaviorDescription
pvProduct page view (counted as one click)
buyPurchase
cartAdd to shopping cart
favAdd to favorites

To load the dataset into AnalyticDB for MySQL, upload it to Object Storage Service (OSS) and then import it using an OSS external table.

  1. Upload the dataset to OSS. For more information, see Upload objects.

  2. Create an OSS external table.

    CREATE TABLE `user_behavior_oss` (
      `user_id` string,
      `item_id` string,
      `cate_id` string,
      `event` string,
      `ts` bigint
    ) ENGINE = 'oss'
    TABLE_PROPERTIES = '{
      "endpoint":"oss-cn-zhangjiakou.aliyuncs.com",
      "accessid":"******",
      "accesskey":"*******",
      "url":"oss://<bucket-name>/user_behavior/",
      "delimiter":","
    }'

    For more information, see OSS external table syntax.

  3. Create a test table in AnalyticDB for MySQL.

    CREATE TABLE user_behavior(
      uid string,
      event string,
      ts string
    )
  4. Import data from the OSS external table.

    SUBMIT JOB INSERT OVERWRITE user_behavior SELECT user_id, event, ts FROM user_behavior_oss;

window_funnel

window_funnel searches a user's event history for a specified sequence of events within a sliding time window. It returns the length of the longest matching prefix of that sequence.

How it works

The function processes each user's events in chronological order:

  1. It scans events to find the first occurrence of the initial condition in your sequence. When found, the sliding window starts.

  2. It then looks for each subsequent condition in order within the window. Each match advances the counter by one.

  3. If the sequence is interrupted or the window expires before all conditions match, the counter stops. The function returns the highest counter value reached.

Examples:

  • Event list [c1, c2, c3], user data [c1, c2, c3, c4] → returns 3 (full sequence matched)

  • Event list [c1, c2, c3], user data [c4, c3, c2, c1] → returns 1 (only c1 matched from start)

  • Event list [c1, c2, c3], user data [c4, c3] → returns 0 (first condition never matched)

Syntax

window_funnel(window, mode, timestamp, cond1, cond2, ..., condN)

Parameters

ParameterTypeDescription
windowIntegerSize of the sliding time window, in the same unit as the timestamp column
modeStringWorking mode. Set to "default"
timestampBIGINTTimestamp column. Must be of the BIGINT data type. If your timestamp column is not BIGINT, use TIMESTAMPDIFF to convert it. For example: TIMESTAMPDIFF('second', '2017-11-25 00:00:00.000', ts)
cond1, ..., condNBooleanEvent conditions that define the funnel steps, evaluated in order

Example

The following example analyzes the conversion path — browse → add to favorites → add to cart → purchase — from 2017-11-25 00:00:00 to 2017-11-26 00:00:00. The sliding time window is 30 minutes (1,800 seconds), and timestamps are in Unix format (1511539200 to 1511625600).

The query runs in two stages: the inner query computes a funnel depth per user, and the outer query counts how many users reached each depth.

SELECT
  funnel,
  count(1)
FROM (
  SELECT
    uid,
    window_funnel(
      cast(1800 as integer),  -- 30-minute sliding window
      "default",
      ts,
      event = 'pv',           -- Step 1: Browse
      event = 'fav',          -- Step 2: Add to favorites
      event = 'cart',         -- Step 3: Add to cart
      event = 'buy'           -- Step 4: Purchase
    ) AS funnel
  FROM user_behavior
  WHERE ts > 1511539200
    AND ts < 1511625600
  GROUP BY uid
)
GROUP BY funnel;

Sample result:

+--------+----------+
| funnel | count(1) |
+--------+----------+
|      0 |    19687 |
|      1 |   596104 |
|      2 |    78458 |
|      3 |    11640 |
|      4 |      746 |
+--------+----------+
5 rows in set (0.64 sec)

A value of 4 means the user completed all four steps. A value of 1 means only the first step (browse) was matched.

retention

retention checks whether a user satisfies a set of conditions — typically one condition per day — and returns a UINT8 array where each element is 1 (condition met) or 0 (condition not met).

Syntax

retention(cond1, cond2, ..., cond32)

Parameters

ParameterTypeDescription
cond1, ..., cond32UINT8Conditions to evaluate, up to 32. Returns 1 if the condition is met, 0 otherwise

Example

The following example measures 7-day retention starting November 25, 2017. sum(r[1]) counts users who were active on day 1; sum(r[2]) through sum(r[7]) count how many of those returned on each subsequent day.

Stage 1 — compute per-user retention arrays:

SELECT
  uid,
  retention(
    ds = '2017-11-25' AND event = 'pv',  -- Day 1: active (baseline)
    ds = '2017-11-25',                   -- Day 1 retention check
    ds = '2017-11-26',                   -- Day 2
    ds = '2017-11-27',                   -- Day 3
    ds = '2017-11-28',                   -- Day 4
    ds = '2017-11-29',                   -- Day 5
    ds = '2017-11-30'                    -- Day 6
  ) AS r
FROM user_behavior_date
GROUP BY uid

Stage 2 — aggregate across all users:

SELECT
  sum(r[1]),   -- Active users on day 1
  sum(r[2]),   -- Retained on day 1
  sum(r[3]),   -- Retained on day 2
  sum(r[4]),   -- Retained on day 3
  sum(r[5]),   -- Retained on day 4
  sum(r[6]),   -- Retained on day 5
  sum(r[7])    -- Retained on day 6
FROM (
  SELECT
    retention(
      ds = '2017-11-25' AND event = 'pv',
      ds = '2017-11-25',
      ds = '2017-11-26',
      ds = '2017-11-27',
      ds = '2017-11-28',
      ds = '2017-11-29',
      ds = '2017-11-30'
    ) AS r
  FROM user_behavior_date
  GROUP BY uid
);

Sample result:

+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| sum(r[1]) | sum(r[2]) | sum(r[3]) | sum(r[4]) | sum(r[5]) | sum(r[6]) | sum(r[7]) |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
|    686953 |    686953 |    544367 |    529979 |    523516 |    524530 |    528105 |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
1 row in set (2.96 sec)

retention_range_count and retention_range_sum

retention_range_count and retention_range_sum are designed for user growth analysis. They support flexible retention intervals and produce output suitable for visualization.

  • retention_range_count computes per-user retention status across specified day intervals, returning a 2-dimensional array.

  • retention_range_sum takes the output of retention_range_count and aggregates it across all users to produce daily retention rates.

Syntax

-- Per-user retention
retention_range_count(is_first, is_active, dt, intervals, outputFormat)

-- Aggregate across users
retention_range_sum(retention_range_count_result)

Parameters

retention_range_count:

ParameterTypeDescription
is_firstBooleanWhether this row matches the activation event. true = this is the initial behavior
is_activeBooleanWhether this row matches the retention event. true = this counts as a retained behavior
dtDateDate of the event, in date format (for example, 2022-05-01)
intervals[]ArrayRetention intervals to track (for example, array(1, 2) for day-1 and day-2 retention). Up to 15 intervals are supported
outputFormatStringFormat of the return value. Valid values: normal (default) or expand

outputFormat values:

ValueOutput format
normal (default)[[d1(start date), 1, 0, ...], [d2(start date), 1, 0, ...], ...] — each row is [start date, interval_1_flag, interval_2_flag, ...], where 1 = retained and 0 = not retained
expand[[d1(start date), d1+1(retention date)], [d1, d1+2], [d2, d2+1], [d2, d2+3]]

retention_range_sum:

ParameterDescription
retention_range_count_resultThe 2-dimensional array output from retention_range_count

Example

The following example computes day-1 and day-2 retention for users who logged in on May 1–2, 2022, based on activity from May 1–4, 2022. The activation event is login and the retention event is pay.

Step 1. Create a test table and insert data.

CREATE TABLE event(uid string, event string, ds date);
INSERT INTO event VALUES
  ("user1", "pay",   "2022-05-01"),
  ("user1", "login", "2022-05-01"),
  ("user1", "pay",   "2022-05-02"),
  ("user1", "login", "2022-05-02"),
  ("user2", "login", "2022-05-01"),
  ("user3", "login", "2022-05-02"),
  ("user3", "pay",   "2022-05-03"),
  ("user3", "pay",   "2022-05-04");

Sample data:

+-------+-------+------------+
| uid   | event | ds         |
+-------+-------+------------+
| user1 | login | 2022-05-01 |
| user1 | pay   | 2022-05-01 |
| user1 | login | 2022-05-02 |
| user1 | pay   | 2022-05-02 |
| user2 | login | 2022-05-01 |
| user3 | login | 2022-05-02 |
| user3 | pay   | 2022-05-03 |
| user3 | pay   | 2022-05-04 |
+-------+-------+------------+

Step 2. Compute per-user retention status.

SELECT
  uid,
  r
FROM (
  SELECT
    uid,
    retention_range_count(
      event = 'login',   -- Activation event
      event = 'pay',     -- Retention event
      ds,
      array(1, 2)        -- Track day-1 and day-2 retention
    ) AS r
  FROM event
  GROUP BY uid
) AS t
ORDER BY uid;

Sample result:

+-------+-----------------------------+
| uid   | r                           |
+-------+-----------------------------+
| user1 | [[738642,0,0],[738641,1,0]] |
| user2 | [[738641,0,0]]              |
| user3 | [[738642,1,1]]              |
+-------+-----------------------------+

Each inner array contains [start_date_as_days, day1_retained, day2_retained]. For example, user3 logged in on May 2 (day number 738642) and made a payment on both May 3 (day 1) and May 4 (day 2).

Step 3. Aggregate to get daily retention rates across all users.

SELECT
  from_days(u[1]) AS ds,
  u[3] / u[2] AS retention_d1,
  u[4] / u[2] AS retention_d2
FROM (
  SELECT retention_range_sum(r) AS r
  FROM (
    SELECT
      uid,
      retention_range_count(
        event = 'login',
        event = 'pay',
        ds,
        array(1, 2)
      ) AS r
    FROM event
    GROUP BY uid
  ) AS t
  ORDER BY uid
) AS r,
unnest(r.r) AS t(u);

Sample result:

+------------+--------------+--------------+
| ds         | retention_d1 | retention_d2 |
+------------+--------------+--------------+
| 2022-05-02 |          0.5 |          0.5 |
| 2022-05-01 |          0.5 |          0.0 |
+------------+--------------+--------------+

On May 1, two users logged in (user1 and user2). Only user1 paid on May 2 (day 1 after activation), so day-1 retention for May 1 is 0.5. Neither user paid on May 3 (day 2 after activation), so day-2 retention for May 1 is 0.0. On May 2, two users logged in (user1 and user3). Both paid within the next two days, so both day-1 and day-2 retention are 0.5 for users who activated on May 2.

What's next