AnalyticDB for MySQL provides four built-in functions for funnel and retention analysis: window_funnel, retention, retention_range_count, and retention_range_sum. Use these functions to measure conversion rates across user journey steps and track how many users return over time.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL cluster running minor version 3.1.6.0 or later
To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.
Background
Funnel analysis measures how users move through a defined sequence of steps — for example, from browsing a product to completing a purchase. At each step, some users drop off; the conversion rate shows how many make it through. This technique is widely used in traffic analysis and product goal conversion analysis.
AnalyticDB for MySQL supports the following functions:
| Function | Purpose |
|---|---|
window_funnel | Counts how far a user progressed through a defined event sequence within a sliding time window |
retention | Checks whether each user satisfies a set of date-based conditions, returning a binary array |
retention_range_count | Records per-user retention status across specified intervals, returning a 2-dimensional array |
retention_range_sum | Aggregates retention_range_count output across all users to compute daily retention rates |
Test dataset
The examples in this topic use User Behavior Data from Taobao for Recommendation from Tianchi Lab. The dataset contains four behavior types:
| Behavior | Description |
|---|---|
pv | Product page view (counted as one click) |
buy | Purchase |
cart | Add to shopping cart |
fav | Add to favorites |
To load the dataset into AnalyticDB for MySQL, upload it to Object Storage Service (OSS) and then import it using an OSS external table.
Upload the dataset to OSS. For more information, see Upload objects.
Create an OSS external table.
CREATE TABLE `user_behavior_oss` ( `user_id` string, `item_id` string, `cate_id` string, `event` string, `ts` bigint ) ENGINE = 'oss' TABLE_PROPERTIES = '{ "endpoint":"oss-cn-zhangjiakou.aliyuncs.com", "accessid":"******", "accesskey":"*******", "url":"oss://<bucket-name>/user_behavior/", "delimiter":"," }'For more information, see OSS external table syntax.
Create a test table in AnalyticDB for MySQL.
CREATE TABLE user_behavior( uid string, event string, ts string )Import data from the OSS external table.
SUBMIT JOB INSERT OVERWRITE user_behavior SELECT user_id, event, ts FROM user_behavior_oss;
window_funnel
window_funnel searches a user's event history for a specified sequence of events within a sliding time window. It returns the length of the longest matching prefix of that sequence.
How it works
The function processes each user's events in chronological order:
It scans events to find the first occurrence of the initial condition in your sequence. When found, the sliding window starts.
It then looks for each subsequent condition in order within the window. Each match advances the counter by one.
If the sequence is interrupted or the window expires before all conditions match, the counter stops. The function returns the highest counter value reached.
Examples:
Event list
[c1, c2, c3], user data[c1, c2, c3, c4]→ returns3(full sequence matched)Event list
[c1, c2, c3], user data[c4, c3, c2, c1]→ returns1(only c1 matched from start)Event list
[c1, c2, c3], user data[c4, c3]→ returns0(first condition never matched)
Syntax
window_funnel(window, mode, timestamp, cond1, cond2, ..., condN)Parameters
| Parameter | Type | Description |
|---|---|---|
window | Integer | Size of the sliding time window, in the same unit as the timestamp column |
mode | String | Working mode. Set to "default" |
timestamp | BIGINT | Timestamp column. Must be of the BIGINT data type. If your timestamp column is not BIGINT, use TIMESTAMPDIFF to convert it. For example: TIMESTAMPDIFF('second', '2017-11-25 00:00:00.000', ts) |
cond1, ..., condN | Boolean | Event conditions that define the funnel steps, evaluated in order |
Example
The following example analyzes the conversion path — browse → add to favorites → add to cart → purchase — from 2017-11-25 00:00:00 to 2017-11-26 00:00:00. The sliding time window is 30 minutes (1,800 seconds), and timestamps are in Unix format (1511539200 to 1511625600).
The query runs in two stages: the inner query computes a funnel depth per user, and the outer query counts how many users reached each depth.
SELECT
funnel,
count(1)
FROM (
SELECT
uid,
window_funnel(
cast(1800 as integer), -- 30-minute sliding window
"default",
ts,
event = 'pv', -- Step 1: Browse
event = 'fav', -- Step 2: Add to favorites
event = 'cart', -- Step 3: Add to cart
event = 'buy' -- Step 4: Purchase
) AS funnel
FROM user_behavior
WHERE ts > 1511539200
AND ts < 1511625600
GROUP BY uid
)
GROUP BY funnel;Sample result:
+--------+----------+
| funnel | count(1) |
+--------+----------+
| 0 | 19687 |
| 1 | 596104 |
| 2 | 78458 |
| 3 | 11640 |
| 4 | 746 |
+--------+----------+
5 rows in set (0.64 sec)A value of 4 means the user completed all four steps. A value of 1 means only the first step (browse) was matched.
retention
retention checks whether a user satisfies a set of conditions — typically one condition per day — and returns a UINT8 array where each element is 1 (condition met) or 0 (condition not met).
Syntax
retention(cond1, cond2, ..., cond32)Parameters
| Parameter | Type | Description |
|---|---|---|
cond1, ..., cond32 | UINT8 | Conditions to evaluate, up to 32. Returns 1 if the condition is met, 0 otherwise |
Example
The following example measures 7-day retention starting November 25, 2017. sum(r[1]) counts users who were active on day 1; sum(r[2]) through sum(r[7]) count how many of those returned on each subsequent day.
Stage 1 — compute per-user retention arrays:
SELECT
uid,
retention(
ds = '2017-11-25' AND event = 'pv', -- Day 1: active (baseline)
ds = '2017-11-25', -- Day 1 retention check
ds = '2017-11-26', -- Day 2
ds = '2017-11-27', -- Day 3
ds = '2017-11-28', -- Day 4
ds = '2017-11-29', -- Day 5
ds = '2017-11-30' -- Day 6
) AS r
FROM user_behavior_date
GROUP BY uidStage 2 — aggregate across all users:
SELECT
sum(r[1]), -- Active users on day 1
sum(r[2]), -- Retained on day 1
sum(r[3]), -- Retained on day 2
sum(r[4]), -- Retained on day 3
sum(r[5]), -- Retained on day 4
sum(r[6]), -- Retained on day 5
sum(r[7]) -- Retained on day 6
FROM (
SELECT
retention(
ds = '2017-11-25' AND event = 'pv',
ds = '2017-11-25',
ds = '2017-11-26',
ds = '2017-11-27',
ds = '2017-11-28',
ds = '2017-11-29',
ds = '2017-11-30'
) AS r
FROM user_behavior_date
GROUP BY uid
);Sample result:
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| sum(r[1]) | sum(r[2]) | sum(r[3]) | sum(r[4]) | sum(r[5]) | sum(r[6]) | sum(r[7]) |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| 686953 | 686953 | 544367 | 529979 | 523516 | 524530 | 528105 |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
1 row in set (2.96 sec)retention_range_count and retention_range_sum
retention_range_count and retention_range_sum are designed for user growth analysis. They support flexible retention intervals and produce output suitable for visualization.
retention_range_countcomputes per-user retention status across specified day intervals, returning a 2-dimensional array.retention_range_sumtakes the output ofretention_range_countand aggregates it across all users to produce daily retention rates.
Syntax
-- Per-user retention
retention_range_count(is_first, is_active, dt, intervals, outputFormat)
-- Aggregate across users
retention_range_sum(retention_range_count_result)Parameters
retention_range_count:
| Parameter | Type | Description |
|---|---|---|
is_first | Boolean | Whether this row matches the activation event. true = this is the initial behavior |
is_active | Boolean | Whether this row matches the retention event. true = this counts as a retained behavior |
dt | Date | Date of the event, in date format (for example, 2022-05-01) |
intervals[] | Array | Retention intervals to track (for example, array(1, 2) for day-1 and day-2 retention). Up to 15 intervals are supported |
outputFormat | String | Format of the return value. Valid values: normal (default) or expand |
outputFormat values:
| Value | Output format |
|---|---|
normal (default) | [[d1(start date), 1, 0, ...], [d2(start date), 1, 0, ...], ...] — each row is [start date, interval_1_flag, interval_2_flag, ...], where 1 = retained and 0 = not retained |
expand | [[d1(start date), d1+1(retention date)], [d1, d1+2], [d2, d2+1], [d2, d2+3]] |
retention_range_sum:
| Parameter | Description |
|---|---|
retention_range_count_result | The 2-dimensional array output from retention_range_count |
Example
The following example computes day-1 and day-2 retention for users who logged in on May 1–2, 2022, based on activity from May 1–4, 2022. The activation event is login and the retention event is pay.
Step 1. Create a test table and insert data.
CREATE TABLE event(uid string, event string, ds date);INSERT INTO event VALUES
("user1", "pay", "2022-05-01"),
("user1", "login", "2022-05-01"),
("user1", "pay", "2022-05-02"),
("user1", "login", "2022-05-02"),
("user2", "login", "2022-05-01"),
("user3", "login", "2022-05-02"),
("user3", "pay", "2022-05-03"),
("user3", "pay", "2022-05-04");Sample data:
+-------+-------+------------+
| uid | event | ds |
+-------+-------+------------+
| user1 | login | 2022-05-01 |
| user1 | pay | 2022-05-01 |
| user1 | login | 2022-05-02 |
| user1 | pay | 2022-05-02 |
| user2 | login | 2022-05-01 |
| user3 | login | 2022-05-02 |
| user3 | pay | 2022-05-03 |
| user3 | pay | 2022-05-04 |
+-------+-------+------------+Step 2. Compute per-user retention status.
SELECT
uid,
r
FROM (
SELECT
uid,
retention_range_count(
event = 'login', -- Activation event
event = 'pay', -- Retention event
ds,
array(1, 2) -- Track day-1 and day-2 retention
) AS r
FROM event
GROUP BY uid
) AS t
ORDER BY uid;Sample result:
+-------+-----------------------------+
| uid | r |
+-------+-----------------------------+
| user1 | [[738642,0,0],[738641,1,0]] |
| user2 | [[738641,0,0]] |
| user3 | [[738642,1,1]] |
+-------+-----------------------------+Each inner array contains [start_date_as_days, day1_retained, day2_retained]. For example, user3 logged in on May 2 (day number 738642) and made a payment on both May 3 (day 1) and May 4 (day 2).
Step 3. Aggregate to get daily retention rates across all users.
SELECT
from_days(u[1]) AS ds,
u[3] / u[2] AS retention_d1,
u[4] / u[2] AS retention_d2
FROM (
SELECT retention_range_sum(r) AS r
FROM (
SELECT
uid,
retention_range_count(
event = 'login',
event = 'pay',
ds,
array(1, 2)
) AS r
FROM event
GROUP BY uid
) AS t
ORDER BY uid
) AS r,
unnest(r.r) AS t(u);Sample result:
+------------+--------------+--------------+
| ds | retention_d1 | retention_d2 |
+------------+--------------+--------------+
| 2022-05-02 | 0.5 | 0.5 |
| 2022-05-01 | 0.5 | 0.0 |
+------------+--------------+--------------+On May 1, two users logged in (user1 and user2). Only user1 paid on May 2 (day 1 after activation), so day-1 retention for May 1 is 0.5. Neither user paid on May 3 (day 2 after activation), so day-2 retention for May 1 is 0.0. On May 2, two users logged in (user1 and user3). Both paid within the next two days, so both day-1 and day-2 retention are 0.5 for users who activated on May 2.