In data analysis, funnel analysis is an important tool for measuring and optimizing the conversion rates of users in a specific process. Retention analysis can be used to measure the continuous use of a product or service by users. Funnel analysis and retention analysis are critical indicators for assessing product health and user satisfaction. Funnel analysis functions and retention functions are designed to process and analyze data in such scenarios. You can use the functions to simplify the query process and improve efficiency. This topic describes how to use funnel analysis functions and retention functions in different scenarios.
Background information
An event represents a specific or series of meaningful behaviors of a user, such as downloading, registering with, and logging on to a game app. The actual use process of the user can be restored by analyzing various behavioral data of the user. This improves product conversion rates and promotes business growth. Common user behavior analysis includes event analysis, funnel analysis, and retention analysis. Funnel analysis and retention analysis are the most common types of user behavior analysis.
Funnel analysis: an analysis model that can be used to analyze the conversion status of users at various stages of a process. This model analyzes the conversion rate of user behavior at each stage to measure the corresponding performance. This way, you can take specific measures to optimize user experience on products to increase conversion rates.
Retention analysis: an analysis model that can be used to assess user engagement and stickiness. This model helps you calculate the number of users who remain engaged with your product after they initially sign up or begin to use your product. Then, you can assess the value of the product to users.
Funnel analysis functions
Definition
Funnel analysis is used to calculate the completion rate of users from the first step to the last step of a series of predefined steps. A funnel analysis function calculates the maximum number of consecutive events starting from the first event of an event chain within a sliding time window. The following rules apply to a funnel analysis function:
The function checks against the first step of the event chain. If the data contains an event that meets the condition of the first step, the function increments the counter by 1 and sets the corresponding event time as the start time of the sliding window. If no matching event is found in the data for the first step, the function returns 0.
Within the sliding window, the counter is incremented if the events in the event chain occur in order. Outside the sliding window, the counter is no longer incremented.
If multiple event chains meet the specified steps of the event chain, the function outputs the longest event chain.
Syntax
SELECT windowFunnel(time_frame, mode, time, ARRAY[Cond1,Cond2,...CondN])
FROM tbl
GROUP BY uid;Parameters
Parameter | Description |
time_frame | The size of the sliding window, which must be of the BIGINT type. The default unit is seconds. |
mode | The filtering mode of the event chain, which must be of the TEXT type. Valid values: default and strict. The default value is default, which specifies that general funnel calculation is performed. A value of strict specifies the deduplication mode. In this mode, the filtered event chains do not contain duplicate events. Assume that the array parameter is set to [event_type='A',event_type='B',event_type='C',event='D'] and an original event chain is "A-B-C-B-D". The filtered event chain is "A-B-C" because event B is duplicated. |
time | The time column, which must be of the TIMESTAMP type. |
array[Cond1,Cond2,...CondN] | The steps of the event chain, which must be of the ARRAY type. |
Return value
The function returns the maximum number of consecutive events that meet the conditions of the specified steps in an event chain within a sliding time window. The return value is of the BIGINT type.
Example
Before you perform funnel analysis or retention analysis, make sure that the event data is accurate and complete. The following statements show a complete purchase event chain of users from browsing a web page to placing an order:
CREATE TABLE action
(
uid INT,
event_type TEXT,
time TIMESTAMP
);
INSERT INTO action
VALUES (1, 'Browse', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (1, 'Click', '2020-01-02 11:10:00');
INSERT INTO action
VALUES (1, 'Order', '2020-01-02 11:20:00');
INSERT INTO action
VALUES (1, 'Pay', '2020-01-02 11:30:00');
INSERT INTO action
VALUES (1, 'Browse', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (1, 'Browse', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (2, 'Order', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (2, 'Pay', '2020-01-02 11:10:00');
INSERT INTO action
VALUES (3, 'Browse', '2020-01-02 11:20:00');
INSERT INTO action
VALUES (3, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (4, 'Browse', '2020-01-02 11:50:00');
INSERT INTO action
VALUES (4, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (5, 'Browse', '2020-01-02 11:50:00');
INSERT INTO action
VALUES (5, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (5, 'Order', '2020-01-02 11:10:00');
INSERT INTO action
VALUES (6, 'Browse', '2020-01-02 11:50:00');
INSERT INTOaction
VALUES (6, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (6, 'Order', '2020-01-02 12:10:00');You can use a funnel function to analyze the event engagement level reached by each user within the specified event window. In this example, the highest engagement level that indicates completing the entire process is set to 4. The following statement queries the event engagement levels of each user.
Query user events
SELECT uid,
windowFunnel(1800, 'default',
TIME,
ARRAY [event_type = 'Browse', event_type = 'Click',
event_type = 'Order', event_type = 'Pay']) AS level
FROM action
GROUP BY uid
ORDER BY uidSample result
uid | level
-----+-------
1 | 4
2 | 0
3 | 1
4 | 2
5 | 2
6 | 3
(6 ROWS)Retention functions
Definition
Retention functions are used to calculate the retention status of users during a period of time. You can specify 1 to 31 conditions for the following retention function. The function assesses whether an event meets each condition, starting from the first condition. The function returns 1 for each condition that is met and 0 for each condition that is not met. As a result, an array of 0s and 1s is returned. You can count the entries that contain a value of 1 in the result to calculate the user retention rate.
Syntax
SELECT retention(ARRAY[Cond1,...CondN]) FROM tbl;Parameters
Parameter | Description |
array[Conds] | The conditions, which must be of the ARRAY type. You can specify up to 31 conditions. Separate multiple conditions with commas (,). |
Return value
The function returns an array of 0s and 1s. The sum of 0s and 1s in the array is the same as the number of specified conditions. The function sequentially assesses each condition, starting from the first condition.
If an event meets the current condition, the function returns 1 and continues assessment.
If an event does not meet the current condition, the function returns 0 for the current condition and all subsequent conditions.
Example
Before you perform funnel analysis or retention analysis, make sure that the event data is accurate and complete. The following statements show a dataset for analyzing user purchase events:
CREATE TABLE retention_test
(
id INTEGER,
action text,
time TIMESTAMP
);
INSERT INTO retention_test
VALUES (1, 'pv', '2022-01-01 08:00:05'),
(2, 'pv', '2022-01-01 10:20:08'),
(1, 'buy', '2022-01-02 15:30:10'),
(2, 'pv', '2022-01-02 17:30:05'),
(3, 'buy', '2022-01-01 05:30:09'),
(3, 'buy', '22022-01-02 08:10:15'),
(4, 'pv', '2022-01-02 21:09:15'),
(5, 'pv', '2022-01-01 22:10:53'),
(5, 'pv', '2022-01-02 19:10:52'),
(5, 'buy', '2022-01-02 20:00:50');The following statement queries the details of users who visited the product page on January 1, 2022 and made a purchase on the following day.
Query user purchase details
SELECT id,
retention(ARRAY [action = 'pv' AND time::date = '2022-01-01', action = 'buy' AND time::date = '2022-01-02']) AS retention
FROM retention_test
GROUP BY id
ORDER BY id;
Sample result
id | retention
----+-----------
1 | {1,1}
2 | {1,0}
3 | {0,0}
4 | {0,0}
5 | {1,1}
(5 ROWS)The following statement queries the statistics of user purchase events.
Query user event statistics
SELECT SUM(r[1]) AS day1, SUM(r[2]) AS day2
FROM (SELECT id,
retention(ARRAY [action = 'pv' AND time::date = '2022-01-01',
action = 'buy' AND time::date = '2022-01-02']) AS r
FROM retention_test
GROUP BY id
ORDER BY id) t;Sample result
day1 | day2
------+-----
3 | 2
(1 ROW)Extended retention functions
Definition
Retention analysis is the most common and typical user growth analysis scenario. In most cases, you need to use charts to analyze user retention status. To meet the preceding business requirements, AnalyticDB for PostgreSQL provides the range_retention_count() and range_retention_sum() functions.
range_retention_count(): records the retention status of each user. The return value of this function is an array of the BIGINT type and can be used as the input value of the range_retention_sum() function.
range_retention_sum(): calculates the retention rate of all users on each day. The return value of this function is an array of the TEXT type.
Syntax
range_retention_count()
range_retention_count(is_first, is_active, dt, retention_intervals, retention_granularity)range_retention_sum()
range_retention_sum(range_retention_count)
Parameters
Parameter | Type | Description |
is_first | boolean | Specifies whether the first event is matched. Valid values:
|
is_active | boolean | Specifies whether the retention condition is met. Valid values:
|
dt | date | The date when the user behavior occurred. Example: 2020-12-12. |
retention_interval | int[] | The retention interval. Up to 15 retention intervals are supported. Example: array[1,3,5,7,15,30]. |
retention_granularity | text | The retention granularity. Set the value to day. |
Example of using the range_retention_count() function
The following statements query the retention status of each user at the following times: t, t+1, and t+2.
Prepare a dataset
CREATE TABLE event_tbl(uid INT, event TEXT, dt DATE);
INSERT INTO event_tbl VALUES
(1, 'pay', '2022-05-01'),(1, 'login', '2022-05-01'),
(1, 'pay', '2022-05-02'),(1, 'login', '2022-05-02'),
(2, 'login', '2022-05-01'),
(3, 'login', '2022-05-02'), (3, 'pay', '2022-05-03'),
(3, 'pay', '2022-05-04');Query user retention details
SELECT
uid, r
FROM
(
SELECT
uid,
range_retention_count(event = 'login', event = 'pay', dt,
ARRAY [1, 2], 'day') AS r
FROM
event_tbl
GROUP BY
uid
) AS t
ORDER BY
uid;Sample result
uid | r
-----+-----------------------------
1 | {"{8156,1,0}","{8157,0,0}"}
2 | {"{8156,0,0}"}
3 | {"{8157,1,1}"}
(3 ROWS)Convert a string into a date
SELECT to_date(8156)
to_date
------------
2022-05-01Example of using the range_retention_sum() function
In this example, the range_retention_sum() function uses the results of the range_retention_count() function as the input value.
Calculate the retention rate on each day
WITH retention_count_info AS (
SELECT
uid,
range_retention_count(event = 'login', event = 'pay',
dt, array[1, 2], 'day') AS info
FROM
event_tbl
GROUP BY
uid
), retention_sum AS (
SELECT regexp_split_to_array(unnest(range_retention_sum(info)), ',') AS s
FROM retention_count_info
) SELECT to_date(s[1]::int) AS login_date,
s[3]::numeric / s[2]::numeric AS retention_d1,
s[4]::numeric / s[2]::numeric AS retention_d2
FROM retention_sum
ORDER BY login_date;Sample result
login_date | retention_d1 | retention_d2
------------+------------------------+------------------------
2022-05-01 | 0.50000000000000000000 | 0.00000000000000000000
2022-05-02 | 0.50000000000000000000 | 0.50000000000000000000
(2 ROWS)