All Products
Search
Document Center

AnalyticDB:Funnel analysis functions and retention functions

Last Updated:Jul 17, 2024

In data analysis, funnel analysis is an important tool for measuring and optimizing the conversion rates of users in a specific process. Retention analysis can be used to measure the continuous use of a product or service by users. Funnel analysis and retention analysis are critical indicators for assessing product health and user satisfaction. Funnel analysis functions and retention functions are designed to process and analyze data in such scenarios. You can use the functions to simplify the query process and improve efficiency. This topic describes how to use funnel analysis functions and retention functions in different scenarios.

Background information

An event represents a specific or series of meaningful behaviors of a user, such as downloading, registering with, and logging on to a game app. The actual use process of the user can be restored by analyzing various behavioral data of the user. This improves product conversion rates and promotes business growth. Common user behavior analysis includes event analysis, funnel analysis, and retention analysis. Funnel analysis and retention analysis are the most common types of user behavior analysis.

  • Funnel analysis: an analysis model that can be used to analyze the conversion status of users at various stages of a process. This model analyzes the conversion rate of user behavior at each stage to measure the corresponding performance. This way, you can take specific measures to optimize user experience on products to increase conversion rates.

  • Retention analysis: an analysis model that can be used to assess user engagement and stickiness. This model helps you calculate the number of users who remain engaged with your product after they initially sign up or begin to use your product. Then, you can assess the value of the product to users.

Funnel analysis functions

Definition

Funnel analysis is used to calculate the completion rate of users from the first step to the last step of a series of predefined steps. A funnel analysis function calculates the maximum number of consecutive events starting from the first event of an event chain within a sliding time window. The following rules apply to a funnel analysis function:

  • The function checks against the first step of the event chain. If the data contains an event that meets the condition of the first step, the function increments the counter by 1 and sets the corresponding event time as the start time of the sliding window. If no matching event is found in the data for the first step, the function returns 0.

  • Within the sliding window, the counter is incremented if the events in the event chain occur in order. Outside the sliding window, the counter is no longer incremented.

  • If multiple event chains meet the specified steps of the event chain, the function outputs the longest event chain.

Syntax

SELECT windowFunnel(time_frame, mode, time, ARRAY[Cond1,Cond2,...CondN]) 
FROM tbl 
GROUP BY uid;

Parameters

Parameter

Description

time_frame

The size of the sliding window, which must be of the BIGINT type. The default unit is seconds.

mode

The filtering mode of the event chain, which must be of the TEXT type. Valid values: default and strict.

The default value is default, which specifies that general funnel calculation is performed.

A value of strict specifies the deduplication mode. In this mode, the filtered event chains do not contain duplicate events. Assume that the array parameter is set to [event_type='A',event_type='B',event_type='C',event='D'] and an original event chain is "A-B-C-B-D". The filtered event chain is "A-B-C" because event B is duplicated.

time

The time column, which must be of the TIMESTAMP type.

array[Cond1,Cond2,...CondN]

The steps of the event chain, which must be of the ARRAY type.

Return value

The function returns the maximum number of consecutive events that meet the conditions of the specified steps in an event chain within a sliding time window. The return value is of the BIGINT type.

Example

Before you perform funnel analysis or retention analysis, make sure that the event data is accurate and complete. The following statements show a complete purchase event chain of users from browsing a web page to placing an order:

CREATE TABLE action
(
    uid        INT,
    event_type TEXT,
    time       TIMESTAMP
);

INSERT INTO action
VALUES (1, 'Browse', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (1, 'Click', '2020-01-02 11:10:00');
INSERT INTO action
VALUES (1, 'Order', '2020-01-02 11:20:00');
INSERT INTO action
VALUES (1, 'Pay', '2020-01-02 11:30:00');
INSERT INTO action
VALUES (1, 'Browse', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (1, 'Browse', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (2, 'Order', '2020-01-02 11:00:00');
INSERT INTO action
VALUES (2, 'Pay', '2020-01-02 11:10:00');
INSERT INTO action
VALUES (3, 'Browse', '2020-01-02 11:20:00');
INSERT INTO action
VALUES (3, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (4, 'Browse', '2020-01-02 11:50:00');
INSERT INTO action
VALUES (4, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (5, 'Browse', '2020-01-02 11:50:00');
INSERT INTO action
VALUES (5, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (5, 'Order', '2020-01-02 11:10:00');
INSERT INTO action
VALUES (6, 'Browse', '2020-01-02 11:50:00');
INSERT INTOaction
VALUES (6, 'Click', '2020-01-02 12:00:00');
INSERT INTO action
VALUES (6, 'Order', '2020-01-02 12:10:00');

You can use a funnel function to analyze the event engagement level reached by each user within the specified event window. In this example, the highest engagement level that indicates completing the entire process is set to 4. The following statement queries the event engagement levels of each user.

Query user events

SELECT uid,
       windowFunnel(1800, 'default',
                    TIME,
                    ARRAY [event_type = 'Browse', event_type = 'Click',
                           event_type = 'Order', event_type = 'Pay']) AS level
FROM action
GROUP BY uid
ORDER BY uid

Sample result

 uid | level
-----+-------
   1 |     4
   2 |     0
   3 |     1
   4 |     2
   5 |     2
   6 |     3
(6 ROWS)

Retention functions

Definition

Retention functions are used to calculate the retention status of users during a period of time. You can specify 1 to 31 conditions for the following retention function. The function assesses whether an event meets each condition, starting from the first condition. The function returns 1 for each condition that is met and 0 for each condition that is not met. As a result, an array of 0s and 1s is returned. You can count the entries that contain a value of 1 in the result to calculate the user retention rate.

Syntax

SELECT retention(ARRAY[Cond1,...CondN]) FROM tbl;

Parameters

Parameter

Description

array[Conds]

The conditions, which must be of the ARRAY type. You can specify up to 31 conditions. Separate multiple conditions with commas (,).

Return value

The function returns an array of 0s and 1s. The sum of 0s and 1s in the array is the same as the number of specified conditions. The function sequentially assesses each condition, starting from the first condition.

  • If an event meets the current condition, the function returns 1 and continues assessment.

  • If an event does not meet the current condition, the function returns 0 for the current condition and all subsequent conditions.

Example

Before you perform funnel analysis or retention analysis, make sure that the event data is accurate and complete. The following statements show a dataset for analyzing user purchase events:

CREATE TABLE retention_test
(
    id     INTEGER,
    action text,
    time   TIMESTAMP
);

INSERT INTO retention_test
VALUES (1, 'pv', '2022-01-01 08:00:05'),
       (2, 'pv', '2022-01-01 10:20:08'),
       (1, 'buy', '2022-01-02 15:30:10'),
       (2, 'pv', '2022-01-02 17:30:05'),
       (3, 'buy', '2022-01-01 05:30:09'),
       (3, 'buy', '22022-01-02 08:10:15'),
       (4, 'pv', '2022-01-02 21:09:15'),
       (5, 'pv', '2022-01-01 22:10:53'),
       (5, 'pv', '2022-01-02 19:10:52'),
       (5, 'buy', '2022-01-02 20:00:50');

The following statement queries the details of users who visited the product page on January 1, 2022 and made a purchase on the following day.

Query user purchase details

SELECT id,
       retention(ARRAY [action = 'pv' AND time::date = '2022-01-01', action = 'buy' AND time::date = '2022-01-02']) AS retention
FROM retention_test
GROUP BY id
ORDER BY id;

Sample result

 id | retention
----+-----------
  1 | {1,1}
  2 | {1,0}
  3 | {0,0}
  4 | {0,0}
  5 | {1,1}
(5 ROWS)

The following statement queries the statistics of user purchase events.

Query user event statistics

SELECT SUM(r[1]) AS day1, SUM(r[2]) AS day2
FROM (SELECT id,
             retention(ARRAY [action = 'pv' AND time::date = '2022-01-01',
             action = 'buy' AND time::date = '2022-01-02']) AS r
      FROM retention_test
      GROUP BY id
      ORDER BY id) t;

Sample result

 day1 | day2
------+-----
   3  |   2
(1 ROW)

Extended retention functions

Definition

Retention analysis is the most common and typical user growth analysis scenario. In most cases, you need to use charts to analyze user retention status. To meet the preceding business requirements, AnalyticDB for PostgreSQL provides the range_retention_count() and range_retention_sum() functions.

  • range_retention_count(): records the retention status of each user. The return value of this function is an array of the BIGINT type and can be used as the input value of the range_retention_sum() function.

  • range_retention_sum(): calculates the retention rate of all users on each day. The return value of this function is an array of the TEXT type.

Syntax

  • range_retention_count()

    range_retention_count(is_first,  
                          is_active, 
                          dt, 
                          retention_intervals, 
                          retention_granularity)
  • range_retention_sum()

    range_retention_sum(range_retention_count)

Parameters

Parameter

Type

Description

is_first

boolean

Specifies whether the first event is matched. Valid values:

  • true

  • false

is_active

boolean

Specifies whether the retention condition is met. Valid values:

  • true

  • false

dt

date

The date when the user behavior occurred. Example: 2020-12-12.

retention_interval

int[]

The retention interval. Up to 15 retention intervals are supported.

Example: array[1,3,5,7,15,30].

retention_granularity

text

The retention granularity. Set the value to day.

Example of using the range_retention_count() function

The following statements query the retention status of each user at the following times: t, t+1, and t+2.

Prepare a dataset

CREATE TABLE event_tbl(uid INT, event TEXT, dt DATE);
INSERT INTO event_tbl VALUES
                        (1, 'pay', '2022-05-01'),(1, 'login', '2022-05-01'),
                        (1, 'pay', '2022-05-02'),(1, 'login', '2022-05-02'),
                        (2, 'login', '2022-05-01'),
                        (3, 'login', '2022-05-02'), (3, 'pay', '2022-05-03'),
                        (3, 'pay', '2022-05-04');

Query user retention details

SELECT
    uid, r
FROM
    (
        SELECT
            uid,
            range_retention_count(event = 'login', event = 'pay', dt, 
            ARRAY [1, 2], 'day') AS r
        FROM
            event_tbl
        GROUP BY
            uid
    ) AS t
ORDER BY
    uid;

Sample result

 uid |  r
-----+-----------------------------
   1 | {"{8156,1,0}","{8157,0,0}"}
   2 | {"{8156,0,0}"}
   3 | {"{8157,1,1}"}
(3 ROWS)

Convert a string into a date

SELECT to_date(8156)
to_date
------------
2022-05-01

Example of using the range_retention_sum() function

In this example, the range_retention_sum() function uses the results of the range_retention_count() function as the input value.

Calculate the retention rate on each day

WITH retention_count_info AS (
    SELECT
        uid,
        range_retention_count(event = 'login', event = 'pay', 
                               dt, array[1, 2], 'day') AS info
    FROM
        event_tbl
    GROUP BY
        uid
), retention_sum AS (
         SELECT regexp_split_to_array(unnest(range_retention_sum(info)), ',') AS s
         FROM retention_count_info
     ) SELECT to_date(s[1]::int) AS login_date,
             s[3]::numeric / s[2]::numeric AS retention_d1,
             s[4]::numeric / s[2]::numeric AS retention_d2
FROM retention_sum
ORDER BY login_date;

Sample result

 login_date | retention_d1           | retention_d2
------------+------------------------+------------------------
 2022-05-01 | 0.50000000000000000000 | 0.00000000000000000000
 2022-05-02 | 0.50000000000000000000 | 0.50000000000000000000
(2 ROWS)