All Products
Search
Document Center

AnalyticDB:Funnel and retention functions

Last Updated:Aug 11, 2025

This topic describes how to use funnel and retention functions to perform funnel analysis in AnalyticDB for MySQL.

Prerequisites

The minor version of an AnalyticDB for MySQL cluster is 3.1.6.0 or later.

Note

To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.

Background information

Funnel analysis is a common conversion analysis method that shows user behavior conversion rates at various stages. It is widely used in user behavior analysis and app data analytics for tasks such as traffic analysis and product goal conversion analysis. AnalyticDB for MySQL supports the following funnel and retention functions:

Test dataset

In this topic, the Taobao behavior data from Tianchi Lab is used to test funnel and retention functions. For more information, see User Behavior Data from Taobao for Recommendation.

The following table describes four types of user behavior that are provided in the test dataset.

Behavior type:

Description

pv

Each product page view is counted as a click.

buy

Purchases a commodity.

cart

Adds a commodity to the shopping cart.

fav

Adds a commodity to favorites.

Before testing, upload the test dataset to Object Storage Service (OSS) and then import the data from OSS into AnalyticDB for MySQL.

  1. Upload the test dataset to OSS. For more information, see Upload objects.

  2. Use an OSS external table to import the data into AnalyticDB for MySQL. The following example shows the required operations:

    1. Create an OSS external table.

      CREATE TABLE `user_behavior_oss` (
        `user_id` string,
        `item_id` string,
        `cate_id` string,
        `event` string,
         `ts` bigint) ENGINE = 'oss'
      TABLE_PROPERTIES = '{
      "endpoint":"oss-cn-zhangjiakou.aliyuncs.com",
      "accessid":"******",
      "accesskey":"*******",
      "url":"oss://<bucket-name>/user_behavior/",
      "delimiter":","
      }'

      For more information, see OSS external table syntax.

    2. Create an AnalyticDB for MySQL test table.

      CREATE TABLE user_behavior(
        uid string,
        event string,
        ts string
      )
    3. Import data from the OSS external table to the test table.

      SUBMIT JOB INSERT OVERWRITE user_behavior SELECT user_id , event, ts FROM user_behavior_oss;

window_funnel

The window_funnel function is used to search an event list in a sliding time window and count the maximum number of consecutive events. The window_funnel function starts the count from the first event in the event list that you specify, checks the events in sequence, and then returns the maximum number of consecutive events.

  • If you specify an event list that contains c1, c2, and c3 and the user data contains c1, c2, c3, and c4, the function returns 3.

  • If you specify an event list that contains c1, c2, and c3 and the user data contains c4, c3, c2, and c1, the function returns 1.

  • If you specify an event list that contains c1, c2, and c3 and the user data contains c4 and c3, the function returns 0.

Syntax

window_funnel(window, mode, timestamp, cond1, cond2, ..., condN)

Parameters

Parameter

Description

window

The size of the sliding time window.

mode

The working mode of the function. This parameter is an extensible parameter. Set the value to default.

timestamp

The time column, which must be of the BIGINT data type. If the time column is of another data type, you must convert it to the BIGINT type.

For example, for the TIMESTAMP data type, you must use the TIMESTAMPDIFF function to query the difference between the timestamp and the initial time. Sample statement:

select uid, window_funnel(cast(1800 as integer),"default", TIMESTAMPDIFF('second','2017-11-25 00:00:00.000',ts),event = 'pv',event = 'fav',event = 'cart',event = 'buy') as funnel from user_behavior where ts > '2017-11-25 00:00:00.000' and ts < '2017-11-26 00:00:000.000'

cond

The event steps.

Examples

For example, you want to analyze the conversion status from 2017-11-25 00:00:00 to 2017-11-26 00:00:00 based on the following conversion path: browse a commodity > add the commodity to favorites > add the commodity to the shopping cart > purchase the commodity. The related SQL statement involves the following conditions:

  • The size of the sliding time window is 30 minutes, which is equivalent to 1,800 seconds.

  • The query time is from 2017-11-25 00:00:00 to 2017-11-26 00:00:00, which is equivalent to 1511539200 to 1511625600 in the timestamp format.

  • The event steps are performed in the following order: browse a commodity > add the commodity to favorites > add the commodity to the shopping cart > purchase the commodity.

Execute the following SQL statement:

SELECT funnel, count(1) FROM (SELECT uid, window_funnel(cast(1800 as integer),"default", ts, event='pv',  event='fav', event='cart', event='buy') AS funnel FROM user_behavior WHERE ts > 1511539200 AND ts < 1511625600 group by uid) GROUP BY funnel;

Sample result:

+--------+----------+
| funnel | count(1) |
+--------+----------+
|      0 |    19687 |
|      2 |    78458 |
|      3 |    11640 |
|      1 |   596104 |
|      4 |      746 |
+--------+----------+
5 rows in set (0.64 sec)

retention

The retention function can use a set of conditions as parameters to analyze whether an event meets the conditions.

Syntax

retention(cond1, cond2, ..., cond32)

Parameters

Parameter

Description

cond

The analysis condition, which is of the UINT8 data type. You can specify 1 to 32 conditions.

If a condition is met, the return value is increased by 1. If a condition is not met, the return value is increased by 0.

Examples

For example, you want to analyze the user retention status as of November 25, 2017. The related SQL statement involves the following conditions:

  • The start date for analysis is 2017-11-25.

  • The number of active users on the first day is represented by sum(r[1]). The number of retained users from the second to the seventh day is represented by sum(r[2]), sum(r[3])...sum(r[7]).

Execute the following SQL statement:

SELECT sum(r[1]),sum(r[2]),sum(r[3]),sum(r[4]),sum(r[5]),sum(r[6]),sum(r[7]) FROM (SELECT retention(ds='2017-11-25' AND event='pv', ds='2017-11-25', ds='2017-11-26',ds='2017-11-27',ds='2017-11-28',ds='2017-11-29',ds='2017-11-30') AS r FROM user_behavior_date GROUP BY uid);

Sample result:

+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| sum(r[1]) | sum(r[2]) | sum(r[3]) | sum(r[4]) | sum(r[5]) | sum(r[6]) | sum(r[7]) |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
|    686953 |    686953 |    544367 |    529979 |    523516 |    524530 |    528105 |
+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
1 row in set (2.96 sec)

retention_range_count and retention_range_sum

The retention analysis functions (retention_range_count and retention_range_sum) can be used to generate visualized graphs and analyze user retention in user growth analysis scenarios. The retention_range_count function can be used to record the retention status of each user. The return value of this function is a 2-dimensional array and can serve as an input parameter of the retention_range_sum function. The retention_range_sum function can be used to summarize the daily retention rate of all users.

Syntax

  • retention_range_count

    retention_range_count(is_first, is_active, dt, intervals, outputFormat)
  • retention_range_sum

    retention_range_sum(retention_range_count_result)

Parameters

Parameter

Description

is_first

Specifies whether the first event is matched. Valid values:

  • true: Follows the initial behavior.

  • false

is_active

Specifies whether the retention condition is met. Valid values:

  • true: Complies with the retention behavior.

  • false: The behavior is not a subsequent retention behavior.

dt

The date on which the behavior occurred. The format is date, for example, 2022-05-01.

intervals[]

The retention interval. Up to 15 retention intervals are supported.

outputFormat

The format of the return value. Valid values:

  • If you set outputFormat to normal, the output is in the format [[d1(start date), 1, 0...], [d2(start date), 1, 0...], [d3(start date), 1, 0...]]. In this format, 1 indicates that a retention event exists, and 0 indicates that no retention event exists.

  • If you set outputFormat to expand, the output is in the format [[d1(start date),d1+1(retention date)], [d1, d1+2], [d2, d2+1], [d2, d2+3]].

The default value is normal.

Examples

For example, you want to query the retention status on May 1, 2022 and May 2, 2022 based on the user data from May 1, 2022 to May 4, 2022. The activation event is login, and the retention event is pay.

  1. Create a test table and insert data into the table.

    1. Create a table.

      CREATE TABLE event(uid string, event string, ds date);
    2. Insert data into the table.

      INSERT INTO event VALUES("user1", "pay", "2022-05-01"),("user1", "login", "2022-05-01"),("user1", "pay", "2022-05-02"),("user1", "login", "2022-05-02"),("user2", "login", "2022-05-01"),("user3", "login", "2022-05-02"),("user3", "pay", "2022-05-03"),("user3", "pay", "2022-05-04");

      Sample test data:

      +-------+-------+------------+
      | uid   | event | ds         |
      +-------+-------+------------+
      | user1 | login | 2022-05-01 |
      | user1 | pay   | 2022-05-01 |
      | user1 | login | 2022-05-02 |
      | user1 | pay   | 2022-05-02 |
      | user2 | login | 2022-05-01 |
      | user3 | login | 2022-05-02 |
      | user3 | pay   | 2022-05-03 |
      | user3 | pay   | 2022-05-04 |
      +-------+-------+------------+
  2. Query the retention status of each user.

    SELECT
      uid,
      r
    FROM
      (
        SELECT
          uid,
          retention_range_count(event = 'login', event = 'pay', ds, array(1, 2)) AS r
        FROM
          event
        GROUP BY
          uid
      ) AS t
    ORDER BY
      uid;

    Sample result:

    +-------+-----------------------------+
    | uid   | r                           |
    +-------+-----------------------------+
    | user1 | [[738642,0,0],[738641,1,0]] |
    | user2 | [[738641,0,0]]              |
    | user3 | [[738642,1,1]]              |
    +-------+-----------------------------+
  3. Query the daily retention rate of all users.

    SELECT
      from_days(u [1]) AS ds,
      u [3] / u [2] AS retention_d1,
      u [4] / u [2] AS retention_d2
    FROM
      (
        SELECT
          retention_range_sum(r) AS r
        FROM
          (
            SELECT
              uid,
              retention_range_count(event = 'login', event = 'pay', ds, array(1, 2)) AS r
            FROM
              event
            GROUP BY
              uid
          ) AS t
        ORDER BY
          uid
      ) AS r,
      unnest(r.r) AS t(u);

    Sample result:

    +------------+--------------+--------------+
    | ds         | retention_d1 | retention_d2 |
    +------------+--------------+--------------+
    | 2022-05-02 |          0.5 |          0.5 |
    | 2022-05-01 |          0.5 |          0.0 |
    +------------+--------------+--------------+