Demystifying Second-level Funnel Analysis Function of AnalyticDB for MySQL

By Yu Li (Hengjiang)

Business Challenges

Insight analysis, intelligent user identification, and business reporting are crucial scenarios that require the use of OLAP analytic databases. AnalyticDB for MySQL, a cloud-native data warehouse, has been successfully implemented in various marketing scenarios for platforms like Taobao, Ele.me, Cainiao, Youku, and Hema. Through a series of articles, we will explore the implementation and application of AnalyticDB for MySQL in marketing data products. This article focuses on the implementation and application of Funnel Analysis.

Marketing professionals aim to optimize the user conversion path and regain lost customers by taking specific measures and conducting experiments at different stages of AARRR (Acquisition、Activation, Retention, Revenue and Referral). To achieve this, accurate calculation of user behavioral data at each stage is essential, referred to as funnel transformation. Additionally, high-performance data products are crucial, as no one likes a constantly loading frontend UI. Lack of ad hoc interaction capabilities in data products often leads to complaints and dissatisfaction.

Technical Challenges

In the past, database products focused primarily on providing standard SQL semantics without considering the implementation of specific business scenarios. Let's explore how to implement a funnel function using standard SQL semantics. Suppose we have a dataset of user behavior that includes comprehensive information about the user, their actions, and the location where these actions occur. The user behavior data table, named "user_behavior," is structured as follows:

There are four types of user behaviors:

When developing a data product, we often encounter two types of funnel requirements. The first is to display data on the homepage of the data report for decision-makers, focusing only on the statistics of each event. This implementation is relatively simple.

select
  event_type,
  count(distinct uid)
from
  user_behavior
where
  item_id = 3838928
  and ts >= 1511540732
  and ts <= 1512312625
group by
event_type

However, this type of funnel can only provide coarse-grained statistical information and cannot analyze the causal relationship and behavior path before and after the event.

For example, the path for a user to purchase a product may involve clicking on the product (pv), adding it to the shopping cart (cart), and finally making a purchase (buy). It could also involve clicking on the product (pv), adding it to favorites (fav), adding it to the shopping cart (cart), and then making a purchase (buy). While we can calculate the number of independent users for each event, we cannot express the information about how many users make direct purchases or add products to their favorites before buying.

To address this, we need to analyze each user's event sequence, such as pv, pv, fav, cart, buy, pv, pv, buy, pv, cart, buy, and determine if they meet specific continuous event sequences like "pv, fav, cart, buy." In practical business scenarios, it is more common to require satisfaction of discontinuous subsequences like pv...fav...cart..buy, as shown in the figure below.

Unfortunately, aggregate functions that provide subsequence matching functionality are typically not supported in database products. Based on the author's understanding, a possible solution is to use string matching functions. This involves converting each user's event sequence into a string and then matching each string with the target sequence string. The SQL pseudocode for this implementation is as follows:

/*Transform the data that meets the conditions of the target event into an event flag, such as e1 , e2 , and e3.*/
with t1 as (
  select
    uid,
    ts,
    case event_type when "pv" then "e1" 
    when "fav" then "e2" 
    when "cart" then "e3" 
    when "buy" then "e4" 
    else "ex" end as event_code
  from
    user_behavior
)
/*Count the number of users at each level.*/
select
  level,
  count(distinct uid)
from
  (
    select
      uid,
      /*Calculate the maximum number of subsequences for each user.*/
      case
      when event_lst like "%e1%e2%e3%e4%" then "level_4"
      when event_lst like "%e1%e2%e3%" then "level_3"
      when event_lst like "%e1%e2%" then "level_2"
      when event_lst like "%e1%" then "level_1"
      else "level_0" end as level
    from
      (
        /*Aggregate user events into an event sequence.*/
        SELECT
          uid,
          GROUP_CONCAT(
            event_code
            order by
              ts asc
          ) as event_lst
        from
          t1
        group by
          uid
      )
  )
  group by level

There are several performance bottlenecks in the above implementation:

GROUP_CONCAT(event_code order by event_time asc) is a group aggregate function. In real business scenarios, there are interference data (for example, click farm will lead to many abnormal events). As a result, the GROUP_CONCAT() has heavy computational burden. These interference data will also increase the computational burden in the subsequent steps.
String fuzzy matching case event_lst like "%e1%e2%e3%e4%" then "level_4". Note that this step usually consumes a large amount of CPU resources. When the matching level is more than 5, the query performance will be greatly affected.
Operations such as type conversion, sorting, and grouping during the entire computing process also greatly reduce the execution efficiency.

In addition, it can be seen that although this implementation works, the SQL is extremely complex and has not been combined with other user attributes, such as the associated queries of user tag tables, so the expansion capability is limited.

Optimization Method for AnalyticDB for MySQL

To address the pain points in the previous funnel scenarios, AnalyticDB for MySQL has introduced the window_funnel function, which is defined as follows:

Function description: The window_funnel function allows you to query events from a sliding time window. It calculates the maximum length of the event list that matches the specified conditions. The function starts matching from the first event in the list and continues the longest ordered matching process. If a match fails, the entire matching process ends. Assuming a large enough time window, the following examples illustrate the usage of this function:

• If query events are c1, c2, and c3, and the user data consists of c1, c2, c3, and c4, the function will match events c1, c2, and c3. Therefore, the function will return 3.

• If query events are c1, c2, and c3, and the user data consists of c4, c3, c2, and c1, the function will match event c1. Therefore, the function will return 1.

• If query events are c1, c2, and c3, and the user data consists of c4 and c3, no events will be matched. Therefore, the function will return 0.

Syntax: window_funnel (window, mode, timestamp, cond1, cond2, ..., condN)

Parameter description

Based on the window_funnel function, we implement the funnel computational logic. The SQL statement is as follows:

select
  funnel_step,
  count(1)
from
  (
    /*Directly calculate the satisfying behavior sequence for each user.*/
    select
      uid,
      window_funnel(
        cast(86400000 /*Standard SQL implementation of semantic alignment is set to 1000 days, and the actual business can be flexibly configured according to the needs*/ as integer),
        "default",
        ts,
        event_type = 'pv',
        event_type = 'fav',
        event_type = 'cart',
        event_type = 'buy'
      ) as funnel_step
    from
      user_behavior
    group by
      uid
  )
group by
funnel_step;

As you can see, compared to standard SQL implementation:

AnalyticDB for MySQL uses the window_funnel function to encapsulate all computational logics into an aggregate function. This simplifies the SQL logic, reduces the complexity of business implementation, and facilitates code maintenance and expansion.
window_tunnel supports setting a sliding window to count users who satisfy specific behavior sequences within the defined time window. Users have the flexibility to adjust the window length. However, achieving the same semantics using standard SQL statements can be challenging.
In addition, the optimized implementation requires only one group by operation for the user_behavior table, eliminating time-consuming tasks such as grouping, sorting, aggregation, type conversion, and character matching. This significantly enhances computing performance. In the same instance, the comparison between the two implementations in terms of performance (execution time) is as follows:

Note:

The test data can be downloaded through https://tianchi.aliyun.com/dataset/649
The test results can be reproduced by purchasing instances through https://www.alibabacloud.com/product/analyticdb-for-mysql
Test instance specification: C series (high-performance edition), four groups of work 96 cores

Summary

This article discusses the scenario of insight analysis - funnel analysis and how to use the window_funnel function in AnalyticDB for MySQL to implement the funnel computation function. Based on the analysis above, we can observe that compared to the traditional SQL implementation, the window_funnel function can reduce the complexity of SQL queries, provide richer semantics for sliding windows, and offer better query performance. The query performance does not degrade as the funnel level deepens. In scenarios with deep funnel levels, there is a performance improvement of over 10 times, enabling end users to obtain analysis results without waiting.

Learn more: https://www.alibabacloud.com/help/analyticdb-for-mysql/latest/window-funnel-retention-function

Check out Free Tier & Special Offers of Alibaba Cloud Database Products: https://www.alibabacloud.com/product/databases

Community

Demystifying Second-level Funnel Analysis Function of AnalyticDB for MySQL

Business Challenges

Technical Challenges

Optimization Method for AnalyticDB for MySQL

Summary

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

AnalyticDB for MySQL

PolarDB for MySQL

Hologres

Personalized Content Recommendation Solution