The funnel analysis feature of Data Lake Analytics (DLA) allows Q&M personnel to analyze the conversion rates at each step in a multi-step operation process. Assume that the purchase of a product triggers a series of events, including startup, logon, product search, product browsing, and order generation. O&M personnel analyze the conversion rates for these events in sequence within a period of time (for example, from January 5, 2017 to February 5, 2017). During the analysis, they calculate the following numbers: A, which refers to the total number of persons that trigger logon events among all users. B, which refers to the total number of persons that trigger product search events among A. C, which refers to the total number of persons that trigger product browsing events among B. D, which refers to the total number of persons that trigger order generation events among C.

Time window is a term in funnel analysis. All events are triggered within the same time window. Assume that the time window is one day. The product search events triggered by User 001 are valid only when they occur on the same day as logon events triggered by User 001. Similarly, the product browsing events triggered by User 001 are valid only when they occur on the same day as logon events triggered by User 001. The time window can be set to any time range, such as one day, three days, seven days, one hour, or six hours.

Prerequisites

The test data in this example is stored in Object Storage Service (OSS). You can perform the following steps to store your test data in OSS:

  1. Activate OSS.
  2. Create directories.
  3. Upload objects.

    In this example, the following test data is uploaded to the funnel_data folder:

    1000002  1483258815538  Add products to favorites  {}  20170101
     1000002  1483274981790  Start up  {}  20170101
     1000002  1483223087508  Search for products  {"content": "computer", "page_num": 1}  20170101
     1000002  1483232016805  Search for products  {"content": "Apple", "page_num": 2}  20170101
     1000002  1483200895552  Complete orders  {}  20170101
     1000003  1483218002826  Search for products  {"content": "computer", "page_num": 2}  20170101
     1000003  1483206471681  Add products to the shopping cart  {}  20170101
     1000003  1483284553531  Add products to the shopping cart  {}  20170101
     1000003  1483279891663  Browse products  {"brand": "Apple", "price": 9500}  20170101
     1000003  1483259182702  Add products to the shopping cart  {}  20170101
     1000004  1483260505099  Log on  {}  20170101
     1000004  1483231995064  Add products to favorites  {}  20170101
     1000004  1483206318588  Add products to the shopping cart  {}  20170101
     1000004  1483241973408  Log on  {}  20170101
     1000004  1483202785557  Add products to favorites  {}  20170101
     1000005  1483214265416  Add products to favorites  {}  20170101
     1000005  1483206343383  Add products to favorites  {}  20170101
     1000005  1483229265488  Browse products  {"brand": "HuaW", "price": 9500}  20170101
     1000005  1483244294552  Generate orders  {"price_all": 5000}  20170101
     1000005  1483246988534  Log on  {}  20170101
     1000006  1483207838307  Add products to favorites  {}  20170101

Before you use funnel analysis, make sure that the following prerequisites are met:

  1. DLA is activated. For more information, see Activate DLA.
  2. Reset the database password

Step 1: Create an OSS schema

​CREATE Schema funnel_test_schema
WITH DBPROPERTIES (
  catalog = 'oss',
  location = 'oss://bucket-name/'
)​

Step 2. Create a table

​CREATE EXTERNAL TABLE IF NOT EXISTS funnel_test ( 
     user_id bigint NOT NULL COMMENT '',
     event_time bigint NOT NULL COMMENT '',
     event_id int NOT NULL COMMENT '',
     event_name varchar NOT NULL COMMENT '',
     event_attr varchar NOT NULL COMMENT '',
     event_date date NOT NULL COMMENT ''
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE 
LOCATION 'oss://bucket-name/funnel_data/';​

Step 3: Use a funnel analysis function to analyze data

The event IDs for the startup, logon, product search, product browsing, and order generation events are 10001, 10002, 10003, 10004, and 10007 in sequence.

  • In the following example, the funnel_count function is used to query the funnels that contain five events. These events are triggered from January 1 to January 20 with a time window of seven days. The unit of the time window is milliseconds. The function can also be used to save the data of the users involved in each event.

    ​SELECT user_id,
      funnel_count(event_time,7 * 86400000,event_id,
      '10001,10002,10003,10004,10007') AS xwho_state
      FROM funnel_test
      WHERE event_id IN (10001, 10002, 10003, 10004, 10007)
      AND event_date BETWEEN '2017-01-01' AND '2019-01-20'
      GROUP BY user_id;​
  • In the following example, the funnel_sum function is used based on the funnel_count function. This allows you to obtain the overall conversion rate of the events.

    ​SELECT funnel_sum(xwho_state, 5)
      FROM (
        SELECT user_id,funnel_count(event_time,7 * 86400000, event_id,
       '10001,10002,10003,10004,10007') AS xwho_state
        FROM funnel_test
        WHERE event_id IN (10001, 10002, 10003, 10004, 10007)
          AND event_date BETWEEN '2017-01-01' AND '2019-01-20'
        GROUP BY user_id 
      );​
  • If you want to filter data in the JSON attribute column event_attr, for example, filtering the prices that range from 3,500 to 5,000 from events whose IDs are 10004, you can execute the following SQL statement:

    ​SELECT funnel_sum(xwho_state, 5)
      FROM (
        SELECT user_id,funnel_count(event_time, 7 * 86400000,event_id,
        '10001,10002,10003,10004,10007') AS xwho_state
        FROM funnel_test
        WHERE event_id IN (10001, 10002, 10003, 10004, 10007)
          AND event_date BETWEEN '2017-01-01' AND '2019-01-20'
           OR (event_id = 10004 AND
               json_extract_scalar(event_attr, '$.price') BETWEEN 3500 AND 5000)
        GROUP BY user_id 
      );​

The preceding paths and events are represented by IDs. In actual scenarios, you may have only event names. DLA allows event names to represent the path events. In this example, the event names are listed in the event_name column.

​SELECT funnel_sum(xwho_state, 7) AS funnel
FROM (SELECT user_id,funnel_count(event_time, 7 * 86400000,event_name,'startup,logon,product search, product browsing,order generation,order completion,product evaluation') AS xwho_state FROM funnel_test
      WHERE event_name IN 
      ('startup', 'logon', 'product search', 'product browsing', 'order generation', 'order completion', 'product evaluation')
      GROUP BY user_id);​