
Requirements analysis

Last Updated: Mar 27, 2018

Analyze the following items based on this website access log:

  1. Calculate the website's page views (PV) and unique visitors (UV) by user device type (such as Android, iPad, iPhone, and PC), and generate a daily statistics report.

  2. Identify the access sources (referers) of the website to learn where its traffic comes from.

    [Description] Website statistical indicators:

    PV and UV are two basic indicators for measuring website traffic. One PV is counted each time a web page is opened, so every repeated view of a page adds to its PV count. UV is the number of unique visitors to the website per day: a visitor who accesses the website multiple times in one day is counted as only one UV.

    Referer identifies the source of a request, that is, the page from which the visitor reached the requested URL (the HTTP Referer header). As a critical indicator for evaluating website advertising, referer analysis reveals the complete picture of access sources, visitors, and their preferences.
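The PV and UV definitions above differ only in whether repeat visits are deduplicated. The following minimal Python sketch makes that concrete for the daily per-device report. It assumes each page view has already been reduced to a (date, device, visitor_id) tuple; the function name and the choice of visitor key (for example, client IP, or IP plus User-Agent) are hypothetical and not taken from this tutorial.

```python
from collections import defaultdict


def pv_uv_by_device(records):
    """Compute daily PV and UV per device type (illustrative sketch).

    records: iterable of (date, device, visitor_id) tuples, one tuple per
    page view. visitor_id is a hypothetical visitor key, such as the
    client IP address or IP plus User-Agent.
    Returns a dict mapping (date, device) to (pv, uv).
    """
    page_views = defaultdict(int)
    visitors = defaultdict(set)
    for date, device, visitor_id in records:
        key = (date, device)
        page_views[key] += 1           # every page view adds one PV
        visitors[key].add(visitor_id)  # a repeat visitor is stored only once
    return {key: (page_views[key], len(visitors[key])) for key in page_views}


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        ("2018-03-27", "iPhone", "192.0.2.1"),
        ("2018-03-27", "iPhone", "192.0.2.1"),
        ("2018-03-27", "iPhone", "192.0.2.2"),
        ("2018-03-27", "PC", "192.0.2.3"),
    ]
    print(pv_uv_by_device(sample))
    # {('2018-03-27', 'iPhone'): (3, 2), ('2018-03-27', 'PC'): (1, 1)}
```

Visitor 192.0.2.1 opened two pages on an iPhone, so it contributes two PVs but only one UV. In the actual tutorial this aggregation is done over the warehouse tables rather than in Python.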

The following describes the procedure to meet these two requirements:

1) Import the log data into an ODPS table. From the data warehouse perspective, this table belongs to the ODS (operational data store) layer, so it is named ods_log_tacker.
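To picture which raw fields a row of the ODS table could hold, the following Python sketch splits one access-log line into named fields. It assumes the log uses the NGINX "combined" format (`$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"`); this section does not confirm the exact format, and `parse_log_line` is a hypothetical helper, not part of the ODPS import tooling.

```python
import re

# Assumed NGINX "combined" log format (not confirmed by this tutorial):
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\S+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of raw log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up log line for illustration only.
    line = ('192.0.2.1 - - [27/Mar/2018:10:00:00 +0800] '
            '"GET /index.html HTTP/1.1" 200 512 '
            '"https://www.example.com/" "Mozilla/5.0 (iPhone)"')
    print(parse_log_line(line))
```

Note that the whole `$request` value stays in one field at this stage; splitting it is left to the next step.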

2) Process the data. As explained in the data description section, the $request field of the log data consists of “HTTP request method + request URL + HTTP protocol version”. Because later analysis usually queries and aggregates the request method (such as GET) and the URL separately, the request field of the original table must be split into these parts. Write the split data into the table dw_log_parser (a table at the data warehouse layer).
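The split described in this step can be sketched in Python as follows. The helper name `split_request` and the choice to return `(None, None, None)` for malformed values (such as "-") are assumptions made for illustration; the tutorial performs this split when writing dw_log_parser.

```python
def split_request(request):
    """Split a $request value such as "GET /index.html HTTP/1.1".

    Returns (method, url, protocol). Values that do not have exactly three
    whitespace-separated parts (for example "-" for an empty request)
    return (None, None, None) so the caller can keep or drop the row
    explicitly.
    """
    parts = (request or "").split()
    if len(parts) != 3:
        return (None, None, None)
    method, url, protocol = parts
    return (method, url, protocol)


if __name__ == "__main__":
    print(split_request("GET /index.html?id=1 HTTP/1.1"))
    # ('GET', '/index.html?id=1', 'HTTP/1.1')
```

Splitting on whitespace works because a valid request URL cannot contain a raw space; spaces must be percent-encoded.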

3) Analyze the data. In terms of who sends them, website log requests can generally be classified into requests from real users and requests sent by programs (such as subscription programs and crawlers). Website traffic indicators (PV and UV) are typically calculated from real user requests only. In addition, even among real user requests, requests that are not page views, such as requests for JS files and images loaded by a page, must be filtered out. The filtered results are written to a new table named dw_log_detail (a table at the data warehouse layer).
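The filtering rules in this step can be sketched as a single predicate. The crawler keywords, the static-file extension list, and keeping only GET requests are all hypothetical heuristics chosen for illustration; the tutorial does not specify its exact rules, and a production filter would typically rely on a maintained bot list.

```python
import os
from urllib.parse import urlsplit

# Hypothetical heuristics, not the tutorial's actual rules.
CRAWLER_MARKERS = ("bot", "spider", "crawler", "curl", "python-requests")
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif",
                     ".ico", ".svg", ".woff"}


def is_real_page_view(method, url, user_agent):
    """Return True if a request looks like a page view by a real user."""
    agent = (user_agent or "").lower()
    if any(marker in agent for marker in CRAWLER_MARKERS):
        return False  # request sent by a program rather than a person
    if method != "GET" or not url:
        return False  # keep only GET requests for page statistics
    path = urlsplit(url).path
    extension = os.path.splitext(path)[1].lower()
    return extension not in STATIC_EXTENSIONS  # drop JS, CSS, images, fonts


if __name__ == "__main__":
    print(is_real_page_view("GET", "/index.html?id=1", "Mozilla/5.0 (iPhone)"))
    print(is_real_page_view("GET", "/static/app.js", "Mozilla/5.0 (iPhone)"))
    print(is_real_page_view("GET", "/", "Googlebot/2.1"))
```

Substring matching on the User-Agent is cheap but imprecise (for example, "bot" can appear in some legitimate device names), which is why the keyword list is only a placeholder.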

4) As data analysis in a data warehouse goes deeper, dimension tables and fact tables are normally created. In this tutorial, create the user dimension table dim_user_info and the website access fact table dw_log_fact.
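The split between a dimension table and a fact table can be sketched as follows: descriptive user attributes (here, only the device type) go to the dimension row, and each access event goes to the fact row, with a shared key joining them. The User-Agent matching rules, the MD5-based `uid`, and all record field names are hypothetical; the tutorial's actual columns for dim_user_info and dw_log_fact may differ.

```python
import hashlib


def device_type(user_agent):
    """Map a User-Agent to Android, iPad, iPhone, or PC (hypothetical rule)."""
    agent = (user_agent or "").lower()
    if "ipad" in agent:
        return "iPad"
    if "iphone" in agent:
        return "iPhone"
    if "android" in agent:
        return "Android"
    return "PC"  # everything else is treated as a desktop browser


def to_dim_and_fact(record):
    """Split one cleaned log record into a dimension row and a fact row."""
    # Hypothetical user key: hash of client IP plus User-Agent.
    key_source = record["ip"] + record["user_agent"]
    uid = hashlib.md5(key_source.encode("utf-8")).hexdigest()
    dim_row = {"uid": uid, "device": device_type(record["user_agent"])}
    fact_row = {"uid": uid, "time": record["time"],
                "url": record["url"], "referer": record["referer"]}
    return dim_row, fact_row


if __name__ == "__main__":
    # Made-up record for illustration only.
    dim, fact = to_dim_and_fact({
        "ip": "192.0.2.1",
        "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X)",
        "time": "2018-03-27 10:00:00",
        "url": "/index.html",
        "referer": "https://www.example.com/",
    })
    print(dim)
    print(fact)
```

Keeping device information only in the dimension table means a device-level report joins dw_log_fact to dim_user_info on the user key instead of re-parsing User-Agent strings for every query.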

5) From the dimension table and fact table at the data warehouse layer, generate two tables at the application data mart layer: adm_user_measures, which holds PV and UV statistics by user device, and adm_refer_info, which holds statistics on the source URLs (referers) of website requests.
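For the source-URL statistics in adm_refer_info, one simple aggregation is to count requests per referer host. The following Python sketch does that; grouping by host (rather than by full URL) and the "(none)" label for empty or "-" referers are assumptions made for illustration, not the tutorial's definition of the table.

```python
from collections import Counter
from urllib.parse import urlsplit


def referer_hosts(referers):
    """Count requests per referer host.

    Empty, "-", or otherwise host-less referers are grouped under "(none)".
    """
    counts = Counter()
    for referer in referers:
        host = urlsplit(referer or "").hostname  # lowercased, port stripped
        counts[host or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Made-up referers for illustration only.
    sample = [
        "https://www.example.com/a",
        "https://www.example.com/b",
        "http://search.example.org/?q=logs",
        "-",
        "",
    ]
    print(referer_hosts(sample).most_common())
```

Aggregating by host rolls many distinct pages of the same referring site into one source, which is usually what traffic-source analysis needs; grouping by the full URL instead would show which specific pages or ads send traffic.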
