This topic provides a use case of Gegejia, a partner of Realtime Compute, to describe how to use Realtime Compute to create real-time page view (PV) and unique visitor (UV) curves.
Background
With the rise of new retail, competition in the Internet e-commerce industry is becoming increasingly fierce. Real-time data is particularly important to the e-commerce industry, such as collecting statistics on the total PVs and UVs to a website.
Use case
- Business architecture
- Business process
- Use the SDK provided by DataHub to synchronize the binlog file to DataHub.
- Use Realtime Compute to subscribe to data in DataHub for real-time computing.
- Insert real-time data into RDS.
- Display data in Alibaba Cloud DataV or other visual dashboards.
- Preparations
The following table describes the fields in a log source table.
Field name Data type Description account_id VARCHAR The ID of the user. client_ip VARCHAR The IP address of the client. client_info VARCHAR The model of the device. platform VARCHAR The operating system type of the device. imei VARCHAR The International Mobile Equipment Identity (IMEI) number of the device. version BIGINT The operating system version of the device. action BIGINT The page jump description. gpm VARCHAR The tracking path. c_time VARCHAR The time when the request was made. target_type VARCHAR The type of requested data. target_id VARCHAR The ID of requested data. udata VARCHAR The extended information. session_id VARCHAR The ID of the session. product_id_chain VARCHAR The string of product IDs. cart_product_id_chain VARCHAR The ID string of the products added to the cart. tag VARCHAR The special tag. position VARCHAR The location of the user. network VARCHAR The network type of the user. p_dt VARCHAR The time partition by day. p_platform VARCHAR The partition system version. The following table describes the fields in an RDS result table.
Field name Data type Description summary_date BIGINT The date when the statistics are collected. summary_min VARCHAR The minute when the statistics are collected. pv BIGINT The number of clicks on the specified website. uv BIGINT The number of visitors who click the specified website. Note Only one UV is counted for multiple clicks by the same visitor within one day.currenttime TIMESTAMP The current system time. - Business logic
// Create source tables. CREATE TABLE source_ods_fact_log_track_action ( account_id VARCHAR, // The ID of the user. client_ip VARCHAR, // The IP address of the client. client_info VARCHAR, // The model of the device. platform VARCHAR, // The operating system type of the device. imei VARCHAR, // The IMEI number of the device. `version` VARCHAR, // The operating system version of the device. `action` VARCHAR, // The page jump description. gpm VARCHAR, // The tracking path. c_time VARCHAR, // The time when the request was made. target_type VARCHAR, // The type of requested data. target_id VARCHAR, // The ID of requested data. udata VARCHAR, // The extended information in JSON format. session_id VARCHAR, // The ID of the session. product_id_chain VARCHAR, // The string of product IDs. cart_product_id_chain VARCHAR, // The ID string of the products added to the car. tag VARCHAR, // The special tag. `position` VARCHAR, // The location of the user. network VARCHAR, // The network type of the user. p_dt VARCHAR, // The time partition by day. p_platform VARCHAR // The partition system version. ) WITH ( type='datahub', endPoint='yourEndpointURL', project='yourProjectName', topic='yourTopicName', accessId='yourAccessId', accessKey='yourAccessSecret', batchReadSize='1000' ); CREATE TABLE result_cps_total_summary_pvuv_min ( summary_date BIGINT, // The date when the statistics are collected. summary_min VARCHAR, // The minute when the statistics are collected. pv BIGINT, // The number of clicks on the specified website. uv BIGINT, // The number of visitors who click the specified website. Only one UV is counted for multiple clicks by the same visitor within one day. currenttime TIMESTAMP, // The current system time. primary key (summary_date,summary_min) ) WITH ( type= 'rds', url = 'yourRDSDatabaseURL', userName = 'yourDatabaseUserName', password = 'yourDatabasePassword', tableName = 'yourTableName' ); CREATE VIEW result_cps_total_summary_pvuv_min_01 AS select cast(p_dt as BIGINT) as summary_date // The time partition by day. ,count(client_ip) as pv // Compute the number of PVs by client IP address. ,count(distinct client_ip) as uv // Deduplicate visitors by client IP address. ,cast(max(c_time ) as TIMESTAMP) as c_time // The time when the request was made. from source_ods_fact_log_track_action group by p_dt; INSERT into result_cps_total_summary_pvuv_min select a.summary_date, // The time partition by day. cast(DATE_FORMAT(c_time,'HH:mm') as VARCHAR) as summary_min, // Obtain the time string representing the hour and minute. a.pv, a.uv, CURRENT_TIMESTAMP as currenttime // The current system time. from result_cps_total_summary_pvuv_min_01 AS a ;
- Key points
To help you understand structured code and facilitate code maintenance, we recommend that you use views to split the business logic into two modules. For more information about views, see Create a data view.
- Module 1
CREATE VIEW result_cps_total_summary_pvuv_min_01 AS select cast(p_dt as BIGINT) as summary_date // The time partition by day. ,count(client_ip) as pv // Compute the number of PVs by client IP address. ,count(distinct client_ip) as uv // Deduplicate visitors by client IP address. ,cast(max(c_time ) as TIMESTAMP) as c_time // The time when the request was made. from source_ods_fact_log_track_action group by p_dt;
- Compute the number of clicks on the website by client IP address and use it as the number of PVs. Deduplicate visitors by client IP address to compute the number of UVs.
cast(max(c_time) as TIMESTAMP)
// The time when the last request was made.- This snippet of business logic code groups data by
p_dt
(specifying the time partition byday
) and usesmax(c_time)
to determine the last access time in the time partition. Finally, Realtime Compute inserts the numbers of PVs and UVs into the database.
The following table describes the output results.
p_dt pv uv max(c_time)
2017-12-12 1000 100 2017-12-12 9:00:00
2017-12-12 1500 120 2017-12-12 9:01:00
2017-12-12 2200 200 2017-12-12 9:02:00
2017-12-12 3300 320 2017-12-12 9:03:00
- Module 2
INSERT into result_cps_total_summary_pvuv_min select a.summary_date, // The time partition by day. cast(DATE_FORMAT(c_time,'HH:mm') as VARCHAR) as summary_min, // Obtain the time string representing the hour and minute. a.pv, a.uv, CURRENT_TIMESTAMP as currenttime // The current system time. from result_cps_total_summary_pvuv_min_01 AS a
Extract the data in module 1 by hour and minute and obtain the PV and UV growth curves, as shown in the following figure.
- Module 1