Hologres provides a unique visitor (UV) computing solution in ad hoc queries. This solution allows you to use the COUNT DISTINCT function to count the number of UVs in tens of millions of data records in fact tables. The COUNT DISTINCT function helps return accurate and immediate results from raw data. This solution does not require pre-aggregated tables or complex data processing techniques. This simplifies the query process.
Description
If you want to count the number of UVs in tens of millions of data records, you can use the COUNT DISTINCT function and specify a time period in Hologres to directly query data from fact tables. The COUNT DISTINCT function is optimized in Hologres to support one or more fields. This enables the COUNT DISTINCT function to adapt to most UV computing scenarios. In queries in which fact tables are joined with dimension tables, you can configure indexes to improve the query performance.
Advantages and disadvantages
Advantages: This solution meets the requirements of real-time and flexible UV computing in ad hoc queries. You can specify a time period based on your business requirements. Pre-computing and scheduling configurations are not required. The COUNT DISTINCT function is automatically optimized in Hologres to significantly improve computing performance.
Disadvantages: If the amount of data increases, the computing efficiency and supported queries per second (QPS) may decrease.
Use scenarios
This solution is suitable for UV computing on tens of millions of data records.
Examples
Calculate the number of UVs in a wide fact table
In this example, the uid field is specified in the COUNT DISTINCT function to count the number of UVs in a wide fact table.
Create a wide fact table named
ods_app_detail.-- Create a wide fact table. BEGIN; CREATE TABLE IF NOT EXISTS ods_app_detail ( uid int, country text, prov text, city text, channel text, operator text, brand text, ip text, click_time text, year text, month text, day text, ymd text NOT NULL ); CALL set_table_property('ods_app_detail', 'orientation', 'column'); CALL set_table_property('ods_app_detail', 'bitmap_columns', 'country,prov,city,channel,operator,brand,ip,click_time, year, month, day, ymd'); -- Specify a distribution key for the table based on the real-time query requirement. Data is distributed to shards based on the distribution key. CALL set_table_property('ods_app_detail', 'distribution_key', 'uid'); CALL set_table_property('ods_app_detail', 'clustering_key', 'ymd'); CALL set_table_property('ods_app_detail', 'event_time_column', 'ymd'); COMMIT;Use the COUNT DISTINCT function to count the number of UVs.
-- Count the number of UVs and the number of page views (PVs) in a specified month. SELECT COUNT (DISTINCT uid) AS uv, country, prov, city, COUNT(1) AS pv FROM public.ods_app_detail WHERE ymd >= '20240301' AND ymd <= '20240331' GROUP BY country,prov,city;
Count the number of UVs by joining a fact table with a dimension table
In this example, a dimension table and a fact table are joined to count the number of UVs. This operation is required in specific business scenarios.
Prepare the required tables.
-- Create a fact table that stores the operation details of users. BEGIN; CREATE TABLE IF NOT EXISTS ods_app_detail ( uid int, channel text, operator text, brand text, ip text, click_time text, year text, month text, day text, ymd text NOT NULL ); CALL set_table_property('ods_app_detail', 'orientation', 'column'); CALL set_table_property('ods_app_detail', 'bitmap_columns', 'channel,operator,brand,ip,click_time, year, month, day, ymd'); -- Specify a distribution key for the table based on the real-time query requirement. Data is distributed to shards based on the distribution key. CALL set_table_property('ods_app_detail', 'distribution_key', 'uid'); -- Specify fields that can be used in the WHERE clause. We recommend that you configure a field that contains the year, month, and date information as the clustering key and event time column. CALL set_table_property('ods_app_detail', 'clustering_key', 'ymd'); CALL set_table_property('ods_app_detail', 'event_time_column', 'ymd'); COMMIT; -- Create a dimension table that stores the property information of users. BEGIN; CREATE TABLE dim_uid_info ( uid int NOT NULL, name text NOT NULL, gender text NOT NULL, country text, prov text, city text ); CALL set_table_property('dim_uid_info', 'orientation', 'column'); CALL set_table_property('dim_uid_info', 'bitmap_columns', 'country,prov,city'); CALL set_table_property('dim_uid_info', 'distribution_key', 'uid'); COMMIT;Count the number of UVs in a specific period of time by joining the fact table with the dimension table.
-- Count the number of UVs and the number of PVs of male users in a specific month. SELECT COUNT (DISTINCT B.uid) AS uv, country, prov, city, COUNT(1) AS pv FROM ( SELECT uid,country,prov,city FROM dim_uid_info WHERE gender = 'man' ) AS A LEFT JOIN ods_app_detail AS B ON A.uid = B.uid WHERE B.ymd >= '20240301' AND B.ymd <= '20240331' GROUP BY country,prov,city;