This topic describes the hll extension, which supports the HyperLogLog data type to estimate metrics such as page views (PVs) and unique visitors (UVs).
Background information
The hll extension supports an extendable, set-resembled data type HyperLogLog (hll) to estimate DISTINCT elements under a specified accuracy. For example, you can use 1,280 bytes of hll data to accurately estimate billions of DISTINCT elements. The hll extension is suitable for industries that need estimation analysis, such as Internet advertisement analysis to estimate PVs and UVs.
Create the hll extension
CREATE EXTENSION hll;
Basic operations
- Execute the following statement to create a table that contains hll fields:
CREATE TABLE agg (id INT PRIMARY KEY, userids hll);
- Execute the following statement to convert INT data to hll_hashval data:
SELECT 1::hll_hashval;
Basic operators
- The hll data type supports the following operators:
- =
- !=
- <>
- ||
- #
Examples:
SELECT hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval); SELECT hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval); SELECT #hll_add_agg(1::hll_hashval);
- The hll_hashval data type supports the following operators:
- =
- !=
- <>
Examples:
SELECT 1::hll_hashval = 2::hll_hashval; SELECT 1::hll_hashval <> 2::hll_hashval;
Basic functions
- The hll extension supports hash functions such as hll_hash_boolean, hll_hash_smallint, and hll_hash_bigint. Examples:
SELECT hll_hash_boolean(true); SELECT hll_hash_integer(1);
- The hll extension supports the hll_add_agg function to convert the data type from INT to hll. Example:
SELECT hll_add_agg(1::hll_hashval);
- The hll extension supports the hll_union function to perform UNION operations on hll data. Example:
SELECT hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));
- The hll extension supports the hll_set_defaults function to set the accuracy. Example:
SELECT hll_set_defaults(15,5,-1,1);
- The hll extension supports the hll_print function to display debug information. Example:
SELECT hll_print(hll_add_agg(1::hll_hashval));
Delete the hll extension
DROP EXTENSION hll;
Examples
CREATE TABLE access_date (acc_date DATE UNIQUE, userids hll);
INSERT INTO access_date
SELECT current_date, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(1,10000) t(user_id);
INSERT INTO access_date
SELECT current_date-1, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(5000,20000) t(user_id);
INSERT INTO access_date
SELECT current_date-2, hll_add_agg(hll_hash_integer(user_id))
FROM generate_series(9000,40000) t(user_id);
SELECT #userids FROM access_date WHERE acc_date=current_date;
?column?
------------------
9725.852733707077
(1 row)
SELECT #userids FROM access_date WHERE acc_date=current_date-1;
?column?
------------------
14968.65968832792
(1 row)
SELECT #userids FROM access_date WHERE acc_date=current_date-2;
?column?
------------------
29361.520914991113
(1 row)