This topic describes the hll extension, which supports the HyperLogLog data type to estimate metrics such as page views (PVs) and unique visitors (UVs).

Prerequisites

The extension is supported on the PolarDB for PostgreSQL clusters that run the following engines:
  • PostgreSQL 14 (revision version 14.5.2.0 or later)
  • PostgreSQL 11 (revision version 1.1.28 or later)
Note You can execute the following statement to view the minor version that is used by PolarDB for PostgreSQL:
show polar_version;

Background information

The hll extension supports an extendable, set-resembled data type HyperLogLog (hll) to estimate DISTINCT elements under a specified accuracy. For example, you can use 1,280 bytes of hll data to accurately estimate billions of DISTINCT elements. The hll extension is suitable for industries that need estimation analysis, such as Internet advertisement analysis to estimate PVs and UVs.

Create the hll extension

CREATE EXTENSION hll;

Basic operations

  • Execute the following statement to create a table that contains hll fields:
    CREATE TABLE agg (id INT PRIMARY KEY, userids hll);
  • Execute the following statement to convert INT data to hll_hashval data:
    SELECT 1::hll_hashval;

Basic operators

  • The hll data type supports the following operators:
    • =
    • !=
    • <>
    • ||
    • #

    Examples:

    SELECT hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);
    SELECT hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);
    SELECT #hll_add_agg(1::hll_hashval);
  • The hll_hashval data type supports the following operators:
    • =
    • !=
    • <>

    Examples:

    SELECT 1::hll_hashval = 2::hll_hashval;
    SELECT 1::hll_hashval <> 2::hll_hashval;

Basic functions

  • The hll extension supports hash functions such as hll_hash_boolean, hll_hash_smallint, and hll_hash_bigint. Examples:
    SELECT hll_hash_boolean(true);
    SELECT hll_hash_integer(1);
  • The hll extension supports the hll_add_agg function to convert the data type from INT to hll. Example:
    SELECT hll_add_agg(1::hll_hashval);
  • The hll extension supports the hll_union function to perform UNION operations on hll data. Example:
    SELECT hll_union(hll_add_agg(1::hll_hashval), hll_add_agg(2::hll_hashval));
  • The hll extension supports the hll_set_defaults function to set the accuracy. Example:
    SELECT hll_set_defaults(15,5,-1,1);
  • The hll extension supports the hll_print function to display debug information. Example:
    SELECT hll_print(hll_add_agg(1::hll_hashval));

Delete the hll extension

DROP EXTENSION hll;

Examples

CREATE TABLE access_date (acc_date DATE UNIQUE, userids hll);
INSERT INTO access_date
     SELECT current_date, hll_add_agg(hll_hash_integer(user_id))
     FROM generate_series(1,10000) t(user_id);
INSERT INTO access_date
     SELECT current_date-1, hll_add_agg(hll_hash_integer(user_id))
     FROM generate_series(5000,20000) t(user_id);
INSERT INTO access_date
     SELECT current_date-2, hll_add_agg(hll_hash_integer(user_id))
     FROM generate_series(9000,40000) t(user_id);

SELECT #userids FROM access_date WHERE acc_date=current_date;
     ?column?
------------------
 9725.852733707077
(1 row)

SELECT #userids FROM access_date WHERE acc_date=current_date-1;
     ?column?
------------------
 14968.65968832792
(1 row)

SELECT #userids FROM access_date WHERE acc_date=current_date-2;
     ?column?
------------------
 29361.520914991113
(1 row)