HybridDB for PostgreSQL is nested with native features of Greenplum Database, and also supports HyperLogLog. It provides solutions for industries with the Internet advertisement analysis requirements and requirements similar to estimation analysis computing to facilitate quick estimation of PV, UV, and other business metrics.

Create a HyperLogLog extension

Run the following command to create a HyperLogLog extension:

CREATE EXTENSION hll;

Basic types

  • Run the following command to create a table containing the hll field:

    create table agg (id int primary key,userids hll);
  • Run the following command to convert int to hll_hashval:

    select 1::hll_hashval;

Basic operators

  • The hll type supports =, !=, <>, ||, and #.

    select hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);
    select hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);
    select #hll_add_agg(1::hll_hashval);
  • The hll_hashval type supports =, !=, and <>.

    select 1::hll_hashval = 2::hll_hashval;
    select 1::hll_hashval <> 2::hll_hashval;

Basic functions

  • The supported functions include hll_hash_boolean, hll_hash_smallint, hll_hash_bigint, and other hash functions.

    select hll_hash_boolean(true);
    select hll_hash_integer(1);
  • hll_add_agg: Used to convert int to the hll format.

    select hll_add_agg(1::hll_hashval);
  • hll_union: The union of hll.

    select hll_union(hll_add_agg(1::hll_hashval),hll_add_agg(2::hll_hashval));
  • hll_set_defaults: Used to set the precision.

    select hll_set_defaults(15,5,-1,1);
  • hll_print: Used for debug information.

    select hll_print(hll_add_agg(1::hll_hashval));

Examples

create table access_date (acc_date date unique, userids hll);
insert into access_date select current_date, hll_add_agg(hll_hash_integer(user_id)) from generate_series(1,10000) t(user_id);
insert into access_date select current_date-1, hll_add_agg(hll_hash_integer(user_id)) from generate_series(5000,20000) t(user_id);
insert into access_date select current_date-2, hll_add_agg(hll_hash_integer(user_id)) from generate_series(9000,40000) t(user_id);
postgres=# select #userids from access_date where acc_date=current_date;
     ?column?
------------------
 9725.85273370708
(1 row)
postgres=# select #userids from access_date where acc_date=current_date-1;
     ?column?
------------------
 14968.6596883279
(1 row)
postgres=# select #userids from access_date where acc_date=current_date-2;
     ?column?
------------------
 29361.5209149911
(1 row)