AnalyticDB for PostgreSQL has the features of Greenplum Database and supports HyperLogLog. AnalyticDB for PostgreSQL is suited for industries such as Internet advertising and estimation analysis that require quick estimation of business metrics such as page views (PVs) and unique visitors (UVs).

Important To install or upgrade extensions on an instance that runs V6.3.8.9 or later, Submit a ticket.

For more information about how to view the minor version of an instance, see View the minor engine version.

Create a HyperLogLog extension

Execute the following statement to create a HyperLogLog extension:

CREATE EXTENSION hll;

Basic types

  • Execute the following statement to create a table that contains the hll field:

    create table agg (id int primary key,userids hll);
  • Execute the following statement to convert int to hll_hashval:

    select 1::hll_hashval;

Basic operators

  • The hll type supports =, !=, <>, ||, and #.
    select hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);
    select hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);
    select #hll_add_agg(1::hll_hashval);
  • The hll_hashval type supports =, !=, and <>.
    select 1::hll_hashval = 2::hll_hashval;
    select 1::hll_hashval <> 2::hll_hashval;

Basic functions

  • Hash functions such as hll_hash_boolean, hll_hash_smallint, and hll_hash_bigint.
    select hll_hash_boolean(true);
    select hll_hash_integer(1);
  • hll_add_agg: converts the int format to the hll format.
    select hll_add_agg(1::hll_hashval);
  • hll_union: aggregates the hll fields.
    select hll_union(hll_add_agg(1::hll_hashval),hll_add_agg(2::hll_hashval));
  • hll_set_defaults: sets the precision.
    select hll_set_defaults(15,5,-1,1);
  • hll_print: displays debugging information.
    select hll_print(hll_add_agg(1::hll_hashval));

Examples

create table access_date (acc_date date unique, userids hll);
insert into access_date select current_date, hll_add_agg(hll_hash_integer(user_id)) from generate_series(1,10000) t(user_id);
insert into access_date select current_date-1, hll_add_agg(hll_hash_integer(user_id)) from generate_series(5000,20000) t(user_id);
insert into access_date select current_date-2, hll_add_agg(hll_hash_integer(user_id)) from generate_series(9000,40000) t(user_id);
postgres=# select #userids from access_date where acc_date=current_date;
     ?column?
------------------
 9725.85273370708
(1 row)
postgres=# select #userids from access_date where acc_date=current_date-1;
     ?column?
------------------
 14968.6596883279
(1 row)
postgres=# select #userids from access_date where acc_date=current_date-2;
     ?column?
------------------
 29361.5209149911
(1 row)