edit-icon download-icon

Use HyperLogLog

Last Updated: Aug 03, 2017

HybridDB for PostgreSQL is nested with native features of Greenplum Database, and also supports HyperLogLog. It provides solutions for industries with the Internet advertisement analysis requirements and requirements similar to estimation analysis computing to facilitate quick estimation of PV, UV, and other business metrics.

Create a HyperLogLog extension

Run the following command to create a HyperLogLog extension:

  1. CREATE EXTENSION hll;

Basic types

  • Run the following command to create a table containing the hll field:

    1. create table agg (id int primary key,userids hll);
  • Run the following command to convert int to hll_hashval:

    1. select 1::hll_hashval;

Basic operators

  • The hll type supports =, !=, <>, ||, and #.

    1. select hll_add_agg(1::hll_hashval) = hll_add_agg(2::hll_hashval);
    2. select hll_add_agg(1::hll_hashval) || hll_add_agg(2::hll_hashval);
    3. select #hll_add_agg(1::hll_hashval);
  • The hll_hashval type supports =, !=, and <>.

    1. select 1::hll_hashval = 2::hll_hashval;
    2. select 1::hll_hashval <> 2::hll_hashval;

Basic functions

  • The supported functions include hll_hash_boolean, hll_hash_smallint, hll_hash_bigint, and other hash functions.

    1. select hll_hash_boolean(true);
    2. select hll_hash_integer(1);
  • hll_add_agg: Used to convert int to the hll format.

    1. select hll_add_agg(1::hll_hashval);
  • hll_union: The union of hll.

    1. select hll_union(hll_add_agg(1::hll_hashval),hll_add_agg(2::hll_hashval));
  • hll_set_defaults: Used to set the precision.

    1. select hll_set_defaults(15,5,-1,1);
  • hll_print: Used for debug information.

    1. select hll_print(hll_add_agg(1::hll_hashval));

Examples

  1. create table access_date (acc_date date unique, userids hll);
  2. insert into access_date select current_date, hll_add_agg(hll_hash_integer(user_id)) from generate_series(1,10000) t(user_id);
  3. insert into access_date select current_date-1, hll_add_agg(hll_hash_integer(user_id)) from generate_series(5000,20000) t(user_id);
  4. insert into access_date select current_date-2, hll_add_agg(hll_hash_integer(user_id)) from generate_series(9000,40000) t(user_id);
  5. postgres=# select #userids from access_date where acc_date=current_date;
  6. ?column?
  7. ------------------
  8. 9725.85273370708
  9. (1 row)
  10. postgres=# select #userids from access_date where acc_date=current_date-1;
  11. ?column?
  12. ------------------
  13. 14968.6596883279
  14. (1 row)
  15. postgres=# select #userids from access_date where acc_date=current_date-2;
  16. ?column?
  17. ------------------
  18. 29361.5209149911
  19. (1 row)
Thank you! We've received your feedback.