HLL_COUNT_INIT is an aggregate function that builds a HyperLogLog++ (HLL++) sketch from a column of values. The resulting sketch is a compact BINARY representation used for approximate distinct counting at scale.
Usage notes
The BINARY sketches produced by HLL_COUNT_INIT are the required input for HLL_COUNT_EXTRACT, HLL_COUNT_MERGE, and HLL_COUNT_MERGE_PARTIAL. Sketches generated by external systems are not compatible and cannot be used as input.
Syntax
BINARY HLL_COUNT_INIT(<col_name> [, BIGINT <precision>])
Parameters
| Parameter | Required | Type | Description |
|---|---|---|---|
| col_name | Yes | BIGINT, DECIMAL, STRING, or BINARY | The column to aggregate into a sketch. |
| precision | No | BIGINT in [10, 24] | Controls sketch accuracy. Default: 15. Higher values reduce estimation error but increase sketch size. |
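To give a feel for the accuracy and size trade-off, the query below evaluates the textbook HyperLogLog estimate of relative standard error, 1.04 / sqrt(2^precision), for a few precision values. This is a rough sketch only: the formula comes from the classic HyperLogLog analysis, and it is an assumption that the MaxCompute implementation tracks it closely. The column name p and the alias approx_relative_error are illustrative.

```sql
-- Assumption: relative standard error ~ 1.04 / sqrt(2^p), the textbook
-- HyperLogLog estimate. Actual MaxCompute error bounds may differ.
SELECT
    p,
    1.04 / SQRT(POW(2, p)) AS approx_relative_error
FROM VALUES
    (10),
    (15),
    (24)
    t(p);
```

Under that assumption, the estimate works out to roughly 3.3% at precision 10, 0.57% at the default of 15, and 0.025% at 24.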
Return value
Returns a BINARY HLL++ sketch. Returns NULL if col_name is NULL.
Example
The following query builds an HLL++ sketch per country to approximate the number of distinct customers invoiced in each country. The precision is set to 10.
SELECT
    country,
    HLL_COUNT_INIT(customer_id, 10) AS hll_sketch
FROM VALUES
    ('UA', 'customer_id_1', 'invoice_id_11'),
    ('BR', 'customer_id_3', 'invoice_id_31'),
    ('CZ', 'customer_id_2', 'invoice_id_22'),
    ('CZ', 'customer_id_2', 'invoice_id_23'),
    ('BR', 'customer_id_3', 'invoice_id_31'),
    ('UA', 'customer_id_2', 'invoice_id_24')
    t(country, customer_id, invoice_id)
GROUP BY country;
Output:
+---------+----------------------------------------------+
| country | hll_sketch                                   |
+---------+----------------------------------------------+
| BR      | =02=01=0A=00=01=00=00=00=20s=8E=00           |
| CZ      | =02=01=0A=00=01=00=00=00=98_$=03             |
| UA      | =02=01=0A=00=02=00=00=00=F0=8B=DD=00=98_$=03 |
+---------+----------------------------------------------+
Each row contains a binary sketch for the corresponding country. Pass these sketches to HLL_COUNT_EXTRACT to retrieve distinct counts, or to HLL_COUNT_MERGE to combine sketches across partitions before extracting a final count.
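As a minimal sketch of that next step, the query below wraps the example above in a subquery and passes each per-country sketch to HLL_COUNT_EXTRACT. It assumes that HLL_COUNT_EXTRACT takes a single BINARY sketch argument and returns the approximate distinct count; the aliases approx_distinct_customers and s are illustrative names, not part of the function reference.

```sql
-- Assumption: HLL_COUNT_EXTRACT(<sketch>) accepts one BINARY HLL++ sketch
-- and returns the approximate number of distinct values it summarizes.
SELECT
    country,
    HLL_COUNT_EXTRACT(hll_sketch) AS approx_distinct_customers
FROM (
    -- Same aggregation as the example above: one sketch per country.
    SELECT
        country,
        HLL_COUNT_INIT(customer_id, 10) AS hll_sketch
    FROM VALUES
        ('UA', 'customer_id_1', 'invoice_id_11'),
        ('BR', 'customer_id_3', 'invoice_id_31'),
        ('CZ', 'customer_id_2', 'invoice_id_22'),
        ('CZ', 'customer_id_2', 'invoice_id_23'),
        ('BR', 'customer_id_3', 'invoice_id_31'),
        ('UA', 'customer_id_2', 'invoice_id_24')
        t(country, customer_id, invoice_id)
    GROUP BY country
) s;
```

Keeping the sketch in its own subquery mirrors the usual pattern of storing sketches first and extracting counts later, so the same sketches can be reused without rescanning the source rows.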
What's next
HLL_COUNT_EXTRACT — retrieve the approximate distinct count from a sketch
HLL_COUNT_MERGE — merge multiple sketches into one
HLL_COUNT_MERGE_PARTIAL — partially merge sketches
HyperLogLog++ functions — full list of HLL++ functions in MaxCompute