HyperLogLog functions are approximate aggregate functions, similar to approx_distinct. They quickly return estimated results when you analyze large datasets.
The following HyperLogLog functions are available.
|
Function |
Syntax |
Description |
SQL support |
SPL support |
|
approx_set(x) |
Estimates the number of distinct values in x. The default maximum standard error is 0.01625. |
√ |
× |
|
|
cardinality(x) |
Converts a HyperLogLog value to a bigint value. |
√ |
× |
|
|
empty_approx_set() |
Returns an empty HyperLogLog set. The default maximum standard error is 0.01625. |
√ |
× |
|
|
merge(x) |
Merges multiple HyperLogLog sets into a single set. |
√ |
× |
approx_set function
The approx_set function estimates the number of distinct values in x. The default maximum standard error is 0.01625.
Syntax
approx_set(x)
Parameters
|
Parameter |
Description |
|
x |
An expression of any data type. |
Return value type
HyperLogLog
Examples
Estimate per-minute unique visitors (UVs) for a website. The result is in HyperLogLog format.
-
Query and analysis statement
* | SELECT date_trunc('minute', __time__) AS Time, approx_set(client_ip) AS UV FROM website_log GROUP BY Time ORDER BY Time -
Query and analysis results: Returns
Time(timestamp truncated to the minute) andUV(encoded approximate set fromapprox_set). Total: 16 records.
cardinality function
The cardinality function converts a HyperLogLog value to a bigint value.
Syntax
cardinality(x)
Parameters
|
Parameter |
Description |
|
x |
A HyperLogLog expression. |
Return value type
bigint
Examples
Use approx_set to estimate per-minute UVs (returns a HyperLogLog value), then use cardinality to convert that value to a bigint.
-
Query and analysis statement
* | SELECT Time, cardinality(UV) AS UV FROM ( SELECT date_trunc('minute', __time__) AS Time, approx_set(client_ip) AS UV FROM website_log GROUP BY Time ORDER BY Time ) AS UV -
Results show per-minute UV counts in the Time and UV columns. For example, 78 UVs at 2021-09-09 15:12:00 and 561 at 15:13:00.
empty_approx_set function
The empty_approx_set function returns an empty HyperLogLog set. The default maximum standard error is 0.01625.
Syntax
empty_approx_set()
Return value type
HyperLogLog
Examples
Returns an empty HyperLogLog set.
-
Query and analysis statement
* | SELECT empty_approx_set() -
Returns a single column
_col0with valueAgwAAA==, a Base64-encoded empty approximate set.
merge function
The merge function aggregates multiple HyperLogLog sets into a single set.
Syntax
merge(x)
Parameters
|
Parameter |
Description |
|
x |
A HyperLogLog expression. |
Return value type
HyperLogLog
Examples
Use approx_set to estimate per-minute UVs, merge to aggregate the UV sets, and cardinality to convert the final value to a bigint.
-
Query and analysis statement
* | SELECT Time, cardinality(UV) AS UV, cardinality(merge(UV) over()) AS Total_UV FROM ( SELECT date_trunc('minute', __time__) AS Time, approx_set(client_ip) AS UV FROM log GROUP BY Time ORDER BY Time ) -
Query and analysis results: Per-minute UV counts vary (561, 658, 625), while Total_UV shows the cumulative total (8564) across all rows.