This topic describes the basic syntax and provides examples of approximate functions.
Simple Log Service supports the following approximate functions.
Function name | Syntax | Description | SQL support | SPL support |
approx_distinct(x) | Calculates the approximate number of distinct values in x. The default standard error is 2.3%. | √ | × | |
approx_distinct(x, e) | Calculates the approximate number of distinct values in x and lets you specify a custom standard error. | √ | × | |
approx_percentile(x, percentage) | Sorts x in ascending order and returns the approximate value of x at the specified percentage. The result is an approximation, and its stability and consistency are not guaranteed. | √ | × | |
approx_percentile(x, array[percentage01, percentage02...]) | Sorts x in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The result is an approximation, and its stability and consistency are not guaranteed. | √ | × | |
approx_percentile(x, weight, percentage) | Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. The result is an approximation, and its stability and consistency are not guaranteed. | √ | × | |
approx_percentile(x, weight, array[percentage01, percentage02...]) | Sorts the product of x and its weight in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The result is an approximation, and its stability and consistency are not guaranteed. | √ | × | |
approx_percentile(x, weight, percentage, accuracy) | Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. Lets you set the accuracy of the return value. The result is an approximation, and its stability and consistency are not guaranteed. | √ | × | |
numeric_histogram(bucket, x) | Calculates an approximate histogram for x based on the number of buckets (histogram columns). The result is returned in JSON format. | √ | × | |
numeric_histogram(bucket, x, weight) | Calculates an approximate histogram for x based on the number of buckets (histogram columns). The result is returned in JSON format. Lets you set a weight for x. | √ | × | |
numeric_histogram_u(bucket, x) | Calculates an approximate histogram for x based on the number of buckets (histogram columns). The result is returned in a multi-row, multi-column format. | √ | × | |
approx_most_frequent(k, x) | Calculates the approximate frequencies of the | √ | × |
approx_distinct function
The approx_distinct function calculates the approximate number of distinct values in x.
Syntax
Calculates the approximate number of distinct values in x. The default standard error is 2.3%.
approx_distinct(x)Calculates the approximate number of distinct values in x and lets you specify a custom standard error.
approx_distinct(x, e)
Parameters
Parameter | Description |
x | The value can be of any data type. |
e | The custom standard error. The value must be in the range of [0.0115, 0.26]. |
Return value type
bigint
Examples
Example 1: Use the count function to calculate page views (PVs) and the approx_distinct function to calculate the approximate number of unique visitors (UVs) based on the distinct values of the client_ip field. The standard error is 2.3%.
Query statement
* |SELECT count(*) AS PV, approx_distinct(client_ip) AS UVQuery and analytic results

Example 2: Use the count function to calculate PVs and the approx_distinct function to calculate the approximate number of UVs based on the distinct values of the client_ip field. The custom standard error is 10%.
Query statement
* |SELECT count(*) AS PV, approx_distinct(client_ip,0.1) AS UVQuery and analytic results

approx_percentile function
The approx_percentile function sorts x in ascending order and returns the approximate value at the specified percentage. Because the result is an approximation, its stability and consistency are not guaranteed.
Syntax
Sorts x in ascending order and returns the approximate value of x at the specified percentage. The return value is a double.
approx_percentile(x, percentage)Sorts x in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The return value is of the array(double,double) type.
approx_percentile(x, array[percentage01, percentage02...])Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. The return value is a double.
approx_percentile(x, weight, percentage)Sorts the product of x and its weight in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The return value is of the array(double,double) type.
approx_percentile(x, weight, array[percentage01, percentage02...])Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. The return value is a double. You can also set the accuracy of the return value.
approx_percentile(x, weight, percentage, accuracy)
Parameters
Parameter | Description |
x | The value must be of the double type. |
percentage | The percentage value. The value must be in the range of [0, 1]. |
accuracy | The accuracy. The value must be in the range of (0, 1). |
weight | The weight. The value must be an integer greater than 1. If you set a weight, the system sorts the data based on the product of x and the weight. |
Return value type
double or array(double,double)
Examples
Example 1: Sorts the request_time column and returns the approximate value of the request_time field at the 50th percentile.
Query statement
*| SELECT approx_percentile(request_time,0.5)Query and analytic results

Example 2: Sort the request_time column and return the request_time values at the 10th, 20th, and 70th percentiles.
Query statement
*| SELECT approx_percentile(request_time,array[0.1,0.2,0.7])Query and analytic results

Example 3: This example sorts the request_time column based on the product of the request_time value and its weight, and then returns the approximate 50th percentile value from the request_time field. The weight is 100 if the request_time value is less than 20, and 10 otherwise.
Query statement
* | SELECT approx_percentile( request_time,case when request_time < 20 then 100 else 10 end, 0.5 )Query and analytic results

Example 4: Sort the request_time column by the product of request_time and its weight, and then return the approximate values of request_time at the 80th and 90th percentiles. The weight is 100 if request_time is less than 20. Otherwise, the weight is 10.
Query statement
* | SELECT approx_percentile( request_time,case when request_time < 20 then 100 else 10 end, array [0.8,0.9] )Query and analytic results

Example 5: Sort the request_time column by the product of request_time and its weight, and then return the approximate value of the request_time field at the 50th percentile with an accuracy of 0.2, where the weight is 100 if request_time is less than 20 and 10 otherwise.
Query statement
* | SELECT approx_percentile( request_time,case when request_time < 20 then 100 else 10 end, 0.5, 0.2 )Query and analytic results

numeric_histogram function
The numeric_histogram function calculates an approximate histogram for x. The result is returned in JSON format.
Syntax
Calculates an approximate histogram for x based on a specified number of buckets.
numeric_histogram(bucket, x)Calculates an approximate histogram for x based on a specified number of buckets. You can also specify a weight for x.
numeric_histogram(bucket, x, weight)
Parameters
Parameter | Description |
bucket | The number of columns in the histogram. The value must be of the bigint type. |
x | The value must be of the double type. |
weight | The weight. The value must be an integer greater than 0. If you set a weight, the system groups the data based on the product of x and the weight. |
Return value type
JSON
Examples
Example 1: Calculate an approximate histogram of request durations for POST requests.
Query statement
request_method:POST | SELECT numeric_histogram(10,request_time)Query and analytic results

Example 2: Calculate a weighted, approximate histogram of request durations for POST requests. The weight is based on the request_time value.
Query statement
request_method:POST| SELECT numeric_histogram(10, request_time,case when request_time<20 then 100 else 10 end)Query and analytic results

numeric_histogram_u function
The numeric_histogram_u function calculates an approximate histogram for x. The result is returned in a multi-row, multi-column format.
Syntax
numeric_histogram_u(bucket, x)Parameters
Parameter | Description |
bucket | The number of columns in the histogram. The value must be of the bigint type. |
x | The value must be of the double type. |
Return value type
double
Examples
Calculate an approximate histogram of request durations for POST requests.
Query statement
request_method:POST | select numeric_histogram_u(10,request_time)Query and analytic results

approx_most_frequent function
Calculates the approximate frequencies of the k most frequent values in column x.
Syntax
approx_most_frequent(k, x)Parameters
Parameter | Description |
k | The number of most frequent values to return. For example, a value of 5 indicates that the function returns the approximate frequencies of the top 5 most frequent values. |
x | The value must be of the varchar type. |
Return value type
map(varchar, bigint)
Example
Retrieve the three most frequent values in the content field.
Sample data
content: 'A' 'B' 'A' 'C' 'A' 'B' 'C' 'D' 'E'Query statement
select approx_most_frequent(3, content)Output
