Syntax and examples of approximate functions - Simple Log Service

This topic describes the syntax of approximate functions. This topic also provides examples on how to use the functions.

The following table describes the approximate functions that are supported by Simple Log Service.

Important If you want to use strings in analytic statements, you must enclose strings in single quotation marks (''). Strings that are not enclosed or enclosed in double quotation marks ("") indicate field names or column names. For example, 'status' indicates the status string, and status or "status" indicates the status log field.

Function	Syntax	Description	Supported in SQL	Supported in SPL
approx_distinct function	approx_distinct(x)	Estimates the number of distinct values in x. The default standard error is 2.3%.	√	×
approx_distinct function	approx_distinct(x, e)	Estimates the number of distinct values in x. You can specify a custom standard error.	√	×
approx_percentile function	approx_percentile(x, percentage)	Sorts the values of x in ascending order, and returns the value of x that is approximately at the percentage position.	√	×
	approx_percentile(x, array[percentage01, percentage02...])	Sorts the values of x in ascending order, and returns the values of x that are approximately at the percentage01 and percentage02 positions.	√	×
	approx_percentile(x, weight, percentage)	Sorts the values of x in ascending order based on the products of the values of x and the weight, and returns the value of x that is approximately at the percentage position.	√	×
	approx_percentile(x, weight, array[percentage01, percentage02...])	Sorts the values of x in ascending order based on the products of the values of x and the weight, and returns the value of x that are approximately at the percentage01 and percentage02 positions.	√	×
	approx_percentile(x, weight, percentage, accuracy)	Sorts the values of x in ascending order based on the products of the values of x and the weight, and returns the value of x that is approximately at the percentage position. You can specify the accuracy of the return value.	√	×
numeric_histogram function	numeric_histogram(bucket, x)	Returns the approximate histogram of x based on the number of histogram columns that are specified by the bucket parameter. The return value is of the JSON type.	√	×
numeric_histogram function	numeric_histogram(bucket, x, weight)	Returns the approximate histogram of x based on the number of histogram columns that are specified by the bucket parameter. The return value is of the JSON type. You can specify weights for the values of x.	√	×
numeric_histogram_u function	numeric_histogram_u(bucket, x)	Returns the approximate histogram of x based on the number of histogram columns that are specified by the bucket parameter. The return value contains multiple rows and columns.	√	×

approx_distinct function

The approx_distinct function estimates the number of distinct values in x.

Syntax

An approx_distinct function of the following syntax estimates the number of distinct values in x. The default standard error is 2.3%.
```
approx_distinct(x)
```
An approx_distinct function of the following syntax estimates the number of distinct values in x. You can specify a custom standard error.
```
approx_distinct(x, e)
```

Parameters

Parameter	Description
x	The value of this parameter is of an arbitrary data type.
e	The custom standard error. Valid values: 0.0115 to 0.26.

Return value type

The bigint type.

Examples

Example 1: Use the count function to calculate the number of page views (PVs). Then, use the approx_distinct function to estimate the unique values of the client_ip field as the number of unique visitors (UVs). The standard error is 2.3%.
- Query statement
```
* |SELECT count(*) AS PV, approx_distinct(client_ip) AS UV
```
- Query and analysis results
Example 2: Use the count function to calculate the number of PVs. Then, use the approx_distinct function to estimate the unique values of the client_ip field as the number of UVs. The standard error is 10%.
- Query statement
```
* |SELECT count(*) AS PV, approx_distinct(client_ip,0.1) AS UV
```
- Query and analysis results

approx_percentile function

The approx_percentile function sorts the values of x in ascending order, and returns the value of x that are approximately at the percentage position.

Syntax

An approx_percentile function of the following syntax sorts the values of x in ascending order, and returns the value of x that is approximately at the percentage position. The return value is of the double type.
```
approx_percentile(x, percentage)
```
An approx_percentile function of the following syntax sorts the values of x in ascending order, and returns the values of x that are approximately at the percentage01 and percentage02 positions. The return value is of the array(double,double) type.
```
approx_percentile(x, array[percentage01, percentage02...])
```
An approx_percentile function of the following syntax sorts the values of x in ascending order based on the products of the values of x and the weight, and returns the value of x that is approximately at the percentage position. The return value is of the double type.
```
approx_percentile(x, weight, percentage)
```
An approx_percentile function of the following syntax sorts the values of x in ascending order based on the products of the values of x and the weight, and returns the value of x that are approximately at the percentage01 and percentage02 positions. The return value is of the array(double,double) type.
```
approx_percentile(x, weight, array[percentage01, percentage02...])
```
An approx_percentile function of the following syntax sorts the values of x in ascending order based on the products of the values of x and the weight, and returns the value of x that is approximately at the percentage position. The return value is of the double type. You can specify the accuracy of the return value.
```
approx_percentile(x, weight, percentage, accuracy)
```

Parameters

Parameter	Description
x	The value of this parameter is of the double type.
percentage	The percentage value. Value range: [0, 1].
accuracy	The accuracy. Value range: (0, 1).
weight	The weight. A weight must be an integer that is greater than 1. After you specify a weight, the system sorts the values of x in ascending order based on the products of the values of x and the weight.

Return value type

The double or array(double,double) type.

Examples

Example 1: Sort the values of the request_time column in ascending order, and then return the value of the request_time field that is approximately at the 50% position.
- Query statement
```
*| SELECT approx_percentile(request_time,0.5)
```
- Query and analysis results
Example 2: Sort the values of the request_time column in ascending order, and then return the values of the request_time field that are approximately at the 10%, 20%, and 70% positions.
- Query statement
```
*| SELECT approx_percentile(request_time,array[0.1,0.2,0.7])
```
- Query and analysis results
Example 3: Sort the values of the request_time column in ascending order based on the products of the values of the request_time column and the weight, and return the value of the requets_time field that is approximately at the 50% position. If the value of request_time is less than 20, the weight is 100. Otherwise, the weight is 10.
- Query statement
```
* |
SELECT
  approx_percentile(
    request_time,case
      when request_time < 20 then 100
      else 10
    end,
    0.5
  )
```
- Query and analysis results
Example 4: Sort the values of the request_time column based on the products of the values of the request_time column and the weight, and return the values of the request_time field that are approximately at the 80% and 90% positions. If the value of request_time is less than 20, the weight is 100. Otherwise, the weight is 10.
- Query statement
```
* |
SELECT
  approx_percentile(
    request_time,case
      when request_time < 20 then 100
      else 10
    end,
    array [0.8,0.9]
  )
```
- Query and analysis results
Example 5: Sort the values of the request_time column based on the products of the values of the request_time and the weight, and return the value of the request_time field that is approximately at the 50% position. The accuracy is 0.2. If the value of request_time is less than 20, the weight is 100. Otherwise, the weight is 10.
- Query statement
```
* |
SELECT
  approx_percentile(
    request_time,case
      when request_time < 20 then 100
      else 10
    end,
    0.5,
    0.2
  )
```
- Query and analysis results

numeric_histogram function

The numeric_histogram function returns the approximate histogram of x based on the number of histogram columns that are specified by the bucket parameter. The return value is of the JSON type.

Syntax

A numeric_histogram function of the following syntax returns the approximate histogram of x based on the number of histogram columns that are specified by the bucket parameter.
```
numeric_histogram(bucket, x)
```
A numeric_histogram function of the following syntax returns the approximate histogram of x based on the number of histogram columns that are specified by the bucket parameter. You can specify weights for the values of x.
```
numeric_histogram(bucket, x, weight)
```

Parameters

Parameter	Description
bucket	The number of columns in the histogram. The value of this parameter is of the bigint type.
x	The value of this parameter is of the double type.
weight	The weight. A weight must be an integer that is greater than 0. After you specify a weight, the system groups the values of x based on the products of the values of x and the weight.

Return value type

The JSON type.

Examples

Example 1: Return the approximate histogram of the request duration for the POST method.
- Query statement
```
request_method:POST | SELECT numeric_histogram(10,request_time)
```
- Query and analysis results
Example 2: Group the values of the request_time field based on the products of the values of the values of the request_time field and the weight, and return the approximate histogram of the request duration for the POST method.
- Query statement
```
request_method:POST| SELECT numeric_histogram(10, request_time,case when request_time<20 then 100 else 10 end)
```
- Query and analysis results

numeric_histogram_u function

The numeric_histogram_u function returns the approximate histogram of x based on the number of histogram columns that are specified by the bucket parameter. A table that contains multiple rows and columns is returned.

Syntax

numeric_histogram_u(bucket, x)

Parameters

Parameter	Description
bucket	The number of columns in the histogram. The value of this parameter is of the bigint type.
x	The value of this parameter is of the double type.

Return value type

The double type.

Examples

Return the approximate histogram of the request duration for the POST method.

Query statement

request_method:POST | select numeric_histogram_u(10,request_time)

Query and analysis results