All Products
Search
Document Center

Simple Log Service:Approximate functions

Last Updated:Aug 09, 2025

This topic describes the basic syntax and provides examples of approximate functions.

Simple Log Service supports the following approximate functions.

Important If you want to use strings in analytic statements, you must enclose strings in single quotation marks (''). Strings that are not enclosed or enclosed in double quotation marks ("") indicate field names or column names. For example, 'status' indicates the status string, and status or "status" indicates the status log field.

Function name

Syntax

Description

SQL support

SPL support

approx_distinct function

approx_distinct(x)

Calculates the approximate number of distinct values in x. The default standard error is 2.3%.

×

approx_distinct(x, e)

Calculates the approximate number of distinct values in x and lets you specify a custom standard error.

×

approx_percentile function

approx_percentile(x, percentage)

Sorts x in ascending order and returns the approximate value of x at the specified percentage. The result is an approximation, and its stability and consistency are not guaranteed.

×

approx_percentile(x, array[percentage01, percentage02...])

Sorts x in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The result is an approximation, and its stability and consistency are not guaranteed.

×

approx_percentile(x, weight, percentage)

Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. The result is an approximation, and its stability and consistency are not guaranteed.

×

approx_percentile(x, weight, array[percentage01, percentage02...])

Sorts the product of x and its weight in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The result is an approximation, and its stability and consistency are not guaranteed.

×

approx_percentile(x, weight, percentage, accuracy)

Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. Lets you set the accuracy of the return value. The result is an approximation, and its stability and consistency are not guaranteed.

×

numeric_histogram function

numeric_histogram(bucket, x)

Calculates an approximate histogram for x based on the number of buckets (histogram columns). The result is returned in JSON format.

×

numeric_histogram(bucket, x, weight)

Calculates an approximate histogram for x based on the number of buckets (histogram columns). The result is returned in JSON format. Lets you set a weight for x.

×

numeric_histogram_u function

numeric_histogram_u(bucket, x)

Calculates an approximate histogram for x based on the number of buckets (histogram columns). The result is returned in a multi-row, multi-column format.

×

approx_most_frequent function

approx_most_frequent(k, x)

Calculates the approximate frequencies of the k most frequent values in column x. The result is returned as a MAP type.

×

approx_distinct function

The approx_distinct function calculates the approximate number of distinct values in x.

Syntax

  • Calculates the approximate number of distinct values in x. The default standard error is 2.3%.

    approx_distinct(x)
  • Calculates the approximate number of distinct values in x and lets you specify a custom standard error.

    approx_distinct(x, e)

Parameters

Parameter

Description

x

The value can be of any data type.

e

The custom standard error. The value must be in the range of [0.0115, 0.26].

Return value type

bigint

Examples

  • Example 1: Use the count function to calculate page views (PVs) and the approx_distinct function to calculate the approximate number of unique visitors (UVs) based on the distinct values of the client_ip field. The standard error is 2.3%.

    • Query statement

      * |SELECT count(*) AS PV, approx_distinct(client_ip) AS UV
    • Query and analytic resultsapprox_distinct

  • Example 2: Use the count function to calculate PVs and the approx_distinct function to calculate the approximate number of UVs based on the distinct values of the client_ip field. The custom standard error is 10%.

    • Query statement

      * |SELECT count(*) AS PV, approx_distinct(client_ip,0.1) AS UV
    • Query and analytic resultsapprox_distinct

approx_percentile function

The approx_percentile function sorts x in ascending order and returns the approximate value at the specified percentage. Because the result is an approximation, its stability and consistency are not guaranteed.

Syntax

  • Sorts x in ascending order and returns the approximate value of x at the specified percentage. The return value is a double.

    approx_percentile(x, percentage)
  • Sorts x in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The return value is of the array(double,double) type.

    approx_percentile(x, array[percentage01, percentage02...])
  • Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. The return value is a double.

    approx_percentile(x, weight, percentage)
  • Sorts the product of x and its weight in ascending order and returns the approximate values of x at the specified percentages, such as percentage01 and percentage02. The return value is of the array(double,double) type.

    approx_percentile(x, weight, array[percentage01, percentage02...])
  • Sorts the product of x and its weight in ascending order and returns the approximate value of x at the specified percentage. The return value is a double. You can also set the accuracy of the return value.

    approx_percentile(x, weight, percentage, accuracy)

Parameters

Parameter

Description

x

The value must be of the double type.

percentage

The percentage value. The value must be in the range of [0, 1].

accuracy

The accuracy. The value must be in the range of (0, 1).

weight

The weight. The value must be an integer greater than 1.

If you set a weight, the system sorts the data based on the product of x and the weight.

Return value type

double or array(double,double)

Examples

  • Example 1: Sorts the request_time column and returns the approximate value of the request_time field at the 50th percentile.

    • Query statement

      *| SELECT approx_percentile(request_time,0.5)
    • Query and analytic resultsapprox_percentile

  • Example 2: Sort the request_time column and return the request_time values at the 10th, 20th, and 70th percentiles.

    • Query statement

      *| SELECT approx_percentile(request_time,array[0.1,0.2,0.7])
    • Query and analytic resultsapprox_percentile

  • Example 3: This example sorts the request_time column based on the product of the request_time value and its weight, and then returns the approximate 50th percentile value from the request_time field. The weight is 100 if the request_time value is less than 20, and 10 otherwise.

    • Query statement

      * |
      SELECT
        approx_percentile(
          request_time,case
            when request_time < 20 then 100
            else 10
          end,
          0.5
        )
    • Query and analytic resultsapprox_percentile

  • Example 4: Sort the request_time column by the product of request_time and its weight, and then return the approximate values of request_time at the 80th and 90th percentiles. The weight is 100 if request_time is less than 20. Otherwise, the weight is 10.

    • Query statement

      * |
      SELECT
        approx_percentile(
          request_time,case
            when request_time < 20 then 100
            else 10
          end,
          array [0.8,0.9]
        )
    • Query and analytic resultsapprox_percentile

  • Example 5: Sort the request_time column by the product of request_time and its weight, and then return the approximate value of the request_time field at the 50th percentile with an accuracy of 0.2, where the weight is 100 if request_time is less than 20 and 10 otherwise.

    • Query statement

      * |
      SELECT
        approx_percentile(
          request_time,case
            when request_time < 20 then 100
            else 10
          end,
          0.5,
          0.2
        )
    • Query and analytic resultsapprox_percentile

numeric_histogram function

The numeric_histogram function calculates an approximate histogram for x. The result is returned in JSON format.

Syntax

  • Calculates an approximate histogram for x based on a specified number of buckets.

    numeric_histogram(bucket, x)
  • Calculates an approximate histogram for x based on a specified number of buckets. You can also specify a weight for x.

    numeric_histogram(bucket, x, weight)

Parameters

Parameter

Description

bucket

The number of columns in the histogram. The value must be of the bigint type.

x

The value must be of the double type.

weight

The weight. The value must be an integer greater than 0.

If you set a weight, the system groups the data based on the product of x and the weight.

Return value type

JSON

Examples

  • Example 1: Calculate an approximate histogram of request durations for POST requests.

    • Query statement

      request_method:POST | SELECT numeric_histogram(10,request_time)
    • Query and analytic resultsnumeric_histogram

  • Example 2: Calculate a weighted, approximate histogram of request durations for POST requests. The weight is based on the request_time value.

    • Query statement

      request_method:POST| SELECT numeric_histogram(10, request_time,case when request_time<20 then 100 else 10 end)
    • Query and analytic resultsnumeric_histogram

numeric_histogram_u function

The numeric_histogram_u function calculates an approximate histogram for x. The result is returned in a multi-row, multi-column format.

Syntax

numeric_histogram_u(bucket, x)

Parameters

Parameter

Description

bucket

The number of columns in the histogram. The value must be of the bigint type.

x

The value must be of the double type.

Return value type

double

Examples

Calculate an approximate histogram of request durations for POST requests.

  • Query statement

    request_method:POST | select numeric_histogram_u(10,request_time)
  • Query and analytic resultsnumeric_histogram_u

approx_most_frequent function

Calculates the approximate frequencies of the k most frequent values in column x.

Syntax

approx_most_frequent(k, x)

Parameters

Parameter

Description

k

The number of most frequent values to return. For example, a value of 5 indicates that the function returns the approximate frequencies of the top 5 most frequent values.

x

The value must be of the varchar type.

Return value type

map(varchar, bigint)

Example

Retrieve the three most frequent values in the content field.

  • Sample data

    content: 
    'A'
    'B'
    'A'
    'C'
    'A'
    'B'
    'C'
    'D'
    'E'
  • Query statement

    select approx_most_frequent(3, content)
  • Output

    image