All Products
Search
Document Center

Hologres:Hologres general-purpose aggregate functions

Last Updated:Oct 18, 2025

Hologres is compatible with PostgreSQL and supports the standard PostgreSQL syntax for data development. This topic describes the general-purpose aggregate functions supported by Hologres.

Hologres supports a subset of the functions available in PostgreSQL. For more information about how to use these functions, see Aggregate Functions. The following table lists the general-purpose aggregate functions that Hologres supports.

Function

Description

APPROX_COUNT_DISTINCT

Calculates the number of rows that have distinct values in a column. The result of this function is an approximate value.

APPROX_PERCENTILE

Calculates the approximate percentile of a numeric column.

Note

Hologres supports APPROX_PERCENTILE from V3.1 and later.

ARRAY_AGG

Aggregates the values of an expression into an array.

AVG

Calculates the average of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.

BIT_AND

Performs bitwise AND operations on the non-NULL values returned by a BIGINT or an INT expression.

BIT_OR

Performs bitwise OR operations on the non-NULL values returned by a BIGINT or an INT expression.

BOOL_AND

Checks whether all values returned by a BOOLEAN expression are TRUE.

BOOL_OR

Checks whether any value returned by a BOOLEAN expression is TRUE.

CORR

Calculates the correlation coefficient between two variables.

COUNT

Gets the number of rows in a specified table or the number of input rows from a BIGINT or NUMERIC expression.

COVAR_POP

Calculates the population covariance.

COVAR_SAMP

Calculates the sample covariance.

EVERY

Checks whether all values of a BOOLEAN expression are TRUE.

MAX

Gets the maximum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

MAX_BY

Compares the values in a column (column y) to find the maximum value, and returns the value of another specified column (column x) from the same row.

MIN

Gets the minimum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

MIN_BY

Compares the values in a column (column y) to find the minimum value, and returns the value of another specified column (column x) from the same row.

REGR_AVGX

Calculates the average of an independent variable (X).

REGR_AVGY

Calculates the average of a dependent variable (Y).

REGR_COUNT

Calculates the number of rows where neither input parameter is NULL.

REGR_R2

Calculates the square of the correlation coefficient.

REGR_INTERCEPT

Calculates the y-intercept of the least-squares-fit linear equation determined by the (Y, X) pairs.

REGR_SLOPE

Calculates the slope of the least-squares-fit linear equation.

REGR_SXX

Calculates the sum of squares of the independent variable (X), as sum(X^2) - sum(X)^2/N.

REGR_SXY

Calculates the sum of products of the independent variable (X) and dependent variable (Y), as sum(X*Y) - sum(X) * sum(Y)/N.

SUM

Gets the sum of all values from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

STDDEV

Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

STDDEV_POP

Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

STDDEV_SAMP

Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

STRING_AGG

Concatenates the non-NULL values returned by a TEXT expression into a string using a specified separator.

UNIQ

Calculates the number of rows that have distinct values in a column.

VARIANCE

Calculates the sample variance of the values returned by an INT or NUMERIC expression.

VAR_POP

Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.

VAR_SAMP

Calculates the sample variance of the values returned by an INT, FLOAT8, or NUMERIC expression.

Sample data

-- Create a table.
CREATE TABLE example_table(
    c1 INT,           
    c2 BOOLEAN,      
    c3 VARCHAR(10),  
    c4 DECIMAL(3, 1),
    c5 TEXT,
    c6 FLOAT8,
    c7 FLOAT8
);

-- Insert data.
INSERT INTO example_table (c1, c2, c3, c4, c5, c6, C7) VALUES
(1, true, 'a', 1.1, 3, 6, 7),
(2, false, 'b', 2.2, 4, 6, 2.6);

APPROX_COUNT_DISTINCT

  • Description: Calculates the number of rows that have distinct values in a column. This function returns a single, approximate value.

    APPROX_COUNT_DISTINCT ( <column> )
  • Parameters

    column: Required. Specifies the column for which you want to calculate the approximate number of distinct rows.

  • Usage notes

    The APPROX_COUNT_DISTINCT function uses HyperLogLog cardinality estimation to perform an inexact COUNT DISTINCT calculation. An inexact COUNT DISTINCT calculation improves query performance, especially when the column has many discrete values. The average error rate is typically between 0.1% and 1%. This function is suitable for performance-sensitive scenarios where some degree of error is acceptable. You can adjust the error rate using the following parameter.

    SET hg_experimental_approx_count_distinct_precision = 20;
    • The value ranges from 12 to 20. The default value is 17.

    • The precision parameter specifies the number of bits for bucketing in the HyperLogLog algorithm. A larger value indicates more buckets and higher theoretical precision.

    • A higher precision value also increases the computation time and memory overhead. However, the overhead is much lower than that of an exact COUNT DISTINCT ( column ) statement. Therefore, we recommend that you use APPROX_COUNT_DISTINCT instead of COUNT DISTINCT ( column ).

    • When the precision parameter is set to a value greater than 17, Hologres uses the HyperLogLog++ algorithm. This algorithm corrects the return value to further reduce and stabilize the error.

      For example, if you set hg_experimental_approx_count_distinct_precision to 20, the error rate can be reduced to a range of 0.01% to 0.2% in most cases.

    Note

    You can also perform an exact COUNT DISTINCT calculation using the COUNT DISTINCT ( column ) function. This function consumes more resources.

  • Example

    The following example uses the ORDERS table from the TPC-H public dataset provided by MaxCompute. You can run the following SQL statements directly without any additional preparation.

    -- Create a foreign table.
    IMPORT FOREIGN SCHEMA public_data
    LIMIT TO (ORDERS_10g)
    FROM SERVER odps_server INTO public options (if_table_exist 'update');
    
    -- Create an internal table.
    BEGIN;
    CREATE TABLE ORDERS (
        O_ORDERKEY bigint NOT NULL PRIMARY KEY,
        O_CUSTKEY int NOT NULL,
        O_ORDERSTATUS text NOT NULL,
        O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
        O_ORDERDATE timestamptz NOT NULL,
        O_ORDERPRIORITY text NOT NULL,
        O_CLERK text NOT NULL,
        O_SHIPPRIORITY int NOT NULL,
        O_COMMENT text NOT NULL
    );
    
    CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
    CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
    CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
    CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
    
    COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
    COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
    COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
    COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
    COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
    COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
    COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
    COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
    COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';
    
    COMMIT;
    
    -- Import data to the internal table.
    INSERT INTO ORDERS SELECT * FROM ORDERS_10g;
    • Calculate the approximate number of distinct rows in the O_CUSTKEY column.

      SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned.

      approx_count_distinct
      -----------------------
                     1000422
    • Set the global precision to 20 and calculate the approximate number of distinct rows in the O_CUSTKEY column.

      ALTER DATABASE dbname SET hg_experimental_approx_count_distinct_precision = 20;
      SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned.

      approx_count_distinct
      -----------------------
                     1000422
    • Set the session-level precision to 20 and calculate the approximate number of distinct rows in the O_CUSTKEY column.

      --Set the session-level precision to 20.
      SET hg_experimental_approx_count_distinct_precision = 20;
      SELECT APPROX_COUNT_DISTINCT ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned.

       approx_count_distinct
      -----------------------
                      998854

APPROX_PERCENTILE

Note

Hologres supports APPROX_PERCENTILE from V3.1 and later.

  • Description: Calculates the approximate percentile of a numeric column. This function returns a single, approximate value.

    APPROX_PERCENTILE (fraction) WITHIN GROUP (ORDER BY sort_expression)
  • Parameters

    • fraction: A constant between 0 and 1. For example, 0.5 represents the median.

    • sort_expression: Specifies the column for which to calculate the percentile. This column must be sortable.

  • Usage notes

    The APPROX_PERCENTILE function uses a reservoir sampling algorithm. It samples a maximum of 8,192 elements and then calculates the percentile from these samples.

  • Example

    The following example calculates the approximate count of distinct values in the O_CUSTKEY column.

    SELECT
      APPROX_PERCENTILE(0.25) WITHIN GROUP (ORDER BY salary) AS q1,
      APPROX_PERCENTILE(0.5)  WITHIN GROUP (ORDER BY salary) AS median,
      APPROX_PERCENTILE(0.75) WITHIN GROUP (ORDER BY salary) AS q3
    FROM employees;

ARRAY_AGG

  • Description: Concatenates the values of an expression into an array.

    ARRAY_AGG(expression)
  • Parameters

    expression: Required. An expression of any data type except JSON, JSONB, TIMETZ, INTERVAL, INET, OID, UUID, and array types.

  • Return value

    Returns a value of the ARRAY type.

  • Example

    SELECT ARRAY_AGG(c1) FROM example_table;

    The following result is returned.

     array_agg
    -----------
     {2,1}

AVG

  • Description: Calculates the average of the non-NULL values returned by a BIGINT, FLOAT8, FLOAT4, or INT expression.

    AVG(expression)
  • Parameters

    expression: Required. A BIGINT, FLOAT8, FLOAT4, or INT expression.

  • Return value

    The return value is the same data type as the parameter.

  • Example

    SELECT AVG(c7) FROM example_table;

    The following result is returned.

       avg
    ----------
     1.500000

BIT_AND

  • Description: Performs bitwise AND operations on the non-NULL values returned by a BIGINT or INT expression.

    BIT_AND(expression)
  • Parameters

    expression: Required. A BIGINT or INT expression.

  • Return value

    The return value is the same data type as the parameter.

  • Example

    SELECT BIT_AND(c1) FROM example_table;

    The result is as follows.

     bit_and
    ---------
           0

BIT_OR

  • Description: Performs bitwise OR operations on the non-NULL values returned by a BIGINT or INT expression.

    BIT_OR(expression)
  • Parameters

    expression: Required. A BIGINT or INT expression.

  • Return value

    The return value is the same data type as the parameter.

  • Example

    SELECT BIT_OR(c1) FROM example_table;

    The following result is returned.

     bit_or
    --------
          3

BOOL_AND

  • Description: Checks whether all values returned by a BOOLEAN expression are TRUE.

    BOOL_AND(bool)
  • Parameters

    bool: Required. A BOOLEAN expression.

  • Return value

    Returns a value of the BOOLEAN type. This function returns TRUE (t) if all values returned by the expression are true, and FALSE (f) otherwise.

  • Example

    SELECT BOOL_AND(c2) FROM example_table;

    The following result is returned.

     bool_and
    ----------
     f

BOOL_OR

  • Description: Checks whether any value returned by a BOOLEAN expression is TRUE.

    BOOL_OR(bool)
  • Parameters

    bool: Required. A BOOLEAN expression.

  • Return value

    Returns a value of the BOOLEAN type. This function returns TRUE (t) if any value returned by the expression is true, and FALSE (f) otherwise.

  • Example

    SELECT BOOL_OR(c2) FROM example_table;

    The following result is returned.

     bool_or
    ---------
     t

CORR

  • Description: Calculates the correlation coefficient between two variables.

    CORR(Y, X)
    Note

    The parameters must be of the DOUBLE PRECISION, FLOAT, or FLOAT8 data type.

  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT CORR(c6, c7) FROM example_table;

    The following result is returned.

     corr
    ------
    

COUNT

  • Description: Retrieves the number of rows in a specified table or the number of input rows from a BIGINT or NUMERIC expression.

    • Retrieves the number of rows in a specified table.

      COUNT(*)
    • Retrieves the number of input rows from a BIGINT or NUMERIC expression.

      COUNT(expression)

      expression: Required. A non-NULL expression of the BIGINT or NUMERIC data type.

  • Return value

    Returns a value of the BIGINT type.

  • Example

    SELECT COUNT(*) FROM example_table;

    The following result is returned.

     count
    -------
         2

COVAR_POP

  • Description: Calculates the population covariance.

    COVAR_POP(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT COVAR_POP(c6, c7) FROM example_table;

    The following result is returned.

          covar_pop
    ---------------------
     3.5527136788005e-15

COVAR_SAMP

  • Description: Calculates the sample covariance.

    COVAR_SAMP(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT COVAR_SAMP(c6, c7) FROM example_table;

    The following result is returned.

         covar_samp
    --------------------
     7.105427357601e-15

EVERY

  • Description: Checks whether all values of a BOOLEAN expression are TRUE.

    EVERY(bool)
  • Parameters

    bool: Required. A BOOLEAN expression.

  • Return value

    Returns a value of the BOOLEAN type. This function returns TRUE (t) if all values of the expression are true, and FALSE (f) otherwise.

  • Example

    SELECT EVERY(c2) FROM example_table;

    The following result is returned.

     every
    -------
     f

MAX

  • Description: Retrieves the maximum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

    MAX(expression)
  • Parameters

    expression: Required. A BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

  • Return value

    The return value is the same data type as the parameter.

  • Example

    SELECT MAX(c1) FROM example_table;

    The following result is returned.

     max
    -----
       2

MAX_BY

  • Description: Compares the values in a column (column y) to find the maximum value and returns the value of another specified column (column x) from the corresponding row.

    MAX_BY(x, y);
    Note

    Numeric types are compared based on their numeric value. Non-numeric types are compared based on their dictionary sort order.

  • Parameters

    • y: Required. Specifies the column from which to retrieve the maximum value. Rows where column y is NULL are not included in the calculation.

    • x: Required. Specifies the column from which to retrieve a value that corresponds to the maximum value in column y.

  • Usage notes

    The MAX_BY function is supported only in Hologres V1.3.36 and later.

    Note

    If your instance version is earlier than V1.3.36, you can join the Hologres DingTalk group to request an instance upgrade. For more information, see How to obtain more online support?

  • Return value

    • The function returns the value from column x that corresponds to the maximum value in column y.

    • If all values in column y are NULL, the function returns NULL.

    • If the MAX_BY function finds multiple rows with the maximum value in column y, it returns the maximum value from the corresponding values in column x.

  • Example

    --Sample data
    DROP TABLE IF EXISTS test;
    CREATE TABLE IF NOT EXISTS test (
        id INT,
        name TEXT,
        cost INT
    );
    INSERT INTO test
        VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

    Query for the value in the name column that corresponds to the maximum value in the cost column. In this example, multiple maximum values exist.

    SELECT max_by(name, cost) FROM test;

    The following result is returned.

     max_by
    --------
     aaa

MIN

  • Description: Retrieves the minimum value from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

    MIN(expression)
  • Parameters

    expression: Required. A BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC type.

  • Return value

    The return value is the same data type as the parameter.

  • Example

    SELECT MIN(c1) FROM example_table;

    The following result is returned.

     min
    -----
       1

MIN_BY

  • Description: Compares the values in a column (column y) to find the minimum value and returns the value of another specified column (column x) from the corresponding row.

    MIN_BY(x, y);
    Note

    Numeric types are compared based on their numeric value. Non-numeric types are compared based on their dictionary sort order.

  • Parameters

    • y: Required. Specifies the column from which to retrieve the minimum value. Rows where column y is NULL are not included in the calculation.

    • x: Required. Specifies the column from which to retrieve a value that corresponds to the minimum value in column y.

  • Usage notes

    The MAX_BY and MIN_BY functions are supported only in Hologres V1.3.36 and later.

    Note

    If your instance version is earlier than V1.3.36, you can join the Hologres DingTalk group to request an instance upgrade. For more information, see How to obtain more online support?

  • Return value

    • The function returns the value from column x that corresponds to the minimum value in column y.

    • If all values in column y are NULL, the function returns NULL.

    • If the MIN_BY function finds multiple rows with the minimum value in column y, it returns the minimum value from the corresponding values in column x.

  • Example

    --Sample data
    DROP TABLE IF EXISTS test;
    CREATE TABLE IF NOT EXISTS test (
        id INT,
        name TEXT,
        cost INT
    );
    INSERT INTO test
        VALUES (1, 'a', 100), (1, 'aa', 200), (1, 'aaa', 300), (2, 'b', 150), (2, 'bb', 300), (3, 'c', 150), (3, 'cc', 50);

    Query for the value in the name column that corresponds to the minimum value in the cost column.

    SELECT MIN_BY(name, cost) FROM test;

    The following result is returned.

    min_by
    --------
     cc
    (1 row)

REGR_AVGX

  • Description: Calculates the average of an independent variable (X).

    REGR_AVGX(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT REGR_AVGX(c6, c7) FROM example_table;

    The following result is returned.

     regr_avgx
    -----------
           4.8

REGR_AVGY

  • Description: Calculates the average of a dependent variable (Y).

    REGR_AVGY(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT REGR_AVGY(c6, c7) FROM example_table;

    The following result is returned.

     regr_avgy
    -----------
             6

REGR_COUNT

  • Description: Calculates the number of rows where neither input parameter is NULL.

    REGR_COUNT(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    Returns a value of the BIGINT type.

  • Example

    SELECT REGR_COUNT(c6, c7) FROM example_table;

    The following result is returned.

     regr_count
    ------------
              2

REGR_R2

  • Description: Calculates the square of the correlation coefficient.

    REGR_R2(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT REGR_R2(c6, c7) FROM example_table;

    The following result is returned.

     regr_r2
    ---------
           1

REGR_INTERCEPT

  • Description: Calculates the y-intercept of the least-squares-fit linear equation determined by the (Y, X) pairs.

    REGR_INTERCEPT(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT REGR_INTERCEPT(c6, c7) FROM example_table;

    The following result is returned.

     regr_intercept
    ----------------
                  6

REGR_SLOPE

  • Description: Calculates the slope of the least-squares-fit linear equation.

    REGR_SLOPE(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT REGR_SLOPE(c6, c7) FROM example_table;

    The following result is returned.

          regr_slope
    ----------------------
     7.34031751818285e-16

REGR_SXX

  • Description: Calculates the sum of squares of the independent variable (X), as sum(X^2) - sum(X)^2/N.

    REGR_SXX(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT REGR_SXX(c6, c7) FROM example_table;

    The following result is returned.

     regr_sxx
    ----------
         9.68

REGR_SXY

  • Description: Calculates the sum of products of the independent variable (X) and dependent variable (Y), as sum(X × Y) - sum(X) × sum(Y)/N.

    REGR_SXY(Y, X)
  • Parameters

    Y and X: Required. DOUBLE PRECISION, FLOAT, or FLOAT8 expressions.

  • Return value

    The return value is the same data type as the parameters.

  • Example

    SELECT REGR_SXY(c6, c7) FROM example_table;

    The following result is returned.

          regr_sxy
    --------------------
     7.105427357601e-15

SUM

  • Description: Calculates the sum of all values from a BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

    SUM(expression)
  • Parameters

    expression: Required. A BIGINT, FLOAT8, FLOAT4, INT, or NUMERIC expression.

  • Return value

    The return value is the same data type as the parameter.

  • Example

    SELECT SUM(c1) FROM example_table;

    The following result is returned.

     sum
    -----
       3

STDDEV

  • Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

    STDDEV(expression)
  • Parameters

    expression: Required. An INT, NUMERIC, or FLOAT8 expression.

  • Return value

    Returns a value of the NUMERIC or FLOAT8 type.

  • Example

    SELECT STDDEV(c1) FROM example_table;

    The following result is returned.

        stddev
    --------------
     0.7071067811

STDDEV_POP

  • Description: Calculates the population standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

    STDDEV_POP(expression)	
  • Parameters

    expression: Required. An INT, NUMERIC, or FLOAT8 expression.

  • Return value

    Returns a value of the NUMERIC or FLOAT8 type.

  • Example

    SELECT STDDEV_POP(c1) FROM example_table;

    The following result is returned.

      stddev_pop
    --------------
     0.5000000000

STDDEV_SAMP

  • Description: Calculates the sample standard deviation of the values returned by an INT, NUMERIC, or FLOAT8 expression.

    STDDEV_SAMP(expression)	
  • Parameters

    expression: Required. An INT, NUMERIC, or FLOAT8 expression.

  • Return value

    Returns a value of the NUMERIC or FLOAT8 type.

  • Example

    SELECT STDDEV_SAMP(c1) FROM example_table;

    The following result is returned.

     stddev_samp
    --------------
     0.7071067812

STRING_AGG

  • Description: Concatenates the non-NULL values returned by a TEXT expression into a string using a specified separator.

    STRING_AGG(<expression> TEXT, <delimiter> TEXT)
  • Parameters

    • expression: Required. The source string.

    • delimiter: Required. The separator.

  • Return value

    Returns a value of the TEXT type.

  • Example

    SELECT STRING_AGG(c5, '-') FROM example_table;

    The following result is returned.

     string_agg
    ------------
     3-4

UNIQ

  • Description: Calculates the number of rows that have distinct values in a column.

    UNIQ ( < column > );
  • Parameters

    column: Required. Specifies the column for which you want to calculate the number of rows that have distinct values. This column can be of the following data types: SMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISION, TEXT, VARCHAR, TIMESTAMP, TIMESTAMPTZ, DATE, TIMETZ, and UUID.

  • Usage notes

    • The UNIQ function is supported only in Hologres V1.3 and later.

      Note

      If your instance version is earlier than V1.3, see Common errors when preparing for an upgrade or join the Hologres DingTalk group to provide feedback. For more information, see How to obtain more online support?.

    • The UNIQ function performs better than the COUNT DISTINCT function only when the SQL statement contains a GROUP BY clause and the values of the GROUP BY field are evenly distributed without data skew.

    • Typically, UNIQ performs better than COUNT DISTINCT when the GROUP BY key has high cardinality. UNIQ also consumes less memory than COUNT DISTINCT. If you encounter an out-of-memory error when you use COUNT DISTINCT, you can use UNIQ as an alternative.

    • Since Hologres V2.1, the performance of COUNT DISTINCT has been significantly optimized for various scenarios, including those with single or multiple COUNT DISTINCT functions, data skew, or SQL statements without a GROUP BY clause. You no longer need to manually rewrite queries to use UNIQ for better performance. For more information, see Count Distinct optimization.

  • Example

    The following example uses the ORDERS table from the TPC-H public dataset provided by MaxCompute. You can run the following SQL statements directly without any additional preparation.

    -- Create a foreign table.
    IMPORT FOREIGN SCHEMA public_data
    LIMIT TO (ORDERS_10g)
    FROM SERVER odps_server INTO public options (if_table_exist 'update');
    
    -- Create an internal table.
    BEGIN;
    CREATE TABLE ORDERS (
        O_ORDERKEY bigint NOT NULL PRIMARY KEY,
        O_CUSTKEY int NOT NULL,
        O_ORDERSTATUS text NOT NULL,
        O_TOTALPRICE DECIMAL(12, 2) NOT NULL,
        O_ORDERDATE timestamptz NOT NULL,
        O_ORDERPRIORITY text NOT NULL,
        O_CLERK text NOT NULL,
        O_SHIPPRIORITY int NOT NULL,
        O_COMMENT text NOT NULL
    );
    
    CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
    CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
    CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
    CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
    
    COMMENT ON COLUMN ORDERS.O_ORDERKEY IS 'Order ID';
    COMMENT ON COLUMN ORDERS.O_CUSTKEY IS 'Customer ID';
    COMMENT ON COLUMN ORDERS.O_ORDERSTATUS IS 'Order status';
    COMMENT ON COLUMN ORDERS.O_TOTALPRICE IS 'Total price';
    COMMENT ON COLUMN ORDERS.O_ORDERDATE IS 'Order date';
    COMMENT ON COLUMN ORDERS.O_ORDERPRIORITY IS 'Order priority';
    COMMENT ON COLUMN ORDERS.O_CLERK IS 'Cashier';
    COMMENT ON COLUMN ORDERS.O_SHIPPRIORITY IS 'Shipping priority';
    COMMENT ON COLUMN ORDERS.O_COMMENT IS 'Comment';
    
    COMMIT;
    
    -- Import data to the internal table.
    INSERT INTO ORDERS SELECT * FROM ORDERS_10g;
    • Calculate the number of distinct rows in the O_CUSTKEY column of the ORDERS table.

      SELECT UNIQ ( O_CUSTKEY ) FROM ORDERS;

      The following result is returned.

        uniq
      --------
       999982
    • Calculate the number of distinct O_CUSTKEY values for each O_ORDERSTATUS in the ORDERS table.

      SELECT O_ORDERSTATUS, UNIQ ( O_CUSTKEY ) FROM ORDERS GROUP BY O_ORDERSTATUS;

      The following result is returned.

       o_orderstatus |  uniq
      ---------------+--------
       P             | 313478
       F             | 996258
       O             | 996400

VARIANCE

  • Description: Calculates the sample variance of the values returned by an INT or NUMERIC expression.

    VARIANCE(expression)
  • Parameters

    expression: Required. An INT or NUMERIC expression.

  • Return value

    Returns a value of the NUMERIC type.

  • Example

    SELECT VARIANCE(c1) FROM example_table;

    The following result is returned.

       variance
    --------------
     0.5000000000

VAR_POP

  • Description: Calculates the population variance of the values returned by a FLOAT8, INT, or NUMERIC expression.

    VAR_POP(expression)
  • Parameters

    expression: Required. A FLOAT8, INT, or NUMERIC expression.

  • Return value

    Returns a value of the NUMERIC or FLOAT8 type.

  • Example

    SELECT VAR_POP(c1) FROM example_table;

    The following result is returned.

       var_pop
    --------------
     0.2500000000

VAR_SAMP

  • Description: Calculates the sample variance of the values returned by an INT, FLOAT8, or NUMERIC expression.

    VAR_SAMP(expression)
  • Parameters

    expression: Required. An INT, FLOAT8, or NUMERIC expression.

  • Return value

    Returns a value of the NUMERIC or FLOAT8 type.

  • Example

    SELECT VAR_SAMP(c1) FROM example_table;

    The following result is returned.

       var_samp
    --------------
     0.5000000000