All Products
Search
Document Center

MaxCompute:STDDEV

Last Updated:Jul 26, 2023

Calculates the population standard deviation of all input values.

Limits

Before you use window functions, take note of the following limits:

  • Window functions are supported only in SELECT statements.

  • A window function cannot contain nested window functions or nested aggregate functions.

  • You cannot use window functions together with aggregate functions of the same level.

Usage notes

MaxCompute V2.0 provides additional functions. If the functions that you use involve new data types that are supported in the MaxCompute V2.0 data type edition, you must run a SET command to enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.

  • Session level: To enable the MaxCompute V2.0 data type edition, you must add set odps.sql.type.system.odps2=true; before the SQL statement that you want to execute, and commit and execute them together.

  • Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. To enable the MaxCompute V2.0 data type edition at the project level, run the following command:

    setproject odps.sql.type.system.odps2=true;

    For more information about setproject, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type editions.

If you use an SQL statement that includes multiple aggregate functions and resources in your project are insufficient, a memory overflow issue may occur. We recommend that you optimize the SQL statement or purchase computing resources based on your business requirements.

Syntax

-- Calculate the population standard deviation of all the values in a column.
double stddev(double <colname>)
decimal stddev(decimal <colname>)

-- Calculate the population standard deviation of expr in a window.
double stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
decimal stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])

Parameters

  • colname: required. The name of a column, which can be of the DOUBLE or DECIMAL type. If the specified column is of the STRING or BIGINT type, the values in the column are implicitly converted into the DOUBLE type before calculation.

  • expr: required. The expression that is used to calculate the population standard deviation. The input values can be of the DOUBLE or DECIMAL type.

    • If an input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.

    • If the value in a row is null, this row is not used for calculation.

    • If the distinct keyword is specified, the population standard deviation of distinct values is calculated.

  • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

Return value

  • If the value of the column specified by colname in a row is null, the row is not used for calculation. The following table describes the mappings between data types of input data and return values.

    Input type

    Return value type

    TINYINT

    DOUBLE

    SMALLINT

    DOUBLE

    INT

    DOUBLE

    BIGINT

    DOUBLE

    FLOAT

    DOUBLE

    DOUBLE

    DOUBLE

    DECIMAL

    DECIMAL

  • A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are null, null is returned.

Sample data

This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample statement:

create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp;

The emp.txt file contains the following sample data:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Examples

  • Example 1: Calculate the population standard deviation of all values in the sal column. Sample statement:

    select stddev(sal) from emp;

    The following result is returned:

    +------------+
    | _c0        |
    +------------+
    | 1262.7549932628976 |
    +------------+
  • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the population standard deviation of salary values of employees in each department. Sample statement:

    select deptno, stddev(sal) from emp group by deptno;

    The following result is returned:

    +------------+------------+
    | deptno     | _c1        |
    +------------+------------+
    | 10         | 1546.1421524412158 |
    | 20         | 1004.7387720198718 |
    | 30         | 610.1001739241043 |
    +------------+------------+
  • Example 3: Use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is not specified. This function returns the cumulative population standard deviation of the current window. The current window includes the rows that have the same deptno value. Sample statement:

    select deptno, sal, stddev(sal) over (partition by deptno) from emp;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | sal        | _c2        |
    +------------+------------+------------+
    | 10         | 1300       | 1546.1421524412158 |   -- This row is the first row of this window. The return value is the cumulative population standard deviation of the values from the first row to the sixth row. 
    | 10         | 2450       | 1546.1421524412158 |   -- The return value is the cumulative population standard deviation of the values from the first row to the sixth row. 
    | 10         | 5000       | 1546.1421524412158 |
    | 10         | 1300       | 1546.1421524412158 |
    | 10         | 5000       | 1546.1421524412158 |
    | 10         | 2450       | 1546.1421524412158 |
    | 20         | 3000       | 1004.7387720198718 |
    | 20         | 3000       | 1004.7387720198718 |
    | 20         | 800        | 1004.7387720198718 |
    | 20         | 1100       | 1004.7387720198718 |
    | 20         | 2975       | 1004.7387720198718 |
    | 30         | 1500       | 610.1001739241042 |
    | 30         | 950        | 610.1001739241042 |
    | 30         | 1600       | 610.1001739241042 |
    | 30         | 1250       | 610.1001739241042 |
    | 30         | 1250       | 610.1001739241042 |
    | 30         | 2850       | 610.1001739241042 |
    +------------+------------+------------+
  • Example 4: In non-Hive-compatible mode, use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative population standard deviation of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:

    -- Disable the Hive-compatible mode. 
    set odps.sql.hive.compatible=false;
    -- Execute the following statement: 
    select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | sal        | _c2        |
    +------------+------------+------------+
    | 10         | 1300       | 0.0        |           -- This row is the first row of this window. 
    | 10         | 1300       | 0.0        |           -- The return value is the cumulative population standard deviation of the values in the first and second rows. 
    | 10         | 2450       | 542.1151989096865 |    -- The return value is the cumulative population standard deviation of the values from the first row to the third row. 
    | 10         | 2450       | 575.0      |           -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. 
    | 10         | 5000       | 1351.6656391282572 |
    | 10         | 5000       | 1546.1421524412158 |
    | 20         | 800        | 0.0        |
    | 20         | 1100       | 150.0      |
    | 20         | 2975       | 962.4188277460079 |
    | 20         | 3000       | 1024.2947268730811 |
    | 20         | 3000       | 1004.7387720198718 |
    | 30         | 950        | 0.0        |
    | 30         | 1250       | 150.0      |
    | 30         | 1250       | 141.4213562373095 |
    | 30         | 1500       | 194.8557158514987 |
    | 30         | 1600       | 226.71568097509268 |
    | 30         | 2850       | 610.1001739241042 |
    +------------+------------+------------+
  • Example 5: In Hive-compatible mode, use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative population standard deviation of the values from the first row to the row that has the same sal value as the current row in the current window. The population standard deviations for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statements:

    -- Enable the Hive-compatible mode. 
    set odps.sql.hive.compatible=true;
    -- Execute the following statement: 
    select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | sal        | _c2        |
    +------------+------------+------------+
    | 10         | 1300       | 0.0        |           -- This row is the first row of this window. The population standard deviation for the first row is the cumulative population standard deviation of the values in the first and second rows because the two rows have the same sal value. 
    | 10         | 1300       | 0.0        |           -- The return value is the cumulative population standard deviation of the values in the first and second rows. 
    | 10         | 2450       | 575.0      |           -- The population standard deviation for the third row is the cumulative population standard deviation of the values from the first row to the fourth row because the third and fourth rows have the same sal value. 
    | 10         | 2450       | 575.0      |           -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. 
    | 10         | 5000       | 1546.1421524412158 |
    | 10         | 5000       | 1546.1421524412158 |
    | 20         | 800        | 0.0        |
    | 20         | 1100       | 150.0      |
    | 20         | 2975       | 962.4188277460079 |
    | 20         | 3000       | 1004.7387720198718 |
    | 20         | 3000       | 1004.7387720198718 |
    | 30         | 950        | 0.0        |
    | 30         | 1250       | 141.4213562373095 |
    | 30         | 1250       | 141.4213562373095 |
    | 30         | 1500       | 194.8557158514987 |
    | 30         | 1600       | 226.71568097509268 |
    | 30         | 2850       | 610.1001739241042 |
    +------------+------------+------------+

Related functions

STDDEV is an aggregate function or a window function.

  • For more information about the functions that are used to calculate the average value of multiple input records and aggregate parameters, see Aggregate functions.

  • For more information about the functions that are used to calculate the sum of data of columns in a window and sort data, see Window functions.