STDDEV_SAMP calculates the sample standard deviation of a set of non-NULL values. It works as both an aggregate function and a window function.
To calculate the population standard deviation instead, use STDDEV_POP.
Syntax
Aggregate function
DOUBLE STDDEV_SAMP(DOUBLE <colname>)
DECIMAL STDDEV_SAMP(DECIMAL <colname>)Window function
DOUBLE STDDEV_SAMP([DISTINCT] <expr>) OVER ([partition_clause] [orderby_clause] [frame_clause])
DECIMAL STDDEV_SAMP([DISTINCT] <expr>) OVER ([partition_clause] [orderby_clause] [frame_clause])Parameters
Aggregate function
| Parameter | Required | Description |
|---|---|---|
colname | Yes | The column to calculate the sample standard deviation for. Must be DOUBLE or DECIMAL. STRING and BIGINT values are implicitly converted to DOUBLE. |
Window function
| Parameter | Required | Description |
|---|---|---|
expr | Yes | The expression to calculate the sample standard deviation for. Must evaluate to DOUBLE or DECIMAL. STRING and BIGINT values are implicitly converted to DOUBLE; other types cause an error. |
DISTINCT | No | When specified, calculates the sample standard deviation of unique values only. |
partition_clause, orderby_clause, frame_clause | No | Window definition. For details, see Window functions. |
Return value
Aggregate function — return type by input type:
| Input type | Return type |
|---|---|
| TINYINT | DOUBLE |
| SMALLINT | DOUBLE |
| INT | DOUBLE |
| BIGINT | DOUBLE |
| FLOAT | DOUBLE |
| DOUBLE | DOUBLE |
| DECIMAL | DECIMAL |
Window function — the return type matches the type of expr.
Usage notes
STDDEV_SAMPis supported only inSELECTstatements when used as a window function.Window functions cannot contain nested window functions or nested aggregate functions.
Window functions and aggregate functions of the same level cannot be used in the same query.
NULL rows are excluded from the calculation in both aggregate and window mode.
If all input values are NULL, the function returns NULL.
If the input contains exactly one non-NULL value, the function returns
0.
If you run SET odps.sql.bigquery.compatible=true; to enable BigQuery compatibility mode, a window with exactly one non-NULL value returns NULL instead of 0.
When
STDDEV_SAMPis used as a window function without anORDER BYclause, the function calculates the sample standard deviation over all rows in the partition and returns the same value for every row in that partition.When used with
ORDER BY, the function calculates a cumulative sample standard deviation from the first row of the partition up to the current row.
Sample data
The examples in the next section use an emp table. Create the table and load the sample data with the following statements:
CREATE TABLE IF NOT EXISTS emp
(empno BIGINT,
ename STRING,
job STRING,
mgr BIGINT,
hiredate DATETIME,
sal BIGINT,
comm BIGINT,
deptno BIGINT);
TUNNEL UPLOAD emp.txt emp; -- Replace emp.txt with the actual path to your data file.The emp.txt file contains the following data:
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10Examples
Example 1: Sample standard deviation per department (no ORDER BY)
The window is partitioned by deptno with no ORDER BY clause. Because there is no ordering, the function calculates the standard deviation over all rows in each partition and returns the same value for every row in that partition.
SELECT deptno, sal, STDDEV_SAMP(sal) OVER (PARTITION BY deptno) FROM emp;Result:
+------------+------------+--------------------+
| deptno | sal | _c2 |
+------------+------------+--------------------+
| 10 | 1300 | 1693.7138680032904 |
| 10 | 2450 | 1693.7138680032904 |
| 10 | 5000 | 1693.7138680032904 |
| 10 | 1300 | 1693.7138680032904 |
| 10 | 5000 | 1693.7138680032904 |
| 10 | 2450 | 1693.7138680032904 |
| 20 | 3000 | 1123.3320969330487 |
| 20 | 3000 | 1123.3320969330487 |
| 20 | 800 | 1123.3320969330487 |
| 20 | 1100 | 1123.3320969330487 |
| 20 | 2975 | 1123.3320969330487 |
| 30 | 1500 | 668.331255192114 |
| 30 | 950 | 668.331255192114 |
| 30 | 1600 | 668.331255192114 |
| 30 | 1250 | 668.331255192114 |
| 30 | 1250 | 668.331255192114 |
| 30 | 2850 | 668.331255192114 |
+------------+------------+--------------------+Example 2: Cumulative sample standard deviation per department (with ORDER BY)
The window is partitioned by deptno and ordered by sal. The function calculates a running sample standard deviation from the first row of the partition through the current row, so the value increases as more rows are included.
SELECT deptno, sal, STDDEV_SAMP(sal) OVER (PARTITION BY deptno ORDER BY sal) FROM emp;Result:
+------------+------------+------------------------+
| deptno | sal | _c2 |
+------------+------------+------------------------+
| 10 | 1300 | 0.0 |
| 10 | 1300 | 0.0 |
| 10 | 2450 | 663.9528095680697 |
| 10 | 2450 | 663.9528095680696 |
| 10 | 5000 | 1511.2081259707413 |
| 10 | 5000 | 1693.7138680032904 |
| 20 | 800 | 0.0 |
| 20 | 1100 | 212.13203435596427 |
| 20 | 2975 | 1178.7175234126282 |
| 20 | 3000 | 1182.7536725793752 |
| 20 | 3000 | 1123.3320969330487 |
| 30 | 950 | 0.0 |
| 30 | 1250 | 212.13203435596427 |
| 30 | 1250 | 173.20508075688772 |
| 30 | 1500 | 225.0 |
| 30 | 1600 | 253.4758371127315 |
| 30 | 2850 | 668.331255192114 |
+------------+------------+------------------------+Example 3: Sample standard deviation across all employees
SELECT STDDEV_SAMP(sal) FROM emp;Result:
+--------------------+
| _c0 |
+--------------------+
| 1301.6180541247609 |
+--------------------+Example 4: Sample standard deviation per department using GROUP BY
SELECT deptno, STDDEV_SAMP(sal) FROM emp GROUP BY deptno;Result:
+------------+--------------------+
| deptno | _c1 |
+------------+--------------------+
| 10 | 1693.7138680032901 |
| 20 | 1123.3320969330487 |
| 30 | 668.3312551921141 |
+------------+--------------------+