All Products
Search
Document Center

MaxCompute:MEDIAN

Last Updated:Jul 27, 2023

Calculates the median value of a column or the median of expr in a window.

Limits

Before you use window functions, take note of the following limits:

  • Window functions are supported only in SELECT statements.

  • A window function cannot contain nested window functions or nested aggregate functions.

  • You cannot use window functions together with aggregate functions of the same level.

Precautions

MaxCompute V2.0 provides additional functions. If the functions that you use involve new data types, you must enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.

  • Session level: To use the MaxCompute V2.0 data type edition, you must add set odps.sql.type.system.odps2=true; before the SQL statement that you want to execute, and commit and execute them together.

  • Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. Sample command:

    setproject odps.sql.type.system.odps2=true;

    For more information about setproject, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type editions.

  • A worker can contain a maximum of 2 million elements.

If you use an SQL statement that includes multiple aggregate functions and the project resources are insufficient, memory overflow may occur. We recommend that you optimize the SQL statement or purchase computing resources based on your business requirements.

Syntax

-- Calculate the median of a column.
double median(double <colname>)
decimal median(decimal <colname>)

-- Calculate the median of expr in a window. 
median(<expr>) over ([partition_clause] [orderby_clause] [frame_clause])

Parameters

  • colname: required. The name of a column, which is of the DOUBLE or DECIMAL type. If the input value is of the STRING or BIGINT type, the value is implicitly converted into the DOUBLE type before calculation.

  • expr: required. The expression that is used to calculate the median. The value is of the DOUBLE or DECIMAL type. The value of this parameter must be 1 to 255 digits in length.

    • If the input value is of the STRING or BIGINT type, it is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.

    • If the input value is NULL, NULL is returned.

  • partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.

Return value

A value of the DECIMAL or DOUBLE type is returned. The return value varies based on the following rules:
  • If the value of colname is of the DECIMAL type, a value of the DECIMAL type is returned.
  • If the value of colname is of another valid data type, a value of the DOUBLE type is returned.
  • If the value of colname is of the BOOLEAN type, the value is not used for calculation.
  • If the value of colname is null, the row that contains this value is not used for calculation.

Sample data

This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample statement:

create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp;

The emp.txt file contains the following sample data:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Examples

  • Example 1: Calculate the median salary (sal) of all employees. Sample statement:

    select median(sal) from emp;

    The following result is returned:

    +------------+
    | _c0        |
    +------------+
    | 1600.0     |
    +------------+
  • Example 2: Use this function with GROUP BY to group all employees by department (deptno) and calculate the median salary values of employees in each department. Sample statement:

    select deptno, median(sal) from emp group by deptno;

    The following result is returned:

    +------------+------------+
    | deptno     | _c1        |
    +------------+------------+
    | 10         | 2450.0     |
    | 20         | 2975.0     |
    | 30         | 1375.0     |
    +------------+------------+
  • Example 3: Use the deptno column to define a window and calculate the median value of the sal column. This function returns the median value of the current window. The current window includes the rows that have the same deptno value. Sample statement:

    select deptno, sal, median(sal) over (partition by deptno) from emp;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | sal        | _c2        |
    +------------+------------+------------+
    | 10         | 1300       | 2450.0     |   -- This row is the first row of this window. The return value is the median value of the values from the first row to the sixth row. 
    | 10         | 2450       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 1300       | 2450.0     |
    | 10         | 5000       | 2450.0     |
    | 10         | 2450       | 2450.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 3000       | 2975.0     |
    | 20         | 800        | 2975.0     |
    | 20         | 1100       | 2975.0     |
    | 20         | 2975       | 2975.0     |
    | 30         | 1500       | 1375.0     |
    | 30         | 950        | 1375.0     |
    | 30         | 1600       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 1250       | 1375.0     |
    | 30         | 2850       | 1375.0     |
    +------------+------------+------------+

Related functions

MEDIAN is an aggregate function or a window function.

  • For more information about the functions that are used to calculate the average value of multiple input records and to aggregate parameters, see Aggregate functions.

  • For more information about the functions that are used to calculate the sum of data of columns in a window and to sort data, see Window functions.