All Products
Search
Document Center

MaxCompute:NUMERIC_HISTOGRAM

Last Updated:May 23, 2025

Returns the approximate histogram of a specified column. This function is an additional function of MaxCompute V2.0.

Usage notes

  • MaxCompute V2.0 provides additional functions. If the functions that you use involve new data types, including TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY, you must execute a statement to enable the MaxCompute V2.0 data type edition:

    • Session level: Add set odps.sql.type.system.odps2=true; before the SQL statement that you want to execute, and commit and execute them together.

    • Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. Statement:

      setproject odps.sql.type.system.odps2=true;

      The configuration takes effect after 10 to 15 minutes.

      For more information about setproject, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type editions.

  • If you use an SQL statement that includes multiple aggregate functions and the project resources are insufficient, memory overflow may occur. We recommend that you optimize the SQL statement or purchase computing resources based on your business requirements.

Syntax

map<double key, double value> numeric_histogram(bigint <buckets>,
                                                double <colname>
                                                [, double <weight>])

Parameters

  • buckets: required. A value of the BIGINT type. This parameter specifies the maximum number of buckets in the column whose approximate histogram is returned.

  • colname: required. A value of the DOUBLE type. This parameter specifies the columns whose approximate histograms need to be calculated.

  • weight: optional. The weight value of the DOUBLE type. This parameter specifies the weight of a row of data.

Return value

A value of the map<double key, double value> type is returned. In the return value, key indicates the X-axis of the approximate histogram, and value indicates the approximate height of the Y-axis of the approximate histogram. The return value varies based on the following rules:

  • If the value of buckets is null, null is returned.

  • If the value of colname is null, the row that contains this value is not used for calculation.

Sample data

This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample statement:

create table if not exists emp
   (empno bigint,
    ename string,
    job string,
    mgr bigint,
    hiredate datetime,
    sal bigint,
    comm bigint,
    deptno bigint);
tunnel upload emp.txt emp; -- Replace emp.txt with the actual path (path and name) to which you upload the data file.

The emp.txt file contains the following sample data:

7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10

Examples

  • Return the approximate histogram of the sal column. Sample statement:

    select numeric_histogram(5, sal) from emp;

    The following result is returned:

    +------------+
    | _c0        |
    +------------+
    | {"1328.5714285714287":7.0,"2450.0":2.0,"5000.0":2.0,"875.0":2.0,"2956.25":4.0} |
    +------------+
  • Return the approximate histogram of the sal column. The deptno in each row of data indicates the department weight. Sample statement:

    select numeric_histogram(5, sal, deptno) from emp;

    The following result is returned:

    +------------+
    | _c0        |
    +------------+
    | {"2944.4444444444443":90.0,"2450.0":20.0,"5000.0":20.0,"890.0":50.0,"1350.0":160.0} |
    +------------+

Related functions

NUMERIC_HISTOGRAM is an aggregate function. For more information about aggregate functions, see Aggregate functions.