All Products
Search
Document Center

APPROX_COUNT_DISTINCT

Last Updated: Jun 18, 2021

The APPROX_COUNT_DISTINCT function is an aggregate function. It calculates the number of rows in a column where duplicates are removed, and can return only one approximate value. You can use this function to further calculate the selectivity of the referenced column.

Compared with the COUNT(DISTINCT x) function, APPROX_COUNT_DISTINCT returns an approximate value. Therefore, the calculation speed of APPROX_COUNT_DISTINCT is super high. It often takes a long time for COUNT(DISTINCT x) to process a large amount of data. APPROX_COUNT_DISTINCT sacrifices a small amount of accuracy for a significant increase in computational efficiency.

Syntax

APPROX_COUNT_DISTINCT(expr)

Parameters

Parameter

Description

expr

The numeric column.

Return type

The data of the NUMBER type is returned.

Examples

The following statements create the employees table and insert data into the table:

CREATE TABLE employees (manager_id INT,last_name varchar(50),hiredate varchar(50),SALARY INT);
INSERT INTO employees VALUES(100, 'Raphaely', '2017-07-01', 1700);
INSERT INTO employees VALUES(100, 'De Haan', '2018-05-01',11000);      
INSERT INTO employees VALUES(100, 'Errazuriz', '2017-07-21', 1400);
INSERT INTO employees VALUES(100, 'Hartstein', '2019-05-01',14000);     
INSERT INTO employees VALUES(100, 'Raphaely', '2017-07-22', 1700);
INSERT INTO employees VALUES(100, 'Weiss',  '2019-07-11',13500);     
INSERT INTO employees VALUES(100, 'Russell', '2019-10-05', 13000);
INSERT INTO employees VALUES(100, 'Partners',  '2018-12-01',14000);     
INSERT INTO employees VALUES(200, 'Ross',  '2019-06-11',13500);     
INSERT INTO employees VALUES(200, 'Bell', '2019-05-25', 13000);
INSERT INTO employees VALUES(200, 'Part',  '2018-08-11',14000);  
COMMIT;

Execute the following statement:

SELECT last_name, salary, APPROX_COUNT_DISTINCT(salary) OVER (ORDER BY hiredate) "Variance"
FROM employees WHERE manager_id = 100 ORDER BY last_name, salary, "Variance";

The following query result is returned:

+-----------+--------+----------+
| LAST_NAME | SALARY | Variance |
+-----------+--------+----------+
| De Haan   |  11000 |        3 |
| Errazuriz |   1400 |        2 |
| Hartstein |  14000 |        4 |
| Partners  |  14000 |        4 |
| Raphaely  |   1700 |        1 |
| Raphaely  |   1700 |        2 |
| Russell   |  13000 |        6 |
| Weiss     |  13500 |        5 |
+-----------+--------+----------+