All Products
Search
Document Center

DISTINCT

Last Updated: Jun 18, 2021

The DISTINCT operators remove duplicate data rows, including duplicate NULL values.

The DISTINCT operators include HASH DISTINCT and MERGE DISTINCT.

HASH DISTINCT

The HASH DISTINCT operator uses the hash algorithm to perform a DISTINCT operation.

Example 1: Using the hash algorithm to perform a DISTINCT operation and removing duplicate rows in column c1 of table t1

obclient>CREATE TABLE t1(c1 INT, c2 INT);
Query OK, 0 rows affected (0.09 sec)

obclient>CREATE TABLE t2(c1 INT, c2 INT);
Query OK, 0 rows affected (0.09 sec)

obclient>EXPLAIN SELECT /*+USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1\G;
*************************** 1. row ***************************
Query Plan: 
|=======================================
|ID|OPERATOR     |NAME|EST. ROWS|COST |
---------------------------------------
|0 |HASH DISTINCT|    |101      |99169|
|1 | TABLE SCAN  |t1  |100000   |66272|
=======================================
Outputs & filters: 
-------------------------------------
  0 - output([t1.c1]), filter(nil), 
      distinct([t1.c1])
  1 - output([t1.c1]), filter(nil), 
      access([t1.c1]), partitions(p0)

In the preceding example, HASH DISTINCT, the No. 0 operator in the execution plan display, performs the deduplication. The Outputs & filters section shows in detail the output information of the HASH DISTINCT operator.

Field

Description

output

The output columns of the operator.

filter

The filter predicates of the operator.

In this example, the filter is set to nil because no filter is configured for the HASH DISTINCT operator.

partition

The partitions to be scanned in the query.

distinct

Specifies the columns to be deduplicated.

For example, the t1.c1 parameter in distinct([t1.c1]) specifies to deduplicate the c1 column of the table t1 by using the hash algorithm.

MERGE DISTINCT

The MERGE DISTINCT operator uses the merge algorithm to perform a DISTINCT operation.

Example 2: Using the merge algorithm to perform a DISTINCT operation

obclient>EXPLAIN SELECT /*+NO_USE_HASH_AGGREGATION*/ DISTINCT c1 FROM t1\G;
*************************** 1. row ***************************
Query Plan: 
|=======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |MERGE DISTINCT|    |3        |40  |
|1 | SORT         |    |3        |39  |
|2 |  TABLE SCAN  |t1  |3        |37  |
=======================================
Outputs & filters: 
-------------------------------------
  0 - output([t1.c1]), filter(nil), 
      distinct([t1.c1])
  1 - output([t1.c1]), filter(nil), sort_keys([t1.c1, ASC])
  2 - output([t1.c1]), filter(nil), 
      access([t1.c1]), partitions(p0)

In the preceding example, MERGE DISTINCT, the No. 0 operator, performs the deduplication by using the merge algorithm. Because the MERGE DISTINCT operator only takes ordered input data, a SORT operator is required to sort the unordered output data from the No. 2 operator before deduplication. The Outputs & filters section in the execution plan display shows in detail the output information of the MERGE DISTINCT operator.

Field

Description

output

The output columns of the operator.

filter

The filter predicates of the operator.

In this example, the filter is set to nil because no filter condition is configured for the MERGE DISTINCT operator.

distinct

Specifies the columns to be deduplicated.

For example, the t1.c1 parameter in distinct([t1.c1]) specifies deduplication of the c1 column of table t1 by merge algorithm.