All Products
Search
Document Center

PolarDB:Deduplication and sorting by using DISTINCT and ORDER BY

Last Updated:Mar 28, 2026

DISTINCT removes duplicate rows from a query result. ORDER BY sorts that result. Used together, they have specific interaction rules—particularly around which columns ORDER BY can reference. This topic explains the syntax, execution order, usage constraints, and performance considerations for combining DISTINCT and ORDER BY in PolarDB for PostgreSQL (Compatible with Oracle).

How it works

DISTINCT runs before ORDER BY. The database first eliminates duplicate rows, then sorts the deduplicated result set.

This order has a direct consequence: because DISTINCT collapses rows based on the selected columns, ORDER BY can only sort by columns that appear in the SELECT list. Sorting by a column that is not in the output would be meaningless—after deduplication, there is no unambiguous value for that column to sort by.

Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;
ClauseDescription
SELECT DISTINCTReturns only unique records, eliminating duplicates.
FROMSpecifies the table to query.
WHERE(Optional) Filters rows before deduplication.
ORDER BYSorts the deduplicated result set. Accepts one or more columns in ascending (ASC) or descending (DESC) order.

Sample dataset

All examples in this topic use the following employees table.

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,          -- unique employee ID
    first_name  VARCHAR(50) NOT NULL,
    last_name   VARCHAR(50) NOT NULL,
    department  VARCHAR(50),
    salary      DECIMAL(10,2) CHECK (salary > 0)
);

INSERT INTO employees VALUES (1, 'John',  'Doe',     'Sales',     5000);
INSERT INTO employees VALUES (2, 'Jane',  'Smith',   'Marketing', 6000);
INSERT INTO employees VALUES (3, 'John',  'Doe',     'Sales',     5000);
INSERT INTO employees VALUES (4, 'Alice', 'Johnson', 'IT',        7000);
INSERT INTO employees VALUES (5, 'Bob',   'Brown',   'Sales',     5000);

Examples

Unique departments in ascending order

SELECT DISTINCT DEPARTMENT
FROM employees
ORDER BY DEPARTMENT ASC;

Output:

 department
------------
 IT
 Marketing
 Sales
(3 rows)

Unique salaries in descending order

SELECT DISTINCT SALARY
FROM employees
ORDER BY SALARY DESC;

Output:

  salary
---------
  7000.00
  6000.00
  5000.00
(3 rows)

Unique full names in ascending order

SELECT DISTINCT FIRST_NAME, LAST_NAME
FROM employees
ORDER BY FIRST_NAME ASC, LAST_NAME ASC;

Output:

 first_name | last_name
------------+-----------
 Alice      | Johnson
 Bob        | Brown
 Jane       | Smith
 John       | Doe
(4 rows)
DISTINCT applies to the combination of all selected columns. Rows 1 and 3 share the same FIRST_NAME and LAST_NAME, so only one appears in the result.

Usage notes

Columns in ORDER BY must appear in SELECT

When DISTINCT is present, every column referenced in ORDER BY must exist in the SELECT list. The constraint exists because DISTINCT collapses rows based on selected columns—a sort column not in the output has no unambiguous value to sort by after deduplication.

Referencing a column that is not selected returns an error:

-- Incorrect: LAST_NAME is not in the SELECT list
SELECT DISTINCT FIRST_NAME
FROM employees
ORDER BY LAST_NAME ASC;
ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list

Oracle compatibility exception

To match Oracle behavior, set the polar_enable_distinct_orderby_new_column parameter to ON. This allows ORDER BY to reference columns not in the SELECT list, but requires a FETCH FIRST X ROWS ONLY or LIMIT clause at the end of the statement.

polar_enable_distinct_orderby_new_column was introduced in PolarDB for PostgreSQL (Compatible with Oracle) 2.0, revision version 2.0.14.26.0. If your cluster is on an earlier version, upgrade it first. For upgrade instructions, see Version management.

Example after enabling the parameter:

SELECT DISTINCT FIRST_NAME
FROM employees
ORDER BY LAST_NAME ASC
FETCH FIRST 2 ROWS ONLY;

Using aliases in ORDER BY

Define an alias in SELECT and reference it in ORDER BY:

SELECT DISTINCT FIRST_NAME AS FN
FROM employees
ORDER BY FN ASC;

Using column positions in ORDER BY

Reference a column by its position in the SELECT list:

SELECT DISTINCT FIRST_NAME, LAST_NAME
FROM employees
ORDER BY 1 ASC;
Positional references are fragile. If the column order in SELECT changes, the sort behavior changes silently without an error. Prefer explicit column names in production queries.

DISTINCT applies to all selected columns

DISTINCT deduplicates based on the full combination of selected columns, not just the first one.

SELECT DISTINCT FIRST_NAME, DEPARTMENT
FROM employees
ORDER BY FIRST_NAME ASC;

This query deduplicates by the (FIRST_NAME, DEPARTMENT) pair, not by FIRST_NAME alone. A user named "John" in two different departments would appear twice.

Advanced usage

Filtering with WHERE

WHERE filters rows before DISTINCT removes duplicates:

SELECT DISTINCT DEPARTMENT
FROM employees
WHERE SALARY > 5000
ORDER BY DEPARTMENT ASC;

Output:

 department
------------
 IT
 Marketing
(2 rows)

Finding the maximum value per group

Use GROUP BY and MAX() to get the highest salary in each department:

SELECT DEPARTMENT, MAX(SALARY) AS MAX_SALARY
FROM employees
GROUP BY DEPARTMENT
ORDER BY DEPARTMENT ASC;

Output:

 department | max_salary
------------+------------
 IT         |    7000.00
 Marketing  |    6000.00
 Sales      |    5000.00
(3 rows)

Ranking with window functions

Use ROW_NUMBER() to identify the top earner in each department, then sort by salary:

SELECT DISTINCT DEPARTMENT, FIRST_NAME, LAST_NAME, SALARY
FROM (
    SELECT DEPARTMENT, FIRST_NAME, LAST_NAME, SALARY,
           ROW_NUMBER() OVER (PARTITION BY DEPARTMENT ORDER BY SALARY DESC) AS rn
    FROM employees
)
WHERE rn = 1
ORDER BY SALARY DESC;

Output:

 department | first_name | last_name | salary
------------+------------+-----------+--------
 IT         | Alice      | Johnson   |   7000
 Marketing  | Jane       | Smith     |   6000
 Sales      | John       | Doe       |   5000
(3 rows)

Performance considerations

DISTINCT and ORDER BY together require the database to deduplicate and then sort the result, which can be expensive on large datasets. Apply the following optimizations as needed:

  • Create indexes on columns used in WHERE, DISTINCT, and ORDER BY to reduce scan costs.

    -- Index to speed up queries on the DEPARTMENT column
    CREATE INDEX idx_employees_department ON employees(DEPARTMENT);
  • Select only necessary columns. Each additional column in SELECT DISTINCT increases the data volume the database must process and compare.

  • Partition large tables. Partitioning limits the rows each query touches, improving query efficiency.

  • Analyze execution plans. Use EXPLAIN PLAN to identify bottlenecks and verify that indexes are being used.