All Products
Search
Document Center

PolarDB:Deduplication and sorting by using DISTINCT and ORDER BY

Last Updated:Dec 03, 2024

DISTINCT and ORDER BY are two Oracle SQL clauses. DISTINCT deduplicates query results, and ORDER BY sorts query results. When the two clauses are used together, you need to notice their syntax and execution order. This topic describes how to use DISTINCT and ORDER BY in Oracle SQL statements.

Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;
  • SELECT DISTINCT: duplicates the result and returns unique records.

  • FROM: specifies the table to be queried.

  • WHERE: optional. The query conditions.

  • ORDER BY: sorts the result set. You can specify one or more columns and select ascending (ASC) or descending (DESC ).

How they work

In SQL queries, DISTINCT is executed before ORDER BY. The system selects unique records and then sorts them.

Examples

A table named employees has the following structure:

CREATE TABLE employees (
    employee_id INT PRIMARY KEY , - The unique ID of an employee.
    first_name VARCHAR(50) NOT NULL, -- The first name of an employee
    last_name VARCHAR(50) NOT NULL, -- The last name of an employee.
    department VARCHAR(50), -- The department of an employee.
    salary DECIMAL (10,2) CHECK (salary > 0) -- The salary of an employee.
);

INSERT INTO employees VALUES (1, 'John', 'Doe', 'Sales', 5000);
INSERT INTO employees VALUES (2, 'Jane', 'Smith', 'Marketing', 6000);
INSERT INTO employees VALUES (3, 'John', 'Doe', 'Sales', 5000);
INSERT INTO employees VALUES (4, 'Alice', 'Johnson', 'IT', 7000);
INSERT INTO employees VALUES (5, 'Bob', 'Brown', 'Sales', 5000);
  • Select unique department values and sort them in ascending order.

    SELECT DISTINCT DEPARTMENT
    FROM employees
    ORDER BY DEPARTMENT ASC;

    Sample result:

     department 
    ------------
     IT
     Marketing
     Sales
    (3 rows)
  • Select unique salary values and sort them in descending order.

    SELECT DISTINCT SALARY
    FROM employees
    ORDER BY SALARY DESC;

    Sample result:

     salary  
    ---------
     7000.00
     6000.00
     5000.00
    (3 rows)
  • Select unique name values and sort them in ascending order.

    SELECT DISTINCT FIRST_NAME, LAST_NAME
    FROM employees
    ORDER BY FIRST_NAME ASC, LAST_NAME ASC;

    Sample result:

    Note

    Even if multiple records with the same FIRST_NAME and LAST_NAME, only one unique name is returned because DISTINCT is used.

     first_name | last_name 
    ------------+-----------
     Alice      | Johnson
     Bob        | Brown
     Jane       | Smith
     John       | Doe
    (4 rows)

Usage notes

  • Columns in ORDER BY

    When you use ORDER BY, the columns specified in the clause must exist in the SELECT statement. If you reference a column in ORDER BY that is not in the SELECT statement, PolarDB returns an error.

    Incorrect example:

    SELECT DISTINCT FIRST_NAME
    FROM employees
    ORDER BY LAST_NAME ASC;

    The following error message is returned:

    ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list

    For the sake of Oracle compatibility, you can reference columns that are not in a SELECT statement after setting the polar_enable_distinct_orderby_new_column parameter to ON. To use this feature, perform the following steps:

    1. Set the polar_enable_distinct_orderby_new_column parameter to ON.

      Note

      The polar_enable_distinct_orderby_new_column parameter is first introduced in PolarDB for PostgreSQL (Compatible with Oracle) 2.0 with revision version 2.0.14.26.0. If your cluster version is earlier than 2.0.14.26.0 and you want to use this feature, upgrade your cluster to 2.0.14.26.0 or later. For more information about how to view and upgrade minor versions, see Version management.

    2. Reference a column in ORDER BY that is not in the SELECT statement.

    3. Add the FETCH FIRST X rows ONLY or LIMIT keyword to the end of the query statement.

      SELECT DISTINCT FIRST_NAME
      FROM employees
      ORDER BY LAST_NAME ASC
      FETCH FIRST 2 rows ONLY;
  • Use aliases

    You can specify aliases for columns in the SELECT statement and then use these aliases in ORDER BY.

    SELECT DISTINCT FIRST_NAME AS FN
    FROM employees
    ORDER BY FN ASC;
  • Use positions of the columns referenced in ORDER BY

    You can use the position of a column to sort data. For example, ORDER BY 1 indicates that data is sorted by the first column in the SELECT statement.

    SELECT DISTINCT FIRST_NAME, LAST_NAME
    FROM employees
    ORDER BY 1 ASC;

Advanced usage

  • Combined with WHERE

    DISTINCT and ORDER BY can be combined with WHERE to filter records.

    Example:

    SELECT DISTINCT DEPARTMENT
    FROM employees
    WHERE SALARY > 5000
    ORDER BY DEPARTMENT ASC;

    Sample result:

     department 
    ------------
     IT
     Marketing
    (2 rows)
  • Combined with subqueries

    In some complex query scenarios, DISTINCT and ORDER BY can be combined with subqueries.

    Example:

    Select the department with the highest salary and then sort the departments:

    SELECT DEPARTMENT, MAX(SALARY) AS MAX_SALARY
    FROM employees
    GROUP BY DEPARTMENT
    ORDER BY DEPARTMENT ASC;

    Sample result:

     department | max_salary 
    ------------+------------
     IT         |    7000.00
     Marketing  |    6000.00
     Sales      |    5000.00
    (3 rows)
  • Combined with window functions

    DISTINCT and ORDER BY can be combined with window functions to implement more complex data analysis.

    Example:

    Select the employee with the highest salary in each department and sort the departments in descending order of salary.

    SELECT DISTINCT DEPARTMENT, FIRST_NAME, LAST_NAME, SALARY
    FROM (
        SELECT DEPARTMENT, FIRST_NAME, LAST_NAME, SALARY,
               ROW_NUMBER() OVER (PARTITION BY DEPARTMENT ORDER BY SALARY DESC) AS rn
        FROM employees
    )
    WHERE rn = 1
    ORDER BY SALARY DESC;

    Sample result:

     department | first_name | last_name | salary 
    ------------+------------+-----------+--------
     IT         | Alice      | Johnson   |   7000
     Marketing  | Jane       | Smith     |   6000
     Sales      | John       | Doe       |   5000
    (3 rows)

Common errors and debugging

Sort the column not specified in WHERE

As described in Columns in ORDER BY, an error is returned if you use a column in the ORDER BY that is not in the SELECT statement. You must ensure that all columns specified in ORDER BY exist in the SELECT statement, or use aliases or column positions.

Note

If you set the polar_enable_distinct_orderby_new_column parameter to ON and add the FETCH FIRST X rows ONLY keyword, it is allowed for DISTINCT to implicitly reference the column in the ORDER BY.

Confuse the coverage of DISTINCT

DISTINCT apply to all columns in the SELECT statement, not just a single column. Therefore, the SELECT DISTINCT statement that specifies multiple columns deduplicates all specified columns.

Example:

SELECT DISTINCT FIRST_NAME, DEPARTMENT
FROM employees
ORDER BY FIRST_NAME ASC;

The above SQL statement deduplicates records filtered using FIRST_NAME and DEPARTMENT, not just using FIRST_NAME.

Performance improvements

When large datasets are involved, using DISTINCT and ORDER BY has impacts on query performance. The following optimization suggestions can be used:

  • Index optimization: Create appropriate indexes for the columns specified in WHERE, DISTINCT, and ORDER BY to speed up queries.

  • Minimize columns returned: Select only the necessary columns to reduce data volumes to be processed.

  • Use partitioning: You can partition large tables to improve query efficiency.

  • Execution plan analysis: Use the EXPLAIN PLAN statement to analyze execution plans and identify potential performance bottlenecks.

Example:

Use index optimization.

-- Create an index to optimize queries on the DEPARTMENT column.
CREATE INDEX idx_employees_department ON employees(DEPARTMENT);

Summary

DISTINCT and ORDER BY are powerful SQL tools in PolarDB for deduplicating records and sorting result sets. Understanding how they work, their syntax, and common scenarios helps you improve query efficiency and accuracy. These two clauses can be combined with other SQL features to implement complex data processing and analysis tasks.