All Products
Search
Document Center

MaxCompute:Sequence for executing clauses in a SELECT statement

Last Updated:Jan 15, 2024

Clauses in a SELECT statement that is written in compliance with the SELECT syntax of MaxCompute are executed in a different sequence from the clauses in a standard SELECT statement. This topic describes the sequence for executing clauses in a SELECT statement of MaxCompute and provides examples for reference.

Sequence for executing clauses in a SELECT statement

The SELECT syntax includes the following clauses:

  • select

  • from

  • where

  • group by

  • having

  • window

  • qualify

  • order by

  • distribute by

  • sort by

  • limit

The following clauses cannot be used with DISTRIBUTE BY or SORT BY: ORDER BY and GROUP BY. You can execute clauses in a SELECT statement in one of the following sequences:

  • Sequence 1: FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMIT

  • Sequence 2: FROM > WHERE > SELECT > DISTRIBUTE BY > SORT BY

To prevent confusion, MaxCompute allows you to write a SELECT statement in the preceding sequences. The syntax of a SELECT statement can be changed to the following form:

from <table_reference>
[where <where_condition>]
[group by <col_list>]
[having <having_condition>]
[window <window_name> AS (<window_definition>)]
[qualify <expression>]
select [all | distinct] <select_expr>, <select_expr>, ...
[order by <order_condition>]
[distribute by <distribute_condition> [sort by <sort_condition>] ]
[limit <number>]

Sample data

Sample source data is provided for you to better understand the examples in this topic. Sample statements:

-- Create a partitioned table named sale_detail. 
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

-- Add partitions to the sale_detail table. 
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');

-- Insert data into the sale_detail table. 
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);

Query data in the sale_detail table. Sample statement:

set odps.sql.allow.fullscan=true;
select * from sale_detail; 
-- The following result is returned: 
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+

Examples

  • Example 1: Clauses in a SELECT statement are executed in Sequence 1.

    Note

    If you execute the following statement to query data from a partitioned table, you must add set odps.sql.allow.fullscan=true; before the statement to enable a full table scan or specify partitions in the statement.

    -- Write a SELECT statement based on the SELECT syntax. 
    set odps.sql.allow.fullscan=true;
    select region,max(total_price) 
    from sale_detail 
    where total_price > 100
    group by region 
    having sum(total_price)>300.5 
    order by region 
    limit 5;
    -- Write a SELECT statement based on Sequence 1. The following statement is equivalent to the preceding statement. 
    from sale_detail 
    where total_price > 100 
    group by region 
    having sum(total_price)>300.5 
    select region,max(total_price) 
    order by region 
    limit 5;

    The following result is returned:

    +------------+------------+
    | region     | _c1        |
    +------------+------------+
    | china      | 100.3      |
    +------------+------------+

    Logic to execute clauses in the SELECT statement:

    1. Retrieves the data that meets the condition (WHERE total_price > 100) from the sale_detail table (FROM sale_detail).

    2. Groups the data obtained from Step a based on the values of the region column (GROUP BY).

    3. Retrieves the data of the groups whose sum of total_price is greater than 305 from the data obtained from Step b (HAVING sum(total_price)>305).

    4. Obtains the maximum value of the total_price column in each region (SELECT region,max(total_price)) from the data obtained from Step c.

    5. Sorts the data obtained from Step d based on the values of the region column (ORDER BY region).

    6. Displays the first five data records (LIMIT 5) of the data obtained from Step e.

  • Example 2: Clauses in a SELECT statement are executed in Sequence 2.

    -- Write a SELECT statement based on the SELECT syntax. 
    set odps.sql.allow.fullscan=true;
    select shop_name
           ,total_price
           ,region
    from   sale_detail
    where  total_price > 100.2
    distribute by region
    sort by total_price;
    -- Write a SELECT statement based on Sequence 2. The following statement is equivalent to the preceding statement. 
    from   sale_detail 
    where  total_price > 100.2 
    select shop_name
           ,total_price
           ,region 
    distribute by region 
    sort by total_price;

    The following result is returned:

    +------------+-------------+------------+
    | shop_name  | total_price | region     |
    +------------+-------------+------------+
    | s3         | 100.3       | china      |
    | s6         | 100.4       | shanghai   |
    | s7         | 100.5       | shanghai   |
    +------------+-------------+------------+

    Logic to execute clauses in the SELECT statement:

    1. Retrieves the data that meets the condition (WHERE total_price > 100.2) from the sale_detail table (FROM sale_detail).

    2. Retrieves data from the data obtained from Step a based on the values of the shop name, total price, and region columns (SELECT shop_name, total_price, region).

    3. Performs hash partitioning on the data obtained from Step b based on the values of the region column (DISTRIBUTE BY region).

    4. Sorts the data obtained from Step c in ascending order based on the values of the total_price column (SORT BY total_price).