Clauses in a SELECT statement that is written in compliance with the SELECT syntax of MaxCompute are executed in a different sequence from the clauses in a standard SELECT statement. This topic describes the sequence for executing clauses in a SELECT statement of MaxCompute and provides examples for reference.

Sequence for executing clauses in a SELECT statement

The SELECT syntax includes the following clauses:
  • SELECT
  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • ORDER BY
  • DISTRIBUTE BY
  • SORT BY
  • LIMIT
The following clauses cannot be used with DISTRIBUTE BY or SORT BY: ORDER BY and GROUP BY. To address this issue, you can execute clauses in a SELECT statement in one of the following sequences:
  • Sequence 1: FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMIT
  • Sequence 2: FROM > WHERE > SELECT > DISTRIBUTE BY > SORT BY
To avoid confusion, MaxCompute allows you to write a SELECT statement in the preceding sequences. The syntax of a SELECT statement can be changed to the following form:
from <table_reference>
[where <where_condition>]
[group by <col_list>]
[having <having_condition>]
select [all | distinct] <select_expr>, <select_expr>, ...
[order by <order_condition>]
[distribute by <distribute_condition> [sort by <sort_condition>] ]
[limit <number>]

Sample data

This topic provides source data and sample statements for generating source data. This helps you understand how to prepare source data. Sample statements:
-- Create a partitioned table named sale_detail.
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

-- Add partitions to the sale_detail table.
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');

-- Insert data into the sale_detail table.
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);

Examples

  • Example 1: Clauses in a SELECT statement are executed in Sequence 1.
    -- Write a SELECT statement based on the SELECT syntax.
    select region,max(total_price) 
    from sale_detail 
    where total_price > 100
    group by region 
    having sum(total_price)>305 
    order by region 
    limit 5;
    -- Write a SELECT statement based on Sequence 1. The following statement is equivalent to the preceding statement.
    from sale_detail 
    where total_price > 100 
    group by region 
    having sum(total_price)>305 
    select region,max(total_price) 
    order by region 
    limit 5;
    The following result is returned:
    +------------+------------+
    | region     | _c1        |
    +------------+------------+
    | china      | 100.3      |
    +------------+------------+
    Logic to execute clauses in the SELECT statement:
    1. Retrieves the data that meets the condition (WHERE total_price > 100) from the sale_detail table (FROM sale_detail).
    2. Groups the data obtained from Step a based on the values of the region column (GROUP BY).
    3. Retrieves the data whose total_price is greater than 305 from the data obtained from Step b (HAVING sum(total_price)>305).
    4. Obtains the maximum value of the total_price column in each region (SELECT region,max(total_price)) from the data obtained from Step c.
    5. Sorts the data obtained from Step d based on the values of the region column (ORDER BY region).
    6. Displays the first five data records (LIMIT 5) of the data obtained from Step e.
  • Example 2: Clauses in a SELECT statement are executed in Sequence 2.
    -- Write a SELECT statement based on the SELECT syntax.
    select shop_name
           ,total_price
           ,region
    from   sale_detail
    where  total_price > 100.2
    distribute by region
    sort by total_price;
    -- Write a SELECT statement based on Sequence 2. The following statement is equivalent to the preceding statement.
    from   sale_detail 
    where  total_price > 100.2 
    select shop_name
           ,total_price
           ,region 
    distribute by region 
    sort by total_price;
    The following result is returned:
    +------------+-------------+------------+
    | shop_name  | total_price | region     |
    +------------+-------------+------------+
    | s3         | 100.3       | china      |
    | s6         | 100.4       | shanghai   |
    | s7         | 100.5       | shanghai   |
    +------------+-------------+------------+
    Logic to execute clauses in the SELECT statement:
    1. Retrieves the data that meets the condition (WHERE total_price > 100.2) from the sale_detail table (FROM sale_detail).
    2. Retrieves data from the data obtained from Step a based on the values of the shop name, total price, and region columns (SELECT shop_name, total_price, region).
    3. Performs hash partitioning on the data obtained from Step b based on the values of the region column (DISTRIBUTE BY region).
    4. Sorts the data obtained from Step c based on the values of the total price column (SORT BY total_price).