MaxCompute executes the clauses in a SELECT statement in a different order from how they appear in the written syntax. Understanding this execution order helps you write correct query logic — especially when using GROUP BY, HAVING, and MaxCompute-specific clauses like DISTRIBUTE BY and SORT BY.
Execution sequences
The SELECT syntax supports the following clauses: SELECT, FROM, WHERE, GROUP BY, HAVING, WINDOW, QUALIFY, ORDER BY, DISTRIBUTE BY, SORT BY, and LIMIT.
MaxCompute executes these clauses in one of two sequences, depending on which clauses you use.
ORDER BYandGROUP BYcannot be used together withDISTRIBUTE BYorSORT BY.
Sequence 1 — used when your query includes GROUP BY, ORDER BY, or LIMIT:
-
FROM -
WHERE -
GROUP BY -
HAVING -
SELECT -
ORDER BY -
LIMIT
Sequence 2 — used when your query includes DISTRIBUTE BY or SORT BY:
-
FROM -
WHERE -
SELECT -
DISTRIBUTE BY -
SORT BY
Although SELECT appears first in the written syntax, it executes only after FROM, WHERE, GROUP BY, and HAVING are processed.
Write statements in execution order
To make query logic easier to read, MaxCompute lets you write SELECT statements in execution order — with FROM first instead of SELECT first. The two forms are equivalent:
-- Standard form (SELECT first)
SELECT region, max(total_price)
FROM sale_detail
WHERE total_price > 100
GROUP BY region
HAVING sum(total_price) > 300.5
ORDER BY region
LIMIT 5;
-- Execution-order form (FROM first) — equivalent to the statement above
FROM sale_detail
WHERE total_price > 100
GROUP BY region
HAVING sum(total_price) > 300.5
SELECT region, max(total_price)
ORDER BY region
LIMIT 5;
The full execution-order syntax is:
FROM <table_reference>
[WHERE <where_condition>]
[GROUP BY <col_list>]
[HAVING <having_condition>]
[WINDOW <window_name> AS (<window_definition>)]
[QUALIFY <expression>]
SELECT [ALL | DISTINCT] <select_expr>, <select_expr>, ...
[ORDER BY <order_condition>]
[DISTRIBUTE BY <distribute_condition> [SORT BY <sort_condition>]]
[LIMIT <number>]
Sample data
The examples in this topic use a partitioned table called sale_detail. Run the following statements to create and populate the table:
-- Create a partitioned table named sale_detail
CREATE TABLE IF NOT EXISTS sale_detail
(
shop_name string,
customer_id string,
total_price double
)
PARTITIONED BY (sale_date string, region string);
-- Add partitions
ALTER TABLE sale_detail ADD
PARTITION (sale_date='2013', region='china')
PARTITION (sale_date='2014', region='shanghai');
-- Insert data
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china')
VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai')
VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
To view the full table:
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail;+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
Examples
Example 1: Sequence 1 (GROUP BY and ORDER BY)
Querying a partitioned table without specifying partitions requires a full table scan. Add SET odps.sql.allow.fullscan=true; before the statement to enable it.
The following two statements are equivalent. Both use Sequence 1.
-- Standard form
SET odps.sql.allow.fullscan=true;
SELECT region, max(total_price)
FROM sale_detail
WHERE total_price > 100
GROUP BY region
HAVING sum(total_price) > 300.5
ORDER BY region
LIMIT 5;
-- Execution-order form
SET odps.sql.allow.fullscan=true;
FROM sale_detail
WHERE total_price > 100
GROUP BY region
HAVING sum(total_price) > 300.5
SELECT region, max(total_price)
ORDER BY region
LIMIT 5;
Result:
+------------+------------+
| region | _c1 |
+------------+------------+
| china | 100.3 |
+------------+------------+
How the clauses execute:
-
FROM
sale_detail— reads rows from the table. -
WHERE
total_price > 100— keeps only rows wheretotal_priceexceeds 100. -
GROUP BY
region— groups the remaining rows by region. -
HAVING
sum(total_price) > 300.5— keeps only groups whose total price exceeds 300.5. -
SELECT
region, max(total_price)— projects the region and maximum price from each group. -
ORDER BY
region— sorts the result by region. -
LIMIT
5— returns the first five rows.
Example 2: Sequence 2 (DISTRIBUTE BY and SORT BY)
The following two statements are equivalent. Both use Sequence 2.
-- Standard form
SET odps.sql.allow.fullscan=true;
SELECT shop_name,
total_price,
region
FROM sale_detail
WHERE total_price > 100.2
DISTRIBUTE BY region
SORT BY total_price;
-- Execution-order form
SET odps.sql.allow.fullscan=true;
FROM sale_detail
WHERE total_price > 100.2
SELECT shop_name,
total_price,
region
DISTRIBUTE BY region
SORT BY total_price;
Result:
+------------+-------------+------------+
| shop_name | total_price | region |
+------------+-------------+------------+
| s3 | 100.3 | china |
| s6 | 100.4 | shanghai |
| s7 | 100.5 | shanghai |
+------------+-------------+------------+
How the clauses execute:
-
FROM
sale_detail— reads rows from the table. -
WHERE
total_price > 100.2— keeps only rows wheretotal_priceexceeds 100.2. -
SELECT
shop_name, total_price, region— projects the three columns. -
DISTRIBUTE BY
region— distributes rows across reducers using hash partitioning on the region column. -
SORT BY
total_price— sorts rows within each reducer in ascending order bytotal_price.