All Products
Search
Document Center

MaxCompute:SELECT syntax

Last Updated:Mar 26, 2026

SELECT statements query data from tables. This topic covers the full SELECT syntax in MaxCompute, including clauses for filtering, grouping, sorting, and distributing data.

Before running SELECT statements, make sure you have the Select permission on the target table. For more information, see MaxCompute permissions.

Supported platforms:

Query operations

The following query operations are supported in SELECT statements.

OperationDescription
SubqueriesPerform further queries based on the result of a query
INTERSECT, UNION, and EXCEPTGet the intersection, union, or supplementary set of two datasets
JOINJoin tables based on join and query conditions
SEMI JOINFilter data in the left table using the right table; return data that appears only in the left table
MAPJOIN HINTSpecify MAPJOIN hints to optimize JOIN performance on one large table and one or more small tables
SKEWJOIN HINTHandle hot key values and long tail issues in JOIN operations
Lateral ViewUse LATERAL VIEW with a user-defined table-valued function (UDTF) to split one row into multiple rows
GROUPING SETSAggregate and analyze data across multiple dimensions
SELECT TRANSFORMStart a child process and use standard input/output for data I/O
Split Size hintModify the split size to control subtask parallelism
Time travel queries and incremental queriesQuery historical data (time travel) or historical incremental data (incremental) on Delta tables

Limitations

  • SELECT result display: A maximum of 10,000 rows can be displayed, and the returned result size must be less than 10 MB. This limit does not apply to SELECT clauses used inside larger queries — those return all results to the upper layer.

  • Full table scan on partitioned tables: Projects created after 20:00:00 on January 10, 2018 cannot perform a full table scan on partitioned tables. Always specify the partitions to scan. This reduces unnecessary I/O, conserves computing resources, and lowers costs when you use pay-as-you-go billing. To enable a full table scan for a specific query, run SET odps.sql.allow.fullscan=true; together with the query in the same commit:

    SET odps.sql.allow.fullscan=true;
    SELECT * FROM sale_detail;
  • Clustered table bucket pruning: Bucket pruning only takes effect when a single table scan covers 400 or fewer partitions. When pruning does not apply, more data is scanned — increasing costs under pay-as-you-go billing and degrading performance under subscription billing.

Syntax

[WITH <cte>[, ...] ]
SELECT [ALL | DISTINCT] <SELECT_expr>[, <EXCEPT_expr>][, <REPLACE_expr>] ...
       FROM <TABLE_reference>
       [WHERE <WHERE_condition>]
       [GROUP BY {<col_list>|ROLLUP(<col_list>)}]
       [HAVING <HAVING_condition>]
       [WINDOW <WINDOW_clause>]
       [ORDER BY <ORDER_condition>]
       [DISTRIBUTE BY <DISTRIBUTE_condition> [SORT BY <SORT_condition>]|[ CLUSTER BY <CLUSTER_condition>] ]
       [LIMIT <number>]

For the order in which clauses execute, see Sequence for executing clauses in a SELECT statement.

Sample data

The examples in this topic use the sale_detail table. Run the following statements to create the table and insert sample data:

-- Create a partitioned table named sale_detail.
CREATE TABLE IF NOT EXISTS sale_detail
(
shop_name     STRING,
customer_id   STRING,
total_price   DOUBLE
)
PARTITIONED BY (sale_date STRING, region STRING);

-- Add a partition.
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china');

-- Insert sample data.
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);

Query all columns to confirm the data:

SELECT * FROM sale_detail;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

WITH clause (CTE)

The WITH clause defines one or more common table expressions (CTEs) as temporary tables that subsequent queries can reference.

Rules:

  • CTE names must be unique within a WITH clause.

  • A CTE can only reference other CTEs defined in the same WITH clause — not itself, and not in a cycle.

Recursive self-reference (invalid):

WITH
A AS (SELECT 1 FROM A)
SELECT * FROM A;

Error:

FAILED: ODPS-0130161:[1,6] Parse exception - recursive cte A is invalid, it must have an initial_part and a recursive_part, which must be connected by UNION ALL

Circular reference (invalid):

WITH
A AS (SELECT * FROM B),
B AS (SELECT * FROM A)
SELECT * FROM B;

Error:

FAILED: ODPS-0130071:[1,26] Semantic analysis exception - while resolving view B - [1,51]recursive function call is not supported, cycle is A->B->A

Valid usage:

WITH
A AS (SELECT 1 AS C),
B AS (SELECT * FROM A)
SELECT * FROM B;

Result:

+---+
| c |
+---+
| 1 |
+---+

Column expression (SELECT_expr)

SELECT_expr (required) specifies the columns or expressions to return. The format is col1_name, col2_name, column expression, ....

Specify column names

Read specific columns by name:

SELECT shop_name FROM sale_detail;

Result:

+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
+------------+

Use * for all columns

* selects all columns. Combine with WHERE to filter:

-- Full table scan must be explicitly enabled for partitioned tables.
SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

With a WHERE filter:

SELECT * FROM sale_detail WHERE shop_name='s1';

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Use regular expressions

Wrap a regular expression in backticks to match column names. The backtick content is interpreted as a regex pattern, not a column name — this is MaxCompute-specific syntax.

Select all columns whose names start with sh:

SELECT `sh.*` FROM sale_detail;

Result:

+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
+------------+

Exclude shop_name (select all other columns):

SELECT `(shop_name)?+.+` FROM sale_detail;

Result:

+-------------+-------------+------------+------------+
| customer_id | total_price | sale_date  | region     |
+-------------+-------------+------------+------------+
| c1          | 100.1       | 2013       | china      |
| c2          | 100.2       | 2013       | china      |
| c3          | 100.3       | 2013       | china      |
+-------------+-------------+------------+------------+

Exclude multiple columns (shop_name and customer_id):

SELECT `(shop_name|customer_id)?+.+` FROM sale_detail;

Result:

+-------------+------------+------------+
| total_price | sale_date  | region     |
+-------------+------------+------------+
| 100.1       | 2013       | china      |
| 100.2       | 2013       | china      |
| 100.3       | 2013       | china      |
+-------------+------------+------------+

Exclude all columns whose names start with t:

SELECT `(t.*)?+.+` FROM sale_detail;

Result:

+------------+-------------+------------+------------+
| shop_name  | customer_id | sale_date  | region     |
+------------+-------------+------------+------------+
| s1         | c1          | 2013       | china      |
| s2         | c2          | 2013       | china      |
| s3         | c3          | 2013       | china      |
+------------+-------------+------------+------------+
When excluding multiple columns where one name is a prefix of another, put the longer name first. For example, to exclude both ds and dshh, use ` (dshhds)?+.+ — not (dsdshh)?+.+ `.

Use DISTINCT and ALL

DISTINCT removes duplicate rows. ALL (the default) returns all rows including duplicates.

Return distinct values of region:

SELECT DISTINCT region FROM sale_detail;

Result:

+------------+
| region     |
+------------+
| china      |
+------------+

When DISTINCT is applied to multiple columns, it deduplicates based on the combination of all specified columns:

SELECT DISTINCT region, sale_date FROM sale_detail;

Result:

+------------+------------+
| region     | sale_date  |
+------------+------------+
| china      | 2013       |
+------------+------------+

DISTINCT can be used with window functions to deduplicate the computed results:

SET odps.sql.allow.fullscan=true;
SELECT DISTINCT sale_date, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY total_price) AS rn FROM sale_detail;

Result:

+-----------+------------+
| sale_date | rn         |
+-----------+------------+
| 2013      | 1          |
+-----------+------------+
DISTINCT cannot be used with GROUP BY. The following statement returns an error:
SELECT DISTINCT shop_name FROM sale_detail GROUP BY shop_name;
-- Error: GROUP BY cannot be used with SELECT DISTINCT

Column exclusion (EXCEPT)

EXCEPT (optional) excludes specific columns when reading all other columns. Format: EXCEPT(col1_name, col2_name, ...).

Read all columns except region:

SELECT * EXCEPT(region) FROM sale_detail;

Result:

+-----------+-------------+-------------+-----------+
| shop_name | customer_id | total_price | sale_date |
+-----------+-------------+-------------+-----------+
| s1        | c1          | 100.1       | 2013      |
| s2        | c2          | 100.2       | 2013      |
| s3        | c3          | 100.3       | 2013      |
+-----------+-------------+-------------+-----------+

Column modification (REPLACE)

REPLACE (optional) replaces specified columns with expression results while reading all other columns unchanged. Format: REPLACE(exp1 [AS] col1_name, exp2 [AS] col2_name, ...).

Read all columns, replacing total_price with total_price + 100 and region with a fixed string:

SELECT * REPLACE(total_price+100 AS total_price, 'shanghai' AS region) FROM sale_detail;

Result:

+-----------+-------------+-------------+-----------+--------+
| shop_name | customer_id | total_price | sale_date | region |
+-----------+-------------+-------------+-----------+--------+
| s1        | c1          | 200.1       | 2013      | shanghai |
| s2        | c2          | 200.2       | 2013      | shanghai |
| s3        | c3          | 200.3       | 2013      | shanghai |
+-----------+-------------+-------------+-----------+--------+

Table reference (TABLE_reference)

TABLE_reference (required) specifies the table to query.

Query a table by name:

SELECT customer_id FROM sale_detail;

Result:

+-------------+
| customer_id |
+-------------+
| c1          |
| c2          |
| c3          |
+-------------+

Use a nested subquery as the table source:

SELECT * FROM (SELECT region,sale_date FROM sale_detail) t WHERE region = 'china';

Result:

+------------+------------+
| region     | sale_date  |
+------------+------------+
| china      | 2013       |
| china      | 2013       |
| china      | 2013       |
+------------+------------+

WHERE clause

WHERE (optional) filters rows. For partitioned tables, specifying partition columns in WHERE enables partition pruning, which reduces the amount of data scanned.

Supported relational operators:

  • >, <, =, >=, <=, <>

  • LIKE, RLIKE

  • IN, NOT IN

  • BETWEEN...AND

For more information, see Relational operators.

Filter by partition range to avoid a full table scan:

SELECT *
FROM sale_detail
WHERE sale_date >= '2008' AND sale_date <= '2014';
-- Equivalent to:
SELECT *
FROM sale_detail
WHERE sale_date BETWEEN '2008' AND '2014';

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+
Use the EXPLAIN statement to verify that partition pruning is working. UDFs or JOIN conditions may prevent pruning. For details, see Check whether partition pruning is effective.

Column aliases in WHERE

A column alias defined using a function cannot be referenced in WHERE. The following statement returns an error:

SELECT  task_name
        ,inst_id
        ,settings
        ,GET_JSON_OBJECT(settings, '$.SKYNET_ID') AS skynet_id
        ,GET_JSON_OBJECT(settings, '$.SKYNET_NODENAME') AS user_agent
FROM    Information_Schema.TASKS_HISTORY
WHERE   ds = '20211215' AND skynet_id IS NOT NULL
LIMIT 10;

UDF-based partition pruning

When you use UDFs in WHERE clauses, MaxCompute can execute the UDFs as small jobs to determine partition ranges. Two methods are available:

Method 1: Annotate the UDF class

Add the following annotation to your UDF class:

@com.aliyun.odps.udf.annotation.UdfProperty(isDeterministic=true)
The com.aliyun.odps.udf.annotation.UdfProperty annotation requires odps-sdk-udf version 0.30.X or later.

Method 2: Use a SET command

SET odps.sql.udf.ppr.deterministic = true;

This treats all UDFs in the statement as deterministic. Up to 1,000 partitions can be backfilled. To suppress errors when more than 1,000 partitions are affected (and disable UDF-based pruning):

SET odps.sql.udf.ppr.to.subquery = false;

Placement requirement: The UDF must appear in the WHERE clause of the source table query. Placing it in a JOIN ON condition does not trigger pruning.

Correct:

-- UDF in the WHERE clause of the source table query.
SELECT key, value FROM srcp WHERE udf(ds) = 'xx';

Incorrect (UDF in JOIN ON — pruning does not apply):

SELECT A.c1, A.c2 FROM srcp1 A JOIN srcp2 B ON A.c1 = B.c1 AND udf(A.ds) ='xx';

GROUP BY

GROUP BY (optional) groups rows by specified columns or expressions, typically used with aggregate functions.

Rules:

  • GROUP BY is evaluated before SELECT. Column references in GROUP BY can be input table column names, expressions, or output column aliases from SELECT.

  • All non-aggregate columns in SELECT must appear in GROUP BY.

  • If columns in GROUP BY are specified by a regular expression, use the complete expression.

Group by column name:

SELECT region FROM sale_detail GROUP BY region;

Result:

+------------+
| region     |
+------------+
| china      |
+------------+

Group by column name with aggregate:

SELECT region, SUM(total_price) FROM sale_detail GROUP BY region;

Result:

+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
+------------+------------+

Group by output column alias:

SELECT region AS r FROM sale_detail GROUP BY r;
-- Equivalent to:
SELECT region AS r FROM sale_detail GROUP BY region;

Result:

+------------+
| r          |
+------------+
| china      |
+------------+

Group by column expression:

SELECT 2 + total_price AS r FROM sale_detail GROUP BY 2 + total_price;

Result:

+------------+
| r          |
+------------+
| 102.1      |
| 102.2      |
| 102.3      |
+------------+

Non-aggregate columns in SELECT must all be in GROUP BY (incorrect vs. correct):

-- Incorrect: total_price is not in GROUP BY.
SELECT region, total_price FROM sale_detail GROUP BY region;

-- Correct:
SELECT region, total_price FROM sale_detail GROUP BY region, total_price;

Result of the correct statement:

+------------+-------------+
| region     | total_price |
+------------+-------------+
| china      | 100.1       |
| china      | 100.2       |
| china      | 100.3       |
+------------+-------------+

Position alias in GROUP BY

Run SET odps.sql.groupby.position.alias=true; (or SET hive.groupby.position.alias=true;) before the SELECT statement to treat integer constants in GROUP BY as column positions in SELECT:

SET odps.sql.groupby.position.alias=true;
-- 1 refers to the first column (region) in the SELECT list.
SELECT region, SUM(total_price) FROM sale_detail GROUP BY 1;

Result:

+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
+------------+------------+

HAVING clause

HAVING (optional) filters grouped results using aggregate functions.

-- Insert additional data to demonstrate HAVING filtering.
INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai') VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);

-- Return only groups where total sales are less than 305.
SELECT region, SUM(total_price) FROM sale_detail
GROUP BY region
HAVING SUM(total_price)<305;

Result:

+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
| shanghai   | 200.9      |
+------------+------------+

ORDER BY

ORDER BY (optional) sorts all rows by the specified column or constant.

  • Default order: ascending. Add DESC for descending.

  • ORDER BY must be followed by LIMIT <number> by default. See LIMIT to remove this requirement.

  • Columns in ORDER BY must reference output column aliases from SELECT. If no alias is specified, the column name is used.

  • ORDER BY cannot be used with DISTRIBUTE BY or SORT BY simultaneously.

Sort ascending, return the first 2 rows:

SELECT * FROM sale_detail ORDER BY total_price LIMIT 2;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Sort descending, return the first 2 rows:

SELECT * FROM sale_detail ORDER BY total_price DESC LIMIT 2;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s3         | c3          | 100.3       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Use an output column alias in ORDER BY:

SELECT total_price AS t FROM sale_detail ORDER BY total_price LIMIT 3;
-- Equivalent to:
SELECT total_price AS t FROM sale_detail ORDER BY t LIMIT 3;

Result:

+------------+
| t          |
+------------+
| 100.1      |
| 100.2      |
| 100.3      |
+------------+

NULL sort behavior

NULL is treated as the smallest value in ORDER BY — the same behavior as MySQL, not Oracle. As a result:

  • Ascending order (ASC): NULL rows appear first.

  • Descending order (DESC): NULL rows appear last.

OFFSET

Skip rows with ORDER BY...LIMIT m OFFSET n. LIMIT m returns m rows; OFFSET n skips the first n rows. The shorthand LIMIT n, m is equivalent.

Return 3 rows starting from the 3rd row (skip the first 2):

SELECT customer_id, total_price FROM sale_detail ORDER BY total_price LIMIT 3 OFFSET 2;
-- Equivalent to:
SELECT customer_id, total_price FROM sale_detail ORDER BY total_price LIMIT 2, 3;

Result:

+-------------+-------------+
| customer_id | total_price |
+-------------+-------------+
| c3          | 100.3       |
+-------------+-------------+

The table has only 3 rows, so skipping 2 returns only the 3rd row.

Position alias in ORDER BY

Run SET odps.sql.orderby.position.alias=true; (or SET hive.orderby.position.alias=true;) before the SELECT statement to treat integer constants in ORDER BY as column positions:

SET odps.sql.orderby.position.alias=true;
-- 3 refers to the third column (total_price) in the SELECT list.
SELECT * FROM sale_detail ORDER BY 3 LIMIT 3;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Global sorting acceleration

In standard ORDER BY queries, all data is routed to a single instance for sorting, which limits parallelism. Range clustering enables concurrent global sorting by sampling data, dividing it into ranges, sorting each range in parallel, and combining the results. For details, see Global sorting acceleration.

DISTRIBUTE BY

DISTRIBUTE BY (optional) performs hash partitioning on data based on specified column values. It controls how mapper output is distributed among reducers, ensuring that rows with the same column values go to the same reducer.

Column references in DISTRIBUTE BY must be output column aliases from SELECT. If no alias is specified, the column name is used as the alias.

Hash-partition by region:

-- All three statements below are equivalent.
SELECT region FROM sale_detail DISTRIBUTE BY region;
SELECT region AS r FROM sale_detail DISTRIBUTE BY region;
SELECT region AS r FROM sale_detail DISTRIBUTE BY r;

SORT BY

SORT BY (optional) is typically used with DISTRIBUTE BY.

  • Default order: ascending. Add DESC for descending.

  • SORT BY cannot be used with GROUP BY simultaneously.

With DISTRIBUTE BY: Sorts rows within each partition produced by DISTRIBUTE BY.

Hash-partition by region, then sort by total_price ascending within each partition:

-- Insert data for the shanghai partition.
INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai') VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
SELECT region, total_price FROM sale_detail DISTRIBUTE BY region SORT BY total_price;

Result:

+------------+-------------+
| region     | total_price |
+------------+-------------+
| shanghai   | NULL        |
| china      | 100.1       |
| china      | 100.2       |
| china      | 100.3       |
| shanghai   | 100.4       |
| shanghai   | 100.5       |
+------------+-------------+

Sort descending within each partition:

SELECT region, total_price FROM sale_detail DISTRIBUTE BY region SORT BY total_price DESC;

Result:

+------------+-------------+
| region     | total_price |
+------------+-------------+
| shanghai   | 100.5       |
| shanghai   | 100.4       |
| china      | 100.3       |
| china      | 100.2       |
| china      | 100.1       |
| shanghai   | NULL        |
+------------+-------------+

Without DISTRIBUTE BY: Sorts data within each reducer independently. This improves output compression and reduces data read during downstream sorting:

SELECT region, total_price FROM sale_detail SORT BY total_price DESC;

Result:

+------------+-------------+
| region     | total_price |
+------------+-------------+
| china      | 100.3       |
| china      | 100.2       |
| china      | 100.1       |
| shanghai   | 100.5       |
| shanghai   | 100.4       |
| shanghai   | NULL        |
+------------+-------------+

Usage notes

  • Columns in ORDER BY, DISTRIBUTE BY, and SORT BY must use output column aliases from SELECT. Column aliases can be Chinese characters.

  • ORDER BY, DISTRIBUTE BY, and SORT BY are evaluated after SELECT, so they must reference SELECT output aliases.

  • ORDER BY cannot be used simultaneously with DISTRIBUTE BY or SORT BY.

  • GROUP BY cannot be used simultaneously with DISTRIBUTE BY or SORT BY.

LIMIT \<number\>

LIMIT <number> (optional) restricts the number of rows returned. The value must be a 32-bit integer with a maximum of 2,147,483,647.

LIMIT scans and filters data in the distributed query engine but does not reduce computing costs.

Syntax

LIMIT <number>
LIMIT <offset>, <count>        -- shorthand for LIMIT count OFFSET offset
LIMIT <count> OFFSET <offset>  -- skip offset rows, then return count rows

For OFFSET examples, see OFFSET in the ORDER BY section.

ORDER BY with LIMIT

ORDER BY requires LIMIT by default to prevent a single node from sorting large datasets. To remove this requirement:

  • Project level: Run SETPROJECT odps.sql.validate.orderby.limit=false;

  • Session level: Run SET odps.sql.validate.orderby.limit=false; together with the query

Removing this limit may consume significantly more resources and time when a single node has large amounts of data to sort.

Maximum rows displayed

When no LIMIT is specified, or when LIMIT exceeds the display maximum, the number of rows shown is capped:

  • Project data protection disabled: Controlled by instance_tunnel_max_record in the odpscmd config.ini file (maximum: 10,000). Set use_instance_tunnel=true in config.ini. If instance_tunnel_max_record is not configured, no row limit applies.

  • Project data protection enabled: Controlled by the READ_TABLE_MAX_ROW parameter (maximum: 10,000).

Check whether project data protection is enabled:

SHOW SecurityConfiguration;

Disable project data protection (default: disabled):

SET ProjectProtection=false;

For more information, see Project data protection.

Window clause

For window clause syntax and usage, see Syntax.