Common Table Expression (CTE) is a temporary named result set that simplifies SQL queries. MaxCompute supports the standard SQL CTE, enhancing the readability and execution efficiency of SQL statements. This topic describes the features, command syntax, and usage examples of CTE.
Features
CTE serves as a temporary result set defined within the scope of a single DML statement. Similar to a derived table, it is not stored as an object and exists only for the duration of the query. CTEs enhance the maintainability and readability of complex SQL statements during development.
CTE is a statement-level clause that begins with WITH, followed by the expression name. It includes two types:
Non-recursive CTE: This type does not involve recursion or self-referential iteration.
Recursive CTE: This type allows for self-referential iteration, enabling SQL to perform recursive queries, typically for traversing hierarchical data.
Support for materialized CTEs: When defining a CTE, you can use the materialize hint in the SELECT statement to cache the CTE's calculation result into a temporary table. This allows subsequent access to the CTE to directly read from the cache, preventing memory over-limit issues in complex CTE nesting scenarios and enhancing CTE statement performance.
Non-recursive CTE
Command syntax
WITH
<cte_name> [(col_name[,col_name]...)] AS
(
<cte_query>
)
[,<cte_name> [(col_name[,col_name]...)] AS
(
<cte_query2>
)
,……]Parameters
Parameter | Required | Description |
cte_name | Yes | The name of the CTE, which must not be the same as any other CTE name in the |
col_name | No | The column name of the CTE output column. |
cte_query | Yes | A |
Usage example
The following code shows an example:
INSERT OVERWRITE TABLE srcp PARTITION (p='abc')
SELECT * FROM (
SELECT a.key, b.value
FROM (
SELECT * FROM src WHERE key IS NOT NULL ) a
JOIN (
SELECT * FROM src2 WHERE value > 0 ) b
ON a.key = b.key
) c
UNION ALL
SELECT * FROM (
SELECT a.key, b.value
FROM (
SELECT * FROM src WHERE key IS NOT NULL ) a
LEFT OUTER JOIN (
SELECT * FROM src3 WHERE value > 0 ) b
ON a.key = b.key AND b.key IS NOT NULL
) d;The top-level UNION combines two JOIN operations, with the left table of JOIN being the same query statement. Without CTE, this code would need to be duplicated in subqueries.
Using CTE to rewrite the statement, the command example is as follows:
WITH
a AS (SELECT * FROM src WHERE key IS NOT NULL),
b AS (SELECT * FROM src2 WHERE value > 0),
c AS (SELECT * FROM src3 WHERE value > 0),
d AS (SELECT a.key, b.value FROM a JOIN b ON a.key=b.key),
e AS (SELECT a.key, c.value FROM a LEFT OUTER JOIN c ON a.key=c.key AND c.key IS NOT NULL)
INSERT OVERWRITE TABLE srcp PARTITION (p='abc')
SELECT * FROM d UNION ALL SELECT * FROM e;With the rewrite, the subquery for a is written only once and can be reused. Multiple subqueries can be defined in the CTE's WITH clause and reused throughout the statement, eliminating the need for repeated nesting.
Recursive CTE
Command syntax
WITH RECURSIVE <cte_name>[(col_name[,col_name]...)] AS
(
<initial_part> UNION ALL <recursive_part>
)
SELECT ... FROM ...;Parameters
Parameter | Required | Description |
RECURSIVE | Yes | The recursive CTE clause must start with |
cte_name | Yes | The name of the CTE, which must not be the same as any other CTE name in the current |
col_name | No | The column name of the CTE output column. If the output column name is not explicitly specified, the system also supports automatic inference. |
initial_part | Yes | Used to calculate the initial dataset, it cannot recursively reference cte_name to point to the CTE itself. |
recursive_part | Yes | Used to calculate the results of subsequent iterations, you can recursively reference cte_name to define how to use the previous iteration result to recursively calculate the next iteration result. |
UNION ALL | Yes | initial_part and recursive_part must be connected by UNION ALL. |
Limits
Recursive CTE cannot be used in IN, EXISTS and SCALAR subqueries.
The default limit for the number of recursive operations in a recursive CTE is 10. This limit can be adjusted by setting
odps.sql.rcte.max.iterate.num, with a maximum of 100 iterations.Recursive CTE does not support the recording of intermediate results during iterations. In case of failure, the calculation restarts from the beginning. We recommend that you either control the number of recursions or store intermediate results in a temporary table if the computation is lengthy.
Recursive CTE is not supported in MaxCompute Query Acceleration. If used in MCQA mode, the task will either revert to normal mode execution when
interactive_auto_rerun=trueis set or fail otherwise.
Usage example
Example 1: Define a recursive CTE named cte_name.
-- Method 1: Define a recursive CTE named cte_name, containing two output columns named a and b WITH RECURSIVE cte_name(a, b) AS ( SELECT 1L, 1L -- initial_part: the dataset of iteration 0 UNION ALL -- Connect the initial_part and recursive_part of the CTE by using UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) -- recursive_part: where cte_name points to the calculation result of the previous iteration SELECT * FROM cte_name ORDER BY a LIMIT 100; -- Output the recursive CTE result, that is, union all the data of all iterations together -- Method 2: Define a recursive CTE named cte_name without explicitly specifying the output column name WITH RECURSIVE cte_name AS ( SELECT 1L AS a, 1L AS b UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) SELECT * FROM cte_name ORDER BY a LIMIT 100;NoteIn the recursive_part, to prevent an infinite loop, set a termination condition for the iteration, such as
WHERE a + 1 <= 5, which ends the iteration when the dataset generated is empty.If the output column names are not specified, the system can infer them automatically. For example, in Method 2, the output column names from initial_part are used for the recursive CTE.
The following results is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 1 | | 2 | 2 | | 3 | 3 | | 4 | 4 | | 5 | 5 | +------------+------------+Example 2: Recursive CTEs are not allowed in IN, EXISTS and SCALAR subqueries. The following query will fail to compile:
WITH RECURSIVE cte_name(a, b) AS ( SELECT 1L, 1L UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) SELECT x, x in (SELECT a FROM cte_name) FROM VALUES (1L), (2L) AS t(x);The following results is returned:
-- Return error, there is an in sub-query on line 5, and the recursive CTE cte_name is referenced in the sub-query FAILED: ODPS-0130071:[5,31] Semantic analysis exception - using Recursive-CTE cte_name in scalar/in/exists sub-query is not allowed, please check your query, the query text location is from [line 5, column 13] to [line 5, column 40]Example 3: Create an employees table to map company employees to their managers and insert data:
CREATE TABLE employees(name STRING, boss_name STRING); INSERT INTO TABLE employees VALUES ('zhang_3', null), ('li_4', 'zhang_3'), ('wang_5', 'zhang_3'), ('zhao_6', 'li_4'), ('qian_7', 'wang_5');Define a recursive CTE named
company_hierarchyto derive the organizational structure from this table. The CTE outputs three fields: the employee's name, their manager's name, and their level within the organizational structure.WITH RECURSIVE company_hierarchy(name, boss_name, level) AS ( SELECT name, boss_name, 0L FROM employees WHERE boss_name IS NULL UNION ALL SELECT e.name, e.boss_name, h.level+1 FROM employees e JOIN company_hierarchy h WHERE e.boss_name = h.name ) SELECT * FROM company_hierarchy ORDER BY level, boss_name, name LIMIT 1000;The initial_part represents records in the employees table where boss_name is null, assigning a level of 0 to these entries.
The fourth line, recursive_part, executes a
JOINoperation between the employees and company_hierarchy tables. The JOIN condition,e.boss_name = h.name, is designed to retrieve records from the employees table that correspond to employees whose managers were identified in the preceding iteration.
The following results is returned:
+------+-----------+------------+ | name | boss_name | level | +------+-----------+---------------+ | zhang_3 | NULL | 0 | | li_4 | zhang_3 | 1 | | wang_5 | zhang_3 | 1 | | zhao_6 | li_4 | 2 | | qian_7 | wang_5 | 2 | +------+-----------+------------+The calculation process is as follows:
Iteration 0: Select initial records from employees where
boss_name IS NULL, yielding one record.'zhang_3', null, 0Iteration 1: Perform the following query by using the results from iteration 0 as company_hierarchy.
SELECT e.name, e.boss_name, h.level+1 FROM employees e JOIN company_hierarchy h WHERE e.boss_name = h.nameThis yields two records with level = 1, both managed by zhang_3.
'li_4', 'zhang_3', 1 'wang_5', 'zhang_3', 1Iteration 2: Similar to iteration 1, but with company_hierarchy corresponding to iteration 1's results. This produces two records with
level = 2, managed by li_4 or wang_5.'zhao_6', 'li_4', 2 'qian_7', 'wang_5', 2Iteration 3: No further employees in the table have zhao_6 or qian_7 as managers, resulting in an empty set and ending the recursion.
Example 4: If an additional record is inserted into the employees table from Example 2, an error will be returned.
INSERT INTO TABLE employees VALUES('qian_7', 'qian_7');This record indicates that qian_7 is his own manager. Running the previously defined recursive CTE again would result in an infinite loop. The system imposes a limit on the maximum number of iterations. For more information, see Limits. This query eventually fail and terminate.
The following results is returned:
-- Return error FAILED: ODPS-0010000:System internal error - recursive-cte: company_hierarchy exceed max iterate number 10
Materialize CTE
Background introduction
For non-recursive CTEs, MaxCompute expands all CTEs when generating the execution plan.
An example is as follows:
WITH
v1 AS (SELECT SIN(1.0) AS a)
SELECT a FROM v1 UNION ALL SELECT a FROM v1;The following results is returned:
+------------+
| a |
+------------+
| 0.8414709848078965 |
| 0.8414709848078965 |
+------------+At the execution level, this is equivalent to the following SQL, where the function sin(1.0) is executed twice.
SELECT a FROM (SELECT SIN(1.0) AS a)
UNION ALL
SELECT a FROM (SELECT SIN(1.0) AS a);In complex scenarios with multilayer nested CTEs, if all CTEs are expanded into the most basic leaf nodes, the result is an enormous syntax tree. This can lead to memory over-limit issues and failure to generate the execution plan. An example is as follows:
WITH
v1 AS (SELECT 1L AS a, 2L AS b, 3L AS c),
v2 AS (SELECT * FROM v1 UNION ALL SELECT * FROM v1 UNION ALL SELECT * FROM v1),
v3 AS (SELECT * FROM v2 UNION ALL SELECT * FROM v2 UNION ALL SELECT * FROM v2),
v4 AS (SELECT * FROM v3 UNION ALL SELECT * FROM v3 UNION ALL SELECT * FROM v3),
v5 AS (SELECT * FROM v4 UNION ALL SELECT * FROM v4 UNION ALL SELECT * FROM v4),
v6 AS (SELECT * FROM v5 UNION ALL SELECT * FROM v5 UNION ALL SELECT * FROM v5),
v7 AS (SELECT * FROM v6 UNION ALL SELECT * FROM v6 UNION ALL SELECT * FROM v6),
v8 AS (SELECT * FROM v7 UNION ALL SELECT * FROM v7 UNION ALL SELECT * FROM v7),
v9 AS (SELECT * FROM v8 UNION ALL SELECT * FROM v8 UNION ALL SELECT * FROM v8)
SELECT * FROM v9;To address this, MaxCompute offers the materialize CTE feature, which caches the CTE calculation results, allowing SQL outside the WITH statement to reference the results without full expansion. This mechanism effectively prevents memory over-limit issues due to nested CTE expansion and improves CTE statement performance.
Usage example
When defining a CTE, you can use the materialize hint /*+ MATERIALIZE */ in the SELECT statement to cache the CTE's calculation result into a temporary table. This allows for direct reading from the cache in subsequent references, without the need for recalculation. An example is as follows:
WITH
v1 AS (SELECT /*+ MATERIALIZE */ SIN(1.0) AS a)
SELECT a FROM v1 UNION ALL SELECT a FROM v1;
-- The following results is returned.
+------------+
| a |
+------------+
| 0.8414709848078965 |
| 0.8414709848078965 |
+------------+With the materialize hint in effect, the Job Details tab in LogView will show that the intermediate result is stored, corresponding to multiple Fuxi Jobs submitted.

Limits
The materialize hint must be applied to the top-level SELECT statement of a non-recursive CTE. It is not applicable to recursive CTEs.
Incorrect example: In the following CTE, the top-level statement is UNION, not SELECT, so the materialize hint is ineffective.
WITH v1 AS (SELECT /*+ MATERIALIZE */ SIN(1.0) AS a UNION ALL SELECT /*+ MATERIALIZE */ SIN(1.0) AS a) SELECT a FROM v1 UNION ALL SELECT a FROM v1;In LogView, only one Fuxi Job is submitted, as illustrated below.

Correct example: By rewriting the incorrect example as a subquery, the issue is resolved.
WITH v1 AS (SELECT /*+ MATERIALIZE */ * FROM (SELECT SIN(1.0) AS a UNION ALL SELECT SIN(1.0) AS a) ) SELECT a FROM v1 UNION ALL SELECT a FROM v1;
If a non-deterministic function, such as RAND or a user-defined non-deterministic Java/Python UDF, is used in the CTE, converting it to a materialized CTE may alter the final result due to the caching mechanism.
Materialized CTEs are not supported in MaxCompute Query Acceleration mode. If used in MCQA mode and
interactive_auto_rerun=trueis set, the task may revert to normal mode for execution. Otherwise, the task will fail.