Common Table Expression (CTE) is a temporary named result set that simplifies SQL queries. MaxCompute supports the standard SQL CTE, enhancing the readability and execution efficiency of SQL statements. This topic describes the features, command syntax, and usage examples of CTE.
Features
CTE serves as a temporary result set defined within the scope of a single DML statement. Similar to a derived table, it is not stored as an object and exists only for the duration of the query. CTEs enhance the maintainability and readability of complex SQL statements during development.
CTE is a statement-level clause that begins with WITH, followed by the expression name. It includes two types:
Non-recursive CTE: This type does not involve recursion or self-referential iteration.
Recursive CTE: This type allows for self-referential iteration, enabling SQL to perform recursive queries, typically for traversing hierarchical data.
Non-recursive CTE
Command syntax
WITH
<cte_name> [(col_name[,col_name]...)] AS
(
<cte_query>
)
[,<cte_name> [(col_name[,col_name]...)] AS
(
<cte_query2>
)
,……]
Parameters
Parameter | Required | Description |
cte_name | Yes | The name of the CTE, which must not be the same as any other CTE name in the |
col_name | No | The column name of the CTE output column. |
cte_query | Yes | A |
Usage example
The following code shows an example:
INSERT OVERWRITE TABLE srcp PARTITION (p='abc')
SELECT * FROM (
SELECT a.key, b.value
FROM (
SELECT * FROM src WHERE key IS NOT NULL ) a
JOIN (
SELECT * FROM src2 WHERE value > 0 ) b
ON a.key = b.key
) c
UNION ALL
SELECT * FROM (
SELECT a.key, b.value
FROM (
SELECT * FROM src WHERE key IS NOT NULL ) a
LEFT OUTER JOIN (
SELECT * FROM src3 WHERE value > 0 ) b
ON a.key = b.key AND b.key IS NOT NULL
) d;
The top-level UNION
combines two JOIN
operations, with the left table of JOIN
being the same query statement. Without CTE, this code would need to be duplicated in subqueries.
Using CTE to rewrite the statement, the command example is as follows:
WITH
a AS (SELECT * FROM src WHERE key IS NOT NULL),
b AS (SELECT * FROM src2 WHERE value > 0),
c AS (SELECT * FROM src3 WHERE value > 0),
d AS (SELECT a.key, b.value FROM a JOIN b ON a.key=b.key),
e AS (SELECT a.key, c.value FROM a LEFT OUTER JOIN c ON a.key=c.key AND c.key IS NOT NULL)
INSERT OVERWRITE TABLE srcp PARTITION (p='abc')
SELECT * FROM d UNION ALL SELECT * FROM e;
With the rewrite, the subquery for a
is written only once and can be reused. Multiple subqueries can be defined in the CTE's WITH
clause and reused throughout the statement, eliminating the need for repeated nesting.
Recursive CTE
Command syntax
WITH RECURSIVE <cte_name>[(col_name[,col_name]...)] AS
(
<initial_part> UNION ALL <recursive_part>
)
SELECT ... FROM ...;
Parameters
Parameter | Required | Description |
RECURSIVE | Yes | The recursive CTE clause must start with |
cte_name | Yes | The name of the CTE, which must not be the same as any other CTE name in the current |
col_name | No | The column name of the CTE output column. If the output column name is not explicitly specified, the system also supports automatic inference. |
initial_part | Yes | Used to calculate the initial dataset, it cannot recursively reference cte_name to point to the CTE itself. |
recursive_part | Yes | Used to calculate the results of subsequent iterations, you can recursively reference cte_name to define how to use the previous iteration result to recursively calculate the next iteration result. |
UNION ALL | Yes | initial_part and recursive_part must be connected by UNION ALL. |
Limits
Recursive CTE cannot be used in IN, EXISTS and SCALAR subqueries.
The default limit for the number of recursive operations in a recursive CTE is 10. This limit can be adjusted by setting
odps.sql.rcte.max.iterate.num
, with a maximum of 100 iterations.Recursive CTE does not support the recording of intermediate results during iterations. In case of failure, the calculation restarts from the beginning. We recommend that you either control the number of recursions or store intermediate results in a temporary table if the computation is lengthy.
Recursive CTE is not supported in MaxCompute Query Acceleration. If used in MCQA mode, the task will either revert to normal mode execution when
interactive_auto_rerun=true
is set or fail otherwise.
Usage example
Example 1: Define a recursive CTE named cte_name.
-- Method 1: Define a recursive CTE named cte_name, containing two output columns named a and b WITH RECURSIVE cte_name(a, b) AS ( SELECT 1L, 1L -- initial_part: the dataset of iteration 0 UNION ALL -- Connect the initial_part and recursive_part of the CTE by using UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) -- recursive_part: where cte_name points to the calculation result of the previous iteration SELECT * FROM cte_name ORDER BY a LIMIT 100; -- Output the recursive CTE result, that is, union all the data of all iterations together -- Method 2: Define a recursive CTE named cte_name without explicitly specifying the output column name WITH RECURSIVE cte_name AS ( SELECT 1L AS a, 1L AS b UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) SELECT * FROM cte_name ORDER BY a LIMIT 100;
NoteIn the recursive_part, to prevent an infinite loop, set a termination condition for the iteration, such as
WHERE a + 1 <= 5
, which ends the iteration when the dataset generated is empty.If the output column names are not specified, the system can infer them automatically. For example, in Method 2, the output column names from initial_part are used for the recursive CTE.
The following results is returned:
+------------+------------+ | a | b | +------------+------------+ | 1 | 1 | | 2 | 2 | | 3 | 3 | | 4 | 4 | | 5 | 5 | +------------+------------+
Example 2: Recursive CTEs are not allowed in IN, EXISTS and SCALAR subqueries. The following query will fail to compile:
WITH RECURSIVE cte_name(a, b) AS ( SELECT 1L, 1L UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) SELECT x, x in (SELECT a FROM cte_name) FROM VALUES (1L), (2L) AS t(x);
The following results is returned:
-- Return error, there is an in sub-query on line 5, and the recursive CTE cte_name is referenced in the sub-query FAILED: ODPS-0130071:[5,31] Semantic analysis exception - using Recursive-CTE cte_name in scalar/in/exists sub-query is not allowed, please check your query, the query text location is from [line 5, column 13] to [line 5, column 40]
Example 3: Create an employees table to map company employees to their managers and insert data:
CREATE TABLE employees(name STRING, boss_name STRING); INSERT INTO TABLE employees VALUES ('zhang_3', null), ('li_4', 'zhang_3'), ('wang_5', 'zhang_3'), ('zhao_6', 'li_4'), ('qian_7', 'wang_5');
Define a recursive CTE named
company_hierarchy
to derive the organizational structure from this table. The CTE outputs three fields: the employee's name, their manager's name, and their level within the organizational structure.WITH RECURSIVE company_hierarchy(name, boss_name, level) AS ( SELECT name, boss_name, 0L FROM employees WHERE boss_name IS NULL UNION ALL SELECT e.name, e.boss_name, h.level+1 FROM employees e JOIN company_hierarchy h WHERE e.boss_name = h.name ) SELECT * FROM company_hierarchy ORDER BY level, boss_name, name LIMIT 1000;
The initial_part represents records in the employees table where boss_name is null, assigning a level of 0 to these entries.
The fourth line, recursive_part, executes a
JOIN
operation between the employees and company_hierarchy tables. The JOIN condition,e.boss_name = h.name
, is designed to retrieve records from the employees table that correspond to employees whose managers were identified in the preceding iteration.
The following results is returned:
+------+-----------+------------+ | name | boss_name | level | +------+-----------+---------------+ | zhang_3 | NULL | 0 | | li_4 | zhang_3 | 1 | | wang_5 | zhang_3 | 1 | | zhao_6 | li_4 | 2 | | qian_7 | wang_5 | 2 | +------+-----------+------------+
The calculation process is as follows:
Iteration 0: Select initial records from employees where
boss_name IS NULL
, yielding one record.'zhang_3', null, 0
Iteration 1: Perform the following query by using the results from iteration 0 as company_hierarchy.
SELECT e.name, e.boss_name, h.level+1 FROM employees e JOIN company_hierarchy h WHERE e.boss_name = h.name
This yields two records with level = 1, both managed by zhang_3.
'li_4', 'zhang_3', 1 'wang_5', 'zhang_3', 1
Iteration 2: Similar to iteration 1, but with company_hierarchy corresponding to iteration 1's results. This produces two records with
level = 2
, managed by li_4 or wang_5.'zhao_6', 'li_4', 2 'qian_7', 'wang_5', 2
Iteration 3: No further employees in the table have zhao_6 or qian_7 as managers, resulting in an empty set and ending the recursion.
Example 4: If an additional record is inserted into the employees table from Example 2, an error will be returned.
INSERT INTO TABLE employees VALUES('qian_7', 'qian_7');
This record indicates that qian_7 is his own manager. Running the previously defined recursive CTE again would result in an infinite loop. The system imposes a limit on the maximum number of iterations. For more information, see Limits. This query eventually fail and terminate.
The following results is returned:
-- Return error FAILED: ODPS-0010000:System internal error - recursive-cte: company_hierarchy exceed max iterate number 10