All Products
Search
Document Center

MaxCompute:Common table expression (CTE)

Last Updated:Jan 27, 2025

Common Table Expression (CTE) is a temporary named result set that simplifies SQL queries. MaxCompute supports the standard SQL CTE, enhancing the readability and execution efficiency of SQL statements. This topic describes the features, command syntax, and usage examples of CTE.

Features

  • CTE serves as a temporary result set defined within the scope of a single DML statement. Similar to a derived table, it is not stored as an object and exists only for the duration of the query. CTEs enhance the maintainability and readability of complex SQL statements during development.

  • CTE is a statement-level clause that begins with WITH, followed by the expression name. It includes two types:

    • Non-recursive CTE: This type does not involve recursion or self-referential iteration.

    • Recursive CTE: This type allows for self-referential iteration, enabling SQL to perform recursive queries, typically for traversing hierarchical data.

Non-recursive CTE

Command syntax

WITH 
     <cte_name> [(col_name[,col_name]...)] AS
     (
        <cte_query>
     )
    [,<cte_name> [(col_name[,col_name]...)] AS
     (
     <cte_query2>
     )
    ,……]

Parameters

Parameter

Required

Description

cte_name

Yes

The name of the CTE, which must not be the same as any other CTE name in the WITH clause. Any place in the query that uses the cte_name identifier refers to the CTE.

col_name

No

The column name of the CTE output column.

cte_query

Yes

A SELECT statement. The result set of SELECT is used to fill the CTE.

Usage example

The following code shows an example:

INSERT OVERWRITE TABLE srcp PARTITION (p='abc')
SELECT * FROM (
    SELECT a.key, b.value
    FROM (
        SELECT * FROM src WHERE key IS NOT NULL ) a
    JOIN (
        SELECT * FROM src2 WHERE value > 0 ) b
    ON a.key = b.key
) c
UNION ALL 
SELECT * FROM (
    SELECT a.key, b.value
    FROM (
        SELECT * FROM src WHERE key IS NOT NULL ) a
    LEFT OUTER JOIN (
        SELECT * FROM src3 WHERE value > 0 ) b
    ON a.key = b.key AND b.key IS NOT NULL 
) d;

The top-level UNION combines two JOIN operations, with the left table of JOIN being the same query statement. Without CTE, this code would need to be duplicated in subqueries.

Using CTE to rewrite the statement, the command example is as follows:

WITH 
  a AS (SELECT * FROM src WHERE key IS NOT NULL),
  b AS (SELECT * FROM src2 WHERE value > 0),
  c AS (SELECT * FROM src3 WHERE value > 0),
  d AS (SELECT a.key, b.value FROM a JOIN b ON a.key=b.key),
  e AS (SELECT a.key, c.value FROM a LEFT OUTER JOIN c ON a.key=c.key AND c.key IS NOT NULL)
INSERT OVERWRITE TABLE srcp PARTITION (p='abc') 
SELECT * FROM d UNION ALL SELECT * FROM e;

With the rewrite, the subquery for a is written only once and can be reused. Multiple subqueries can be defined in the CTE's WITH clause and reused throughout the statement, eliminating the need for repeated nesting.

Recursive CTE

Command syntax

WITH RECURSIVE <cte_name>[(col_name[,col_name]...)] AS
(
  <initial_part> UNION ALL <recursive_part>
)
SELECT ... FROM ...;

Parameters

Parameter

Required

Description

RECURSIVE

Yes

The recursive CTE clause must start with WITH RECURSIVE.

cte_name

Yes

The name of the CTE, which must not be the same as any other CTE name in the current WITH clause. Any place in the query that uses the cte_name identifier refers to the CTE.

col_name

No

The column name of the CTE output column. If the output column name is not explicitly specified, the system also supports automatic inference.

initial_part

Yes

Used to calculate the initial dataset, it cannot recursively reference cte_name to point to the CTE itself.

recursive_part

Yes

Used to calculate the results of subsequent iterations, you can recursively reference cte_name to define how to use the previous iteration result to recursively calculate the next iteration result.

UNION ALL

Yes

initial_part and recursive_part must be connected by UNION ALL.

Limits

  • Recursive CTE cannot be used in IN, EXISTS and SCALAR subqueries.

  • The default limit for the number of recursive operations in a recursive CTE is 10. This limit can be adjusted by setting odps.sql.rcte.max.iterate.num, with a maximum of 100 iterations.

  • Recursive CTE does not support the recording of intermediate results during iterations. In case of failure, the calculation restarts from the beginning. We recommend that you either control the number of recursions or store intermediate results in a temporary table if the computation is lengthy.

  • Recursive CTE is not supported in MaxCompute Query Acceleration. If used in MCQA mode, the task will either revert to normal mode execution when interactive_auto_rerun=true is set or fail otherwise.

Usage example

  • Example 1: Define a recursive CTE named cte_name.

    -- Method 1: Define a recursive CTE named cte_name, containing two output columns named a and b
    WITH RECURSIVE cte_name(a, b) AS (       
      SELECT 1L, 1L                                 -- initial_part: the dataset of iteration 0
      UNION ALL                                      -- Connect the initial_part and recursive_part of the CTE by using UNION ALL
      SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) -- recursive_part: where cte_name points to the calculation result of the previous iteration
    SELECT * FROM cte_name ORDER BY a LIMIT 100;    -- Output the recursive CTE result, that is, union all the data of all iterations together
    
    -- Method 2: Define a recursive CTE named cte_name without explicitly specifying the output column name
    WITH RECURSIVE cte_name AS (       
      SELECT 1L AS a, 1L AS b                            
      UNION ALL                                      
      SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) 
    SELECT * FROM cte_name ORDER BY a LIMIT 100;    
    Note
    • In the recursive_part, to prevent an infinite loop, set a termination condition for the iteration, such as WHERE a + 1 <= 5, which ends the iteration when the dataset generated is empty.

    • If the output column names are not specified, the system can infer them automatically. For example, in Method 2, the output column names from initial_part are used for the recursive CTE.

    The following results is returned:

    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 1          |
    | 2          | 2          |
    | 3          | 3          |
    | 4          | 4          |
    | 5          | 5          |
    +------------+------------+
  • Example 2: Recursive CTEs are not allowed in IN, EXISTS and SCALAR subqueries. The following query will fail to compile:

    WITH RECURSIVE cte_name(a, b) AS (
      SELECT 1L, 1L
      UNION ALL
      SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5)
    SELECT x, x in (SELECT a FROM cte_name) FROM VALUES (1L), (2L) AS t(x);

    The following results is returned:

    -- Return error, there is an in sub-query on line 5, and the recursive CTE cte_name is referenced in the sub-query
    FAILED: ODPS-0130071:[5,31] Semantic analysis exception - using Recursive-CTE cte_name in scalar/in/exists sub-query is not allowed, please check your query, the query text location is from [line 5, column 13] to [line 5, column 40]
  • Example 3: Create an employees table to map company employees to their managers and insert data:

    CREATE TABLE employees(name STRING, boss_name STRING);
    INSERT INTO TABLE employees VALUES 
      ('zhang_3',   null),
      ('li_4',      'zhang_3'),
      ('wang_5',    'zhang_3'),
      ('zhao_6',    'li_4'),
      ('qian_7',    'wang_5');

    Define a recursive CTE named company_hierarchy to derive the organizational structure from this table. The CTE outputs three fields: the employee's name, their manager's name, and their level within the organizational structure.

    WITH RECURSIVE company_hierarchy(name, boss_name, level) AS (
        SELECT name, boss_name, 0L FROM employees WHERE boss_name IS NULL 
        UNION ALL 
        SELECT e.name, e.boss_name, h.level+1 FROM employees e JOIN company_hierarchy h WHERE e.boss_name = h.name
      )
    SELECT * FROM company_hierarchy ORDER BY level, boss_name, name LIMIT 1000;
    • The initial_part represents records in the employees table where boss_name is null, assigning a level of 0 to these entries.

    • The fourth line, recursive_part, executes a JOIN operation between the employees and company_hierarchy tables. The JOIN condition, e.boss_name = h.name, is designed to retrieve records from the employees table that correspond to employees whose managers were identified in the preceding iteration.

    The following results is returned:

    +------+-----------+------------+
    | name | boss_name | level      |
    +------+-----------+---------------+
    | zhang_3 | NULL      | 0          |
    | li_4 | zhang_3   | 1          |
    | wang_5 | zhang_3   | 1          |
    | zhao_6 | li_4      | 2          |
    | qian_7 | wang_5    | 2          |
    +------+-----------+------------+

    The calculation process is as follows:

    • Iteration 0: Select initial records from employees where boss_name IS NULL, yielding one record.

      'zhang_3',   null,  0
    • Iteration 1: Perform the following query by using the results from iteration 0 as company_hierarchy.

      SELECT e.name, e.boss_name, h.level+1 FROM employees e JOIN company_hierarchy h  WHERE e.boss_name = h.name

      This yields two records with level = 1, both managed by zhang_3.

      'li_4',      'zhang_3', 1
      'wang_5',    'zhang_3',  1
    • Iteration 2: Similar to iteration 1, but with company_hierarchy corresponding to iteration 1's results. This produces two records with level = 2, managed by li_4 or wang_5.

      'zhao_6',    'li_4',   2
      'qian_7',    'wang_5', 2
    • Iteration 3: No further employees in the table have zhao_6 or qian_7 as managers, resulting in an empty set and ending the recursion.

  • Example 4: If an additional record is inserted into the employees table from Example 2, an error will be returned.

    INSERT INTO TABLE employees VALUES('qian_7', 'qian_7');

    This record indicates that qian_7 is his own manager. Running the previously defined recursive CTE again would result in an infinite loop. The system imposes a limit on the maximum number of iterations. For more information, see Limits. This query eventually fail and terminate.

    The following results is returned:

    -- Return error
    FAILED: ODPS-0010000:System internal error - recursive-cte: company_hierarchy exceed max iterate number 10