CTE materialization strategy - Hologres - Alibaba Cloud Documentation Center

A Common Table Expression (CTE) is a feature in SQL that simplifies complex queries and improves readability. When a CTE is used, the optimizer can materialize its result set. This process temporarily stores the result set for reuse in subsequent parts of the query. The CTE reuse strategy determines how the optimizer reuses the CTE to avoid repeated computations and improve query performance.

Set the CTE reuse strategy

In Hologres, the optimizer can automatically determine whether to reuse a CTE based on the SQL execution plan.

In versions earlier than Hologres V3.2, you can control CTE reuse using only the Grand Unified Configuration (GUC) parameter optimizer_cte_inlining.

In Hologres V3.2 and later, you can control CTE reuse using both the optimizer_cte_inlining and hg_cte_strategy GUC parameters.

Parameter descriptions

Parameter

Description

Supported versions

optimizer_cte_inlining

ON (default): Forces the CTE to be inlined.
OFF: Sets the CTE strategy to Reuse.

All versions

hg_cte_strategy

AUTO (default): The optimizer automatically determines whether to reuse the CTE.
INLINING: Forces inlining. The CTE is not reused, and the subquery is recomputed each time the CTE is referenced. This is suitable for scenarios where the CTE computation is simple and the result set is small.
REUSE: Forces reuse. The CTE is computed only once, and the result is cached for reuse. This is suitable for scenarios where the CTE computation is complex, the result set is large, and the CTE is referenced multiple times.

V3.2 and later

Parameter settings

Versions earlier than Hologres V3.2
```
SET optimizer_cte_inlining ={on|off}
```
Hologres V3.2 and later
```
SET optimizer_cte_inlining ={on|off}

SET hg_cte_strategy ={AUTO|INLINING|REUSE};
```
- Usage notes
  - The optimizer_cte_inlining parameter has a higher priority than the hg_cte_strategy parameter.
  - If you set optimizer_cte_inlining = off, the CTE strategy is forced to `REUSE`. In this case, the setting of the `hg_cte_strategy` parameter has no effect. You can only use the default value `AUTO` for `hg_cte_strategy` or explicitly set it to `REUSE`. You cannot explicitly set it to `INLINING`. Otherwise, an error is reported.
  - When optimizer_cte_inlining = on is set (either by default or explicitly), the CTE strategy is not automatically set to `INLINING`. Instead, the strategy is determined by the value of hg_cte_strategy.

Examples

The following examples show how the execution plan changes when you set the hg_cte_strategy parameter to different values.

Prepare sample data.

CREATE TABLE t1 (
    a INT,
    b INT,
    c INT
);

View the execution plans for different hg_cte_strategy settings.

When hg_cte_strategy is set to the default value `AUTO`, the execution plan is as follows. The plan shows that `cte1` is reused, but `cte2` is not.

EXPLAIN WITH cte1 AS (SELECT DISTINCT a,b,c FROM t1),cte2 AS (SELECT a,b,c FROM t1)
SELECT * FROM cte1 
UNION ALL 
SELECT * FROM cte1
UNION ALL
SELECT * FROM cte2
UNION ALL
SELECT * FROM cte2;

The following result is returned.

QUERY PLAN
Gather  (cost=0.00..25.00 rows=4 width=12)
  CTE cte1  (cost=0.00..5.00 rows=1 width=12)
      ->  Forward  (cost=0.00..5.00 rows=1 width=12)
            ->  HashAggregate  (cost=0.00..5.00 rows=1 width=12)
                  Group Key: t1_2.a, t1_2.b, t1_2.c
                  ->  Redistribution  (cost=0.00..5.00 rows=1 width=12)
                        Hash Key: t1_2.a, t1_2.b, t1_2.c
                        ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
                              ->  Seq Scan on t1 t1_2  (cost=0.00..5.00 rows=1 width=12)
  ->  Append  (cost=0.00..20.00 rows=4 width=12)
        ->  CTE Scan on cte1  (cost=0.00..5.00 rows=1 width=12)
        ->  CTE Scan on cte1  (cost=0.00..5.00 rows=1 width=12)
        ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
              ->  Seq Scan on t1  (cost=0.00..5.00 rows=1 width=12)
        ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
              ->  Seq Scan on t1 t1_1  (cost=0.00..5.00 rows=1 width=12)
Query Queue: init_warehouse.default_queue
Optimizer: HQO version 3.2.0

When hg_cte_strategy is set to `INLINING`, the execution plan is as follows. The plan shows that no CTE is reused.

SET hg_cte_strategy = INLINING;
EXPLAIN WITH cte1 AS (SELECT DISTINCT a,b,c FROM t1), 
cte2 AS (SELECT a,b,c FROM t1) 
SELECT * FROM cte1 
UNION ALL 
SELECT * FROM cte1 
UNION ALL 
SELECT * FROM cte2 
UNION ALL 
SELECT * FROM cte2;

The following result is returned.

QUERY PLAN
Gather  (cost=0.00..20.00 rows=4 width=12)
  ->  Append  (cost=0.00..20.00 rows=4 width=12)
        ->  HashAggregate  (cost=0.00..5.00 rows=1 width=12)
              Group Key: t1.a, t1.b, t1.c
              ->  Redistribution  (cost=0.00..5.00 rows=1 width=12)
                    Hash Key: t1.a, t1.b, t1.c
                    ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
                          ->  Seq Scan on t1  (cost=0.00..5.00 rows=1 width=12)
        ->  HashAggregate  (cost=0.00..5.00 rows=1 width=12)
              Group Key: t1_1.a, t1_1.b, t1_1.c
              ->  Redistribution  (cost=0.00..5.00 rows=1 width=12)
                    Hash Key: t1_1.a, t1_1.b, t1_1.c
                    ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
                          ->  Seq Scan on t1 t1_1  (cost=0.00..5.00 rows=1 width=12)
        ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
              ->  Seq Scan on t1 t1_2  (cost=0.00..5.00 rows=1 width=12)
        ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
              ->  Seq Scan on t1 t1_3  (cost=0.00..5.00 rows=1 width=12)
Query Queue: init_warehouse.default_queue
Optimizer: HQO version 3.2.0

When hg_cte_strategy is set to `REUSE`, the execution plan is as follows. The plan shows that both `cte1` and `cte2` are reused.

SET hg_cte_strategy = REUSE;
EXPLAIN WITH cte1 AS (SELECT DISTINCT a,b,c FROM t1), 
cte2 AS (SELECT a,b,c FROM t1) 
SELECT * FROM cte1 
UNION ALL 
SELECT * FROM cte1 
UNION ALL
SELECT * FROM cte2 
UNION ALL
SELECT * FROM cte2;

The following result is returned.

QUERY PLAN
Gather  (cost=0.00..30.00 rows=4 width=12)
  CTE cte1  (cost=0.00..5.00 rows=1 width=12)
      ->  Forward  (cost=0.00..5.00 rows=1 width=12)
            ->  HashAggregate  (cost=0.00..5.00 rows=1 width=12)
                  Group Key: t1_1.a, t1_1.b, t1_1.c
                  ->  Redistribution  (cost=0.00..5.00 rows=1 width=12)
                        Hash Key: t1_1.a, t1_1.b, t1_1.c
                        ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
                              ->  Seq Scan on t1 t1_1  (cost=0.00..5.00 rows=1 width=12)
  CTE cte2  (cost=0.00..5.00 rows=1 width=12)
      ->  Forward  (cost=0.00..5.00 rows=1 width=12)
            ->  Local Gather  (cost=0.00..5.00 rows=1 width=12)
                  ->  Seq Scan on t1  (cost=0.00..5.00 rows=1 width=12)
  ->  Append  (cost=0.00..20.00 rows=4 width=12)
        ->  CTE Scan on cte1  (cost=0.00..5.00 rows=1 width=12)
        ->  CTE Scan on cte1  (cost=0.00..5.00 rows=1 width=12)
        ->  CTE Scan on cte2  (cost=0.00..5.00 rows=1 width=12)
        ->  CTE Scan on cte2  (cost=0.00..5.00 rows=1 width=12)
Query Queue: init_warehouse.default_queue
Optimizer: HQO version 3.2.0