Column pruning optimization for derived tables and views - PolarDB

When a query selects only a few columns from a derived table or view with many columns, performance can decrease. This occurs because the query reads and processes unnecessary data. This issue is common when you work with wide tables or reuse large, general-purpose views. The column pruning feature of PolarDB for MySQL automatically optimizes these queries. During the query parsing phase, the optimizer identifies and removes columns from the derived table or view that are not used in the final query. This significantly reduces the amount of data scanned, network transfer, and memory consumption. This also lowers query latency and improves system throughput.

Applicability

Product series: Cluster Edition, Standard Edition.
Kernel version: MySQL 8.0.2 with revision version 8.0.2.2.31.1 or later.

Enable column pruning

You can control the behavior of this optimization feature by setting the loose_derived_table_pruning_mode parameter.

The method for modifying PolarDB cluster parameters differs between the console and a database session. The differences are as follows:

In the PolarDB console
- Compatibility: Some cluster parameters in the PolarDB console have the loose_ prefix for compatibility with MySQL configuration files.
- Procedure: Find and modify the parameters that have the loose_ prefix.
In a database session (using the command line or a client)
- Procedure: When you connect to the database and use the SET command to modify a parameter, remove the loose_ prefix and use the original parameter name.

Parameter

Level

Description

loose_derived_table_pruning_mode

Global/Session

The main switch for this feature. Valid values are:

REPLICA_ON (default): Enables this feature only on read-only (RO) nodes.
ON: Enables the feature.
OFF: Disables the feature.

How it works and examples

The core of column pruning optimization is query rewrite. During the logical optimization phase of a query, the optimizer analyzes the entire search statement. It identifies columns in a derived table or view that are not referenced by the final SELECT list, WHERE clause, or JOIN clause. If a column is redundant, the optimizer rewrites the inner query. This avoids reading and processing the column's data at the source, which reduces I/O and CPU consumption.

Example 1: Optimize a derived table query

This example shows how column pruning optimizes a query on a derived table.

-- Before optimization
SELECT c1 FROM (SELECT c1 FROM (SELECT c1, c2 FROM t1 GROUP BY c1, c2) v) t;
-- After optimization
SELECT c1 FROM (SELECT c1 FROM t1 GROUP BY c1, c2) v

Prepare test data: Create a table named t1 with multiple columns in your database.

CREATE TABLE t1 (id INT PRIMARY KEY, c1 INT, c2 INT, c3 VARCHAR(100));
INSERT INTO t1 VALUES (1, 10, 100, 'data'), (2, 20, 200, 'data');

Run a query with optimization disabled: First, disable the column pruning feature in the session to see the execution plan before optimization.

SET derived_table_pruning_mode = 'OFF';
EXPLAIN SELECT c1 FROM (SELECT c1 FROM (SELECT c1, c2 FROM t1 GROUP BY c1, c2) v) t;

SHOW warnings;

+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                         |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `v`.`c1` AS `c1` from (/* select#3 */ select `testdb`.`t1`.`c1` AS `c1`,`testdb`.`t1`.`c2` AS `c2` from `testdb`.`t1` group by `testdb`.`t1`.`c1`,`testdb`.`t1`.`c2`) `v` |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Run a query with optimization enabled: Now, enable the column pruning feature in the session and run the same query.

SET derived_table_pruning_mode = 'ON';
EXPLAIN SELECT c1 FROM (SELECT c1 FROM (SELECT c1, c2 FROM t1 GROUP BY c1, c2) v) t;

SHOW warnings;

+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                              |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `v`.`c1` AS `c1` from (/* select#3 */ select `testdb`.`t1`.`c1` AS `c1` from `testdb`.`t1` group by `testdb`.`t1`.`c1`,`testdb`.`t1`.`c2`) `v` |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Interpret the execution plan: The execution plan changes significantly after optimization. The optimizer detects that the outer query only needs the c1 column, so it rewrites the query.

Example 2: Optimize a view query

Column pruning for views is similar to that for derived tables. It avoids unnecessary calculations.

CREATE VIEW v1 AS SELECT COUNT(*) AS a, (SELECT a FROM t2 WHERE a=FLOOR(COUNT(t1.a)/2)) AS s FROM t1;

-- Before optimization
SELECT a FROM v1;
-- After optimization
SELECT COUNT(*) AS a FROM t1;

Optimization limits

Column pruning optimization is triggered only if the rewritten query is guaranteed to be semantically equivalent to the original SQL statement. A column in a derived table or view that appears to be unreferenced is not pruned if it is used in any of the following operations:

It is used in an ORDER BY, GROUP BY, HAVING, or DISTINCT clause.
It is used as a partition key (PARTITION BY) or sort key (ORDER BY) in a window function.
The optimizer can prune columns generated by non-deterministic functions, such as RAND() or UUID(). In this case, the function is not executed, which avoids unnecessary computational overhead. This behavior is an intended optimization effect, not a limitation.

FAQ

Does modifying the derived_table_pruning_mode parameter require a cluster restart?

No, it does not. This parameter can be modified dynamically and takes effect immediately without a restart.

Why has query performance not improved after I set derived_table_pruning_mode=ON on the primary node?

Check the SQL scenario: Verify that your SQL statement meets the conditions for column pruning. The outer query must use only a subset of columns from the inner derived table or view.
View the execution plan: Use EXPLAIN to confirm whether the optimization occurred. If it did not, see the Optimization limits section to check for scenarios where pruning is not applied.