Redundant GROUP BY or ORDER BY clauses force the database to perform unnecessary sorting or hashing operations, consuming significant CPU and memory resources. This overhead is especially costly for analytical processing (AP) queries on large datasets. PolarDB for MySQL automatically detects and removes these redundant operations using functional dependency (FD) analysis — delivering up to 29% faster query performance on TPC-H benchmarks without any changes to your application code.
Scope
Product series: Cluster Edition, Standard Edition
Version: MySQL 8.0.2, revision 8.0.2.2.33 or later
Enable the feature
Two parameters control this optimization:
| Parameter | Level | Default | Valid values |
|---|---|---|---|
loose_groupby_elimination_mode | Global/Session | REPLICA_ON | REPLICA_ON, ON, OFF |
loose_orderby_elimination_mode | Global/Session | REPLICA_ON | REPLICA_ON, ON, OFF |
Valid values:
REPLICA_ON(default): Enables the feature on read-only (RO) nodes only.ON: Enables the feature on all nodes.OFF: Disabled.
Parameter naming by interface:
PolarDB console: Parameters include the
loose_prefix for compatibility with MySQL configuration files. Find and modifyloose_groupby_elimination_modeandloose_orderby_elimination_mode.Database session (using the
SETcommand): Remove theloose_prefix and use the original parameter name —groupby_elimination_modeandorderby_elimination_mode.
How it works
The optimizer eliminates redundant GROUP BY and ORDER BY clauses by inferring functional dependencies (FDs) from table metadata and query conditions.
What is a functional dependency?
If column A uniquely determines column B, B is functionally dependent on A — written as A -> B. For example, in a user table where user_id is the primary key, user_id -> nickname holds because each user_id maps to exactly one nickname.
Sources of functional dependency:
| Source | Example |
|---|---|
| Primary keys and unique keys | user_id (primary key) determines all other columns in the row |
| Equi-join conditions | t1.pk = t2.fk propagates the uniqueness of t1.pk into t2 |
| Constant filter conditions | WHERE status = 'active' makes status a constant in the result |
Optimization rules:
| Rule | Condition | Result |
|---|---|---|
GROUP BY elimination | GROUP BY columns include a primary key or uniquely identify each row | Entire GROUP BY clause removed |
GROUP BY elimination (constants) | All GROUP BY columns are constants | GROUP BY removed and LIMIT 1 added |
ORDER BY elimination | All ORDER BY columns are constants | Entire ORDER BY clause removed |
GROUP BY/ORDER BY simplification | FD exists between columns in the clause (e.g., GROUP BY x, y where x -> y) | Clause simplified (e.g., to GROUP BY x) |
Optimization scenarios
Scenario 1: Grouping by primary or unique key
Trigger conditions:
The
GROUP BYclause includes a table's primary key.Other columns in the
GROUP BYlist are functionally determined by that key.
Business scenario: Calculate the total order count per user and display their username.
Original SQL:
-- user_id is the primary key of the user table and uniquely determines user_name.
SELECT
u.user_id,
u.user_name,
COUNT(o.order_id)
FROM user u
JOIN orders o ON u.user_id = o.user_id
GROUP BY
u.user_id,
u.user_name;What the optimizer does: Because user_id is the primary key, the FD user_id -> user_name holds. The optimizer simplifies GROUP BY u.user_id, u.user_name to GROUP BY u.user_id, removing the redundant grouping on user_name.
Optimized SQL:
SELECT
u.user_id,
u.user_name,
COUNT(o.order_id)
FROM user u
JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id;Verify the optimization:
SHOW WARNINGS;The Message column shows the rewritten query with GROUP BY u.user_id only, confirming that u.user_name has been removed from the grouping key.
Scenario 2: Grouping by constants
Trigger conditions:
All columns in the
GROUP BYclause appear as equality constants in theWHEREclause.
Business scenario: Query a specific row by exact column values.
Original SQL:
SELECT a, b FROM t1 WHERE a = 1 AND b = 1 GROUP BY a, b;What the optimizer does: Both a and b are constants in the WHERE clause, so the result set contains at most one row. The optimizer removes the GROUP BY and adds LIMIT 1.
Optimized SQL:
SELECT a, b FROM t1 WHERE a = 1 AND b = 1 LIMIT 1;Verify the optimization:
SHOW WARNINGS;The Message column shows the rewritten query with LIMIT 1 and no GROUP BY clause.
Scenario 3: Multi-table join with complex grouping (TPC-H Q10)
Trigger conditions:
A multi-table join propagates a primary key's FD through equi-join conditions.
All non-key columns in the
GROUP BYlist are transitively determined by the primary key.
Original SQL (TPC-H Q10):
SELECT
c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment,
SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, nation
WHERE c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND c_nationkey = n_nationkey
-- Other filter conditions...
GROUP BY
c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
ORDER BY revenue DESC
LIMIT 20;What the optimizer does:
Primary key dependency:
c_custkeyis the primary key of thecustomertable, so it determinesc_name,c_acctbal,c_phone,c_address,c_comment, andc_nationkey.Dependency propagation: The join condition
c_nationkey = n_nationkeyand the fact thatn_nationkeyis the primary key ofnationmeansc_custkeyalso determinesn_name.Simplification: All non-key columns in the
GROUP BYlist are functionally dependent onc_custkey. The optimizer simplifies the clause toGROUP BY c_custkey.
Optimized SQL:
SELECT
c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment,
SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, nation
WHERE c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND c_nationkey = n_nationkey
-- Other filter conditions...
GROUP BY c_custkey
ORDER BY revenue DESC
LIMIT 20;Verify the optimization:
SHOW WARNINGS;The Message column shows the rewritten query with GROUP BY c_custkey only, confirming that all other columns have been removed from the grouping key.
Scenario 4: ORDER BY simplification with a computed column
Trigger conditions:
The
ORDER BYclause includes a column that is a deterministic function of other columns in the same clause.That column is defined as
UNIQUE NOT NULL.
Table schema:
CREATE TABLE t1 (
a INT,
b INT,
c INT AS (a + b) UNIQUE NOT NULL
);Original SQL:
EXPLAIN SELECT a, b FROM t1 ORDER BY a, b, c;What the optimizer does: Column c is computed from a and b (c = a + b) and defined as UNIQUE NOT NULL, establishing the FD (a, b) -> c. The optimizer simplifies ORDER BY a, b, c to ORDER BY a, b.
Verify the optimization:
SHOW WARNINGS;+-------+------+-------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` order by `test`.`t1`.`a`,`test`.`t1`.`b` |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------+The rewritten query in the Message column confirms that c has been removed from the sort key.
Performance benchmark
Tests on a TPC-H 100 GB standard dataset show that enabling GROUP BY elimination significantly improves Q10 query performance.
| Test scenario | With elimination | Without elimination | Improvement |
|---|---|---|---|
| In-Memory Column Index (IMCI) at 1 DOP (within 1 day of data generation) | 48 seconds | 68 seconds | 29% |
| In-Memory Column Index (IMCI) at 32 DOP (within 32 days of data generation) | 1.9 seconds | 2.6 seconds | 27% |
The TPC-H implementation described here is based on TPC-H benchmarking methodology and cannot be compared with published TPC-H benchmark results. These tests do not fully comply with all TPC-H requirements. Actual performance improvements may vary based on query complexity, data distribution, and cluster specifications.