A correlated subquery references a column from its outer query, so it is logically evaluated once for each row that the outer query processes. On large datasets, this can degrade performance significantly.
AnalyticDB for MySQL supports correlated subqueries with specific constraints. Not every standard SQL form is supported.
When possible, rewrite correlated subqueries as JOIN statements for better performance.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL cluster running version V3.1.9.0 or later
To check or update the minor version, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page. For instructions, see Update the minor version of a cluster.
How it works
During query planning, AnalyticDB for MySQL automatically decorrelates a correlated subquery into an ordinary (non-correlated) subquery. If the subquery uses a pattern that prevents decorrelation, such as referencing a correlated column in the SELECT clause, the query fails with an error.
Limitations
Limitations on correlated columns
A correlated column is a column in the subquery that comes from the outer query.
Correlated columns can only appear in the WHERE clause of a subquery.
Correlated columns cannot appear in a subquery that is nested inside another subquery:
-- Not supported: correlated column ts1.b in a doubly-nested subquery
SELECT *
FROM ts1
WHERE ts1.a >
      (SELECT MAX(ts3.c)
       FROM ts3
       WHERE ts3.c IN
             (SELECT ts2.a
              FROM ts2
              WHERE ts1.b = ts2.b));
Limitations on subqueries
A subquery cannot appear inside a JOIN statement.
A subquery cannot contain window functions.
A subquery cannot contain set operations such as UNION:
-- Not supported: UNION inside a correlated subquery
SELECT *
FROM ts1
WHERE ts1.a IN
      (SELECT ts2.a
       FROM ts2
       WHERE ts1.b = ts2.b
       UNION
       SELECT ts3.a
       FROM ts3);
A subquery used in an IN clause cannot contain LIMIT, GROUP BY, ORDER BY, or JOIN:
-- Not supported: LIMIT inside an IN subquery
SELECT *
FROM ts1
WHERE ts1.a IN
      (SELECT ts2.a
       FROM ts2
       WHERE ts1.b = ts2.b
       LIMIT 10);
-- Not supported: GROUP BY inside an IN subquery
SELECT *
FROM ts1
WHERE ts1.a IN
      (SELECT AVG(ts2.a)
       FROM ts2
       WHERE ts1.b = ts2.b
       GROUP BY ts2.c);
Example
The following example queries the minimum discount for each product using a correlated subquery, then rewrites it as a JOIN for better performance.
Correlated subquery:
SELECT id,
       (SELECT MIN(discount)
        FROM item
        WHERE goods.id = goods_id)
FROM goods;
Rewritten as a JOIN (recommended):
SELECT id, t.min_discount
FROM goods
LEFT JOIN
     (SELECT goods_id,
             MIN(discount) AS min_discount
      FROM item
      GROUP BY goods_id) t
ON t.goods_id = goods.id;
The JOIN version computes the minimum discount for all products in a single pass, which performs better than running a subquery for each row in goods.
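Before replacing a correlated subquery in production, it can help to confirm that the JOIN form returns the same rows. The following is a minimal sketch that runs both statements against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for AnalyticDB for MySQL here, and the sample rows are hypothetical; only the table and column names come from the example above.

```python
import sqlite3

# Hypothetical sample data: product 3 deliberately has no rows in item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE goods (id INTEGER PRIMARY KEY);
    CREATE TABLE item (goods_id INTEGER, discount REAL);
    INSERT INTO goods (id) VALUES (1), (2), (3);
    INSERT INTO item (goods_id, discount) VALUES (1, 0.9), (1, 0.7), (2, 0.8);
""")

correlated_sql = """
    SELECT id,
           (SELECT MIN(discount) FROM item WHERE goods.id = goods_id)
    FROM goods
"""

join_sql = """
    SELECT id, t.min_discount
    FROM goods
    LEFT JOIN (SELECT goods_id, MIN(discount) AS min_discount
               FROM item
               GROUP BY goods_id) t
           ON t.goods_id = goods.id
"""

# Sort by id so the comparison does not depend on the engine's output order.
correlated_rows = sorted(conn.execute(correlated_sql).fetchall(), key=lambda r: r[0])
join_rows = sorted(conn.execute(join_sql).fetchall(), key=lambda r: r[0])

print(correlated_rows == join_rows)
```

The LEFT JOIN matters for this equivalence: product 3 has no items, so both forms return NULL for it, whereas an inner join would drop that row.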
Troubleshooting
"Given correlated subquery with correlation: [column_name] is not supported"
This error appears when a correlated column is referenced in the subquery's SELECT clause. For example, when ts1.a appears in the SELECT clause, the error message reads:
Given correlated subquery with correlation: ts1.a is not supported
Cause: Correlated columns can only appear in the WHERE clause. When decorrelation fails because a correlated column is in the SELECT clause, AnalyticDB for MySQL returns this error.
The following query triggers this error because ts1.a is used inside the subquery's SELECT clause:
SELECT ts1.a,
       (SELECT SUM(ts2.a) + ts1.a
        FROM ts2
        WHERE ts1.b = ts2.b)
FROM ts1;
Solution:
Option 1: Move the correlated column out of the subquery's SELECT clause:
SELECT ts1.a,
       (SELECT SUM(ts2.a)
        FROM ts2
        WHERE ts1.b = ts2.b) + ts1.a
FROM ts1;
Option 2: Rewrite the query as a JOIN:
SELECT ts1.a, tt.suma + ts1.a
FROM ts1
LEFT JOIN
     (SELECT ts2.b AS b,
             SUM(ts2.a) AS suma
      FROM ts2
      GROUP BY ts2.b) tt
ON ts1.b = tt.b;
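As a sanity check, the two rewrites can be compared on a small dataset. The sketch below again uses Python's sqlite3 module with an in-memory SQLite database as a stand-in, and the rows in ts1 and ts2 are hypothetical. SQLite does not enforce the AnalyticDB for MySQL restriction, so this only checks that Option 1 and Option 2 return the same result, not that the original query is rejected.

```python
import sqlite3

# Hypothetical sample data: ts1.b = 30 has no matching row in ts2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ts1 (a INTEGER, b INTEGER);
    CREATE TABLE ts2 (a INTEGER, b INTEGER);
    INSERT INTO ts1 (a, b) VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO ts2 (a, b) VALUES (5, 10), (7, 10), (4, 20);
""")

# Option 1: the correlated column is added outside the subquery.
option1_sql = """
    SELECT ts1.a,
           (SELECT SUM(ts2.a) FROM ts2 WHERE ts1.b = ts2.b) + ts1.a
    FROM ts1
"""

# Option 2: the aggregation is precomputed per ts2.b and joined back.
option2_sql = """
    SELECT ts1.a, tt.suma + ts1.a
    FROM ts1
    LEFT JOIN (SELECT ts2.b AS b, SUM(ts2.a) AS suma
               FROM ts2
               GROUP BY ts2.b) tt
           ON ts1.b = tt.b
"""

# Sort by ts1.a so the comparison does not depend on output order.
option1_rows = sorted(conn.execute(option1_sql).fetchall(), key=lambda r: r[0])
option2_rows = sorted(conn.execute(option2_sql).fetchall(), key=lambda r: r[0])

print(option1_rows == option2_rows)
```

In both forms, a ts1 row without a match in ts2 yields NULL, because SUM over no rows is NULL and NULL plus a number is NULL.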
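"Input does not contain symbols from correlation"
Cause: AnalyticDB for MySQL cannot determine from the query semantics whether the outer query provides the correlated column that the subquery references.
In the following query, the correlated column ts1.b cannot be obtained from the outer query, that is, from table ts1, because the outer query groups by ts1.a only:
SELECT ts1.a,
       (SELECT SUM(ts2.a)
        FROM ts2
        WHERE ts1.b = ts2.b)
FROM ts1
GROUP BY ts1.a;
Solution:
If the correlated column ts1.b is produced before the GROUP BY aggregation, give the subquery column an alias and group by that column. For example:
SELECT ts1.a,
       (SELECT SUM(ts2.a)
        FROM ts2
        WHERE ts1.b = ts2.b) col
FROM ts1
GROUP BY ts1.a, col;
/* The grouping column col is generated by the correlated subquery, which makes it explicit that the correlated column ts1.b is produced before the GROUP BY aggregation. */
If the correlated column is produced after the GROUP BY aggregation, group by the correlated column. For example:
SELECT ts1.a,
       (SELECT SUM(ts2.a)
        FROM ts2
        WHERE ts1.b = ts2.b)
FROM ts1
GROUP BY ts1.a, ts1.b;
/* Grouping by ts1.b makes it explicit that the correlated column ts1.b is produced after the GROUP BY aggregation. */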
"Scalar sub-query has returned multiple rows"
Cause: The subquery is used as a scalar value, so it must return at most one row for each outer row. If it returns no row or exactly one row, the query succeeds. If it returns more than one row, the query fails with this error.
The following query fails when more than one row in ts2 matches a given ts1.b, because the equality condition in the outer query expects a single value:
SELECT ts1.a
FROM ts1
WHERE ts1.a =
      (SELECT ts2.a
       FROM ts2
       WHERE ts1.b = ts2.b);
Solution: Replace the equality condition with IN:
SELECT ts1.a
FROM ts1
WHERE ts1.a IN
      (SELECT ts2.a
       FROM ts2
       WHERE ts1.b = ts2.b);
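To see which values cause the subquery to return several rows, one possible diagnostic is to count the rows in ts2 per correlation key. The sketch below uses Python's sqlite3 module with an in-memory SQLite database and hypothetical sample rows; the diagnostic query is a suggestion and not part of the product documentation. SQLite is only a stand-in: it does not raise this error for a multi-row scalar subquery, so only the diagnostic and the IN rewrite are executed.

```python
import sqlite3

# Hypothetical sample data: two ts2 rows share b = 10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ts1 (a INTEGER, b INTEGER);
    CREATE TABLE ts2 (a INTEGER, b INTEGER);
    INSERT INTO ts1 (a, b) VALUES (5, 10), (9, 20), (1, 30);
    INSERT INTO ts2 (a, b) VALUES (5, 10), (7, 10), (4, 20);
""")

# Diagnostic: correlation keys for which the scalar subquery would
# return more than one row.
duplicate_keys = conn.execute("""
    SELECT ts2.b, COUNT(*)
    FROM ts2
    GROUP BY ts2.b
    HAVING COUNT(*) > 1
    ORDER BY ts2.b
""").fetchall()

# IN rewrite from the solution above: tolerates several matching rows.
in_rows = conn.execute("""
    SELECT ts1.a
    FROM ts1
    WHERE ts1.a IN (SELECT ts2.a FROM ts2 WHERE ts1.b = ts2.b)
    ORDER BY ts1.a
""").fetchall()

print(duplicate_keys)
print(in_rows)
```

Note that IN keeps an outer row when any of the matching ts2.a values equals ts1.a, so it is the right replacement only when that is the intended meaning.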