PolarDB-X 1.0 supports most MySQL subquery syntax. This topic describes the hard restrictions that cause query failures, the subquery patterns that run via the APPLY operator (row-by-row execution) and are significantly slower on large datasets, and the supported subquery types.
Limitations
The following subquery patterns are not supported in PolarDB-X 1.0:
HAVING clause: Subqueries cannot appear inside a
HAVINGclause.-- Not supported SELECT name, AVG(quantity) FROM tb1 GROUP BY name HAVING AVG(quantity) > 2 * ( SELECT AVG(quantity) FROM tb2 );JOIN ON clause: Subqueries cannot appear inside a
JOIN ONclause.-- Not supported SELECT * FROM tb1 p JOIN tb2 s ON (p.id = s.id AND p.quantity > ALL(SELECT quantity FROM tb3))ROW and scalar subquery on both sides of `=`: A ROW subquery and a scalar subquery cannot appear simultaneously on both sides of an equal sign.
-- Not supported SELECT * FROM tb1 WHERE ROW(id, name) = (SELECT id, name FROM tb2)UPDATE SET clause: Subqueries cannot appear inside an
UPDATE SETclause.-- Not supported UPDATE t1 SET c1 = (SELECT c2 FROM t2 WHERE t1.c1 = t2.c1) LIMIT 10
Performance considerations
In PolarDB-X 1.0, some subqueries can only run using the APPLY operator rather than being converted to a JOIN. The APPLY operator executes the subquery once per row of the outer table, so performance degrades significantly on large datasets. The patterns below trigger this behavior — rewrite them where possible.
OR operators combined with subqueries
When a WHERE clause mixes OR with a subquery, PolarDB-X cannot convert the subquery to a JOIN because the OR condition expands the result set in a way that is incompatible with JOIN semantics.
-- Efficient
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2)
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2) AND id > 3
-- Inefficient: OR prevents JOIN conversion
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2) OR id > 3Correlated items inside functions or with non-equal comparisons
When a correlated column is wrapped in a function or compared with an operator other than =, PolarDB-X cannot resolve the correlation via a JOIN — the function or non-equal operator prevents the optimizer from rewriting the subquery as an equi-join.
-- Efficient
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name = b.name)
-- Inefficient: correlated column inside a function
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE UPPER(a.name) = b.name)
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.decimal_test = ABS(b.decimal_test))
-- Inefficient: non-equal comparison operators
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name != b.name)
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name >= b.name)Correlated items connected by OR inside the subquery
When the correlated condition inside a subquery is joined to other conditions with OR, PolarDB-X cannot rewrite it as a JOIN. Only AND-connected correlations can be transformed into equi-join conditions.
-- Efficient: correlated condition uses AND
SELECT * FROM tb1 a WHERE id IN
(SELECT id FROM tb2 b WHERE a.name = b.name AND b.date_test < '2015-12-02')
-- Inefficient: correlated condition uses OR
SELECT * FROM tb1 a WHERE id IN
(SELECT id FROM tb2 b WHERE a.name = b.name OR b.date_test < '2015-12-02')
SELECT * FROM tb1 a WHERE id IN
(SELECT id FROM tb2 b WHERE a.name = b.name OR b.date_test = a.date_test)Scalar subqueries with correlated items
A scalar subquery that references a column from the outer query cannot be converted to a JOIN. Without a correlated column, the scalar subquery executes once; with one, it executes once per outer row.
-- Efficient: no correlation to outer table
SELECT * FROM tb1 a WHERE id > (SELECT id FROM tb2 b WHERE b.date_test < '2015-12-02')
-- Inefficient: correlated item in scalar subquery
SELECT * FROM tb1 a WHERE id >
(SELECT id FROM tb2 b WHERE a.name = b.name AND b.date_test < '2015-12-02')Correlated items that span correlation levels
In multi-level subqueries, each subquery's correlated columns must reference only its immediate outer query. When a column in an inner subquery references a table two or more levels up, the correlation skips a level and cannot be resolved via a JOIN.
-- Efficient: each subquery correlates with its direct outer query only
SELECT * FROM tb1 a WHERE id IN (
SELECT id FROM tb2 b
WHERE a.name = b.name
AND EXISTS (SELECT name FROM tb3 c WHERE b.address = c.address)
)
-- Inefficient: table c correlates with table a, skipping table b
SELECT * FROM tb1 a WHERE id IN (
SELECT id FROM tb2 b
WHERE a.name = b.name
AND EXISTS (SELECT name FROM tb3 c WHERE a.address = c.address)
)table a and table b, and table b and table c belong to the same correlation level. The correlation between table a and table c spans the correlation levels, which cannot be resolved with a JOIN.GROUP BY subqueries where correlated items don't match grouping columns
In a subquery with GROUP BY, each correlated column must appear in the GROUP BY clause. If it doesn't, the optimizer cannot rewrite the subquery as a JOIN because the grouping key and the correlation key are mismatched.
-- Efficient: correlated item b.pk matches the GROUP BY column pk
SELECT * FROM tb1 a WHERE EXISTS (
SELECT pk FROM tb2 b
WHERE a.pk = b.pk AND b.date_test = '2003-04-05'
GROUP BY pk
)
-- Inefficient: correlated item b.date_test does not match the GROUP BY column pk
SELECT * FROM tb1 a WHERE EXISTS (
SELECT pk FROM tb2 b
WHERE a.date_test = b.date_test AND b.date_test = '2003-04-05'
GROUP BY pk
)Supported subqueries
PolarDB-X 1.0 supports the following subquery types.
Comparisons using subqueries
Use a comparison operator to compare a value against the result of a subquery.
Syntax:
non_subquery_operand comparison_operator (subquery)
-- comparison_operator: = > < >= <= <> != <=> LIKEExample:
SELECT * FROM tb1 WHERE 'a' = (SELECT column1 FROM t1)Subqueries with ANY, ALL, IN/NOT IN, and EXISTS/NOT EXISTS
Syntax:
operand comparison_operator ANY (subquery)
operand comparison_operator ALL (subquery)
operand IN (subquery)
operand NOT IN (subquery)
operand EXISTS (subquery)
operand NOT EXISTS (subquery)
-- comparison_operator: = > < >= <= <> !=| Keyword | Behavior |
|---|---|
ANY | Returns TRUE if any row returned by the subquery satisfies the expression. |
ALL | Returns TRUE if every row returned by the subquery satisfies the expression. |
IN | Equivalent to = ANY. |
NOT IN | Equivalent to <> ALL. |
EXISTS | Returns TRUE if the subquery returns at least one row. |
NOT EXISTS | Returns FALSE if the subquery returns at least one row. Otherwise, TRUE is returned. |
Examples:
-- IN and = ANY are equivalent
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);
-- NOT IN and <> ALL are equivalent
SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);
-- EXISTS
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);EXISTS returns TRUE even if the subquery returns only NULL rows.ROW subqueries
ROW subqueries compare multiple columns simultaneously against the result of a subquery.
Supported operators: = > < >= <= <> != <=>
Example:
SELECT * FROM t1
WHERE (col1, col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);
SELECT * FROM t1
WHERE ROW(col1, col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);The two statements above are equivalent. Rows from t1 are returned only when:
The subquery returns exactly one row. An error is reported if it returns multiple rows.
The values returned by the subquery (
col3,col4) equal the values in the primary table (col1,col2).
Correlated subqueries
A correlated subquery references a column from the outer query. The subquery is re-evaluated for each row of the outer query.
Example:
SELECT * FROM t1
WHERE column1 = ANY (SELECT column1 FROM t2
WHERE t2.column2 = t1.column2);In this example, t2.column2 = t1.column2 references t1.column2 from the outer query, making this a correlated subquery.
Derived tables (subqueries in a FROM clause)
A derived table is a subquery used in a FROM clause. It must have an alias.
Syntax:
SELECT ... FROM (subquery) [AS] tbl_name ...Constraints:
A derived table must have an alias.
A derived table can return a scalar, a column, a row, or a table.
Derived tables cannot be correlated subqueries — they cannot reference columns from outer queries.
Examples:
Create a table and insert data:
CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
INSERT INTO t1 VALUES (1, '1', 1.0);
INSERT INTO t1 VALUES (2, '2', 2.0);Query rows where sb1 > 1. The result is 2, '2', 4.0:
SELECT sb1, sb2, sb3
FROM (SELECT s1 AS sb1, s2 AS sb2, s3 * 2 AS sb3 FROM t1) AS sb
WHERE sb1 > 1;Aggregate functions cannot be nested directly. The following statement returns an error:
-- Error: aggregate functions cannot be nested
SELECT AVG(SUM(s1)) FROM t1 GROUP BY s1;Use a derived table to compute the average of grouped sums instead. The result is 1.5000:
SELECT AVG(sum_s1)
FROM (SELECT SUM(s1) AS sum_s1 FROM t1 GROUP BY s1) AS t1;