Usage limits and supported types of subqueries - PolarDB

PolarDB-X 1.0 supports most MySQL subquery syntax. This topic describes the hard restrictions that cause query failures, the subquery patterns that run via the APPLY operator (row-by-row execution) and are significantly slower on large datasets, and the supported subquery types.

Limitations

The following subquery patterns are not supported in PolarDB-X 1.0:

HAVING clause: Subqueries cannot appear inside a HAVING clause.

-- Not supported
SELECT name, AVG(quantity)
FROM tb1
GROUP BY name
HAVING AVG(quantity) > 2 * (
   SELECT AVG(quantity)
   FROM tb2
);

JOIN ON clause: Subqueries cannot appear inside a JOIN ON clause.

-- Not supported
SELECT * FROM tb1 p JOIN tb2 s ON (p.id = s.id AND p.quantity > ALL(SELECT quantity FROM tb3))

ROW and scalar subquery on both sides of `=`: A ROW subquery and a scalar subquery cannot appear simultaneously on both sides of an equal sign.
```
-- Not supported
SELECT * FROM tb1 WHERE ROW(id, name) = (SELECT id, name FROM tb2)
```

UPDATE SET clause: Subqueries cannot appear inside an UPDATE SET clause.

-- Not supported
UPDATE t1 SET c1 = (SELECT c2 FROM t2 WHERE t1.c1 = t2.c1) LIMIT 10

Performance considerations

In PolarDB-X 1.0, some subqueries can only run using the APPLY operator rather than being converted to a JOIN. The APPLY operator executes the subquery once per row of the outer table, so performance degrades significantly on large datasets. The patterns below trigger this behavior — rewrite them where possible.

OR operators combined with subqueries

When a WHERE clause mixes OR with a subquery, PolarDB-X cannot convert the subquery to a JOIN because the OR condition expands the result set in a way that is incompatible with JOIN semantics.

-- Efficient
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2)
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2) AND id > 3

-- Inefficient: OR prevents JOIN conversion
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2) OR id > 3

Correlated items inside functions or with non-equal comparisons

When a correlated column is wrapped in a function or compared with an operator other than =, PolarDB-X cannot resolve the correlation via a JOIN — the function or non-equal operator prevents the optimizer from rewriting the subquery as an equi-join.

-- Efficient
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name = b.name)

-- Inefficient: correlated column inside a function
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE UPPER(a.name) = b.name)
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.decimal_test = ABS(b.decimal_test))

-- Inefficient: non-equal comparison operators
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name != b.name)
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name >= b.name)

Correlated items connected by OR inside the subquery

When the correlated condition inside a subquery is joined to other conditions with OR, PolarDB-X cannot rewrite it as a JOIN. Only AND-connected correlations can be transformed into equi-join conditions.

-- Efficient: correlated condition uses AND
SELECT * FROM tb1 a WHERE id IN
  (SELECT id FROM tb2 b WHERE a.name = b.name AND b.date_test < '2015-12-02')

-- Inefficient: correlated condition uses OR
SELECT * FROM tb1 a WHERE id IN
  (SELECT id FROM tb2 b WHERE a.name = b.name OR b.date_test < '2015-12-02')
SELECT * FROM tb1 a WHERE id IN
  (SELECT id FROM tb2 b WHERE a.name = b.name OR b.date_test = a.date_test)

Scalar subqueries with correlated items

A scalar subquery that references a column from the outer query cannot be converted to a JOIN. Without a correlated column, the scalar subquery executes once; with one, it executes once per outer row.

-- Efficient: no correlation to outer table
SELECT * FROM tb1 a WHERE id > (SELECT id FROM tb2 b WHERE b.date_test < '2015-12-02')

-- Inefficient: correlated item in scalar subquery
SELECT * FROM tb1 a WHERE id >
  (SELECT id FROM tb2 b WHERE a.name = b.name AND b.date_test < '2015-12-02')

Correlated items that span correlation levels

In multi-level subqueries, each subquery's correlated columns must reference only its immediate outer query. When a column in an inner subquery references a table two or more levels up, the correlation skips a level and cannot be resolved via a JOIN.

-- Efficient: each subquery correlates with its direct outer query only
SELECT * FROM tb1 a WHERE id IN (
  SELECT id FROM tb2 b
  WHERE a.name = b.name
    AND EXISTS (SELECT name FROM tb3 c WHERE b.address = c.address)
)

-- Inefficient: table c correlates with table a, skipping table b
SELECT * FROM tb1 a WHERE id IN (
  SELECT id FROM tb2 b
  WHERE a.name = b.name
    AND EXISTS (SELECT name FROM tb3 c WHERE a.address = c.address)
)

Note In the inefficient example, table a and table b, and table b and table c belong to the same correlation level. The correlation between table a and table c spans the correlation levels, which cannot be resolved with a JOIN.

GROUP BY subqueries where correlated items don't match grouping columns

In a subquery with GROUP BY, each correlated column must appear in the GROUP BY clause. If it doesn't, the optimizer cannot rewrite the subquery as a JOIN because the grouping key and the correlation key are mismatched.

-- Efficient: correlated item b.pk matches the GROUP BY column pk
SELECT * FROM tb1 a WHERE EXISTS (
  SELECT pk FROM tb2 b
  WHERE a.pk = b.pk AND b.date_test = '2003-04-05'
  GROUP BY pk
)

-- Inefficient: correlated item b.date_test does not match the GROUP BY column pk
SELECT * FROM tb1 a WHERE EXISTS (
  SELECT pk FROM tb2 b
  WHERE a.date_test = b.date_test AND b.date_test = '2003-04-05'
  GROUP BY pk
)

Supported subqueries

PolarDB-X 1.0 supports the following subquery types.

Comparisons using subqueries

Use a comparison operator to compare a value against the result of a subquery.

Syntax:

non_subquery_operand comparison_operator (subquery)
-- comparison_operator: = > < >= <= <> != <=> LIKE

Example:

SELECT * FROM tb1 WHERE 'a' = (SELECT column1 FROM t1)

Note The subquery must appear on the right side of the comparison operator.

Subqueries with ANY, ALL, IN/NOT IN, and EXISTS/NOT EXISTS

Syntax:

operand comparison_operator ANY (subquery)
operand comparison_operator ALL (subquery)
operand IN (subquery)
operand NOT IN (subquery)
operand EXISTS (subquery)
operand NOT EXISTS (subquery)
-- comparison_operator: = > < >= <= <> !=

Keyword	Behavior
`ANY`	Returns `TRUE` if any row returned by the subquery satisfies the expression.
`ALL`	Returns `TRUE` if every row returned by the subquery satisfies the expression.
`IN`	Equivalent to `= ANY`.
`NOT IN`	Equivalent to `<> ALL`.
`EXISTS`	Returns `TRUE` if the subquery returns at least one row.
`NOT EXISTS`	Returns `FALSE` if the subquery returns at least one row. Otherwise, `TRUE` is returned.

Examples:

-- IN and = ANY are equivalent
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);

-- NOT IN and <> ALL are equivalent
SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);

-- EXISTS
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);

Note EXISTS returns TRUE even if the subquery returns only NULL rows.

ROW subqueries

ROW subqueries compare multiple columns simultaneously against the result of a subquery.

Supported operators: = > < >= <= <> != <=>

Example:

SELECT * FROM t1
  WHERE (col1, col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);

SELECT * FROM t1
  WHERE ROW(col1, col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);

The two statements above are equivalent. Rows from t1 are returned only when:

The subquery returns exactly one row. An error is reported if it returns multiple rows.
The values returned by the subquery (col3, col4) equal the values in the primary table (col1, col2).

Correlated subqueries

A correlated subquery references a column from the outer query. The subquery is re-evaluated for each row of the outer query.

Example:

SELECT * FROM t1
  WHERE column1 = ANY (SELECT column1 FROM t2
                       WHERE t2.column2 = t1.column2);

In this example, t2.column2 = t1.column2 references t1.column2 from the outer query, making this a correlated subquery.

Derived tables (subqueries in a FROM clause)

A derived table is a subquery used in a FROM clause. It must have an alias.

Syntax:

SELECT ... FROM (subquery) [AS] tbl_name ...

Constraints:

A derived table must have an alias.
A derived table can return a scalar, a column, a row, or a table.
Derived tables cannot be correlated subqueries — they cannot reference columns from outer queries.

Examples:

Create a table and insert data:

CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
INSERT INTO t1 VALUES (1, '1', 1.0);
INSERT INTO t1 VALUES (2, '2', 2.0);

Query rows where sb1 > 1. The result is 2, '2', 4.0:

SELECT sb1, sb2, sb3
  FROM (SELECT s1 AS sb1, s2 AS sb2, s3 * 2 AS sb3 FROM t1) AS sb
  WHERE sb1 > 1;

Aggregate functions cannot be nested directly. The following statement returns an error:

-- Error: aggregate functions cannot be nested
SELECT AVG(SUM(s1)) FROM t1 GROUP BY s1;

Use a derived table to compute the average of grouped sums instead. The result is 1.5000:

SELECT AVG(sum_s1)
  FROM (SELECT SUM(s1) AS sum_s1 FROM t1 GROUP BY s1) AS t1;