All Products
Search
Document Center

PolarDB:Subquery

Last Updated:Mar 28, 2026

PolarDB-X 1.0 supports most MySQL subquery syntax. This topic describes the hard restrictions that cause query failures, the subquery patterns that run via the APPLY operator (row-by-row execution) and are significantly slower on large datasets, and the supported subquery types.

Limitations

The following subquery patterns are not supported in PolarDB-X 1.0:

  • HAVING clause: Subqueries cannot appear inside a HAVING clause.

    -- Not supported
    SELECT name, AVG(quantity)
    FROM tb1
    GROUP BY name
    HAVING AVG(quantity) > 2 * (
       SELECT AVG(quantity)
       FROM tb2
    );
  • JOIN ON clause: Subqueries cannot appear inside a JOIN ON clause.

    -- Not supported
    SELECT * FROM tb1 p JOIN tb2 s ON (p.id = s.id AND p.quantity > ALL(SELECT quantity FROM tb3))
  • ROW and scalar subquery on both sides of `=`: A ROW subquery and a scalar subquery cannot appear simultaneously on both sides of an equal sign.

    -- Not supported
    SELECT * FROM tb1 WHERE ROW(id, name) = (SELECT id, name FROM tb2)
  • UPDATE SET clause: Subqueries cannot appear inside an UPDATE SET clause.

    -- Not supported
    UPDATE t1 SET c1 = (SELECT c2 FROM t2 WHERE t1.c1 = t2.c1) LIMIT 10

Performance considerations

In PolarDB-X 1.0, some subqueries can only run using the APPLY operator rather than being converted to a JOIN. The APPLY operator executes the subquery once per row of the outer table, so performance degrades significantly on large datasets. The patterns below trigger this behavior — rewrite them where possible.

OR operators combined with subqueries

When a WHERE clause mixes OR with a subquery, PolarDB-X cannot convert the subquery to a JOIN because the OR condition expands the result set in a way that is incompatible with JOIN semantics.

-- Efficient
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2)
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2) AND id > 3

-- Inefficient: OR prevents JOIN conversion
SELECT * FROM tb1 WHERE id IN (SELECT id FROM tb2) OR id > 3

Correlated items inside functions or with non-equal comparisons

When a correlated column is wrapped in a function or compared with an operator other than =, PolarDB-X cannot resolve the correlation via a JOIN — the function or non-equal operator prevents the optimizer from rewriting the subquery as an equi-join.

-- Efficient
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name = b.name)

-- Inefficient: correlated column inside a function
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE UPPER(a.name) = b.name)
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.decimal_test = ABS(b.decimal_test))

-- Inefficient: non-equal comparison operators
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name != b.name)
SELECT * FROM tb1 a WHERE id IN (SELECT id FROM tb2 b WHERE a.name >= b.name)

Correlated items connected by OR inside the subquery

When the correlated condition inside a subquery is joined to other conditions with OR, PolarDB-X cannot rewrite it as a JOIN. Only AND-connected correlations can be transformed into equi-join conditions.

-- Efficient: correlated condition uses AND
SELECT * FROM tb1 a WHERE id IN
  (SELECT id FROM tb2 b WHERE a.name = b.name AND b.date_test < '2015-12-02')

-- Inefficient: correlated condition uses OR
SELECT * FROM tb1 a WHERE id IN
  (SELECT id FROM tb2 b WHERE a.name = b.name OR b.date_test < '2015-12-02')
SELECT * FROM tb1 a WHERE id IN
  (SELECT id FROM tb2 b WHERE a.name = b.name OR b.date_test = a.date_test)

Scalar subqueries with correlated items

A scalar subquery that references a column from the outer query cannot be converted to a JOIN. Without a correlated column, the scalar subquery executes once; with one, it executes once per outer row.

-- Efficient: no correlation to outer table
SELECT * FROM tb1 a WHERE id > (SELECT id FROM tb2 b WHERE b.date_test < '2015-12-02')

-- Inefficient: correlated item in scalar subquery
SELECT * FROM tb1 a WHERE id >
  (SELECT id FROM tb2 b WHERE a.name = b.name AND b.date_test < '2015-12-02')

Correlated items that span correlation levels

In multi-level subqueries, each subquery's correlated columns must reference only its immediate outer query. When a column in an inner subquery references a table two or more levels up, the correlation skips a level and cannot be resolved via a JOIN.

-- Efficient: each subquery correlates with its direct outer query only
SELECT * FROM tb1 a WHERE id IN (
  SELECT id FROM tb2 b
  WHERE a.name = b.name
    AND EXISTS (SELECT name FROM tb3 c WHERE b.address = c.address)
)

-- Inefficient: table c correlates with table a, skipping table b
SELECT * FROM tb1 a WHERE id IN (
  SELECT id FROM tb2 b
  WHERE a.name = b.name
    AND EXISTS (SELECT name FROM tb3 c WHERE a.address = c.address)
)
Note In the inefficient example, table a and table b, and table b and table c belong to the same correlation level. The correlation between table a and table c spans the correlation levels, which cannot be resolved with a JOIN.

GROUP BY subqueries where correlated items don't match grouping columns

In a subquery with GROUP BY, each correlated column must appear in the GROUP BY clause. If it doesn't, the optimizer cannot rewrite the subquery as a JOIN because the grouping key and the correlation key are mismatched.

-- Efficient: correlated item b.pk matches the GROUP BY column pk
SELECT * FROM tb1 a WHERE EXISTS (
  SELECT pk FROM tb2 b
  WHERE a.pk = b.pk AND b.date_test = '2003-04-05'
  GROUP BY pk
)

-- Inefficient: correlated item b.date_test does not match the GROUP BY column pk
SELECT * FROM tb1 a WHERE EXISTS (
  SELECT pk FROM tb2 b
  WHERE a.date_test = b.date_test AND b.date_test = '2003-04-05'
  GROUP BY pk
)

Supported subqueries

PolarDB-X 1.0 supports the following subquery types.

Comparisons using subqueries

Use a comparison operator to compare a value against the result of a subquery.

Syntax:

non_subquery_operand comparison_operator (subquery)
-- comparison_operator: = > < >= <= <> != <=> LIKE

Example:

SELECT * FROM tb1 WHERE 'a' = (SELECT column1 FROM t1)
Note The subquery must appear on the right side of the comparison operator.

Subqueries with ANY, ALL, IN/NOT IN, and EXISTS/NOT EXISTS

Syntax:

operand comparison_operator ANY (subquery)
operand comparison_operator ALL (subquery)
operand IN (subquery)
operand NOT IN (subquery)
operand EXISTS (subquery)
operand NOT EXISTS (subquery)
-- comparison_operator: = > < >= <= <> !=
KeywordBehavior
ANYReturns TRUE if any row returned by the subquery satisfies the expression.
ALLReturns TRUE if every row returned by the subquery satisfies the expression.
INEquivalent to = ANY.
NOT INEquivalent to <> ALL.
EXISTSReturns TRUE if the subquery returns at least one row.
NOT EXISTSReturns FALSE if the subquery returns at least one row. Otherwise, TRUE is returned.

Examples:

-- IN and = ANY are equivalent
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);

-- NOT IN and <> ALL are equivalent
SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);

-- EXISTS
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
Note EXISTS returns TRUE even if the subquery returns only NULL rows.

ROW subqueries

ROW subqueries compare multiple columns simultaneously against the result of a subquery.

Supported operators: = > < >= <= <> != <=>

Example:

SELECT * FROM t1
  WHERE (col1, col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);

SELECT * FROM t1
  WHERE ROW(col1, col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);

The two statements above are equivalent. Rows from t1 are returned only when:

  • The subquery returns exactly one row. An error is reported if it returns multiple rows.

  • The values returned by the subquery (col3, col4) equal the values in the primary table (col1, col2).

Correlated subqueries

A correlated subquery references a column from the outer query. The subquery is re-evaluated for each row of the outer query.

Example:

SELECT * FROM t1
  WHERE column1 = ANY (SELECT column1 FROM t2
                       WHERE t2.column2 = t1.column2);

In this example, t2.column2 = t1.column2 references t1.column2 from the outer query, making this a correlated subquery.

Derived tables (subqueries in a FROM clause)

A derived table is a subquery used in a FROM clause. It must have an alias.

Syntax:

SELECT ... FROM (subquery) [AS] tbl_name ...

Constraints:

  • A derived table must have an alias.

  • A derived table can return a scalar, a column, a row, or a table.

  • Derived tables cannot be correlated subqueries — they cannot reference columns from outer queries.

Examples:

Create a table and insert data:

CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
INSERT INTO t1 VALUES (1, '1', 1.0);
INSERT INTO t1 VALUES (2, '2', 2.0);

Query rows where sb1 > 1. The result is 2, '2', 4.0:

SELECT sb1, sb2, sb3
  FROM (SELECT s1 AS sb1, s2 AS sb2, s3 * 2 AS sb3 FROM t1) AS sb
  WHERE sb1 > 1;

Aggregate functions cannot be nested directly. The following statement returns an error:

-- Error: aggregate functions cannot be nested
SELECT AVG(SUM(s1)) FROM t1 GROUP BY s1;

Use a derived table to compute the average of grouped sums instead. The result is 1.5000:

SELECT AVG(sum_s1)
  FROM (SELECT SUM(s1) AS sum_s1 FROM t1 GROUP BY s1) AS t1;