Subquery - PolarDB - Alibaba Cloud Documentation Center

This topic describes the types of subqueries supported by PolarDB-X 1.0 and the limits and additional considerations when you use subqueries in PolarDB-X 1.0.

Limits

Compared with the native MySQL, PolarDB-X 1.0 has the following limits when you use subqueries:

Subqueries cannot be used in HAVING clauses. Example:

SELECT name, AVG( quantity )
FROM tb1
GROUP BY name
HAVING AVG( quantity ) > 2* (
   SELECT AVG( quantity )
   FROM tb2
 );

Subqueries cannot be used in JOIN ON clauses. Example:

SELECT * FROM tb1 p JOIN tb2 s on (p.id=s.id and p.quantity>All(select quantity from tb3))

ROW subqueries and scalar subqueries cannot be placed before and after equal signs (=) simultaneously. Example:
```
select * from tb1 where row(id, name) = (select id, name from tb2)        
```

Subqueries cannot be used in UPDATE SET clauses. Example:

UPDATE t1 SET c1 = (SELECT c2 FROM t2 WHERE t1.c1 = t2.c1) LIMIT 10

Additional considerations

In PolarDB-X 1.0, some subqueries can be executed by using only the APPLY operator and result in inefficient queries. Avoid the following inefficient SQL statements:

SQL statements whose WHERE clauses contain both OR operators and subqueries. The execution efficiency is reduced based on the data in the foreign tables. Examples:

Efficient: select * from tb1 where id in (select id from tb2)
Efficient: select * from tb1 where id in (select id from tb2) and id>3
Inefficient: select * from tb1 where id in (select id from tb2) or  id>3

Correlated subqueries whose correlated items are used in functions or used along with non-equal signs. Examples:

Efficient: select * from tb1 a where id in
      (select id from tb2 b where a.name=b.name)
Inefficient: select * from tb1 a where id in
      (select id from tb2 b where UPPER(a.name)=b.name)
Inefficient: select * from tb1 a where id in
      (select id from tb2 b where a.decimal_test=abs(b.decimal_test))
Inefficient: select * from tb1 a where id in
      (select id from tb2 b where a.name! =b.name)
Inefficient: select * from tb1 a where id in
      (select id from tb2 b where a.name>=b.name)

Correlated subqueries whose correlated items are connected with other conditions by using OR operators. Examples:

Efficient: select * from tb1 a where id in
      (select id from tb2 b where a.name=b.name
                                  and b.date_test<'2015-12-02')
Inefficient: select * from tb1 a where id in
      (select id from tb2 b where a.name=b.name
                                  or b.date_test<'2015-12-02')
Inefficient: select * from tb1 a where id in
      (select id from tb2 b where a.name=b.name
                                  or b.date_test=a.date_test)

Scalar subqueries that have correlated items. Examples:

Efficient: select * from tb1 a where id >
        (select id from tb2 b where b.date_test<'2015-12-02')
Inefficient: select * from tb1 a where id >
        (select id from tb2 b where a.name=b.name 
                                    and b.date_test<'2015-12-02')

Subqueries whose correlated items span the correlation levels. Examples:
- An SQL statement has multiple correlation levels. The correlated items in each subquery are correlated only with the upper level. Such statements are efficient.
```
Efficient: select * from tb1 a where id in(select id from tb2 b 
        where a.name=b.name and 
        exists (select name from tb3 c where b.address=c.address))  
```
- An SQL statement has multiple correlation levels. The correlated items of subqueries in table c are correlated with columns in table a. Such statements are inefficient.
```
Inefficient: select * from tb1 a where id in(select id from tb2 b 
        where a.name=b.name and 
        exists (select name from tb3 c where a.address=c.address)) 
```
Note In the preceding example, both table a and table b, table b and table c belong to the same correlation level. The correlation between table a and table c spans the correlation levels.
Subqueries that contain GROUP BY clauses. Make sure that the correlated items are correlated to the grouping columns. Examples:
- An SQL subquery contains aggregate functions and correlated items. The b.pk correlated item is correlated to the pk grouping column. Such SQL statements are efficient.
```
Efficient: select * from tb1 a where exists 
    (select pk from tb2 b 
                where a.pk=b.pk and  b.date_test='2003-04-05' 
                group by pk);
```
- An SQL subquery contains aggregate functions and correlated items. The b.date_test correlated item is not correlated to the pk grouping column. Such SQL statements are inefficient.
```
Inefficient: select * from tb1 a where exists 
    (select pk from tb2 b 
                where a.date_test=b.date_test and b.date_test='2003-04-05' 
                group by pk);
```

Supported subqueries

PolarDB-X 1.0 supports the following types of subqueries:

Comparisons using subqueries
Comparisons using subqueries indicate subqueries that use comparison operators. These subqueries are commonly used.
- Syntax:
```
non_subquery_operand comparison_operator (subquery)
comparison_operator: = > < >= <= <> ! =  <=> like        
```
- Example:
```
select * from tb1 WHERE 'a' = (SELECT column1 FROM t1)  
```
  Note Subqueries can be placed only to the right of comparison operators.
Subqueries with ANY, ALL, IN/NOT IN, and EXISTS/NOT EXISTS
- Syntax:
```
operand comparison_operator ANY (subquery)
operand comparison_operator ALL (subquery)
operand IN (subquery)
operand NOT IN (subquery)
operand EXISTS (subquery)
operand NOT EXISTS (subquery)

comparison_operator:=  >  <  >=  <=  <>  ! =
```
- Examples
  - ANY: If any row returned by the subquery meets the expression before ANY, TRUE is returned. Otherwise, FALSE is returned.
  - ALL: If all rows returned by the subquery meet the expression before ALL, TRUE is returned. Otherwise, FALSE is returned.
  - IN: If IN is used before the subquery, IN is equivalent to =ANY. Example:
```
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN    (SELECT s1 FROM t2);
```
  - NOT IN: If NOT IN is used before the subquery, NOT IN is equivalent to <>ALL. Example:
```
SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);   
```
  - EXISTS: If the subquery returns any rows, TRUE is returned. Otherwise, FALSE is returned. Example:
```
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
```
    Note If a subquery contains any rows, the WHERE condition returns TRUE even if the subquery contains only NULL rows.
  - NOT EXISTS: If the subquery returns any rows, FALSE is returned. Otherwise, TRUE is returned.
ROW subqueries
- ROW subqueries support the following comparison operators:
```
comparison_operator: =  >  <  >=  <=  <>  ! =  <=>     
```
- Examples:
```
SELECT * FROM t1
  WHERE (col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);
SELECT * FROM t1
  WHERE ROW(col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);  
```
  The preceding two SQL statements are equivalent. Data rows in table t1 are returned only when the following conditions are met:
  - The subquery (SELECT col3, col4 FROM t2 WHERE id=10) returns only one row. An error is reported if multiple rows are returned.
  - col3 and col4 returned by the subquery are equal to col1 and col2 in the primary table.
Correlated subqueries
Correlated subqueries are subqueries that contain references to foreign tables in outer queries. Example:
```
SELECT * FROM t1
  WHERE column1 = ANY (SELECT column1 FROM t2
                       WHERE t2.column2 = t1.column2);
```
In the example, the subquery does not contain table t1 and its column column2. In this case, the subquery finds the table in the outer query.
Derived tables (subqueries in a FROM clause)
Derived tables are subqueries in a FROM clause.
- Syntax:
```
SELECT ... FROM (subquery) [AS] tbl_name ...
```
- Examples
  1. Prepare data:
    Execute the following statements to create table t1:
```
CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
INSERT INTO t1 VALUES (1,'1',1.0);
INSERT INTO t1 VALUES (2,'2',2.0);
```
    Execute the following statement. The query result is 2, '2', 4.0.
```
SELECT sb1,sb2,sb3
  FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb
  WHERE sb1 > 1;
```
  2. Query data: Query the average value of grouped data that is processed by the SUM function.
    If you execute the following SQL statement, an error is reported and no result is returned.
```
SELECT AVG(SUM(s1)) FROM t1 GROUP BY s1;
```
    You can execute the following statement that contains a derived table. The query result is 1.5000.
```
SELECT AVG(sum_s1)
  FROM (SELECT SUM(s1) AS sum_s1
        FROM t1 GROUP BY s1) AS t1;
```
    Note
    
    A derived table must have an alias, such as t1 in the previous statement.
    
    A derived table can return a scalar, a column, a row, or a table.
    
    Derived tables cannot be correlated subqueries. Derived tables cannot contain references to foreign tables in outer queries.