This topic describes the subquery categories supported by PolarDB-X. This topic also describes the limits on and the notes for subqueries in PolarDB-X.

Limits

Compared with the native MySQL, PolarDB-X provides the following limits on subqueries:

  • Subqueries cannot be used in HAVING clauses. The following code block provides an example on how to use subqueries in HAVING clauses:
    SELECT name, AVG( quantity )
    FROM tb1
    GROUP BY name
    HAVING AVG( quantity ) > 2* (
       SELECT AVG( quantity )
       FROM tb2
     );
  • Subqueries cannot be used in JOIN ON clauses. The following code block provides an example on how to use subqueries in JOIN ON clauses:
    SELECT * FROM tb1 p JOIN tb2 s on (p.id=s.id and p.quantity>All(select quantity from tb3))
  • PolarDB-X does not support the ROW functions in the scalar subqueries that use equal signs (=) as operators. The following code block provides an example on how to use the ROW functions:
    select * from tb1 where row(id, name) = (select id, name from tb2)        
  • Subqueries cannot be used in UPDATE SET clauses. The following code block provides an example on how to use subqueries in UPDATE SET clauses:
    UPDATE t1 SET c1 = (SELECT c2 FROM t2 WHERE t1.c1 = t2.c1) LIMIT 10

Note

In PolarDB-X, only the APPLY operator can be used to run specific subqueries. This makes queries inefficient. We recommend that you do not execute the following inefficient SQL statements:

  • SQL statements in which WHERE clauses contain OR operators and subqueries. The execution efficiency of the SQL statements is reduced based on the data in the foreign tables. The following code block provides an example on how to specify the SQL statements:
    Efficient statement: select * from tb1 where id in (select id from tb2)
    Efficient statement: select * from tb1 where id in (select id from tb2) and id>3
    Inefficient statement: select * from tb1 where id in (select id from tb2) or  id>3
  • Correlated subqueries in which correlated items contain functions or non-equal signs. The following code block provides examples on how to specify the statement that does not contain the correlated subquery and the statements that contain the correlated subqueries:
    Efficient statement: select * from tb1 a where id in
          (select id from tb2 b where a.name=b.name)
    Inefficient statement: select * from tb1 a where id in
          (select id from tb2 b where UPPER(a.name)=b.name)
    Inefficient statement: select * from tb1 a where id in
          (select id from tb2 b where a.decimal_test=abs(b.decimal_test))
    Inefficient statement: select * from tb1 a where id in
          (select id from tb2 b where a.name!=b.name)
    Inefficient statement: select * from tb1 a where id in
          (select id from tb2 b where a.name>=b.name) 
  • Correlated subqueries in which correlated items are connected with other conditions by using OR operators. The following code block provides examples on how to specify the statement that does not contain the correlated subquery and the statements that contain the correlated subqueries.
    Efficient statement: select * from tb1 a where id in
          (select id from tb2 b where a.name=b.name
                                      and b.date_test<'2015-12-02')
    Inefficient statement: select * from tb1 a where id in
          (select id from tb2 b where a.name=b.name
                                      or b.date_test<'2015-12-02')
    Inefficient statement: select * from tb1 a where id in
          (select id from tb2 b where a.name=b.name
                                      or b.date_test=a.date_test)
  • Scalar subqueries that contain correlated items. The following code block provide examples on how to specify the statement that does not contain the correlated subquery and the statement that contains the correlated subquery:
    Efficient statement: select * from tb1 a where id >
            (select id from tb2 b where b.date_test<'2015-12-02')
    Inefficient statement: select * from tb1 a where id >
            (select id from tb2 b where a.name=b.name 
                                        and b.date_test<'2015-12-02')
  • Subqueries in which correlated items span the correlation levels.
    • An SQL statement has multiple correlation levels. The correlated items in each subquery are correlated only with the upper level. The statements that contain the subqueries are efficient. The following code block provides an example on how to specify the statements.
      Efficient statement: select * from tb1 a where id in(select id from tb2 b 
              where a.name=b.name and 
              exists (select name from tb3 c where b.address=c.address))  
    • An SQL statement has multiple correlation levels. The correlated items of subqueries in Table c are correlated with columns in Table a. The statements that contain the subqueries are inefficient. The following code block provides an example on how to specify the statements.
      Inefficient statement: select * from tb1 a where id in(select id from tb2 b 
              where a.name=b.name and 
              exists (select name from tb3 c where a.address=c.address)) 
    Note In the preceding examples, Table a and Table b belong to the same correlation level and Table b and Table c belong to the same correlation level. Table a and Table c are correlated across the correlation levels.
  • Subqueries that contain GROUP BY clauses. Make sure that the grouping columns specified in GROUP BY clauses contain the correlated items.
    • An SQL subquery contains aggregate functions and correlated items. The correlated item b.pk is correlated to the pk grouping column. The statements that contain the subqueries are efficient. The following code block provides an example on how to specify the statements.
      Efficient statement: select * from tb1 a where exists 
          (select pk from tb2 b 
                      where a.pk=b.pk and  b.date_test='2003-04-05' 
                      group by pk);
    • An SQL subquery contains aggregate functions and correlated items. The correlated item b.date_test is not correlated to the pk grouping column. The statements that contain the SQL subqueries are inefficient. The following code block provides an example on how to specify the statements.
      Inefficient statement: select * from tb1 a where exists 
          (select pk from tb2 b 
                      where a.date_test=b.date_test and b.date_test='2003-04-05' 
                      group by pk);

Supported subqueries

PolarDB-X supports the following types of subqueries:

  • Comparisons Using Subqueries

    Comparisons using subqueries indicate subqueries that use comparison operators. These subqueries are most commonly used.

    • Syntax
      non_subquery_operand comparison_operator (subquery)
      comparison_operator: =  >  <  >=  <=  <>  !=  <=> like        
    • Example
      select * from tb1 WHERE 'a' = (SELECT column1 FROM t1)  
      Note Subqueries can be specified only to the right of comparison operators.
  • Subqueries with ANY, ALL, IN/NOT IN, and EXISTS/NOT EXISTS
    • Syntax
      operand comparison_operator ANY (subquery)
      operand comparison_operator ALL (subquery)
      operand IN (subquery)
      operand NOT IN (subquery)
      operand EXISTS (subquery)
      operand NOT EXISTS (subquery)
      
      comparison_operator:=  >  <  >=  <=  <>  !=
    • Example
      • ANY: If a row returned by the subquery meets the expression before ANY, the ANY operator returns TRUE. Otherwise, the ANY operator returns FALSE.
      • ALL: If all rows returned by the subquery meet the expression before ALL, the ALL operator returns TRUE. Otherwise, the ANY operator returns FALSE.
      • IN: If IN is specified before the subquery, IN is used in the same manner as =ANY. The following code block provides an example on how to use the subquery.
        SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
        SELECT s1 FROM t1 WHERE s1 IN    (SELECT s1 FROM t2);
      • NOT IN: If NOT IN is specified before the subquery, NOT IN is used in the same manner as <>ALL. The following code block provides an example on how to use the subquery.
        SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
        SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);   
      • EXISTS: If the subquery returns a row, the EXISTS operator returns TRUE. Otherwise, the EXISTS operator returns FALSE. The following code block provides an example on how to use the subquery.
        SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
        Note If a subquery contains a row, the WHERE clause returns TRUE even if the subquery contains only rows in which the values are NULL.
      • NOT EXISTS: If the subquery returns a row, the NOT EXISTS operator returns FALSE. If the subquery returns NULL, the NOT EXISTS operator returns TRUE.
  • Row Subqueries
    • ROW subqueries support the following comparison operators:
      comparison_operator: =  >  <  >=  <=  <>  !=  <=>     
    • Example
      SELECT * FROM t1
        WHERE (col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);
      SELECT * FROM t1
        WHERE ROW(col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);  
      The preceding two SQL statements can be executed in the same manner. Data rows in the t1 table are returned only when the following conditions are met:
      • The subquery SELECT col3, col4 FROM t2 WHERE id=10 returns only one row. An error is reported if multiple rows are returned.
      • col3 and col4 returned by the subquery are equal to col1 and col2 in the primary table.
  • Correlated Subqueries

    Correlated subqueries are subqueries that contain references to tables that appear in the outer query. The following code block provides an example on how to use the correlated subqueries.

    SELECT * FROM t1
      WHERE column1 = ANY (SELECT column1 FROM t2
                           WHERE t2.column2 = t1.column2);

    In the example, the subquery does not contain the t1 table and its column column2. In this case, the subquery refers to the table in the outer query.

  • Derived tables (subqueries in a FROM clause)

    Derived tables are subqueries specified in a FROM clause.

    • Syntax
      SELECT ... FROM (subquery) [AS] tbl_name ...
    • Example
      1. Prepare data.

        Execute the following statements to create the t1 table:

        CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);
        INSERT INTO t1 VALUES (1,'1',1.0);
        INSERT INTO t1 VALUES (2,'2',2.0);
        Execute the following statement. The query result is 2, '2', 4.0.
        SELECT sb1,sb2,sb3
          FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb
          WHERE sb1 > 1;
      2. Query the average value of grouped data processed by the SUM function.

        If you execute the following SQL statement, an error is reported and no result is returned.

        SELECT AVG(SUM(s1)) FROM t1 GROUP BY s1;
        You can execute the following statement that contains a derived table. The query result is 1.5000.
        SELECT AVG(sum_s1)
          FROM (SELECT SUM(s1) AS sum_s1
                FROM t1 GROUP BY s1) AS t1;
        Note
        • A derived table must have an alias such as t1 specified in the example.
        • A derived table can return a scalar, a column, a row, or a table.
        • Derived tables cannot be correlated subqueries. Derived tables cannot contain references to foreign tables in outer queries.