When you write and optimize SQL statements in cloud-native Data Warehouse AnalyticDB for MySQL Edition , you must consider the distribution characteristics of data. This topic summarizes the skills in writing and optimizing SQL statements.

SQL development rules

To compile and optimize SQL statements of AnalyticDB for MySQL, take note of the following points:

  • Simple SQL statements

    In most cases, the performance of databases declines when the complexity of SQL queries increases. For example, databases that use single-table queries (redundant data) have better performance when compared with databases that use table join queries.

  • Reduced I/O as a core method to optimize SQL statements

    You can reduce column scans. Index scans can return the minimum amount of data and reduce the I/O operations and memory overheads.

  • Distributed computing, local computing, and parallel computing

    In scenarios that include big data computing, local computing fully uses multiple distributed computing resources to avoid the transmission of data across nodes.

  • High QPS and partition pruning

    Partition pruning must be used for tables and SQL statements because business systems require high queries per second (QPS) and response time (RT) in milliseconds.

SQL optimization rules

  • Remove unnecessary columns

    cloud-native Data Warehouse AnalyticDB for MySQL Edition provides hybrid row-column storage. The number of columns that are returned affects the SQL performance. When you write SQL statements, we recommend that you specify only the columns that are required in your business. We recommend that you do not use an asterisk (*) to execute statements on all columns.

    Examples
    • Incorrect syntax
      select * from tab1 where c1>100 and c1<1000;
    • Correct syntax
      select col1, col2 from table_name where c1>100 and c1<1000;
  • Indexes and scans

    When an SQL statement contains multiple query conditions, you can use indexes for advanced filter conditions and use scans for the remaining conditions.

    cloud-native Data Warehouse AnalyticDB for MySQL Edition provides hybrid row-column storage. You can use a single column to filter data in an efficient manner. Then, you can use internal record pointers to scan other column values. This way, the indexing and query overheads produced by other columns are reduced.

    Examples
    • Use internal scans for the time condition
      In the following example, the c1 and time fields are used in the following SQL statement to filter data. If c1=3 is used, a small number of records such as 10,000 are returned. However, if time>'2010-01-01 00:00:00' is used, a large number of records are returned.
      select c1,c2 from tab1 where c1=3 and time >='2010-01-01 00:00:00';
      To improve query efficiency and to return more valid records, you can use only c1 to index data and use the internal scan method for the time condition. Example:
      /* +no-index=[tab1.time] */
      select c1,c2 from tab1
      where c1=3 and time>='2010-01-01 00:00:00';
      /* +no-index=[tab1.time] */

      Hint indicates a forcible scan for the time>='2010-01-01 00:00:00' condition.

      The computing engine first retrieves the index of the c1 column to obtain the row set that meets the c1=3 condition. Then, the engine reads data from the time column of each row. If the time in the time column of a row meets the time >='2010-01-01 00:00:00' condition, the engine adds this row to the return result.

    • Use internal scans for the <> condition
      Queries that include the <> condition such as c2<>100 do not work the same as index scan. The c2<>100 condition cannot filter records. Example:
      select c1,c2 from tab1 where c1=3 and c2<>100;
      Add the no-index hint to perform an internal scan for the <> condition. Example:
      /* +no-index=[tab1.c2] */
      select c1,c2 from tab1 where c1=3 and c2<>100;
    • Use internal scans for the like condition

      Queries with an infix or suffix such as like '%abc' or like '%abc%'.

      Add the no-index hint to perform an internal scan for the like condition to quickly query valid records. Example:
      /* +no-index=[tab1.c3] */
      select c1,c2 from tab1 where c1=3 and c3 like '%abc%';
  • Index failures

    If the index used for a query fails, the SQL statement scans all records in the table. If the table contains a large number of records, the query performance degrades.

    For example, index failures may occur in the following scenarios:
    • Function conversion (column)
    • Type conversion.
    • The like condition such as like '%abc%'
    Examples
    • In the following SQL statement, the function conversion causes the index to fail. time is of the TIMESTAMP type and the storage time is 2017-12-10 10:00:23.
      select c1,c2 from tab1 where substr(cast(time as varchar),1,10)='2017-
      10-12';
    • Correct syntax
      select c1,c2 from tab1 where time>='2017-10-12 00:00:00' and time<='2017-10-12 23:59:59';
  • Remove unnecessary is not null filter conditions
    Examples
    • Incorrect syntax
      Select c1, c2 from tab1 where c1>100 and c1<1000 and c1 is not null;
    • In the preceding SQL statement, c1 is not null is an unnecessary filter condition. The following SQL statement is an optimized one:
      Select c1,c2 from tab1 where c1>100 and c1<1000;
  • Multi-table join

    Different SQL optimization rules are used in different multi-table join scenarios.

    • If fact tables are joined with fact tables, JOIN conditions of partition columns are included in the fact table. If not, filter out unnecessary data by using the WHERE condition.
    • No limits are imposed when dimension tables are joined with fact tables.
    In multi-table join queries, you must specify the filter conditions for each table in the WHERE condition. In traditional databases, the index field is used to join tables to quickly search for data. Example:
    Select count(*) 
    from customer_table C join 
    order_table O on C.customer_id= O.customer_id
    where O.order_time between'2018-07-20 10:00:11' 
    and '2018-09-30 10:00:11' 
    and O.order_amount=100;

    When you confirm that the t2 and t1 tables have the same time and type filters, we recommend that you modify the SQL statement to the following statement:

    Select count(*)
    from t1 join t2 on t1.id=t2.id
    where t1.time between '2017-12-10 00:00:00' and '2017-12-10 23:59:59'
    and t1.type= 100
    and t2.time between '2017-12-10 00:00:00' and '2017-12-10 23:59:59'
    and t2.type=100