This topic describes the cost-based SQL diagnostics engine provided by Database Autonomy Service (DAS).

Background information

Databases act as an important component in service systems. Most business scenarios require the use of databases. A small jitter that occurs in a database can affect services in a negative manner. For the optimal performance of your business, database performance must be stable and databases must be frequently optimized. This is always an important requirement of services. In most cases, applications, instances, and SQL statements can be optimized to improve database services.
  • Applications: The code logic of applications can be optimized. This way, applications can process data in a more efficient manner.
  • Instances: The values of environment parameters can be adjusted. This way, instances can work in a more efficient manner.
  • SQL statements: SQL statements can be optimized by using different methods. For example, physical databases are provided and SQL statements are rewritten. This way, data can be queried in a more efficient manner.

In most cases, developers are more familiar with the optimization of applications and instances than the optimization of SQL statements. They may argue about whether database administrators or application developers are in charge of SQL optimization. SQL optimization is an essential part of database optimization. If online SQL queries have performance issues, your services process data at a lower rate or even become unstable.

In most cases, approximately 80% of database issues can be resolved by SQL optimization. However, SQL optimization is a complex process that requires expertise and experience in databases.

Developers must identify the bottlenecks of execution plans and determine the optimal execution plan to be executed by the optimizer by using methods such as optimizing physical databases and rewriting SQL statements. SQL optimization is time-consuming because the workloads for executing SQL statements and the amount of data involved in SQL statements are large and frequently change. This makes SQL optimization complex and costly.

Challenges

The following two questions are frequently asked regarding the SQL diagnostics and optimization feature:
  • Is the feature reliable?
  • Is the feature comprehensive?
The answer to both questions is yes. To make sure that the SQL diagnostics and optimization feature is reliable and comprehensive, the following challenges must be tackled:
  • How can a reliable recommendation algorithm be chosen to generate reliable optimization suggestions?
    In the field of SQL diagnostics and optimization, the rule-based algorithm and the cost model-based algorithm are common recommendation algorithms for optimization suggestions.
    • Rule-based method
      The rule-based recommendation method is widely used in databases, especially in databases that lack the What-If Tool (WIT) capability, such as MySQL databases. The rule-based method is simple and easy to implement. However, the recommendations are not flexible and cannot meet relevant requirements. For example, the rule-based method is used to recommend indexes for the following simple SQL query:
      SELECT *
      FROM t1
      WHERE time_created >= '2017-11-25'
      AND consuming_time > 1000
      ORDER BY consuming_time DESC;
      The following candidate indexes are generated based on the rule-based method:
      IX1(time_created)
      IX2(time_created, consuming_time)
      IX3(consuming_time)
      IX4(consuming_time, time_created)

      You cannot determine which index or indexes are recommended. If index ORing or index ANDing is used, multiple indexes can be recommended. The cause for this issue is that the rule-based method cannot provide precise recommendations. In this example, this SQL statement is executed to query data from a single table. If multiple tables need to be joined and an SQL query contains complex subqueries, the recommended result is more difficult to determine.

    • Cost-based method
      The SQL diagnostics and optimization feature of DAS is implemented by using the cost model-based method. This allows DAS to evaluate optimization issues in the same way as the database optimizer. DAS can evaluate all possible candidates and provide the recommended result by using the cost model-based method. In extreme cases, DAS cannot evaluate all candidates but evaluates as many candidates as possible. This is because the optimal solution is used to solve non-deterministic polynomial (NP) problems. For open source databases such as MySQL databases, this feature can encounter the following issues:
      • Lack of the WIT capability: The WIT capability provided by the database optimizer of the kernel cannot be reused to evaluate the costs of candidate optimization solutions.
      • Lack of statistics: The costs of execution plans are calculated to evaluate the costs of candidate optimization solutions. If no statistics are available, the costs of candidate optimization solutions cannot be evaluated.
  • How can the SQL diagnostics and optimization feature be fully compatible with SQL statements?

    The SQL diagnostics and optimization feature that is developed must be fully compatible with SQL statements. The feature must support SQL parsing and SQL semantics verification. SQL compatibility is directly related to the comprehensiveness of the feature and the success rate of SQL diagnostics.

  • How can a comprehensive capability test set be built?

    SQL diagnostics and optimization have been considered a challenge for a long time. Database optimization expertise needs to be incorporated into the feature. A large test case library needs to be built to verify the core capabilities of the feature. The library can be used to measure and improve feature capabilities. A comprehensive and accurate test case library is crucial in improving the feature.

    The following requirements must be met to build a qualified test case library:
    • Comprehensiveness: SQL optimization can be affected by a number of factors. For example, index selection is affected by hundreds of factors, and the combinations of these factors form a large collection of case signatures. It is a demanding task to map these signatures to test cases one by one.
    • Expertise and a large amount of information: Expertise and a large amount of information are required to design a test case. For example, an index recommendation case contains the following information:
      • Schema design: The schema data includes tables, existing indexes, and constraints.
      • Multiple types of statistics.
      • Environment parameters.
  • How can the large-scale diagnostics capability be built?

    The SQL diagnostics and optimization feature must be capable of serving millions of cloud database instances. The online capabilities of this feature are faced with great challenges. For example, a complex computing architecture is required to implement service splitting and the compute nodes can be added or removed. The challenges also include the maximum number of concurrent query requests, concurrency control in the distributed resource access environment, valid task scheduling based on different emergency priorities, and peak value handling.

Solutions

The SQL diagnostics and optimization feature is one of the core features of DAS. DAS checks and analyzes SQL statements, and then provides professional optimization suggestions such as suggestions on indexes and SQL optimization, and information such as estimated performance improvement. This helps you maximize the performance of SQL statement execution.

The automatic SQL optimization feature of DAS changes your optimization process from a manpower-intensive process to an automated-intelligent process.

Capability building

This section describes how to build the core architecture and the capability test set of the SQL diagnostics and optimization engine of DAS. The capability test set is the same as the test case library of the SQL diagnostics and optimization feature.

Core architecture of the SQL diagnostics and optimization engine
The preceding figure shows the core architecture of the SQL diagnostics and optimization engine. The engine is an optimizer that is independent of databases. The engine can automatically collect complete statistics and can calculate the costs of execution plans. This way, the engine provides a capability that is similar to the WIT capability. The following list describes the workflow of the engine:
  1. SQL parsing and verification: The engine parses an SQL statement, verifies whether the syntax of the input query statement complies with the standard format, identifies the elements of the query statement, and then forms a syntax tree. The syntax tree includes predicates, predicate types, sorting fields, aggregation fields, query fields, and data types of fields in the query statement. The engine checks whether the tables and fields specified in the SQL statement comply with the schema of the specified database.
  2. Candidate index generation: The engine generates multiple combinations of candidate indexes based on the generated syntax tree.
  3. Cost-based evaluation: The built-in optimizer that is independent of the database kernel obtains database statistics. The statistics are cached in the diagnostics engine. The built-in optimizer for the diagnostics engine calculates the costs of candidate indexes based on the statistics and evaluates the costs of each index and the costs of each method used to rewrite the SQL statement. This way, the optimal index or the optimal method used to rewrite the SQL statement can be selected based on the costs.
  4. Index merging and optimal index selection: The engine can process one or more SQL statements or all SQL statements that are executed on the database instance. The engine generates index recommendations for multiple SQL statements. The existing physical indexes and the index recommendations for different SQL statements may include the same indexes, indexes with the same prefix, and similar indexes.

A comprehensive test set must be built to test and improve the capabilities of the SQL diagnostics and optimization engine. The SQL diagnostics and optimization feature is developed. The developed capabilities are centered around the use case system, as shown in the following figure.

Use case system

Test cases are described based on their signatures so that a comprehensive signature library for test cases can be created. This is the basic principle used to create a capability test case.

In Alibaba Group, all SQL statements executed on database instances are collected and stored in real time. Signature-based description is used based on the Alibaba ecosystem and various SQL scenarios to analyze and search for a large number of online full SQL resources, and accordingly to obtain real cases that meet the specified signatures and extract the information required for test cases. This way, the test case library can be built.
Note All data in the test case library is collected from the internal business of Alibaba Group. The involved online extracted information, such as statistical information, is encrypted and masked. This process is automated.
The features in the signature library of test cases are compared with the signatures included in the test case library to evaluate the comprehensiveness and coverage of test cases. You can evaluate test cases based on the following four aspects:
  • What test cases in the signature library are covered by test cases and what is the comprehensiveness?
  • What test cases in the signature library are not covered by the SQL diagnostics and optimization feature or failed SQL verification?
  • What test cases in the signature library can frequently encounter regression problems in a period of time?
  • What is the coverage rate of test cases at each capability level?

Optimizations

Before the SQL diagnostics and optimization feature of DAS is released on Alibaba Cloud, the feature has been running on Alibaba Group services for years. In this period, the feature diagnosed an average of approximately 50,000 SQL statements per day. This way, SQL statements for Alibaba Cloud applications are optimized in an efficient manner. The SQL diagnostics and optimization feature can be used in the following scenarios:
  • Custom optimization: The SQL diagnostics and optimization feature can be used to diagnose the abnormal SQL statements that are specified by Alibaba Group users and provide expert optimization suggestions for the SQL statements.
  • Automatic optimization: The SQL diagnostics and optimization feature automatically identifies slow queries based on the workloads of database instances, diagnoses the SQL statements, and then generates optimization suggestions. After the feature evaluates the suggestions, the feature schedules optimization tasks, completes the optimization online, and monitors the performance of the SQL statements. The optimization process is fully automated to improve database performance. This way, your database instances can always run in optimal conditions.

An active analysis and feedback system is provided for the SQL diagnostics and optimization feature. This way, rollback cases that are generated when SQL statements are automatically optimized, online failed diagnostics cases, and cases identified by user feedback are automatically sent to the use case system. This iterates and improves the feature.

Supported database engines

SQL diagnostics supports the following database engines: ApsaraDB RDS for MySQL, PolarDB for MySQL, ApsaraDB RDS for PostgreSQL, and PolarDB for PostgreSQL (Compatible with Oracle).

Note The cost-based SQL diagnostics engine can recommend optimal In-Memory Column Indexes (IMCIs) for the PolarDB for MySQL database engine. For more information about the IMCI feature, see Overview.