By Sigan, senior database expert from Alibaba Cloud ApsaraDB
Databases play an increasingly important role in our business systems. Like other companies, most Alibaba services rely heavily on databases. A minor database error may cause a significant impact on our business. Stabilizing and constantly optimizing the databases have always been crucially important.
Database optimization is usually implemented in three levels:
1) Application level: Optimizes application code logic to process data more efficiently.
2) Instance level: Adjusts environment parameters to improve the operating efficiency of instances.
3) SQL level: Ensures data is optimally retrieved through physical database design and SQL statement modification.
Developers are usually familiar with the first two levels and somewhat unfamiliar with SQL optimization, which may even lead to disputes over who should be responsible (the database administrator or application developer). However, SQL optimization is critical to database optimization. Poor SQL performance may lead to low execution efficiency or even interrupted services.
Based on experience, about 80% of database performance problems can be solved through SQL optimization. However, SQL optimization is a very complex process that requires a wide range of database expertise and experience.
For example, it requires knowledge about how to accurately identify bottlenecks in an execution plan and optimize the execution plan by improving the physical database design or rewriting SQL statements. In addition, SQL optimization is time-consuming and costly due to the large and constantly changing SQL workload and basic data.
SQL diagnosis and optimization is one of the most important services of Alibaba Cloud Database Autonomy Service (DAS). DAS performs diagnostic analysis by using SQL statements as the input and provides expert optimization suggestions such as indexing, statement optimization, and expected benefits. You can obtain expert suggestions on SQL diagnosis, rewriting, and optimization to maximize SQL execution performance, without profound knowledge about database optimization.
In addition, based on this capability, the DAS auto-SQL optimization service has pushed SQL optimization to a higher level. The service transforms passive optimization from relying on manual work into intelligence-based active optimization, enabling unattended SQL optimization through an autonomous process.
Next, we will explain the capability building process for the DAS SQL diagnosis and optimization service in detail.
When it comes to the diagnosis and optimization capabilities, we naturally think of two questions:
Answering these two questions perfectly poses several challenges:
In the field of SQL diagnosis and optimization, two types of optimization algorithms are commonly used: the rule-based and the cost model-based algorithms. Because of its simplicity and ease of implementation, rule-based recommendation is widely used in many products and services nowadays. It is perfect for databases that lack the WHAT-IF kernel capability, for example, MySQL databases. However, this recommendation algorithm is inflexible and thus cannot ensure the recommendation quality. Let's have a look at the example about recommending indexes for the following SQL query:
SELECT * FROM t1 WHERE time_created >= '2017-11-25' AND consuming_time > 1000 ORDER BY consuming_time DESC
In most cases, the following four candidate indexes are generated based on a specified rule:
IX1(time_created) IX2(time_created, consuming_time) IX3(consuming_time) IX4(consuming_time, time_created)
Which index (or indexes, depending on the AND or OR relationships between indexes) is (or are) finally recommended to users? It is difficult to give an exact answer. This example only uses a simple single-table SQL query. What should we do with complex SQL queries (for example join-table queries) and queries with complex subqueries? The recommendation accuracy is even lower.
To solve this problem, the DAS SQL diagnosis and optimization service is implemented based on cost models, in the same way as the database optimizer. It quantitatively evaluates all possible candidate recommendations (or at least most of them in some extreme cases, where a suboptimal solution is obtained for NP optimization problems) and finally makes recommendations based on the execution costs. Using this algorithm, we find ourselves faced with other challenges when providing support for open-source databases such as MySQL.
The method that the SQL diagnosis and optimization service uses to ensure SQL compatibility (which involves SQL parsing and SQL semantic verification) directly affects the comprehensiveness of the service and the success rate of diagnosis. Either incompatibility or insufficient compatibility is an issue.
Building SQL diagnosis and optimization capabilities have long been a challenging topic. The challenges lie not only in how to incorporate expert knowledge about database optimization but also how to build a comprehensive test-case library to verify the core capabilities. The test-case library is like a ruler that measures the capabilities and the driving force that accelerates the capacity building process. Therefore, an accurate test-case library with sufficient coverage is essential for capability building throughout the process.
However, building a desirable test-case library is very difficult. The two main challenges are as follows:
(1) Many factors affect SQL optimization. For example, hundreds of factors affect index selection, and the factors can form various combinations. As a result, we have an enormous feature set for test cases. It also requires significant effort to map these features to test cases.
(2) Professional knowledge and massive amounts of information are required for designing test cases, even for a single test case. For example, an index recommendation test case involves:
a）schema设计：如表、已有索引、约束等； b）各类统计信息数据； c）环境参数等等。
The SQL diagnosis and optimization service is expected to serve millions of database instances in the cloud. Its online service capabilities also face great challenges, such as service-oriented division of complex computing services, horizontal scaling of computing services, maximum parallelism, concurrency control in a distributed resource access environment, effective scheduling based on different priorities, and peak traffic buffering.
To address the aforementioned challenges, we built an SQL diagnosis and optimization engine and a capability test set for SQL diagnosis optimization. This section describes the core technical architecture of this engine, and how we built the test set.
Figure 1. Core technical architecture of the SQL diagnosis and optimization engine
Figure 1 shows the core technical architecture of the SQL diagnosis and optimization engine, which consists of optimizers independent of the database. This engine adaptively collects statistics and calculates the costs of the execution plans, to address the lack of the WHAT-IF kernel capability and statistical information. The engine works in the following process:
As mentioned earlier, our objective is to build a capability-based test set with sufficient coverage. We would use it as a ruler to measure the capabilities, and a driving force to accelerate the capability building process. In this process, we have built a development model centered on the test case system, as shown in Figure 2.
Figure 2. Test case system
The basic idea for building the capability-based test set is to formalize the feature-based description of test cases through feature extraction, to build a sufficiently complete, formal feature-based test-case library.
Within Alibaba Group, we have been collecting and storing all SQL statements executed in Alibaba's internal database instances in real-time. Based on Alibaba's diversified services and SQL scenarios and formalized feature descriptions, we have analyzed all our online SQL resources and searched for real cases that match the specific features, extract the required information, and finally build a test case library. Note that all data in the test case library was collected from Alibaba Group's internal business systems. All online data such as statistical information has been encrypted and desensitized. This process is fully automated.
Finally, we compare the formal feature-based test-case library with features of the actual test-case library to evaluate the test completeness and coverage. This step answers the following questions:
1) Which formal feature-based test cases are covered by the actual test-case library? What is the percentage?
2) Which formal feature-based test cases are not covered by the current diagnosis and optimization capabilities? Which test cases fail the verification?
3) Which formal feature-based test cases encounter frequent regression problems in a specific period of time?
4) What is the test case coverage of each capability?
Before the official release of DAS, the DAS SQL diagnosis and optimization service had been running stably in Alibaba Group for more than three years, diagnosing about 50,000 SQL statements on average every day. It effectively optimizes SQL statements for Alibaba Group's business applications and mainly applies to the following scenarios:
1) Self-service optimization: The service diagnoses exceptions reported by users and provides expert optimization suggestions.
2) Auto-optimization: The service automatically identifies slow queries in the workload of business database instances, proactively diagnoses these queries, and generates optimization suggestions. After a comprehensive evaluation, it schedules optimization tasks. Then it implements the optimization task and tracks the effectiveness of the solution. This is a fully automated, closed-loop optimization process that improves database performance and continuously keeps the database instances in the optimal state.
Over the past three years, the SQL diagnosis accuracy remained above 98%, and more than 75% of the diagnosed slow queries were provided with optimization suggestions.
By the end of March 2020, more than 42 million slow SQL queries had been optimized through auto-SQL optimization, reducing Alibaba Group's slow SQL queries by about 92%.
More importantly, the SQL diagnosis and optimization service has built an effective and proactive analysis and feedback system. Online diagnosis failures, user-reported cases, and cases rolled back during auto-optimization have been automatically sent to the case system to drive rapid iteration and evolution of the service.
We will post a series of articles on a weekly basis to introduce features of Alibaba Cloud DAS in detail, such as AutoScale, exception detection, auto-SQL optimization, workload-based SQL review, intelligent stress testing, and intelligent parameter tuning. Please stay tuned.
SQL diagnosis and optimization is free if you have an Alibaba Cloud DAS instance. Visit the Database Autonomy Service page to apply for a free trial of DAS.
Alibaba Clouder - December 21, 2020
ApsaraDB - July 8, 2020
Alibaba Cloud New Products - June 1, 2020
ApsaraDB - October 21, 2020
ApsaraDB - June 29, 2020
Alibaba Cloud New Products - June 1, 2020
Fully managed and less trouble database servicesLearn More
ET Brain is Alibaba Cloud’s ultra-intelligent AI Platform for solving complex business and social problemsLearn More
This solution provides you with Artificial Intelligence services and allows you to build AI-powered, human-like, conversational, multilingual chatbots over omnichannel to quickly respond to your customers 24/7.Learn More
Leverage cloud-native database solutions dedicated for FinTech.Learn More
More Posts by Alibaba Cloud New Products