All Products
Search
Document Center

Database Autonomy Service:Automatic SQL throttling

Last Updated:Mar 05, 2024

SQL throttling limits the concurrency of SQL statements that are executed on a database. This feature limits the concurrency of abnormal SQL statements and ensures that the database responds to business requests in a normal manner and most of the services run as expected. This way, you can trade off a small number of compromised services for the proper running of most of the services.

Background information

Due to the growth of technology, especially the wide adoption of cloud databases, database systems become increasingly stable and easier to maintain. Some O&M tasks such as version upgrades and instance migration can be automatically completed and the upper-layer services are nearly not affected. If hardware device or network failures occur, the inspection system can migrate services and restart the database system in a quick manner to ensure service stability. However, nearly all of the existing O&M methods are used to ensure the stability of servers. Issues that are caused by abnormal operations in the business must be manually handled. For example, slow SQL statements are introduced during business changes, or a traffic surge occurs. When these business exceptions occur, none of the preceding O&M methods can be used to handle these exceptions in a quick manner to prevent system failures.

Problems

  • Traffic: Traffic surges affect the normal services of databases. For example, cache penetration, abnormal calls, and promotions cause a sharp increase in the concurrency of SQL statements.

  • Data: Data skew caused by uneven distribution of SQL statements affects the normal services of databases. For example, a large number of SQL statements are submitted from a user who has placed a large order. These SQL statements consume a large number of database resources and result in slow database responses.

  • SQL statements: Resource-intensive SQL statements affect the normal services of databases. For example, a large number of SQL statements are submitted to query tables that are not indexed. In this case, a large number of resources are occupied and the entire system is affected.

FAQ

  • How can I immediately identify an exception when it occurs?

  • How can I identify SQL statements on which throttling must be implemented after the exception is identified?

  • How can I extract the keywords of SQL statements on which throttling must be implemented to restore services and minimize the negative effect on services?

  • How can I determine whether the throttling operation is performed as expected after throttling is implemented?

You may encounter other issues in actual situations. For example, you may fail to contact the personnel on duty. The personnel may not have available computers, a large amount of information results in significant difficulties in analysis, or improper operations may be performed because of high pressure and anxiety.

To handle these issues, make sure that the entire process is automated if possible. The entire process covers exception discovery, locating abnormal SQL statements, SQL throttling, tracking, and rollbacks.

Note

The solution of automatic SQL throttling was released to respond to this need. This solution has been applied within Alibaba Group for more than two years and was launched on Alibaba Cloud in February 2020. You can use this solution in Database Autonomy Service (DAS).

Interpretation

Entire process:Entire process

  • Collection of monitoring metrics: By default, host and engine performance metrics are collected for ApsaraDB RDS instances that are deployed on Alibaba Cloud. The performance metrics include CPUs, input/output operations per second (IOPS), queries per second (QPS), and active sessions. The real-time performance data is the basis for subsequent analysis and actions.

  • Anomaly detection: DAS uses machine learning algorithms to train a model offline by using historical performance data of database instances, and then uses this model to detect exceptions based on real-time metrics. Compared with threshold-based detection, the anomaly detection feature provides real-time notifications when system events trigger alerts. For more information about this feature, see the subsequent topics.

  • Root cause analysis: DAS subscribes to each exception of database instances and collects the information about current sessions if an exception occurs. Then, DAS uses the full request analysis feature in the SQL audit and the statistics in performance_schema to identify the root cause of the exception. DAS divides abnormal SQL statements into the following four types based on the root causes:

    • Blocking SQL statements: DAS analyzes sessions, locks, and running transactions in real time to detect SQL statements that cause DDL changes, lock waits, and large transactions. DAS also checks the number of sessions that are affected by the SQL statements and the execution time of the SQL statements. If a large number of sessions are affected or the execution time of SQL statements is excessively long, DAS closes the sessions instead of implementing throttling on the SQL statements.

    • Resource-intensive SQL statements: The number of concurrent SQL statements of this type may not be large. However, the SQL statements of this type consume a large number of CPU, I/O, or network resources, and are continuously submitted.

    • Traffic-intensive SQL statements: These SQL statements are normal statements. However, when a large number of such SQL statements are executed at the same time, they consume a large number of resources in databases. This causes performance bottlenecks and exceptions, such as slow responses to simple key-value queries.

    • Other SQL statements: The SQL statements of this type are the other SQL statements that cause database exceptions.

  • Automatic throttling: When DAS detects resource-intensive and traffic-intensive SQL statements, DAS automatically extracts the features of these SQL statements. After you grant required permissions to DAS, DAS automatically implements throttling on all the SQL statements that have these features. During this process, the greatest challenge is to extract accurate features of SQL statements for precise throttling to prevent the entire services from being compromised.

  • Feature extraction: Each time DAS detects abnormal SQL statements on which throttling must be implemented, DAS extracts the features of the SQL statements for precise throttling. In ideal scenarios, the extracted features are unique and throttling is implemented on only the identified abnormal SQL statements. DAS supports two types of SQL throttling:

    • SQL template-based throttling: An SQL template is an SQL statement where the parameters do not have specific values. If a large number of SQL statements that use a specific SQL template are executed at the same time, exceptions may occur, regardless of parameter values. Therefore, you need only to extract the features of the SQL template.

    • SQL statement-based throttling: This type of throttling is applicable to scenarios in which data skew is detected. If some SQL statements that use a specific SQL template cause exceptions, you must extract features from the SQL template and the SQL statements that contain specific parameter values.

    In SQL template-based throttling, DAS extracts the problematic template ID from SQL statements and implements throttling on the statements that have the template ID. The template ID may be an SQL ID that is automatically generated by the database middleware or an SQL hint that is added by developers to the SQL template.

    This allows for precise throttling and ensures that SQL statements that use other templates are not affected. However, identical SQL statements that do not contain the template ID, such as those submitted in the command line, are not affected by the throttling. These statements can still be executed.

    Therefore, if SQL statements do not contain the template ID, you must extract features from SQL statements. SQL templates are generated based on computing in the analysis process. In the following example, SQL Template 1 is generated based on SQL Statement 1, and SQL Template 2 is generated based on SQL Statement 2. Assume that you use SQL template 1 for SQL throttling. The features of SQL template 1 are select id, name, age from students where name.

    /* SQL Statement 1 */
    select id,name,age from students where name='Bob';
    
    /* SQL Template 1 */
    select id,name,age from students where name=?
    
    /* SQL Statement 2 */
    select id,name,age from students where name='Bob' and sid='Unique ID';
    
    /* SQL Template 2 */
    select id,name,age from students where name=? and sid=?

    If these features are used for throttling, throttling is implemented on the SQL statements that have these features, regardless of the submission methods. The drawback is that throttling may be implemented on some SQL statements in an unexpected way. For example, SQL template 2 has all the features of SQL template 1. Therefore, DAS also implements throttling on SQL statement 2 based on SQL template 1. This may fail to meet your expectation.

  • Automatic optimization: After DAS detects an abnormal SQL statement that can be optimized during root cause analysis, DAS implements throttling as an emergency response. DAS also automatically optimizes the SQL statement. DAS automatically creates indexes for the relevant tables for optimization. For more information, see the subsequent topics.

  • Tracking and rollback: After DAS starts automatic SQL throttling, DAS keeps tracking the performance of a database. If the workload of the database is not decreased or if the traffic is not reduced as much as expected, DAS rolls back the throttling operation. Then, DAS identifies the root cause again.