Database Autonomy Service (DAS) provides the abnormal SQL request identification feature. This feature helps detect abnormal SQL requests from a large number of SQL requests. This helps you identify problems in a more efficient manner. This topic describes the scenarios for which the abnormal SQL request identification feature is suitable and how to configure this feature.

Prerequisites

  • The database instance that is connected to DAS is an ApsaraDB RDS for MySQL instance. The instance runs MySQL 5.6, 5.7, or 8.0.
  • DAS Professional Edition is activated for the instance. For more information, see DAS Professional Edition.

Background information

When your workloads increase, more SQL requests are generated. As a result, the system must monitor more database metrics. Some SQL requests have similar characteristics. Therefore, the full SQL request analysis feature cannot detect each abnormal SQL statement. DAS provides the abnormal SQL request identification feature to aggregate and classify SQL statements based on machine learning methods. This helps you locate abnormal SQL statements in a faster, more efficient, and more accurate manner.

Scenarios

Scenario: A large number of SQL templates are used for your business requests. You cannot locate abnormal SQL requests in a short period of time after they are initiated regardless of how you sort the SQL templates. The queries per second (QPS) diagrams in the red box and the blue boxes illustrate two types of abnormal SQL requests.

Solutions: Use the abnormal SQL request identification feature to identify performance issues. You can use one of the following solutions:

  • Solution 1: If your database instance receives a large number of requests, you can use DAS to classify SQL templates based on SQL behavior. For example, if your business generates hundreds of pages of SQL templates, you cannot locate abnormal SQL templates in a quick manner by sorting all the templates. You can use the abnormal SQL request identification feature. The abnormal SQL request identification feature uses the backend algorithm of DAS to identify similar SQL behavior profiles. The abnormal SQL request identification feature aggregates and classifies a large number of SQL templates to help you identify problems in a more efficient manner.
  • Solution 2: Use the abnormal SQL request identification feature to troubleshoot issues related to metric exceptions based on the periods during which the exceptions occurred. The trend of an abnormal metric. In this figure, the time during which the CPU utilization and active sessions reach the peak values coincides with the time when the occurrences of the SQL templates of the current category abruptly decrease. This indicates that SQL requests that use SQL statements of this SQL template category are blocked when active sessions spike.

Procedure

  1. Log on to the DAS console.
  2. In the left-side navigation pane, click Instance Monitoring.
    Instance Monitoring page
  3. On the Instance Monitoring page, find the instance that you want to manage and click the instance ID in the Instance column. In the left-side navigation pane of the instance details page, click Autonomy Center. Specify a time range and click Search. Then, you can select Exceptional Events, Optimization Events, Auto-Scaling Events, or Others to view the metric values in the specified time range.
  4. The IOPS Usage, CPU Utilization, and Active Sessions metrics. Click the yellow line that is marked with the number 1 and check the results of time series exception detection in the lower part of the page.
    Note Each row in the area that is marked with the number 2 in the preceding figure shows the following information:
    • The first row shows the severity of the time series metric exception. The color varies based on the severity of the exception. Red, yellow, and blue indicate the severity in descending order. Red indicates the highest severity and blue indicates the lowest severity.
    • The second row shows the status of time series exception detection and slow query optimization events that are based on the results of time series exception detection.
  5. Click the Exception Detection area. In the Exception Detection list that appears, click the Anomaly Detection tab and view the causes in the Anomaly Analysis drop-down list. In the Exceptional Metric section, select a metric category and view how the time is distributed for the abnormal metrics of this category.
    Note In the preceding figure, the red anchor points represent the most severe exceptions. Red, yellow, and blue indicate the severity in descending order. Red indicates the highest severity and blue indicates the lowest severity.
  6. In the Anomaly Analysis section, click View Related SQL Statements in the Related Metrics column next to an exception cause. On the SQL Request Behavior Analysis page, select a metric and an associated SQL metric and click Analysis.
    Note The backend system of DAS analyzes the top 1,000 SQL templates. SQL requests that are generated based on these templates have consumed the largest amount of time. The analysis feature analyzes the full SQL data of the ApsaraDB RDS for MySQL instance. DAS Professional Edition retrieves the full SQL data of the instance. A large amount of time is required to retrieve full SQL data. The computing time varies based on the data size. In most cases, the system takes approximately 1 to 5 minutes to analyze the data.
  7. View the SQL Request Behavior Analysis chart and the analysis table.
    Note The first row shows that the value of cpu_usage Correlation is close to 1.0. This indicates that the category to which this SQL template belongs is most correlated with cpu_usage and that this SQL metric is most correlated with the exception metrics.

    The following table describes the parameters of SQL metrics.

    Parameter Description
    Aggregate Images Metrics in each aggregate image are aggregated.
    cpu_usage Correlation An SQL template category whose value of cpu_usage Correlation is closer to 1.00 is more correlated with the exception metric.
    SQL Templates The number of SQL templates of the current category. For more information about SQL templates, see Usage notes.
    Average Time Consumed The amount of time that is consumed to execute the SQL statements during the period of the exception. These SQL statements are generated based on the SQL templates of the current category. Unit: seconds.
    Execution Percentage The percentage of the number of times for which the SQL statements are executed in the total number of SQL executions. These SQL statements are generated based on the SQL templates of the current category. The value is displayed as a decimal number.
    Scanned Rows A value in the Scanned Rows column indicates the total number of database table rows that are scanned by SQL requests. These SQL requests use the SQL templates of the current category.
    Duration Percentage The percentage of the execution time of the SQL statements that use the SQL templates of the current category in relation to the total SQL execution time. The percentage is displayed as a decimal number.
  8. Click View to obtain the correlation results of each SQL template.

    The SQL template represented by a yellow line is closely correlated with mysql.cpu_usage, mysql.innodb_rows_read, and mysql.active_session. The SQL requests that are generated based on this SQL template are the main causes for the spike in CPU utilization.

Note

The abnormal SQL request feature provided by DAS aggregates and classifies a large number of SQL templates. Similar SQL templates have similar occurrence trends.

  • If SQL requests of a category spike for a short period of time but the spike time is different from the metric spike time, the templates of this category are not most correlated with the exception metric.
  • The point in time at which the CPU utilization and the number of active sessions reach the peak values coincides with the point in time at which the requests that use the SQL templates of the category suddenly decreases. This indicates that the requests that use the SQL templates of this category were blocked when the number of active sessions spiked.