This topic describes how to use Database Autonomy Service (DAS) to prepare for, run, and review a big sales promotion.

Background information

DAS is a cloud service that uses machine learning and expert experience to automate perception, healing, optimization, Operations and Maintenance (O&M), and security for databases. It simplifies database management and eliminates service failures that are caused by manual operations. This allows you to ensure the stability, security, and efficiency of your database service.

Background introduction

Since 2009, the business peaks and system peaks during Alibaba Double 11 have risen every year. As the digital transformation of the retail industry and other traditional industries speeds up, more enterprises and platforms are expected to launch big sales promotions. Big sales promotions include shopping events such as Double 11, 618, and Juhuasuan, as well as similar high-concurrency scenarios, such as online collaboration among hundreds of millions of users and online classes attended concurrently by tens of millions of students.

Even outside these high-concurrency scenarios, the personnel and systems of many enterprises cannot respond in a timely manner. This happens when there are too few DBAs to handle a large number of basic issues and optimization tasks, and it leads to business issues. During a big sales promotion, heavy access traffic, connection storms, and complex queries place extremely high demands on database services.

DAS supports database engines such as MySQL, PostgreSQL, PolarDB, Redis, and MongoDB. DAS was incubated and refined in Alibaba business scenarios, where it developed powerful database autonomy capabilities. DAS has helped users automatically optimize more than 42 million SQL statements, automatically reclaim 4 PB of space, and automatically optimize 27 TB of memory.

DAS application in big sales promotion scenarios

During a promotion, database services play an extremely important role. In actual business scenarios, many enterprises use multiple databases at the same time, which makes management difficult. Big sales promotion scenarios therefore require a centralized, enterprise-level database management platform. This platform is expected to efficiently detect, locate, and diagnose issues, and to perform batch management and phased control from a global perspective. From the application perspective, DAS automatically synchronizes tags and performs specific operations for specific clusters and groups. From the instance perspective, DAS performs direct operations on different instances, such as MySQL instances.

A big sales promotion is a great challenge for a database in various aspects such as performance and O&M. Before the promotion, you must make full preparations. For example, you must make a comprehensive survey on the database and make an emergency plan for the big sales promotion. During the promotion, you must monitor the entire business to respond to risks and handle issues in a timely manner. After the promotion, you must review and summarize the entire process. In all these scenarios, DAS can provide required solutions.
  • Before a promotion, DAS can help detect issues, assess capacities, eliminate risks, optimize space, and troubleshoot exceptions.
  • During the promotion, you can check whether the database is properly running during peak hours on the real-time performance dashboard of DAS. This helps you detect exceptions in time and handle emergencies.
  • After the promotion, DAS provides diagnostic reports that summarize and present database performance throughout the promotion. These reports help enterprises review the entire big sales promotion, accumulate experience, and prepare for the next one.

Before the big sales promotion

Use DAS through an ApsaraDB RDS database.

In the left-side navigation pane, choose each feature in Database Autonomy Service (CloudDBA).

(Recommended) Directly use the enterprise-level database service DAS.

  1. Log on to the DAS console.
  2. Perform the following operations:
    Inspection and Scoring
    1. In the left-side navigation pane, click Inspection and Scoring.
      Note: Inspection and scoring help DBAs detect existing database issues. You can query information such as resource utilization, top slow SQL statements, SQL optimization suggestions, and database and table optimization information.
    2. Click Report to view instance details.
      Note: For more information about the step, see Top slow SQL statements.
    Intelligent Stress Testing
    1. In the left-side navigation pane, click Intelligent Stress Testing. On the page that appears, create a stress testing task and generate a stress testing report.
      Note: Intelligent Stress Testing allows you to assess the database performance, such as the database capacity and compatibility in actual business scenarios. It can automatically obtain the actual source traffic, allows you to increase stress testing traffic as needed, and supports maximal stress testing. It can also automatically create destination data snapshots and directly generate stress testing reports. For more information about the step, see Intelligent Stress Testing.
    SQL optimization
    1. In the left-side navigation pane, choose Request Analysis > Full Request to identify SQL statements.
    2. Click Optimize.
    3. View optimization suggestions.
    Space optimization
    1. In the left-side navigation pane, click Storage Analysis.
      Note: You can predict the number of days for which storage space remains available, identify large tables, and identify and optimize tablespace fragmentation.
    Exception detection and diagnostics
    In the autonomy center, you can query events that occur within a specific time range. Such events include exception events, optimization events, and auto scaling events. DAS detects exceptions 24/7 for core metrics of a database. DAS checks sessions, SQL statements, and the capacity of the database to identify the cause. Then, DAS suggests operations for optimization or loss mitigation. You can authorize DAS to automatically perform these operations. For more information, see Autonomy center.
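
The Storage Analysis note above mentions predicting the number of days for which space remains available. This topic does not describe how DAS makes that prediction, so the following is only a minimal sketch that assumes simple linear growth between the first and last daily readings. The function name `days_remaining` and its parameters are hypothetical and are not part of DAS.

```python
from typing import Optional, Sequence


def days_remaining(used_gb_by_day: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until storage is full from daily usage readings (oldest first).

    Assumption: usage grows linearly at the average daily rate observed
    between the first and the last reading.
    Returns None when usage is flat or shrinking, because no fill date exists.
    """
    if len(used_gb_by_day) < 2:
        raise ValueError("at least two daily readings are required")
    growth_per_day = (used_gb_by_day[-1] - used_gb_by_day[0]) / (len(used_gb_by_day) - 1)
    if growth_per_day <= 0:
        return None
    free_gb = max(capacity_gb - used_gb_by_day[-1], 0.0)
    return free_gb / growth_per_day
```

Before a promotion, an estimate like this is more useful when it is computed with the growth rate expected during the promotion rather than the everyday rate.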

During the big sales promotion

  • Monitoring the screen: Check whether databases are normal in real time

    On the day of the big sales promotion, the most important task is to monitor the screen. This means that you must continuously check whether databases are running properly, especially during peak hours. DAS provides multiple screen monitoring methods, such as real-time performance dashboards and global custom monitoring dashboards. This section describes how to use real-time performance dashboards to monitor database operations.

    DAS provides global real-time performance dashboards for you to view the real-time performance of all the accessed instances. If your business is in a big sales promotion or is experiencing major changes, you can use this feature to determine the database health in real time.

    1. Log on to the DAS console.
    2. In the left-side navigation pane, click Real-time Monitoring Dashboard.
      Note: The Real-time Monitoring page displays real-time monitoring data of all the database instances in the Accessed state. The page is automatically refreshed.
    3. In the upper-right corner, click Definition to view the meaning of each metric.
      Note: If an issue occurs in an instance, click the instance ID to go to the instance details page.
  • View abnormal SQL statements and abnormal sessions

    Databases are the core dependency of most applications. To alleviate pressure on databases, you can perform throttling on the application side. However, in the following scenarios, you must configure throttling on the database side.

    | Scenario | Description |
    | --- | --- |
    | Sharp rise in concurrent SQL statements | Cache penetration or abnormal calls may result in an unexpected rise in concurrent SQL statements. |
    | SQL statements that result in data skew | If a large amount of data is queried during a big sales promotion, the overall system is affected. |
    | Tables that are not indexed | If a large number of SQL statements are executed on new database tables that are not indexed, the system slows down and normal services are affected. |

    1. Log on to the DAS console.
    2. In the left-side navigation pane, click Instance Monitoring.
    3. Click an instance ID. On the instance details page, click Instance Sessions in the left-side navigation pane.
    4. On the Instance Sessions page, view detailed information in the Instance Sessions and Session Statistics sections of the instance.
      Note: You can click a related session and view SQL analysis information.
  • Emergency measure: SQL throttling

    On the day of the big sales promotion, traffic-related exceptions may occur in a database. The exceptions include cache penetration, abnormal resource consumption of SQL statements in a released application, or a spike in the traffic from a specific application. In these scenarios, the database starts to terminate sessions. To restore the stability of the database, you may use common database O&M solutions, such as restarting the database instance or performing a primary/secondary switchover. However, these solutions may not work. The SQL throttling feature of DAS can automatically detect exceptions, and then identify and throttle the problematic SQL statements.

    • Specify SQL throttling rules

      After an SQL throttling rule is created, a client receives error 1317 if the SQL statement that it submits contains all the specified keywords. The error message is "query execution was interrupted".

      1. Log on to the DAS console.
      2. In the left-side navigation pane, click Instance Monitoring.
      3. Click an instance ID. On the instance details page, click Instance Sessions in the left-side navigation pane.
      4. On the Instance Sessions page, click SQL Throttling.
      5. In the SQL Throttling dialog box, click Create.
      6. Enter the required information and click Create.
        | Parameter | Description |
        | --- | --- |
        | SQL Type | The type of the SQL statement. Valid values: SELECT, UPDATE, and DELETE. |
        | Max Concurrency | The maximum number of concurrent SQL statements. The throttling rule is triggered when the number of SQL statements that contain a specific keyword reaches the value that is specified for this parameter. If this parameter is set to 0, all the requests that contain the keyword are denied. |
        | Throttling Duration | The duration within which the SQL throttling rule takes effect. SQL throttling is used in emergencies. We recommend that you specify the throttling duration based on your actual requirements and disable SQL throttling when it is no longer needed. |
        | SQL Keywords | The SQL keyword that requires throttling. If you specify multiple keywords, SQL throttling is triggered only when the SQL statements contain all the specified keywords. Separate multiple keywords with tildes (~). |
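
The rule semantics described above (statement type, all keywords present, a concurrency limit, and error 1317 for rejected statements) can be sketched as follows. This is an illustrative model only, not the DAS implementation. The names `ThrottlingRule`, `admit`, and `QueryInterrupted` are hypothetical, and case-insensitive substring matching is an assumption that this topic does not confirm.

```python
from dataclasses import dataclass

ER_QUERY_INTERRUPTED = 1317  # error code that the text says the client receives


class QueryInterrupted(Exception):
    """Hypothetical stand-in for the error returned to a throttled client."""

    def __init__(self) -> None:
        super().__init__(ER_QUERY_INTERRUPTED, "query execution was interrupted")
        self.errno = ER_QUERY_INTERRUPTED


@dataclass(frozen=True)
class ThrottlingRule:
    sql_type: str         # SELECT, UPDATE, or DELETE
    max_concurrency: int  # 0 denies every matching statement
    keywords: str         # multiple keywords separated by tildes (~)

    def matches(self, sql: str) -> bool:
        """Return True if the statement has the rule's type and contains ALL keywords."""
        text = sql.strip().upper()
        if not text.startswith(self.sql_type.upper()):
            return False
        # Assumption: keywords are matched as case-insensitive substrings.
        parts = [k.strip().upper() for k in self.keywords.split("~") if k.strip()]
        return all(part in text for part in parts)


def admit(rule: ThrottlingRule, sql: str, running_matches: int) -> None:
    """Raise QueryInterrupted when the number of matching statements already
    running has reached the rule's Max Concurrency value."""
    if rule.matches(sql) and running_matches >= rule.max_concurrency:
        raise QueryInterrupted()
```

For example, a rule with SQL type SELECT, keywords `orders~user_id`, and a max concurrency of 2 leaves `SELECT * FROM orders WHERE id = 1` untouched because only one of the two keywords is present. On the application side, it is usually worth treating error 1317 as a signal to back off instead of retrying immediately.
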
    • View SQL throttling suggestions
      1. Log on to the DAS console.
      2. In the left-side navigation pane, click Instance Monitoring.
      3. Click an instance ID. On the instance details page, click Autonomy Center in the left-side navigation pane.
      4. View SQL throttling suggestions for the exception duration.
      5. Click Throttling.
    • Enable automatic SQL throttling

      After automatic SQL throttling is enabled, a client receives error 1317 if the SQL statement that it submits meets the trigger conditions. The error message is "query execution was interrupted".

      1. Log on to the DAS console.
      2. In the left-side navigation pane, click Instance Monitoring.
      3. Click an instance ID. On the instance details page, click Autonomy Center in the left-side navigation pane.
      4. In the upper-right corner of the page, click Autonomous function switch.
      5. In the Set dialog box, turn on the Enable Autonomy Service switch. Then, turn on the Automatic Throttling switch, and specify the conditions for triggering automatic throttling.
        Note: Assume that you specify the following conditions, all of which must be met at the same time: the CPU utilization is greater than 80%, the number of active sessions is greater than 64, and these conditions last for more than 5 minutes. If the conditions are met within a specified throttling window, automatic SQL throttling is triggered, and the system then checks again whether the conditions are met. If the issue persists, the system rolls back the throttling operation. After automatic SQL throttling is triggered, the throttling duration does not exceed the maximum throttling duration.
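
The example trigger conditions in the note above (CPU utilization greater than 80%, more than 64 active sessions, lasting more than 5 minutes) can be sketched as a check over a series of metric samples. This is a simplified illustration, not the DAS detection logic. The `Sample` type and the `should_throttle` function are hypothetical, samples are assumed to be sorted by time, and the throttling window and rollback behavior are not modeled.

```python
from typing import Iterable, NamedTuple, Optional


class Sample(NamedTuple):
    ts: float             # sample time in seconds
    cpu_percent: float    # CPU utilization, 0 to 100
    active_sessions: int  # number of active sessions


def should_throttle(
    samples: Iterable[Sample],
    cpu_threshold: float = 80.0,
    session_threshold: int = 64,
    min_duration_s: float = 300.0,
) -> bool:
    """Return True if both thresholds are exceeded continuously for longer
    than min_duration_s. Any sample below a threshold resets the timer."""
    breach_start: Optional[float] = None
    for s in samples:
        breaching = s.cpu_percent > cpu_threshold and s.active_sessions > session_threshold
        if not breaching:
            breach_start = None
            continue
        if breach_start is None:
            breach_start = s.ts
        if s.ts - breach_start > min_duration_s:
            return True
    return False
```

Requiring both thresholds to hold for the full duration avoids throttling on a short spike, which is the reason the note combines a utilization condition with a time condition.
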
  • Focus on high-risk SQL statements and SQL injection

    On the day of the big sales promotion, you must check whether high-risk SQL statements and SQL injection exist.

    • High-risk SQL statements: DAS identifies the following three types of high-risk SQL statements based on preset rules:
      • Data definition language (DDL) statements such as the statements that are used to create a table, modify the schema of a table, modify an index, and rename a table.
      • Statements that are used to update or delete full tables.
      • Statements that process a large amount of data. By default, a statement belongs to this type if it meets one of the following conditions:
        • The number of scanned rows is at least 100,000.
        • The number of returned rows is at least 10,000.
        • The number of updated rows is at least 10,000.
    • SQL injection

      An SQL injection attack inserts malicious SQL statements into web forms, domain names, or URL requests to trick the server into executing them. SQL injection can destroy your database. DAS continuously monitors databases to identify SQL injection and its access sources.

      1. Log on to the DAS console.
      2. In the left-side navigation pane, click Instance Monitoring.
      3. Click an instance ID. On the instance details page, click Security Audit in the left-side navigation pane.
      4. Click Enable. In the dialog box that appears, click OK.
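
The three types of high-risk SQL statements listed earlier in this section, together with their default row thresholds, can be sketched as a small classifier. This is a rough illustration, not the preset rules that DAS applies. The function name `classify`, the returned labels, and the list of DDL prefixes are assumptions, and the check for a missing WHERE clause is a crude heuristic that does not parse SQL.

```python
import re
from typing import Optional

# Assumption: a small, non-exhaustive set of DDL statement prefixes.
DDL_PREFIXES = ("CREATE TABLE", "ALTER TABLE", "CREATE INDEX", "DROP INDEX", "RENAME TABLE")


def classify(
    sql: str,
    scanned_rows: int = 0,
    returned_rows: int = 0,
    updated_rows: int = 0,
) -> Optional[str]:
    """Return a hypothetical risk label for a statement, or None if no rule applies."""
    text = " ".join(sql.split()).upper()  # collapse whitespace, ignore case
    if text.startswith(DDL_PREFIXES):
        return "ddl"
    # Heuristic: an UPDATE or DELETE without a WHERE clause touches the full table.
    if re.match(r"(UPDATE|DELETE)\b", text) and not re.search(r"\bWHERE\b", text):
        return "full_table_write"
    # Default thresholds stated in the list above.
    if scanned_rows >= 100_000 or returned_rows >= 10_000 or updated_rows >= 10_000:
        return "large_request"
    return None
```

The row counts must come from execution statistics, which is why a statement such as `SELECT * FROM t` can be harmless on a small table and high-risk on a large one.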

After the big sales promotion

After the big sales promotion, the most important task is to summarize the exceptions that occurred during the promotion and the lessons learned about SQL capacity and SQL performance. This experience helps the next big sales promotion run more smoothly.

DAS provides diagnostic reports of instances to help with this review. You can view the diagnostic reports to understand how instances ran during the big sales promotion, including information such as the CPU utilization and the top five slow SQL statements.

  1. Log on to the DAS console.
  2. In the left-side navigation pane, click Instance Monitoring.
  3. Click an instance ID. On the instance details page, click Diagnostics in the left-side navigation pane.
  4. Click Create Reports to create a diagnostic report.
  5. Click the corresponding diagnostic report to view details of the report.

Summary

You can perform the preceding operations to complete database tasks, such as business preparation, screen monitoring, issue handling, SQL optimization, and final review and summary, throughout the big sales promotion. This helps enterprises better cope with the challenge of the big sales promotion.
