Slow SQL Governance

Introduction

Cainiao Supply Chain Finance has been governing slow SQL for some time, and the application I am responsible for has not triggered a slow SQL alert in a long time. Below are some practices from that governance process, shared for reference.

1. Full Table Scan

Case


SELECT count(*) AS tmp_count FROM (
    SELECT * FROM `XXX_rules` WHERE 1 = 1 ORDER BY gmt_create DESC ) a

1. Tracing the Source

Our system's legacy paging query logic was introduced in the article on paging query governance. The query above is clearly the count query of a paged request, fetching the total number of records. Tracing the call through the paging query interface of the XXX_rules table, we found the page that initiates it: a merchant-onboarding operations page. When the page is opened, it calls the paging query interface directly; apart from the paging parameters, no other query condition is passed in, so the whole table is scanned.

2. Analysis


Soul-searching question: why scan the whole table at all? The full table's data is displayed on the page, but is showing all of it actually useful?
Research: after chatting with the operations staff who use this page often, we learned that the full-table data loaded when the page opens is useless to them; they never look at it. Their habit is to obtain a merchant id, enter it in the page's search box, and operate on the merchant record once it is found.

3. Solution


The optimization is therefore clear: do not query the full data set when the page opens; instead, query only after the operator enters a merchant id, using that id as the parameter. In the XXX_rules table, the common query condition (the merchant id) is indexed, and combined with the paging-query optimization this eliminates the full-table-scan slow SQL.
The optimized back-office page is as follows:

When the page is opened, no data is queried, and the merchant-account query condition is required.
The optimized SQL is:
SELECT count(*) AS tmp_count FROM (
    SELECT * FROM `xxx_rules` WHERE 1 = 1 AND `rule_value` = '2928597xxx' ) a
Execute EXPLAIN to get the following results:

As can be seen, the index is hit, only 3 rows are scanned, and the query is significantly faster.
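The effect can be reproduced in miniature with SQLite. This is only a sketch, not the production MySQL setup: the schema and data are invented, and SQLite's plan wording differs from MySQL's EXPLAIN, but the scan-versus-index-search distinction is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_rules (id INTEGER PRIMARY KEY, rule_value TEXT, gmt_create TEXT)")
conn.executemany(
    "INSERT INTO xxx_rules (rule_value, gmt_create) VALUES (?, ?)",
    [(f"merchant_{i}", "2022-01-01") for i in range(1000)])

# Before: no usable filter, so the engine must visit every row.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM xxx_rules WHERE 1 = 1").fetchall()
print(plan_before[0][3])  # e.g. 'SCAN xxx_rules'

# After: index the common query condition and filter on it.
conn.execute("CREATE INDEX idx_rule_value ON xxx_rules (rule_value)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM xxx_rules WHERE rule_value = ?",
    ("merchant_42",)).fetchall()
print(plan_after[0][3])  # an index search instead of a scan
```

The same query shape, once a selective condition backed by an index is required, stops touching every row.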


4. Thoughts


Governing full-table scans boils down to adding query conditions, hitting indexes, and removing full-table-scan queries. That may sound blunt, but it is not unreasonable: in real business scenarios it is rarely necessary to scan an entire table for all its data. Requiring upstream callers to pass in query conditions, and making sure those conditions can hit an index, avoids most slow SQL.
Going a step further: XXX_rules was originally designed as an onboarding table, recording onboarding status at the financial-consignor level, with at most a few thousand rows. But many colleagues took it for a generic rule table and wrote all kinds of business rules into it, inflating it to more than a million rows and leaving it far from clean. This touches on data-table design and usage norms: standardize how a table is used and avoid writing data into it indiscriminately, which greatly eases later maintenance.

2. Chaotic Secondary Indexes

1. Analysis
Besides the time and operator fields, the XXX_rules table has four business fields: rule_name, rule_value, status, and product_code. The table's indexes are various permutations and combinations of these four fields, with the following problems:
1. rule_name has low selectivity and is not suitable as the leading column of an index;
2. The first three indexes overlap heavily.
Clearly, the index hit rules were not well understood when these indexes were created. Many jobs write to and delete from XXX_rules on schedules, and the many chaotic indexes have a significant impact on performance.
Let's review what makes a high-performance index:
1. Independent columns: an index column must not be part of an expression;
2. Choose highly selective columns as indexes;
3. Choose an appropriate column order: put the most selective column first;
4. Covering index: all queried columns are in the index, so no lookup into the clustered index is needed;
5. Use index scans for sorting;
6. Under the leftmost-prefix principle, prefer extending an existing index over creating a new one.
Had rules 3 and 6 been kept in mind, the indexes would not have been built this way.
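Rule 6's leftmost-prefix principle can be demonstrated with a small SQLite sketch (hypothetical schema and index name; SQLite's planner rather than MySQL's, but the leftmost-prefix behavior matches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE xxx_rules (
    product_code TEXT, rule_name TEXT, status INTEGER, rule_value TEXT)""")
conn.execute(
    "CREATE INDEX idx_product ON xxx_rules (product_code, rule_name, status)")

def plan(sql):
    # First row of EXPLAIN QUERY PLAN; column 3 holds the readable detail.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Leading column present: the composite index is usable.
p1 = plan("SELECT * FROM xxx_rules WHERE product_code = 'P1'")
# Leading two columns present: still usable, filters further.
p2 = plan("SELECT * FROM xxx_rules WHERE product_code = 'P1' AND rule_name = 'r'")
# Leading column missing: the index cannot be used, so the table is scanned.
p3 = plan("SELECT * FROM xxx_rules WHERE status = 1")
print(p1)
print(p2)
print(p3)
```

One composite index thus serves queries on its first column, its first two columns, and all three, which is why extending an index usually beats creating a new one.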
2. Governance
The indexes were consolidated as follows:

Many system tasks pull all onboarding records under an entire product and then process them, so the more selective product_code is placed first in the index, with rule_name and status appended to filter the data further and reduce the number of scanned rows, avoiding slow SQL. The commonly queried rule_value can hit the unique key (UK), so no separate index is needed for it.
3. Unnecessary Sorting

1. Problem Description
Many business flows pull a list of records matching some condition, and the query SQL carries an ORDER BY. When there are many records, sorting is expensive, yet whether the result list is ordered often has no effect on the business logic. For example, the count statement discussed in the paging-governance article only counts records, and ORDER BY has no effect on the count. Likewise, when the fetched list is simply traversed to process each record, with no dependence on record order, the ORDER BY is unnecessary.

2. Solution
If the query SQL has no LIMIT clause and the business logic does not depend on the order of the returned list, remove the ORDER BY from the query.
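A SQLite sketch (invented table and data) makes both halves of the argument visible: the ORDER BY adds a sort step to the plan, while the result count is unchanged with or without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxx_rules (id INTEGER PRIMARY KEY, gmt_create TEXT)")
conn.executemany("INSERT INTO xxx_rules (gmt_create) VALUES (?)",
                 [(f"2022-01-{i % 28 + 1:02d}",) for i in range(500)])

# The ORDER BY shows up as an extra sort step in the plan...
with_sort = [r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM xxx_rules ORDER BY gmt_create DESC")]
without_sort = [r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM xxx_rules")]
print(with_sort)      # includes a 'USE TEMP B-TREE FOR ORDER BY' step
print(without_sort)   # plain scan, no sort step

# ...while the count is identical either way, so for callers that only
# count, or that ignore order, the sort is wasted work.
n_sorted = conn.execute(
    "SELECT count(*) FROM (SELECT * FROM xxx_rules ORDER BY gmt_create DESC)"
).fetchone()[0]
n_plain = conn.execute("SELECT count(*) FROM xxx_rules").fetchone()[0]
print(n_sorted, n_plain)
```
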
4. Coarse-Grained Queries

1. Problem Description
The business has many scheduled tasks that scan all the data under a product in a table and process it, for example:

All three query conditions are columns with low selectivity, and the query returns about 270,000 rows, so adding an index would not help.

2. Analysis

The actual business volume is not that large, at most a few thousand rows; the table's data is synchronized from upstream. The best fix would be to have upstream slim down the data, but the business is so old that finding the upstream owners to make that change proved too difficult, so another approach was needed.
The scheduled task's purpose is to pull the data under certain products from XXX_rules, compare it with data in another table, and update the rows that differ. It runs in the early morning every day and has no strict timeliness requirement. So: can we move the processing elsewhere, instead of crunching that many rows in real time on the application machine?

3. Solution

The table's data is synchronized from offline odps tasks, so the first thing that comes to mind is the DataWorks data processing platform.
Create a data-comparison task: move the comparison logic the scheduled task used to do onto DataWorks and implement it in SQL. The daily diff is at most a few hundred rows, and the result set contains a highly selective column. Write the diff into an odps table, then flow the data back to idb.
The application then queries and updates XXX_rules using the highly selective column from the returned diff as the query condition, which resolves the slow SQL.
The precondition for this approach is that the timeliness requirement on the data is low and the offline result set is very small.
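The offline comparison itself is ordinary diff logic. Here is a hypothetical Python sketch of the idea only; the real job is SQL on DataWorks over odps tables, and the field names (rule_value as the selective column, status as the compared value) are invented for illustration.

```python
def diff_rules(rules_rows, upstream_rows):
    """Return only the rows whose status differs between the two tables."""
    upstream = {r["rule_value"]: r for r in upstream_rows}
    diffs = []
    for row in rules_rows:
        match = upstream.get(row["rule_value"])
        if match is not None and match["status"] != row["status"]:
            # Keep the selective key plus the new value; this small result
            # set flows back and drives indexed UPDATEs on XXX_rules.
            diffs.append({"rule_value": row["rule_value"],
                          "status": match["status"]})
    return diffs

rules = [{"rule_value": "m1", "status": 0}, {"rule_value": "m2", "status": 1}]
upstream = [{"rule_value": "m1", "status": 1}, {"rule_value": "m2", "status": 1}]
print(diff_rules(rules, upstream))  # [{'rule_value': 'm1', 'status': 1}]
```

The key property is that the online database only ever sees the few hundred diff rows, each addressed by an indexed column, instead of the 270k-row scan.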

5. OR Causes Index Failure
1. Case
SELECT count(*)
FROM XXX_level_report
WHERE 1 = 1
AND EXISTS (
    SELECT 1
    FROM XXX_white_list t
    WHERE (t.biz_id = customer_id
        OR customer_id LIKE CONCAT(t.biz_id, '@%'))
    AND t.status = 1
    AND (t.start_time <= CURRENT_TIME
        OR t.start_time IS NULL)
    AND (t.end_time >= CURRENT_TIME
        OR t.end_time IS NULL)
    AND t.biz_type = 'GOODS_CONTROL_BLACKLIST'
)
2. Analysis
Running EXPLAIN on the query above gives the following result:


The XXX_white_list table has an index on biz_id, and the query on XXX_white_list uses biz_id as a condition. Why, then, is the type in the EXPLAIN result ALL, i.e. a full table scan? Did the index fail? When does an index fail?
Index failure scenarios:
1. One side of an OR query has no index;
2. A composite index does not satisfy the leftmost-prefix principle;
3. LIKE starts with %;
4. An implicit type conversion is required;
5. The indexed column in WHERE appears in an arithmetic operation;
6. The indexed column in WHERE is wrapped in a function;
7. MySQL judges a full table scan to be faster (when there is little data).
In the query above, the condition `OR customer_id LIKE CONCAT(t.biz_id, '@%')` references customer_id, which is a field of the XXX_level_report table, not of XXX_white_list, so that OR branch cannot hit any XXX_white_list index (scenario 1), causing the index to fail.
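Scenarios 3 and 6 are easy to reproduce in a SQLite sketch (invented table; MySQL fails the same way in these cases, though its EXPLAIN output looks different):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like = ON")  # lets a prefix LIKE use the index
conn.execute("CREATE TABLE t (name TEXT, val INTEGER)")
conn.execute("CREATE INDEX idx_name ON t (name)")

def plan(sql):
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Prefix match: rewritten to an index range, index usable.
p_prefix = plan("SELECT * FROM t WHERE name LIKE 'abc%'")
# Leading %: no usable prefix, full scan (scenario 3).
p_suffix = plan("SELECT * FROM t WHERE name LIKE '%abc'")
# Function wrapping the indexed column: index unusable (scenario 6).
p_func = plan("SELECT * FROM t WHERE upper(name) = 'ABC'")
print(p_prefix)
print(p_suffix)
print(p_func)
```
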
3. Solution
This statement is assembled from a tangle of conditions, enums, and joins, so it was troublesome to change, and the clause `OR customer_id LIKE CONCAT(t.biz_id, '@%')` could not simply be deleted. In the end this part of the query was refactored to remove the OR, which resolved the slow SQL.
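One common refactoring for this shape of problem, shown here as a SQLite sketch with an invented whitelist table rather than the team's actual fix, is to split the OR into a UNION so that the indexable branch can use the index again:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxx_white_list (biz_id TEXT, biz_type TEXT, status INTEGER)")
conn.execute("CREATE INDEX idx_biz_id ON xxx_white_list (biz_id)")
conn.executemany(
    "INSERT INTO xxx_white_list VALUES (?, 'GOODS_CONTROL_BLACKLIST', 1)",
    [(f"cust_{i}",) for i in range(1000)])

def plan(sql, args):
    return [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, args)]

# The OR mixes an indexable equality with a predicate built from an
# expression on the column, so the whole WHERE degrades to a full scan.
or_plan = plan(
    "SELECT * FROM xxx_white_list WHERE biz_id = ? OR ? LIKE (biz_id || '@%')",
    ("cust_1", "cust_1@shop"))
print(or_plan)

# Rewritten as a UNION, the equality branch can use idx_biz_id again;
# only the expression branch still scans.
union_plan = plan(
    "SELECT * FROM xxx_white_list WHERE biz_id = ? "
    "UNION SELECT * FROM xxx_white_list WHERE ? LIKE (biz_id || '@%')",
    ("cust_1", "cust_1@shop"))
print(union_plan)
```

Whether this rewrite is a net win depends on the data; the point is only that removing the OR restores the optimizer's ability to use the index for the branch that can.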
