Adolph
Engineer
Engineer
  • UID623
  • Fans2
  • Follows1
  • Posts72
Reads:928Replies:0

Solution of high CPU usage of Alibaba Cloud RDS SQL Server - non-SARG queries

Created#
More Posted time:May 4, 2017 9:21 AM
Abstract
The most-encountered issue for Alibaba Cloud RDS SQL Server customers is high CPU usage which leads to slow responses of SQL Server services and query timeout, and even service suspension. This article, as the fourth one in this series, analyzes the solution to high CPU usage because of non-SARG queries.
Introduction of issues
"New guy, have you heard of the RDBMS non-SARG query? I heard of it for the first time today," asked Laruence confusedly.
"Haha, Laruence. You are out of the loop. It's another important cause of high CPU usage in RDBMS databases. Let me explain what it is," New guy gets excited.
Analysis of scenarios
SARG is the abbreviation for Search Argument. Non-SARG, in simple words, means that the query predicate (especially the WHERE clause or the ON clause for connection operations) causes the index to be invalid. The query optimizer is not able to use the efficient Index Seek operation and has to switch to Index Scan operation with a lower efficiency, resulting in increased CPU usage and reduced query performance. Common non-SARG operations include the use of scalar functions in the WHERE statement such as Datediff, Dateadd, Year, Rtrim, Upper, Lower, and LIKE full fuzzy matching (in a format such as LIKE '% XXX%') or user-defined functions.
Solution
The core idea of solving non-SARG queries is to avoid the use of scalar functions in WHERE statements by means of equivalent rewriting of the query logic or optimization of functional designs. We will only illustrate the solution with the following common non-SARG operations as examples and other operations can follow suit.
DATEDIFF
For example, we need to query the order information records for the day. The non-SARG statement is as follows (the Datediff function is used in the WHERE statement):
USE TestDb
GO

CREATE INDEX IX_OrderDate
ON dbo.SalesOrder([OrderDate])
WITH (FILLFACTOR = 90);
GO

SET STATISTICS TIME ON
SET STATISTICS IO ON

-- find out sales info of today.
SELECT
    ItemID, OrderQty, Price,OrderDate
FROM dbo.SalesOrder
WHERE DATEDIFF(DAY, OrderDate, GETDATE()) = 0

SET STATISTICS TIME OFF
SET STATISTICS IO OFF


From the performance index, the I/O consumption is 8406, CPU consumption is 171 milliseconds, the execution time is 211 milliseconds, and the screenshot is shown below:


The execution plan window. The SQL Server optimizer adopts the Index Scan, and the screenshot is shown below:


In fact, we can change our thinking logic. The data for the day is actually the records of the time greater than or equal to 0:00 of the day, and smaller than or equal to the current time. Then the equivalent logic code can be rewritten as follows:
USE TestDb
GO
SET STATISTICS TIME ON
SET STATISTICS IO ON

DECLARE
    @today DATETIME = CONVERT(CHAR(10), GETDATE(), 120)
;
SELECT
    ItemID, OrderQty, Price,OrderDate
FROM dbo.SalesOrder
WHERE OrderDate >= CONVERT(CHAR(10), GETDATE(), 120)
    AND OrderDate <= GETDATE();
SET STATISTICS TIME OFF
SET STATISTICS IO OFF


Here is the optimized performance index. The I/O consumption is 3, CPU consumption is 0 milliseconds, and the execution time is 109 milliseconds.


The query optimizer adopted the more efficient Index Seek for execution plan operations and the screenshot is shown below:


From the performance index comparison, we can see that the I/O dropped from 8406 to 3, the CPU consumption from 171 milliseconds down to 0 milliseconds, and the execution time from 211 to 109 milliseconds. In particular, the I/O and CPU improvement is very remarkable.
UPPER/LOWER
Because the default sorting rule in SQL Server is case-insensitive, in other words, SQL Server deems UPPER as identical with upper, the UPPER or LOWER functions can be ignored during character string comparison operations to avoid non-SARG queries and increased CPU usage.
Of course, it's very important that you confirm that your database is case-insensitive before rewriting non-SARG queries, that is, case-insensitive keywords are contained. The method to check is as follows:
USE TestDb
GO
EXEC sys.sp_helpsort


The result is shown in the figure below:


RTRIM
Some people like to use the RTRIM function to cancel the space at the rightmost of the character string before comparison when writing character string comparison. In fact, this approach is superfluous and completely unnecessary. The reason is that SQL Server will automatically ignore the rightmost space when doing string comparisons. As a result, this operation will result in non-SARG queries which will increase CPU usage.
We can write a very simple example to prove this point. In the IF statement, the RTRIM on the left of the equal sign is closely followed by a space character, and the TRIM string on the right of the equal sign does not include a space, but the result of the IF statement is true. The code is shown as below:
IF 'TRIM ' = 'TRIM'
    PRINT 'equals'
ELSE
    PRINT 'not equal'


The result is shown in the figure below:


LIKE full fuzzy matching
The LIKE full fuzzy matching in the WHERE statement will also cause invalidity of the SQL Server indexes, so it is also a type of non-SARG query Optimization of this kind of scenario includes two methods: the first method is to optimize the design. For example, we often encounter the scenario of carrying out full fuzzy matching through the phone number, but the input telephone numbers in customers' queries are often complete ones. Here we can change the design to only support the query of complete phone numbers, that is, changing the LIKE full fuzzy matching into the equate operation, avoiding non-SARG operations. The second solution is that if full fuzzy matching queries are unavoidable in the design, we can choose the SQL Server Fulltext technology to solve the full fuzzy queries of LIKE statements, so as to avoid non-SARG operations and reduce CPU usage.
Summary
This article analyzes the reasons for non-SARG query statements to cause increased Alibaba Cloud RDS SQL Server CPU usage and determines that the core idea of solving such problems is to rewrite the semantic logic to avoid non-SARG queries that result in high CPU usage. It also illustrates several typical methods to rewrite non-SARG queries.
Reference
Alibaba Cloud RDS
[Adolph edited the post at Jul 13, 2017 16:12 PM]
Guest