Reads:41202Replies:0
Solution to high CPU usage of Alibaba Cloud RDS SQL Server - missing index
Abstract
High CPU usage will always lead to slow response of SQL Server services, query timeout, and even service suspension. It can be said that the high utilization rate of CPU is the No 1 killer of background process services such as databases. One article in this series - "missing index" - is one of the most common reasons of high utilization of CPU. Introduction of issues "New guy, the most frequently encountered problem during our services for Alibaba Cloud RDS SQL Server customers is the high utilization of CPU of RDS SQL Server databases in customers' feedback (sometimes more than 90%, or even to 100%), resulting in slow queries or even query timeout. How can we solve such problems?" Laruence is just bored by similar questions. "Laruence, the high CPU usage issue is the result of a variety of causes, which cannot be explained clearly with a sentence or two," The new guy began the suspense. "Well, what about this - I write a series of articles to analyze various scenarios to solve the high CPU usage in RDS SQL Server databases!". Analysis of scenarios In the relational database (RDBMS) system, lacking indexes is very common and will lead to high I/O reads, which further leads to high CPU usage. This is because when the query optimizer is evaluating the execution plan, it finds no suitable index to use and has to choose Table Scan or Clustered Index Scan, which is similar to full-table scan, to get the required data. This large-area data scanning will increase the frequency of reads to the I/O subsystem. SQL Server needs to read a lot of data and load the data into memory. These operations will eventually cause the CPU usage to soar. In such scenarios, to solve the high CPU usage problem actually turns into a problem to solve the missing indexes. We can check the following example to figure out how to discover and solve the problem of missing indexes. Solution Here, we break down this example into five small steps: Test environment: Build a simple test environment. Execute the query: Create query statements for performance comparison before and after the missing index Missing index: The method to search for the missing index Solve the problem: Create a missing index Efficiency comparison: Create performance comparison before and after the missing index Test environment Creating a test environment involves creating a test database and a test table object, and initializing 2 million records. -- Create testing database IF DB_ID('TestDb') IS NULL CREATE DATABASE TestDb; GO USE TestDb GO -- create demo table SalesOrder IF OBJECT_ID('dbo.SalesOrder', 'U') IS NOT NULL BEGIN DROP TABLE dbo.SalesOrder END GO CREATE TABLE dbo.SalesOrder ( RowID INT IDENTITY(1,1) NOT NULL , OrderID UNIQUEIDENTIFIER NOT NULL , ItemID INT NOT NULL , UserID INT NOT NULL , OrderQty INT NOT NULL , Price DECIMAL(8,2) NOT NULL , OrderDate DATETIME NOT NULL CONSTRAINT DF_OrderDate DEFAULT(GETDATE()) , LastUpdateTime DATETIME NULL , OrderComment NVARCHAR(100) NULL , CONSTRAINT PK_SalesOrder PRIMARY KEY( OrderID ) ); -- data init for 2 M records. ;WITH a AS ( SELECT * FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS a(a) ), RoundData AS( SELECT TOP(2000000) OrderID = NEWID() ,ItemIDRound = abs(checksum(newid())) ,Price = a.a * b.a * 10 ,OrderQty = a.a + b.a + c.a + d.a + e.a + f.a + g.a + h.a FROM a, a AS b, a AS c, a AS d, a AS e, a AS f, a AS g, a AS h ), DATA AS( SELECT OrderID ,ItemID = cast(ROUND((1300 * (ItemIDRound*1./cast(replace(ItemIDRound, ItemIDRound, '1' + replicate('0', len(ItemIDRound))) as bigint)) + 101), 0) as int) ,UserID = cast(ROUND((500 * (ItemIDRound*1./cast(replace(ItemIDRound, ItemIDRound, '1' + replicate('0', len(ItemIDRound))) as bigint)) + 10000), 0) as int) ,OrderQty ,Price = cast(Price AS DECIMAL(8,2)) ,OrderDate = dateadd(day, -cast(ROUND((50 * (ItemIDRound*1./cast(replace(ItemIDRound, ItemIDRound, '1' + replicate('0', len(ItemIDRound))) as bigint)) + 1), 0) as int) ,GETDATE()) FROM RoundData ) INSERT INTO dbo.SalesOrder(OrderID, ItemID, UserID, OrderQty, Price, OrderDate, LastUpdateTime, OrderComment) SELECT OrderID , ItemID , UserID , OrderQty , Price , OrderDate , LastUpdateTime = OrderDate , OrderComment = N'User ' + CAST(UserID AS NVARCHAR(8)) + N' purchased item ' + CAST(ItemID AS NVARCHAR(8)) FROM DATA; GO EXEC sys.sp_spaceused 'dbo.SalesOrder' 2 million data records have been initialized, as shown below: ![]() Execute the query Query the purchase history of User 10057 in the last month (in order to get the performance comparison information, I enabled Time and I/O statistics). It is recommended to enable the actual execution plan to get the option before executing statements. Activate the actual execution plan by clicking on the icon in the box below in SSMS or using the shortcut CTRL+M: ![]() Execute the query statement: ----=== get User 10057 purchased Items Info USE [TestDb] GO SET STATISTICS TIME ON SET STATISTICS IO ON SELECT ItemID, OrderQty, Price FROM dbo.SalesOrder WHERE UserID = 10057 AND OrderDate >= DATEADD(MONTH, -1, GETDATE()) AND OrderDate <= GETDATE(); SET STATISTICS TIME OFF SET STATISTICS IO OFF The I/O, CPU and time consumption of the query statement execution, where the I/O logical reads consume 32295, the CPU consumption is 451 ms, and the execution time is 648 ms, as shown in the figure below: ![]() The execution plan adopted the Clustered Index Scan (the performance consumption is almost the same as with Table Scan). A warning message about index missing is displayed in green text as shown in the figure below. Right click on the text and select Missing Index Details..., and you can view the missing index details: ![]() Missing index In addition to viewing the missing index in the execution plan as illustrated above, we can also use three important system dynamic views to check the missing index (for specific meanings of each view, see the Help documentation): sys.dm_db_missing_index_group_stats sys.dm_db_missing_index_groups sys.dm_db_missing_index_details The method of using the three system dynamic views to look for missing indexes is described as follows: USE TestDb GO SELECT TOP 100 c.equality_columns , c.inequality_columns , c.included_columns , create_Index = 'CREATE INDEX IX_' + REPLACE(REPLACE(REPLACE(equality_columns, '[', ''), ']',''), ',', '') + '_' + REPLACE(REPLACE(REPLACE(inequality_columns, '[', ''), ']',''), ',', '') + '_' +REPLACE(REPLACE(REPLACE(included_columns, '[', '@'), ']',''), ', ', '_') + char(10) + 'ON ' + SCHEMA_NAME(tb.schema_id) + '.' + object_name(tb.object_id) + '(' + case when c.equality_columns is not null and c.inequality_columns is not null then c.equality_columns + ',' + c.inequality_columns when c.equality_columns is not null and c.inequality_columns is null then c.equality_columns when c.inequality_columns is not null then c.inequality_columns end + ')' + char(10) + case when c.included_columns is not null then 'INCLUDE (' + c.included_columns + ')' else '' end + char(10) + N'WITH (FILLFACTOR = 85);' FROM sys.dm_db_missing_index_group_stats a INNER JOIN sys.dm_db_missing_index_groups b ON a.group_handle = b.index_group_handle INNER JOIN sys.dm_db_missing_index_details c ON c.index_handle = b.index_handle INNER JOIN sys.tables as tb ON c.object_id = tb.object_id WHERE db_name(database_id) = db_name() AND equality_columns is not null AND tb.object_id = object_id('dbo.SalesOrder', 'U') ORDER BY a.avg_total_user_cost * a.avg_user_impact * (a.user_seeks + a.user_scans) DESC The query result after execution is shown in the figure below: ![]() Solve the problem Whether it is viewing missing indexes through the execution plan or getting the missing indexes through the three dynamic views, the ultimate goal is to solve the problem and to let us create this missing index: --=== Create Missing Indexes USE [TestDb] GO CREATE INDEX IX_UserID_OrderDate_@ItemID_@OrderQty_@Price ON dbo.SalesOrder([UserID],[OrderDate]) INCLUDE ([ItemID], [OrderQty], [Price]) WITH (FILLFACTOR = 85); GO Comparison of efficiency After this missing index is created, execute the query statement in the "Execute the query" in the earlier step. Make a comparison between the execution plan and performance consumption. The execution plan has adopted the more efficient Index Seek, as shown below: ![]() The I/O logical reads consume 126, the CPU consumption is 16 ms and the execution time is 198 ms, as shown in the screenshot below: ![]() After the index was created, the execution time consumption, the CPU consumption, and the I/O reads consumption improved by 3.27 times, 28.19 times and 256.3 times respectively, and the average performance increased by 95.92 times. The comparison result is shown in the figure below: ![]() Summary This article connects the theory with the actual scenario and introduces the solution to missing indexes, as one article of the series for high CPU usage solutions. From the final test results, after the index was created, the three aspects for specific queries, namely the CPU usage, time consumption and I/O reads, have all greatly improved. In particular, the I/O reading performance increased by 256.3 times, with the average performance increasing by 95.92 times, the overall effect being quite remarkable. Reference Alibaba Cloud RDS [Adolph edited the post at Jul 13, 2017 16:11 PM]
|
|