Adolph
Engineer
Engineer
  • UID623
  • Fans4
  • Follows1
  • Posts72
Reads:41202Replies:0

Solution to high CPU usage of Alibaba Cloud RDS SQL Server - missing index

Created#
More Posted time:Apr 26, 2017 16:44 PM
Abstract
High CPU usage will always lead to slow response of SQL Server services, query timeout, and even service suspension. It can be said that the high utilization rate of CPU is the No 1 killer of background process services such as databases. One article in this series - "missing index" - is one of the most common reasons of high utilization of CPU.
Introduction of issues
"New guy, the most frequently encountered problem during our services for Alibaba Cloud RDS SQL Server customers is the high utilization of CPU of RDS SQL Server databases in customers' feedback (sometimes more than 90%, or even to 100%), resulting in slow queries or even query timeout. How can we solve such problems?" Laruence is just bored by similar questions.
"Laruence, the high CPU usage issue is the result of a variety of causes, which cannot be explained clearly with a sentence or two," The new guy began the suspense. "Well, what about this - I write a series of articles to analyze various scenarios to solve the high CPU usage in RDS SQL Server databases!".
Analysis of scenarios
In the relational database (RDBMS) system, lacking indexes is very common and will lead to high I/O reads, which further leads to high CPU usage. This is because when the query optimizer is evaluating the execution plan, it finds no suitable index to use and has to choose Table Scan or Clustered Index Scan, which is similar to full-table scan, to get the required data. This large-area data scanning will increase the frequency of reads to the I/O subsystem. SQL Server needs to read a lot of data and load the data into memory. These operations will eventually cause the CPU usage to soar. In such scenarios, to solve the high CPU usage problem actually turns into a problem to solve the missing indexes. We can check the following example to figure out how to discover and solve the problem of missing indexes.
Solution
Here, we break down this example into five small steps:
 Test environment: Build a simple test environment.
Execute the query: Create query statements for performance comparison before and after the missing index
 Missing index: The method to search for the missing index
 Solve the problem: Create a missing index
 Efficiency comparison: Create performance comparison before and after the missing index
Test environment
Creating a test environment involves creating a test database and a test table object, and initializing 2 million records.
-- Create testing database
IF DB_ID('TestDb') IS NULL
    CREATE DATABASE TestDb;
GO

USE TestDb
GO

-- create demo table SalesOrder
IF OBJECT_ID('dbo.SalesOrder', 'U') IS NOT NULL
BEGIN
    DROP TABLE dbo.SalesOrder
END
GO

CREATE TABLE dbo.SalesOrder
(
    RowID INT IDENTITY(1,1) NOT NULL
    , OrderID UNIQUEIDENTIFIER NOT NULL
    , ItemID INT NOT NULL
    , UserID INT NOT NULL
    , OrderQty INT NOT NULL
    , Price DECIMAL(8,2) NOT NULL
    , OrderDate DATETIME NOT NULL
        CONSTRAINT DF_OrderDate DEFAULT(GETDATE())
    , LastUpdateTime DATETIME NULL
    , OrderComment NVARCHAR(100) NULL
    , CONSTRAINT PK_SalesOrder PRIMARY KEY(
        OrderID
    )
);

-- data init for 2 M records.
;WITH a
AS (
    SELECT *
    FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS a(a)
), RoundData
AS(
SELECT TOP(2000000)
    OrderID = NEWID()
    ,ItemIDRound = abs(checksum(newid()))
    ,Price = a.a * b.a * 10
    ,OrderQty = a.a + b.a + c.a + d.a + e.a + f.a + g.a + h.a
FROM a, a AS b, a AS c, a AS d, a AS e, a AS f, a AS g, a AS h
), DATA
AS(
SELECT
    OrderID
    ,ItemID = cast(ROUND((1300 * (ItemIDRound*1./cast(replace(ItemIDRound, ItemIDRound, '1' + replicate('0', len(ItemIDRound))) as bigint)) + 101), 0) as int)
    ,UserID = cast(ROUND((500 * (ItemIDRound*1./cast(replace(ItemIDRound, ItemIDRound, '1' + replicate('0', len(ItemIDRound))) as bigint)) + 10000), 0) as int)
    ,OrderQty
    ,Price = cast(Price AS DECIMAL(8,2))
    ,OrderDate = dateadd(day, -cast(ROUND((50 * (ItemIDRound*1./cast(replace(ItemIDRound, ItemIDRound, '1' + replicate('0', len(ItemIDRound))) as bigint)) + 1), 0) as int) ,GETDATE())
FROM RoundData
)
INSERT INTO dbo.SalesOrder(OrderID, ItemID, UserID, OrderQty, Price, OrderDate, LastUpdateTime, OrderComment)
SELECT
    OrderID
    , ItemID
    , UserID
    , OrderQty
    , Price
    , OrderDate
    , LastUpdateTime = OrderDate
    , OrderComment = N'User ' + CAST(UserID AS NVARCHAR(8)) + N' purchased item ' + CAST(ItemID AS NVARCHAR(8))
FROM DATA;
GO

EXEC sys.sp_spaceused 'dbo.SalesOrder'

2 million data records have been initialized, as shown below:



Execute the query
Query the purchase history of User 10057 in the last month (in order to get the performance comparison information, I enabled Time and I/O statistics). It is recommended to enable the actual execution plan to get the option before executing statements. Activate the actual execution plan by clicking on the icon in the box below in SSMS or using the shortcut CTRL+M:

Execute the query statement:

----=== get User 10057 purchased Items Info
USE [TestDb]
GO
SET STATISTICS TIME ON
SET STATISTICS IO ON

SELECT
    ItemID, OrderQty, Price
FROM dbo.SalesOrder
WHERE UserID = 10057
    AND OrderDate >= DATEADD(MONTH, -1, GETDATE())
    AND OrderDate <= GETDATE();

SET STATISTICS TIME OFF
SET STATISTICS IO OFF


The I/O, CPU and time consumption of the query statement execution, where the I/O logical reads consume 32295, the CPU consumption is 451 ms, and the execution time is 648 ms, as shown in the figure below:


The execution plan adopted the Clustered Index Scan (the performance consumption is almost the same as with Table Scan). A warning message about index missing is displayed in green text as shown in the figure below. Right click on the text and select Missing Index Details..., and you can view the missing index details:


Missing index
In addition to viewing the missing index in the execution plan as illustrated above, we can also use three important system dynamic views to check the missing index (for specific meanings of each view, see the Help documentation):
 sys.dm_db_missing_index_group_stats
 sys.dm_db_missing_index_groups
 sys.dm_db_missing_index_details
The method of using the three system dynamic views to look for missing indexes is described as follows:
USE TestDb
GO

SELECT TOP 100
    c.equality_columns
    , c.inequality_columns
    , c.included_columns
    , create_Index = 'CREATE INDEX IX_' + REPLACE(REPLACE(REPLACE(equality_columns, '[', ''), ']',''), ',', '')
        + '_' + REPLACE(REPLACE(REPLACE(inequality_columns, '[', ''), ']',''), ',', '')
        + '_' +REPLACE(REPLACE(REPLACE(included_columns, '[', '@'), ']',''), ', ', '_')  + char(10)
        + 'ON ' + SCHEMA_NAME(tb.schema_id) + '.' + object_name(tb.object_id)
        + '('
            + case
                when c.equality_columns is not null and c.inequality_columns is not null
                    then c.equality_columns + ',' + c.inequality_columns
                when c.equality_columns is not null and c.inequality_columns is null
                    then c.equality_columns
                when c.inequality_columns is not null
                    then c.inequality_columns
            end
        + ')' + char(10)
        + case
            when c.included_columns is not null
            then 'INCLUDE (' + c.included_columns + ')'
            else ''
        end + char(10)
        + N'WITH (FILLFACTOR = 85);'
FROM sys.dm_db_missing_index_group_stats a
    INNER JOIN sys.dm_db_missing_index_groups b
        ON a.group_handle = b.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details c
        ON c.index_handle = b.index_handle
    INNER JOIN sys.tables as tb
        ON c.object_id = tb.object_id
WHERE db_name(database_id) = db_name()
    AND equality_columns is not null
    AND tb.object_id = object_id('dbo.SalesOrder', 'U')
ORDER BY a.avg_total_user_cost * a.avg_user_impact * (a.user_seeks + a.user_scans) DESC

The query result after execution is shown in the figure below:



Solve the problem
Whether it is viewing missing indexes through the execution plan or getting the missing indexes through the three dynamic views, the ultimate goal is to solve the problem and to let us create this missing index:
--=== Create Missing Indexes
USE [TestDb]
GO
CREATE INDEX IX_UserID_OrderDate_@ItemID_@OrderQty_@Price
ON dbo.SalesOrder([UserID],[OrderDate])
INCLUDE ([ItemID], [OrderQty], [Price])
WITH (FILLFACTOR = 85);
GO


Comparison of efficiency
After this missing index is created, execute the query statement in the "Execute the query" in the earlier step. Make a comparison between the execution plan and performance consumption.
The execution plan has adopted the more efficient Index Seek, as shown below:


The I/O logical reads consume 126, the CPU consumption is 16 ms and the execution time is 198 ms, as shown in the screenshot below:


After the index was created, the execution time consumption, the CPU consumption, and the I/O reads consumption improved by 3.27 times, 28.19 times and 256.3 times respectively, and the average performance increased by 95.92 times. The comparison result is shown in the figure below:


Summary
This article connects the theory with the actual scenario and introduces the solution to missing indexes, as one article of the series for high CPU usage solutions. From the final test results, after the index was created, the three aspects for specific queries, namely the CPU usage, time consumption and I/O reads, have all greatly improved. In particular, the I/O reading performance increased by 256.3 times, with the average performance increasing by 95.92 times, the overall effect being quite remarkable.

Reference
Alibaba Cloud RDS
[Adolph edited the post at Jul 13, 2017 16:11 PM]
Guest