Community Blog Troubleshooting SQL Server Performance Issues

Troubleshooting SQL Server Performance Issues

In this tutorial, you will learn to troubleshoot SQL Server common performance issues like disaster recovery and high CPU usage.

SQL Server is an open-source relational database management system (RDBMS). Its primary function is to store and retrieve data when required by other applications. It runs on various operating systems, such as Linux, UNIX, and Windows and across a wide range of applications.

The SQL Server AlwaysOn feature was made available in the SQL Server 2012 release. The AlwaysOn availability group boasts of a high-availability and disaster recovery solution. It replaces enterprise-level database image solutions to maximize the availability of a set of user databases for the enterprise. This feature sparked a revolutionary change in the industry.

The solution implements multiple readable copies as well as convenient read and write separation schemes. It outperforms Database Mirroring + Replication for achieving read and write separation with respect to availability and reliability. Several features may have restrictions in versions earlier than SQL Server 2016. However, post release of the 2016 version, SQL Server became more user-friendly and compliant with the "customer first" principle. For example, you can now achieve no-domain-control deployment and enjoy as many as nine available copies.

Alibaba Cloud ECS Instances

The Alibaba Cloud ECS allows you to load applications with multiple operating systems and manage network access rights and permissions. Within the user console, you can also access the latest storage features, including auto snapshots, which is perfect for testing new tasks or operating systems as it allows you to make a quick copy and restore later. It offers a variety of configurable CPU, memory, data disk and bandwidth variations allowing you to tailor each Instance to your specific needs.

Let us see how to implement AlwaysOn availability groups with no domain control on Alibaba Cloud ECS instances.

1. Prerequisites

Before we begin detailing the architecture solution, let us look at the prerequisites for the deployment process.

1.1 Hardware Conditions:

Container ECS: Recommended: 4-core CPU or above, 8GB memory or above. Mount SSD cloud disks as ECS.

1.2 Software conditions:

You would need the following software with the listed specifications:

● .NET Framework 4.0 or above
● Powershell 5.0 or above
● Windows Server 2016 64-bit Data Center Edition (either Chinese or English edition)
● SQL Server 2016 64-bit Enterprise Edition (It requires Enterprise Edition. Standard Edition can only implement Basic Availability)

Architecture Diagram

The following figure shows the simple architecture of implementation on ECS instances. It also demonstrates a typical software solution for SQL Server AlwaysOn availability groups.

SQL Server AlwaysOn availability groups

Overview of Core Components

Let us now discuss the core components of the solution.

1. Architecture Overview

The figure shows a "2 + 3" high-availability and disaster-tolerant solution. The "2" refers to primary and secondary database copies. These primary and secondary databases use sync mode to enable automatic failover between availability groups of the database. The condition for automatic failover is such that the server must utilize the synchronous replication mode. Also, the databases must have automatic failover settings enabled.

The "3" refers to read-only copies that use async, or asynchronous replication mode, which takes into account the performance of multiple copies. For writing data, two copies (primary and secondary) are adequate. You can set all other copies to asynchronous replication. Additionally, there is an access issue to consider for ECS to deploy listeners. Therefore, to meet the solution requirements, the server allows only two synchronous copies. In fact, AlwaysOn availability group can only have up to three synchronous copies, which is enough to prove that Microsoft cares about this performance problem quite a bit.

2. Virtual Private Cloud (VPC)

Being VPC-based is an underlying network environment requirement. If you deploy databases in a classic network, you will run into serious issues due to the absence of Intranet IP addresses. However, if it is a VPC, resolving the problem is possible. AlwaysOn deployment will produce two virtual IP addresses. As a result, the two IP addresses must be private and remain unoccupied. Otherwise, the cluster and the listener may fail to work normally, and HA switch won't function properly.

3. Highly Available (HA)

Users might often ask, why do we need HA? If you deploy AlwaysOn on ECS instances, you may run into listener failures.

The inability to use the listener IP address to access database instances from a non-primary node causes listener failure. The listener failure is an obstacle preventing cloudization. In fact, it is associated with VPC network implementation. While the listener happens to take the Address Resolution Protocol (ARP) protocol, its resolution is not possible. HAVIP (Highly Available Virtual IP) is an HA solution with primary-secondary high-availability architecture under VPC. It provides the capability to access instances using VIP (Virtual IP) between primary and secondary instances without switching the IP address. However, you need to implement HA and heartbeat check on your own. This coincides with AlwaysOn's failover detection, diagnosis and switch mechanisms.

Furthermore, HAVIP + AlwaysOn can solve problems caused by listener failures. However, the system allows HAVIP to use only two nodes.

We use two synchronous nodes (primary and secondary) to implement write HA. Also, we use three asynchronous read-only copies to implement reads, which fits into this scenario appropriately. The number of read-only copies may vary from one to two or further to seven. However, the number of nodes available for failover must allow votes of greater than or equal to three (nodes with domain control can use the shared folder to act as a VOTE).

This may give rise to the question - How can we access so many read-only copies? Read further for an answer to this question.

4. Server Load Balancer

Server Load Balancer is designed to address read-only nodes. Ideally, this applies if you do not need Server Load Balancer to assign weights or implement load balancing automatically. You can also directly connect to the IP addresses of read-only copies. This is particularly in the case of deployment of web apps in a distributed approach. Therefore, it is possible to connect different web apps to different read-only nodes. Nodes for writes are restricted to do this as write nodes only have one copy.

5. Elastic IP Address (EIP)

An Elastic IP address is a public IPv4 address designed for dynamic cloud computing. Using an EIP, users can mask the failure of an instance or software. However, in this scenario, an EIP is not necessary. If you want to access database instances from outside the VPC, you can bind an EIP to an HAVIP and an ECS IP address. Note that it is impossible to bind Server Load Balancer to EIP. We do not recommend EIP since access in VPC is relatively safe.

6. Elastic Compute Service (ECS)

Elastic Compute Service (ECS) is a core service that enables users to launch new instances immediately to meet with real-time demand. ECS is the most basic instance with no special requirements.

7. Windows Cluster

Windows Cluster is crucial to this solution because operations for creating clusters for nodes with no domain control are often different. Also, DNS servers may even be absent. This is a new feature of Windows Server 2016. It brings with itself the advantage of independence from the complexity of domain control. Additionally, it also adds complexity to AlwaysOn deployment. Domain control not only has to consider deployment and HA problems but also needs to consider O&M issues, which often is a bother. Windows Cluster with no domain control, despite limited functionality, is sufficient to meet conditions for deploying AlwaysOn. You can refer to the deployment plan, discussed further in this article, for more details.

8. AlwaysOn Availability Group

AlwaysOn deployment uses SQL Server 2016 since this version supports non-domain-control deployment. The difference lies in whether domain control exists. While the security authentication modes of corresponding nodes will change - mutual authentication is required using security certificates.

Domain control is fairly straightforward, relying on the security authentication mechanism. The complexity of deployment manifests the same. Another factor that is worth mentioning is the listener, which cannot normally function in a VPC. Therefore, implementation of read and write separation requires HAVIP along with Server Load Balancer. Listener access to database instances is not a very elegant issue. If it is read-only access, there will be a read-only option on the connection string. This might confuse you in considering read and write as two different strings.

Similar to HAVIP + Server Load Balancer, many may mistakenly assume that a string solves the problem. However, it is practically impossible.

Let us now look into the deployment plan in this link.

Related Blogs

Troubleshooting High CPU Usage on Alibaba Cloud SQL Server

A primary issue with SQL Server is its sensitivity to latency, often resulting in performance issues.

SQL Server is an open-source relational database management system (RDBMS). Its primary function is to store and retrieve data when required by other applications. It runs on various operating systems, such as Linux, UNIX, and Windows and across a wide range of applications. Alibaba Cloud ApsaraDB for RDS fully supports SQL Server, provides automated monitoring, backup and disaster recovery capabilities.

The Impact of High CPU Usage in SQL Server

Despite its several functionality and advantages, SQL Server has its own set of concerns. A primary issue with SQL Server is its sensitivity to latency. This often results in performance issues. The reason for latency issue may vary depending on the workload to which the CPU is subjected.

Let us take a deep dive into seven possible causes of high CPU usage in SQL Server, including debugging app design. They are as follows:

  1. Missing indexes
  2. Index fragmentation
  3. Data type conversion
  4. Non-SARG query
  5. Parameter sniffing
  6. Statistics
  7. Top CPU-consuming queries

Missing Indexes

While troubleshooting, we discovered that the top contributor to high CPU usage is missing indexes. Let us first understand what indexes are. Index structure is the projection of one or more columns of data in the base table. It uses a particular structure sorted in ascending or descending order. It enables higher query efficiency, especially for frequently used queries.

The particular sorted structure of indexes makes searching efficient and allows you to find the desired data without consuming too much I/O. Consequently, reduced I/O consumption leads to minimized CPU usage. Now, let us look at the methods for finding missing indexes.

Methods for Finding Missing Indexes

How can we find out the missing indexes in the respective tables?

The first method is by using Database Tuning Advisor (DTA). The second method is through issuing warnings for missing indexes in the execution plan. During a statement execution, the execution plan issues a warning that an index is missing. With such a warning, you can locate the missing index and create it. The third method is to access the system's dynamic view. There are typically three views: sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_group_stats and sys.dm_db_missing_index_details. For specific use, please refer to our help documentation.

Q&A – High CPU Usage on Alibaba Cloud SQL Server

A primary issue with SQL Server is its sensitivity to latency, often resulting in performance issues.
In our previous blog post, we talked about troubleshooting high CPU usage on Alibaba Cloud ApsaraDB for RDS SQL Server.

SQL Server is an open-source relational database management system (RDBMS). Its primary function is to store and retrieve data when required by other applications. Because SQL Server is sensitive to latency, higher CPU latency might result in performance issues. Therefore, it is essential to bring down the CPU usage to the extent possible by looking into the component that is causing this challenge. This article highlights seven possible causes of high CPU usage and their probable remedies.

In this article, we will look at some frequently asked questions about SQL Server delays and their answers.

1. How can we use XML check in DMV to check for conversion?

Open the actual execution plan > execute the query > open the execution plan XML, and you will find implicit data type conversion in the two places shown in the figure below.


execution plan XML



2. The execution of stored procedures takes as long as 100 minutes, but CPU usage is not high. How can we solve this problem?

This is not a high CPU usage problem, but an overlong execution issue. It also deserves extreme care and attention. I encountered a similar case before in which the developer wrote a dead loop inside the stored procedure, resulting in a very long execution time and a failure to exit.

3. Currently, our largest table with GUID as the primary key contains 40 million data records. Will there be any performance problems after the number of data records exceeds 100 million? Is there any solution?

We would not recommend using GUID as the primary key very often for the following reasons:

First, the GUID field is too wide, specifically char (36), resulting in a high primary key band. It will indirectly lead to increased space for other non-primary key indexes (since non-primary key indexes need to record the value of the primary key). The larger the data space, the higher I/O used for queries, and the greater the CPU usage and the lower the efficiency.

Second, the size of each GUID value is unpredictable. When you insert new data into the table, if the GUID primary key is clustered, the data may be inserted between two rows of data. For example, the primary keys of the previous two rows of data are a and c, when b data comes in, it will insert data between a and c. This may lead to data movement.

Third, using GUID value as the primary key leads to primary key index fragmentation. You can follow the methods described above to understand index fragmentation.

Related Products

Simple Application Server

Simple Application Server is a new generation computing service for stand-alone application scenarios.

Elastic Compute Service

Alibaba Cloud Elastic Compute Service (ECS) provides fast memory and the latest Intel CPUs to help you to power your cloud applications and achieve faster results with low latency.

Related Courses

MySQL for Beginners - Basic Queries

This course is the first class of Alibaba Cloud MySQL for Beginners series. It mainly introduces the usage of SELECT statement and its clauses,It also describes how to use warning and error messages to troubleshoot problems in MySQL and how to set SQL mode and how the SQL mode affects the intepretation of SQL syntax.

MySQL Database Concepts and Operations

After learning this MySQL series courses, the trainees will master the MySQL relevant basic concept, deployment procedure, HA solution, commonly used backup and recovery method and daily inspection skills, thus having a comprehensive understanding of MySQL database.

Related Documentation

Upgrade a local SSD-based instance from SQL Server 2008 R2 to SQL Server 2012 or 2016

This topic describes how to upgrade a local SSD-based instance from SQL Server 2008 R2 to SQL Server 2012 or 2016. During the upgrade, you can also migrate the instance to other zones.

Alibaba Cloud has stopped providing security updates for ApsaraDB RDS for SQL Server 2008 R2 instances purchased on and after July 9, 2019. For more information, see Supplementary service agreement. We recommend that you upgrade your version as soon as possible.

For information about the functional differences between different versions and editions, see Functions supported by different versions and editions of SQL Server.

Upgrade an instance from SQL Server 2012 to SQL Server 2016

After the upgrade is completed, you must switch over services. The downtime caused by the switchover varies depending on the instance size. In most cases, switchover can be completed within 20 minutes. We recommend that you switch over services during system maintenance. Make sure each application can be reconnected in the event of disconnection.

Related Market Products

Jenkins powered by Miri Infotech

Jenkins is an automation server with an unparalleled plugin ecosystem for supporting practically every tool as a part of the delivery pipelines.

3 Major Scenarios You Should Know to Migrate to Alibaba Cloud

Through this course, you will fully understand server migration, database migration, storage migration solutions provided by Alibaba Cloud.

0 0 0
Share on

Alibaba Clouder

2,606 posts | 737 followers

You may also like