edit-icon download-icon

Analysis on slow query after the database is migrated to ApsaraDB for RDS

Last Updated: Jul 13, 2016

Why does my ApsaraDB for RDS get slower suddenly?This problem may be nagging many customers during usage of ApsaraDB for RDS. Next, I will analyze the reasons for slow ApsaraDB for RDS services through real world cases:


Case 1: After the user migrates databases to ApsaraDB for RDS from PGSQL, the ApsaraDB for RDS gets slower.


Description: After the user's databases (PGSQL) are moved to ApsaraDB for RDS (MySQL), the user finds the same query (million level data) takes approximately 0.015s in the previous PostgreSQL, while it takes six minutes and 20 seconds in ApsaraDB for RDS. The execution is too slow, which has severely impaired the user's confidence in ApsaraDB for RDS.

Possible cause: Why does it need only 0.015s in PostgreSQL, but six minutes and 20 seconds in ApsaraDB for RDS? Experience tells us the SQL execution plan may have changed, which leads to the dramatic increase in execution time length.

Troubleshooting: Run "explain" command to view SQL execution plan, and optimize the plan step by step.

Through analysis, we identify a full table scan on Table B (the last row of the execution plan), but there is no tid index in Table B. Here we can optimize the plan to reduce number of associated rows during the query:
---------------------------------------------------------------------------------------------------------------------

We can see the rows in the execution plan has changed from 452 to 2 (the last row).
As MySQL only has "nest loop join" of tables, we can estimate the effects of optimization here:
Old execution plan 1: 1055789*1*1*1*1*452 scanned rows
New execution plan 2: 1055789*1*1*1*1*2 scanned rows
Execution time:

We can see the execution time has reduced from the six minutes and 20 seconds to 10 seconds. We can continue the optimization.
There are only eight rows in the SQL result set, but the scanned rows are numerous (1055789*1*1*1*1*2). One key in optimizing SQL query is to optimize SQL execution routes, t=s/v. If we can improve s, the speed will go up sharply. Let's take a look at the last row in the query:
-> WHERE EXISTS
-> (SELECT 1 FROM xxxx_test5 b WHERE a.tid = b.tid);
The query aims to get the details about every order, so it has to associate other tables, but the last "exist" condition limits the final result scope. Let's check out the size of the Table xxxx_test5:
mysql> SELECT COUNT(*) FROM xxxx_test5;
+----------+
| COUNT(*) |
+----------+
| 403 |
+----------+
1 ROW IN SET (0.00 sec)
mysql> SELECT COUNT(*) FROM xxxx_test5 b ,xxxx_test a WHERE
a.tid = b.tid ;
+----------+
| COUNT(*) |
+----------+
| 8 |
+----------+
1 ROW IN SET (0.42 sec)
There are eight rows of records after the two tables join. If we join the order table xxxx_test and the qualified table first, and then connect them with other order information tables, the number of associated rows will be greatly decreased. We can modify the query further:

From the execution plan analysis, we find the qualified table xxxx_test5 was used as the driving table, and the change to the driving table is the root cause of the problem. Scanned rows: 452*1*1*1*1. Now the SQL query is very fast:
Mysql-->;
SELECT a.oi.............
........ indicates the omitted results.
8 ROWS IN SET (0.13 sec)

To sum up: the SQL execution plan changes due to the environment migration, which is the root cause of slow ApsaraDB for RDS services.

Case 2: Frequent connection timeouts during usage of ApsaraDB for RDS (MS SQL)


Description: When I use the ApsaraDB for RDS (MS SQL), there will be frequent errors as follows. Is it because the connections exceed the limit?
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error:26 – Error Locating Server/Instance Specified)

Possible cause: The user's applications may be poorly designed, which leads to many cases of database lock contention; or because the improper indexing leads to full table scans, thus the database waits;

Troubleshooting: From the database monitoring index, we find a high occurrence of full table scans during a time period, and the database lock contention times out, with sessions markedly increased.
---------------------------------------------------

We can find many full table scans and many lock contention timeouts at 15:00. A lot of sessions exist in the database at the same time, and it is just the time when user reports the error. Check the user's SQL query and some views of MS SQL, we can identify the Top 5 resource-consuming queries. We find the user is frequently querying a view with multiple tables joined. But there is no index added to the fields of the table connection, which results in full table scans. The user view is as follows:
CREATE VIEW [dbo].[Vi_xxx]
AS
SELECT ..........
..........
FROM dbo.xxxx_test6 INNER JOIN
dbo.xxxx_test1 ON dbo.xxxx_test5.docID = dbo.xxxx_test1.docID
INNER JOIN
dbo.xxxx_test2 ON dbo.xxxx_test5.typeID = dbo.xxxx_test2.typeID
INNER JOIN
dbo.xxxx_test3 ON dbo.xxxx_test5.docID = dbo.xxxx_test3.docID
INNER JOIN
dbo.xxxx_test4 ON dbo.xxxx_test5.categoryID =
dbo.xxxx_test4.categoryID
WHERE (dbo.xxxx_test5.isDelete = 0)
Check the execution plan and we can find the index in many joined fields of the tables is absent, which leads to many full table scans in the execution plan:


To sum up: The user queries a view frequently, but the view has no index in the joined fields of the tables, which leads to a lot of full table scans and many sessions are piled up. This results in lowered performance of the database and connection errors between the applications and databases.

Case 3: Implicit conversion leads to full table scans


Descriptions: The user's website is very slow to open, and the user doubts the ApsaraDB for RDS performance.
Possible cause: The user's data are stored in ApsaraDB for RDS and website accesses to the database are slow. It is primarily related to poor web application design, unoptimized SQL queries or poor indexing;
Troubleshooting: Through checking the slow query logs, we find a lots of slow queries which take longer than two seconds.
UPDATE USER SET xx=xx+N.N WHERE
account=130000870343 LIMIT 10
SELECT * FROM USER WHERE
account=13056870 LIMIT 10
Maybe the indexing is absent on the user table:
CREATE TABLE `user` (
`id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`account` char(11) NOT NULL COMMENT ‘???’,
…………………….
…………………….
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`account`),
…………………….
) ENGINE=InnoDB CHARSET=utf8 ;
From the execution plan, we find full table scans: db@3027 16:55:06>explain
select * from user where account=13056870343;
+—-+————-+——–+——+—————+——+———+——+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref |
rows | Extra |
+—-+————-+——–+——+—————+——+———+——+——+————-+
| 1 | SIMPLE | t_user | ALL | username | NULL | NULL | NULL | 799 |
Using where |
+—-+————-+——–+——+—————+——+———+——+——+————-+
1 row in set (0.00 sec)
Why full table scans are used here?Isn't indexing in place on the account? After a further look,
we find the account is defined as a string, while the condition for passing in is number. We all know that the number is more accurate than the string, so here the implicit conversion is adopted: to_number(account)=13056870343
(to_number means to convert the string into number). In this way, even the account has an index in place, it cannot be used. So here we need to change the passed in number into the string:
db@3027 16:55:13>EXPLAIN SELECT * FROM USER WHERE
account='13056870343';
+----+-------------+--------+-------+---------------+----------+------
| id | select_type | TABLE | TYPE | possible_keys | KEY |
key_len | REF | ROWS | Extra |
+----+-------------+--------+-------+---------------+----------+------
| 1 | SIMPLE | t_user | const | username | username | 33
| const | 1 | |
+----+-------------+--------+-------+---------------+----------+------
1 ROW IN SET (0.00 sec)
Now we can see the index username and indexing is now working.

To sum up: The user defined to use the string in the field when designing the table structure, but the number was conditioned for passing in, which leads to the implicit conversion. This is a typical and recurrent issue during database usage. The ApsaraDB for RDS is stable enough, but even the most advanced databases won't stand the challenges of poor queries. So the SQL query optimization is a long-term task.

From the above three cases, we can sum up that in most cases, the query timeouts, low performance, connection timeouts and other issues of ApsaraDB for RDS are results of the application design, unoptimized query or poor indexing. ApsaraDB for RDS usually works unless the instance is unavailable (host goes down, instance service gets stopped, or instance is locked due to too large size), and for instance faults, the ApsaraDB for RDS will monitor the status and issue alerts.
Thank you! We've received your feedback.