
Analysis on 100% of CPU utilization on ApsaraDB for RDS instance

Last Updated: Mar 06, 2017

When an ApsaraDB for RDS instance's CPU utilization reaches 100%, the cause is usually poor query performance. The following case study analyzes the cause:
A user reports that the CPU utilization of instance xxx reaches 100%, and that the instance sometimes hangs.

1. Cause: high CPU utilization is usually caused by slow queries, i.e. SQL statements that perform full table scans, data sorting (especially on-disk sorting), lock contention waits, and so on.
2. Manifestation: the thread state is one of: Sending data, Copying to tmp table, Copying to tmp table on disk, Sorting result, Locked.
3. Solution: log on to the ApsaraDB for RDS instance, identify the poorly performing queries with the show processlist command, then improve them by adjusting indexes or rewriting the queries.
A. Sending data: the query is reading rows from the table and sending results. Without a proper index on the query conditions, the query may take a long time.
B. Copying to tmp table on disk: the temporary result set is larger than the memory allotted for temporary tables and is being copied to disk. In such cases, the query needs to be optimized.
C. Sorting result / Using filesort: data sorting is in progress. Sorting consumes a lot of CPU; a common solution is to add a proper index to eliminate the sort, or to shrink the sorted result set.
We can find the following queries with the show processlist command:
SQL A:
| 2815961 | sanwenba | 10.241.142.197:55190 | sanwenba | Query | 0 | Sorting result | select z.aid,z.subject from www_zuowen z right join www_zuowenaddviews za on za.aid=z.aid order by za.viewnum desc limit 10 |
The poorly performing query is:
select z.aid,z.subject from www_zuowen z right join www_zuowenaddviews za on za.aid=z.aid order by za.viewnum desc limit 10;
View the execution plan with the explain command:
sanwenba@3018 10:00:54>explain select z.aid,z.subject from www_zuowen z right join www_zuowenaddviews za on za.aid=z.aid order by za.viewnum desc limit 10;
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+-----------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref             | rows   | Extra                       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+-----------------------------+
|  1 | SIMPLE      | za    | index  | NULL          | viewnum | 6       | NULL            | 537029 | Using index; Using filesort |
|  1 | SIMPLE      | z     | eq_ref | PRIMARY       | PRIMARY | 3       | sanwenba.za.aid |      1 |                             |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+-----------------------------+
Eliminate the sort by adding a proper index:
sanwenba@3018 10:02:33>alter table www_zuowenaddviews add index ind_www_zuowenaddviews_viewnum(viewnum);
sanwenba@3018 10:03:27>explain select z.aid,z.subject from www_zuowen z right join www_zuowenaddviews za on za.aid=z.aid order by za.viewnum desc limit 10;
+----+-------------+-------+--------+---------------+--------------------------------+---------+-----------------+------+-------------+
| id | select_type | table | type   | possible_keys | key                            | key_len | ref             | rows | Extra       |
+----+-------------+-------+--------+---------------+--------------------------------+---------+-----------------+------+-------------+
|  1 | SIMPLE      | za    | index  | NULL          | ind_www_zuowenaddviews_viewnum | 3       | NULL            |   10 | Using index |
|  1 | SIMPLE      | z     | eq_ref | PRIMARY       | PRIMARY                        | 3       | sanwenba.za.aid |    1 |             |
+----+-------------+-------+--------+---------------+--------------------------------+---------+-----------------+------+-------------+
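The effect shown in the two execution plans above, where the sort step disappears once the ORDER BY column is indexed, can be reproduced with a minimal sketch using Python's built-in sqlite3 module (SQLite rather than MySQL, so the planner reports a temp B-tree instead of "Using filesort"; the table here is an empty stand-in with illustrative column names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative stand-in for www_zuowenaddviews: aid is the key, viewnum the sort column.
con.execute("CREATE TABLE www_zuowenaddviews (aid INTEGER PRIMARY KEY, viewnum INTEGER)")

query = "SELECT aid FROM www_zuowenaddviews ORDER BY viewnum DESC LIMIT 10"

# Without an index on viewnum, the planner must sort explicitly (SQLite reports a
# temp B-tree, the analogue of MySQL's "Using filesort").
before = [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + query)]
print(before)

con.execute("CREATE INDEX ind_www_zuowenaddviews_viewnum ON www_zuowenaddviews(viewnum)")

# With the index, rows are already read in viewnum order and the sort step disappears.
after = [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + query)]
print(after)
```

The same reasoning applies in MySQL: once the index delivers rows in viewnum order, no separate sort pass is needed.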
SQL B:
| 2825321 | netzuowen | 10.200.120.41:44172 | netzuowen | Query | 2 | Copying to tmp table on disk | SELECT * FROM www_article WHERE 1=1 ORDER BY rand() LIMIT 0,30 |
Statements that use ORDER BY rand() like this also trigger sorting:
netzuowen@3018 10:23:55>explain SELECT * FROM www_zuowensearch WHERE checked = 1 ORDER BY rand() LIMIT 0,10;
+----+-------------+------------------+------+---------------+--------+---------+-------+------+---------------------------------+
| id | select_type | table            | type | possible_keys | key    | key_len | ref   | rows | Extra                           |
+----+-------------+------------------+------+---------------+--------+---------+-------+------+---------------------------------+
|  1 | SIMPLE      | www_zuowensearch | ref  | newest        | newest | 1       | const | 1443 | Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+--------+---------+-------+------+---------------------------------+
Fetching random records this way is very inefficient: the larger the table, the worse the performance.
Solutions are as follows:
The first scheme, i.e., the original ORDER BY rand() method:
$sql = "SELECT * FROM content ORDER BY rand() LIMIT 12";
$result = mysql_query($sql, $conn);
$n = 1;
$rnds = '';
while ($row = mysql_fetch_array($result)) {
    $rnds = $rnds.$n.". <a href='show".$row['id']."-".strtolower(trim($row['title']))."'>".$row['title']."</a><br />";
    $n++;
}
It takes 0.125 seconds to select 12 random records from 30,000 rows, and the time grows with the data size.
The second scheme, i.e., the improved JOIN method:
for ($n = 1; $n <= 12; $n++) {
    $sql = "SELECT * FROM content AS t1
        JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM content)) AS id) AS t2
        WHERE t1.id >= t2.id ORDER BY t1.id ASC LIMIT 1";
    $result = mysql_query($sql, $conn);
    $yi = mysql_fetch_array($result);
    $rnds = $rnds.$n.". <a href='show".$yi['id']."-".strtolower(trim($yi['title']))."'>".$yi['title']."</a><br />";
}
It takes 0.004 seconds to select 12 random records from 30,000 rows, about 30 times faster than the first scheme. Deficiency: it runs multiple SELECT queries and therefore needs more IO.
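The idea behind the second scheme, jumping to a random id and taking the first existing row at or above it, can be sketched with Python's built-in sqlite3 module (an illustrative content table with deliberate id gaps stands in for real data, and MySQL's RAND() is replaced by Python's random):

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT)")
# Leave gaps in the id sequence (every 7th id missing) to mimic deleted rows.
con.executemany("INSERT INTO content (id, title) VALUES (?, ?)",
                [(i, "title-%d" % i) for i in range(1, 30001) if i % 7 != 0])

def pick_random_row(con):
    # One probe per row: jump to a random id and take the first row at or above it.
    # The >= comparison is what tolerates gaps, but it also biases the draw toward
    # rows that sit just after a gap -- the deficiency noted above.
    (max_id,) = con.execute("SELECT MAX(id) FROM content").fetchone()
    rnd = random.randint(1, max_id)
    return con.execute(
        "SELECT id, title FROM content WHERE id >= ? ORDER BY id ASC LIMIT 1",
        (rnd,)).fetchone()

rows = [pick_random_row(con) for _ in range(12)]
print(rows)
```

Each probe is a cheap primary-key range seek, but twelve rows still cost twelve round trips, which is the IO deficiency the text points out.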
The third scheme: build the list of random IDs first, then query with IN (recommended, as it needs the least IO and is the fastest):
$sql = "SELECT MAX(id),MIN(id) FROM content";
$result = mysql_query($sql, $conn);
$yi = mysql_fetch_array($result);
$idmax = $yi[0];
$idmin = $yi[1];
$idlist = '';
for ($i = 1; $i <= 20; $i++) {
    if ($i == 1) { $idlist = mt_rand($idmin, $idmax); }
    else { $idlist = $idlist.','.mt_rand($idmin, $idmax); }
}
$idlist2 = "id,".$idlist;
$sql = "select * from content where id in ($idlist) order by field($idlist2) LIMIT 0,12";
$result = mysql_query($sql, $conn);
$n = 1;
$rnds = '';
while ($row = mysql_fetch_array($result)) {
    $rnds = $rnds.$n.". <a href='show".$row['id']."-".strtolower(trim($row['title']))."'>".$row['title']."</a><br />";
    $n++;
}
It takes 0.001 seconds to select 12 random records from 30,000 rows: about four times faster than the second scheme and about 120 times faster than the first. Note that order by field($idlist2) is used to avoid sorting; without it, IN would sort the result automatically. Deficiency: some IDs may have been deleted, so you need to generate a few more IDs than you need.
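The third scheme translates to the following sketch, again using Python's built-in sqlite3 module on an illustrative content table (SQLite has no ORDER BY FIELD(), so the random candidate order is restored in Python instead):

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT)")
# Every 7th id is missing, mimicking deleted rows.
con.executemany("INSERT INTO content (id, title) VALUES (?, ?)",
                [(i, "title-%d" % i) for i in range(1, 30001) if i % 7 != 0])

# Step 1: one cheap query for the id range.
idmax, idmin = con.execute("SELECT MAX(id), MIN(id) FROM content").fetchone()

# Step 2: draw more candidate ids than needed (20 for 12 rows),
# because some ids may belong to deleted rows.
candidates = [random.randint(idmin, idmax) for _ in range(20)]

# Step 3: fetch all hits in a single IN query; each hit is a primary-key lookup.
placeholders = ",".join("?" * len(candidates))
hits = con.execute(
    "SELECT id, title FROM content WHERE id IN (%s)" % placeholders,
    candidates).fetchall()

# Restore the random candidate order (the job ORDER BY FIELD() does in MySQL)
# and keep the first 12 rows.
by_id = {row[0]: row for row in hits}
rows = [by_id[i] for i in candidates if i in by_id][:12]
print(rows)
```

All the database work collapses into two queries regardless of how many rows are requested, which is why this scheme needs the least IO.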
C. Sending data:
| 2833185 | sanwenba | 10.241.91.81:45964 | sanwenba | Query | 1 | Sending data | SELECT * FROM www_article WHERE CONCAT(subject,description) like '%??%' ORDER BY aid desc LIMIT 75,15 |
The poorly performing query is:
SELECT * FROM www_article WHERE CONCAT(subject,description) like '%??%' ORDER BY aid desc LIMIT 75,15;
Such a query is a typical example of inefficient pagination. It needs to be rewritten:
select * from www_article t1, (select aid from www_article where CONCAT(subject,description) like '%??%' ORDER BY aid desc LIMIT 75,15) t2 where t1.aid=t2.aid;
Note: a covering index on (aid, subject, description) should be used here.
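That the rewrite returns the same page of rows as the original can be checked with a minimal sketch using Python's built-in sqlite3 module (SQLite's || operator replaces MySQL's CONCAT, and the table contents and search keyword are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE www_article "
            "(aid INTEGER PRIMARY KEY, subject TEXT, description TEXT)")
# Made-up data: every 3rd article's description contains the search keyword.
con.executemany("INSERT INTO www_article (aid, subject, description) VALUES (?, ?, ?)",
                [(i, "subject-%d" % i, "keyword" if i % 3 == 0 else "other")
                 for i in range(1, 1001)])

# Original form: the filter materializes full rows, then 75 of them are thrown away.
naive = con.execute(
    "SELECT * FROM www_article WHERE subject || description LIKE '%keyword%' "
    "ORDER BY aid DESC LIMIT 75,15").fetchall()

# Rewritten form: the inner query touches only aid (which a covering index can
# serve), and full rows are fetched for just the 15 ids on the requested page.
rewritten = con.execute(
    "SELECT t1.* FROM www_article t1, "
    "(SELECT aid FROM www_article WHERE subject || description LIKE '%keyword%' "
    " ORDER BY aid DESC LIMIT 75,15) t2 WHERE t1.aid = t2.aid").fetchall()

print(len(naive), len(rewritten))
```

Both queries return the same 15 rows; the rewrite wins because the expensive filter-and-skip phase reads only the narrow aid column instead of full rows.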
Optimized result:

To sum up:

Figure: the optimization pyramid

SQL optimization is the last step of performance optimization. Although it sits at the top of the pyramid, it affects the user experience the most, is the easiest to carry out, and has the most direct effect. Because ApsaraDB for RDS for MySQL isolates resources, different instance types have different IOPS capacities. For example, a new Type 1 instance offers 150 IOPS, i.e., 150 random disk input/output operations per second. If a user has a large data set but little memory, a single slow query can, because of the IOPS limit, consume all the IO resources and affect other queries. For the database, this means every query takes a long time to return a result; for the application, it means a slow response overall.
