Linux CPU load: queries and case studies

Last Updated: Apr 21, 2022

Disclaimer: This topic may contain information about third-party products. The information is for reference only. Alibaba Cloud does not make a guarantee in any form of the performance and reliability of the third-party products, and potential impacts of operations on these products.

Overview

This topic describes how to query the CPU load of a Linux instance and walks through several case studies.

Description

Take note of the following items:

  • Before you perform high-risk operations such as modifying the specifications or data of an Alibaba Cloud instance, we recommend that you check the disaster recovery and fault tolerance capabilities of the instance to ensure data security.
  • Before you modify the specifications or data of an Alibaba Cloud instance, such as an Elastic Compute Service (ECS) instance or an ApsaraDB RDS instance, we recommend that you create snapshots or enable backups for the instance. For example, you can enable log backups for an ApsaraDB RDS instance.
  • If you have granted specific users the permissions on sensitive information, such as usernames and passwords, or submitted sensitive information in the Alibaba Cloud Management Console, we recommend that you modify the sensitive information at the earliest opportunity.

Note: The configurations and descriptions in this topic have been tested in CentOS 6.5 64-bit operating systems. The configurations of other distributions may vary. For more information, see the official documentation of the corresponding distribution.

If the CPU utilization of a Linux ECS instance stays high, system stability and business operations are affected. Troubleshoot the issue as follows:

  1. Locate the problem. Find the specific processes that cause the high CPU usage.
  2. Analyze and handle the problem. Check whether the processes that cause the high CPU usage are normal, and handle them according to their type.

    • For normal processes, optimize the program or upgrade the instance specifications.
    • For abnormal processes, check and terminate them manually, or use third-party security tools to do so.

Query and analysis of CPU load

In Linux, the following commands are commonly used to view processes and CPU load. This topic describes vmstat and top.

  • vmstat
  • top
  • ps aux
  • ps -ef

Use the vmstat command to view CPU load

Run the vmstat command to view CPU usage at the system level. In the following command, 1 is the sampling interval, so a new line of results is printed every second. The -n option prints the header only once.

vmstat -n 1

The following is sample output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 2684984 310452 2364304    0    0     5    17   19   35  4  2 94  0  0
 0  0      0 2687504 310452 2362268    0    0     0   252 1942 4326  5  2 93  0  0
 0  0      0 2687356 310460 2362252    0    0     0    68 1891 4449  3  2 95  0  0
 0  0      0 2687252 310460 2362256    0    0     0     0 1906 4616  4  1 95  0  0

Note: The main columns in the output are described as follows.

  • r: the number of runnable processes, that is, processes that are running or waiting for CPU time. A CPU core can run only one thread at a time, so a value that stays above the number of CPU cores means that tasks are queuing and the system runs slower.
  • us: the percentage of CPU time consumed in user mode. A high value means that user processes consume a large share of CPU time. If the value stays above 50% for a long time, consider optimizing the program algorithm or code.
  • sy: the percentage of CPU time consumed in kernel mode.
  • wa: the percentage of CPU time spent waiting for I/O. A high value indicates serious I/O wait, which may be caused by a large number of random disk accesses or by a disk performance bottleneck.
  • id: the percentage of CPU time that is idle. If the value stays at 0 and sy is twice us, the system is usually short of CPU resources.
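The column descriptions above can be turned into a simple automated check. The following is a minimal sketch, not an official tool: the function name check_vmstat and the default I/O wait threshold of 20% are assumptions made for illustration, and the column positions assume the 17-column layout shown in the sample output.

```shell
# check_vmstat: read vmstat output on stdin and flag suspicious samples.
#   $1 = number of CPU cores (threshold for column r)
#   $2 = I/O wait threshold in percent (default 20, an arbitrary example value)
# Assumes the 17-column layout shown above: r is column 1, wa is column 16.
check_vmstat() {
    awk -v cores="$1" -v wa_max="${2:-20}" '
        # Data rows start with a number; the two header lines do not.
        $1 ~ /^[0-9]+$/ {
            n++
            if ($1 + 0 > cores + 0)
                printf "sample %d: r=%d exceeds %d cores\n", n, $1, cores
            if ($16 + 0 > wa_max + 0)
                printf "sample %d: wa=%d%% exceeds %d%%\n", n, $16, wa_max
        }
    '
}
```

For example, `vmstat -n 1 5 | check_vmstat "$(nproc)"` takes five one-second samples and prints a line for each sample in which the run queue is longer than the number of cores or the I/O wait exceeds the threshold. No output means that no sample crossed either threshold.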

Use the top command to view CPU load

  1. Log on to a Linux instance. For more information about how to log on to a Linux instance, see Connect to a Linux instance by using a management terminal.
  2. Run the following command to view CPU and memory usage at the process level:
    top
    Output similar to the following is returned:
    top - 17:27:13 up 27 days,  3:13,  1 user,  load average: 0.02, 0.03, 0.05
    Tasks:  94 total,   1 running,  93 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.3 us,  0.1 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.1 st
    KiB Mem:   1016656 total,   946628 used,    70028 free,   169536 buffers
    KiB Swap:        0 total,        0 used,        0 free.   448644 cached Mem
    PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    1 root      20   0   41412   3824   2308 S  0.0  0.4   0:19.01 systemd
    2 root      20   0       0      0     0 S  0.0  0.0   0:00.04 kthreadd
  3. For load problems, focus on the first and third lines of the output.
    1. The first line (17:27:13 up 27 days, 3:13, 1 user, load average: 0.02, 0.03, 0.05) shows the current system time, how long the system has been running, the number of users currently logged on, and the system load averages over the last 1, 5, and 15 minutes. This is the same information that the uptime command returns.
    2. The third line shows the overall usage of CPU resources. The resource usage of each process is listed below the summary area.
  4. Press P (uppercase) to sort processes by CPU usage in descending order, and then locate the processes that consume the most CPU.
    Tip: Press M (uppercase) to sort processes by memory usage. On a multi-core system, press 1 to display the load of each CPU core.
  5. Run the following command to view the program file that corresponds to a process. Replace PID with the actual process ID:
    ll /proc/PID/exe
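The interactive steps above can also be approximated from a script. The following is a minimal sketch that assumes the procps version of ps, which supports the --sort option. The function names top_cpu and exe_of are hypothetical helpers introduced for illustration. Note that the %CPU value reported by ps is CPU time divided by the lifetime of the process, so it may differ from the instantaneous value that top displays.

```shell
# top_cpu: print the N processes with the highest CPU usage (default 5),
# preceded by the ps header line.
top_cpu() {
    ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

# exe_of: print the path of the program file behind a PID by resolving
# the /proc/PID/exe symbolic link.
exe_of() {
    readlink "/proc/$1/exe"
}
```

For example, run `top_cpu 3` to list the three largest CPU consumers, and then run `exe_of` with one of the listed PIDs to see which program file the process was started from. Reading the link of a process owned by another user usually requires root permissions.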

Example

Case 1: Use the top command to terminate a process that consumes a large amount of CPU

Run the top command to check the system load, locate the process that consumes the most CPU, and terminate the abnormal process directly in the top interface.

  1. To terminate a process, press k (lowercase).
  2. Enter the PID of the process that you want to terminate. The default is the first PID in the output. For example, to terminate the process with PID 23, enter 23 and press Enter.
  3. A prompt similar to Send pid 23 signal [15/sigterm] appears. Press Enter to confirm and send the default signal 15 (SIGTERM).
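The same signal can be sent from the command line with kill. The following is a minimal sketch: the function name terminate_pid and the default wait time of 5 seconds are assumptions for illustration. It sends signal 15 (SIGTERM), the same default that top uses, and then polls until the process disappears or becomes a zombie.

```shell
# terminate_pid: send SIGTERM to a process and wait for it to exit.
#   $1 = PID, $2 = maximum wait in seconds (default 5, an example value)
terminate_pid() {
    pid=$1
    timeout=${2:-5}
    if ! kill -15 "$pid" 2>/dev/null; then
        echo "cannot signal process $pid (no such process or no permission)"
        return 1
    fi
    while [ "$timeout" -gt 0 ]; do
        # An empty state means the PID is gone; Z means it has exited and
        # is only waiting for its parent to collect the exit status.
        state=$(ps -o stat= -p "$pid" 2>/dev/null | tr -d ' ')
        case "$state" in
            ''|Z*) echo "process $pid exited"; return 0 ;;
        esac
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "process $pid is still running after SIGTERM"
    return 2
}
```

For example, `terminate_pid 23` corresponds to the top steps above. If the process is still running after the wait, confirm that it is safe to force termination before you consider a stronger signal, because a process that ignores SIGTERM gets no chance to clean up when it is killed.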

Case 2: Low CPU usage but high load

Problem description

No business programs are running on the Linux instance. The top command shows that the CPU is idle, but the load average is very high.

Solution

The load average reflects the length of the task queue: a higher value indicates more tasks waiting to be executed. On Linux, the load average counts not only runnable tasks but also tasks in uninterruptible sleep, so the load can be high while the CPU is idle. This situation is often caused by processes stuck in the D state (uninterruptible sleep), typically while waiting for I/O. Run the ps -axjf command and check the STAT column for processes in the D state, for example D+. Processes in this state cannot be terminated by signals and do not exit on their own. The problem can be resolved only by restoring the resources that the processes depend on or by restarting the system.
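The check described above can be narrowed down so that only D-state processes are listed. The following is a minimal sketch that assumes the procps version of ps. The function names d_state and load_per_core are hypothetical helpers introduced for illustration. The wchan column shows the kernel function in which a process is sleeping, which can hint at the resource it is waiting for.

```shell
# d_state: filter the output of `ps -eo pid,ppid,stat,wchan:32,comm` read
# from stdin. Keeps the header line and every row whose STAT (column 3)
# starts with D, that is, uninterruptible sleep.
d_state() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# load_per_core: print the 1-minute load average divided by the number
# of CPU cores, so that loads on different instance sizes are comparable.
load_per_core() {
    awk -v cores="$(nproc)" '{ printf "%.2f\n", $1 / cores }' /proc/loadavg
}
```

For example, run `ps -eo pid,ppid,stat,wchan:32,comm | d_state` to list the blocked processes. A `load_per_core` result that stays well above 1 while top reports an idle CPU is consistent with the situation in this case.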

Case 3: The kswapd0 process consumes a high amount of CPU

The operating system uses a paging mechanism to manage physical memory and sets aside part of the disk as virtual memory (swap space). Because memory is much faster than disk, the operating system swaps pages that are not currently needed out to disk and swaps required pages back into memory according to its page replacement policy. When memory is continuously insufficient, this swapping continues without pause.

kswapd0 is the kernel thread that performs page swapping in virtual memory management. When the server runs short of memory, kswapd0 swaps pages, which consumes CPU resources. If the top command shows that kswapd0 stays in a non-sleep state and has a long run time, you can preliminarily conclude that the system is continuously swapping pages and shift troubleshooting to the cause of the memory shortage.

Problem description

The kswapd0 process occupies a large amount of CPU resources in the system.

Solution

  1. Run the following command to view the kswapd0 process:
    top
    If the kswapd0 process stays in a non-sleep state, has a long run time, and continuously consumes a large amount of CPU, the system is usually swapping pages continuously.

  2. Run the free and ps commands to check the memory usage of the system and of individual processes for further troubleshooting.
  3. To relieve the memory shortage, restart some services to release memory.
    Tip: In the long run, upgrade the instance to a larger memory size.
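The memory check in step 2 can be sketched as follows. This is an illustration rather than a definitive procedure: the function name mem_summary is a hypothetical helper, and it reads text in the /proc/meminfo format from stdin. Older kernels do not report the MemAvailable field, so in that case the sketch falls back to MemFree + Buffers + Cached as a rough estimate.

```shell
# mem_summary: read /proc/meminfo-formatted text on stdin and print total
# memory, available memory, and swap in use, all in MiB (rounded down).
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree = $2 }
        END {
            # Fallback for kernels that do not report MemAvailable.
            if (avail == "") avail = free + buffers + cached
            printf "mem_total_mib=%d mem_available_mib=%d swap_used_mib=%d\n",
                   total / 1024, avail / 1024, (stotal - sfree) / 1024
        }
    '
}
```

For example, run `mem_summary < /proc/meminfo` to get a one-line summary, and then run `ps -eo pid,rss,pmem,comm --sort=-rss | head -n 6` to list the five processes with the largest resident memory. A small available value together with a busy kswapd0 process points to the memory shortage described in this case.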

Applicable scope

  • Elastic Compute Service (ECS)