Query and case analysis of CPU load in Linux

Last Updated: Sep 22, 2020

Disclaimer: This article may contain information about third-party products. Such information is for reference only. Alibaba Cloud does not make any guarantee, express or implied, with respect to the performance and reliability of third-party products, as well as potential impacts of operations on the products.

 

Overview

This article describes how to query and analyze the CPU load in Linux.

 

Description

Note: The configurations and descriptions in this article have been tested in the CentOS 6.5 64-bit operating system. The configurations of other releases may be different. For more information, see the official documentation of the corresponding releases.

If the CPU usage of a Linux-based ECS instance remains high, system stability and business operations are affected. Follow these steps to troubleshoot.

  1. Locate the problem. Find the processes that cause the high CPU usage.
  2. Analyze and handle the processes. Check whether the processes that cause the high CPU usage are normal, and handle them by type.

    • For normal processes, optimize the program or upgrade the server configuration.
    • For abnormal processes, terminate them manually or by using third-party security tools.

 

Query and analysis of CPU load

The following commands are commonly used to view processes and system load in Linux. This article introduces vmstat and top.

  • vmstat
  • top
  • ps aux
  • ps -ef

 

Run vmstat

Run the vmstat command to view CPU usage from the system perspective. The following command refreshes the result once per second. The -n option prints the header only once.

vmstat -n 1

An example is as follows:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 2684984 310452 2364304    0    0     5    17   19   35  4  2 94  0  0
 0  0      0 2687504 310452 2362268    0    0     0   252 1942 4326  5  2 93  0  0
 0  0      0 2687356 310460 2362252    0    0     0    68 1891 4449  3  2 95  0  0
 0  0      0 2687252 310460 2362256    0    0     0     0 1906 4616  4  1 95  0  0

Note: The primary data columns in the returned results are described as follows.

  • r: the number of runnable threads, that is, threads that are running or waiting for CPU time. A CPU core can run only one thread at a time, so a value that stays above the number of cores generally indicates that the system is slowed down by CPU contention.
  • us: the percentage of CPU time consumed in user mode. A large value indicates that user processes consume a large amount of CPU time. If the value exceeds 50% for a long time, consider optimizing the program algorithm or code.
  • sy: the percentage of CPU time consumed in kernel mode.
  • wa: the percentage of CPU time spent waiting for I/O. A high value indicates heavy I/O wait. This may be caused by a large amount of random disk access or by a disk performance bottleneck.
  • id: the percentage of CPU time in the idle state. If the value is always 0 and sy is twice the value of us, the system is generally short of CPU resources.
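The rules of thumb above can be applied to vmstat output automatically. The following is a minimal sketch, not part of the original procedure: check_vmstat is a hypothetical helper name, and the 20% threshold for wa is an assumption chosen for illustration, because the article gives no number for it. It assumes the 17-column layout shown in the example.

```shell
# check_vmstat: read `vmstat -n 1` style output on stdin and label each
# sample according to the rules of thumb described above.
# Assumptions: the 17-column layout shown in the example; the wa > 20
# threshold is illustrative only.
check_vmstat() {
    awk '
        # Skip the two header lines; data lines start with a number.
        $1 !~ /^[0-9]+$/ { next }
        {
            r = $1 + 0; us = $13 + 0; sy = $14 + 0; id = $15 + 0; wa = $16 + 0
            msg = ""
            if (us > 50)                 msg = msg " high-user-cpu"
            if (wa > 20)                 msg = msg " high-io-wait"
            if (id == 0 && sy >= 2 * us) msg = msg " cpu-shortage"
            if (msg == "")               msg = " ok"
            printf "r=%d us=%d sy=%d id=%d wa=%d:%s\n", r, us, sy, id, wa, msg
        }
    '
}
```

For example, `vmstat -n 1 5 | check_vmstat` labels five samples. The first sample reported by vmstat is an average since boot, so later samples are usually more informative.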

 

Run top

  1. Log on to a Linux instance. For more information about how to log on to a Linux instance, see Connect to a Linux instance by using a management terminal.
  2. Run the following command to view the usage of CPU, memory, and other resources for each process.
    top
    A similar output is displayed.
    top - 17:27:13 up 27 days,  3:13,  1 user,  load average: 0.02, 0.03, 0.05
    Tasks: 94 total, 1 running, 93 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.3 us, 0.1 sy, 0.0 ni, 99.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.1 st
    KiB Mem: 1016656 total, 946628 used, 70028 free, 169536 buffers
    KiB Swap: 0 total, 0 used, 0 free.   448644 cached Mem
    PID USER PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
      1 root 20  0 41412 3824 2308 S  0.0  0.4 0:19.01 systemd
      2 root 20  0     0    0    0 S  0.0  0.0 0:00.04 kthreadd
  3. To address load issues, you only need to check the first and third lines of the output.
    1. The first line, 17:27:13 up 27 days, 3:13, 1 user, load average: 0.02, 0.03, 0.05, shows the current system time, how long the system has been running, the number of users currently logged on, and the system load averages for the last 1, 5, and 15 minutes. This is the same information that the uptime command returns.
    2. The third line shows the overall usage of CPU resources. The resource usage of each process is displayed below it.
  4. Press P (uppercase) to sort processes by CPU usage and locate the processes that consume the most CPU.
    Tip: Press M (uppercase) to sort processes by memory usage. On a multi-core system, press the numeric key 1 to show the load of each CPU core.
  5. Run the following command to view the program file that corresponds to a process ID. Replace PID with the ID of the process.
    ls -l /proc/PID/exe
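The steps above use the interactive top interface. For scripts, the same information can be collected non-interactively. The following is a hedged sketch under stated assumptions: load_averages and top_cpu_processes are hypothetical helper names, the load parsing assumes a locale that uses a period as the decimal separator, and top_cpu_processes assumes the procps version of ps and a Linux /proc file system.

```shell
# load_averages: extract the 1-, 5- and 15-minute load averages from the
# first line of top (or the output of uptime) read on stdin.
load_averages() {
    sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# top_cpu_processes: list the N processes with the highest %CPU (default 5)
# together with the executable that /proc/PID/exe points to.
# Kernel threads, and processes whose link cannot be read, show "-".
top_cpu_processes() {
    ps -eo pid,pcpu,comm --sort=-pcpu | sed 1d | head -n "${1:-5}" |
    while read -r pid pcpu comm; do
        exe=$(readlink "/proc/$pid/exe" 2>/dev/null)
        printf '%s %s %s %s\n' "$pid" "$pcpu" "$comm" "${exe:--}"
    done
}
```

For example, `uptime | load_averages` prints the three load values, and `top_cpu_processes 3` prints the three busiest processes. Reading /proc/PID/exe for processes owned by other users generally requires root privileges.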

 

Operation case

Case 1: Use the top command to terminate processes that consume a large amount of CPU resources

Run the top command to view the system load, locate the processes that consume a large amount of CPU resources, and quickly terminate the abnormal processes on the running interface.

  1. To stop a process, press the lowercase k key.
  2. Enter the PID of the process to be terminated. The default is the first PID in the output. For example, to stop the process with PID 23, enter 23 and press Enter.
  3. The interface then displays Send pid 23 signal [15/sigterm] and waits for confirmation. Press Enter to confirm and send the signal.
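Outside the top interface, the same SIGTERM (signal 15) can be sent with the kill command. The following is a minimal sketch, assuming you have already identified the PID. terminate_pid is a hypothetical helper name, and the 5-second default wait is an arbitrary choice.

```shell
# terminate_pid PID [SECONDS]: send SIGTERM to PID, then poll for up to
# SECONDS (default 5) and report whether the process has exited.
# Returns 0 if the process exited, 1 if the signal could not be sent,
# 2 if the process is still present after the wait.
terminate_pid() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || { echo "no such process: $pid"; return 1; }
    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || { echo "terminated: $pid"; return 0; }
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "still running: $pid"
    return 2
}
```

For example, `terminate_pid 23` mirrors the confirmation step in top. If the process is still running afterwards, check why it ignores SIGTERM before you escalate to a forced kill.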

 

Case 2: Low CPU usage but high load

Problem description

No services are running on the Linux system. The top command shows that the CPU is idle, but the load average is very high.

 

Solution

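The load average measures the length of the task queue: the average number of tasks that are running, waiting to run, or blocked in uninterruptible sleep. A higher value indicates a longer queue with more tasks waiting. Because tasks in uninterruptible sleep count toward the load without using CPU time, a high load on an idle CPU may be caused by stuck (deadlocked) processes. Run the ps -axjf command to check whether any process is in the D or D+ state. A process in this state cannot be killed and does not exit on its own. You can only restore the resources that it depends on or restart the system.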
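The check for processes in the D state can be narrowed down with a filter. The following is a minimal sketch: filter_dstate is a hypothetical helper name, and it assumes input in which the second column is the process state, as produced by the procps command `ps -eo pid,stat,wchan:32,comm`.

```shell
# filter_dstate: read ps output on stdin (PID in column 1, STAT in column 2)
# and print only the rows whose state starts with D (uninterruptible sleep).
# The header row is dropped because "STAT" does not start with D.
filter_dstate() {
    awk '$2 ~ /^D/'
}
```

For example, `ps -eo pid,stat,wchan:32,comm | filter_dstate` lists the blocked processes. The wchan column shows the kernel function in which each process is sleeping, which can hint at the resource it is waiting for.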

 

Case 3: The kswapd0 process occupies a large amount of CPU

The operating system uses paging to manage physical memory and allocates part of a disk as virtual memory (swap space). Because memory is much faster than disk, the operating system swaps pages that are not currently needed out to disk and loads the required pages into memory. When memory is insufficient, this page swapping runs continuously. kswapd0 is the kernel process that performs page swapping in virtual memory management. When the server is short of memory, kswapd0 keeps swapping pages, which consumes a large amount of CPU resources. If the top command shows that the process stays in a non-sleep state for a long time, you can initially conclude that the system is continuously swapping pages, and then troubleshoot the cause of the memory shortage.

 

Problem description

The kswapd0 process occupies a large amount of CPU resources.

 

Solution

  1. Run the following command to view the kswapd0 process.
    top
    If the output shows that the kswapd0 process stays in a non-sleep state, runs for a long time, and occupies a large amount of CPU resources, the system is generally swapping pages continuously.
  2. Run the free and ps commands to check the memory usage of the system and of individual processes for further troubleshooting and analysis.
  3. Restart some services to release memory.
    Tip: In the long run, consider increasing the memory of the instance.
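To quantify memory pressure in step 2, the share of memory in use can be computed from /proc/meminfo. The following is a minimal sketch: mem_used_percent is a hypothetical helper name, and it assumes the MemTotal, MemFree, Buffers, and Cached fields. It treats buffers and page cache as reclaimable, which is an approximation.

```shell
# mem_used_percent: read /proc/meminfo-style input on stdin and print the
# percentage of memory in use, not counting buffers and page cache.
# The result is truncated to an integer.
mem_used_percent() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { mfree   = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            if (total > 0)
                printf "%d\n", (total - mfree - buffers - cached) * 100 / total
        }
    '
}
```

For example, `mem_used_percent < /proc/meminfo` prints the current value. To see which processes hold the most resident memory, `ps -eo pid,rss,comm --sort=-rss | head -n 6` lists the top five with the procps version of ps.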

 

Application scope

  • ECS