Troubleshoot high CPU utilization or load on a Linux instance

Last Updated: Mar 25, 2026

Symptoms

High CPU utilization or system load can cause the following symptoms:

  1. Application service issues

    • SSH remote connections are slow or unresponsive, and may fail in severe cases.

    • The response time for your website or application increases significantly, and pages load slowly.

    • Requests frequently time out, API calls fail, and the overall processing capacity of your services is noticeably reduced.

  2. System resource anomalies

    • The instance CPU utilization consistently exceeds 80%, or even approaches 100%.

    • The system load (load average) is consistently higher than the number of logical CPU cores. For example, the load average is greater than 4 on a 4-core instance.

    • CloudMonitor has triggered alerts for high system load by SMS or email.
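The load-versus-cores comparison above can be checked with a small helper. This is a minimal sketch, not part of any official tooling: the function name load_per_core is hypothetical, and it assumes a Linux system that exposes /proc/loadavg and provides nproc.

```shell
# Print the 1-minute load average divided by the number of logical cores.
# Both arguments are optional; they default to the live values on this host.
load_per_core() {
  load=${1:-$(cut -d ' ' -f 1 /proc/loadavg)}
  cores=${2:-$(nproc)}
  awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}
```

Calling load_per_core with no arguments reads the current values. A result that stays above 1.00 corresponds to the symptom described above, for example a load average greater than 4 on a 4-core instance.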

Causes

  • CPU-intensive processes: Application processes consume most of the CPU. Common causes are flawed application logic (such as infinite loops), complex computational tasks, and high-concurrency requests, which can include both legitimate traffic and malicious attacks.

  • I/O performance bottleneck: Frequent disk read/write operations or insufficient storage performance cause processes to spend a long time waiting for I/O, which increases the system load average.

  • Kernel or system calls: Frequent context switches, kernel tasks, or driver issues lead to high CPU utilization in kernel mode.

  • Abnormal or malicious programs: The instance is infected with crypto-mining software, a Trojan horse, or a rootkit with hidden processes that consume a large amount of computing resources.

Resolution

Use the top tool to identify the primary cause of high CPU utilization or system load, such as user space, kernel space, or I/O wait. Then, use tools like perf, iotop, or vmstat to perform a deeper analysis and resolve the issue.

Step 1: Identify the CPU bottleneck

  1. Log on to an ECS instance using a VNC connection.

    1. Go to ECS console - Instances. In the top navigation bar, select the target region and resource group.

    2. Go to the details page of the target instance. Click Connect and select VNC. Enter the username and password to log on to the ECS instance.

  2. Check the system load and process activity.

    sudo top

  3. Identify the cause of the problem.

    In the top interactive interface, press P to sort processes by CPU utilization in descending order. Identify the process ID (PID) and command name (COMMAND) of the process that consumes the most CPU.

    • If an application process such as java, python, or php-fpm consistently uses more than 80% of the CPU, see Handle busy application processes.

    • If the I/O wait (wa) value in the %Cpu(s) line is consistently above 20%, while user space (us) and kernel space (sy) values are low, and the load average is much higher than the number of CPU cores, it indicates that the CPU is spending a lot of time idle while waiting for disk I/O. In this case, see Handle disk I/O bottlenecks.

      When a process waits for a disk I/O operation to complete, it enters an uninterruptible sleep state (D state) and cannot be terminated, even by SIGKILL, until the operation returns. A large number of processes in the D state indicates a slow disk response, which forces the CPU to wait and increases the system load.

    • If the system (sy) value in the %Cpu(s) line is consistently above 30%, it usually indicates that the kernel is frequently executing system calls or handling interrupts. In this case, see Handle busy kernel or system calls.

    • If the softirq (si) value in the %Cpu(s) line is consistently above 15%, it usually indicates that the kernel is busy processing network packets because of high network traffic. In this case, see Handle busy network interrupts.
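The threshold checks in this step can also be applied to non-interactive top output. The following is a rough sketch under stated assumptions: the function name classify_cpu_line and its output labels are hypothetical, it expects the procps-ng "%Cpu(s): 1.2 us, 0.5 sy, ..." layout with a C-locale decimal point, and it simplifies the I/O wait rule to the wa threshold alone without checking the load average.

```shell
# Read top batch output on stdin and classify the LAST "%Cpu(s)" line
# using the thresholds from this step (wa > 20, sy > 30, si > 15).
classify_cpu_line() {
  awk '
    /Cpu\(s\)/ {
      sub(/:/, ": ")                     # some top builds print "%Cpu(s):100.0 us"
      for (i = 2; i < NF; i++) {         # fields come as "value label," pairs
        label = $(i + 1); gsub(/,/, "", label)
        if (label ~ /^(us|sy|wa|si)$/) v[label] = $i + 0
      }
      seen = 1
    }
    END {
      if (!seen) exit 1                  # no %Cpu(s) line found in the input
      if (v["wa"] > 20) print "io-wait"
      else if (v["sy"] > 30) print "kernel"
      else if (v["si"] > 15) print "softirq"
      else print "user-or-idle"
    }'
}
```

A possible invocation is LC_ALL=C top -b -n 2 -d 1 | classify_cpu_line. Two frames are requested because the first frame may reflect averages since boot on some top versions, and the function keeps only the last frame. A single sample is not "consistently above" a threshold, so repeat the check over time before drawing conclusions.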

Step 2: Analyze and resolve the issue

Busy application processes

  • Analyze and optimize your code:

    Use performance profiling tools to locate hot spots in your code.

    • Java applications: Use jstack <PID> to export thread stacks. Search for threads in the RUNNABLE state and check if the call stack is stuck in a specific method for a long time.

    • C/C++ applications: Use perf top -p <PID> to view the specific function symbols that are consuming the most CPU.

    Based on the analysis, optimize algorithms, fix infinite loops, or reduce unnecessary computations.

  • Mitigate application-layer attacks: If you are experiencing a malicious application-layer CC attack, which is characterized by a large number of unusual HTTP requests, deploy a web application firewall (WAF) for protection. For more information, see Protect an ECS instance from CC attacks by using WAF.

  • Upgrade resources: If the bottleneck is caused by normal business growth, upgrade the instance type.
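For the Java analysis described above, it often helps to map the busiest operating-system thread to its entry in the jstack output. The sketch below assumes a JDK whose thread dump prints the native thread ID as a hexadecimal nid= field. The helper name tid_to_nid and the PID and TID variables are placeholders for illustration.

```shell
# Convert a decimal Linux thread ID (TID) to the hex form used by the nid= field.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Hypothetical usage; PID and TID are values you supply:
#   top -H -b -n 1 -p "$PID" | head -n 20     # per-thread CPU usage of one process
#   jstack "$PID" | grep -w -A 20 "nid=$(tid_to_nid "$TID")"
```

The first command lists threads of the process with their CPU usage. The second prints the stack of the chosen thread, which shows the method it is currently executing.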

Disk I/O bottlenecks

  1. Identify the process that causes high I/O. For more information, see Troubleshoot high disk I/O load on a Linux system.

  2. Check for a buildup of processes in the D state:

    sudo ps -axjf | grep " D"

    • Application optimization: Reduce logging levels and add indexes to database queries to decrease disk read/write operations.

    • Upgrade storage: You can upgrade the cloud disk category (for example, from ESSD PL1 to ESSD PL2/PL3) to improve IOPS and throughput. The maximum IOPS of a cloud disk is also constrained by the attached instance type. If the IOPS limit of the instance type is lower than the capability of the cloud disk, you must upgrade the instance type.

    • Restart the system: A system restart can clear a buildup of processes in the D state.
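A plain text match on " D" can also hit command lines that happen to contain that string. As a stricter alternative, the sketch below filters on the STAT column itself. The function name filter_d_state is hypothetical, and the suggested ps column list is one reasonable choice, not the only one.

```shell
# Keep the header row plus rows whose second column (STAT) starts with "D".
# Expects ps output with STAT as the second column on stdin.
filter_d_state() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical usage (wchan shows the kernel function the process sleeps in):
#   ps -eo pid,stat,wchan:32,comm | filter_d_state
```

If the same PIDs stay in the list across several runs, they are likely blocked on slow storage and not just passing briefly through the D state.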

Busy kernel or system calls

  1. Check context switches: Run the vmstat 1 command and observe the value in the cs (context switch) column, which reports context switches per second. If this value consistently exceeds 100,000, the rate of context switches is excessive. Check whether your application is creating and destroying too many threads.

  2. Check kernel tasks: If the kswapd0 process shows high CPU utilization, the physical memory is insufficient and the kernel is frequently reclaiming memory. To resolve this, upgrade to an instance type with more memory.

    When physical memory is low, kswapd0 frequently scans, reclaims, and swaps out memory pages. These compute-intensive tasks consume significant CPU resources and cause high CPU utilization.
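To judge whether the cs value is "consistently" high instead of reading it by eye, the samples can be averaged. This is a minimal sketch: the function name avg_context_switches is hypothetical, and it assumes the standard vmstat header row that begins with r and contains a cs column.

```shell
# Average the cs column of vmstat output read from stdin.
# The first data row is skipped because it reports averages since boot.
avg_context_switches() {
  awk '
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "cs") col = i; next }
    col && $1 ~ /^[0-9]+$/ {
      if (++rows == 1) next
      sum += $col; n++
    }
    END { if (n) printf "%d\n", sum / n }'
}

# Hypothetical usage: eleven 1-second samples, of which ten are averaged
#   vmstat 1 11 | avg_context_switches
```

An average above the 100,000 threshold mentioned above suggests that the context switch rate is worth investigating.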

Busy network interrupts

  1. Analyze traffic: Use tools such as iftop or iptraf-ng to analyze the source and type of network traffic.

  2. Check the configuration: For high network workloads, you can enable multi-queue for the network interface card (NIC) to distribute interrupts across multiple CPU cores.

  3. Respond to network attacks:
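The multi-queue check in step 2 can be sketched with ethtool. The helper below only parses the output of ethtool -l. Its name combined_queues is hypothetical, eth0 is an assumed interface name, and the parsing assumes the driver reports two numeric Combined: lines (pre-set maximum first, current setting second).

```shell
# Read "ethtool -l <iface>" output on stdin and print the maximum and
# current number of combined queues.
combined_queues() {
  awk '/^Combined:/ { v[++n] = $2 }
       END { if (n == 2) print "max=" v[1], "current=" v[2]; else exit 1 }'
}

# Hypothetical usage; replace eth0 with your interface name:
#   ethtool -l eth0 | combined_queues
#   sudo ethtool -L eth0 combined 4    # raise the current value, up to the maximum
```

If the current value is lower than the maximum, raising it lets the NIC spread interrupt handling across more CPU cores, provided that the instance type and driver support multi-queue.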

Next steps

  • Configure monitoring and alerting: Set reasonable alert thresholds for metrics such as CPU utilization, system load, and I/O wait to receive early warnings. For historical analysis, you can use the atop tool to record and review Linux system metrics.

  • Perform regular security checks: Use Security Center to periodically perform vulnerability scans, virus scans, and baseline checks on your hosts to fix potential security risks.

  • Perform regular reviews and optimizations: Periodically review the performance and code of your systems and applications to identify and resolve potential performance bottlenecks.

  • Capacity planning: Plan resource capacity based on business growth trends to ensure your system can handle future load.