The average system load is high after the Server Guard process is started on an ECS instance that runs Ubuntu 18.04

Last Updated: May 16, 2022

Disclaimer: This document may contain information about third-party products. This information is for reference only. Alibaba Cloud does not make a guarantee in any form of the performance and reliability of the third-party products, and potential impacts of operations on these products.

Problem description

On an ECS instance that runs Ubuntu 18.04, the average system load is high after the Server Guard process (AliYunDun) is started. After you stop the Server Guard process (AliYunDun), the average system load returns to normal.
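
To observe the symptom, you can compare the load averages with the state of the Server Guard process. The following is a minimal sketch, not an official diagnostic: the function names show_load and guard_status are hypothetical, and it assumes that the process is visible under the name AliYunDun, as given above.

```shell
# Print the 1-, 5-, and 15-minute load averages
# (the first three fields of /proc/loadavg).
show_load() {
    read -r one five fifteen rest < /proc/loadavg
    echo "load average: 1min=$one 5min=$five 15min=$fifteen"
}

# Report whether a process named exactly AliYunDun is running
# (assumption: this is the name that the process list shows).
guard_status() {
    if pgrep -x AliYunDun > /dev/null 2>&1; then
        echo "running"
    else
        echo "stopped"
    fi
}

show_load
echo "AliYunDun: $(guard_status)"
```

Run it once while the process is running and again a few minutes after the process is stopped, because the load averages change gradually.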

Cause

In some Ubuntu kernel versions, the CPU usage calculation method for the nanosleep() and usleep() functions is defective. When a process calls either function, the reported average system load is higher than it should be. The kernel versions currently known to have this defect are:

  • 4.15.0-72
  • 4.15.0-74
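
To check whether an instance runs one of the kernel versions listed above, you can compare the running kernel release against the list. The following is a minimal sketch: the function name kernel_status is hypothetical, and it assumes that uname -r reports the release with a flavor suffix such as -generic. A kernel that is not in the list may still have the defect.

```shell
# Print "listed" if the kernel release in $1 is one of the known affected
# versions from the list above, otherwise "not-listed".
kernel_status() {
    case "$1" in
        4.15.0-72|4.15.0-72-*|4.15.0-74|4.15.0-74-*) echo "listed" ;;
        *) echo "not-listed" ;;
    esac
}

kernel="$(uname -r)"
echo "$kernel: $(kernel_status "$kernel")"
```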

Server Guard calls the nanosleep() and usleep() functions. Therefore, after the Server Guard process is started, the average system load on the ECS instance is high. Any Ubuntu system that uses one of the kernel versions listed above, such as Ubuntu 18.04, has the same defect. For more information about this defect, see the following documents:

Solution

Take note of the following items:

  • Before you perform high-risk operations such as modifying the specifications or data of an Alibaba Cloud instance, we recommend that you check the disaster recovery and fault tolerance capabilities of the instance to ensure data security.
  • Before you modify the specifications or data of an Alibaba Cloud instance, such as an Elastic Compute Service (ECS) instance or an ApsaraDB RDS instance, we recommend that you create snapshots or enable backups for the instance. For example, you can enable log backups for an ApsaraDB RDS instance.
  • If you have authorized or submitted security information such as logon credentials on the Alibaba Cloud platform, we recommend that you modify it in a timely manner.

This defect is not necessarily limited to the known kernel versions listed above. Other kernel versions may also have it. To confirm whether the Server Guard process is affected by this defect, run the test program on your Ubuntu instance and proceed based on the test results, as follows:

Note: To ensure the accuracy of testing and avoid affecting other service programs, we recommend that you generate a custom image for your Ubuntu instance and use the custom image to create a temporary ECS instance (pay-as-you-go) for testing. Release the temporary ECS instance after the test is complete. For more information about how to create a custom image, see Create a custom image.

  1. Download the test program test_high_cpu_load.zip.
    Note: The MD5 value of test_high_cpu_load.zip is 1795f7825c4aad6d466287c0ca11d05d.
  2. Decompress the package and go to the test_high_cpu_load folder. Confirm that the following two files exist:
    • test_high_cpu_load_x64
      The test program for 64-bit systems.
    • test_high_cpu_load.cpp
      The source code of the test program. If your Ubuntu instance is a 32-bit system, you can use gcc to compile this file and run the compiled 32-bit program for testing.
  3. Connect to the Ubuntu instance remotely. For more information, see Overview.
  4. Upload the test program that matches the architecture (32-bit or 64-bit) of the Ubuntu instance.
    Note: You can run the getconf LONG_BIT command to check whether the system is 32-bit or 64-bit.
  5. Run the following command to grant execute permission on the test program:
    chmod +x test_high_cpu_load_x64
    Note: In this example, the 64-bit test program is used. If your Ubuntu instance is a 32-bit system, replace the file name with that of your compiled 32-bit program.
  6. Run the test program. While it is running, open a new terminal window, run the top command, observe the output for 2 to 3 minutes, and record the average system load.
  7. Proceed based on the test results:
    • The average system load is greater than 0.5
      The test program continuously calls the usleep() function. If the average system load is greater than 0.5, the kernel version of the instance has the kernel defect described in this article, and the high load that you see with the Server Guard process is also caused by this defect. You can use one of the following solutions:
      • Change the Ubuntu version (recommended)
        You can use another version of Ubuntu, such as Ubuntu 16. For more information about how to change the operating system of an ECS instance, see Change the operating system.
      • Wait for Ubuntu to fix the defect
        Ubuntu will fix the defect in the future. After the fix is released, you can upgrade your system to resolve this issue.
      • Ignore this issue for the time being
        This issue is caused by a defect in the CPU usage calculation method and does not affect you. You can choose to ignore it.
      • Recompile and upgrade the system kernel
        Warning: This solution is risky and not recommended. Proceed with caution, and make sure that the ECS instance is backed up by using snapshots before you start.
        If you have experience with kernel upgrades and need to resolve this issue urgently, you can apply patches, recompile the kernel, and upgrade the system kernel. For more information, see Solutions provided by third-party communities.
    • The average system load is less than 0.5
      If the average system load is less than 0.5 but the CPU usage of the Server Guard process is high, record the system load information and submit a ticket to contact Alibaba Cloud technical support.
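
The steps above can be sketched as a small shell script. This is an illustrative sketch under stated assumptions, not an official tool. The file names, the MD5 value, and the 0.5 threshold come from the steps above. The function names are hypothetical. The script also assumes that the package is in the current directory, that decompressing it creates the test_high_cpu_load folder, and that the test program takes no arguments and keeps running until it is stopped.

```shell
#!/bin/sh
EXPECTED_MD5="1795f7825c4aad6d466287c0ca11d05d"

# Step 1: succeed only if the MD5 digest of file $1 equals $2.
md5_matches() {
    actual="$(md5sum "$1" | awk '{print $1}')"
    [ "$actual" = "$2" ]
}

# Step 7: print "above" if the load average in $1 is greater than 0.5,
# otherwise "not-above".
classify_load() {
    awk -v load="$1" 'BEGIN {
        if (load + 0 > 0.5) print "above"; else print "not-above"
    }'
}

# Steps 2 to 6 on a 64-bit instance (hypothetical helper).
run_load_test() {
    md5_matches test_high_cpu_load.zip "$EXPECTED_MD5" || {
        echo "MD5 mismatch: download the package again" >&2
        return 1
    }
    [ "$(getconf LONG_BIT)" = "64" ] || {
        echo "32-bit system: compile test_high_cpu_load.cpp instead" >&2
        return 1
    }
    unzip -o test_high_cpu_load.zip || return 1
    chmod +x test_high_cpu_load/test_high_cpu_load_x64
    ./test_high_cpu_load/test_high_cpu_load_x64 &
    test_pid=$!
    sleep 180                                  # observe for 3 minutes
    load1="$(cut -d ' ' -f 1 /proc/loadavg)"   # 1-minute load average
    kill "$test_pid" 2>/dev/null
    echo "1-minute load average: $load1 ($(classify_load "$load1"))"
}
```

On the temporary test instance, call run_load_test after you upload the package. A result of "above" corresponds to the first case in step 7, and "not-above" corresponds to the second case. The steps above do not say how to treat a value of exactly 0.5; this sketch reports it as "not-above".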

Applicable scope

  • Elastic Compute Service (ECS)
  • Server Guard