
The system load is high after the Server Guard process is started on an Ubuntu 18.04 ECS instance

Last Updated: Jun 30, 2020

Disclaimer: this document may contain information about third-party products that are for reference only. Alibaba Cloud does not make any guarantee, express or implied, with respect to the performance and reliability of third-party products, as well as potential impacts of operations on the products.

Description

After the Server Guard process (AliYunDun) is started on an Ubuntu 18.04 ECS instance, the average system load is high. After the Server Guard process (AliYunDun) is stopped, the average system load returns to normal.

Causes

In some kernel versions of Ubuntu systems, the CPU usage calculation for the nanosleep() and usleep() functions is defective. When a process calls these functions, the average system load is high. Known kernel versions that contain this defect are as follows:

  • 4.15.0-72
  • 4.15.0-74
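
As a quick first check, the running kernel can be compared against the two versions listed above. The following is a minimal sketch: the function name is_known_affected_kernel is illustrative and not part of any Alibaba Cloud tool, and it only matches the versions named in this document, so a negative result does not prove that a kernel is free of the defect.

```shell
#!/bin/sh
# Report whether a kernel release string matches one of the kernel versions
# listed in this document. Other kernel versions may also be affected.
is_known_affected_kernel() {
    # Default to the running kernel when no argument is given.
    release="${1:-$(uname -r)}"
    case "$release" in
        # Match the bare version or the version plus a flavor suffix,
        # for example 4.15.0-72-generic.
        4.15.0-72|4.15.0-72-*|4.15.0-74|4.15.0-74-*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_known_affected_kernel; then
    echo "Kernel $(uname -r) is on the known-affected list."
else
    echo "Kernel $(uname -r) is not on the known-affected list; it may still be affected."
fi
```

If the kernel is not on the list, the test program described in the Workaround section is still the more reliable check.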

Server Guard calls the preceding functions. Therefore, after the Server Guard process is started, the average system load on the ECS instance is high. Any Ubuntu system that runs one of the preceding kernel versions, such as Ubuntu 18.04, has the same defect. For more information about this defect, see the following documentation:

Workaround

Alibaba Cloud reminds you that:

  • When you perform operations that have risks, such as modifying instances or data, check the disaster recovery and fault tolerance capabilities of the instances to ensure data security.
  • Before you modify the configurations and data of instances including but not limited to ECS and RDS instances, we recommend that you create snapshots or enable RDS log backup.
  • If you have authorized or submitted security information such as the logon account and password in the Alibaba Cloud Management console, we recommend that you modify such information in a timely manner.

The kernel versions listed above are known to contain this defect, and other kernel versions may also contain it. To check whether the Server Guard process is affected by this bug, run the test program on your Ubuntu instance and handle the issue based on the test result, as follows:

Note: To ensure the accuracy of the test and avoid affecting other service programs, we recommend that you create a custom image from your Ubuntu instance, create a temporary pay-as-you-go ECS instance from the custom image for testing, and release the temporary instance after the test is completed. For more information about how to create a custom image, see Create a custom image.

  1. Download the test program test_high_cpu_load.zip.
    Note: The MD5 value of test_high_cpu_load.zip is cost.
  2. Decompress the package and go to the test_high_cpu_load folder. Verify that the following two files exist.
    • test_high_cpu_load_x64
      Test program for 64-bit system.
    • test_high_cpu_load.cpp
      Source code of the test program. If your Ubuntu instance runs a 32-bit system, you can compile this file with gcc and then run the compiled 32-bit program for testing.

  3. Connect to the Ubuntu instance. For more information about connection methods, see Connection mode overview.
  4. Based on whether the Ubuntu instance runs a 32-bit or 64-bit system, upload the matching test program to the instance.
    Note: You can run getconf LONG_BIT to check the number of bits of the system.
  5. Run the following command to grant the execute permission to the test program:
    chmod +x test_high_cpu_load_x64
    Note: The 64-bit test program is used as an example. If your Ubuntu instance runs a 32-bit system, replace the file name with that of the 32-bit program.
  6. Run the test program. Open a new terminal window, run the top command, observe the statistics for 2 to 3 minutes, and record the average load.

  7. Based on the test results, perform the following operations:
    • The average system load is greater than 0.5
      The test program calls the usleep() function continuously. If the average system load is greater than 0.5, the instance kernel has the defect described above, and the high CPU usage of the Server Guard process is also caused by this kernel defect. You can use one of the following solutions:
      • Change the Ubuntu system version
        This solution is recommended. You can use another version of Ubuntu, such as Ubuntu 16. For more information about how to change the operating system of an ECS instance, see Replace the operating system.
      • Wait for Ubuntu to officially fix the defect
        After Ubuntu fixes this defect in a future release, you can resolve the issue by upgrading the system.
      • Ignore this issue for the time being
        This issue is caused by a bug in the CPU utilization calculation method and does not affect your business. You can choose to ignore this issue.
      • Recompile and upgrade the system kernel
        Warning: This solution is not recommended because it is high-risk. Proceed with caution, and be sure to back up the ECS instance by creating a snapshot before you perform the operation.
        If you have experience with kernel upgrades and need to solve this problem urgently, you can apply the patch, recompile the kernel, and upgrade the system kernel. For more information, see the solutions from third-party communities.

    • The average system load is less than 0.5
      If the average system load is less than 0.5 and the CPU usage of the Server Guard process is high, record the system load information and submit a ticket to contact Alibaba Cloud technical support.
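
Steps 4 to 7 can be scripted roughly as follows. This is a sketch under stated assumptions rather than an official tool: the function names are illustrative, the 32-bit binary name test_high_cpu_load_x86 is hypothetical (you would compile it yourself from test_high_cpu_load.cpp), and the load value is read from /proc/loadavg, which is assumed to report the same load average that top displays.

```shell
#!/bin/sh
# Sketch of steps 4 to 7. Function names are illustrative, and the 32-bit
# binary name (test_high_cpu_load_x86) is hypothetical.

# Choose the test binary from the system bit width (32 or 64).
pick_test_binary() {
    bits="${1:-$(getconf LONG_BIT)}"
    if [ "$bits" = "64" ]; then
        echo "test_high_cpu_load_x64"
    else
        echo "test_high_cpu_load_x86"
    fi
}

# Print the 1-minute load average, which is the first field of /proc/loadavg.
# An alternative file path can be passed for testing.
one_minute_load() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

# Compare a load value with the 0.5 threshold used in step 7.
classify_load() {
    awk -v load="$1" 'BEGIN {
        if (load + 0 > 0.5) print "kernel-defect-suspected"
        else print "contact-support-if-cpu-high"
    }'
}

# Example flow, left commented out because it runs for about three minutes:
#   bin="$(pick_test_binary)"
#   chmod +x "$bin"
#   "./$bin" &
#   test_pid=$!
#   sleep 180
#   load="$(one_minute_load)"
#   kill "$test_pid"
#   echo "1-minute load: $load ($(classify_load "$load"))"
```

The example flow samples the load once after three minutes; watching top for the same period, as step 6 describes, gives the same information with more context about which processes are running.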

Application scope

  • ECS
  • Server Guard