
Elastic Compute Service: High CPU utilization drill

Last Updated: Apr 14, 2025

To keep your business stable, CPU utilization must stay within a reasonable range. Excessively high CPU utilization can cause business latency or even interruption. This topic describes how to inject a high CPU utilization fault into an Elastic Compute Service (ECS) instance so that you can test how the business system responds to specific CPU loads, inspect its recovery capabilities, and verify that monitoring and alert mechanisms work. You can then develop response strategies based on the drill results so that the system quickly resumes normal operation when high CPU utilization occurs in the production environment, which reduces the risk of business interruption.

Implementation

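A high CPU utilization drill uses the ecs-fault-highcpu Cloud Assistant plugin, which starts the AliFaultHighCpu injection process. This process consumes CPU time slices at a specified duty cycle: it keeps the CPU busy for a fixed share of each period and leaves it idle for the rest.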
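The duty-cycle idea can be illustrated with a minimal sketch. This is not the AliFaultHighCpu implementation, which this topic does not describe. The function names (busy_ms, idle_ms, ms_to_seconds, burn) and the 100 ms period are assumptions made for illustration, and the script assumes GNU coreutils so that timeout and sleep accept fractional seconds.

```shell
#!/usr/bin/env bash
# Illustrative duty-cycle load generator. This is NOT the AliFaultHighCpu
# implementation; all function names here are hypothetical.
# Assumes GNU coreutils (fractional arguments for `timeout` and `sleep`).

# busy_ms PERCENT PERIOD_MS: milliseconds to spin in each period.
busy_ms() {
  echo $(( $2 * $1 / 100 ))
}

# idle_ms PERCENT PERIOD_MS: milliseconds to sleep in each period.
idle_ms() {
  echo $(( $2 - $2 * $1 / 100 ))
}

# ms_to_seconds MS: format milliseconds as seconds, e.g. 25 -> 0.025.
ms_to_seconds() {
  printf '%d.%03d\n' $(( $1 / 1000 )) $(( $1 % 1000 ))
}

# burn PERCENT SECONDS: load one CPU at roughly PERCENT for SECONDS.
burn() {
  local percent=$1 seconds=$2 period_ms=100
  local busy idle deadline
  busy=$(busy_ms "$percent" "$period_ms")
  idle=$(idle_ms "$percent" "$period_ms")
  deadline=$(( $(date +%s) + seconds ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Busy phase: spin in a tight loop. Skipped when busy is 0, because
    # `timeout 0` disables the timeout instead of expiring immediately.
    if [ "$busy" -gt 0 ]; then
      timeout "$(ms_to_seconds "$busy")" bash -c 'while :; do :; done'
    fi
    # Idle phase: give the CPU back for the rest of the period.
    sleep "$(ms_to_seconds "$idle")"
  done
}

# Example: hold one CPU at about 30% for 10 seconds.
# burn 30 10
```

Because each phase starts a new process, the utilization that this sketch produces is only approximate. It is meant to show the busy/idle split and is not a substitute for the plugin.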

Procedure

Prerequisites

Inject a fault

  1. As a user with sudo privileges, run the ecs-fault-highcpu Cloud Assistant plugin.

    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params inject,[cpu-percent=paramA],[cpu-list=paramB]

    The optional fault injection parameters are enclosed in brackets ([]).

    • cpu-percent: the target CPU utilization, in percent. If you leave this parameter empty, the default value 100 is used.

      Note

      The cpu-percent parameter specifies the CPU utilization of the injection process. The total CPU utilization is also affected by other processes.

    • cpu-list: the vCPUs to which the load is bound. For example, cpu-list=0-2/4 specifies that the load is bound to vCPU 0, vCPU 1, vCPU 2, and vCPU 4. If you leave this parameter empty, the load is bound to all vCPUs.

  2. Check whether the fault is injected.

    • On the ECS instance, run the top command. If the CPU utilization increases, the fault has been injected. The sum of the CPU utilization in kernel mode (sy) and in user mode (us) is approximately equal to the specified target CPU utilization.

    • View the CPU utilization chart provided by CloudMonitor. If the CPU utilization increases after the plugin is run, the fault has been injected.

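The two steps above can be sketched as a small script. The helper names build_inject_params and us_sy_percent are hypothetical. Only the acs-plugin-manager command line and its parameters come from this topic. The check reads /proc/stat on a Linux instance as a scriptable stand-in for reading us and sy in top, and the 5-second sampling interval in the usage comment is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Helper names below are hypothetical; only the acs-plugin-manager command
# line and its parameters come from this topic.

# build_inject_params [CPU_PERCENT] [CPU_LIST]
# Builds the value for --params, omitting options that are left empty.
build_inject_params() {
  local percent=${1:-} cpu_list=${2:-} params="inject"
  if [ -n "$percent" ]; then params="$params,cpu-percent=$percent"; fi
  if [ -n "$cpu_list" ]; then params="$params,cpu-list=$cpu_list"; fi
  echo "$params"
}

# us_sy_percent BEFORE AFTER
# BEFORE and AFTER are two samples of the first line of /proc/stat.
# Prints the share of time spent in user plus system mode between them.
us_sy_percent() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    # Fields: 1="cpu" 2=user 3=nice 4=system 5=idle 6=iowait 7=irq 8=softirq 9=steal
    split(a, x, " "); split(b, y, " ")
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    busy = (y[2] - x[2]) + (y[4] - x[4])
    if (total <= 0) { print 0; exit }
    printf "%d\n", busy * 100 / total
  }'
}

# Example usage on the ECS instance:
# sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu \
#   --params "$(build_inject_params 50 0-2/4)"
# before=$(head -n 1 /proc/stat); sleep 5; after=$(head -n 1 /proc/stat)
# echo "us+sy: $(us_sy_percent "$before" "$after")%"
```

The first line of /proc/stat aggregates all vCPUs. If cpu-list binds the load to a subset of vCPUs, the aggregate value is likely to be lower than the target, so compare the per-vCPU lines (cpu0, cpu1, and so on) in that case.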

Recover from the fault

Use either of the following methods to resume normal instance operation:

  • Method 1 (recommended): Run the following recovery command on the ECS instance and check whether the CPU utilization decreases to the level before fault injection:

    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover

    If the recovery succeeds, the CPU utilization decreases to the level prior to fault injection, and the system resumes normal operation.

  • Method 2: Terminate the AliFaultHighCpu process.

    sudo kill <AliFaultHighCpu PID>

    After you terminate the process, run the recovery command described in Method 1 so that subsequent fault injections on the ECS instance are not affected.
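The two methods can be combined into one recovery routine, sketched below. The function names leftover_pids and recover_fault are hypothetical, and the 2-second wait is an arbitrary choice. The sketch also assumes that pgrep (procps) is installed and that the injection process is visible under the exact name AliFaultHighCpu.

```shell
#!/usr/bin/env bash
# Hypothetical recovery helper: Method 1 first, Method 2 as a fallback.
# Assumes pgrep is available and that the injection process is visible
# under the exact name AliFaultHighCpu.

# leftover_pids NAME: print PIDs of processes named NAME (empty if none).
leftover_pids() {
  pgrep -x "$1" || true
}

recover_fault() {
  local pids
  # Method 1: ask the plugin to recover.
  sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover
  sleep 2
  pids=$(leftover_pids AliFaultHighCpu)
  if [ -n "$pids" ]; then
    # Method 2: terminate whatever is still running. $pids is left
    # unquoted on purpose so that each PID becomes its own argument.
    sudo kill $pids
    # Run the recovery command again so that later injections are not affected.
    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover
  fi
}

# Example usage on the ECS instance:
# recover_fault
```

After the routine returns, confirm with top or CloudMonitor that the CPU utilization has returned to the level before fault injection.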