
Elastic Compute Service: High CPU utilization drill

Last Updated: Feb 14, 2026

CPU utilization is a key indicator of business system health. Excessively high CPU utilization can increase service latency or even cause outages, so it must stay within a reasonable range. By injecting a high CPU utilization fault into an ECS instance, you can test how the business system behaves under a specific CPU load, evaluate its ability to recover, and verify that monitoring and alerting work as expected. Use the drill results to develop response strategies so that the system returns to normal quickly when high CPU utilization occurs in the production environment, which reduces the risk of business interruption.

How it works

This solution uses the Cloud Assistant plugin ecs-fault-highcpu. The plugin starts the AliFaultHighCpu process to consume CPU time slices at a specified duty cycle.
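To illustrate what consuming CPU time slices at a specified duty cycle means, the following POSIX shell sketch loads a single core: in each period it busy-waits for a cpu-percent share of the period and sleeps for the rest. It is a hypothetical illustration, not the AliFaultHighCpu implementation. The function names and the 100 ms default period are assumptions, and the sketch requires GNU date for millisecond timestamps.

```shell
#!/bin/sh
# Hypothetical sketch of duty-cycle CPU loading on ONE core.
# This is NOT the ecs-fault-highcpu implementation, only an illustration.

now_ms() {
  # Milliseconds since the epoch; %N (nanoseconds) requires GNU date.
  echo $(( $(date +%s%N) / 1000000 ))
}

# burn_duty_cycle PERCENT SECONDS [PERIOD_MS]
# In every period, stay busy for PERCENT% of PERIOD_MS and sleep for the rest.
burn_duty_cycle() {
  percent=$1
  duration_s=$2
  period_ms=${3:-100}   # assumed default period length
  if [ "$percent" -lt 0 ] || [ "$percent" -gt 100 ]; then
    echo "percent must be between 0 and 100" >&2
    return 1
  fi
  busy_ms=$(( percent * period_ms / 100 ))
  idle_ms=$(( period_ms - busy_ms ))
  end_ms=$(( $(now_ms) + duration_s * 1000 ))

  while [ "$(now_ms)" -lt "$end_ms" ]; do
    busy_until=$(( $(now_ms) + busy_ms ))
    # Busy phase: keep the CPU occupied until this slice is used up.
    while [ "$(now_ms)" -lt "$busy_until" ]; do :; done
    # Idle phase: yield the CPU for the remainder of the period.
    if [ "$idle_ms" -gt 0 ]; then
      sleep "$(( idle_ms / 1000 )).$(printf '%03d' $(( idle_ms % 1000 )))"
    fi
  done
}
```

For example, running burn_duty_cycle 50 10 keeps one core at roughly 50% for 10 seconds. Because every timing check forks date, part of this load shows up as kernel-mode (sy) time rather than user-mode (us) time. Loading specific cores, as cpu-list does, would require one copy per core pinned with a tool such as taskset.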

Instructions

Prerequisites

Inject a fault

  1. As a user with sudo privileges, run the ecs-fault-highcpu Cloud Assistant plugin to inject the fault.

    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params inject,[cpu-percent=paramA],[cpu-list=paramB]

    The parameters in [] are optional.

    • cpu-percent (optional): The target CPU utilization percentage. If not specified, the default value is 100.

      Note

      The cpu-percent value represents the CPU utilization of the injection process. The instance's total CPU utilization also includes the load from other running processes.

    • cpu-list (optional): The specific vCPU cores to target. For example, cpu-list=0-2/4 applies the load to vCPU cores 0, 1, 2, and 4. If not specified, the load is applied to all vCPU cores.

  2. Verify that the fault injection was successful.

    • On the ECS instance, run the top command. After a successful injection, CPU utilization rises, and the sum of user-mode (us) and kernel-mode (sy) time should be close to the specified cpu-percent value. If you set cpu-list, press 1 in top to display per-vCPU statistics and check the targeted cores.

      [Figure: top output after fault injection]

    • In the CloudMonitor CPU utilization chart, verify that CPU utilization increases after the fault is injected.

      [Figure: CloudMonitor CPU utilization chart after fault injection]
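The injection and verification steps above can also be scripted. The following POSIX shell sketch assembles the --params value and samples /proc/stat to compute the instance-wide us + sy percentage that top reports. The helper names are hypothetical, and the sketch assumes that the brackets in the documented syntax only mark optional parameters and are not typed literally.

```shell
#!/bin/sh
# Hypothetical helpers for the inject and verify steps.
# Assumption: "[...]" in the documented syntax marks optional parameters only.

# build_inject_params [CPU_PERCENT] [CPU_LIST]
# Prints the value to pass to --params, e.g. "inject,cpu-percent=50".
build_inject_params() {
  params="inject"
  if [ -n "$1" ]; then params="$params,cpu-percent=$1"; fi
  if [ -n "$2" ]; then params="$params,cpu-list=$2"; fi
  echo "$params"
}

# read_cpu: prints "<user+system jiffies> <total jiffies>" from the
# aggregate "cpu" line of /proc/stat (user nice system idle iowait irq softirq steal ...).
read_cpu() {
  set -- $(head -n 1 /proc/stat)   # intentional word splitting into fields
  echo "$(( $2 + $4 )) $(( $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9 ))"
}

# cpu_us_sy_percent [INTERVAL_S]
# Samples /proc/stat twice and prints the integer % of time spent in us + sy.
cpu_us_sy_percent() {
  interval=${1:-1}
  set -- $(read_cpu)
  busy0=$1; total0=$2
  sleep "$interval"
  set -- $(read_cpu)
  busy1=$1; total1=$2
  dt=$(( total1 - total0 ))
  if [ "$dt" -le 0 ]; then echo 0; return; fi
  echo $(( 100 * (busy1 - busy0) / dt ))
}
```

For example, to load vCPU cores 0, 1, 2, and 4 at 80% and then sample for 5 seconds:

    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params "$(build_inject_params 80 0-2/4)"
    cpu_us_sy_percent 5

Because cpu_us_sy_percent averages across all vCPUs, it approaches cpu-percent only when every core is loaded. For example, four of eight vCPUs at 100% on an otherwise idle instance gives roughly 50%.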

Recover from the fault

Use one of the following methods to recover the ECS instance.

  • Method 1 (Recommended): Run the fault recovery command on the ECS instance and verify that CPU utilization drops to its pre-injection level.

    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover

    In the following figure, CPU utilization has dropped back to its pre-injection level, which indicates that the system has returned to normal.

    [Figure: CPU utilization after recovery]

  • Method 2: Terminate the process named AliFaultHighCpu.

    Look up the process ID (for example, with pgrep -x AliFaultHighCpu) and terminate the process. To prevent issues with subsequent fault injections, run the recovery command from Method 1 afterward.

    sudo kill <AliFaultHighCpu PID>
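Both recovery methods can be combined into one helper, sketched below in POSIX shell. It runs the documented recover command, falls back to terminating any remaining AliFaultHighCpu processes, and then runs recover again as Method 2 requires. The function names are hypothetical, and the sketch assumes that pgrep (from procps) is installed and that pgrep -x matches the process name.

```shell
#!/bin/sh
# Hypothetical recovery helper combining Method 1 and Method 2.
# Assumption: pgrep is available and "pgrep -x" matches the process name.

PROC_NAME="AliFaultHighCpu"

# fault_pids: prints PIDs of leftover fault-injection processes (one per line).
# Always exits 0; pgrep's "no match" status is normalized.
fault_pids() {
  pgrep -x "$PROC_NAME" || true
}

recover_high_cpu() {
  # Method 1: ask the plugin to stop the injection.
  sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover
  # Method 2 fallback: terminate anything still running.
  pids=$(fault_pids)
  if [ -n "$pids" ]; then
    sudo kill $pids   # unquoted on purpose: one argument per PID
    sleep 1           # give the processes time to exit
    # Run recover again so that later injections start from a clean state.
    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover
  fi
  # Return success only if no fault process remains.
  [ -z "$(fault_pids)" ]
}
```

Run recover_high_cpu; it returns a non-zero status if an AliFaultHighCpu process is still present afterward. Then check top or the CloudMonitor chart to confirm that CPU utilization is back at its pre-injection level.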