To keep your business stable, CPU utilization must stay within a reasonable range. Excessively high CPU utilization can cause business latency or even interruption. This topic describes how to inject a high CPU utilization fault into an Elastic Compute Service (ECS) instance to test how the business system responds to specific CPU loads, inspect system recovery capabilities, and verify the effectiveness of monitoring and alert mechanisms. You can then develop response strategies based on the drill results, so that the system can quickly resume normal operation when high CPU utilization occurs in the production environment, reducing the risk of business interruption.
Implementation
A high CPU utilization drill uses the ecs-fault-highcpu Cloud Assistant plugin to start the AliFaultHighCpu injection process, which consumes CPU time slices at a specific duty cycle.
Procedure
Prerequisites
Cloud Assistant Agent is installed on the ECS instance for which you want to perform a drill.
The status of Cloud Assistant is Normal on the ECS instance. For more information, see View the status of Cloud Assistant and handle anomalies.
Inject a fault
As a user with sudo privileges, run the ecs-fault-highcpu Cloud Assistant plugin.
sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params inject,[cpu-percent=paramA],[cpu-list=paramB]
The optional fault injection parameters are enclosed in brackets ([]).
cpu-percent: the target CPU utilization. If you leave this parameter empty, the default value 100 is used.
Note: The cpu-percent parameter specifies the CPU utilization of the injection process. The total CPU utilization is also affected by other processes.
cpu-list: binds loads to specific vCPUs. For example, cpu-list=0-2/4 specifies that loads are bound to core 0, core 1, core 2, and core 4. If you leave this parameter empty, loads are bound to all vCPUs.
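As an illustration of how the optional parameters combine, the following sketch assembles the value passed to --params. The helper name build_inject_params is hypothetical and is not part of the plugin. The sketch assumes that the brackets in the syntax above only mark optional parameters and are omitted from the actual command.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): builds the --params value.
# Assumption: brackets in the documented syntax only mark optional
# parameters and are not typed in the real command.
#   $1: cpu-percent value (optional; the plugin defaults to 100 when omitted)
#   $2: cpu-list value    (optional; the plugin uses all vCPUs when omitted)
build_inject_params() {
    params="inject"
    # Append each optional parameter only when a value was supplied.
    [ -n "${1:-}" ] && params="$params,cpu-percent=$1"
    [ -n "${2:-}" ] && params="$params,cpu-list=$2"
    echo "$params"
}
```

For example, build_inject_params 50 0-1 prints inject,cpu-percent=50,cpu-list=0-1, which could then be passed to the plugin as sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params "$(build_inject_params 50 0-1)".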
Check whether a fault is injected.
On the ECS instance, run the top command. If the CPU utilization increases, the fault is injected. The sum of CPU utilizations in kernel mode (sy) and user mode (us) is approximately equal to the specified target CPU utilization.
View the CPU utilization chart provided by CloudMonitor. If the CPU utilization increases after the plugin is run, a fault is injected.
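The us plus sy check can also be scripted instead of read from top. The following is a minimal sketch, assuming a Linux instance that exposes the aggregate cpu line in /proc/stat. The function names us_sy_percent and sample_us_sy are hypothetical, and the result is an instance-wide average, so it also includes load from other processes.

```shell
#!/bin/sh
# Hypothetical helper: given two aggregate "cpu ..." lines from /proc/stat,
# print the share of elapsed CPU time spent in user + system mode (percent).
us_sy_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Field 2 is user time, field 4 is system time; fields 2-9 cover
        # user, nice, system, idle, iowait, irq, softirq, and steal.
        NR == 1 { u0 = $2; s0 = $4; for (i = 2; i <= 9; i++) t0 += $i }
        NR == 2 { u1 = $2; s1 = $4; for (i = 2; i <= 9; i++) t1 += $i
                  if (t1 > t0)
                      printf "%.0f\n", 100 * ((u1 - u0) + (s1 - s0)) / (t1 - t0)
                  else
                      print 0 }'
}

# Hypothetical wrapper: sample /proc/stat twice, $1 seconds apart (default 5).
sample_us_sy() {
    before=$(head -n 1 /proc/stat)
    sleep "${1:-5}"
    after=$(head -n 1 /proc/stat)
    us_sy_percent "$before" "$after"
}
```

Running sample_us_sy 5 before and after the injection gives two numbers to compare. With cpu-list set to a subset of vCPUs, the instance-wide figure is expected to be lower than the cpu-percent target, because the idle vCPUs are averaged in.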

Recover from the fault
Use one of the following methods to resume normal instance operation:
Method 1 (recommended): Run the following recovery command on the ECS instance and check whether the CPU utilization decreases to the level before fault injection:
sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover
The following command output indicates that the CPU utilization has decreased to the level prior to fault injection, and the system has resumed normal operation.

Method 2: Terminate the AliFaultHighCpu process. To avoid affecting subsequent fault injections to the ECS instance after you terminate the process, run the recovery command described in Method 1.
sudo kill <AliFaultHighCpu PID>
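Method 2 can be sketched as a small script that looks up the PID, terminates the process, and then runs the recovery command from Method 1. The function names below are hypothetical. The sketch assumes that pgrep is available on the instance and that the injection process appears under the name AliFaultHighCpu, as stated in this topic.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes whose name matches exactly.
# $1: process name (optional; defaults to the name given in this topic).
find_injection_pids() {
    pgrep -x "${1:-AliFaultHighCpu}"
}

# Hypothetical helper: terminate the injection process, then run the
# recovery command so that later injections are not affected.
stop_injection() {
    pids=$(find_injection_pids "$@") || {
        echo "no injection process found"
        return 0
    }
    for pid in $pids; do
        sudo kill "$pid"
    done
    sudo acs-plugin-manager --exec --plugin ecs-fault-highcpu --params recover
}
```

Calling stop_injection with no arguments targets AliFaultHighCpu. If no matching process is running, the function prints a message and makes no changes.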