In operating systems, a process identifier (PID) is a number that uniquely identifies a process. A PID can be reused after its process terminates. PIDs are rarely exhausted, but accidental exhaustion can still occur. When it does, new processes cannot be created and services may be suspended, which affects business capabilities. A PID insufficiency drill simulates a PID exhaustion or service suspension scenario so that you can test the high availability of your services.
Implementation
A PID insufficiency drill uses the ACS-ECS-TaskLimit Cloud Assistant plug-in to create processes by calling fork() until the smaller of the pid_max and threads-max values is reached, which prevents new processes or threads from being created.
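To see which of the two limits the drill would reach first on a given instance, you can compare them yourself. The following is a minimal sketch, not part of the plug-in: it assumes that the limits are exposed at the standard Linux procfs paths /proc/sys/kernel/pid_max and /proc/sys/kernel/threads-max, and the function name effective_task_limit is hypothetical.

```shell
#!/bin/sh
# Print the smaller of two kernel limits, each read from a file.
# Assumption: the defaults below are the standard Linux procfs paths.
# The function name is illustrative and not part of the plug-in.
effective_task_limit() {
  etl_pid_file="${1:-/proc/sys/kernel/pid_max}"
  etl_thr_file="${2:-/proc/sys/kernel/threads-max}"
  read -r etl_pid_max < "$etl_pid_file" || return 1
  read -r etl_threads_max < "$etl_thr_file" || return 1
  if [ "$etl_pid_max" -lt "$etl_threads_max" ]; then
    echo "$etl_pid_max"
  else
    echo "$etl_threads_max"
  fi
}

# Only call the function when both procfs files are readable.
if [ -r /proc/sys/kernel/pid_max ] && [ -r /proc/sys/kernel/threads-max ]; then
  echo "effective limit: $(effective_task_limit)"
fi
```

The two file arguments are optional and exist mainly so that the function can be tried against sample files.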
Procedure
Prerequisites
Cloud Assistant Agent is installed on the ECS instance for which you want to perform a drill.
The status of Cloud Assistant is Normal on the ECS instance. For more information, see View the status of Cloud Assistant and handle anomalies.
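Before you start the drill, you may also want to confirm that the commands used in this topic are available on the instance. The following sketch checks only that the named commands are on PATH. It does not verify that the status of Cloud Assistant is Normal, and the function name require_commands is hypothetical.

```shell
#!/bin/sh
# Return 0 if every named command is found on PATH. Otherwise, report the
# first missing command on stderr and return 1.
# The function name is illustrative.
require_commands() {
  for rc_cmd in "$@"; do
    if ! command -v "$rc_cmd" >/dev/null 2>&1; then
      echo "missing command: $rc_cmd" >&2
      return 1
    fi
  done
  return 0
}
```

For example, require_commands sudo acs-plugin-manager returns a non-zero status if either command cannot be found.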
Inject a fault
Log on to the ECS instance.
For more information, see Use Workbench to connect to a Linux instance over SSH.
Run the ACS-ECS-TaskLimit Cloud Assistant plug-in as a user with sudo privileges.
Important: The ACS-ECS-TaskLimit Cloud Assistant plug-in affects the creation of new business processes but does not affect existing business processes.
sudo acs-plugin-manager --exec --plugin ACS-ECS-TaskLimit --params inject
The command output indicates that the ACS-ECS-TaskLimit Cloud Assistant plug-in runs.
Check whether the fault is injected.
If the fault is injected, attempts to create new business processes fail with the following error message:
-bash: fork: retry: Resource temporarily unavailable.
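Because every external command needs a new process, a check that relies only on shell builtins is easier to run while the fault is in effect. The following sketch estimates how many more tasks could be created before a given limit is reached. It assumes that the fourth field of /proc/loadavg has the form running/total, as on standard Linux systems, and the function name task_headroom is hypothetical. Treat the result as an approximation.

```shell
#!/bin/sh
# Print how many more tasks could be created before the given limit is reached.
#   $1: the limit, for example the smaller of pid_max and threads-max
#   $2: loadavg file (assumption: defaults to the Linux path /proc/loadavg)
# Only shell builtins are used, so the function body itself does not fork.
# The function name is illustrative.
task_headroom() {
  th_limit="$1"
  th_file="${2:-/proc/loadavg}"
  # Fields: 1-minute, 5-minute, 15-minute load, running/total, last PID
  read -r th_l1 th_l5 th_l15 th_sched th_rest < "$th_file" || return 1
  th_total="${th_sched#*/}"   # "3/812" becomes "812"
  echo "$(( th_limit - th_total ))"
}
```

A value near zero suggests that the fault is in effect. Call the function directly instead of inside a command substitution, because a command substitution creates a subshell and therefore needs a new process.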
Recover from the fault
You can use one of the following methods to remove the injected fault:
Method 1 (recommended): Restart the instance. For more information, see Restart an instance.
Method 2: Run the following recovery command.
Important: The recovery command may fail to run because of system blockage.
sudo acs-plugin-manager --exec --plugin ACS-ECS-TaskLimit --params recover
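Because the recovery command may fail, one option is to try it more than once before you fall back to restarting the instance. The following is a minimal sketch of a generic retry helper. The function name retry and the attempt and delay values are assumptions, and retrying is not guaranteed to help.

```shell
#!/bin/sh
# Run a command up to a maximum number of times, pausing between attempts.
#   $1: maximum number of attempts
#   $2: delay between attempts, in seconds
#   remaining arguments: the command to run
# Returns 0 as soon as the command succeeds, or 1 after the last failed attempt.
# The function name is illustrative.
retry() {
  rt_max="$1"
  rt_delay="$2"
  shift 2
  rt_n=1
  while :; do
    "$@" && return 0
    [ "$rt_n" -ge "$rt_max" ] && return 1
    rt_n=$(( rt_n + 1 ))
    sleep "$rt_delay"
  done
}
```

For example, retry 5 10 sudo acs-plugin-manager --exec --plugin ACS-ECS-TaskLimit --params recover attempts the recovery up to five times at 10-second intervals. The sleep command is itself an external process, so the pause may be skipped while processes cannot be created. If all attempts fail, restart the instance as described in Method 1.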