This tutorial is a practical guide that shows you how to create simulated Elastic Compute Service (ECS) system events by using an API or the command-line interface (CLI). You can use these events to test your alerting pipelines, self-healing scripts, and emergency plans, and to confirm that your automated O&M system responds reliably.
Scenarios
Automated O&M systems often rely on ECS system events, such as scheduled reboots or system failures, to trigger responses. These responses can include automatic alerts, removing an instance from a load balancer, or starting a data backup. However, real system events are rare and unpredictable. This makes it hard to run full end-to-end tests of automated responses before you deploy to a production environment.
The simulated ECS system event feature solves this problem by allowing you to create simulated event notifications for specific instances. These notifications have the same structure as real events but do not perform any actual O&M operations on the instance, such as reboots or stops. This allows for safe and efficient drills of your downstream automated systems. You can create simulated events to test the following scenarios:
| Scenario | Validation goal |
| --- | --- |
| Alerting pipeline test | Verify that Cloud Monitor, EventBridge, text message, or DingTalk notifications are triggered correctly. |
| Emergency plan drill | Verify the response time and operational standards of the O&M team. |
| Automated script validation | Verify that self-healing scripts, such as scripts that automatically restart services or switch traffic, run as expected. |
Simulated events are not actual operations: Simulated events only trigger event notifications, such as status changes and message pushes. They do not perform operations such as rebooting, stopping, or releasing an instance. A complete record of the event is kept in the console and in ActionTrail.
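Because create and cancel calls are recorded in ActionTrail, you can also pull a drill's audit records from the command line once the CLI is configured. The following bash sketch is hypothetical: `lookup_drill_audit` is an illustrative name, and it assumes that the ActionTrail `LookupEvents` operation accepts `LookupAttribute.N.Key`/`LookupAttribute.N.Value` filters with `EventName` as a key. Verify the parameters in the ActionTrail API reference before relying on it.

```bash
# Hypothetical helper (bash). Assumes the Alibaba Cloud CLI (aliyun) is configured and that
# ActionTrail LookupEvents supports filtering by EventName through LookupAttribute.N.
lookup_drill_audit() {
  local region="$1"
  # Default to the create operation; pass CancelSimulatedSystemEvents to audit cancellations.
  local event_name="${2:-CreateSimulatedSystemEvents}"
  aliyun actiontrail LookupEvents \
    --region "$region" \
    --LookupAttribute.1.Key EventName \
    --LookupAttribute.1.Value "$event_name"
}

# Example: lookup_drill_audit cn-hangzhou CancelSimulatedSystemEvents
```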
Solution architecture
The core of this solution is to inject a simulated event into a target ECS instance by calling the CreateSimulatedSystemEvents API. The flow is as follows:
Event injection: Use the CLI or a software development kit (SDK) to create a simulated event for an instance, specifying the event type and the scheduled execution time.
Generation and distribution: The ECS backend generates a notification with the same structure as a real event. It distributes the notification through standard channels such as EventBridge or Cloud Monitor.
Automated response: Downstream systems, such as Function Compute (FC), webhooks, or O&M scripts, listen for the event and trigger predefined business logic.
Verification and closure: You can query the event status to confirm that it is progressing correctly. After the drill, you can cancel the event using the API to complete the process.
Implementation steps
The simulated system event feature is currently available only through an API, the CLI, or an SDK. The ECS console does not provide an interface to create or cancel these events.
You can also call and debug the API online in Alibaba Cloud OpenAPI Explorer. After a successful call, OpenAPI Explorer automatically generates SDK code examples.
This section walks you through a simulated event drill using the Alibaba Cloud CLI. The drill covers preparing the environment, creating an event, verifying its status, and cleaning up resources.
Preparations: Configure permissions and tools
Before you start, grant the required API permissions to the account that performs the operations, and make sure that the Alibaba Cloud CLI is installed and configured.
Configure minimum RAM permissions
Create a custom Resource Access Management (RAM) policy that grants only the permissions required to create, cancel, and query simulated events. Log on to the RAM console, create a custom policy named ECSSimulatedEventPolicy with the following policy content, and then attach the policy to the RAM user that performs the operations.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateSimulatedSystemEvents",
        "ecs:CancelSimulatedSystemEvents",
        "ecs:DescribeInstanceHistoryEvents"
      ],
      "Resource": "*"
    }
  ]
}
Install and configure the Alibaba Cloud CLI
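Before you run the commands in this tutorial, you can confirm that the tools they rely on are on your PATH. The following bash sketch assumes that the Alibaba Cloud CLI binary is named `aliyun` (as in every command here) and that `jq` is installed for JSON parsing, because the drill script later in this tutorial uses it. `check_prereqs` is a hypothetical helper, not part of any CLI.

```bash
# Hypothetical preflight helper (bash): report every tool in the argument list
# that cannot be found, and return non-zero if at least one is missing.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_prereqs aliyun jq || exit 1
```

Once both tools are found, `aliyun configure` sets up your credentials and default region interactively.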
Step 1: Create a simulated system event
Simulatable event types: Not all system events can be simulated. Supported event types include instance reboots, stops, and redeployments caused by system maintenance or system failures. For a complete list, see the valid values for the EventType parameter in CreateSimulatedSystemEvents.
Generate the scheduled execution time in UTC format.
The NotBefore parameter must be a UTC time in ISO 8601 format (yyyy-MM-ddTHH:mm:ssZ). If you pass your local time instead, the event triggers at an unexpected time. For example, in the China time zone (UTC+8), the event is delayed by 8 hours. You can use the following commands to generate a UTC time that is 10 minutes in the future:
Linux (GNU date):
date -u -d '+10 minutes' +"%Y-%m-%dT%H:%M:%SZ"
macOS (BSD date; -v+10M adds 10 minutes, whereas a lowercase -v+10m would add 10 months):
date -u -v+10M +"%Y-%m-%dT%H:%M:%SZ"
Windows (PowerShell):
(Get-Date).AddMinutes(10).ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'")
Run the following command to create a simulated event.
Make sure that the target instance is in the Running state when you create the event.
# Set the scheduled execution time (GNU date; on macOS, use the BSD date command shown above)
NOT_BEFORE=$(date -u -d '+10 minutes' +"%Y-%m-%dT%H:%M:%SZ")
# Create the simulated event
aliyun ecs CreateSimulatedSystemEvents \
  --RegionId cn-hangzhou \
  --InstanceId.1 i-bp1xxxxxxxxx \
  --EventType SystemMaintenance.Reboot \
  --NotBefore "$NOT_BEFORE"
Upon successful execution, the system returns a response similar to the following. Record the EventId, which you need for the verification and cancellation steps.
{
  "EventIdSet": {
    "EventId": [
      "e-t4nxxxxxxxxx"
    ]
  },
  "RequestId": "A1B2C3D4-E5F6-7890-1234-ABCDEF123456"
}
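If you run this step from a script, it can help to wrap the time calculation and the EventId extraction in functions. The following bash sketch is a convenience built on assumptions, not part of the documented procedure: the function names `utc_in_minutes` and `create_simulated_event` are hypothetical, and it uses `jq` (as the drill script later in this tutorial does) to read `EventIdSet.EventId[0]` from the response shown above.

```bash
# Hypothetical helpers (bash); require the Alibaba Cloud CLI and jq.

# Print a UTC timestamp N minutes in the future in the format NotBefore expects.
# Tries GNU date (Linux) first, then falls back to BSD date (macOS), where -v+NM adds N minutes.
utc_in_minutes() {
  local minutes="${1:-10}"
  date -u -d "+${minutes} minutes" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null \
    || date -u -v+"${minutes}"M +"%Y-%m-%dT%H:%M:%SZ"
}

# Create a simulated event and print only its EventId.
# Prints nothing if the response contains no EventId ("// empty" suppresses "null").
create_simulated_event() {
  local region="$1" instance_id="$2" event_type="$3" minutes="${4:-10}"
  aliyun ecs CreateSimulatedSystemEvents \
    --RegionId "$region" \
    --InstanceId.1 "$instance_id" \
    --EventType "$event_type" \
    --NotBefore "$(utc_in_minutes "$minutes")" \
    | jq -r '.EventIdSet.EventId[0] // empty'
}

# Example: EVENT_ID=$(create_simulated_event cn-hangzhou i-bp1xxxxxxxxx SystemMaintenance.Reboot 10)
```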
The lifecycle of a simulated system event varies by event type:
After they are created, SystemMaintenance events typically go through the following states:
Scheduled: The simulated event automatically enters this state after it is created.
Executed: The event automatically transitions to this state at the specified time (NotBefore) if no manual intervention occurs.
Canceled: The event enters this state after you call the CancelSimulatedSystemEvents operation.
Avoided: This state applies only to the SystemMaintenance.Reboot event type. The event transitions to this state if you manually restart the instance before the specified time.
SystemFailure or InstanceFailure events simulate sudden failures. After they are created, they may skip the Scheduled state and go directly to the Executing or Executed state.
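To assert on these states during a drill, you can encode the lifecycle above as a lookup. The following bash sketch covers only the states listed in this section. `expected_statuses` and `is_expected_status` are hypothetical names, and Canceled is not listed for failure events because this section does not describe it for them, so adjust the lists to match what you observe in your account.

```bash
# Hypothetical helpers (bash) encoding the lifecycle described above.

# Print the statuses a simulated event of the given type is expected to pass through.
expected_statuses() {
  case "$1" in
    SystemMaintenance.Reboot) echo "Scheduled Executed Canceled Avoided" ;;
    SystemMaintenance.*)      echo "Scheduled Executed Canceled" ;;
    # Failure events may skip Scheduled and go directly to Executing or Executed.
    SystemFailure.*|InstanceFailure.*) echo "Scheduled Executing Executed" ;;
    *) return 1 ;;
  esac
}

# Succeed if STATUS is one of the expected statuses for EVENT_TYPE.
is_expected_status() {
  local event_type="$1" status="$2" statuses s
  statuses=$(expected_statuses "$event_type") || return 1
  for s in $statuses; do
    [ "$s" = "$status" ] && return 0
  done
  return 1
}
```

A drill script can then fail fast when a queried status falls outside the expected set for the event type it created.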
Step 2: Verify event notifications and status
After an event is created, the system publishes notifications through multiple channels. This step shows you how to verify that these notifications are received correctly and query the current status of the event. This is a key step to ensure that your automated scripts are triggered.
Query the event status
Use the EventId that you recorded in the previous step to query the detailed status of the event by calling the DescribeInstanceHistoryEvents operation. The CLI prints the response as JSON by default.
# Use the previously recorded EventId
aliyun ecs DescribeInstanceHistoryEvents \
  --RegionId cn-hangzhou \
  --EventId.1 e-t4nxxxxxxxxx
The expected output shows that the event status is Scheduled. You can also go to the Events page in the ECS console to view the event record.
Check downstream notifications
Based on your system configuration, go to the corresponding service console to check whether the event was successfully delivered:
EventBridge: Check the event trace for the corresponding event bus and rule.
Cloud Monitor: On the System Event page of Event Monitoring, filter by instance ID to view the event record.
Message Service (MNS), DingTalk, or email: Check whether you received an alert notification that was triggered by Cloud Monitor or EventBridge.
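If a notification does not arrive, checking the event status directly tells you whether the problem lies in event generation or in your delivery pipeline. The following bash sketch polls DescribeInstanceHistoryEvents until the event reaches a target status. The function name is hypothetical, and the response path `InstanceSystemEventSet.InstanceSystemEventType[0].EventCycleStatus.Name` is an assumption based on the DescribeInstanceHistoryEvents response structure; confirm it against your own CLI output.

```bash
# Hypothetical helper (bash); requires the Alibaba Cloud CLI and jq.
# Usage: wait_for_event_status REGION EVENT_ID TARGET_STATUS [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_event_status() {
  local region="$1" event_id="$2" target="$3"
  local attempts="${4:-10}" interval="${5:-15}" status i=1
  while [ "$i" -le "$attempts" ]; do
    # Assumed response path; prints nothing if the event is not found.
    status=$(aliyun ecs DescribeInstanceHistoryEvents \
      --RegionId "$region" \
      --EventId.1 "$event_id" \
      | jq -r '.InstanceSystemEventSet.InstanceSystemEventType[0].EventCycleStatus.Name // empty')
    echo "Attempt $i: status=${status:-unknown}" >&2
    if [ "$status" = "$target" ]; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: wait_for_event_status cn-hangzhou e-t4nxxxxxxxxx Scheduled 5 10
```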
Parse the event message structure
The key to automated system integration is to correctly parse the event notification. The following code provides an example of an event message body received through EventBridge. The message contains core information, such as the event ID, instance ID, event type, and status. You can write an automated parsing script based on this structure:
{
  "id": "d8134431-b269-4f17-9157-xxxxxxxxxxxx",
  "source": "acs.ecs",
  "specversion": "1.0",
  "type": "ecs:SystemEvent:Scheduled",
  "datacontenttype": "application/json;charset=utf-8",
  "subject": "acs:ecs:cn-hangzhou:123456789012****:instance/i-bp1xxxxxxxxx",
  "time": "2026-02-28T09:50:00Z",
  "aliyunpublishtime": "2026-02-28T09:50:00.123Z",
  "aliyuneventbusname": "default",
  "data": {
    "eventId": "e-t4nxxxxxxxxx",
    "instanceId": "i-bp1xxxxxxxxx",
    "eventType": "SystemMaintenance.Reboot",
    "eventStatus": "Scheduled",
    "notBefore": "2026-03-01T10:00:00Z",
    "publishTime": "2026-02-28T09:50:00Z",
    "regionId": "cn-hangzhou",
    "eventCategory": "SystemMaintenance"
  }
}
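As a starting point for such a script, the following bash sketch reads one event in the format above from standard input and routes it on `data.eventType` and `data.eventStatus`. The function name `handle_ecs_event` and the messages it prints are hypothetical placeholders for your own response logic; it requires `jq`.

```bash
# Hypothetical handler (bash); requires jq.
# Reads one event in the EventBridge format shown above from stdin and dispatches on it.
handle_ecs_event() {
  local fields event_id instance_id event_type event_status
  # Extract the four routing fields as one tab-separated line.
  fields=$(jq -r '.data | [.eventId, .instanceId, .eventType, .eventStatus] | @tsv')
  IFS=$'\t' read -r event_id instance_id event_type event_status <<<"$fields"

  case "$event_type:$event_status" in
    SystemMaintenance.Reboot:Scheduled)
      # Replace with real actions, such as removing the instance from a load balancer.
      echo "reboot scheduled for $instance_id ($event_id): start pre-maintenance actions"
      ;;
    *:Canceled)
      echo "event $event_id canceled: no action"
      ;;
    *)
      echo "unhandled event $event_type/$event_status for $instance_id"
      ;;
  esac
}
```

For example, save the sample message above as `event.json` and run `handle_ecs_event < event.json`.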
Step 3: Cancel the event
After the drill is complete, you must cancel all simulated events that have not yet been executed. Otherwise they continue to trigger your monitoring system and can cause false positives later.
Use the CancelSimulatedSystemEvents API operation and the EventId that you recorded earlier to cancel the event.
aliyun ecs CancelSimulatedSystemEvents \
  --RegionId cn-hangzhou \
  --EventId.1 e-t4nxxxxxxxxx
A successful response returns the RequestId of the operation, which indicates that the event was canceled. If you query the event again, its status is Canceled.
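The EventId parameter is a list (EventId.1, EventId.2, and so on), so if a drill created several events you can cancel them in one call. The following bash sketch builds the numbered arguments from a list of IDs. The function name is hypothetical, and the API may limit how many IDs a single call accepts, so check the CancelSimulatedSystemEvents reference before passing large batches.

```bash
# Hypothetical helper (bash): cancel several simulated events in one call.
# Usage: cancel_simulated_events REGION EVENT_ID [EVENT_ID ...]
cancel_simulated_events() {
  local region="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "No EventIds given" >&2
    return 1
  fi
  local args=(--RegionId "$region") i=1 id
  # Expand the IDs into --EventId.1 ID1 --EventId.2 ID2 ...
  for id in "$@"; do
    args+=("--EventId.$i" "$id")
    i=$((i + 1))
  done
  aliyun ecs CancelSimulatedSystemEvents "${args[@]}"
}

# Example: cancel_simulated_events cn-hangzhou e-t4nxxxxxxxxx e-t4nyyyyyyyyy
```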
Advanced practice: Drill for an automated response
This section provides a complete drill script template. The template simulates an end-to-end automated O&M scenario that includes creating an event, verifying its status, performing a simulated self-healing operation, and finally canceling the event and cleaning up resources.
#!/bin/bash
# End-to-end drill script for ECS simulated events
set -euo pipefail # Exit on a failed command, an unset variable, or a failed pipeline stage
# --- Configuration parameters ---
INSTANCE_ID="${1:-i-bp1xxxxxxxxx}" # Get the instance ID from the first argument, or use the default value.
REGION="cn-hangzhou"
EVENT_TYPE="SystemMaintenance.Reboot"
EVENT_ID=""
# Cancel the simulated event. Also registered as an EXIT trap so that a failed drill
# does not leave an active simulated event behind.
cleanup() {
  if [ -n "$EVENT_ID" ]; then
    local id="$EVENT_ID"
    EVENT_ID="" # Prevent a second cancellation when the EXIT trap fires
    echo "=== Cleaning up drill resources ==="
    aliyun ecs CancelSimulatedSystemEvents \
      --RegionId "$REGION" \
      --EventId.1 "$id"
    echo "✓ Event canceled."
  fi
}
trap cleanup EXIT
echo "=== Starting ECS simulated event drill ==="
echo "Target instance ID: $INSTANCE_ID"
echo "Event type: $EVENT_TYPE"
echo "Region: $REGION"
# 1. Create a simulated event scheduled to run in 10 minutes (GNU date, with a BSD/macOS fallback).
NOT_BEFORE=$(date -u -d '+10 minutes' +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u -v+10M +"%Y-%m-%dT%H:%M:%SZ")
echo "Scheduled execution time (UTC): $NOT_BEFORE"
# The CLI prints JSON by default; "// empty" turns a missing EventId into an empty string instead of "null".
EVENT_ID=$(aliyun ecs CreateSimulatedSystemEvents \
  --RegionId "$REGION" \
  --InstanceId.1 "$INSTANCE_ID" \
  --EventType "$EVENT_TYPE" \
  --NotBefore "$NOT_BEFORE" \
  | jq -r '.EventIdSet.EventId[0] // empty')
if [ -z "$EVENT_ID" ]; then
  echo "Error: Failed to create event!" >&2
  exit 1
fi
echo "✓ Event created. ID: $EVENT_ID"
# 2. Wait 5 seconds, then verify the event status.
echo "=== Waiting 5 seconds to verify event status... ==="
sleep 5
aliyun ecs DescribeInstanceHistoryEvents \
  --RegionId "$REGION" \
  --EventId.1 "$EVENT_ID"
echo "✓ Status verification complete. Check that the status in the output is 'Scheduled'."
# 3. Simulate the triggering of a self-healing script.
echo "=== Simulating automated system response ==="
echo "System maintenance event detected. Executing predefined response..."
echo " - [Simulated] Sending alert notification to DingTalk group"
echo " - [Simulated] Adding 'maintenance_in_progress' tag to the instance"
echo " - [Simulated] Calling SLB API to set instance weight to 0"
echo "✓ Simulated response complete."
# 4. Clean up drill resources.
cleanup
echo "=== Drill complete ==="
Costs and risks
Costs
The simulated system event feature is free of charge. However, other cloud products used during the drill, such as EventBridge, Function Compute (FC), and Message Service (MNS), are billed at their standard rates.
Risks and limitations
Operation audit: All operations to create and cancel simulated events are fully recorded by ActionTrail. This ensures that all drill activities are traceable and auditable.
Feature limitations: Simulated events only trigger event notifications. They do not perform any actual operations on the instance, such as rebooting or stopping it. An instance can have only one active simulated event at a time.
API rate limiting: Calls to related APIs are subject to quota limits. High-frequency drill requests may trigger API rate limiting. For more information about the specific limits, see Rate limiting.
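When a drill sends many requests, a simple retry with exponential backoff can absorb occasional rate-limiting failures. The following bash sketch is generic: `retry_with_backoff` is a hypothetical helper, and it retries on any non-zero exit status rather than detecting throttling errors specifically.

```bash
# Hypothetical helper (bash): run a command, retrying with exponential backoff on failure.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2)) # Double the wait before each further attempt
    attempt=$((attempt + 1))
  done
  return 0
}

# Example: retry_with_backoff 5 1 aliyun ecs DescribeInstanceHistoryEvents --RegionId cn-hangzhou --EventId.1 e-t4nxxxxxxxxx
```

Retrying read-only calls such as DescribeInstanceHistoryEvents is the safest use. Before retrying a create call, check whether the event already exists.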