Elastic Compute Service (ECS) publishes state change events through Cloud Monitor whenever an instance transitions between states -- whether triggered by console actions, API calls, SDK operations, Auto Scaling policies, overdue payments, or system exceptions. By routing these events to Simple Message Queue (formerly MNS), you can build automated responses such as logging instance lifecycles, restarting stopped instances, or removing spot instances from a load balancer before release.
This topic demonstrates three automation scenarios using Python and an MNS message queue as the event delivery mechanism.
How it works
Cloud Monitor captures all ECS instance state change events and delivers them to a Simple Message Queue (formerly MNS) queue. A Python consumer polls the queue, parses each event, and dispatches it to registered listeners by event name (for example, Instance:StateChange or Instance:PreemptibleInstanceInterruption). Each listener implements a specific automation action.
Cloud Monitor supports four methods for processing event-triggered alerts:
| Method | Use case |
|---|---|
| Simple Message Queue (formerly MNS) | Asynchronous processing with custom consumer logic (used in this topic) |
| Function Compute (FC) | Serverless event handling without managing infrastructure |
| URL callback | Forwarding events to an external HTTP endpoint |
| Simple Log Service (SLS) | Centralized event logging and analysis |
Prerequisites
Before you begin, make sure that you have:
A queue (for example, ecs-cms-event) created in the Simple Message Queue (formerly MNS) console
A system event-triggered alert rule configured in the Cloud Monitor console (Old version)
Python 3.7 or later installed
The MNS SDK for Python and the ECS SDK for Python installed (Scenario 3 also imports the SLB SDK for Python)
The ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables set
For SDK installation details, see Install Python SDK. For other programming languages, see MNS SDK reference and ECS SDK overview.
Step 1: Create the configuration file
Define the MNS endpoint, credentials, region, and queue name in a shared configuration file. All three scenarios reuse this configuration.

    import os


    class Conf:
        endpoint = 'http://<id>.mns.<region>.aliyuncs.com/'
        access_key = os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID']
        access_key_secret = os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
        region_id = 'cn-beijing'
        queue_name = 'ecs-cms-event'
        vserver_group_id = '<your-vserver-group-id>'

Replace the following placeholders with actual values:
| Placeholder | Description | Example |
|---|---|---|
| <id> | Your Alibaba Cloud account ID | 1234567890 |
| <region> | The region of your MNS queue | cn-beijing |
| <your-vserver-group-id> | The vServer group ID in Server Load Balancer (SLB), required only for Scenario 3 | rsp-bp1abc... |
Find the MNS endpoint on the Queue Details page under the Endpoint section. Both Internet Access and Internal Access endpoints are available.
Never hardcode AccessKey credentials in source code. The preceding example reads credentials from environment variables. For production workloads, use Security Token Service (STS) temporary credentials instead.
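Because the configuration class reads the credentials with os.environ[...], a missing variable surfaces as a bare KeyError at import time. A minimal sketch of a friendlier startup check follows. The helper name missing_credentials and the constant REQUIRED_VARS are hypothetical and not part of any SDK; only the two variable names come from this topic.

```python
import os

# Variable names taken from the prerequisites of this topic.
REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def missing_credentials(environ=os.environ):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `environ` can be any mapping,
    which makes the check easy to exercise without touching the real
    process environment.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

One way to use it is to call missing_credentials() before importing the configuration module and exit with a message that lists the returned names if the list is not empty.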
Step 2: Build the MNS consumer
Create a reusable MNS client that polls the queue, dispatches events to registered listeners, and deletes processed messages. All three scenarios share this consumer.

    # -*- coding: utf-8 -*-
    import json
    import logging

    from mns.account import Account
    from mns.mns_exception import MNSExceptionBase

    from .config import Conf


    class MNSClient(object):
        def __init__(self):
            self.account = Account(Conf.endpoint, Conf.access_key, Conf.access_key_secret)
            self.queue_name = Conf.queue_name
            self.listeners = dict()

        def regist_listener(self, listener, eventname='Instance:StateChange'):
            # Several listeners may subscribe to the same event name.
            self.listeners.setdefault(eventname, []).append(listener)

        def run(self):
            queue = self.account.get_queue(self.queue_name)
            while True:
                try:
                    # Long polling: wait up to 5 seconds for a message.
                    message = queue.receive_message(wait_seconds=5)
                    event = json.loads(message.message_body)
                    for listener in self.listeners.get(event['name'], []):
                        listener.process(event)
                    # Delete the message only after every listener has run.
                    queue.delete_message(receipt_handle=message.receipt_handle)
                except MNSExceptionBase as e:
                    if e.type == 'QueueNotExist':
                        logging.error('Queue %s does not exist. Create the queue before receiving messages.', self.queue_name)
                    elif e.type == 'MessageNotExist':
                        # The queue was empty for the whole polling interval.
                        logging.debug('No message, continue waiting')
                    else:
                        logging.error('MNS error: %s', e)


    class BasicListener(object):
        def process(self, event):
            pass

The consumer uses long polling (wait_seconds=5) to retrieve messages. When a message arrives, it parses the JSON body, looks up registered listeners by event name, and calls each listener's process method. After processing, it deletes the message from the queue.
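To try the receive, dispatch, and delete flow described above without a real queue, the loop body can be exercised against an in-memory stand-in. The following is only a sketch: FakeMessage, FakeQueue, RecordingListener, and dispatch_once are hypothetical names that are not part of the MNS SDK, and the fake queue returns None when empty, whereas the real SDK raises an exception.

```python
import json
from collections import deque


class FakeMessage:
    """Hypothetical stand-in exposing the two attributes the consumer reads."""

    def __init__(self, body, handle):
        self.message_body = body
        self.receipt_handle = handle


class FakeQueue:
    """Hypothetical in-memory queue for local testing; not the MNS SDK."""

    def __init__(self, events):
        self._pending = deque(
            FakeMessage(json.dumps(e), f"h{i}") for i, e in enumerate(events)
        )
        self.deleted = []

    def receive_message(self, wait_seconds=0):
        # Assumption: return None when empty instead of raising.
        return self._pending.popleft() if self._pending else None

    def delete_message(self, receipt_handle):
        self.deleted.append(receipt_handle)


class RecordingListener:
    """Remembers the resource IDs of the events it was given."""

    def __init__(self):
        self.seen = []

    def process(self, event):
        self.seen.append(event["content"]["resourceId"])


def dispatch_once(queue, listeners):
    """Receive one message, notify matching listeners, then delete it."""
    message = queue.receive_message(wait_seconds=0)
    if message is None:
        return False
    event = json.loads(message.message_body)
    for listener in listeners.get(event["name"], []):
        listener.process(event)
    queue.delete_message(receipt_handle=message.receipt_handle)
    return True


listener = RecordingListener()
queue = FakeQueue([
    {"name": "Instance:StateChange",
     "content": {"state": "Stopped", "resourceId": "i-test1"}},
    {"name": "Other:Event", "content": {"resourceId": "i-test2"}},
])
listeners = {"Instance:StateChange": [listener]}
while dispatch_once(queue, listeners):
    pass
```

In this sketch, messages with no registered listener are still deleted, which mirrors the consumer above: unhandled event types do not pile up in the queue.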
Event payload structure
State change events use the following structure:
| Field | Description | Example |
|---|---|---|
| event['name'] | Event type identifier | Instance:StateChange |
| event['content']['state'] | Current instance state | Created, Starting, Running, Stopped, Deleted |
| event['content']['resourceId'] | ECS instance ID | i-bp1abc... |
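The fields in the table can be read defensively so that a malformed or unrelated message does not raise KeyError inside a listener. The sketch below is illustrative: parse_state_change is a hypothetical helper, the sample body contains only the three fields listed above (real events may carry more), and the instance ID i-example is made up.

```python
import json


def parse_state_change(message_body):
    """Return (instance_id, state) for a state change event, else None.

    Hypothetical helper for illustration; it relies only on the fields
    documented in the table above.
    """
    event = json.loads(message_body)
    if event.get("name") != "Instance:StateChange":
        return None
    content = event.get("content", {})
    instance_id = content.get("resourceId")
    state = content.get("state")
    if instance_id is None or state is None:
        return None
    return instance_id, state


# Made-up sample limited to the documented fields.
sample_body = json.dumps({
    "name": "Instance:StateChange",
    "content": {"state": "Running", "resourceId": "i-example"},
})
parsed = parse_state_change(sample_body)
```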
Scenario 1: Log instance creation and release events
Track the full lifecycle of ECS instances by logging Created and Deleted state changes. Released instances no longer appear in the ECS console -- recording these events provides an audit trail for capacity planning and post-incident analysis.
Create the listener

    # -*- coding: utf-8 -*-
    import logging

    from .mns_client import BasicListener


    class ListenerLog(BasicListener):
        def process(self, event):
            state = event['content']['state']
            resource_id = event['content']['resourceId']
            # Only creation and release events are recorded.
            if state in ('Created', 'Deleted'):
                logging.info(f'The instance {resource_id} state is {state}')

Run the consumer
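
    import logging

    # INFO messages are hidden by default; enable them so the listener output is visible.
    logging.basicConfig(level=logging.INFO)

    mns_client = MNSClient()
    mns_client.regist_listener(ListenerLog())
    mns_client.run()

Verify the result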
Create or release an ECS instance in the console.
Check the consumer logs for a message like The instance i-bp1abc... state is Created.
In production, store events in a database or Simple Log Service (SLS) instead of only writing them to the process log. This enables searching and auditing historical instance lifecycle data.
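As one possible shape for the database option, the sketch below persists lifecycle events with Python's built-in sqlite3 module. The class name LifecycleStore, the table name instance_events, and its columns are assumptions chosen for illustration; a production setup would more likely target a managed database or SLS.

```python
import sqlite3
import time


class LifecycleStore:
    """Hypothetical store that keeps one row per received state change."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS instance_events ("
            "instance_id TEXT NOT NULL, "
            "state TEXT NOT NULL, "
            "recorded_at REAL NOT NULL)"
        )

    def record(self, event):
        content = event["content"]
        # The connection context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT INTO instance_events VALUES (?, ?, ?)",
                (content["resourceId"], content["state"], time.time()),
            )

    def history(self, instance_id):
        """Return the recorded states of one instance in insertion order."""
        rows = self.conn.execute(
            "SELECT state FROM instance_events "
            "WHERE instance_id = ? ORDER BY rowid",
            (instance_id,),
        )
        return [row[0] for row in rows]


store = LifecycleStore()
store.record({"content": {"resourceId": "i-example", "state": "Created"}})
store.record({"content": {"resourceId": "i-example", "state": "Deleted"}})
```

A listener could call store.record(event) from its process method in place of, or in addition to, the logging call.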
Scenario 2: Restart stopped instances automatically
An ECS instance can stop unexpectedly due to system exceptions or overdue payments. This listener monitors for Stopped events and calls the ECS StartInstance API to restart the affected instance.
Create the listener
Reuse the MNS consumer from Step 2 and register a new listener:

    # -*- coding: utf-8 -*-
    import logging

    from alibabacloud_ecs20140526.client import Client as Ecs20140526Client
    from alibabacloud_ecs20140526.models import StartInstanceRequest
    from alibabacloud_tea_openapi.models import Config

    from .config import Conf
    from .mns_client import BasicListener


    class ECSClient(object):
        def __init__(self, client):
            self.client = client

        def start_instance(self, instance_id):
            logging.info(f'Start instance {instance_id} ...')
            request = StartInstanceRequest(
                instance_id=instance_id
            )
            self.client.start_instance(request)


    class ListenerStart(BasicListener):
        def __init__(self):
            ecs_config = Config(
                access_key_id=Conf.access_key,
                access_key_secret=Conf.access_key_secret,
                endpoint=f'ecs.{Conf.region_id}.aliyuncs.com'
            )
            client = Ecs20140526Client(ecs_config)
            self.ecs_client = ECSClient(client)

        def process(self, event):
            detail = event['content']
            instance_id = detail['resourceId']
            # Restart the instance as soon as it reports the Stopped state.
            if detail['state'] == 'Stopped':
                self.ecs_client.start_instance(instance_id)

Run the consumer
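
    import logging

    # INFO messages are hidden by default; enable them so the listener output is visible.
    logging.basicConfig(level=logging.INFO)

    mns_client = MNSClient()
    mns_client.regist_listener(ListenerStart())
    mns_client.run()

Verify the result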
Stop a running ECS instance manually.
Check the consumer logs for Start instance i-bp1abc... ....
Confirm the instance state returns to Running in the ECS console.
In production, monitor subsequent state transitions (Starting, Running, or Stopped) after issuing the start command. Use timers and retry counters to handle cases where the restart fails.
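The timer and retry-counter idea can be sketched as a small bookkeeping class that a listener consults before calling the start API. Everything here is hypothetical: the RestartTracker name, its methods, and the default limits of 3 attempts and a 60-second cooldown are illustrative choices, not values from the ECS documentation.

```python
import time


class RestartTracker:
    """Hypothetical helper that limits restart attempts per instance."""

    def __init__(self, max_attempts=3, cooldown_seconds=60, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the logic can be tested
        self._attempts = {}  # instance_id -> (attempt count, time of last attempt)

    def should_restart(self, instance_id):
        """Return True and record an attempt if a restart is allowed now."""
        count, last = self._attempts.get(instance_id, (0, None))
        now = self.clock()
        if count >= self.max_attempts:
            return False  # retry budget exhausted; escalate to a human
        if last is not None and now - last < self.cooldown_seconds:
            return False  # previous attempt is still within the cooldown
        self._attempts[instance_id] = (count + 1, now)
        return True

    def mark_running(self, instance_id):
        """Reset the counter once the instance reports Running again."""
        self._attempts.pop(instance_id, None)


# Demonstration with a fake clock so no real waiting is needed.
fake_now = [0.0]
tracker = RestartTracker(max_attempts=2, cooldown_seconds=10,
                         clock=lambda: fake_now[0])
first = tracker.should_restart("i-example")
during_cooldown = tracker.should_restart("i-example")
fake_now[0] = 11.0
second = tracker.should_restart("i-example")
fake_now[0] = 30.0
after_budget = tracker.should_restart("i-example")
```

A listener would call should_restart on each Stopped event and mark_running on each Running event, so an instance that keeps failing to start is not restarted indefinitely.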
Scenario 3: Remove a spot instance from SLB before release
About five minutes before a spot instance is reclaimed, Cloud Monitor sends an Instance:PreemptibleInstanceInterruption event. Use this window to proactively remove the instance from the backend server group of a Server Load Balancer (SLB) instance, rather than waiting for SLB to detect the release passively.
Create the listener
Reuse the MNS consumer from Step 2 and register a listener for the spot instance interruption event:

    # -*- coding: utf-8 -*-
    import json

    from alibabacloud_slb20140515.client import Client as Slb20140515Client
    from alibabacloud_slb20140515.models import RemoveVServerGroupBackendServersRequest
    from alibabacloud_tea_openapi.models import Config

    from .config import Conf
    from .mns_client import BasicListener


    class SLBClient(object):
        def __init__(self):
            self.client = self.create_client()

        def create_client(self):
            config = Config()
            config.access_key_id = Conf.access_key
            config.access_key_secret = Conf.access_key_secret
            config.endpoint = 'slb.aliyuncs.com'
            return Slb20140515Client(config)

        def remove_vserver_group_backend_servers(self, vserver_group_id, instance_id):
            # Port 80 and weight 100 are example values; adjust them to your backend configuration.
            backend_servers = json.dumps(
                [{'ServerId': instance_id, 'Port': '80', 'Weight': '100'}]
            )
            request = RemoveVServerGroupBackendServersRequest(
                region_id=Conf.region_id,
                vserver_group_id=vserver_group_id,
                backend_servers=backend_servers
            )
            response = self.client.remove_vserver_group_backend_servers(request)
            return response


    class ListenerSLB(BasicListener):
        def __init__(self, vserver_group_id):
            self.slb_caller = SLBClient()
            # Use the group ID passed by the caller instead of ignoring it.
            self.vserver_group_id = vserver_group_id

        def process(self, event):
            detail = event['content']
            instance_id = detail['instanceId']
            if detail['action'] == 'delete':
                self.slb_caller.remove_vserver_group_backend_servers(self.vserver_group_id, instance_id)

The event name for spot instance interruptions differs from that of state change events. Register this listener with the event name Instance:PreemptibleInstanceInterruption:

    mns_client = MNSClient()
    mns_client.regist_listener(ListenerSLB(Conf.vserver_group_id), 'Instance:PreemptibleInstanceInterruption')
    mns_client.run()

Spot instance interruption event structure
Spot instance interruption events use different field names than state change events:
| Field | Description | Example |
|---|---|---|
| event['name'] | Event type identifier | Instance:PreemptibleInstanceInterruption |
| event['content']['instanceId'] | ECS instance ID | i-bp1abc... |
| event['content']['action'] | Interruption action | delete |
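Because the two event types carry the instance ID under different keys (resourceId versus instanceId), shared code such as logging or metrics may benefit from one lookup that hides the difference. The sketch below is an assumption-laden illustration: ID_FIELD_BY_EVENT and instance_id_of are hypothetical names, and the mapping covers only the two event names described in this topic.

```python
# Key that holds the instance ID for each event name, per the tables in this topic.
ID_FIELD_BY_EVENT = {
    "Instance:StateChange": "resourceId",
    "Instance:PreemptibleInstanceInterruption": "instanceId",
}


def instance_id_of(event):
    """Return the instance ID of a known event type, or None otherwise.

    Hypothetical helper for illustration only.
    """
    field = ID_FIELD_BY_EVENT.get(event.get("name"))
    if field is None:
        return None
    return event.get("content", {}).get(field)


state_change = {
    "name": "Instance:StateChange",
    "content": {"state": "Stopped", "resourceId": "i-example-a"},
}
interruption = {
    "name": "Instance:PreemptibleInstanceInterruption",
    "content": {"instanceId": "i-example-b", "action": "delete"},
}
```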
Verify the result
Wait for a spot instance interruption event (or simulate one in a test environment).
Confirm the instance is removed from the SLB vServer group before release.
In production, request a replacement spot instance and attach it to the SLB instance immediately to maintain service availability.
Run all listeners together
Register all three listeners on a single MNS consumer to handle multiple event types simultaneously:

    mns_client = MNSClient()

    # Scenario 1: Log lifecycle events
    mns_client.regist_listener(ListenerLog())

    # Scenario 2: Auto-restart stopped instances
    mns_client.regist_listener(ListenerStart())

    # Scenario 3: Remove spot instances from SLB before release
    mns_client.regist_listener(ListenerSLB(Conf.vserver_group_id), 'Instance:PreemptibleInstanceInterruption')

    mns_client.run()

Event type reference
| Event name | Trigger | Key fields |
|---|---|---|
| Instance:StateChange | Instance state transitions (Created, Starting, Running, Stopped, Deleted) | content.state, content.resourceId |
| Instance:PreemptibleInstanceInterruption | Spot instance scheduled for release (~5 min before reclamation) | content.instanceId, content.action |