Elastic Compute Service: Automate O&M based on ECS instance state change events

Last Updated: Feb 27, 2026

Elastic Compute Service (ECS) publishes state change events through Cloud Monitor whenever an instance transitions between states -- whether triggered by console actions, API calls, SDK operations, Auto Scaling policies, overdue payments, or system exceptions. By routing these events to Simple Message Queue (formerly MNS), you can build automated responses such as logging instance lifecycles, restarting stopped instances, or removing spot instances from a load balancer before release.

This topic demonstrates three automation scenarios using Python and an MNS message queue as the event delivery mechanism.

How it works

Cloud Monitor captures all ECS instance state change events and delivers them to a Simple Message Queue (formerly MNS) queue. A Python consumer polls the queue, parses each event, and dispatches it to registered listeners by event name (for example, Instance:StateChange or Instance:PreemptibleInstanceInterruption). Each listener implements a specific automation action.

Cloud Monitor supports four methods for processing event-triggered alerts:

| Method | Use case |
| --- | --- |
| Simple Message Queue (formerly MNS) | Asynchronous processing with custom consumer logic (used in this topic) |
| Function Compute (FC) | Serverless event handling without managing infrastructure |
| URL callback | Forwarding events to an external HTTP endpoint |
| Simple Log Service (SLS) | Centralized event logging and analysis |

Prerequisites

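Before you begin, make sure that you have a Simple Message Queue (formerly MNS) queue, a Cloud Monitor configuration that delivers ECS events to that queue, and the Python SDKs for MNS and ECS installed (plus the SLB SDK for Scenario 3).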

For SDK installation details, see Install Python SDK. For other programming languages, see MNS SDK reference and ECS SDK overview.

Step 1: Create the configuration file

Define the MNS endpoint, credentials, region, and queue name in a shared configuration file. The later steps import it as the config module, so save it as config.py. All three scenarios reuse this configuration.

import os

class Conf:
    endpoint = 'http://<id>.mns.<region>.aliyuncs.com/'
    access_key = os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID']
    access_key_secret = os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
    region_id = 'cn-beijing'
    queue_name = 'ecs-cms-event'
    vserver_group_id = '<your-vserver-group-id>'

Replace the following placeholders with actual values:

| Placeholder | Description | Example |
| --- | --- | --- |
| <id> | Your Alibaba Cloud account ID | 1234567890 |
| <region> | The region of your MNS queue | cn-beijing |
| <your-vserver-group-id> | The vServer group ID in Server Load Balancer (SLB), required only for Scenario 3 | rsp-bp1abc... |

Find the MNS endpoint on the Queue Details page under the Endpoint section. Both Internet Access and Internal Access endpoints are available.

Important

Never hardcode AccessKey credentials in source code. The preceding example reads credentials from environment variables. For production workloads, use Security Token Service (STS) temporary credentials instead.
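
Reading credentials with os.environ[...] raises a bare KeyError when a variable is unset, which can be confusing at start-up. A small pre-flight check, sketched below, reports the problem more clearly. The helper name missing_credentials is hypothetical; only the two variable names come from the configuration file above.

```python
import os

# Variable names taken from the configuration file in this step.
REQUIRED_VARS = (
    'ALIBABA_CLOUD_ACCESS_KEY_ID',
    'ALIBABA_CLOUD_ACCESS_KEY_SECRET',
)


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

A consumer could call missing_credentials() before importing the configuration and exit with a readable message if the returned list is not empty.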

Step 2: Build the MNS consumer

Create a reusable MNS client that polls the queue, dispatches events to registered listeners, and deletes processed messages. The listeners in the scenarios import it as the mns_client module, so save it as mns_client.py. All three scenarios share this consumer.

# -*- coding: utf-8 -*-
import json
import logging
from mns.mns_exception import MNSExceptionBase
from mns.account import Account
from .config import Conf


class MNSClient(object):
    def __init__(self):
        self.account = Account(Conf.endpoint, Conf.access_key, Conf.access_key_secret)
        self.queue_name = Conf.queue_name
        self.listeners = dict()

    def regist_listener(self, listener, eventname='Instance:StateChange'):
        if eventname in self.listeners.keys():
            self.listeners.get(eventname).append(listener)
        else:
            self.listeners[eventname] = [listener]

    def run(self):
        queue = self.account.get_queue(self.queue_name)
        while True:
            try:
                message = queue.receive_message(wait_seconds=5)
                event = json.loads(message.message_body)
                if event['name'] in self.listeners:
                    for listener in self.listeners.get(event['name']):
                        listener.process(event)
                queue.delete_message(receipt_handle=message.receipt_handle)
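            except MNSExceptionBase as e:
                if e.type == 'QueueNotExist':
                    logging.error('Queue %s does not exist. Create the queue before receiving messages.', self.queue_name)
                elif e.type == 'MessageNotExist':
                    # Long polling timed out on an empty queue; poll again.
                    logging.debug('No message, continue waiting')
                else:
                    # Report other MNS errors instead of treating them as an empty queue.
                    logging.error('MNS error %s: %s', e.type, e.message)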


class BasicListener(object):
    def process(self, event):
        pass

The consumer uses long polling (wait_seconds=5) to retrieve messages. When a message arrives, it parses the JSON body, looks up registered listeners by event name, and calls each listener's process method. After processing, it deletes the message from the queue.
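
In the loop above, only MNSExceptionBase is caught, so any other exception raised inside a listener would stop the consumer. One possible way to isolate listeners from each other is sketched below. The dispatch helper and the two demo listener classes are hypothetical, and i-example is a placeholder ID; only the process(event) interface and the listener dictionary keyed by event name come from the consumer.

```python
import logging


def dispatch(event, listeners):
    """Call every listener registered for event['name'].

    Returns the number of listeners that finished without raising.
    A failing listener is logged and skipped so the others still run.
    """
    succeeded = 0
    for listener in listeners.get(event.get('name'), []):
        try:
            listener.process(event)
            succeeded += 1
        except Exception:
            logging.exception('Listener %s failed', type(listener).__name__)
    return succeeded


class FailingListener(object):
    """Demo listener that always raises."""

    def process(self, event):
        raise RuntimeError('simulated failure')


class RecordingListener(object):
    """Demo listener that remembers the instance IDs it has seen."""

    def __init__(self):
        self.seen = []

    def process(self, event):
        self.seen.append(event['content']['resourceId'])


recorder = RecordingListener()
listeners = {'Instance:StateChange': [FailingListener(), recorder]}
event = {'name': 'Instance:StateChange',
         'content': {'state': 'Stopped', 'resourceId': 'i-example'}}
handled = dispatch(event, listeners)
```

Whether a message should still be deleted after a listener fails is a design decision: deleting it avoids endless redelivery, while keeping it allows a retry.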

Event payload structure

State change events use the following structure:

| Field | Description | Example |
| --- | --- | --- |
| event['name'] | Event type identifier | Instance:StateChange |
| event['content']['state'] | Current instance state | Created, Starting, Running, Stopped, Deleted |
| event['content']['resourceId'] | ECS instance ID | i-bp1abc... |
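
A listener can read these fields defensively so that a malformed message is skipped rather than raising KeyError. The sketch below assumes only the three fields listed in the table. The sample body is illustrative and is not a complete Cloud Monitor payload, and parse_state_change is a hypothetical helper.

```python
import json
from collections import namedtuple

StateChange = namedtuple('StateChange', ['instance_id', 'state'])


def parse_state_change(message_body):
    """Return a StateChange, or None if the body is not a usable state change event."""
    try:
        event = json.loads(message_body)
    except ValueError:
        return None
    if not isinstance(event, dict) or event.get('name') != 'Instance:StateChange':
        return None
    content = event.get('content')
    if not isinstance(content, dict):
        return None
    instance_id = content.get('resourceId')
    state = content.get('state')
    if not instance_id or not state:
        return None
    return StateChange(instance_id, state)


# Illustrative body containing only the fields listed in the table.
sample = json.dumps({
    'name': 'Instance:StateChange',
    'content': {'state': 'Running', 'resourceId': 'i-example'},
})
parsed = parse_state_change(sample)
```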

Scenario 1: Log instance creation and release events

Track the full lifecycle of ECS instances by logging Created and Deleted state changes. Released instances no longer appear in the ECS console -- recording these events provides an audit trail for capacity planning and post-incident analysis.

Create the listener

# -*- coding: utf-8 -*-
import logging
from .mns_client import BasicListener


class ListenerLog(BasicListener):
    def process(self, event):
        state = event['content']['state']
        resource_id = event['content']['resourceId']
        # Creation and release mark the two ends of the instance lifecycle.
        if state in ('Created', 'Deleted'):
            logging.info(f'The instance {resource_id} state is {state}')

Run the consumer

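import logging

# The root logger defaults to WARNING, so enable INFO to see the listener output.
logging.basicConfig(level=logging.INFO)

mns_client = MNSClient()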

mns_client.regist_listener(ListenerLog())

mns_client.run()

Verify the result

  1. Create or release an ECS instance in the console.

  2. Check the consumer logs for a message like The instance i-bp1abc... state is Created.

In production, store events in a database or Simple Log Service (SLS) instead of logging to stdout. This enables searching and auditing historical instance lifecycle data.
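
As one possible shape for such a store, the sketch below writes lifecycle events to SQLite, which is used here only as a stand-in for a production database or SLS. The table name, column names, and the LifecycleStore class are hypothetical; its process method follows the listener interface from Step 2. The recorded_at column holds the time the consumer handled the event, not the time the event occurred.

```python
import sqlite3
import time


class LifecycleStore(object):
    """Listener-style class that records Created and Deleted events."""

    def __init__(self, path=':memory:'):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS instance_events ('
            'instance_id TEXT NOT NULL, '
            'state TEXT NOT NULL, '
            'recorded_at REAL NOT NULL)'
        )

    def process(self, event):
        content = event['content']
        if content['state'] not in ('Created', 'Deleted'):
            return
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                'INSERT INTO instance_events VALUES (?, ?, ?)',
                (content['resourceId'], content['state'], time.time()),
            )

    def history(self, instance_id):
        """Return the recorded states for one instance in insertion order."""
        rows = self.conn.execute(
            'SELECT state FROM instance_events WHERE instance_id = ? ORDER BY rowid',
            (instance_id,),
        )
        return [row[0] for row in rows]


store = LifecycleStore()
for state in ('Created', 'Running', 'Deleted'):
    store.process({'name': 'Instance:StateChange',
                   'content': {'state': state, 'resourceId': 'i-example'}})
recorded = store.history('i-example')
```

Because the class exposes process(event), it could be registered with regist_listener in the same way as ListenerLog.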

Scenario 2: Restart stopped instances automatically

An ECS instance can stop unexpectedly due to system exceptions or overdue payments. This listener monitors for Stopped events and calls the ECS StartInstance API to restart the affected instance.

Create the listener

Reuse the MNS consumer from Step 2 and register a new listener:

# -*- coding: utf-8 -*-
import logging

from alibabacloud_ecs20140526.client import Client as Ecs20140526Client
from alibabacloud_ecs20140526.models import StartInstanceRequest
from alibabacloud_tea_openapi.models import Config

from .config import Conf
from .mns_client import BasicListener


class ECSClient(object):
    def __init__(self, client):
        self.client = client

    def start_instance(self, instance_id):
        logging.info(f'Start instance {instance_id} ...')
        request = StartInstanceRequest(
            instance_id=instance_id
        )
        self.client.start_instance(request)


class ListenerStart(BasicListener):
    def __init__(self):
        ecs_config = Config(
            access_key_id=Conf.access_key,
            access_key_secret=Conf.access_key_secret,
            endpoint=f'ecs.{Conf.region_id}.aliyuncs.com'
        )
        client = Ecs20140526Client(ecs_config)
        self.ecs_client = ECSClient(client)

    def process(self, event):
        detail = event['content']
        instance_id = detail['resourceId']
        if detail['state'] == 'Stopped':
            self.ecs_client.start_instance(instance_id)

Run the consumer

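import logging

# The root logger defaults to WARNING, so enable INFO to see the listener output.
logging.basicConfig(level=logging.INFO)

mns_client = MNSClient()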

mns_client.regist_listener(ListenerStart())

mns_client.run()

Verify the result

  1. Stop a running ECS instance manually.

  2. Check the consumer logs for a message that begins with Start instance, followed by the instance ID.

  3. Confirm the instance state returns to Running in the ECS console.

In production, monitor subsequent state transitions (Starting, Running, or Stopped) after issuing the start command. Use timers and retry counters to handle cases where the restart fails.
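
The retry bookkeeping described above could look like the following sketch. The RestartTracker class, its default limits, and the injected clock are all hypothetical. The actual start call is left out so that the logic can be exercised without the ECS SDK.

```python
import time


class RestartTracker(object):
    """Decide whether another restart attempt is allowed for an instance."""

    def __init__(self, max_attempts=3, min_interval=60, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.min_interval = min_interval  # seconds between attempts
        self.clock = clock
        self.attempts = {}  # instance_id -> (attempt count, time of last attempt)

    def should_restart(self, instance_id):
        """Return True and record the attempt if a restart is allowed now."""
        count, last = self.attempts.get(instance_id, (0, None))
        now = self.clock()
        if count >= self.max_attempts:
            return False
        if last is not None and now - last < self.min_interval:
            return False
        self.attempts[instance_id] = (count + 1, now)
        return True

    def mark_running(self, instance_id):
        """Forget the instance once a Running event confirms the restart."""
        self.attempts.pop(instance_id, None)


# Demonstration with a fake clock so that no real waiting is needed.
fake_now = [0.0]
tracker = RestartTracker(max_attempts=2, min_interval=60, clock=lambda: fake_now[0])
first = tracker.should_restart('i-example')    # allowed
second = tracker.should_restart('i-example')   # refused: too soon after the first
fake_now[0] = 120.0
third = tracker.should_restart('i-example')    # allowed: interval has elapsed
fake_now[0] = 240.0
fourth = tracker.should_restart('i-example')   # refused: attempt limit reached
```

A listener could call should_restart before start_instance when it sees a Stopped event, and mark_running when it sees a Running event for the same instance.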

Scenario 3: Remove a spot instance from SLB before release

About five minutes before a spot instance is reclaimed, Cloud Monitor sends an Instance:PreemptibleInstanceInterruption event. Use this window to proactively remove the instance from the backend server group of a Server Load Balancer (SLB) instance, rather than waiting for SLB to detect the release passively.

Create the listener

Reuse the MNS consumer from Step 2 and register a listener for the spot instance interruption event:

# -*- coding: utf-8 -*-
from alibabacloud_slb20140515.client import Client as Slb20140515Client
from alibabacloud_slb20140515.models import RemoveVServerGroupBackendServersRequest
from alibabacloud_tea_openapi.models import Config

from .config import Conf
from .mns_client import BasicListener


class SLBClient(object):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self):
        config = Config()
        config.access_key_id = Conf.access_key
        config.access_key_secret = Conf.access_key_secret
        config.endpoint = 'slb.aliyuncs.com'
        return Slb20140515Client(config)

    def remove_vserver_group_backend_servers(self, vserver_group_id, instance_id):
        request = RemoveVServerGroupBackendServersRequest(
            region_id=Conf.region_id,
            vserver_group_id=vserver_group_id,
            backend_servers="[{'ServerId':'" + instance_id + "','Port':'80','Weight':'100'}]"
        )
        response = self.client.remove_vserver_group_backend_servers(request)
        return response


class ListenerSLB(BasicListener):
    def __init__(self, vserver_group_id):
        self.slb_caller = SLBClient()
        # Use the ID passed by the caller instead of silently ignoring it.
        self.vserver_group_id = vserver_group_id

    def process(self, event):
        detail = event['content']
        instance_id = detail['instanceId']
        if detail['action'] == 'delete':
            self.slb_caller.remove_vserver_group_backend_servers(self.vserver_group_id, instance_id)

Important

The event name for spot instance interruptions differs from state change events. Register this listener with the event name Instance:PreemptibleInstanceInterruption:

mns_client = MNSClient()

mns_client.regist_listener(ListenerSLB(Conf.vserver_group_id), 'Instance:PreemptibleInstanceInterruption')

mns_client.run()

Spot instance interruption event structure

Spot instance interruption events use different field names than state change events:

| Field | Description | Example |
| --- | --- | --- |
| event['name'] | Event type identifier | Instance:PreemptibleInstanceInterruption |
| event['content']['instanceId'] | ECS instance ID | i-bp1abc... |
| event['content']['action'] | Interruption action | delete |
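
The fields in this table are enough to build the backend server list that the listener sends to SLB. The sketch below does so with json.dumps rather than string concatenation, which avoids quoting mistakes. The helper name backend_servers_for is hypothetical, and the port and weight defaults simply mirror the values in the listener code.

```python
import json


def backend_servers_for(event, port=80, weight=100):
    """Return the backend server list as a JSON string for an interruption
    event, or None when the event does not announce a release."""
    if event.get('name') != 'Instance:PreemptibleInstanceInterruption':
        return None
    content = event.get('content', {})
    if content.get('action') != 'delete' or not content.get('instanceId'):
        return None
    return json.dumps([{
        'ServerId': content['instanceId'],
        'Port': str(port),
        'Weight': str(weight),
    }])


# Illustrative event containing only the fields listed in the table.
payload = backend_servers_for({
    'name': 'Instance:PreemptibleInstanceInterruption',
    'content': {'instanceId': 'i-example', 'action': 'delete'},
})
```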

Verify the result

  1. Wait for a spot instance interruption event (or simulate one in a test environment).

  2. Confirm the instance is removed from the SLB vServer group before release.

In production, request a replacement spot instance and attach it to the SLB instance immediately to maintain service availability.

Run all listeners together

Register all three listeners on a single MNS consumer to handle multiple event types simultaneously:

mns_client = MNSClient()

# Scenario 1: Log lifecycle events
mns_client.regist_listener(ListenerLog())

# Scenario 2: Auto-restart stopped instances
mns_client.regist_listener(ListenerStart())

# Scenario 3: Remove spot instances from SLB before release
mns_client.regist_listener(ListenerSLB(Conf.vserver_group_id), 'Instance:PreemptibleInstanceInterruption')

mns_client.run()

Event type reference

| Event name | Trigger | Key fields |
| --- | --- | --- |
| Instance:StateChange | Instance state transitions (Created, Starting, Running, Stopped, Deleted) | content.state, content.resourceId |
| Instance:PreemptibleInstanceInterruption | Spot instance scheduled for release (~5 min before reclamation) | content.instanceId, content.action |
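
Because the two event types carry the instance ID under different keys, code that handles both may benefit from a single lookup. The following is a minimal sketch; the mapping and the instance_id_of helper are hypothetical and cover only the two event names in the table.

```python
# Key that carries the instance ID, per event name (from the table above).
INSTANCE_ID_FIELD = {
    'Instance:StateChange': 'resourceId',
    'Instance:PreemptibleInstanceInterruption': 'instanceId',
}


def instance_id_of(event):
    """Return the ECS instance ID for a known event type, else None."""
    field = INSTANCE_ID_FIELD.get(event.get('name'))
    if field is None:
        return None
    return event.get('content', {}).get(field)
```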

References