Elastic Compute Service: Automated O&M for ECS host state change events

Last Updated: Nov 21, 2025

This topic provides a practical example of how to use Cloud Monitor to automatically process Elastic Compute Service (ECS) host state change events using a queue from Simple Message Queue (formerly MNS).

Prerequisites

Background information

In addition to existing system events, ECS publishes state change events and interruption notification events for spot instances through Cloud Monitor. An ECS state change event is triggered whenever the state of an ECS host changes. These changes can be initiated by you in the console, through an OpenAPI call, or with an SDK. They can also be triggered automatically by services such as Auto Scaling, by overdue payments, or by system exceptions.

Cloud Monitor provides four methods to process event-triggered alerts: Simple Message Queue (formerly MNS), Function Compute, URL callback, and Simple Log Service. This topic uses Simple Message Queue (formerly MNS) as an example to describe three best practices for automatically processing ECS host state change events.

Procedure

Cloud Monitor delivers all ECS host state change events to Simple Message Queue (formerly MNS). You can then use Simple Message Queue (formerly MNS) to retrieve and process the messages.
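The listeners in this topic read the message body as JSON and use its name field together with the state and resourceId fields under content. The following is a minimal sketch of that parsing step. The sample body is hypothetical: it contains only the fields that the listings in this topic read, and the instance ID is a placeholder. The summarize_event helper is also an illustrative name, not part of any SDK.

```python
import json

# Hypothetical message body. Only the fields read by the listeners in this
# topic are shown, and the instance ID is a placeholder.
SAMPLE_BODY = json.dumps({
    "name": "Instance:StateChange",
    "content": {"resourceId": "i-example", "state": "Stopped"},
})


def summarize_event(message_body):
    """Return (event name, resource ID, state) from a JSON message body."""
    event = json.loads(message_body)
    content = event.get("content", {})
    return event.get("name"), content.get("resourceId"), content.get("state")


print(summarize_event(SAMPLE_BODY))
```

Using dict.get here means that a message without one of these fields yields None for that value instead of raising KeyError, which can be easier to log and skip.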

  • Best practice 1: Record creation and release events for all ECS hosts.

    You cannot query released instances in the ECS console. To enable queries for released instances, you can use ECS host state change events to record the lifecycle of all ECS hosts in a database or Simple Log Service. A Created event is sent when you create an ECS host, and a Deleted event is sent when you release an ECS host.

    1. Create a configuration file that defines a Conf class. Save it as config.py, the module name that the later listings import.

      The Conf class must contain the endpoint of Simple Message Queue (formerly MNS), the access_key and access_key_secret of your Alibaba Cloud account, the region_id (such as `cn-beijing`), and the queue_name. The vserver_group_id field is used only in best practice 3.

      On the Queue Details page, in the Endpoint section, you can view the endpoints for Internet Access and Internal Access.

      import os
      
      # The ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables must be set.
      # An AccessKey pair that is hard-coded in source code can leak and compromise all resources in your account. This example reads the AccessKey pair from environment variables for reference only. Use a more secure method, such as Security Token Service (STS).
      class Conf:
          endpoint = 'http://<id>.mns.<region>.aliyuncs.com/'
          access_key = os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID']
          access_key_secret = os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
          region_id = 'cn-beijing'
          queue_name = 'test'
          vserver_group_id = 'your_vserver_group_id'

    2. Use the MNS SDK to write an MNS client that retrieves messages from the queue.

      # -*- coding: utf-8 -*-
      import json
      import logging
      
      from mns.account import Account
      from mns.mns_exception import MNSExceptionBase
      
      from .config import Conf
      
      
      class MNSClient(object):
          def __init__(self):
              self.account = Account(Conf.endpoint, Conf.access_key, Conf.access_key_secret)
              self.queue_name = Conf.queue_name
              self.listeners = dict()
      
          def regist_listener(self, listener, eventname='Instance:StateChange'):
              # Several listeners can subscribe to the same event name.
              self.listeners.setdefault(eventname, []).append(listener)
      
          def run(self):
              queue = self.account.get_queue(self.queue_name)
              while True:
                  try:
                      # Long polling: wait up to 5 seconds for a message.
                      message = queue.receive_message(wait_seconds=5)
                      event = json.loads(message.message_body)
                      for listener in self.listeners.get(event['name'], []):
                          listener.process(event)
                      # Delete the message only after every listener has processed it.
                      queue.delete_message(receipt_handle=message.receipt_handle)
                  except MNSExceptionBase as e:
                      if e.type == 'QueueNotExist':
                          logging.error('Queue %s does not exist. Create the queue before you receive messages.', self.queue_name)
                          return  # Polling a missing queue cannot succeed, so stop.
                      elif e.type == 'MessageNotExist':
                          logging.debug('No message, continue waiting')
                      else:
                          logging.error('Failed to receive or delete a message: %s', e)
      
      
      class BasicListener(object):
          def process(self, event):
              pass

      The preceding code calls a listener to consume the data retrieved from Simple Message Queue (formerly MNS) and then deletes the message.

    3. Register a specific listener to consume events. This simple listener checks for Created and Deleted events and prints a log entry when it receives one.

      # -*- coding: utf-8 -*-
      import logging
      
      from .mns_client import BasicListener
      
      
      class ListenerLog(BasicListener):
          def process(self, event):
              state = event['content']['state']
              resource_id = event['content']['resourceId']
              # Log only the creation and release of instances.
              if state in ('Created', 'Deleted'):
                  logging.info(f'The instance {resource_id} state is {state}')

      The main function is written as follows:

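      import logging
      
      # The root logger drops INFO messages by default, so raise the level.
      logging.basicConfig(level=logging.INFO)
      
      # MNSClient and ListenerLog come from the modules defined in the previous steps.
      mns_client = MNSClient()
      mns_client.regist_listener(ListenerLog())
      mns_client.run()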

      In a production environment, you might need to store events in a database or Simple Log Service (SLS) for future search and auditing.
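As one possible way to persist the lifecycle records, the following sketch writes Created and Deleted events to a SQLite table by using the Python standard library. The class name ListenerSqlite, the table name instance_lifecycle, and its columns are assumptions made for illustration. In the project layout of this topic the class would inherit from BasicListener; it is written as a standalone class here so that the example runs on its own.

```python
import sqlite3


class ListenerSqlite:
    """Stores Created and Deleted events in a SQLite table (illustrative)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS instance_lifecycle ("
            "resource_id TEXT NOT NULL, state TEXT NOT NULL, "
            "recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
        )

    def process(self, event):
        content = event["content"]
        if content["state"] not in ("Created", "Deleted"):
            return  # ignore every other state change
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO instance_lifecycle (resource_id, state) VALUES (?, ?)",
                (content["resourceId"], content["state"]),
            )

    def history(self, resource_id):
        """Return the recorded states of one instance in insertion order."""
        rows = self.conn.execute(
            "SELECT state FROM instance_lifecycle WHERE resource_id = ? ORDER BY rowid",
            (resource_id,),
        )
        return [row[0] for row in rows]
```

Pass a file path instead of the default ":memory:" to keep the records across restarts. Because the table is append-only, a released instance can still be found by its ID after it disappears from the console.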

  • Best practice 2: Automatically restart stopped ECS hosts.

    In some scenarios, an ECS host might stop unexpectedly. You can configure a process to automatically restart stopped ECS hosts.

    To automatically restart a stopped ECS host, you can reuse the MNS client from the first best practice and add a new listener. When the listener receives a Stopped event, it executes the Start command on that ECS host.

    # -*- coding: utf-8 -*-
    import logging
    
    from alibabacloud_ecs20140526.client import Client as Ecs20140526Client
    from alibabacloud_ecs20140526.models import StartInstanceRequest
    from alibabacloud_tea_openapi.models import Config
    
    from .config import Conf
    from .mns_client import BasicListener
    
    
    class ECSClient(object):
        def __init__(self, client):
            self.client = client
    
        # Start the ECS host
        def start_instance(self, instance_id):
            logging.info(f'Start instance {instance_id} ...')
            request = StartInstanceRequest(
                instance_id=instance_id
            )
            self.client.start_instance(request)
    
    
    class ListenerStart(BasicListener):
        def __init__(self):
            ecs_config = Config(
                access_key_id=Conf.access_key,
                access_key_secret=Conf.access_key_secret,
                endpoint=f'ecs.{Conf.region_id}.aliyuncs.com'
            )
            client = Ecs20140526Client(ecs_config)
            self.ecs_client = ECSClient(client)
    
        def process(self, event):
            detail = event['content']
            instance_id = detail['resourceId']
            if detail['state'] == 'Stopped':
                self.ecs_client.start_instance(instance_id)

    In a production environment, after you run the Start command, you might need to monitor subsequent events, such as Starting, Running, or Stopped. You can then use timers and counters to handle success or failure scenarios.
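The timers and counters mentioned above could take many forms. The following is a minimal sketch of one of them, assuming a per-instance attempt limit and a minimum interval between attempts. The RestartTracker class, its method names, and its default values are hypothetical and are not part of any Alibaba Cloud SDK.

```python
import time


class RestartTracker:
    """Tracks restart attempts per instance and caps retries (illustrative)."""

    def __init__(self, max_attempts=3, min_interval=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.min_interval = min_interval  # seconds between restart attempts
        self.clock = clock
        self.attempts = {}      # instance_id -> number of Start calls issued
        self.last_attempt = {}  # instance_id -> time of the last Start call

    def should_restart(self, instance_id):
        """Return True and record an attempt if a restart is allowed now."""
        now = self.clock()
        if self.attempts.get(instance_id, 0) >= self.max_attempts:
            return False  # limit reached; escalate to an operator instead
        last = self.last_attempt.get(instance_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous attempt
        self.attempts[instance_id] = self.attempts.get(instance_id, 0) + 1
        self.last_attempt[instance_id] = now
        return True

    def mark_running(self, instance_id):
        """Reset the counters once a Running event confirms success."""
        self.attempts.pop(instance_id, None)
        self.last_attempt.pop(instance_id, None)
```

A listener could call should_restart before it issues the Start command for a Stopped event and call mark_running when it receives a Running event for the same instance. This prevents an instance that fails immediately after every start from being restarted in an endless loop.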

  • Best practice 3: Automatically remove a spot instance from a Server Load Balancer (SLB) instance before the spot instance is released.

    About five minutes before a spot instance is released, a release alert event is sent. You can use this short window to run business continuity logic. For example, you can proactively remove the instance from the backend servers of a Server Load Balancer (SLB) instance instead of passively waiting for SLB to handle the removal after the instance is released.

    Reuse the MNS client from the first best practice and add a new listener. When the listener receives a release alert for a spot instance, it calls the SLB SDK.

    # -*- coding: utf-8 -*-
    from alibabacloud_slb20140515.client import Client as Slb20140515Client
    from alibabacloud_slb20140515.models import RemoveVServerGroupBackendServersRequest
    from alibabacloud_tea_openapi.models import Config
    
    from .config import Conf
    from .mns_client import BasicListener
    
    
    class SLBClient(object):
        def __init__(self):
            self.client = self.create_client()
    
        def create_client(self):
            config = Config()
            config.access_key_id = Conf.access_key
            config.access_key_secret = Conf.access_key_secret
            config.endpoint = 'slb.aliyuncs.com'
            return Slb20140515Client(config)
    
        def remove_vserver_group_backend_servers(self, vserver_group_id, instance_id):
            request = RemoveVServerGroupBackendServersRequest(
                region_id=Conf.region_id,
                vserver_group_id=vserver_group_id,
                backend_servers="[{'ServerId':'" + instance_id + "','Port':'80','Weight':'100'}]"
            )
            response = self.client.remove_vserver_group_backend_servers(request)
            return response
    
    
    class ListenerSLB(BasicListener):
        def __init__(self, vserver_group_id):
            self.slb_caller = SLBClient()
            # Use the vServer group passed by the caller instead of ignoring the argument.
            self.vserver_group_id = vserver_group_id
    
        def process(self, event):
            detail = event['content']
            instance_id = detail['instanceId']
            if detail['action'] == 'delete':
                self.slb_caller.remove_vserver_group_backend_servers(self.vserver_group_id, instance_id)
    
    Important

    The event name for a spot instance release alert is Instance:PreemptibleInstanceInterruption, not the default Instance:StateChange used in the previous best practices. Pass the event name explicitly when you register the listener, for example: mns_client.regist_listener(ListenerSLB(Conf.vserver_group_id), 'Instance:PreemptibleInstanceInterruption').

    In a production environment, you must request a new spot instance and attach it to the SLB instance to ensure service availability.
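The backend_servers argument in the listing above is a JSON array passed as a string. When a replacement instance must be attached, or several instances are handled at once, it can be less error-prone to build that string with json.dumps than by concatenation. The following sketch uses the same ServerId, Port, and Weight keys as the listing. The helper name build_backend_servers is hypothetical, and the assumption that the call used to attach servers accepts the same format should be checked against the SLB API reference.

```python
import json


def build_backend_servers(instance_ids, port=80, weight=100):
    """Serialize backend server entries as a JSON array string."""
    servers = [
        # Values are sent as strings, matching the listing in this topic.
        {"ServerId": instance_id, "Port": str(port), "Weight": str(weight)}
        for instance_id in instance_ids
    ]
    return json.dumps(servers)


# Placeholder instance ID for illustration only.
print(build_backend_servers(["i-example"]))
```

Because json.dumps escapes special characters and always emits double-quoted keys and values, the result is valid JSON regardless of the instance IDs passed in.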