In addition to the existing system events, CloudMonitor supports status change events for Elastic Compute Service (ECS) instances, including interruption notification events for preemptible instances. A status change event is triggered whenever the status of an ECS instance changes. Status changes can be caused by operations in the ECS console, API or SDK calls, Auto Scaling, overdue payments, and system exceptions.

Background information

The existing system events for ECS notify you of alerts that require manual operations. Status change events are not alerts but general notifications that are suitable for automated audit and O&M scenarios. CloudMonitor lets you handle the status change events of ECS instances automatically by using Function Compute or Message Service (MNS).

Before you begin

  • Create an MNS queue.
    1. Log on to the MNS console.
    2. On the Queues page, select a region and click Create Queue in the upper-right corner.
    3. In the Create Queue dialog box, enter a queue name, set relevant parameters, and click OK. In this example, set the queue name to ecs-cms-event.
  • Create an event-triggered alert rule.
    1. Log on to the CloudMonitor console.
    2. In the left-side navigation pane, click Event Monitoring.
    3. On the Event Monitoring page, click the Alarm Rules tab. On the Alarm Rules tab, click Create Event Alert.
    4. In the Basic Information section of the Create / Modify Event Alert right-side pane, enter an alert rule name. In this example, enter ecs-test-rule.
    5. In the Event alert section, perform the following operations:
      • Set the Event Type parameter to System Event.
      • Set the Product Type parameter to ECS.
      • Set the Event Type parameter to Status Notification.
      • Set the Event Name parameter as needed.
      • Set the Resource Range parameter as needed. If you set the Resource Range parameter to All Resources, CloudMonitor sends alert notifications for all resource-related events. If you set the Resource Range parameter to Application Groups, CloudMonitor sends alert notifications for events related to the resources in the specified application group.
    6. In the Alarm Type section, perform the following operations:
      • Set the Contact Group and Notification Method parameters as needed.
      • Select MNS queue and set the Region and Queue parameters as needed. In this example, select the ecs-cms-event queue.
    7. Click OK.
  • Install Python dependencies.

    The code in this topic has been tested with Python 3.6. You can also use other programming languages, such as Java.

    Install the following Python packages from the Python Package Index (PyPI), for example by using pip:
    • aliyun-python-sdk-core-v3>=2.12.1
    • aliyun-python-sdk-ecs>=4.16.0
    • aliyun-mns>=1.1.5

Procedure

CloudMonitor sends the status change events of ECS instances to the MNS queue that is specified in the alert rule. You can then write code that receives the messages from the queue and handles them.
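Each message is a JSON document, and the MNS client in Practice 1 routes it by its event name. The following sketch shows that routing in isolation. The sample payload is hypothetical: it contains only the fields that the code in this topic reads (name, content.resourceId, and content.state), whereas real events carry additional fields. The handler and dictionary names are illustrative.

```python
import json

# Hypothetical message body: only the fields used in this topic are shown.
SAMPLE_BODY = json.dumps({
    "name": "Instance:StateChange",
    "content": {"resourceId": "i-example", "state": "Stopped"},
})


def on_state_change(event):
    # Summarize a status change event as "<instance ID> -> <new state>".
    content = event["content"]
    return "%s -> %s" % (content["resourceId"], content["state"])


# Maps an event name to the handler that processes it.
HANDLERS = {"Instance:StateChange": on_state_change}


def dispatch(body):
    """Parse a message body and call the handler registered for its event name.

    Returns the handler result, or None if no handler is registered.
    """
    event = json.loads(body)
    handler = HANDLERS.get(event["name"])
    return handler(event) if handler else None


print(dispatch(SAMPLE_BODY))  # prints "i-example -> Stopped"
```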

  • Practice 1: Record all creation and release events of ECS instances
    Released ECS instances cannot be queried in the ECS console. To keep a queryable record of them, you can store the status change events of all ECS instances in your own database or logs. A Pending event is triggered when an ECS instance is created, and a Deleted event is triggered when an ECS instance is released. CloudMonitor records both types of events.
    1. Create a configuration file named config.py that defines the Conf class.
      Add the following MNS-related parameters to the Conf class:
      • endpoint: the endpoint for accessing MNS. You can obtain the endpoint by clicking Get Endpoint on the Queues page in the MNS console.
      • access_key and access_key_secret: the AccessKey ID and AccessKey secret used to access MNS. You can obtain the AccessKey ID and AccessKey secret in the User Management console.
      • region_id and queue_name: the region where the MNS queue resides and the name of the MNS queue. You can obtain the region ID and queue name on the Queues page in the MNS console.
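      # config.py: replace the placeholders with your own values.
      class Conf:
          endpoint = 'http://<id>.mns.<region>.aliyuncs.com/'
          access_key = '<access_key>'
          access_key_secret = '<access_key_secret>'
          region_id = 'cn-beijing'
          queue_name = 'ecs-cms-event'
          # ID of the SLB vServer group that is used in Practice 3.
          vsever_group_id = '<your_vserver_group_id>'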
    2. Use the MNS SDK to develop an MNS client for receiving messages from MNS.
      # -*- coding: utf-8 -*-
      import json
      import logging

      from mns.account import Account
      from mns.mns_exception import MNSExceptionBase

      from .config import Conf


      class MNSClient(object):
          def __init__(self):
              self.account = Account(Conf.endpoint, Conf.access_key, Conf.access_key_secret)
              self.queue_name = Conf.queue_name
              # Maps each event name to the listeners registered for it.
              self.listeners = dict()

          def regist_listener(self, listener, eventname='Instance:StateChange'):
              self.listeners.setdefault(eventname, []).append(listener)

          def run(self):
              queue = self.account.get_queue(self.queue_name)
              while True:
                  try:
                      # Wait up to 5 seconds for a message (long polling).
                      message = queue.receive_message(wait_seconds=5)
                      event = json.loads(message.message_body)
                      for listener in self.listeners.get(event['name'], []):
                          listener.process(event)
                      # Delete the message only after all listeners have processed it.
                      queue.delete_message(message.receipt_handle)
                  except MNSExceptionBase as e:
                      if e.type == 'QueueNotExist':
                          logging.error('Queue %s does not exist. Create the queue before you receive messages.', self.queue_name)
                          return
                      elif e.type == 'MessageNotExist':
                          # No message arrived within the wait time. Continue waiting.
                          continue
                      logging.error('Failed to receive or delete a message: %s', e)
                  except Exception:
                      # The message is not deleted, so MNS delivers it again after its
                      # visibility timeout expires.
                      logging.exception('Failed to process a message.')


      class BasicListener(object):
          # Base class for listeners. Subclasses override process() to handle an event.
          def process(self, event):
              pass

      The preceding code receives messages from the MNS queue, passes each message to the listeners that are registered for its event name, and deletes the message after the listeners have processed it.

    3. Create a listener to consume events and register it with the client. The following listener writes a log entry when it receives a Pending or Deleted event.
      # -*- coding: utf-8 -*-
      import logging

      from .mns_client import BasicListener


      class ListenerLog(BasicListener):
          def process(self, event):
              state = event['content']['state']
              resource_id = event['content']['resourceId']
              # Pending: the instance is being created. Deleted: the instance has been released.
              if state in ('Pending', 'Deleted'):
                  logging.info(f'The instance {resource_id} state is {state}')
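      Add the following code to the main program of your application:
      import logging

      # Show INFO-level log entries. By default, logging only outputs WARNING and above.
      logging.basicConfig(level=logging.INFO)

      mns_client = MNSClient()
      mns_client.regist_listener(ListenerLog())
      mns_client.run()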

      In a production environment, you can store the events in your own database or in Log Service for subsequent queries and audits.
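As a sketch of the database option, the following listener writes Pending and Deleted events to SQLite by using the standard library sqlite3 module. The class name, table name, and columns are hypothetical. Because the MNS client deletes a message only after the listeners return, a message can be delivered again if processing fails before deletion; the sketch therefore stores a SHA-256 fingerprint of each event and uses INSERT OR IGNORE so that a redelivered message is not stored twice. This assumes that distinct events differ in at least one field (for example, a timestamp); if two genuine events could have identical bodies, the second would be ignored. The listener exposes the same process(event) method as BasicListener, so it can be registered with regist_listener.

```python
import hashlib
import json
import sqlite3


class ListenerSQLite(object):
    """Stores instance creation (Pending) and release (Deleted) events in SQLite.

    The table layout is a hypothetical example.
    """

    def __init__(self, db_path='ecs_events.db'):
        self.conn = sqlite3.connect(db_path)
        with self.conn:
            self.conn.execute(
                'CREATE TABLE IF NOT EXISTS instance_events ('
                ' fingerprint TEXT PRIMARY KEY,'
                ' resource_id TEXT NOT NULL,'
                ' state TEXT NOT NULL,'
                ' raw_event TEXT NOT NULL)')

    def process(self, event):
        content = event['content']
        if content['state'] not in ('Pending', 'Deleted'):
            return
        raw = json.dumps(event, sort_keys=True)
        # Identical event bodies produce the same fingerprint.
        fingerprint = hashlib.sha256(raw.encode('utf-8')).hexdigest()
        # The connection context manager commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                'INSERT OR IGNORE INTO instance_events '
                '(fingerprint, resource_id, state, raw_event) VALUES (?, ?, ?, ?)',
                (fingerprint, content['resourceId'], content['state'], raw))

    def history(self, resource_id):
        """Return the stored states of an instance in insertion order."""
        rows = self.conn.execute(
            'SELECT state FROM instance_events WHERE resource_id = ? ORDER BY rowid',
            (resource_id,))
        return [row[0] for row in rows]
```

For example, register it next to the logging listener with mns_client.regist_listener(ListenerSQLite()). Released instances can then be looked up with history('<instance_id>').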

  • Practice 2: Automatically start ECS instances that are shut down

    In scenarios where ECS instances may be shut down unexpectedly, you may want to start them again automatically.

    You can reuse the MNS client developed in Practice 1 and create another listener. When the listener receives a Stopped event for an ECS instance, it calls the StartInstance API operation to start the instance. Register the listener with mns_client.regist_listener(ListenerStart()).

    # -*- coding: utf-8 -*-
    import logging

    from aliyunsdkcore.client import AcsClient
    from aliyunsdkecs.request.v20140526 import StartInstanceRequest

    from .config import Conf
    from .mns_client import BasicListener


    class ECSClient(object):
        def __init__(self, acs_client):
            self.client = acs_client

        # Start the target ECS instance by calling the StartInstance API operation.
        def start_instance(self, instance_id):
            logging.info(f'Start instance {instance_id} ...')
            request = StartInstanceRequest.StartInstanceRequest()
            request.set_accept_format('json')
            request.set_InstanceId(instance_id)
            self.client.do_action_with_exception(request)


    class ListenerStart(BasicListener):
        def __init__(self):
            acs_client = AcsClient(Conf.access_key, Conf.access_key_secret, Conf.region_id)
            self.ecs_client = ECSClient(acs_client)

        def process(self, event):
            detail = event['content']
            instance_id = detail['resourceId']
            # A Stopped event is also triggered when you stop an instance on purpose,
            # for example in the ECS console. Filter out such instances here if needed.
            if detail['state'] == 'Stopped':
                self.ecs_client.start_instance(instance_id)

    In a production environment, after the start request is sent, you can listen for the subsequent Starting, Running, or Stopped events and use a timer and a counter to determine whether the ECS instance has started, and then take further O&M actions, such as retrying the start or raising an alert.
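The timer-and-counter approach can be sketched as a small tracker: it counts start attempts per instance, stops tracking an instance when a Running event arrives, retries on Stopped up to a maximum number of attempts, and reports instances whose latest attempt has not reached Running before a deadline. The class name, the retry limit, and the timeout are hypothetical. start_fn stands in for a call such as ECSClient.start_instance, and the clock is injectable so that the logic can be tested without waiting.

```python
import time


class StartTracker(object):
    """Tracks automatic start attempts for ECS instances (illustrative sketch)."""

    def __init__(self, start_fn, max_attempts=3, timeout_seconds=300, clock=time.monotonic):
        self.start_fn = start_fn          # For example, ECSClient.start_instance.
        self.max_attempts = max_attempts
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.pending = {}                 # instance_id -> (attempt count, deadline)
        self.failed = set()               # Instances that used up all attempts.

    def process(self, event):
        content = event['content']
        instance_id, state = content['resourceId'], content['state']
        if state == 'Running':
            # The instance started successfully: stop tracking it.
            self.pending.pop(instance_id, None)
        elif state == 'Stopped':
            self._attempt(instance_id)

    def _attempt(self, instance_id):
        attempts = self.pending.get(instance_id, (0, None))[0]
        if attempts >= self.max_attempts:
            # Give up and leave the instance for manual handling or an alert.
            self.pending.pop(instance_id, None)
            self.failed.add(instance_id)
            return
        self.pending[instance_id] = (attempts + 1, self.clock() + self.timeout_seconds)
        self.start_fn(instance_id)

    def overdue(self):
        """Return instances whose latest start attempt has passed its deadline."""
        now = self.clock()
        return sorted(i for i, (_, deadline) in self.pending.items() if deadline <= now)
```

A tracker created as StartTracker(ECSClient(acs_client).start_instance) could replace ListenerStart in regist_listener; overdue() and failed would then need to be checked periodically, and how that check is scheduled is left open here.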

  • Practice 3: Automatically remove preemptible instances from SLB before they are released

    An interruption notification event is triggered 5 minutes before a preemptible instance is released. During these 5 minutes, you can perform operations that keep your services from being interrupted. For example, you can remove the preemptible instance from a Server Load Balancer (SLB) instance.

    You can reuse the MNS client developed in Practice 1 and create another listener. When the listener receives the interruption notification event for a preemptible instance, you can call the SLB SDK to remove the preemptible instance from an SLB instance.

    # -*- coding: utf-8 -*-
    import json

    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.request import CommonRequest

    from .config import Conf
    from .mns_client import BasicListener


    class SLBClient(object):
        def __init__(self):
            self.client = AcsClient(Conf.access_key, Conf.access_key_secret, Conf.region_id)

        def _new_request(self, action_name):
            # Build a new request for each call so that query parameters are not shared between calls.
            request = CommonRequest()
            request.set_method('POST')
            request.set_accept_format('json')
            request.set_version('2014-05-15')
            request.set_domain('slb.aliyuncs.com')
            request.set_action_name(action_name)
            request.add_query_param('RegionId', Conf.region_id)
            return request

        def remove_vserver_group_backend_servers(self, vserver_group_id, instance_id):
            request = self._new_request('RemoveVServerGroupBackendServers')
            request.add_query_param('VServerGroupId', vserver_group_id)
            # BackendServers is a JSON array. Port 80 is an example value: use the port
            # that the instance serves in the vServer group.
            request.add_query_param('BackendServers', json.dumps(
                [{'ServerId': instance_id, 'Port': '80', 'Weight': '100'}]))
            response = self.client.do_action_with_exception(request)
            return str(response, encoding='utf-8')


    class ListenerSLB(BasicListener):
        def __init__(self, vserver_group_id):
            self.slb_caller = SLBClient()
            # Use the vServer group ID that is passed in when the listener is created.
            self.vserver_group_id = vserver_group_id

        def process(self, event):
            detail = event['content']
            instance_id = detail['instanceId']
            # The delete action indicates that the preemptible instance is about to be released.
            if detail['action'] == 'delete':
                self.slb_caller.remove_vserver_group_backend_servers(self.vserver_group_id, instance_id)
    Notice

    For interruption notification events, register the listener with the Instance:PreemptibleInstanceInterruption event name instead of the default Instance:StateChange event name: mns_client.regist_listener(ListenerSLB(Conf.vsever_group_id), 'Instance:PreemptibleInstanceInterruption').

    In a production environment, you can also create another preemptible instance and add it to the SLB instance as a backend server to maintain the capacity of your services.
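Adding the replacement instance to the vServer group is the inverse of the removal in Practice 3 and can be sketched with the AddVServerGroupBackendServers operation of the SLB API (version 2014-05-15). Creating the replacement preemptible instance is not shown. The function names, the default port 80, and the weight 100 are assumptions for illustration. The SDK import sits inside the sending function so that the parameter-building helper also works where aliyunsdkcore is not installed.

```python
import json


def build_add_backend_params(region_id, vserver_group_id, instance_id, port=80, weight=100):
    """Build the query parameters for AddVServerGroupBackendServers.

    Port 80 and weight 100 are example values: use the port that your
    service listens on in the vServer group.
    """
    return {
        'RegionId': region_id,
        'VServerGroupId': vserver_group_id,
        # BackendServers is passed as a JSON array string.
        'BackendServers': json.dumps(
            [{'ServerId': instance_id, 'Port': str(port), 'Weight': str(weight)}]),
    }


def add_vserver_group_backend_server(acs_client, region_id, vserver_group_id, instance_id):
    """Add an instance to a vServer group by using an existing AcsClient."""
    # Imported here so that build_add_backend_params can be used without the SDK.
    from aliyunsdkcore.request import CommonRequest

    request = CommonRequest()
    request.set_method('POST')
    request.set_accept_format('json')
    request.set_version('2014-05-15')
    request.set_domain('slb.aliyuncs.com')
    request.set_action_name('AddVServerGroupBackendServers')
    params = build_add_backend_params(region_id, vserver_group_id, instance_id)
    for key, value in params.items():
        request.add_query_param(key, value)
    response = acs_client.do_action_with_exception(request)
    return str(response, encoding='utf-8')
```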