This topic provides a practical example of how to use Cloud Monitor to automatically process Elastic Compute Service (ECS) host state change events using a queue from Simple Message Queue (formerly MNS).
Prerequisites
Create a queue, such as ecs-cms-event, in the Simple Message Queue (formerly MNS) console. For more information, see Create a queue.
Create a system event-triggered alert rule in the Cloud Monitor console. For more information, see Manage system event-triggered alert rules (Old version).
Install the Python dependencies.
All code in this topic uses Python 3.7 as an example. You must install the MNS SDK for Python and the ECS SDK for Python.
For more information about how to install the Python SDK, see Install Python SDK.
If you use other programming languages, see Download and use MNS SDK and Overview of ECS SDK.
Background information
In addition to existing system events, ECS publishes state change events and interruption notification events for spot instances through Cloud Monitor. An ECS state change event is triggered whenever the state of an ECS host changes. These changes can be initiated by you in the console, through an OpenAPI call, or with an SDK. They can also be triggered automatically by services such as Auto Scaling, by overdue payments, or by system exceptions.
Cloud Monitor provides four methods to process event-triggered alerts: Simple Message Queue (formerly MNS), Function Compute, URL callback, and Simple Log Service. This topic uses Simple Message Queue (formerly MNS) as an example to describe three best practices for automatically processing ECS host state change events.
Procedure
Cloud Monitor delivers all ECS host state change events to Simple Message Queue (formerly MNS). You can then use Simple Message Queue (formerly MNS) to retrieve and process the messages.
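The listeners in the following sections read three fields from each message body: the event name, and the resource ID and state under `content`. The following is a minimal sketch of that parsing step. The sample payload is hand-written for illustration: the instance ID is made up and the field set is trimmed to what this topic uses, so it is not a complete event as delivered by Cloud Monitor. The function name `parse_state_change` is also hypothetical.

```python
import json

# Hypothetical, trimmed-down message body. Real events carry more fields;
# only the ones used by the listeners in this topic are shown.
SAMPLE_BODY = json.dumps({
    "name": "Instance:StateChange",
    "content": {"resourceId": "i-example0001", "state": "Stopped"},
})


def parse_state_change(body):
    """Return (event_name, resource_id, state) from a JSON message body.

    Missing fields come back as None instead of raising KeyError.
    """
    event = json.loads(body)
    content = event.get("content") or {}
    return event.get("name"), content.get("resourceId"), content.get("state")


if __name__ == "__main__":
    print(parse_state_change(SAMPLE_BODY))
```

Returning `None` for missing fields lets a consumer skip unexpected messages without crashing the polling loop.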
Best practice 1: Record creation and release events for all ECS hosts.
You cannot query released instances in the ECS console. To enable queries for released instances, you can use ECS host state change events to record the lifecycle of all ECS hosts in a database or Simple Log Service. A `Created` event is sent when you create an ECS host, and a `Deleted` event is sent when you release an ECS host.
Edit a Conf file.
The Conf file must contain the `endpoint` for Simple Message Queue (formerly MNS), the `access_key` and `access_key_secret` for your Alibaba Cloud account, the `region_id` (such as `cn-beijing`), and the `queue_name`. On the Queue Details page, in the Endpoint section, you can view the endpoints for Internet Access and Internal Access.
    import os

    # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and
    # ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are set.
    # Leaking your code can expose your AccessKey and compromise all resources
    # in your account. The following code uses environment variables to get the
    # AccessKey. This is for reference only. Use a more secure method, such as
    # Security Token Service (STS).
    class Conf:
        endpoint = 'http://<id>.mns.<region>.aliyuncs.com/'
        access_key = os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID']
        access_key_secret = os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
        region_id = 'cn-beijing'
        # The name of the queue that you created in the prerequisites.
        queue_name = 'ecs-cms-event'
        vserver_group_id = 'your_vserver_group_id'

Use the MNS SDK to write an MNS client that retrieves messages from MNS.
    # -*- coding: utf-8 -*-
    import json
    import logging

    from mns.account import Account
    from mns.mns_exception import MNSExceptionBase

    from .config import Conf


    class MNSClient(object):
        def __init__(self):
            self.account = Account(Conf.endpoint, Conf.access_key, Conf.access_key_secret)
            self.queue_name = Conf.queue_name
            self.listeners = dict()

        def regist_listener(self, listener, eventname='Instance:StateChange'):
            # Several listeners can subscribe to the same event name.
            self.listeners.setdefault(eventname, []).append(listener)

        def run(self):
            queue = self.account.get_queue(self.queue_name)
            while True:
                try:
                    # Long-poll the queue for up to 5 seconds.
                    message = queue.receive_message(wait_seconds=5)
                    event = json.loads(message.message_body)
                    for listener in self.listeners.get(event['name'], []):
                        listener.process(event)
                    # Delete the message only after every listener has processed it.
                    queue.delete_message(receipt_handle=message.receipt_handle)
                except MNSExceptionBase as e:
                    if e.type == 'QueueNotExist':
                        logging.error('Queue %s not exist, please create queue before receive message.', self.queue_name)
                        return  # Polling a missing queue cannot succeed.
                    elif e.type == 'MessageNotExist':
                        logging.debug('No Message, continue waiting')
                    else:
                        logging.error('Failed to receive message: %s', e)


    class BasicListener(object):
        def process(self, event):
            pass

The preceding code calls each registered listener to consume a message retrieved from Simple Message Queue (formerly MNS) and then deletes the message.
Register a specific listener to consume events. This simple listener checks for `Created` and `Deleted` events and prints a log entry when it receives one.

    # -*- coding: utf-8 -*-
    import logging

    from .mns_client import BasicListener


    class ListenerLog(BasicListener):
        def process(self, event):
            state = event['content']['state']
            resource_id = event['content']['resourceId']
            # Only creation and release events are logged.
            if state in ('Created', 'Deleted'):
                logging.info(f'The instance {resource_id} state is {state}')

The main function is written as follows:

    import logging

    # The root logger drops INFO records by default, so lower the threshold.
    logging.basicConfig(level=logging.INFO)

    mns_client = MNSClient()
    mns_client.regist_listener(ListenerLog())
    mns_client.run()

In a production environment, you might need to store events in a database or Simple Log Service (SLS) for future search and auditing.
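As one possible way to store events in a database, the following sketch writes `Created` and `Deleted` events to SQLite using only the Python standard library. The class name `ListenerSQLite`, the table name, and the column layout are assumptions made for illustration. The class does not import `BasicListener`, so it runs on its own, but it exposes the same `process(event)` method and could be registered with the MNS client in the same way.

```python
import sqlite3
import time


class ListenerSQLite(object):
    """Hypothetical listener that stores Created/Deleted events in SQLite."""

    def __init__(self, db_path=":memory:"):
        # ":memory:" keeps the sketch self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS instance_lifecycle ("
            "resource_id TEXT, state TEXT, recorded_at REAL)"
        )

    def process(self, event):
        content = event["content"]
        state = content["state"]
        if state not in ("Created", "Deleted"):
            return  # Ignore other state changes.
        with self.conn:  # Commits on success, rolls back on error.
            self.conn.execute(
                "INSERT INTO instance_lifecycle VALUES (?, ?, ?)",
                (content["resourceId"], state, time.time()),
            )

    def history(self, resource_id):
        """Return the recorded states for one instance, oldest first."""
        rows = self.conn.execute(
            "SELECT state FROM instance_lifecycle "
            "WHERE resource_id = ? ORDER BY rowid",
            (resource_id,),
        )
        return [row[0] for row in rows]
```

With a table like this, a released instance can still be looked up by its ID after it disappears from the ECS console.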
Best practice 2: Automatically restart stopped ECS hosts.
In some scenarios, an ECS host might stop unexpectedly. You can configure a process to automatically restart stopped ECS hosts.
To automatically restart a stopped ECS host, you can reuse the MNS client from the first best practice and add a new listener. When the listener receives a `Stopped` event, it executes the `Start` command on that ECS host.

    # -*- coding: utf-8 -*-
    import logging

    from alibabacloud_ecs20140526.client import Client as Ecs20140526Client
    from alibabacloud_ecs20140526.models import StartInstanceRequest
    from alibabacloud_tea_openapi.models import Config

    from .config import Conf
    from .mns_client import BasicListener


    class ECSClient(object):
        def __init__(self, client):
            self.client = client

        # Start the ECS host
        def start_instance(self, instance_id):
            logging.info(f'Start instance {instance_id} ...')
            request = StartInstanceRequest(
                instance_id=instance_id
            )
            self.client.start_instance(request)


    class ListenerStart(BasicListener):
        def __init__(self):
            ecs_config = Config(
                access_key_id=Conf.access_key,
                access_key_secret=Conf.access_key_secret,
                endpoint=f'ecs.{Conf.region_id}.aliyuncs.com'
            )
            client = Ecs20140526Client(ecs_config)
            self.ecs_client = ECSClient(client)

        def process(self, event):
            detail = event['content']
            instance_id = detail['resourceId']
            if detail['state'] == 'Stopped':
                self.ecs_client.start_instance(instance_id)

In a production environment, after you run the `Start` command, you might need to monitor subsequent events, such as `Starting`, `Running`, or `Stopped`. You can then use timers and counters to handle success or failure scenarios.
Best practice 3: Automatically remove a spot instance from a Server Load Balancer (SLB) instance before the spot instance is released.
About five minutes before a spot instance is released, a release alert event is sent. You can use this short window to run business continuity logic. For example, you can proactively remove the instance from the backend servers of a Server Load Balancer (SLB) instance instead of passively waiting for SLB to handle the removal after the instance is released.
Reuse the MNS client from the first best practice and add a new listener. When the listener receives a release alert for a spot instance, it calls the SLB SDK.
    # -*- coding: utf-8 -*-
    import json

    from alibabacloud_slb20140515.client import Client as Slb20140515Client
    from alibabacloud_slb20140515.models import RemoveVServerGroupBackendServersRequest
    from alibabacloud_tea_openapi.models import Config

    from .config import Conf
    from .mns_client import BasicListener


    class SLBClient(object):
        def __init__(self):
            self.client = self.create_client()

        def create_client(self):
            config = Config()
            config.access_key_id = Conf.access_key
            config.access_key_secret = Conf.access_key_secret
            config.endpoint = 'slb.aliyuncs.com'
            return Slb20140515Client(config)

        def remove_vserver_group_backend_servers(self, vserver_group_id, instance_id):
            # Serialize with json.dumps so that the value is valid JSON.
            backend_servers = json.dumps(
                [{'ServerId': instance_id, 'Port': '80', 'Weight': '100'}]
            )
            request = RemoveVServerGroupBackendServersRequest(
                region_id=Conf.region_id,
                vserver_group_id=vserver_group_id,
                backend_servers=backend_servers
            )
            response = self.client.remove_vserver_group_backend_servers(request)
            return response


    class ListenerSLB(BasicListener):
        def __init__(self, vserver_group_id):
            self.slb_caller = SLBClient()
            # Use the group ID passed by the caller.
            self.vserver_group_id = vserver_group_id

        def process(self, event):
            detail = event['content']
            instance_id = detail['instanceId']
            if detail['action'] == 'delete':
                self.slb_caller.remove_vserver_group_backend_servers(self.vserver_group_id, instance_id)

Important: The event name for a spot instance release alert differs from the default `Instance:StateChange` name used in the previous best practices, so you must pass it explicitly when you register the listener. For example, the listener registration call is `mns_client.regist_listener(ListenerSLB(Conf.vserver_group_id), 'Instance:PreemptibleInstanceInterruption')`.
In a production environment, you must request a new spot instance and attach it to the SLB instance to ensure service availability.
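When you attach a replacement instance, the backend server list is again passed as a JSON array string. The following helper is a small sketch that builds such a string for one or more instances. The function name `build_backend_servers` is hypothetical, and the default port and weight simply mirror the values in the SLB listener example in this topic. Adjust them to match your vServer group.

```python
import json


def build_backend_servers(instance_ids, port=80, weight=100):
    """Serialize backend server entries as a JSON array string.

    Port and Weight are written as strings, matching the SLB listener
    example in this topic.
    """
    servers = [
        {"ServerId": instance_id, "Port": str(port), "Weight": str(weight)}
        for instance_id in instance_ids
    ]
    return json.dumps(servers)


if __name__ == "__main__":
    # "i-example0002" is a made-up instance ID.
    print(build_backend_servers(["i-example0002"]))
```

Building the string with `json.dumps` rather than string concatenation keeps the quoting valid even if an ID contains unexpected characters.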