Enhanced Management of ECS Instances with System Event

Alibaba Cloud's System Event function for Elastic Compute Service (ECS) gives customers better control over their computing resources.

Alibaba Cloud System Event is a feature of Elastic Compute Service (ECS) that helps users better understand the running status of their ECS instances. A system event is a scheduled, recorded maintenance event for the ECS service. System events occur when updates, invalid operations, unexpected system failures, or unexpected hardware or software failures are detected on your ECS instance. When an event occurs, you receive a notification in the console with the details of the event, including the event response plan and event cycle.

When ECS users receive a notification from Alibaba Cloud, they can acknowledge the planned underlying maintenance for the ECS instance through the system event. They can then choose an appropriate time window to execute the system event, along with any related operational activities, according to their business needs. This flexibility helps users reduce the impact on system reliability and business continuity.

Multi-Region for Increased Reliability

Alibaba Cloud is dedicated to guaranteeing data reliability and the high availability of its cloud computing infrastructure and cloud servers. Compared with traditional IDC or on-premises environments, Alibaba Cloud adopts more stringent IDC standards, server access standards, and O&M standards. In addition, Alibaba Cloud provides services in multiple zones across various regions. Customers who need higher availability can use multiple zones to build their own active/standby or active/active services.

For financial solutions, which may have higher requirements for business continuity, systems and services can be built across multiple regions and zones for better RTO/RPO and greater fault tolerance. For a single ECS instance, Alibaba Cloud uses commercially reasonable endeavors to provide a Monthly Uptime Percentage of no less than 99.975% each calendar month in connection with your use of the ECS instance. Moreover, Alibaba Cloud provides service availability of no less than 99.99% across multiple zones in a region.
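To make these percentages concrete, the short sketch below converts a monthly uptime percentage into the downtime it would allow. It assumes a 30-day month, and the function name is illustrative rather than part of any Alibaba Cloud SDK; the SLA terms themselves define how uptime is actually measured.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Return the downtime (in minutes) permitted by a monthly uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # The two targets quoted in the text: single instance and multi-zone.
    for target in (99.975, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime")
```

Under the 30-day assumption, 99.975% corresponds to roughly 10.8 minutes of downtime per month, and 99.99% to roughly 4.3 minutes.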

Proactive Maintenance for Increased Availability

To ensure a high level of service availability, Alibaba Cloud performs proactive maintenance on the physical servers that host ECS instances and resolves potential hardware and software issues to continuously improve system reliability, performance, and security. Normally, when key software components on a physical server are updated or patched online, the ECS instances running on that server are either unaffected or experience a slight performance impact lasting less than 1 minute. Furthermore, if a maintenance activity planned for a physical server could severely affect its ECS instances, those instances are live migrated to another server to keep them healthy.

However, ECS customers may occasionally receive a notification that an ECS instance needs maintenance because of the risk of a physical server failure. In that case, Alibaba Cloud sets a scheduled system event to restart the instance and migrate it to another healthy physical server, within a few days for emergency cases and within a month for normal cases.

Scheduled Maintenance for Increased Flexibility

This notification is triggered automatically by Alibaba Cloud's proactive maintenance. During maintenance, some software and hardware failures may cause live migration to fail. In this case, Alibaba Cloud sends the notification described above to remind the user that the system is about to migrate the instance by restarting it.

To improve the efficiency and experience of operating ECS instances, Alibaba Cloud provides the system event feature. When customers receive a notification, they can check the scheduled system events in the ECS console or through OpenAPI, and select an appropriate time to execute the events according to business needs (in some cases, customers can only wait for system events to execute in their scheduled time windows). This eliminates the need for customers to request manual intervention through a work order, reduces the risk of human error, and makes automated failover based on system events possible.

If a system event is scheduled to restart an instance, an indication appears in the ECS console to remind the user to check it. On the Unsettled Events > System Scheduled Events page, users can view instance information (instance ID, region, and status) and system event information (event type, planned schedule, and an optional operation button). Alternatively, users can query an instance's system events with the DescribeInstancesFullStatus OpenAPI operation.
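As a rough illustration of consuming such a query result, the sketch below extracts pending scheduled events from a response that has already been parsed into a Python dictionary. The response shape and field names (`Instances`, `ScheduledEvents`, `EventType`, `NotBefore`) are simplified assumptions made for this example, not the documented API schema, and the sample values are invented; consult the OpenAPI reference for the real structure.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ScheduledEvent:
    instance_id: str
    event_type: str
    not_before: datetime  # earliest time the event is planned to execute


def parse_scheduled_events(response: dict[str, Any]) -> list[ScheduledEvent]:
    """Flatten a (hypothetical) status response into events sorted by planned time."""
    events = []
    for instance in response.get("Instances", []):
        for raw in instance.get("ScheduledEvents", []):
            # Assumed timestamp format: ISO 8601 in UTC, e.g. 2018-01-20T02:00:00Z
            planned = datetime.strptime(raw["NotBefore"], "%Y-%m-%dT%H:%M:%SZ")
            events.append(
                ScheduledEvent(
                    instance_id=instance["InstanceId"],
                    event_type=raw["EventType"],
                    not_before=planned.replace(tzinfo=timezone.utc),
                )
            )
    return sorted(events, key=lambda event: event.not_before)


# Hypothetical sample data, for illustration only.
SAMPLE_RESPONSE = {
    "Instances": [
        {
            "InstanceId": "i-example-a",
            "ScheduledEvents": [
                {"EventType": "Reboot", "NotBefore": "2018-02-15T02:00:00Z"}
            ],
        },
        {
            "InstanceId": "i-example-b",
            "ScheduledEvents": [
                {"EventType": "Reboot", "NotBefore": "2018-01-20T02:00:00Z"}
            ],
        },
        {"InstanceId": "i-example-c", "ScheduledEvents": []},
    ]
}

if __name__ == "__main__":
    for event in parse_scheduled_events(SAMPLE_RESPONSE):
        print(event.instance_id, event.event_type, event.not_before.isoformat())
```

Sorting by planned time puts the most urgent event first, which is usually the one to handle before the others.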

System Events for Ensuring Business Continuity

When mission-critical applications run on ECS instances, any unexpected restart of an instance may threaten or seriously affect system availability and business continuity. Therefore, we recommend that users build applications with a fault-tolerant architecture and use capabilities such as multiple regions and zones and load balancers to enhance system reliability.

On this basis, for system events triggered by Alibaba Cloud's proactive maintenance, the notice is usually sent to users a few days in advance. Users can treat the period before the planned execution time as their operation window to prepare failover operations and then restart the instance.
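The operation window described here can be checked programmatically. The following is a minimal sketch, assuming timezone-aware timestamps; the function and parameter names are hypothetical and do not come from any Alibaba Cloud interface.

```python
from datetime import datetime, timedelta, timezone


def fits_operation_window(
    now: datetime,
    scheduled_at: datetime,
    slot_start: datetime,
    slot_length: timedelta,
) -> bool:
    """Return True if the slot starts no earlier than now and ends by the scheduled time."""
    slot_end = slot_start + slot_length
    return now <= slot_start and slot_end <= scheduled_at


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2018, 1, 10, 9, 0, tzinfo=utc)
    scheduled_at = datetime(2018, 1, 20, 2, 0, tzinfo=utc)  # planned execution time
    weekend_slot = datetime(2018, 1, 14, 18, 0, tzinfo=utc)  # preferred low-traffic slot
    print(fits_operation_window(now, scheduled_at, weekend_slot, timedelta(hours=2)))
```

A slot that fails this check would leave the restart to the platform's scheduled time, so a team would typically pick an earlier slot instead.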

For example, in a cluster environment, users can promptly transfer the workload from the instance with the scheduled event to another instance, or back up and move the data on the local disk before the instance is redeployed. They can also proactively modify the load balancer and elastic scaling configuration, or stop and start instances sequentially according to business logic, to minimize the impact of the instance restart on business continuity.
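One way to picture such a sequence is a rolling restart that handles one instance at a time. The sketch below is only an outline under stated assumptions: the four steps are passed in as plain callables, and their names (`drain`, `backup`, `restart`, `restore`) are hypothetical placeholders for whatever load balancer, backup, and ECS operations a team actually uses.

```python
from typing import Callable, Iterable

Step = Callable[[str], None]


def rolling_restart(
    instance_ids: Iterable[str],
    drain: Step,
    backup: Step,
    restart: Step,
    restore: Step,
) -> None:
    """Process instances one by one so the rest of the cluster keeps serving traffic.

    If any step raises, the exception propagates and later instances are not touched.
    """
    for instance_id in instance_ids:
        drain(instance_id)    # move workload away, e.g. remove from the load balancer
        backup(instance_id)   # save local-disk data before the restart
        restart(instance_id)  # execute the restart inside the operation window
        restore(instance_id)  # return the instance to service


if __name__ == "__main__":
    log: list[str] = []

    def record(action: str) -> Step:
        # Stand-in for a real operation: only records what would be done.
        return lambda instance_id: log.append(f"{action}:{instance_id}")

    rolling_restart(
        ["i-example-a", "i-example-b"],
        drain=record("drain"),
        backup=record("backup"),
        restart=record("restart"),
        restore=record("restore"),
    )
    print("\n".join(log))
```

Passing the steps as callables keeps the ordering logic separate from any particular API, so the same loop can be exercised with stand-ins before it is wired to real operations.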

Furthermore, Alibaba Cloud will continue to launch more types and scenarios of ECS system events. In this way, we hope to continuously improve the efficiency and experience of IT operations on Alibaba Cloud, and to deliver more interfaces and services that give users peace of mind in operations and continuity for their business.

About Alibaba Cloud Elastic Compute Service

As a leading and trusted cloud service provider, Alibaba Cloud provides and guarantees the availability, stability, and security of computing, storage, network services, and the underlying infrastructure. According to their strategic targets and business needs, customers can design a highly available IT architecture on Alibaba Cloud and select suitable products and services to build a reliable and robust business system.

Building on this foundation, customers can use Alibaba Cloud's OpenAPI, monitoring, orchestration, and other tools to obtain various IT development and operation capabilities, such as rapid provisioning of resources, easy management of multiple environments, and agile deployment.

To learn more about the System Event feature for Alibaba Cloud ECS, visit https://www.alibabacloud.com/help/doc-detail/66574.htm


Alibaba Clouder
