Alibaba Cloud is launching a new feature for Elastic Compute Service (ECS) called System Event. A system event is a scheduled, recorded maintenance event for an ECS instance. System events are generated when updates, invalid operations, unexpected system failures, or unexpected hardware or software failures are detected on your ECS instance. When an event occurs, you receive a notification in the console with the details of the event, including the event response plan and the event cycle.
When ECS users receive a notification from Alibaba Cloud, they can review and acknowledge the planned underlying maintenance for the ECS instance through its system event. They can then choose an appropriate time window to execute the system event, along with any related operational activities, according to their business needs. This flexibility helps users reduce the impact of maintenance on system reliability and business continuity.
Alibaba Cloud is dedicated to providing our customers with data reliability and high availability for cloud computing infrastructure and cloud servers. Compared with traditional IDC or on-premises environments, Alibaba Cloud adopts more stringent IDC standards, server access standards, and O&M standards. In addition, Alibaba Cloud provides multiple zones in various regions. Customers who need higher availability can use these zones to build their own active/standby or active/active services.
Financial solutions, which may have stricter requirements for business continuity, can build systems and services across multiple regions and zones for better RTO/RPO and greater fault tolerance. For a single ECS instance, Alibaba Cloud uses commercially reasonable endeavors to provide a Monthly Uptime Percentage of no less than 99.95% each calendar month in connection with your use of the ECS instance. For deployments across multiple zones in a region, Alibaba Cloud provides service availability of no less than 99.99%.
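To put these percentages in perspective, the short sketch below converts an uptime target into the downtime it still permits per month. It is plain arithmetic under an assumed 30-day month; the function name is illustrative, and the service level agreement itself defines how uptime is actually measured.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month that still meet the given uptime percentage.

    Assumption: a 30-day month by default; real calendar months vary in length.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (99.95, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per month")
```

Under that assumption, 99.95% corresponds to roughly 21.6 minutes per month and 99.99% to roughly 4.3 minutes.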
To ensure a high level of service availability, Alibaba Cloud performs proactive maintenance on the physical servers that host ECS instances and resolves potential hardware and software issues to continuously improve system reliability, performance, and security. Normally, when maintenance is planned for a physical server, the ECS instance is live migrated to another server so that the instance stays healthy.
However, ECS customers may occasionally receive a notification that an ECS instance needs maintenance because of the risk of a physical server failure, and that Alibaba Cloud has scheduled a system event to restart the instance and migrate it to a healthy physical server within a few days.
This is a maintenance notification triggered automatically by Alibaba Cloud's proactive maintenance. During the maintenance process, some software and hardware failures may cause live migration to fail. In that case, Alibaba Cloud sends the notification described above to tell the user that the system is about to migrate the instance by restarting it.
To make operating ECS instances more efficient, Alibaba Cloud is launching system events as a new feature for ECS instances. When customers receive a notification, they can check the scheduled system events in the ECS console or through OpenAPI, and select an appropriate time to execute the events according to their business needs (in some cases, customers can only wait for the system events to execute in their scheduled time windows). This removes the need to request manual intervention through a work order, reduces the risk of human error, and makes automated failover based on system events possible.
If a system event is scheduled to restart an instance, an indicator appears in the ECS console to prompt the user to check it. On the Unsettled Events > System Scheduled Events page, the user can view instance-related information (instance ID, region, and status) and event-related information (event type and planned schedule), along with an optional operation button. Alternatively, ECS users can query an instance's system events with the OpenAPI operation DescribeInstanceFullStatus.
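As a rough illustration of working with the OpenAPI result, the sketch below flattens a DescribeInstanceFullStatus-style response into one row per scheduled event. The nested field names are an assumption about the response layout and should be checked against the current API reference; the helper name, the instance ID, and the event ID in the sample are hypothetical, and no real API call is made.

```python
from typing import Any, Dict, List


def list_scheduled_events(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return one row per scheduled system event found in the response.

    Assumption: the response nests instances and events under the keys used
    below. Verify them against the official API documentation before use.
    """
    rows: List[Dict[str, str]] = []
    instances = response.get("InstanceFullStatusSet", {}).get("InstanceFullStatusType", [])
    for inst in instances:
        events = inst.get("ScheduledSystemEventSet", {}).get("ScheduledSystemEventType", [])
        for ev in events:
            rows.append({
                "instance_id": inst.get("InstanceId", ""),
                "event_id": ev.get("EventId", ""),
                "event_type": ev.get("EventType", {}).get("Name", ""),
                "not_before": ev.get("NotBefore", ""),  # planned execution time
            })
    return rows


# Hand-written sample for illustration only; not real API output.
sample_response = {
    "InstanceFullStatusSet": {
        "InstanceFullStatusType": [
            {
                "InstanceId": "i-example1",
                "ScheduledSystemEventSet": {
                    "ScheduledSystemEventType": [
                        {
                            "EventId": "e-example1",
                            "EventType": {"Name": "SystemMaintenance.Reboot"},
                            "NotBefore": "2018-01-10T02:00:00Z",
                        }
                    ]
                },
            },
            # An instance with no scheduled events contributes no rows.
            {"InstanceId": "i-example2"},
        ]
    }
}

if __name__ == "__main__":
    for row in list_scheduled_events(sample_response):
        print(row)
```

Keeping the parsing separate from the API call makes it easy to feed the same rows into alerting or failover automation, whichever client is used to fetch the response.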
When mission-critical applications run on ECS instances, any unexpected restart of an instance may seriously affect system availability and business continuity. We therefore recommend that users build their applications on a fault-tolerant architecture and use capabilities such as multiple regions and zones and load balancers to enhance system reliability.
On this basis, for a system event triggered by Alibaba Cloud's proactive maintenance, the notice is usually sent to users a few days in advance. Users can treat the period before the planned execution time as their own operation window: they prepare the failover operations first and then restart the instance.
For example, in a cluster environment users can move the workload from the instance with the scheduled event to another instance ahead of time, or back up and transfer the data on the local disk before the instance is redeployed. They can also proactively modify the load balancer and elastic scaling configuration, or stop and start instances in sequence based on the business logic, to minimize the impact of the instance restart on business continuity.
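The preparation described above can be sketched as a small runbook that executes failover steps in order, but only while enough time remains before the planned execution time. Everything here is hypothetical: the function name, the one-hour safety margin, the step names, and the dates are illustrative, and the step actions are placeholders where real load balancer, backup, and restart calls would go.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


def run_before_deadline(
    steps: List[Tuple[str, Callable[[], None]]],
    planned_time: datetime,
    now: datetime,
    safety_margin: timedelta = timedelta(hours=1),
) -> List[str]:
    """Run the steps in order if the operation window is still open.

    Raises RuntimeError when less than `safety_margin` remains before the
    planned execution time of the system event.
    """
    if now + safety_margin >= planned_time:
        raise RuntimeError("Too close to the planned execution time to start failover")
    completed: List[str] = []
    for name, action in steps:
        action()  # an exception here stops the runbook at the failing step
        completed.append(name)
    return completed


# Placeholder actions that only record what a real step would do.
log: List[str] = []
steps = [
    ("drain", lambda: log.append("remove instance from load balancer")),
    ("backup", lambda: log.append("back up local disk data")),
    ("restart", lambda: log.append("restart instance")),
    ("restore", lambda: log.append("add instance back to load balancer")),
]

completed = run_before_deadline(
    steps,
    planned_time=datetime(2018, 1, 10, 2, 0),  # illustrative planned execution time
    now=datetime(2018, 1, 8, 2, 0),            # two days ahead, so the window is open
)

if __name__ == "__main__":
    print(completed)
```

Passing `now` explicitly instead of reading the clock inside the function keeps the sketch easy to test and makes the timing decision visible to the caller.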
Alibaba Cloud will continue to add more types of ECS system events covering more scenarios. In this way, we hope to keep improving the efficiency and experience of IT operations on Alibaba Cloud, and to deliver more interfaces and services that give users peace of mind in operations and continuity in business.
As a leading and trusted cloud service provider, Alibaba Cloud provides and guarantees the availability, stability, and security of computing, storage, and network services and of the underlying infrastructure. Based on their strategic targets and business needs, customers can design a highly available IT architecture on Alibaba Cloud and select suitable products and services to build a reliable and robust business system.
On this foundation, Alibaba Cloud's OpenAPI, monitoring, orchestration, and other tools give customers a range of IT development and operation capabilities, such as rapid provisioning of resources, easy management of multiple environments, and agile deployment.
To learn more about the System Event feature for Alibaba Cloud ECS, visit https://www.alibabacloud.com/help/doc-detail/66574.htm.