You can use Elastic Compute Service (ECS) monitoring features to identify and troubleshoot instance issues and address potential risks before they affect your business.

Handle system events in a timely manner

When the system performs O&M operations or detects issues that affect the running of ECS instances, it sends system event notifications. These notifications provide information such as the event cycle and suggested solutions. We recommend that you handle system events promptly so that their consequences, such as instance restarts or stops, do not affect the business deployed on your instances. For more information, see Overview.
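Because each notification includes the event cycle, one way to decide which events to handle first is to sort pending events by the time they are scheduled to take effect. The following Python sketch assumes you have already collected events into a hypothetical `SystemEvent` record; its field names and the event type strings are illustrative, not the ECS API schema. It lists the events that take effect within a chosen planning window, including any that are already overdue.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SystemEvent:
    # Hypothetical record for illustration; not the ECS API response schema.
    instance_id: str
    event_type: str       # illustrative label, not an official event code
    not_before: datetime  # time at which the event is scheduled to take effect


def events_needing_action(events, now, window):
    """Return events scheduled to take effect before now + window, soonest first.

    Events whose scheduled time has already passed are included, because they
    are overdue and need attention first.
    """
    deadline = now + window
    pending = [e for e in events if e.not_before <= deadline]
    return sorted(pending, key=lambda e: e.not_before)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    events = [
        SystemEvent("i-aaa", "instance-reboot", now + timedelta(hours=30)),
        SystemEvent("i-bbb", "instance-stop", now + timedelta(hours=6)),
        SystemEvent("i-ccc", "instance-reboot", now + timedelta(days=5)),
    ]
    for e in events_needing_action(events, now, timedelta(days=2)):
        print(e.instance_id, e.event_type, e.not_before.isoformat())
```

With a two-day window, the sketch reports `i-bbb` before `i-aaa` and omits `i-ccc`, whose event is five days away. The window length is an assumption; choose one that leaves enough time to act before an instance is restarted or stopped.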

When a subscription instance expires, a system event is displayed in the ECS console, as shown in the following figure.
[Figure: system event for an expired subscription instance in the ECS console]

To receive system event notifications in the ECS console, make sure that internal messages for instance expiration, service O&M, and instance issues are enabled on the Common Settings page of the Message Center console, as shown in the following figure.
[Figure: notification settings on the Common Settings page of the Message Center console]

Monitor the running metrics of instances

Alibaba Cloud collects and displays the running metrics of your instances so that you can track their real-time and historical running status and check whether they are running normally. For example, if the CPU utilization of an instance stays high for an extended period, check whether any processes on the instance are abnormal or whether the instance configuration is insufficient for your workload.
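"Consistently high" is easier to act on with a concrete definition. The following Python sketch shows one possible heuristic, not a CloudMonitor rule: CPU utilization counts as consistently high when at least `min_consecutive` consecutive samples exceed a threshold. The sample values (in percent, taken at a fixed interval), the 80% threshold, and the run length of 5 are all assumptions; in practice the samples would come from the instance's monitoring data.

```python
def longest_high_run(samples, threshold):
    """Return the length of the longest run of consecutive samples above threshold."""
    longest = current = 0
    for value in samples:
        # Extend the current run on a high sample; reset it otherwise.
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def is_consistently_high(samples, threshold=80.0, min_consecutive=5):
    """Heuristic (assumed): high if at least min_consecutive samples in a row exceed threshold."""
    return longest_high_run(samples, threshold) >= min_consecutive


if __name__ == "__main__":
    # Hypothetical CPU utilization samples, in percent.
    cpu = [35, 92, 95, 97, 91, 94, 60, 88]
    print(is_consistently_high(cpu))  # True
```

Here the samples 92, 95, 97, 91, and 94 form a run of five, so the check returns `True`; the later 88 starts a new run of only one. Requiring a run rather than a single reading keeps short spikes from being treated as a sustained problem.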

You can view the running metrics of an instance on the Instance Details page in the ECS console or on the Host Monitoring page in the CloudMonitor console. For more information, see View the monitoring information of an ECS instance and Overview.
  • The following running metrics of an instance are displayed on the Instance Details page in the ECS console:
    • Usage of computing, storage, and network resources, such as CPU utilization, disk read/write performance, and packet forwarding rate
    • CPU credit usage of a burstable instance
    [Figure: monitoring information on the Instance Details page in the ECS console]
  • The following running metrics of an instance are displayed on the Host Monitoring page in the CloudMonitor console:
    • Usage of computing, storage, and network resources, such as CPU utilization, disk read/write performance, and packet forwarding rate
    • Active processes on the instance
    • GPU memory usage of a GPU-accelerated instance
    [Figure: Host Monitoring page in the CloudMonitor console]

Use the alerting feature to trigger notifications

You can use the alerting feature of CloudMonitor to create alert rules for specified events and for instance running metrics. When a specified event occurs or a running metric becomes abnormal, CloudMonitor sends notifications to the alert contacts by email, which reduces manual O&M workloads. For more information, see Configure event notifications and Configure alerts for an ECS instance.
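A metric alert rule can be thought of as a comparison, a threshold, and the number of consecutive periods the condition must hold before anyone is notified, so that a single spike does not send an email. The Python sketch below models that idea with a hypothetical `AlertRule` type; it illustrates threshold-plus-duration evaluation and is not CloudMonitor's implementation or API. Suppressing repeat notifications until the metric recovers is also an assumed behavior.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators for the hypothetical rule type.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class AlertRule:
    # Hypothetical rule shape for illustration; not the CloudMonitor API schema.
    metric: str
    comparison: str   # one of ">", ">=", "<", "<="
    threshold: float
    periods: int      # consecutive breaching periods required before alerting


def alert_periods(rule, values):
    """Yield the index of each period at which a notification would be sent.

    A notification fires once the rule has been breached for rule.periods
    consecutive periods, and fires again only after the metric recovers.
    """
    breached = _OPS[rule.comparison]
    streak = 0
    alarming = False
    for i, value in enumerate(values):
        if breached(value, rule.threshold):
            streak += 1
            if streak >= rule.periods and not alarming:
                alarming = True
                yield i
        else:
            # Recovery resets both the streak and the alarm state.
            streak = 0
            alarming = False


if __name__ == "__main__":
    rule = AlertRule("cpu_utilization", ">", 90.0, 3)
    cpu = [95, 96, 70, 91, 93, 97, 98, 50, 92, 93, 94]
    for period in alert_periods(rule, cpu):
        print(f"period {period}: {rule.metric} {rule.comparison} {rule.threshold}, notify contacts")
```

For the sample series, notifications fire at periods 5 and 10: the first three-period breach completes at index 5, the metric recovers at index 7, and a new three-period breach completes at index 10. The two high values at indexes 0 and 1 never trigger a notification because the breach does not last three periods.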

You can set an alert rule for a specified event, as shown in the following figure.
[Figure: alert rule for a specified event]

You can set alert rules for instance running metrics, as shown in the following figure.
[Figure: alert rules for instance running metrics]