The health of your Elastic Compute Service (ECS) instances is an important metric to track. A healthy instance is an instance that runs as expected. When your instances are healthy, workloads such as data processing and video rendering run reliably, and your customers can access your websites and applications. Alibaba Cloud provides data monitoring, data visualization, and real-time alerts to help you monitor the health of your ECS instances.

Background information

You can monitor your ECS instances by using the ECS monitoring service or CloudMonitor. ECS monitors the vCPU utilization, network traffic, and disk I/O of instances. Compared with ECS, CloudMonitor provides finer-grained monitoring of resources. The following list describes some of the monitoring metrics provided by CloudMonitor for ECS instances:
  • vCPU utilization: the percentage of allocated compute units that are currently in use on an ECS instance. A higher percentage indicates a higher vCPU load on the instance. You can view the monitoring data of an ECS instance by using the ECS or CloudMonitor console or by calling ECS API operations. You can also connect to an ECS instance to view its monitoring data. You can use one of the following methods to view the vCPU utilization of an ECS instance after you connect to the instance:
    • Windows instance: View the vCPU utilization in Task Manager. You can sort processes by vCPU utilization to identify processes that are consuming the vCPUs of the instance.
    • Linux instance: Run the top command on the instance to view its vCPU utilization. Press Shift+P to sort processes by vCPU utilization and identify processes that are consuming the vCPUs of the ECS instance.
  • Network traffic: the inbound and outbound bandwidth usage of the ECS instance in Kbit/s. ECS monitors public bandwidth usage, whereas CloudMonitor monitors both public and internal bandwidth usage. If an ECS instance is allocated an outbound public bandwidth of 1,024 Kbit/s (1 Mbit/s) and its outbound public bandwidth usage reaches 1 Mbit/s, the allocated outbound public bandwidth is fully utilized.
    Note: The monitoring data of public bandwidth in the classic network does not include back-to-origin traffic. To view the complete monitoring data, log on to the CloudMonitor console.
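The bandwidth arithmetic in the network traffic example can be sketched as follows. This is a minimal illustration, not part of any Alibaba Cloud SDK: the function name bandwidth_utilization and the constant KBITS_PER_MBIT are hypothetical, and the 1,024 Kbit/s per Mbit/s conversion simply follows the convention used in the example above.

```python
# Convention taken from the text above: 1,024 Kbit/s equals 1 Mbit/s.
KBITS_PER_MBIT = 1024


def bandwidth_utilization(usage_kbits: float, allocated_kbits: float) -> float:
    """Return bandwidth usage as a percentage of the allocated bandwidth."""
    if allocated_kbits <= 0:
        raise ValueError("allocated bandwidth must be positive")
    return usage_kbits / allocated_kbits * 100.0


# An instance with 1,024 Kbit/s allocated that sends 1 Mbit/s is fully utilized.
usage = 1 * KBITS_PER_MBIT
print(bandwidth_utilization(usage, 1024))  # 100.0
```

A result close to 100 suggests that the allocated outbound public bandwidth is the limiting factor for the instance.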

ECS monitoring service

To view monitoring data in the ECS console, perform the following steps.

  1. Log on to the ECS console.
  2. In the left-side navigation pane, choose Instances & Images > Instances.
  3. In the top navigation bar, select a region.
  4. On the Instances page, find the instance that you want to monitor and click its ID.
  5. On the Instance Details page, click the Monitoring tab.
  6. Specify the time period to query and view monitoring data such as vCPU utilization.
    Note: The length of the specified time period affects the granularity of the displayed data. Shorter time periods are displayed at a higher resolution. For example, a 1-hour period and a 6-hour period use different aggregation intervals, which results in different average values.
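The effect described in the preceding note can be illustrated with a small sketch. The sample values are made up, and the aggregation is assumed to be a simple mean over fixed-size groups of one-minute samples. The actual intervals and aggregation method used by the console are not stated here.

```python
def aggregate(samples, interval):
    """Average consecutive groups of `interval` one-minute samples."""
    return [
        sum(samples[i:i + interval]) / len(samples[i:i + interval])
        for i in range(0, len(samples), interval)
    ]


# Twelve hypothetical one-minute vCPU utilization samples (%) with one short spike.
samples = [10, 10, 10, 10, 10, 70, 10, 10, 10, 10, 10, 10]

fine = aggregate(samples, 1)    # one point per minute
coarse = aggregate(samples, 6)  # one point per six minutes

print(max(fine))    # 70.0, the spike is visible
print(max(coarse))  # 20.0, the spike is averaged away
```

The same underlying data therefore shows a 70% peak at the finer interval but only a 20% peak at the coarser one, which is why a short spike can disappear when you widen the queried time period.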

You can also call ECS API operations such as DescribeInstanceMonitorData, DescribeDiskMonitorData, and DescribeEniMonitorData to query monitoring data.
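As a rough sketch of what a DescribeInstanceMonitorData query might look like, the following helper only assembles the query parameters. It does not sign or send the request. The parameter names (InstanceId, StartTime, EndTime, Period), the UTC timestamp format, and the instance ID i-example are assumptions for illustration, so check them against the ECS API reference before use. The helper name build_monitor_query is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp format: ISO 8601 in UTC, for example 2024-01-01T11:00:00Z.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_monitor_query(instance_id, end, hours=1, period=60):
    """Build unsigned query parameters for a window of `hours` ending at `end`."""
    if end.tzinfo is None:
        raise ValueError("end must be a timezone-aware datetime")
    end_utc = end.astimezone(timezone.utc)
    start_utc = end_utc - timedelta(hours=hours)
    return {
        "Action": "DescribeInstanceMonitorData",
        "InstanceId": instance_id,       # hypothetical instance ID
        "StartTime": start_utc.strftime(TIME_FORMAT),
        "EndTime": end_utc.strftime(TIME_FORMAT),
        "Period": str(period),           # assumed: data granularity in seconds
    }


params = build_monitor_query(
    "i-example", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
)
print(params["StartTime"], params["EndTime"])
```

In practice you would pass these parameters to an official SDK or a signed HTTP request. Keeping the time-window logic in a separate function makes it easy to query different periods consistently.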

The following table describes the monitoring metrics in ECS. The sampling interval for each metric is 1 minute.
Metric | Description | Unit
------ | ----------- | ----
CPUUtilization | The CPU utilization. | %
InternetInRate (Classic Network) | The average rate of inbound traffic over the Internet. | bit/s
IntranetInRate | The average rate of inbound traffic over the internal network. | bit/s
InternetOutRate (Classic Network) | The average rate of outbound traffic over the Internet. | bit/s
IntranetOutRate | The average rate of outbound traffic over the internal network. | bit/s
DiskReadBPS | The number of bytes that are read from the system disk each second. | Byte/s
DiskWriteBPS | The number of bytes that are written to the system disk each second. | Byte/s
DiskReadIOPS | The number of read operations that are performed on the system disk each second. | Read IOPS
DiskWriteIOPS | The number of write operations that are performed on the system disk each second. | Write IOPS
InternetInRate_IP | The inbound public bandwidth. | bit/s
InternetOutRate_IP | The outbound public bandwidth. | bit/s
InternetOutRatePercent_IP | The outbound public bandwidth usage. | %
InternetIn (Classic Network) | The amount of inbound traffic over the Internet. | Byte
InternetOut (Classic Network) | The amount of outbound traffic over the Internet. | Byte
IntranetIn | The amount of inbound traffic over the internal network. | Byte

CloudMonitor

CloudMonitor provides end-to-end, out-of-the-box monitoring solutions for enterprises in the cloud. Its host monitoring service monitors ECS instances.
  • For more information about the host monitoring service, see Overview.
  • For information about the items and metrics related to the host monitoring service, see Metrics.

To obtain monitoring data of an ECS instance in the CloudMonitor console, perform the following steps.

  1. Log on to the CloudMonitor console.
  2. In the left-side navigation pane, click Host Monitoring.
  3. Find the ECS instance that you want to monitor.
  4. Optional: If the CloudMonitor agent is not installed on the ECS instance, click Install/Upgrade Agent.
  5. To obtain monitoring data, click the Monitoring Chart icon in the Actions column.
    Note: Monitoring data can be retained for up to 30 days.
  6. To configure alert rules, click Alert Rules in the Actions column.