CloudMonitor helps reduce the O&M costs and workloads of cloud services. It provides real-time operation data that you can use to identify risks in advance, troubleshoot issues, and prevent potential losses. When an issue occurs, CloudMonitor immediately sends you an alert notification so that you can restore your business quickly.

Prerequisites

Before you use CloudMonitor, make sure that the following requirements are met:
  • The CloudMonitor agent is running on the Elastic Compute Service (ECS) instances that you want to monitor and is able to collect metric data. If the CloudMonitor agent is not installed on the instances, manually install it. For more information, see Install the CloudMonitor agent.
  • Alert contacts and contact groups are added. We recommend that you add at least two contacts to ensure real-time responses to monitoring alerts. For more information about monitoring metrics, see Alert service and Overview.

Background information

The dashboard feature of CloudMonitor provides system-wide visibility into resource utilization and operational health. In this topic, the CPU utilization, memory usage, and disk usage of ECS instances are separately displayed, and the four metrics of ApsaraDB RDS instances are displayed in two groups.

Figure: Dashboard metrics

In this topic, a website is used as an example to describe how to configure CloudMonitor. The website uses ECS, RDS, Object Storage Service (OSS), and Server Load Balancer (SLB).

Figure: Architecture

Configure alert thresholds and alert rules

We recommend that you configure suitable alert thresholds for monitoring metrics based on your business requirements. A threshold that is too low triggers alerts frequently and affects user experience. A threshold that is too high may leave you with insufficient time to respond to events.

For example, to reserve some processing capacity to ensure the normal operation of the system, you can set the alert threshold for CPU utilization to 70% and set an alert to be triggered when the threshold is exceeded three consecutive times, as shown in the following figure.

Figure: Configure an alert threshold for CPU utilization

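The "three consecutive times" condition can be illustrated with a minimal Python sketch. This is not how CloudMonitor is implemented; the class and function names are hypothetical, and the sketch assumes that "exceeded" means strictly greater than the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class ConsecutiveThresholdRule:
    """Hypothetical helper: fires once a metric exceeds `threshold`
    for `required_breaches` consecutive samples."""
    threshold: float = 70.0       # percent, as in the CPU utilization example
    required_breaches: int = 3    # "three consecutive times"
    _streak: int = field(default=0, init=False)

    def observe(self, value: float) -> bool:
        """Record one sample and return True if the alert condition is met."""
        if value > self.threshold:
            self._streak += 1
        else:
            # Any sample at or below the threshold resets the count.
            self._streak = 0
        return self._streak >= self.required_breaches


def first_alert_index(samples, rule):
    """Return the index of the first sample at which the rule fires, or None."""
    for index, value in enumerate(samples):
        if rule.observe(value):
            return index
    return None


if __name__ == "__main__":
    cpu_samples = [65, 72, 75, 68, 71, 80, 90]
    print(first_alert_index(cpu_samples, ConsecutiveThresholdRule()))
```

In this sample series, the dip to 68 resets the count, so a brief spike does not trigger an alert. Requiring consecutive breaches is what keeps a moderately low threshold from producing frequent alerts.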
If you want to configure alert rules for other metrics, click Add Alert Rule. For example, you can perform the following operations:
  • Configure alert rules for RDS instances

    We recommend that you configure alert rules for RDS instances based on your requirements. For example, you can set the alert threshold for the CPU utilization of RDS instances to 70% and set an alert to be triggered when the threshold is exceeded three consecutive times. You can also configure alert thresholds for the disk usage, IOPS utilization, and total number of connections. For more information about the monitoring metrics, see Cloud service monitoring.

    Figure: Configure alert rules for RDS instances
  • Configure alert rules for SLB instances
    Before you use CloudMonitor to monitor SLB instances, make sure that health checks are enabled for the instances. For example, you can set the alert threshold for the bandwidth of SLB instances to 7 Mbit/s, as shown in the following figure.

    Figure: Configure alert rules for SLB instances
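To relate a bandwidth threshold such as 7 Mbit/s to raw traffic counters, the following Python sketch converts a byte count over a time window into Mbit/s. The function names are hypothetical, and the sketch assumes that 1 Mbit equals 1,000,000 bits and that the threshold is breached only when the value is strictly greater.

```python
def bandwidth_mbps(bytes_transferred: int, seconds: float) -> float:
    """Average bandwidth in Mbit/s over a window (assumes 1 Mbit = 1,000,000 bits)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    bits = bytes_transferred * 8
    return bits / seconds / 1_000_000


def exceeds_bandwidth_threshold(bytes_transferred: int, seconds: float,
                                threshold_mbps: float = 7.0) -> bool:
    """Return True if the average bandwidth is strictly above the threshold."""
    return bandwidth_mbps(bytes_transferred, seconds) > threshold_mbps


if __name__ == "__main__":
    # 60,000,000 bytes in 60 seconds is 8 Mbit/s, above the 7 Mbit/s example.
    print(bandwidth_mbps(60_000_000, 60))
    print(exceeds_bandwidth_threshold(60_000_000, 60))
```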

Configure process monitoring

For web applications, you can configure process monitoring to monitor application processes in real time and use monitoring data to troubleshoot issues. For more information, see Configure process monitoring.

Configure site monitoring

Site monitoring is an external monitoring service for ECS instances. It simulates real user access and tests business availability in real time. You can also use the monitoring data to troubleshoot issues.

Figure: Configure site monitoring
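The idea of probing a site from the outside can be sketched in a few lines of Python. This is only an illustration of the concept and not the site monitoring feature itself. The function name `probe_site`, the result fields, and the rule that status codes from 200 to 399 count as available are assumptions.

```python
import time
import urllib.error
import urllib.request


def probe_site(url, timeout=5.0, opener=urllib.request.urlopen):
    """Hypothetical availability probe: issue one GET and report status and latency.

    `opener` is injectable so that the probe can be exercised without network access.
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        status = exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar errors.
        status = None
    elapsed_ms = (time.monotonic() - started) * 1000.0
    return {
        "url": url,
        "status": status,
        "available": status is not None and 200 <= status < 400,
        "elapsed_ms": elapsed_ms,
    }
```

Running a probe like this on a schedule from a host outside your deployment, and recording `available` and `elapsed_ms`, approximates what an end user experiences. The collected values can then be evaluated against alert thresholds in the same way as other metrics.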

If the preceding monitoring metrics do not meet your requirements, you can use the custom monitoring feature. For more information, see Overview.