The group details page contains the fault list, alarm history, alarm rules, group resources, events, and metric data of group resources. You can use this page to monitor these aspects of your application groups.

Group list

The group list displays all application groups in CloudMonitor, along with the resources and health status of each group.

Parameters

  • Group name (or ID): The name or identification number of an application group.
  • Health status: The alarm status of the group's resources. An application group is healthy when no active alarms exist for any resource in the group, and unhealthy whenever a metric of any resource in the group meets its alarm threshold and triggers an alarm.
  • Instance count: The total number of instances in an application group, including both ECS and non-ECS instances.
  • Resource types: The number of resource types in an application group. For example, if an application group contains ECS, ApsaraDB for RDS, and Server Load Balancer instances, then this number is three.
  • Unhealthy instances: The total number of instances with active alarms in an application group. For example, if two ECS instances and one ApsaraDB for RDS instance have active alarms, the number of unhealthy instances is three.
  • Creation time: The time when the application group was created.
  • Actions: The actions that can be applied to an application group. The supported actions are managing the group, stopping notifications, enabling or disabling all alarm rules, and deleting the group.
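The rules behind the Health status, Instance count, Resource types, and Unhealthy instances columns can be expressed as a small aggregation. The following is a minimal sketch, assuming a hypothetical `Resource` record and `summarize_group` function; these names are illustrative and are not part of any CloudMonitor API.

```python
from dataclasses import dataclass


# Hypothetical record for one group resource (illustrative, not a CloudMonitor type).
@dataclass
class Resource:
    instance_id: str
    product: str        # e.g. "ECS", "RDS", "SLB"
    active_alarms: int  # number of metrics currently in an alarm state


def summarize_group(resources: list) -> dict:
    """Derive the group list columns from the resources in one group."""
    # An instance counts as unhealthy once, however many of its metrics are alarming.
    unhealthy = [r for r in resources if r.active_alarms > 0]
    return {
        # Total instances, ECS and non-ECS alike.
        "instance_count": len(resources),
        # Number of distinct product types in the group.
        "resource_types": len({r.product for r in resources}),
        # Instances with at least one active alarm.
        "unhealthy_instances": len(unhealthy),
        # The group is healthy only if no resource has an active alarm.
        "health_status": "healthy" if not unhealthy else "unhealthy",
    }


# Mirrors the examples above: ECS, RDS, and SLB instances; two ECS instances
# and one RDS instance have active alarms.
group = [
    Resource("ecs-1", "ECS", 1),
    Resource("ecs-2", "ECS", 2),
    Resource("ecs-3", "ECS", 0),
    Resource("rds-1", "RDS", 1),
    Resource("slb-1", "SLB", 0),
]
print(summarize_group(group))
# {'instance_count': 5, 'resource_types': 3, 'unhealthy_instances': 3, 'health_status': 'unhealthy'}
```

Note that `ecs-2` has two alarming metrics but still contributes only one unhealthy instance, which matches how the Unhealthy instances column counts instances rather than alarms.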

Fault list

The fault list displays the resources in your group that have active alarms, so you can quickly identify unhealthy instances and troubleshoot the causes.

Note
  • When multiple metrics of a resource have active alarms at the same time, the fault list displays the resource once for each of these metrics. Each row of the list shows one metric with an active alarm.
  • After you disable an alarm rule that has an active alarm, the resources and metrics associated with the rule no longer appear in the fault list.

Parameters

  • Faulty resource: A resource with an active alarm.
  • Start time: The time when the first alarm was generated for the resource.
  • Status: Indicates whether a resource has an active alarm.
  • Duration: The length of time that a faulty resource has been in an alarm state.
  • Alarm rule name: The name of the alarm rule applied to a faulty resource.
  • Actions: The actions that can be applied to a faulty resource. You can click Expand to view the metric trends of a faulty resource over the past six hours and compare the metric data with the alarm threshold.
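The row rules described in the note and parameters above (one row per alarming metric, rows hidden for disabled rules, and duration measured from the start time) can be sketched as follows. This is a minimal illustration, assuming a hypothetical `ActiveAlarm` record and `fault_list` function; it does not reflect how CloudMonitor builds the list internally.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record for one active alarm on one metric of one resource.
@dataclass
class ActiveAlarm:
    resource: str
    metric: str
    rule_name: str
    rule_enabled: bool
    start_time: datetime  # when the first alarm was generated


def fault_list(alarms: list, now: datetime) -> list:
    """Build fault list rows: one row per (resource, metric) with an active alarm."""
    rows = []
    for a in alarms:
        # Alarms from disabled rules no longer appear in the fault list.
        if not a.rule_enabled:
            continue
        rows.append({
            "faulty_resource": a.resource,
            "metric": a.metric,
            "alarm_rule_name": a.rule_name,
            "start_time": a.start_time,
            # Duration: how long the resource has been in the alarm state.
            "duration": now - a.start_time,
        })
    return rows


now = datetime(2024, 1, 1, 12, 0)
alarms = [
    ActiveAlarm("ecs-1", "cpu_usage", "high-cpu", True, datetime(2024, 1, 1, 11, 30)),
    ActiveAlarm("ecs-1", "memory_usage", "high-mem", True, datetime(2024, 1, 1, 11, 50)),
    ActiveAlarm("rds-1", "cpu_usage", "rds-cpu", False, datetime(2024, 1, 1, 10, 0)),
]
for row in fault_list(alarms, now):
    print(row["faulty_resource"], row["metric"], row["duration"])
# ecs-1 cpu_usage 0:30:00
# ecs-1 memory_usage 0:10:00
```

Here `ecs-1` appears twice because two of its metrics are alarming, and `rds-1` is omitted because its rule is disabled.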

Alarm history

The alarm history lists the alarms triggered by all the alarm rules applied to a group.

Note You can query the alarm history of the last three days. If the interval between the query start time and end time exceeds three days, the system prompts you to select a new time range.
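The interval check in the note above can be modeled as a simple validation. This is a minimal sketch, assuming a hypothetical `validate_alarm_history_range` helper; it only checks the span between start and end time, as the second sentence of the note describes, and it treats a range of exactly three days as allowed because that range does not exceed three days.

```python
from datetime import datetime, timedelta

# Maximum allowed query interval for alarm history, per the note above.
MAX_RANGE = timedelta(days=3)


def validate_alarm_history_range(start: datetime, end: datetime) -> tuple:
    """Reject alarm history queries whose interval exceeds three days (hypothetical helper)."""
    if end < start:
        raise ValueError("end time must not be earlier than start time")
    if end - start > MAX_RANGE:
        # Corresponds to the console prompting you to select a new time range.
        raise ValueError("time range exceeds three days; select a new time range")
    return start, end


start = datetime(2024, 1, 1)
validate_alarm_history_range(start, start + timedelta(days=3))  # accepted
try:
    validate_alarm_history_range(start, start + timedelta(days=4))
except ValueError as err:
    print(err)  # time range exceeds three days; select a new time range
```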

Parameters

  • Faulty resource: A resource with an active alarm.
  • Duration: The length of time that a faulty resource was in an alarm state.
  • Occurrence time: The time when the alarm was generated.
  • Alarm rule name: The name of the alarm rule applied to a faulty resource.
  • Notification method: The method used to send alarm notifications, which can be SMS, email, or TradeManager.
  • Product type: The product type to which a faulty resource belongs.
  • Status: The status of the alarm rule, which can be alarm, cleared, or muted.
  • Notification target: The group of contacts who receive alarm notifications.

Alarm rules

The alarm rules list displays all the alarm rules applied to a group. You can select alarm rules from the list and enable, disable, or modify them as needed.

Note The alarm rules list only shows the alarm rules applied to a specific application group. It does not show alarm rules whose Resource Range is set to All Resources or Instance.

Parameters

  • Alarm name: The name specified for the alarm rule when it was created.
  • Status: Indicates whether the resources associated with the alarm rule have active alarms.
    • Normal state: All resources associated with the alarm rule are normal.
    • Alarm state: At least one instance associated with the alarm rule has an active alarm.
    • Insufficient data: At least one instance associated with the alarm rule has insufficient data and no instance has an active alarm.
  • Enable: Shows whether the alarm rule is enabled.
  • Product name: The name of the product to which group resources belong.
  • Alarm description: A brief description of the alarm rule's settings.
  • Actions: The optional operations include Modify, Enable, Disable, Delete, and Alarm History.
    • Modify: Click to modify the alarm rule.
    • Disable: Click to disable the alarm rule. After an alarm rule is disabled, the alarm service no longer checks whether metric data exceeds the threshold.
    • Enable: Click to enable the alarm rule. After you re-enable a disabled alarm rule, the alarm service resumes checking the metric data and determines whether to trigger an alarm based on the rule.
    • Delete: Click to delete the alarm rule.
    • Alarm History: Click to view the alarm history of the alarm rule.
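The three rule states above, together with the effect of disabling a rule, can be sketched as a small evaluation function. This is a minimal illustration under several assumptions: the functions `instance_state` and `rule_status` are hypothetical, the rule is assumed to trigger when a value is strictly greater than the threshold (real rules may use other comparison operators), a missing value (`None`) stands for insufficient data, and a disabled rule is modeled as not evaluated at all, returning `None`.

```python
from typing import Optional


def instance_state(value: Optional[float], threshold: float) -> str:
    """State of one instance under one rule (assumes a 'greater than' comparison)."""
    if value is None:
        # No recent metric data for this instance.
        return "insufficient_data"
    return "alarm" if value > threshold else "normal"


def rule_status(enabled: bool, values: dict, threshold: float) -> Optional[str]:
    """Aggregate instance states into the rule's Status column (hypothetical helper)."""
    # A disabled rule is not checked against its threshold.
    if not enabled:
        return None
    states = {instance_state(v, threshold) for v in values.values()}
    # Alarm: at least one instance has an active alarm.
    if "alarm" in states:
        return "alarm"
    # Insufficient data: some instance lacks data and none is alarming.
    if "insufficient_data" in states:
        return "insufficient_data"
    # Normal: every associated instance is normal.
    return "normal"


print(rule_status(True, {"i-1": 85.0, "i-2": None, "i-3": 40.0}, 80.0))  # alarm
print(rule_status(True, {"i-1": 50.0, "i-2": None}, 80.0))               # insufficient_data
print(rule_status(True, {"i-1": 50.0}, 80.0))                            # normal
print(rule_status(False, {"i-1": 85.0}, 80.0))                           # None
```

The order of the checks encodes the precedence in the list above: an active alarm on any instance outweighs missing data, which in turn outweighs a normal state.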

Group resources

The group resources list displays all the resources in a group and the health status of each resource.

Parameters

  • Instance name (or ID): The instance name or ID of a resource.
  • Health status: The alarm status of the resource. A resource is unhealthy when an alarm is triggered for it and healthy otherwise. The application group is healthy only when no alarms are triggered for any of its resources.

Events

The events list records alarm events and alarm rule operation events, such as creating, modifying, and deleting alarm rules, so you can trace every operation performed on a specific alarm rule.

Note You can query event information from the last 90 days.

Parameters

  • Occurrence time: The time when an event occurred.
  • Event name: The name of an event. This can be an alarm event, such as alarm generated or alarm cleared, or a system event, such as create alarm rule, modify alarm rule, or delete alarm rule.
  • Event type: The type of event, which is either a system event or an alarm event. System events include create alarm rule, delete alarm rule, and modify alarm rule. Alarm events include alarm generated and alarm cleared.
  • Event details: Detailed information associated with an event.
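The mapping from event names to event types, combined with the 90-day query window from the note above, can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Event` record and the helpers `event_type` and `queryable_events`; the event names are taken from the parameter list, and the code is not a CloudMonitor API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Event names and their types, as listed in the parameters above.
SYSTEM_EVENTS = {"create alarm rule", "modify alarm rule", "delete alarm rule"}
ALARM_EVENTS = {"alarm generated", "alarm cleared"}
# Event information can be queried for the last 90 days.
RETENTION = timedelta(days=90)


@dataclass
class Event:
    occurred_at: datetime
    name: str
    details: str = ""


def event_type(name: str) -> str:
    """Classify an event name as a system event or an alarm event."""
    if name in SYSTEM_EVENTS:
        return "system event"
    if name in ALARM_EVENTS:
        return "alarm event"
    raise ValueError(f"unknown event name: {name!r}")


def queryable_events(events: list, now: datetime) -> list:
    """Return (occurrence time, name, type, details) for events in the last 90 days."""
    cutoff = now - RETENTION
    return [
        (e.occurred_at, e.name, event_type(e.name), e.details)
        for e in events
        if e.occurred_at >= cutoff
    ]


now = datetime(2024, 4, 1)
events = [
    Event(datetime(2024, 3, 30), "alarm generated", "CPU usage above threshold"),
    Event(datetime(2023, 12, 1), "create alarm rule"),  # older than 90 days
]
for row in queryable_events(events, now):
    print(row[1], "|", row[2])  # alarm generated | alarm event
```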

Charts

The lower area of the application group details page displays monitoring charts for the group resources. By default, CloudMonitor displays charts for frequently used metrics. You can customize this area by changing the chart types and the metrics that are displayed.

Note To obtain the operating system (OS) metrics of ECS instances, you must install the CloudMonitor agent on the instances.

Initialized metric data

By default, CloudMonitor initializes the following metric charts for an application group, all of which are line charts. To view more metrics, click Add Metric Chart.

  • ECS: CPU usage and outbound bandwidth (Internet). Chart type: line chart. Displays the aggregate data of all servers in the group.
  • ApsaraDB for RDS: CPU usage, disk usage, IOPS usage, and connection usage. Chart type: line chart. Displays the data of a single database instance.
  • Server Load Balancer: Outbound bandwidth and inbound bandwidth. Chart type: line chart. Displays the data of a single Server Load Balancer instance.
  • OSS: Storage size and GET/PUT request count. Chart type: line chart. Displays the data of a single bucket.
  • CDN: Downstream bandwidth and hit rate. Chart type: line chart. Displays the data of a single domain name.
  • EIP: Outbound bandwidth (Internet). Chart type: line chart. Displays the data of a single instance.
  • ApsaraDB for Redis: Memory usage, connection usage, and QPS usage. Chart type: line chart. Displays the data of a single instance.
  • ApsaraDB for MongoDB: CPU usage, memory usage, IOPS usage, and connection usage. Chart type: line chart. Displays the data of a single instance.