ACS::AlarmTrigger

Last Updated: Jun 11, 2020

Features

The ACS::AlarmTrigger action runs O&M operations in response to alerts. After you create and execute a template that contains this action, the execution enters the Waiting status. When the metric specified in the action reaches its threshold, the execution status changes to Running and the subsequent tasks defined in the template run immediately. These tasks typically clear the condition that caused the alert. For example, Operation Orchestration Service (OOS) can automatically restart an Elastic Compute Service (ECS) instance when it receives an alert indicating that the CPU usage of the instance exceeds 90%.

Note: The ACS::AlarmTrigger action supports built-in ECS metrics and metrics whose data is collected by the CloudMonitor agent. For more information about these two types of metrics, see Metrics. To monitor agent-collected metrics, first install the CloudMonitor agent on the instance. Otherwise, the alert cannot be triggered. To install the agent, go to the Host Monitoring page in the CloudMonitor console, select the instance to be monitored, and click Click to install.

Restrictions

Trigger actions have the following restrictions:

  • A template can contain only one trigger action.
  • The task that contains the trigger action must be the first task in the Tasks parameter of the template.
  • Trigger actions cannot be used in embedded templates.
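
The first two restrictions can be checked mechanically before a template is submitted. The following is a minimal sketch, not part of OOS: the function name check_trigger_restrictions is hypothetical, and it assumes that a trigger action is any action whose name ends in "Trigger". It does not check the embedded-template restriction, which cannot be determined from a single template body.

```python
def check_trigger_restrictions(template: dict) -> list:
    """Return a list of human-readable violations (empty if none are found)."""
    tasks = template.get("Tasks", [])
    # Assumption: a trigger action is any action whose name ends in "Trigger".
    trigger_positions = [
        index for index, task in enumerate(tasks)
        if str(task.get("Action", "")).endswith("Trigger")
    ]
    problems = []
    if len(trigger_positions) > 1:
        problems.append("more than one trigger action in the template")
    if trigger_positions and trigger_positions[0] != 0:
        problems.append("the trigger task is not the first task in Tasks")
    return problems


template = {
    "Tasks": [
        {"Name": "RebootInstance", "Action": "ACS::ECS::RebootInstance"},
        {"Name": "taskName1", "Action": "ACS::AlarmTrigger"},
    ]
}
for problem in check_trigger_restrictions(template):
    print("Violation:", problem)
```

In this usage example the trigger task is listed second, so the check reports that it is not the first task.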

Syntax

  • YAML format
    Tasks:
      - Name: taskName1 # The name of the task.
        Action: 'ACS::AlarmTrigger'
        Properties:
          Namespace: 'acs_ecs_dashboard' # Required. The namespace of the cloud service, such as the namespace of ECS. To obtain the valid values, call the DescribeMetricMetaList operation or see https://www.alibabacloud.com/help/doc-detail/28619.html.
          MetricName: 'cpu_total' # Required. The name of the metric, such as cpu_total, which indicates the current CPU usage. To obtain the valid values, call the DescribeMetricMetaList operation or see https://www.alibabacloud.com/help/doc-detail/28619.html.
          Statistics: 'Average' # The method used to aggregate the monitored data. For example, Average calculates the average value of the specified metric over a time period. To obtain the valid values, call the DescribeMetricMetaList operation or see https://www.alibabacloud.com/help/doc-detail/28619.html.
          ComparisonOperator: 'GreaterThanThreshold' # Required. The operator used to compare the metric value with the threshold. Valid values: GreaterThanOrEqualToThreshold, GreaterThanThreshold, LessThanOrEqualToThreshold, LessThanThreshold, NotEqualToThreshold, GreaterThanYesterday, LessThanYesterday, GreaterThanLastWeek, LessThanLastWeek, GreaterThanLastPeriod, and LessThanLastPeriod.
          Threshold: '90' # The threshold for triggering an alert. In this example, where MetricName is set to cpu_total, a value of 90 means that an alert is triggered when CPU usage exceeds 90%.
          Resources: '[{"resource":"_ALL"}]' # Required. The resources to be monitored. For example, [{"resource":"_ALL"}] indicates all resources under the current account, [{"instanceId":"i-bp123467zxcvb"}] indicates a specific instance, [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}] indicates a specific disk partition of an instance, and [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}] indicates multiple disk partitions of an instance.
          Times: 1 # The maximum number of notifications for an alert.
          Interval: 60 # The interval at which the alert rule is evaluated. Unit: seconds. Default value: 60, which is also the highest frequency at which the metric is polled.
          SilenceTime: 3600 # The mute duration. Unit: seconds. Default value: 86400 (one day). Minimum value: 3600 (one hour). Only one notification is sent during each mute duration, even if the metric value exceeds the threshold several consecutive times.
  • JSON format (For more information, see the parameter description for the YAML format.)
    {
      "Tasks": [
        {
          "Name": "taskName1",
          "Action": "ACS::AlarmTrigger",
          "Properties": {
            "Namespace": "acs_ecs_dashboard",
            "MetricName": "cpu_total",
            "Statistics": "Average",
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": "90",
            "Resources": "[{\"resource\":\"_ALL\"}]",
            "Times": 1,
            "Interval": 60
          }
        }
      ]
    }
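
Because Resources is a JSON array encoded as a string, quoting it by hand is error-prone, especially in the JSON format where the inner quotation marks must be escaped. The following is a minimal sketch that generates the value with the Python standard library. The helper name build_resources is hypothetical, and the instance ID and device paths are the sample values from the parameter description.

```python
import json


def build_resources(instance_id=None, devices=()):
    """Return the JSON-encoded string expected by the Resources property."""
    if instance_id is None:
        # No instance given: monitor all resources under the current account.
        entries = [{"resource": "_ALL"}]
    elif devices:
        # One entry per disk partition of the instance.
        entries = [{"instanceId": instance_id, "device": d} for d in devices]
    else:
        entries = [{"instanceId": instance_id}]
    # Compact separators reproduce the spacing used in the syntax examples.
    return json.dumps(entries, separators=(",", ":"))


task = {
    "Name": "taskName1",
    "Action": "ACS::AlarmTrigger",
    "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "90",
        "Resources": build_resources("i-bp123467zxcvb"),
        "Times": 1,
        "Interval": 60,
    },
}
# json.dumps escapes the inner quotation marks of Resources automatically.
print(json.dumps({"Tasks": [task]}, indent=2))
```

Building the string with json.dumps rather than string concatenation keeps the value valid JSON regardless of how many resources are listed.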

Example

The following template automatically restarts an ECS instance when the CPU usage of the instance exceeds the specified threshold within a one-minute period:

  • YAML format
    ---
    FormatVersion: OOS-2019-06-01
    Description: Reboot ECS instance when CPU utilization exceeded the threshold.
    Parameters:
      instanceId:
        Type: String
        Description: The ECS instance ID to be monitored.
      threshold:
        Type: String
        Description: The CPU utilization threshold, for example, 80.
    Tasks:
      - Name: cpuAcrossThresholdAlarmTrigger
        Action: 'ACS::AlarmTrigger'
        Properties:
          Namespace: 'acs_ecs_dashboard'
          MetricName: 'cpu_total'
          Statistics: 'Average'
          ComparisonOperator: 'GreaterThanThreshold'
          Threshold: '{{threshold}}'
          Resources: '[{"instanceId":"{{ instanceId }}"}]'
          Times: 1
          SilenceTime: 3600
      - Name: RebootInstance
        Action: 'ACS::ECS::RebootInstance'
        Properties:
          instanceId: '{{ instanceId }}'
  • JSON format
    {
      "FormatVersion": "OOS-2019-06-01",
      "Description": "Reboot ECS instance when CPU utilization exceeded the threshold.",
      "Parameters": {
        "instanceId": {
          "Type": "String",
          "Description": "The ECS instance ID to be monitored."
        },
        "threshold": {
          "Type": "String",
          "Description": "The CPU utilization threshold, for example, 80."
        }
      },
      "Tasks": [
        {
          "Name": "cpuAcrossThresholdAlarmTrigger",
          "Action": "ACS::AlarmTrigger",
          "Properties": {
            "Namespace": "acs_ecs_dashboard",
            "MetricName": "cpu_total",
            "Statistics": "Average",
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": "{{threshold}}",
            "Resources": "[{\"instanceId\":\"{{ instanceId }}\"}]",
            "Times": 1,
            "SilenceTime": 3600
          }
        },
        {
          "Name": "RebootInstance",
          "Action": "ACS::ECS::RebootInstance",
          "Properties": {
            "instanceId": "{{ instanceId }}"
          }
        }
      ]
    }
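
To see what the trigger properties look like once the {{ instanceId }} and {{threshold}} placeholders are filled in, the substitution can be previewed locally. The following is an illustrative sketch only: the render function is hypothetical, it does not describe how OOS resolves parameters internally, and the parameter values are sample inputs.

```python
import json
import re


def render(value: str, parameters: dict) -> str:
    """Replace {{ name }} placeholders with parameter values (illustration only)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(parameters[match.group(1)]),
        value,
    )


# Sample parameter values, as a user might supply them at execution time.
params = {"instanceId": "i-bp123467zxcvb", "threshold": "80"}

threshold = render("{{threshold}}", params)
resources = render('[{"instanceId":"{{ instanceId }}"}]', params)

# After substitution, the Resources value must still parse as a JSON array.
parsed = json.loads(resources)
print("Threshold:", threshold)
print("Resources:", parsed)
```

Parsing the rendered Resources value with json.loads is a quick way to confirm that the placeholder sits inside correctly quoted JSON.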