ACS::AlarmTrigger - CloudOps Orchestration Service - Alibaba Cloud Documentation Center

Description

You can use the ACS::AlarmTrigger action to run operations and maintenance (O&M) tasks in response to alerts. After you create an execution from a template that contains the ACS::AlarmTrigger action, the execution enters the Waiting state. When the metric specified in the action reaches its threshold, the execution status changes to Running and the subsequent tasks defined in the template run immediately. Typically, these tasks resolve the issue that triggered the alert. For example, CloudOps Orchestration Service (OOS) can automatically restart an Elastic Compute Service (ECS) instance when it receives an alert indicating that the CPU utilization of the instance exceeds 90%.
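As a rough mental model only, the Waiting-to-Running transition can be sketched in Python. This is not how OOS or Cloud Monitor is implemented. The function names, the idea of evaluating one list of raw samples per period, and the mapping of Statistics values to aggregations are assumptions made for illustration. Only the absolute-threshold ComparisonOperator values listed in the Syntax section are modeled.

```python
import operator

# Hypothetical mapping of the absolute-threshold ComparisonOperator values to
# Python comparisons. Period-over-period operators such as GreaterThanYesterday
# need historical data and are not modeled here.
OPERATORS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanOrEqualToThreshold": operator.le,
    "LessThanThreshold": operator.lt,
    "NotEqualToThreshold": operator.ne,
}

# Hypothetical mapping of Statistics values to aggregation functions.
STATISTICS = {
    "Average": lambda xs: sum(xs) / len(xs),
    "Maximum": max,
    "Minimum": min,
}


def alert_fires(samples, comparison, threshold, statistic="Average"):
    """Return True if the aggregated samples of one period meet the condition."""
    if not samples:
        return False  # No data in the period, so no alert.
    value = STATISTICS[statistic](samples)
    return OPERATORS[comparison](value, threshold)


def execution_status(samples, comparison="GreaterThanThreshold", threshold=90):
    """Simplified status: Waiting until the condition is met, then Running."""
    return "Running" if alert_fires(samples, comparison, threshold) else "Waiting"


print(execution_status([85, 92, 95]))  # average is about 90.67, above 90
print(execution_status([80, 85, 90]))  # average is 85, not above 90
```

In this sketch the status only ever moves forward from Waiting to Running; what happens after the subsequent tasks finish is outside its scope.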

Important

The ACS::AlarmTrigger action supports two types of metrics: metrics collected by the Cloud Monitor agent and built-in ECS metrics. For more information about the two types of metrics, see Metrics. To monitor agent-collected metrics of ECS instances, you must install the Cloud Monitor agent on those instances. Otherwise, alerts cannot be triggered for the instances. To install the Cloud Monitor agent on an ECS instance, go to the Host Monitoring page in the Cloud Monitor console, select the instance, and then click Click to Install.

Limits

The ACS::AlarmTrigger action has the following limits:

A template can contain only one ACS::AlarmTrigger action.
The task that uses the ACS::AlarmTrigger action must be the first task in the Tasks section of the template.
The ACS::AlarmTrigger action cannot be used in child templates.
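The first two limits can be checked locally before you create a template. The following hypothetical helper takes a template that has already been parsed into a Python dict (for example, with json.loads) and reports violations. It is a sketch, not an OOS validation API, and it cannot detect the child-template limit, because that depends on how the template is invoked rather than on its content.

```python
def check_alarm_trigger_limits(template):
    """Return a list of limit violations for a parsed OOS template dict.

    Hypothetical helper: checks only that there is at most one
    ACS::AlarmTrigger action and that it sits in the first task.
    """
    tasks = template.get("Tasks", [])
    positions = [i for i, task in enumerate(tasks)
                 if task.get("Action") == "ACS::AlarmTrigger"]
    problems = []
    if len(positions) > 1:
        problems.append("only one ACS::AlarmTrigger action is allowed")
    if positions and positions[0] != 0:
        problems.append("the ACS::AlarmTrigger task must be the first task")
    return problems


good = {"Tasks": [
    {"Name": "alarmTrigger", "Action": "ACS::AlarmTrigger"},
    {"Name": "RebootInstance", "Action": "ACS::ECS::RebootInstance"},
]}
print(check_alarm_trigger_limits(good))  # []
```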

Syntax

YAML format

Tasks:
  - Name: taskName1 # The name of the task.
    Action: 'ACS::AlarmTrigger'
    Properties:
      Namespace: 'acs_ecs_dashboard' # Required. The namespace of the Alibaba Cloud service to monitor. For example, acs_ecs_dashboard indicates ECS. To query the valid values of this parameter, call the DescribeMetricMetaList operation or visit https://help.aliyun.com/document_detail/28619.html.
      MetricName: 'cpu_total' # Required. The name of the metric. For example, cpu_total indicates the CPU utilization. To query the valid values of this parameter, call the DescribeMetricMetaList operation or visit https://help.aliyun.com/document_detail/28619.html.
      Statistics: 'Average' # The statistical method used to aggregate the monitoring data. For example, Average indicates the average value of the metric over a time period. To query the valid values of this parameter, call the DescribeMetricMetaList operation or visit https://help.aliyun.com/document_detail/28619.html.
      ComparisonOperator: 'GreaterThanThreshold' # Required. The operator used to compare the metric value with the threshold. Valid values: GreaterThanOrEqualToThreshold, GreaterThanThreshold, LessThanOrEqualToThreshold, LessThanThreshold, NotEqualToThreshold, GreaterThanYesterday, LessThanYesterday, GreaterThanLastWeek, LessThanLastWeek, GreaterThanLastPeriod, and LessThanLastPeriod.
      Threshold: '90' # The threshold for triggering an alert. For example, if MetricName is cpu_total and ComparisonOperator is GreaterThanThreshold, a value of 90 indicates that an alert is triggered when the CPU utilization exceeds 90%.
      Resources: '[{"resource":"_ALL"}]' # Required. The resources to monitor, specified as a JSON string. For example, [{"resource":"_ALL"}] indicates all resources within the current account, [{"instanceId":"i-bp123467zxcvb"}] indicates a specific instance, [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}] indicates a specific disk partition of an instance, and [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}] indicates multiple disk partitions of an instance.
      Times: 1 # The maximum number of notifications for an alert.
      Interval: 60 # The interval at which the alert rule is evaluated, in seconds. Default value: 60. This is also the highest frequency at which the metric is polled.
      SilenceTime: 3600 # The mute duration, in seconds. Default value: 86400 (one day). Minimum value: 3600 (one hour). Only one notification is sent during each mute duration, even if the metric value exceeds the threshold several consecutive times.
    Outputs:
      paraName1:
        Type: String
        ValueSelector: .key # The selector used to extract a value from the JSON message body of the alert. For example, if the message body of an alert is { "curLevel": "INFO", "Minimum": "34.00", "Maximum": "95.00", "instanceId": "i-abc12345zxcv", "Average": "85.00", "ruleName": "alarmtrigger-13012345678-exec-2130c0c073fa487098d3", "userId": "13012345678", "timestamp": "1598349720000", "executionId": "exec-2130c0c073fa487098d3", "sourceAliUid": "13012345678" }, setting ValueSelector to .instanceId returns the instance ID i-abc12345zxcv.
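The ValueSelector in the syntax reads like a jq-style path. The following sketch handles only simple dotted paths such as .instanceId and shows how such a selector picks a field out of the sample alert body from the ValueSelector comment. The select_value function is hypothetical: it approximates, rather than reproduces, the selector syntax that OOS actually supports.

```python
import json

# Sample alert message body, copied from the ValueSelector comment.
ALERT_BODY = json.loads("""{
  "curLevel": "INFO", "Minimum": "34.00", "Maximum": "95.00",
  "instanceId": "i-abc12345zxcv", "Average": "85.00",
  "ruleName": "alarmtrigger-13012345678-exec-2130c0c073fa487098d3",
  "userId": "13012345678", "timestamp": "1598349720000",
  "executionId": "exec-2130c0c073fa487098d3", "sourceAliUid": "13012345678"
}""")


def select_value(body, selector):
    """Resolve a simple dotted selector such as ".instanceId" against a dict.

    Hypothetical approximation of ValueSelector: only ".key" and ".a.b"
    paths are handled; filters, arrays, and other jq features are not.
    """
    if not selector.startswith("."):
        raise ValueError("selector must start with '.'")
    value = body
    for key in selector[1:].split("."):
        value = value[key]  # Raises KeyError if the field is missing.
    return value


print(select_value(ALERT_BODY, ".instanceId"))  # i-abc12345zxcv
```

Note that all values in the sample body are strings, including numeric ones such as Average, so a downstream task that needs a number would have to convert it.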

JSON format

The parameters are the same as in the YAML format. For their descriptions, see the comments in the YAML example.

{
  "Tasks": [
    {
      "Name": "taskName1",
      "Action": "ACS::AlarmTrigger",
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "90",
        "Resources": "[{\"resource\":\"_ALL\"}]",
        "Times": 1,
        "Interval": 60,
        "SilenceTime": 3600
      },
      "Outputs": {
        "paraName1": {
          "Type": "String",
          "ValueSelector": ".key"
        }
      }
    }
  ]
}
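In both formats, Resources is a string that itself contains JSON, which is why the inner quotes are escaped in the JSON template. One way to avoid hand-escaping is to build the value programmatically. The build_resources helper below is a hypothetical illustration, not part of any Alibaba Cloud SDK; it only reproduces the value shapes described in the YAML comments.

```python
import json


def build_resources(instance_id=None, devices=()):
    """Return the Resources property value as a JSON-encoded string.

    Hypothetical helper covering the forms described in the Syntax comments:
    all resources, one instance, or one or more disk partitions of an instance.
    """
    if instance_id is None:
        entries = [{"resource": "_ALL"}]
    elif not devices:
        entries = [{"instanceId": instance_id}]
    else:
        entries = [{"instanceId": instance_id, "device": d} for d in devices]
    # Compact separators reproduce the spacing used in the examples.
    return json.dumps(entries, separators=(",", ":"))


properties = {
    "Namespace": "acs_ecs_dashboard",
    "MetricName": "cpu_total",
    "Resources": build_resources(),
}
# Serializing the property escapes the inner quotes automatically.
print(json.dumps(properties["Resources"]))  # prints "[{\"resource\":\"_ALL\"}]"
```

Because json.dumps is applied twice (once to build the string, once when the template is serialized), the escaping always matches what the JSON template expects.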

Examples

The following template automatically restarts an ECS instance that has the specified tags when the average CPU utilization of the instance over a 1-minute period exceeds the specified threshold:

YAML format

FormatVersion: OOS-2019-06-01
Description:
  en: Reboots an ECS instance that has the specified tags when its CPU utilization exceeds the threshold. The selected instance must already have the Cloud Monitor agent installed.
  name-en: ACS-ECS-RebootInstanceAtHighCpuByTags
  categories:
    - alarm-trigger
Parameters:
  tags:
    Type: Json
    Description:
      en: The tags to select ECS instances.
    AssociationProperty: Tags
  threshold:
    Type: Number
    Description:
      en: The CPU utilization threshold.
  silenceTime:
    Type: Number
    Description:
      en: The mute duration of the alert, in seconds.
    Default: 60
  OOSAssumeRole:
    Description:
      en: The RAM role to be assumed by OOS.
    Type: String
    Default: OOSServiceRole
RamRole: '{{ OOSAssumeRole }}'
Tasks:
  - Name: alarmTrigger
    Action: 'ACS::AlarmTrigger'
    Description:
      en: Sets a CPU utilization alarm for ECS instances.
    Properties:
      Namespace: acs_ecs_dashboard
      MetricName: cpu_total
      Statistics: Average
      ComparisonOperator: GreaterThanThreshold
      Threshold: '{{threshold}}'
      Times: 1
      SilenceTime: '{{ silenceTime }}'
      Period: 60
      Interval: 60
    Outputs:
      InstanceId:
        Type: String
        ValueSelector: .instanceId
  - Name: CheckForInstances
    Action: 'ACS::CheckFor'
    Description:
      en: Checks whether the ECS instance has the specified tags.
    OnError: 'ACS::END'
    Properties:
      Service: ECS
      API: DescribeInstances
      Parameters:
        Tags: '{{ tags }}'
        InstanceIds: '["{{ alarmTrigger.InstanceId }}"]'
      PropertySelector: TotalCount
      DesiredValues:
        - 1
  - Name: RebootInstance
    Action: 'ACS::ECS::RebootInstance'
    Description:
      en: Restarts the ECS instance.
    Properties:
      instanceId: '{{ alarmTrigger.InstanceId }}'

JSON format

{
  "FormatVersion": "OOS-2019-06-01",
  "Description": {
    "en": "Reboot ECS instance with specified tag when its CPU utilization exceeded threshold. The selected instance must already have the Cloud Monitor agent installed.",
    "name-en": "ACS-ECS-RebootInstanceAtHighCpuByTags",
    "categories": [
      "alarm-trigger"
    ]
  },
  "Parameters": {
    "tags": {
      "Type": "Json",
      "Description": {
        "en": "The tags to select ECS instances.",
      },
      "AssociationProperty": "Tags"
    },
    "threshold": {
      "Type": "Number",
      "Description": {
        "en": "The CPU utilization threshold.",
      }
    },
    "silenceTime": {
      "Type": "Number",
      "Description": {
        "en": "The silence time of alarm (seconds).",
      },
      "Default": 60
    },
    "OOSAssumeRole": {
      "Description": {
        "en": "The RAM role to be assumed by OOS.",
      },
      "Type": "String",
      "Default": "OOSServiceRole"
    }
  },
  "RamRole": "{{ OOSAssumeRole }}",
  "Tasks": [
    {
      "Name": "alarmTrigger",
      "Action": "ACS::AlarmTrigger",
      "Description": {
        "en": "Set the CPU utilization alarm for ECS instance.",
      },
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "{{threshold}}",
        "Times": 1,
        "SilenceTime": "{{ silenceTime }}",
        "Period": 60,
        "Interval": 60
      },
      "Outputs": {
        "InstanceId": {
          "Type": "String",
          "ValueSelector": ".instanceId"
        }
      }
    },
    {
      "Name": "CheckForInstances",
      "Action": "ACS::CheckFor",
      "Description": {
        "en": "Check ECS instance has specified tag.",
      },
      "OnError": "ACS::END",
      "Properties": {
        "Service": "ECS",
        "API": "DescribeInstances",
        "Parameters": {
          "Tags": "{{ tags }}",
          "InstanceIds": "[\"{{ alarmTrigger.instanceId }}\"]"
        },
        "PropertySelector": "TotalCount",
        "DesiredValues": [
          1
        ]
      }
    },
    {
      "Name": "RebootInstance",
      "Action": "ACS::ECS::RebootInstance",
      "Description": {
        "en": "Restarts the ECS instances.",
      },
      "Properties": {
        "instanceId": "{{ alarmTrigger.instanceId }}"
      }
    }
  ]
}
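To make the control flow of the example explicit, the following Python sketch simulates its three tasks with in-memory data. The tag store, the representation of the tags parameter as a plain dict, the sample instance IDs other than i-abc12345zxcv, and the reboot callback are all hypothetical stand-ins. The real template calls DescribeInstances and ACS::ECS::RebootInstance through OOS.

```python
# Hypothetical in-memory tag store standing in for DescribeInstances results.
INSTANCE_TAGS = {
    "i-abc12345zxcv": {"env": "prod"},
    "i-def67890uiop": {"env": "test"},
}


def has_tags(instance_id, required_tags):
    """Mimic CheckForInstances: TotalCount must equal 1 for the instance."""
    tags = INSTANCE_TAGS.get(instance_id)
    if tags is None:
        return False
    matches = all(tags.get(key) == value for key, value in required_tags.items())
    total_count = 1 if matches else 0
    return total_count == 1


def run_example(alert_body, required_tags, reboot):
    """Simulate the alarmTrigger, CheckForInstances, and RebootInstance tasks."""
    instance_id = alert_body["instanceId"]  # alarmTrigger output (.instanceId)
    if not has_tags(instance_id, required_tags):
        # CheckForInstances failed; OnError: 'ACS::END' ends the execution.
        return "ended: tag check failed"
    reboot(instance_id)  # RebootInstance task
    return f"rebooted {instance_id}"


rebooted = []
print(run_example({"instanceId": "i-abc12345zxcv"}, {"env": "prod"}, rebooted.append))
print(run_example({"instanceId": "i-def67890uiop"}, {"env": "prod"}, rebooted.append))
```

If the tag check fails, the sketch returns early without calling reboot, which corresponds to OnError: 'ACS::END' ending the execution before the RebootInstance task runs.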