CloudOps Orchestration Service: ACS::AlarmTrigger

Last Updated: Dec 28, 2023

Description

You can use the ACS::AlarmTrigger action to run operations and maintenance (O&M) tasks in response to alerts. After you create an execution from a template that contains the ACS::AlarmTrigger action, the execution enters the Waiting state. When the metric specified in the action reaches its threshold, the execution status changes to Running and the subsequent tasks defined in the template run immediately. These tasks typically resolve the condition that caused the alert. For example, CloudOps Orchestration Service (OOS) can automatically restart an Elastic Compute Service (ECS) instance when it receives an alert indicating that the CPU utilization of the instance exceeds 90%.
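
The transition from Waiting to Running can be pictured as a threshold comparison. The following Python sketch is illustrative only and is not part of any OOS API: the function names are hypothetical, a single metric sample is assumed to be enough to trigger the alert, and only the five fixed-threshold values of ComparisonOperator are modeled.

```python
import operator

# Fixed-threshold values of the ComparisonOperator property.
# History-based operators (for example GreaterThanYesterday) need past
# data points and are deliberately left out of this sketch.
_OPERATORS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanOrEqualToThreshold": operator.le,
    "LessThanThreshold": operator.lt,
    "NotEqualToThreshold": operator.ne,
}


def threshold_reached(value: float, comparison_operator: str, threshold: float) -> bool:
    """Return True if the metric value satisfies the alert condition."""
    if comparison_operator not in _OPERATORS:
        raise ValueError(f"unsupported operator in this sketch: {comparison_operator}")
    return _OPERATORS[comparison_operator](value, threshold)


def next_state(value: float, comparison_operator: str, threshold: float) -> str:
    """Hypothetical helper: execution state after one metric sample."""
    if threshold_reached(value, comparison_operator, threshold):
        return "Running"
    return "Waiting"


print(next_state(85.0, "GreaterThanThreshold", 90.0))  # Waiting
print(next_state(95.0, "GreaterThanThreshold", 90.0))  # Running
```

With GreaterThanThreshold, a value exactly equal to the threshold keeps the execution in the Waiting state in this model; GreaterThanOrEqualToThreshold would move it to Running.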

Important

The ACS::AlarmTrigger action supports metrics that the Cloud Monitor agent monitors and built-in ECS metrics. For more information about the two types of metrics, see Metrics. To use the Cloud Monitor agent to monitor metrics of ECS instances, you must install the Cloud Monitor agent on the ECS instances. Otherwise, alerts cannot be triggered for the ECS instances. To install the Cloud Monitor agent on an ECS instance, go to the Host Monitoring page in the Cloud Monitor console, select the ECS instance, and then click Click to Install.

Limits

The ACS::AlarmTrigger action has the following limits:

  • A template can contain only one ACS::AlarmTrigger action.

  • The task that contains the ACS::AlarmTrigger action must be the first task in the Tasks parameter of the template.

  • The ACS::AlarmTrigger action cannot be used in child templates.
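
The first two limits can be checked locally before a template is submitted. The following Python sketch assumes that the template has already been parsed into a dictionary (for example, from its JSON form); check_alarm_trigger_limits is a hypothetical helper, not an OOS function, and it does not cover the child-template limit.

```python
ALARM_TRIGGER = "ACS::AlarmTrigger"


def check_alarm_trigger_limits(template: dict) -> list:
    """Return a list of limit violations; the list is empty if none are found."""
    tasks = template.get("Tasks", [])
    # Indexes of all tasks that use the ACS::AlarmTrigger action.
    positions = [i for i, task in enumerate(tasks) if task.get("Action") == ALARM_TRIGGER]

    problems = []
    if len(positions) > 1:
        problems.append("more than one ACS::AlarmTrigger action")
    if positions and positions[0] != 0:
        problems.append("ACS::AlarmTrigger is not the first task")
    return problems


template = {
    "Tasks": [
        {"Name": "RebootInstance", "Action": "ACS::ECS::RebootInstance"},
        {"Name": "alarmTrigger", "Action": "ACS::AlarmTrigger"},
    ]
}
print(check_alarm_trigger_limits(template))
# ['ACS::AlarmTrigger is not the first task']
```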

Syntax

  • YAML format

    Tasks:
    - Name: taskName1 # The name of the task.
      Action: 'ACS::AlarmTrigger'
      Properties:
          Namespace: 'acs_ecs_dashboard'  # Required. The namespace of the Alibaba Cloud service to monitor, such as ECS. To query the valid values of this parameter, call the DescribeMetricMetaList operation or visit https://help.aliyun.com/document_detail/28619.html.
          MetricName: 'cpu_total'  # Required. The name of the metric. For example, a value of cpu_total indicates the CPU utilization. To query the valid values of this parameter, call the DescribeMetricMetaList operation or visit https://help.aliyun.com/document_detail/28619.html.
          Statistics: 'Average' # The method used to process monitored data. For example, a value of Average indicates calculating the average value of the specified metric in a time period. To query the valid values of this parameter, call the DescribeMetricMetaList operation or visit https://help.aliyun.com/document_detail/28619.html.
          ComparisonOperator:  'GreaterThanThreshold' # Required. The operator used to compare the value of the metric with the threshold. Valid values: GreaterThanOrEqualToThreshold, GreaterThanThreshold, LessThanOrEqualToThreshold, LessThanThreshold, NotEqualToThreshold, GreaterThanYesterday, LessThanYesterday, GreaterThanLastWeek, LessThanLastWeek, GreaterThanLastPeriod, and LessThanLastPeriod.
          Threshold: '90' # The threshold for triggering an alert. For example, if the metric is cpu_total, a value of 90 indicates that an alert is triggered when the CPU utilization exceeds 90%.
          Resources: '[{"resource":"_ALL"}]'  # Required. The resource to be monitored with alerts. For example, [{"resource":"_ALL"}] indicates all resources under the current account. [{"instanceId":"i-bp123467zxcvb"}] indicates a specific instance. [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}] indicates a specific disk partition of an instance. [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}] indicates multiple disk partitions of an instance.
          Times: 1 # The maximum number of notifications for an alert.
          Interval: 60 # The interval at which the alerting rules are applied, in seconds. Default value: 60. It is the highest frequency at which the metric is polled.
          SilenceTime: 3600   # The mute duration, in seconds. Default value: 86400, indicating one day. Minimum value: 3600, indicating one hour. Only one notification is sent during each mute duration even if the metric value exceeds the threshold several consecutive times.
      Outputs:
          paraName1:
              Type: String
              ValueSelector: .key # The value of the key to be queried in the JSON message body of an alert. Assume that the JSON message body of an alert is { "curLevel": "INFO", "Minimum": "34.00", "Maximum": "95.00", "instanceId": "i-abc12345zxcv", "Average": "85.00", "ruleName": "alarmtrigger-13012345678-exec-2130c0c073fa487098d3", "userId": "13012345678", "timestamp": "1598349720000", "executionId": "exec-2130c0c073fa487098d3", "sourceAliUid": "13012345678" }. If you set the ValueSelector parameter to .instanceId, the ID of the instance, which is i-abc12345zxcv, is returned.
  • JSON format

    For more information, see the parameter description for the YAML format.

    {
      "Tasks": [
        {
          "Name": "taskName1",
          "Action": "ACS::AlarmTrigger",
          "Properties": {
            "Namespace": "acs_ecs_dashboard",
            "MetricName": "cpu_total",
            "Statistics": "Average",
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": "90",
            "Resources": "[{\"resource\":\"_ALL\"}]",
            "Times": 1,
            "Interval": 60,
            "SilenceTime": 3600
          },
          "Outputs": {
            "paraName1": {
              "Type": "String",
              "ValueSelector": ".key"
            }
          }
        }
      ]
    }
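
Two of the values above are JSON-based: Resources is a JSON array passed as a string, and ValueSelector picks a key out of the JSON message body of an alert. The following Python sketch shows both ideas with the sample values from the parameter descriptions. build_resources and select_value are hypothetical helpers, and select_value handles only a single top-level key in the form .key; it is a simplified stand-in, not the selector implementation that OOS uses.

```python
import json


def build_resources(*dimensions: dict) -> str:
    """Encode resource dimensions as a compact JSON array string."""
    return json.dumps(list(dimensions), separators=(",", ":"))


def select_value(alert_body: str, value_selector: str):
    """Resolve a '.key' selector against the JSON body of an alert."""
    if not value_selector.startswith(".") or "." in value_selector[1:]:
        raise ValueError(f"unsupported selector in this sketch: {value_selector}")
    return json.loads(alert_body)[value_selector[1:]]


# All resources under the current account.
print(build_resources({"resource": "_ALL"}))
# [{"resource":"_ALL"}]

# Two disk partitions of one instance.
print(build_resources(
    {"instanceId": "i-bp123467zxcvb", "device": "/dev/vda1"},
    {"instanceId": "i-bp123467zxcvb", "device": "/dev/vdb1"},
))

# Shortened version of the sample alert body from the ValueSelector description.
alert_body = '{"curLevel": "INFO", "instanceId": "i-abc12345zxcv", "Average": "85.00"}'
print(select_value(alert_body, ".instanceId"))
# i-abc12345zxcv
```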

Examples

The following template automatically restarts an ECS instance when the CPU utilization of the instance exceeds the specified threshold within a 1-minute period:

  • YAML format

    FormatVersion: OOS-2019-06-01
    Description:
      en: Reboot ECS instance with specified tag when its CPU utilization exceeded threshold. The selected instance must already have the Cloud Monitor agent installed.
      name-en: ACS-ECS-RebootInstanceAtHighCpuByTags
      categories:
        - alarm-trigger
    Parameters:
      tags:
        Type: Json
        Description:
          en: The tags to select ECS instances.
        AssociationProperty: Tags
      threshold:
        Type: Number
        Description:
          en: The CPU utilization threshold.
      silenceTime:
        Type: Number
        Description:
          en: The silence time of alarm (seconds).
        Default: 60
      OOSAssumeRole:
        Description:
          en: The RAM role to be assumed by OOS.
        Type: String
        Default: OOSServiceRole
    RamRole: '{{ OOSAssumeRole }}'
    Tasks:
      - Name: alarmTrigger
        Action: 'ACS::AlarmTrigger'
        Description:
          en: Set the CPU utilization alarm for ECS instance.
        Properties:
          Namespace: acs_ecs_dashboard
          MetricName: cpu_total
          Statistics: Average
          ComparisonOperator: GreaterThanThreshold
          Threshold: '{{threshold}}'
          Times: 1
          SilenceTime: '{{ silenceTime }}'
          Period: 60
          Interval: 60
        Outputs:
          instanceId:
            Type: String
            ValueSelector: .instanceId
      - Name: CheckForInstances
        Action: 'ACS::CheckFor'
        Description:
          en: Check ECS instance has specified tag.
        OnError: 'ACS::END'
        Properties:
          Service: ECS
          API: DescribeInstances
          Parameters:
            Tags: '{{ tags }}'
            InstanceIds: '["{{ alarmTrigger.instanceId }}"]'
          PropertySelector: TotalCount
          DesiredValues:
            - 1
      - Name: RebootInstance
        Action: 'ACS::ECS::RebootInstance'
        Description:
          en: Restarts the ECS instances.
        Properties:
          instanceId: '{{ alarmTrigger.instanceId }}'

  • JSON format

    {
      "FormatVersion": "OOS-2019-06-01",
      "Description": {
        "en": "Reboot ECS instance with specified tag when its CPU utilization exceeded threshold. The selected instance must already have the Cloud Monitor agent installed.",
        "name-en": "ACS-ECS-RebootInstanceAtHighCpuByTags",
        "categories": [
          "alarm-trigger"
        ]
      },
      "Parameters": {
        "tags": {
          "Type": "Json",
          "Description": {
            "en": "The tags to select ECS instances."
          },
          "AssociationProperty": "Tags"
        },
        "threshold": {
          "Type": "Number",
          "Description": {
            "en": "The CPU utilization threshold."
          }
        },
        "silenceTime": {
          "Type": "Number",
          "Description": {
            "en": "The silence time of alarm (seconds)."
          },
          "Default": 60
        },
        "OOSAssumeRole": {
          "Description": {
            "en": "The RAM role to be assumed by OOS."
          },
          "Type": "String",
          "Default": "OOSServiceRole"
        }
      },
      "RamRole": "{{ OOSAssumeRole }}",
      "Tasks": [
        {
          "Name": "alarmTrigger",
          "Action": "ACS::AlarmTrigger",
          "Description": {
            "en": "Set the CPU utilization alarm for ECS instance."
          },
          "Properties": {
            "Namespace": "acs_ecs_dashboard",
            "MetricName": "cpu_total",
            "Statistics": "Average",
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": "{{threshold}}",
            "Times": 1,
            "SilenceTime": "{{ silenceTime }}",
            "Period": 60,
            "Interval": 60
          },
          "Outputs": {
            "instanceId": {
              "Type": "String",
              "ValueSelector": ".instanceId"
            }
          }
        },
        {
          "Name": "CheckForInstances",
          "Action": "ACS::CheckFor",
          "Description": {
            "en": "Check ECS instance has specified tag."
          },
          "OnError": "ACS::END",
          "Properties": {
            "Service": "ECS",
            "API": "DescribeInstances",
            "Parameters": {
              "Tags": "{{ tags }}",
              "InstanceIds": "[\"{{ alarmTrigger.instanceId }}\"]"
            },
            "PropertySelector": "TotalCount",
            "DesiredValues": [
              1
            ]
          }
        },
        {
          "Name": "RebootInstance",
          "Action": "ACS::ECS::RebootInstance",
          "Description": {
            "en": "Restarts the ECS instances."
          },
          "Properties": {
            "instanceId": "{{ alarmTrigger.instanceId }}"
          }
        }
      ]
    }
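
Later tasks in the template read the output of the alarmTrigger task through references in the form {{ taskName.outputName }}. The following Python sketch checks that each such reference in a parsed template points to an output that a task declares. unresolved_references is a hypothetical helper, and the exact, case-sensitive name match is an assumption of this sketch; the service itself may resolve names differently.

```python
import re

# Matches {{ taskName.outputName }} with optional inner whitespace.
# Plain parameter references such as {{ tags }} contain no dot and are skipped.
_REFERENCE = re.compile(r"\{\{\s*([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*\}\}")


def _strings(node):
    """Yield every string value nested inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _strings(item)


def unresolved_references(template: dict) -> list:
    """List task output references that match no declared output name."""
    tasks = template.get("Tasks", [])
    declared = {task["Name"]: set(task.get("Outputs", {})) for task in tasks}
    missing = []
    for task in tasks:
        for text in _strings(task.get("Properties", {})):
            for task_name, output_name in _REFERENCE.findall(text):
                if output_name not in declared.get(task_name, set()):
                    missing.append(f"{task['Name']}: {task_name}.{output_name}")
    return missing


sample = {
    "Tasks": [
        {
            "Name": "alarmTrigger",
            "Action": "ACS::AlarmTrigger",
            "Outputs": {"instanceId": {"Type": "String", "ValueSelector": ".instanceId"}},
        },
        {
            "Name": "RebootInstance",
            "Action": "ACS::ECS::RebootInstance",
            "Properties": {
                "instanceId": "{{ alarmTrigger.instanceId }}",
                "note": "{{ alarmTrigger.regionId }}",  # hypothetical, undeclared output
            },
        },
    ]
}
print(unresolved_references(sample))
# ['RebootInstance: alarmTrigger.regionId']
```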