Purpose
When a template that contains the AlarmTrigger action is executed, the execution first enters the Pending status. When the metric specified in the AlarmTrigger reaches the defined threshold, the execution status changes to Active and the subsequent tasks defined in the template start to run. These tasks are typically operations that automatically resolve the condition that raised the alarm. For example, when the CPU usage of an ECS instance exceeds 90%, an alarm is triggered and the instance is automatically rebooted.
AlarmTrigger supports two types of metrics: metrics collected by the CloudMonitor plugin installed on the instance, and metrics provided natively by ECS. To tell the two types apart, see the metric descriptions. To monitor a metric collected by the CloudMonitor plugin, make sure that the plugin is installed on the instance; otherwise, the alarm cannot be triggered. To install the plugin, go to Host Monitoring in the CloudMonitor console, select the instance to be monitored, and click Install.
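The execution lifecycle described above can be pictured as a small state machine. The sketch below is a hypothetical, simplified model rather than OOS code: `Execution`, `observe`, and `run` are illustrative names, each boolean stands for one alarm evaluation, and it assumes the follow-up tasks run once, in template order, after the first alarm.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Execution:
    """Hypothetical model of an execution whose first task is ACS::AlarmTrigger."""

    follow_up_tasks: List[Callable[[], str]]
    status: str = "Pending"  # every execution starts in Pending
    log: List[str] = field(default_factory=list)

    def observe(self, threshold_reached: bool) -> None:
        """Feed one alarm evaluation result into the execution."""
        if self.status != "Pending" or not threshold_reached:
            return  # keep waiting, or the trigger has already fired
        self.status = "Active"
        # Assumption: the remaining tasks run once, in template order.
        for task in self.follow_up_tasks:
            self.log.append(task())


def run(checks: Iterable[bool], tasks: List[Callable[[], str]]) -> Execution:
    """Replay a sequence of alarm evaluations against a new execution."""
    execution = Execution(follow_up_tasks=tasks)
    for reached in checks:
        execution.observe(reached)
    return execution


if __name__ == "__main__":
    result = run([False, False, True, True], [lambda: "RebootInstance"])
    print(result.status, result.log)  # Active ['RebootInstance']
```

The execution stays in Pending through the first two evaluations and switches to Active on the first one that reaches the threshold.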
Limitations
The trigger has the following limits:
A template can contain only one trigger action.
The trigger action task must be the first task in the template.
Trigger actions are not allowed in embedded templates (child templates).
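These limits can be checked before a template is submitted. The following is a minimal sketch over a template that has already been parsed into a Python dict; `check_trigger_limits` is a hypothetical helper, and the `TRIGGER_ACTIONS` set is an assumption that lists only `ACS::AlarmTrigger`.

```python
from typing import Any, Dict, List

# Assumption: extend this set if other trigger actions must obey the same limits.
TRIGGER_ACTIONS = {"ACS::AlarmTrigger"}


def check_trigger_limits(template: Dict[str, Any], is_child: bool = False) -> List[str]:
    """Return the trigger limitations that a parsed template violates."""
    tasks = template.get("Tasks", [])
    positions = [i for i, task in enumerate(tasks) if task.get("Action") in TRIGGER_ACTIONS]
    problems: List[str] = []
    if len(positions) > 1:
        problems.append("a template can contain only one trigger action")
    if positions and positions[0] != 0:
        problems.append("the trigger task must be the first task")
    if positions and is_child:
        problems.append("trigger actions are not allowed in child templates")
    return problems


if __name__ == "__main__":
    template = {"Tasks": [{"Name": "reboot", "Action": "ACS::ECS::RebootInstance"},
                          {"Name": "alarmTrigger", "Action": "ACS::AlarmTrigger"}]}
    print(check_trigger_limits(template))  # ['the trigger task must be the first task']
```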
Syntax
YAML format
Tasks:
  - Name: taskName1 # The name of the task.
    Action: 'ACS::AlarmTrigger'
    Properties:
      Namespace: 'acs_ecs_dashboard' # Required. The namespace of the Alibaba Cloud service to monitor, such as ECS. To query the valid values of this parameter, call the DescribeMetricMetaList operation.
      MetricName: 'cpu_total' # Required. The name of the metric, for example, the total CPU utilization. To query the valid values of this parameter, call the DescribeMetricMetaList operation.
      Statistics: 'Average' # The method used to aggregate the monitoring data. For example, Average calculates the average value of the metric over a time period. To query the valid values of this parameter, call the DescribeMetricMetaList operation.
      ComparisonOperator: 'GreaterThanThreshold' # Required. The operator used to compare the metric value with the threshold. Valid values:
        # GreaterThanOrEqualToThreshold: greater than or equal to
        # GreaterThanThreshold: greater than
        # LessThanOrEqualToThreshold: less than or equal to
        # LessThanThreshold: less than
        # NotEqualToThreshold: not equal to
        # GreaterThanYesterday: increase compared with the same time yesterday
        # LessThanYesterday: decrease compared with the same time yesterday
        # GreaterThanLastWeek: increase compared with the same time last week
        # LessThanLastWeek: decrease compared with the same time last week
        # GreaterThanLastPeriod: increase compared with the last period
        # LessThanLastPeriod: decrease compared with the last period
      Threshold: '90' # The threshold for triggering an alert, for example, 90% total CPU utilization.
      Tags: [{"Key": "k1", "Value": "v1"}] # Optional. Filters the resources to monitor by tag. Specify either Tags or ResourceGroup, but not both.
      ResourceGroup: 'rg-xxxx' # Optional. Filters the resources to monitor by resource group. Specify either Tags or ResourceGroup, but not both.
      Resources: '[{"resource":"_ALL"}]' # Required. The resources to monitor. Examples:
        # [{"resource":"_ALL"}]: all resources under the current account.
        # [{"instanceId":"i-bp123467zxcvb"}]: a specific instance.
        # [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}]: a specific disk partition of an instance.
        # [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}]: multiple disk partitions of an instance.
      Times: 1 # The number of times an alert is repeated.
      Interval: 60 # The interval at which the alert rule is evaluated, in seconds. Default value: 60, which is the minimum frequency of the metric.
      SilenceTime: 3600 # The mute period, in seconds. Default value: 86400 (one day). Minimum value: 3600 (one hour). Only one alert notification is sent during each mute period, even if the metric value continuously exceeds the threshold set in the alert rule.
    Outputs:
      paraName1:
        Type: String
        ValueSelector: .key # The value of the key to query in the JSON message body of the alert. For example, .instanceId returns "i-abc12345zxcv". The JSON message body of an alert event has the following format:
          # {
          #   "curLevel": "INFO",
          #   "Minimum": "34.00",
          #   "Maximum": "95.00",
          #   "instanceId": "i-abc12345zxcv",
          #   "Average": "85.00",
          #   "ruleName": "alarmtrigger-13012345678-exec-2130c0c073fa487098d3",
          #   "userId": "13012345678",
          #   "timestamp": "1598349720000",
          #   "executionId": "exec-2130c0c073fa487098d3",
          #   "sourceAliUid": "13012345678"
          # }
JSON format (refer to the YAML comments)
{
  "Tasks": [
    {
      "Name": "taskName1",
      "Action": "ACS::AlarmTrigger",
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "90",
        "Tags": "[{\"Key\": \"k1\", \"Value\": \"v1\"}]",
        "ResourceGroup": "rg-xxxx",
        "Resources": "[{\"resource\":\"_ALL\"}]",
        "Times": 1,
        "Interval": 60,
        "SilenceTime": 3600
      },
      "Outputs": {
        "paraName1": {
          "Type": "String",
          "ValueSelector": ".key"
        }
      }
    }
  ]
}
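To make ComparisonOperator and SilenceTime concrete, the sketch below evaluates a series of aggregated metric values (one per Interval) against a threshold and reports when a notification would be sent. It is a simplified model, not CloudMonitor's actual algorithm: `notification_times` is a hypothetical helper, only the five fixed-threshold operators are covered, Times is ignored, and the mute period is assumed to start at each notification.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

# Fixed-threshold operators from the ComparisonOperator list. The *Yesterday,
# *LastWeek, and *LastPeriod operators compare against history and are omitted.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanOrEqualToThreshold": operator.le,
    "LessThanThreshold": operator.lt,
    "NotEqualToThreshold": operator.ne,
}


def notification_times(
    samples: List[Tuple[int, float]],
    comparison: str,
    threshold: float,
    silence_time: int = 86400,
) -> List[int]:
    """Return the timestamps (seconds) at which a notification would be sent.

    samples holds (timestamp, aggregated value) pairs in time order, one per
    Interval. After a notification, further breaches are muted for
    silence_time seconds.
    """
    compare = OPERATORS[comparison]
    sent: List[int] = []
    muted_until: Optional[int] = None
    for ts, value in samples:
        if not compare(value, threshold):
            continue  # condition not met in this interval
        if muted_until is not None and ts < muted_until:
            continue  # still inside the mute period
        sent.append(ts)
        muted_until = ts + silence_time
    return sent


if __name__ == "__main__":
    cpu = [(0, 85.0), (60, 95.0), (120, 97.0), (3660, 96.0)]
    print(notification_times(cpu, "GreaterThanThreshold", 90, silence_time=3600))
    # [60, 3660]
```

With a 3600-second mute period, the breach at 120 seconds is suppressed, while the breach at 3660 seconds falls outside the mute period and produces a second notification.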
Example
If the average total CPU utilization of a monitored ECS instance over a 1-minute period exceeds the threshold, and the instance has the specified tags, the instance is automatically rebooted.
YAML format
FormatVersion: OOS-2019-06-01
Description:
  en: Reboot an ECS instance that has the specified tags when its CPU utilization exceeds the threshold. The selected instance must already have the CloudMonitor agent installed.
  name-en: ACS-ECS-RebootInstanceAtHighCpuByTags
  categories:
    - alarm-trigger
Parameters:
  tags:
    Type: Json
    Description:
      en: The tags to select ECS instances.
    AssociationProperty: Tags
  threshold:
    Type: Number
    Description:
      en: The CPU utilization threshold.
  silenceTime:
    Type: Number
    Description:
      en: The silence time of alarm (seconds).
    Default: 60
  OOSAssumeRole:
    Description:
      en: The RAM role to be assumed by OOS.
    Type: String
    Default: OOSServiceRole
RamRole: '{{ OOSAssumeRole }}'
Tasks:
  - Name: alarmTrigger
    Action: 'ACS::AlarmTrigger'
    Description:
      en: Set the CPU utilization alarm for ECS instance.
    Properties:
      Namespace: acs_ecs_dashboard
      MetricName: cpu_total
      Statistics: Average
      ComparisonOperator: GreaterThanThreshold
      Threshold: '{{threshold}}'
      Times: 1
      SilenceTime: '{{ silenceTime }}'
      Period: 60
      Interval: 60
    Outputs:
      instanceId:
        Type: String
        ValueSelector: .instanceId
  - Name: CheckForInstances
    Action: 'ACS::CheckFor'
    Description:
      en: Check whether the ECS instance has the specified tags.
    OnError: 'ACS::END'
    Properties:
      Service: ECS
      API: DescribeInstances
      Parameters:
        Tags: '{{ tags }}'
        InstanceIds: '["{{ alarmTrigger.instanceId }}"]'
      PropertySelector: TotalCount
      DesiredValues:
        - 1
  - Name: RebootInstance
    Action: 'ACS::ECS::RebootInstance'
    Description:
      en: Restart the ECS instance.
    Properties:
      instanceId: '{{ alarmTrigger.instanceId }}'
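In this example, the alarmTrigger task captures the instance ID from the alert event through ValueSelector, and later tasks read it through a `{{ alarmTrigger.<output name> }}` reference. The sketch below imitates both steps under simplifying assumptions: `select` supports only dotted key paths (the real ValueSelector accepts richer expressions), `render` is a hypothetical stand-in for OOS parameter substitution, and the lookup is case-sensitive, so a reference must use exactly the output name declared under Outputs.

```python
import json
import re
from typing import Any, Dict

# Alert event body taken from the Outputs comment in the Syntax section.
EVENT = json.loads("""{ "curLevel": "INFO", "Minimum": "34.00", "Maximum": "95.00",
  "instanceId": "i-abc12345zxcv", "Average": "85.00",
  "ruleName": "alarmtrigger-13012345678-exec-2130c0c073fa487098d3",
  "userId": "13012345678", "timestamp": "1598349720000",
  "executionId": "exec-2130c0c073fa487098d3", "sourceAliUid": "13012345678" }""")


def select(body: Dict[str, Any], selector: str) -> Any:
    """Resolve a dotted selector such as '.instanceId' against a JSON object."""
    value: Any = body
    for key in selector.lstrip(".").split("."):
        value = value[key]
    return value


def render(text: str, outputs: Dict[str, Dict[str, Any]]) -> str:
    """Replace '{{ task.output }}' references with captured task outputs."""

    def lookup(match: "re.Match[str]") -> str:
        task, name = match.group(1), match.group(2)
        return str(outputs[task][name])  # KeyError if the name does not match exactly

    return re.sub(r"\{\{\s*(\w+)\.(\w+)\s*\}\}", lookup, text)


if __name__ == "__main__":
    outputs = {"alarmTrigger": {"instanceId": select(EVENT, ".instanceId")}}
    print(render('["{{ alarmTrigger.instanceId }}"]', outputs))  # ["i-abc12345zxcv"]
```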
JSON format
{
"FormatVersion": "OOS-2019-06-01",
"Description": {
"en": "Reboot an ECS instance that has the specified tags when its CPU utilization exceeds the threshold. The selected instance must already have the CloudMonitor agent installed.",
"name-en": "ACS-ECS-RebootInstanceAtHighCpuByTags",
"categories": [
"alarm-trigger"
]
},
"Parameters": {
"tags": {
"Type": "Json",
"Description": {
"en": "The tags to select ECS instances."
},
"AssociationProperty": "Tags"
},
"threshold": {
"Type": "Number",
"Description": {
"en": "The CPU utilization threshold."
}
},
"silenceTime": {
"Type": "Number",
"Description": {
"en": "The silence time of alarm (seconds)."
},
"Default": 60
},
"OOSAssumeRole": {
"Description": {
"en": "The RAM role to be assumed by OOS."
},
"Type": "String",
"Default": "OOSServiceRole"
}
},
"RamRole": "{{ OOSAssumeRole }}",
"Tasks": [
{
"Name": "alarmTrigger",
"Action": "ACS::AlarmTrigger",
"Description": {
"en": "Set the CPU utilization alarm for ECS instance."
},
"Properties": {
"Namespace": "acs_ecs_dashboard",
"MetricName": "cpu_total",
"Statistics": "Average",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": "{{threshold}}",
"Times": 1,
"SilenceTime": "{{ silenceTime }}",
"Period": 60,
"Interval": 60
},
"Outputs": {
"instanceId": {
"Type": "String",
"ValueSelector": ".instanceId"
}
}
},
{
"Name": "CheckForInstances",
"Action": "ACS::CheckFor",
"Description": {
"en": "Check whether the ECS instance has the specified tags."
},
"OnError": "ACS::END",
"Properties": {
"Service": "ECS",
"API": "DescribeInstances",
"Parameters": {
"Tags": "{{ tags }}",
"InstanceIds": "[\"{{ alarmTrigger.instanceId }}\"]"
},
"PropertySelector": "TotalCount",
"DesiredValues": [
1
]
}
},
{
"Name": "RebootInstance",
"Action": "ACS::ECS::RebootInstance",
"Description": {
"en": "Restart the ECS instance."
},
"Properties": {
"instanceId": "{{ alarmTrigger.instanceId }}"
}
}
]
}
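After the alarm fires, CheckForInstances calls DescribeInstances with the alarmed instance ID and the tag filter, and expects TotalCount to equal one of DesiredValues; if it does not, `OnError: ACS::END` ends the execution before RebootInstance runs. The sketch below replays that decision against a hypothetical in-memory inventory instead of calling ECS. `total_count` and `tasks_after_alarm` are illustrative names, and the real CheckFor action may retry before failing, which the sketch does not model.

```python
from typing import Dict, List

# Hypothetical stand-in for ECS: instance ID -> tags.
INVENTORY: Dict[str, Dict[str, str]] = {
    "i-tagged": {"env": "prod"},
    "i-untagged": {},
}
DESIRED_VALUES = [1]  # mirrors DesiredValues in CheckForInstances


def total_count(instance_ids: List[str], tags: List[Dict[str, str]]) -> int:
    """Imitate DescribeInstances filtered by InstanceIds and Tags, returning TotalCount."""
    return sum(
        1
        for iid in instance_ids
        if iid in INVENTORY
        and all(INVENTORY[iid].get(tag["Key"]) == tag["Value"] for tag in tags)
    )


def tasks_after_alarm(instance_id: str, tags: List[Dict[str, str]]) -> List[str]:
    """List the steps that would run once the alarm fires for instance_id."""
    steps = ["CheckForInstances"]
    if total_count([instance_id], tags) not in DESIRED_VALUES:
        steps.append("ACS::END")  # OnError: end the execution without rebooting
        return steps
    steps.append("RebootInstance")
    return steps


if __name__ == "__main__":
    prod = [{"Key": "env", "Value": "prod"}]
    print(tasks_after_alarm("i-tagged", prod))    # ['CheckForInstances', 'RebootInstance']
    print(tasks_after_alarm("i-untagged", prod))  # ['CheckForInstances', 'ACS::END']
```

The tag check acts as a guard: an alarm on an untagged instance still starts the execution, but the reboot step is skipped.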