Creates or modifies an alert rule.
Authorization information
The following table shows the authorization information corresponding to the API. The authorization information can be used in the Action element of a policy to grant a RAM user or RAM role the permissions to call this API operation. The columns are described as follows:
- Operation: the value that you can use in the Action element to specify the operation on a resource.
- Access level: the access level of each operation. The levels are read, write, and list.
- Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
  - The required resource types are displayed in bold characters.
  - If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
- Condition key: the condition key that is defined by the cloud service.
- Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation.
Operation | Access level | Resource type | Condition key | Associated operation |
---|---|---|---|---|
arms:CreateAlertRules | Write | | | none |
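As an illustration of how the operation name in the table can be used, the following sketch builds a RAM policy document that allows the `arms:CreateAlertRules` action. The helper name `build_policy` is hypothetical, and using `"*"` as the resource is an assumption, because the table lists no resource type.

```python
import json

# Action name taken from the Operation column of the authorization table.
ACTION = "arms:CreateAlertRules"


def build_policy(action: str = ACTION) -> dict:
    """Return a RAM policy document that allows one action."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [action],
                # Assumption: the table shows no resource type, so all
                # resources are used here.
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(build_policy(), indent=2)
print(policy_json)
```

The printed JSON can be attached as a custom policy to the RAM user or RAM role that calls this operation.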
Request parameters
Parameter | Type | Required | Description | Example |
---|---|---|---|---|
AlertId | long | No | The ID of the alert rule. | 546xxx |
AlertName | string | Yes | The name of the alert rule. | Alert Rule Demo |
RegionId | string | Yes | The region ID. | cn-hangzhou |
AlertType | string | Yes | The type of the alert rule. | APPLICATION_MONITORING_ALERT_RULE |
AlertStatus | string | No | The status of the alert rule. | RUNNING |
NotifyStrategy | string | No | The notification policy. | 569xxx |
Pids | string | No | The process ID (PID) that is associated with the Application Monitoring or Browser Monitoring alert rule. | ["b590lhguqs@40d8deedfa9******"] |
AutoAddNewApplication | boolean | No | Specifies whether to apply the alert rule to new applications that are created in Application Monitoring or Browser Monitoring. | false |
MetricsType | string | No | The metric type of the Application Monitoring or Browser Monitoring alert rule. For more information, see the following table. | jvm |
Filters | string | No | The filter conditions of the Application Monitoring or Browser Monitoring alert rule. The Example column shows the format of the matching rules. | {"DimFilters": [ { "FilterOpt": "ALL", "FilterValues": [], "FilterKey": "rootIp" } ] } |
AlertRuleContent | string | No | The content of the Application Monitoring or Browser Monitoring alert rule. The Example column shows a sample value. For more information about the meaning of each field, see the supplementary description. Note: The conditional fields vary depending on the values of the MetricsType and AlertRuleItems.MetricKey parameters. For more information about the types of metrics supported by Application Monitoring and Browser Monitoring and the alert rule fields corresponding to each metric, see the supplementary description. | { "Condition": "OR", "AlertRuleItems": [ { "Operator": "CURRENT_LTE", "MetricKey": "appstat.jvm.threadcount", "Value": 1000, "Aggregate": "AVG", "N": 1 } ] } |
AlertCheckType | string | No | The alert check type of the Prometheus alert rule. | STATIC |
ClusterId | string | No | The ID of the monitored cluster. | ceba9b9ea5b924dd0b6726d2de6****** |
AlertGroup | long | No | The alert contact group ID of the Prometheus alert rule. | -1 |
PromQL | string | No | The PromQL statement of the Prometheus alert rule. | node_memory_MemAvailable_bytes{} / node_memory_MemTotal_bytes{} * 100 |
Duration | long | No | The duration of the Prometheus alert rule. Unit: minutes. | 1 |
Level | string | No | The severity level of the Prometheus alert rule. | P2 |
Message | string | No | The alert message of the Prometheus alert rule. | Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} Memory usage exceeds 80%. Current value: {{ printf "%.2f" $value }}% |
Labels | string | No | The tags of the Prometheus alert rule. | [ { "Value": "cms_polardb", "Name": "_aliyun_cloud_product" } ] |
Annotations | string | No | The annotations of the Prometheus alert rule. | [ { "Value": "PolarDB slow queries", "Name": "_aliyun_display_name" } ] |
MetricsKey | string | No | The alert metrics. If you set the AlertCheckType parameter to STATIC when you create a Prometheus alert rule, you must specify the MetricsKey parameter. Note: Alert metrics vary depending on the value of the AlertGroup parameter. For more information about the correspondence between AlertGroup and MetricsKey, see the supplementary description. | pop.status.error |
Tags | object [] | No | The list of tags. | |
Key | string | No | The tag key. | owner |
Value | string | No | The tag value. | John |
MarkTags | object [] | No | The application tags. This parameter is used by Application Monitoring alert rules to filter the applications that are associated with the alert rule. | |
Key | string | No | The tag key. | service |
Value | string | No | The tag value. | product |
DataConfig | string | No | The data configuration. The dataRevision field specifies the method that is used to fill in data when the metric has no data. | { "dataRevision": 2 } |
Notice | string | No | The effective time and notification time. This parameter is retained for compatibility with legacy rules. | |
AlertPiplines | string | No | The configuration of the alert notification channels. This parameter is retained for compatibility with legacy rules. | |
NotifyMode | string | No | The notification mode: normal mode or simplified mode. | NORMAL_MODE |
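The request table declares Pids, Filters, and AlertRuleContent as strings, so structured values have to be serialized to JSON before they are sent. The following sketch assembles the parameters for an Application Monitoring rule from the example values in the table. The function name `build_app_rule_params` is hypothetical, no SDK call is made, and treating a supplied AlertId as a request to modify an existing rule is an assumption based on the operation summary.

```python
import json

# Parameters marked "Yes" in the Required column of the request table.
REQUIRED = ("AlertName", "RegionId", "AlertType")


def build_app_rule_params(name, region, pids, metrics_type, rule_content,
                          filters=None, alert_id=None):
    """Assemble request parameters for an Application Monitoring alert rule."""
    params = {
        "AlertName": name,
        "RegionId": region,
        "AlertType": "APPLICATION_MONITORING_ALERT_RULE",
        "MetricsType": metrics_type,
        # Structured values are passed as JSON strings.
        "Pids": json.dumps(pids),
        "AlertRuleContent": json.dumps(rule_content),
    }
    if filters is not None:
        params["Filters"] = json.dumps(filters)
    if alert_id is not None:
        # Assumption: an AlertId targets an existing rule for modification.
        params["AlertId"] = alert_id
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return params


params = build_app_rule_params(
    name="Alert Rule Demo",
    region="cn-hangzhou",
    pids=["b590lhguqs@40d8deedfa9******"],
    metrics_type="jvm",
    rule_content={
        "Condition": "OR",
        "AlertRuleItems": [
            {"Operator": "CURRENT_LTE", "MetricKey": "appstat.jvm.threadcount",
             "Value": 1000, "Aggregate": "AVG", "N": 1}
        ],
    },
    filters={"DimFilters": [
        {"FilterOpt": "ALL", "FilterValues": [], "FilterKey": "rootIp"}
    ]},
)
print(json.dumps(params, indent=2))
```

The resulting dictionary can then be handed to whichever client or SDK you use to call the operation.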
Description of the AlertRuleContent parameter
{
  "Condition": "OR",
  "AlertRuleItems": [
    {
      "Operator": "CURRENT_LTE",
      "MetricKey": "appstat.jvm.threadcount",
      "Value": 1000,
      "Aggregate": "AVG",
      "N": 1
    }
  ]
}
- Condition: the relationship between multiple alert conditions.
  - OR: any of the specified conditions is met.
  - AND: all of the specified conditions are met.
- Operator: the comparison operator that is used to compare the metric value with the threshold.
  - CURRENT_GTE: greater than or equal to
  - CURRENT_LTE: less than or equal to
  - PREVIOUS_UP: the percentage increase compared with the previous period
  - PREVIOUS_DOWN: the percentage decrease compared with the previous period
  - HOH_UP: the percentage increase compared with the previous hour
  - HOH_DOWN: the percentage decrease compared with the previous hour
  - DOD_UP: the percentage increase compared with the previous day
  - DOD_DOWN: the percentage decrease compared with the previous day
- MetricKey: the metric of the alert condition. Valid values of the MetricKey parameter vary depending on the value of the MetricsType parameter. For more information about the correspondence between the two parameters, see the following tables.
- Value: the threshold of the alert condition.
- Aggregate: the aggregation method of the alert condition.
  - AVG: calculates the average value
  - SUM: calculates the total value
  - MAX: selects the maximum value
  - MIN: selects the minimum value
  - CONTINUOUS: selects the continuous value
  - AVG_WEIGHTED: calculates the weighted value of the average error rate
- N: the time window of the alert condition, that is, the last N minutes.
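The field list above can be turned into a small validating builder. This is a minimal sketch: the names `rule_item` and `rule_content` are hypothetical, the allowed values are copied from the list above, and the requirement that N be at least 1 is an assumption.

```python
import json

CONDITIONS = {"OR", "AND"}
OPERATORS = {
    "CURRENT_GTE", "CURRENT_LTE", "PREVIOUS_UP", "PREVIOUS_DOWN",
    "HOH_UP", "HOH_DOWN", "DOD_UP", "DOD_DOWN",
}
AGGREGATES = {"AVG", "SUM", "MAX", "MIN", "CONTINUOUS", "AVG_WEIGHTED"}


def rule_item(metric_key, operator, value, aggregate="AVG", n=1):
    """Build one entry of AlertRuleItems after checking the enumerations."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown Operator: {operator}")
    if aggregate not in AGGREGATES:
        raise ValueError(f"unknown Aggregate: {aggregate}")
    if n < 1:
        # Assumption: the window must cover at least one minute.
        raise ValueError("N must be at least 1")
    return {
        "Operator": operator,
        "MetricKey": metric_key,
        "Value": value,
        "Aggregate": aggregate,
        "N": n,
    }


def rule_content(items, condition="OR"):
    """Combine rule items into an AlertRuleContent object."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown Condition: {condition}")
    return {"Condition": condition, "AlertRuleItems": list(items)}


content = rule_content(
    [rule_item("appstat.jvm.threadcount", "CURRENT_LTE", 1000)]
)
# The API expects a string, so serialize before sending.
print(json.dumps(content))
```

Checking the enumerations locally catches misspelled operators or aggregation methods before the request is sent.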
Correspondence between MetricsType and AlertRuleContent.AlertRuleItems.MetricKey for Application Monitoring
MetricsType | Metric type | AlertRuleContent.AlertRuleItems.MetricKey | Description |
---|---|---|---|
jvm | JVM monitoring | appstat.jvm.gc.oldgccountinstant | the number of full garbage collections (Full GCs) in the JVM |
jvm | JVM monitoring | appstat.jvm.gc.oldgctimeinstant | the amount of time that is consumed by Full GCs in the JVM |
jvm | JVM monitoring | appstat.jvm.gc.younggccountinstant | the number of GCs in the young generation |
jvm | JVM monitoring | appstat.jvm.gc.younggctimeinstant | the amount of time that is consumed by GCs in the young generation |
jvm | JVM monitoring | appstat.jvm.heap_total | the total memory space in the JVM heap |
jvm | JVM monitoring | appstat.jvm.heap_used | the used space of the JVM heap memory |
jvm | JVM monitoring | appstat.jvm.non_heap_committed | the committed space of the non-heap JVM memory |
jvm | JVM monitoring | appstat.jvm.non_heap_init | the initial space of the non-heap JVM memory |
jvm | JVM monitoring | appstat.jvm.non_heap_max | the maximum space of the non-heap JVM memory |
jvm | JVM monitoring | appstat.jvm.non_heap_used | the used space of the non-heap JVM memory |
jvm | JVM monitoring | appstat.jvm.threadblockedcount | the number of blocked JVM threads |
jvm | JVM monitoring | appstat.jvm.threadcount | the total number of JVM threads |
jvm | JVM monitoring | appstat.jvm.threaddeadlockcount | the number of deadlocked JVM threads |
jvm | JVM monitoring | appstat.jvm.threadnewcount | the number of new JVM threads |
jvm | JVM monitoring | appstat.jvm.threadrunnablecount | the number of runnable JVM threads |
jvm | JVM monitoring | appstat.jvm.threadterminatedcount | the number of terminated JVM threads |
jvm | JVM monitoring | appstat.jvm.threadtimedwaitcount | the number of JVM threads in the timed waiting state |
jvm | JVM monitoring | appstat.jvm.threadwaitcount | the number of waiting JVM threads |
saehost | SAE host monitoring | appstat.infra.sae.systemcpu | the CPU utilization |
saehost | SAE host monitoring | appstat.infra.sae.systemdiskiopsread | the read IOPS of the disk |
saehost | SAE host monitoring | appstat.infra.sae.systemdiskiopswrite | the write IOPS of the disk |
saehost | SAE host monitoring | appstat.infra.sae.systemdiskrate | the disk usage |
saehost | SAE host monitoring | appstat.infra.sae.systemdiskread | the read I/O throughput of the disk |
saehost | SAE host monitoring | appstat.infra.sae.systemdisktotal | the total number of disks |
saehost | SAE host monitoring | appstat.infra.sae.systemdiskused | the number of disks in use |
saehost | SAE host monitoring | appstat.infra.sae.systemdiskwrite | the write I/O throughput of the disk |
saehost | SAE host monitoring | appstat.infra.sae.systemload | the system load |
saehost | SAE host monitoring | appstat.infra.sae.systemmemrate | the memory usage |
saehost | SAE host monitoring | appstat.infra.sae.systemmemtotal | the total memory |
saehost | SAE host monitoring | appstat.infra.sae.systemmemused | the used memory |
saehost | SAE host monitoring | appstat.infra.sae.systemnetrecv | the received bytes |
saehost | SAE host monitoring | appstat.infra.sae.systemnetrecvdrop | the packet loss of received data |
saehost | SAE host monitoring | appstat.infra.sae.systemnetrecverror | the received error packets |
saehost | SAE host monitoring | appstat.infra.sae.systemnetrecvpacket | the received packets |
saehost | SAE host monitoring | appstat.infra.sae.systemnettran | the sent bytes |
saehost | SAE host monitoring | appstat.infra.sae.systemnettrandrop | the packet loss of sent data |
saehost | SAE host monitoring | appstat.infra.sae.systemnettranerror | the sent error packets |
saehost | SAE host monitoring | appstat.infra.sae.systemnettranpacket | the sent packets |
txn_db | SQL metrics | appstat.sql.count | the number of database calls |
txn_db | SQL metrics | appstat.sql.error | the number of database call errors |
txn_db | SQL metrics | appstat.sql.rt | the response time of database calls |
db | Database metrics | appstat.database.count | the number of database calls |
db | Database metrics | appstat.database.errcount | the number of database call errors |
db | Database metrics | appstat.database.rt | the response time of database calls |
threadpool | Thread pool monitoring | appstat.threadpool.threadcorepoolsize | the number of core threads |
threadpool | Thread pool monitoring | appstat.threadpool.threadmaxpoolsize | the maximum number of threads |
threadpool | Thread pool monitoring | appstat.threadpool.threadpoolactivecount | the number of active threads |
threadpool | Thread pool monitoring | appstat.threadpool.threadpoolqueuesize | the queue size |
threadpool | Thread pool monitoring | appstat.threadpool.threadpoolsize | the current number of threads |
threadpool | Thread pool monitoring | appstat.threadpool.threadpooltaskcount | the number of executed tasks |
threadpool | Thread pool monitoring | appstat.threadpool.threadpoolusedpercent | the thread pool usage |
exception | Abnormal API calls | appstat.exception.count | the number of abnormal API calls for the application |
exception | Abnormal API calls | appstat.exception.rt | the response time of abnormal API calls for the application |
txn_type | Application-dependent services | appstat.outcall.count | the number of application-dependent service calls |
txn_type | Application-dependent services | appstat.outcall.errorrate | the error rate of application-dependent service calls |
txn_type | Application-dependent services | appstat.outcall.rt | the response time of application-dependent service calls |
txn | Application-provided services | appstat.transaction.count | the number of API calls |
txn | Application-provided services | appstat.transaction.error | the number of API call errors |
txn | Application-provided services | appstat.transaction.errorrate | the error rate of API calls |
txn | Application-provided services | appstat.transaction.rt | the response time of API calls |
host | Host monitoring | appstat.jvm.systemcpuusage | the CPU utilization of the host |
host | Host monitoring | appstat.jvm.systemcpuuser | the CPU occupancy rate of the host in user mode |
host | Host monitoring | appstat.jvm.systemdiskfree | the idle disk space of the host |
host | Host monitoring | appstat.jvm.systemdiskusage | the disk usage of the host |
host | Host monitoring | appstat.jvm.systemload | the system load of the host |
host | Host monitoring | appstat.jvm.systemmemfree | the idle memory space of the host |
host | Host monitoring | appstat.jvm.systemmemusage | the memory usage of the host |
host | Host monitoring | appstat.jvm.systemnetinerrs | the number of error packets that are received by the host |
host | Host monitoring | appstat.jvm.systemnetouterrs | the number of error packets that are sent by the host |
scheduler | Scheduled tasks | appstat.scheduler.rt | uptime |
scheduler | Scheduled tasks | appstat.scheduler.count | the number of times that the task runs |
scheduler | Scheduled tasks | appstat.scheduler.error | the number of running errors |
scheduler | Scheduled tasks | appstat.scheduler.delay | the time delay of scheduling |
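In the Application Monitoring table above, the metric keys of each MetricsType share a common prefix, which allows a quick local consistency check. The following sketch infers the MetricsType from a MetricKey by longest-prefix match. The prefix map is derived from the table and is a heuristic, not an official rule, and the function name `infer_metrics_type` is hypothetical.

```python
# Prefixes observed in the table above. Host metrics also start with
# "appstat.jvm.", so the longest matching prefix decides.
PREFIX_TO_TYPE = {
    "appstat.jvm.system": "host",
    "appstat.jvm.": "jvm",
    "appstat.infra.sae.": "saehost",
    "appstat.sql.": "txn_db",
    "appstat.database.": "db",
    "appstat.threadpool.": "threadpool",
    "appstat.exception.": "exception",
    "appstat.outcall.": "txn_type",
    "appstat.transaction.": "txn",
    "appstat.scheduler.": "scheduler",
}


def infer_metrics_type(metric_key):
    """Return the MetricsType whose prefix matches, or None if none does."""
    matches = [p for p in PREFIX_TO_TYPE if metric_key.startswith(p)]
    if not matches:
        return None
    return PREFIX_TO_TYPE[max(matches, key=len)]


def check_pair(metrics_type, metric_key):
    """Raise ValueError when MetricKey does not belong to MetricsType."""
    inferred = infer_metrics_type(metric_key)
    if inferred != metrics_type:
        raise ValueError(
            f"{metric_key} looks like {inferred!r}, not {metrics_type!r}"
        )


check_pair("jvm", "appstat.jvm.threadcount")
check_pair("host", "appstat.jvm.systemload")
print(infer_metrics_type("appstat.sql.rt"))
```

A check like this only guards against mixing up metric families. It does not confirm that a specific key exists.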
Correspondence between MetricsType and AlertRuleContent.AlertRuleItems.MetricKey for Browser Monitoring
MetricsType | Metric type | AlertRuleContent.AlertRuleItems.MetricKey | Description |
---|---|---|---|
api | API metrics | webstat.api.detail.count | the number of API requests |
api | API metrics | webstat.api.detail.fail_time | the amount of time consumed by failed API requests |
api | API metrics | webstat.api.detail.fail_uv | the number of users affected by failed API requests |
api | API metrics | webstat.api.detail.success_rate | the success rate of API requests |
api | API metrics | webstat.api.detail.success_time | the amount of time consumed by successful API requests |
page.api | Page API metrics | webstat.api.detail.page_api.count | the number of API requests |
page.api | Page API metrics | webstat.api.detail.page_api.fail_time | the amount of time consumed by failed API requests |
page.api | Page API metrics | webstat.api.detail.page_api.success_rate | the success rate of API requests |
page.api | Page API metrics | webstat.api.detail.page_api.success_time | the amount of time consumed by successful API requests |
page | Page metrics | webstat.api.detail.page_api.fail_uv | the number of users affected by failed API requests |
page | Page metrics | webstat.index.pv | the number of page views (PVs) |
page | Page metrics | webstat.jserror.count | the number of JS errors |
page | Page metrics | webstat.jserror.rate | the JS error rate |
page | Page metrics | webstat.msg.top.error_uv | the number of users affected by JS errors |
page | Page metrics | webstat.resource.sum | the number of resource errors |
page | Page metrics | webstat.satisfy.satisfy | the page satisfaction |
page | Page metrics | webstat.speed.avg_cfpt | the custom first paint time (FPT) of the page |
page | Page metrics | webstat.speed.avg_ctti | the custom time to interact (TTI) of the page |
page | Page metrics | webstat.speed.avg_dns | the DNS query time of the page |
page | Page metrics | webstat.speed.avg_dom | the DOM parsing time of the page |
page | Page metrics | webstat.speed.avg_fmp | the first meaningful paint (FMP) of the page |
page | Page metrics | webstat.speed.avg_fpt | the FPT of the page |
page | Page metrics | webstat.speed.avg_load | the amount of time consumed to completely load the page |
page | Page metrics | webstat.speed.avg_ready | the ready time |
page | Page metrics | webstat.speed.avg_res | the amount of time consumed to load resources |
page | Page metrics | webstat.speed.avg_ssl | the amount of time consumed to establish a Secure Sockets Layer (SSL) connection |
page | Page metrics | webstat.speed.avg_t1 | the custom t1 time of the page |
page | Page metrics | webstat.speed.avg_t10 | the custom t10 time of the page |
page | Page metrics | webstat.speed.avg_t2 | the custom t2 time of the page |
page | Page metrics | webstat.speed.avg_t3 | the custom t3 time of the page |
page | Page metrics | webstat.speed.avg_t4 | the custom t4 time of the page |
page | Page metrics | webstat.speed.avg_t5 | the custom t5 time of the page |
page | Page metrics | webstat.speed.avg_t6 | the custom t6 time of the page |
page | Page metrics | webstat.speed.avg_t7 | the custom t7 time of the page |
page | Page metrics | webstat.speed.avg_t8 | the custom t8 time of the page |
page | Page metrics | webstat.speed.avg_t9 | the custom t9 time of the page |
page | Page metrics | webstat.speed.avg_tcp | the amount of time consumed to establish a Transmission Control Protocol (TCP) connection |
page | Page metrics | webstat.speed.avg_trans | the amount of time consumed to transfer the content of the page |
page | Page metrics | webstat.speed.avg_ttfb | the response time of network requests |
page | Page metrics | webstat.speed.avg_tti | the TTI of the page |
custom | Custom metrics | webstat.avg.avg_val | the average value of reported data |
custom | Custom metrics | webstat.sum.sum_val | the sum of reported data |
Correspondence between AlertGroup and MetricsKey for Prometheus Service
AlertGroup | Alert contact group | MetricsKey | Description |
---|---|---|---|
1 | Kubernetes load | prom.workload.container_cpu_usage | the CPU utilization of the container |
1 | Kubernetes load | prom.workload.job_execute_error | job failure |
1 | Kubernetes load | prom.workload.pod_cpu_usage | the CPU utilization of the pod |
1 | Kubernetes load | prom.workload.pod_start_timeout | startup timeout failure of the pod |
1 | Kubernetes load | prom.workload.pod_restart_frequent | frequent restart of the pod |
1 | Kubernetes load | prom.workload.pod_status_error | abnormal pod status |
1 | Kubernetes load | prom.workload.container_memory_usage | the memory usage of the container |
1 | Kubernetes load | prom.workload.deployment_pod_survival | the availability rate of the Deployment pod |
15 | Kubernetes node | prom.node.node_memory_usage | the memory usage of the node |
15 | Kubernetes node | prom.node.node_cpu_usage | the CPU utilization of the node |
15 | Kubernetes node | prom.node.node_disk_usage | the disk usage of the node |
15 | Kubernetes node | prom.node.node_status_error | abnormal node status |
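For Prometheus rules that use the STATIC check type, MetricsKey is required and must match the chosen AlertGroup. The following sketch assembles only the Prometheus-specific fields and validates the pair against the table above. The function name `prometheus_static_fields` is hypothetical. The default values for Duration and Level are taken from the examples in the request table and are not documented defaults.

```python
# AlertGroup -> valid MetricsKey values, copied from the table above.
GROUP_METRICS = {
    1: {
        "prom.workload.container_cpu_usage",
        "prom.workload.job_execute_error",
        "prom.workload.pod_cpu_usage",
        "prom.workload.pod_start_timeout",
        "prom.workload.pod_restart_frequent",
        "prom.workload.pod_status_error",
        "prom.workload.container_memory_usage",
        "prom.workload.deployment_pod_survival",
    },
    15: {
        "prom.node.node_memory_usage",
        "prom.node.node_cpu_usage",
        "prom.node.node_disk_usage",
        "prom.node.node_status_error",
    },
}


def prometheus_static_fields(cluster_id, alert_group, metrics_key,
                             duration=1, level="P2"):
    """Return the Prometheus-specific parameters for a STATIC alert rule."""
    allowed = GROUP_METRICS.get(alert_group)
    if allowed is None:
        raise ValueError(f"no MetricsKey table for AlertGroup {alert_group}")
    if metrics_key not in allowed:
        raise ValueError(
            f"{metrics_key} is not listed for AlertGroup {alert_group}"
        )
    return {
        "AlertCheckType": "STATIC",
        "ClusterId": cluster_id,
        "AlertGroup": alert_group,
        "MetricsKey": metrics_key,
        "Duration": duration,  # minutes
        "Level": level,
    }


fields = prometheus_static_fields(
    "ceba9b9ea5b924dd0b6726d2de6******", 15, "prom.node.node_cpu_usage"
)
print(fields)
```

These fields still have to be merged with the required AlertName, RegionId, and AlertType parameters before the request is complete.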
Response parameters
Examples
Sample success responses
JSON format
{
"RequestId": "337B8F7E-0A64-5768-9225-E9B3CF******",
"AlertRule": {
"AlertId": 5510445,
"AlertName": "arms-test",
"UserId": "1131971649******",
"RegionId": "cn-hangzhou",
"AlertType": "APPLICATION_MONITORING_ALERT_RULE",
"AlertStatus": "RUNNING",
"CreatedTime": 1641438611000,
"UpdatedTime": 1641438611000,
"Extend": "{\"alarmContext\":\"{\\\"content\\\":\\\"Alert name: $Alert name\\\\nFilter condition: $Filter condition\\\\nAlert time: $Alert time\\\\nAlert content: $Alert content\\\\nNote: The alert persists before you receive an email that reminds you to clear the alert. You will be reminded of the alert again 24 hours later. \\\",\\\"subTitle\\\":\\\"\\\"}\",\"alertWays\":\"[0,1]\",\"contactGroupIds\":\"381,5075\",\"notice\":\"{\\\"endTime\\\":1480607940000,\\\"noticeEndTime\\\":1480607940000,\\\"noticeStartTime\\\":1480521600000,\\\"startTime\\\":1480521600000}\"}",
"NotifyStrategy": "ALERT_MANAGER",
"Pids": [
"b590lhguqs@40d8deedfa9******"
],
"AutoAddNewApplication": false,
"MetricsType": "JVM",
"AlertRuleContent": {
"Condition": "\"|\"",
"AlertRuleItems": [
{
"N": 1,
"MetricKey": "appstat.jvm.non_heap_used",
"Aggregate": "AVG",
"Operator": "CURRENT_GTE",
"Value": "1"
}
]
},
"Filters": {
"DimFilters": [
{
"FilterKey": "rootIp",
"FilterOpt": "ALL",
"FilterValues": [
"[]"
]
}
],
"CustomSLSFilters": [
{
"Key": "username",
"Opt": "=",
"Value": "test",
"T": "null",
"Show": false
}
],
"CustomSLSGroupByDimensions": [
"[\"page\"]"
],
"CustomSLSWheres": [
"[\"t like '%api%'\"]"
]
},
"AlertCheckType": "STATIC",
"ClusterId": "ceba9b9ea5b924dd0b6726d2de6******",
"AlertGroup": -1,
"PromQL": "node_memory_MemAvailable_bytes{} / node_memory_MemTotal_bytes{} * 100",
"Duration": "1",
"Level": "P2",
"Message": "Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} Memory usage exceeds 80%. Current value: {{ printf \"%.2f\" $value }}%",
"Labels": [
{
"Name": "123",
"Value": "abc"
}
],
"Annotations": [
{
"Name": "123",
"Value": "abc"
}
],
"Tags": [
{
"Key": "owner",
"Value": "John"
}
],
"NotifyMode": "NORMAL_MODE"
}
}
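In the sample response, the Extend field is a JSON document serialized as a string, so it needs a second decoding step. The following sketch parses a response and extracts a few fields. The function name `summarize` is hypothetical, and the shortened sample payload is built in code for illustration rather than copied from a real call.

```python
import json


def summarize(response_text):
    """Extract the rule ID, name, status, and contact groups from a response."""
    rule = json.loads(response_text)["AlertRule"]
    extend_raw = rule.get("Extend")
    # Extend is a JSON document stored as a string, so decode it separately.
    extend = json.loads(extend_raw) if extend_raw else {}
    groups = extend.get("contactGroupIds", "")
    return {
        "AlertId": rule["AlertId"],
        "AlertName": rule["AlertName"],
        "AlertStatus": rule.get("AlertStatus"),
        "ContactGroupIds": groups.split(",") if groups else [],
    }


# Shortened sample that mirrors the structure of the response above.
sample = json.dumps({
    "RequestId": "337B8F7E-0A64-5768-9225-E9B3CF******",
    "AlertRule": {
        "AlertId": 5510445,
        "AlertName": "arms-test",
        "AlertStatus": "RUNNING",
        "Extend": json.dumps(
            {"alertWays": "[0,1]", "contactGroupIds": "381,5075"}
        ),
    },
})

summary = summarize(sample)
print(summary)
```

Storing the returned AlertId is useful if you later want to reference the same rule in a follow-up request.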
Error codes
For a list of error codes, see Service error codes.
Change history
Change time | Summary of changes | Operation |
---|---|---|
2024-05-07 | The request parameters of the API have changed | see changesets |
2024-04-29 | The request parameters of the API have changed | see changesets |
2023-12-18 | The internal configuration of the API has changed, but the call is not affected | see changesets |
2023-10-20 | The internal configuration of the API has changed, but the call is not affected | see changesets |
2023-10-17 | The response structure of the API has changed | see changesets |
2023-10-16 | The request parameters of the API have changed | see changesets |
2023-10-11 | The internal configuration of the API has changed, but the call is not affected | see changesets |
2023-09-14 | The request parameters of the API have changed | see changesets |
2023-08-24 | The request parameters of the API have changed | see changesets |
2023-05-11 | The request parameters of the API have changed. The response structure of the API has changed | see changesets |