ApsaraDB for OceanBase provides an alerting feature for OceanBase clusters (OBClusters), OceanBase Migration Assessment (OMA), OceanBase Migration Service (OMS), and OceanBase Developer Center (ODC). The built-in alerts cover basic alerting requirements. This topic describes the built-in alerts.
Alert information
The following table lists the components of each alert.
Component | Description |
---|---|
Description | Describes the meaning of each alert and its trigger conditions. |
Alert rule | Describes the trigger rule of each alert, including the alert item, metric, default threshold, duration, and detection cycle. Trigger rule: The system checks the metric once in each detection cycle. An alert is triggered when the metric value exceeds the default threshold for the number of cycles specified by the duration. |
Impact on the system | Describes the impact that may be caused on the system when the alert is triggered. |
Possible causes | Describes the possible causes of an alert to help you locate and handle the alert. |
Solutions | Describes how to resolve the issues that caused the alert. |
For more information about how to add an alert rule, see Add an alert rule.
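The trigger rule described in the table above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the `AlertRule` class, its field names, and the `should_trigger` function are hypothetical, and the sketch assumes that the threshold must be exceeded in consecutive detection cycles.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule shape; field names are illustrative, not the product's schema."""
    metric: str
    threshold: float
    duration_cycles: int  # number of detection cycles the threshold must be exceeded


def should_trigger(rule: AlertRule, samples: list[float]) -> bool:
    """Return True if the most recent `duration_cycles` samples all exceed the threshold.

    `samples` holds one metric value per detection cycle, oldest first.
    Assumption: the cycles must be consecutive.
    """
    if rule.duration_cycles <= 0 or len(samples) < rule.duration_cycles:
        return False
    recent = samples[-rule.duration_cycles:]
    return all(value > rule.threshold for value in recent)


# Example values are made up for illustration.
rule = AlertRule(metric="machine_mem_used_percent", threshold=90.0, duration_cycles=3)
print(should_trigger(rule, [85.0, 91.0, 92.5, 93.0]))  # last three samples exceed 90
print(should_trigger(rule, [91.0, 92.5, 89.0, 93.0]))  # the run is interrupted by 89.0
```

With a detection cycle of one minute (an assumed value), a duration of three cycles corresponds to a condition that has lasted for three minutes.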
Concepts
Alert target
An alert target is the object that an alert task monitors. It uniquely identifies an alert and can be an OceanBase cluster, a server, or a service.
The alert information is shown in the format of "alert rule name (instance: faulty instance name)", for example: disk_log_usage_instance (instance: integration_22-ob2).
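If you process alert text in your own tooling, the format above can be produced and parsed as in the following minimal sketch. The `AlertInfo` type and both helper functions are hypothetical names introduced for illustration, and the regular expression assumes that the instance name contains no closing parenthesis.

```python
import re
from typing import NamedTuple, Optional


class AlertInfo(NamedTuple):
    rule_name: str
    instance: str


# Matches "alert rule name (instance: faulty instance name)".
# Assumption: the instance name does not contain ")".
_ALERT_INFO = re.compile(
    r"^\s*(?P<rule>.+?)\s*\(instance:\s*(?P<instance>[^)]+?)\s*\)\s*$"
)


def format_alert_info(rule_name: str, instance: str) -> str:
    """Build the alert information string from its two parts."""
    return f"{rule_name} (instance: {instance})"


def parse_alert_info(text: str) -> Optional[AlertInfo]:
    """Split an alert information string into rule name and instance, or return None."""
    match = _ALERT_INFO.match(text)
    if match is None:
        return None
    return AlertInfo(match.group("rule"), match.group("instance"))


info = parse_alert_info("disk_log_usage_instance (instance: integration_22-ob2)")
print(info)
```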
Alert scope
The alert scope is the range of resources to which an alert applies. It is consistent with the metric scope.
The alert scope can be an OBCluster, OMA, OMS, or ODC.
Description
ApsaraDB for OceanBase allows you to set alert rules based on tenant statistics and node statistics. For more information, see View tenant statistics and Node statistics. The following tables describe metrics of different resource scopes. You can set the metrics as needed on the Performance Monitoring page. We recommend that you set the metrics based on our best practices.
Metrics for tenant alert items
Alert item | Metric | Metric name |
---|---|---|
Memory usage | memory_usage | Tenant/Tenant memory usage |
CPU utilization | cpu_usage_percent | Tenant/CPU utilization |
Disk usage | disk_ob_data_size | Cluster/Maximum disk usage. Note: Disk usage is not tracked separately for each tenant. Configure alerts based on cluster-level disk usage instead of tenant-level disk usage. |
Total connections | total_sessions | This metric does not support alerting. |
Read/Write connections | readwrite_sessions | This metric does not support alerting. |
Read-only connections | readonly_sessions | This metric does not support alerting. |
Write requests | tps | Tenant/Write requests |
Read requests | QPS | Tenant/Read requests |
Response time of write requests | tps_rt | Tenant/Response time of write requests |
Response time of read requests | qps_rt | Tenant/Response time of read requests |
Wait queue | request_queue_rt | Tenant/Wait queue |
Transactions committed | trans_user_trans_count | Tenant/Transactions committed |
Transaction response time | trans_commit_rt | Tenant/Transaction response time |
Metrics for OBServer (node) alert items
Alert item | Metric | Metric name |
---|---|---|
CPU utilization | cpu_util | Node/CPU utilization |
Load | load_load1 | Node/Load |
OBServer memory usage | machine_mem_used_percent | Node/Memory usage |
Disk reads | io_read_bytes | Node/Disk reads |
Disk writes | io_write_bytes | Node/Disk writes |
Disk I/O wait duration | io_await | Node/Disk I/O wait duration |
Packet inflow rate | traffic_bytin | Node/Packet inflow rate |
Packet outflow rate | traffic_bytout | Node/Packet outflow rate |
Retransmission rate | tcp_retran | Node/Retransmission rate |
Total connections | total_sessions | This metric does not support alerting. |
Read/Write connections | readwrite_sessions | This metric does not support alerting. |
Read-only connections | readonly_sessions | This metric does not support alerting. |
Alert level
Each alert item has an alert level.
Level | Meaning | Alert method | Description |
---|---|---|---|
1 | Critical | Phone call + SMS message + Email + DingTalk Chatbot | System availability is degrading. The system is still available but is about to become unavailable, so you must take measures to prevent it from becoming completely unavailable. For example, the server memory usage has exceeded the 90% threshold for three minutes. |
2 | Warning | SMS message + Email + DingTalk Chatbot | The trend indicates that important performance metrics of the system are declining. You can troubleshoot to locate potential problems before alerts are triggered. This alert level is reserved, and no alert currently matches it. |
3 | Reminder | Email + DingTalk Chatbot | Technically, a reminder is not an alert. It usually indicates that an administrator has performed an important operation, such as deleting a cluster. No notification is generated after alerts at this level are cleared. |
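The mapping between alert levels and alert methods in the table above can be captured as a small lookup, for example when you route notifications in your own scripts. The following is a sketch only: the `AlertLevel` enum and the `methods_for` function are hypothetical names, not part of any ApsaraDB for OceanBase API.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    """Alert levels as numbered in the table above."""
    CRITICAL = 1
    WARNING = 2
    REMINDER = 3


# Alert methods per level, copied from the table above.
ALERT_METHODS = {
    AlertLevel.CRITICAL: ("Phone call", "SMS message", "Email", "DingTalk Chatbot"),
    AlertLevel.WARNING: ("SMS message", "Email", "DingTalk Chatbot"),
    AlertLevel.REMINDER: ("Email", "DingTalk Chatbot"),
}


def methods_for(level: int) -> tuple[str, ...]:
    """Return the alert methods for a level; raises ValueError for an unknown level."""
    return ALERT_METHODS[AlertLevel(level)]


print(methods_for(1))
print(methods_for(3))
```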