ApsaraDB for OceanBase: Alert Overview

Last Updated: Jan 03, 2023

ApsaraDB for OceanBase provides you with an alerting feature that can be used for OceanBase clusters (OBClusters), OceanBase Migration Assessment (OMA), OceanBase Migration Service (OMS), and OceanBase Developer Center (ODC). You can use the built-in alerts to meet your basic alerting requirements. This topic helps you understand the built-in alerts.

Alert information

The following table lists the components of each alert.

Component | Description
Description | Describes the meaning of each alert and its trigger conditions.
Alert rule | Describes the trigger rule of each alert, including the alert item, metric, default threshold, duration, and detection cycle. Trigger rule: The system checks the metric once in each detection cycle. An alert is triggered when the metric value exceeds the default threshold for the number of cycles specified by the duration.
Impact on the system | Describes the impact that the alert may have on the system when it is triggered.
Possible causes | Describes the possible causes of an alert to help you locate and handle it.
Solutions | Shows you how to solve the issues that caused the alert.

Note

For more information about how to add an alert rule, see Add an alert rule.
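The trigger rule described in the table above can be sketched as follows. This is a minimal illustration, not the service's implementation: the `AlertRule` class and the `should_trigger` function are hypothetical names, and the sketch assumes that the breaching cycles must be consecutive and that the duration is expressed as a number of detection cycles.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlertRule:
    """Hypothetical container for the rule fields listed in the table."""
    alert_item: str
    metric: str
    threshold: float       # default threshold
    duration_cycles: int   # number of detection cycles the breach must last


def should_trigger(rule: AlertRule, samples: Sequence[float]) -> bool:
    """Return True if the latest `duration_cycles` samples all exceed the threshold.

    `samples` holds one metric value per detection cycle, oldest first.
    Assumption: the breaching cycles must be consecutive.
    """
    if rule.duration_cycles <= 0 or len(samples) < rule.duration_cycles:
        return False
    recent = samples[-rule.duration_cycles:]
    return all(value > rule.threshold for value in recent)


# Example values only: a 90% threshold held for three cycles.
rule = AlertRule("OBServer memory usage", "machine_mem_used_percent", 90.0, 3)
print(should_trigger(rule, [85, 91, 92, 95]))  # last three samples all exceed 90
print(should_trigger(rule, [91, 92, 89]))      # latest sample is below the threshold
```

With a one-minute detection cycle (an assumed value), three cycles would correspond to a condition that has lasted for three minutes.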

Concepts

Alert target

An alert target is the object that an alert task monitors, and it uniquely identifies an alert. It can be an OceanBase cluster, a server, or a service.

The alert information is shown in the format of "alert rule name (instance: faulty instance name)", for example: disk_log_usage_instance (instance: integration_22-ob2).
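If you process alert text in your own tooling, the format above can be built and parsed with a small helper. The following is a sketch only: `format_alert` and `parse_alert` are hypothetical names, and the regular expression assumes the exact layout shown in the example, with a single space before the parenthesis.

```python
import re
from typing import Tuple

# Matches "alert rule name (instance: faulty instance name)".
_ALERT_PATTERN = re.compile(r"^(?P<rule>.+?) \(instance: (?P<instance>.+)\)$")


def format_alert(rule_name: str, instance_name: str) -> str:
    """Build the alert information string from its two parts."""
    return f"{rule_name} (instance: {instance_name})"


def parse_alert(text: str) -> Tuple[str, str]:
    """Split alert information into (rule name, instance name)."""
    match = _ALERT_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized alert format: {text!r}")
    return match.group("rule"), match.group("instance")


rule_name, instance = parse_alert("disk_log_usage_instance (instance: integration_22-ob2)")
print(rule_name)  # disk_log_usage_instance
print(instance)   # integration_22-ob2
```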

Alert scope

The alert scope defines the range of objects that an alert covers and is consistent with the metric scope.

The alert scope can be an OBCluster, OMA, OMS, or ODC.

Description

ApsaraDB for OceanBase allows you to set alert rules based on tenant statistics and node statistics. For more information, see View tenant statistics and Node statistics. The following tables describe metrics of different resource scopes. You can set the metrics as needed on the Performance Monitoring page. We recommend that you set the metrics based on our best practices.

Metrics for tenant alert items

Alert item | Metric | Metric name
Memory usage | memory_usage | Tenant/Tenant memory usage
CPU utilization | cpu_usage_percent | Tenant/CPU utilization
Disk usage | disk_ob_data_size | Cluster/Maximum disk usage. Note: Disk usage is not tracked separately for each tenant. Configure alerts based on cluster-level disk usage rather than tenant-level disk usage.
Total connections | total_sessions | This metric does not support alerting.
Read/Write connections | readwrite_sessions | This metric does not support alerting.
Read-only connections | readonly_sessions | This metric does not support alerting.
Write requests | tps | Tenant/Write requests
Read requests | qps | Tenant/Read requests
Response time of write requests | tps_rt | Tenant/Response time of write requests
Response time of read requests | qps_rt | Tenant/Response time of read requests
Wait queue | request_queue_rt | Tenant/Wait queue
Transactions committed | trans_user_trans_count | Tenant/Transactions committed
Transaction response time | trans_commit_rt | Tenant/Transaction response time

Metrics for OBServer (node) alert items

Alert item | Metric | Metric name
CPU utilization | cpu_util | Node/CPU utilization
Load | load_load1 | Node/Load
OBServer memory usage | machine_mem_used_percent | Node/Memory usage
Disk reads | io_read_bytes | Node/Disk reads
Disk writes | io_write_bytes | Node/Disk writes
Disk I/O wait duration | io_await | Node/Disk I/O wait duration
Packet inflow rate | traffic_bytin | Node/Packet inflow rate
Packet outflow rate | traffic_bytout | Node/Packet outflow rate
Retransmission rate | tcp_retran | Node/Retransmission rate
Total connections | total_sessions | This metric does not support alerting.
Read/Write connections | readwrite_sessions | This metric does not support alerting.
Read-only connections | readonly_sessions | This metric does not support alerting.
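When you script against these metrics, it can help to keep the two tables above as a lookup so that you do not try to create a rule on a metric that does not support alerting. The following is a sketch under assumptions: the scope keys `tenant` and `node`, the `CATALOG` structure, and both function names are hypothetical and are not part of any ApsaraDB for OceanBase API. Metric identifiers are matched case-insensitively.

```python
from typing import Dict, Optional

# Metric identifier -> value of the "Metric name" column.
# None marks metrics that the tables list as not supporting alerting.
TENANT_METRICS: Dict[str, Optional[str]] = {
    "memory_usage": "Tenant/Tenant memory usage",
    "cpu_usage_percent": "Tenant/CPU utilization",
    "disk_ob_data_size": "Cluster/Maximum disk usage",  # cluster-level, see the note
    "total_sessions": None,
    "readwrite_sessions": None,
    "readonly_sessions": None,
    "tps": "Tenant/Write requests",
    "qps": "Tenant/Read requests",
    "tps_rt": "Tenant/Response time of write requests",
    "qps_rt": "Tenant/Response time of read requests",
    "request_queue_rt": "Tenant/Wait queue",
    "trans_user_trans_count": "Tenant/Transactions committed",
    "trans_commit_rt": "Tenant/Transaction response time",
}

NODE_METRICS: Dict[str, Optional[str]] = {
    "cpu_util": "Node/CPU utilization",
    "load_load1": "Node/Load",
    "machine_mem_used_percent": "Node/Memory usage",
    "io_read_bytes": "Node/Disk reads",
    "io_write_bytes": "Node/Disk writes",
    "io_await": "Node/Disk I/O wait duration",
    "traffic_bytin": "Node/Packet inflow rate",
    "traffic_bytout": "Node/Packet outflow rate",
    "tcp_retran": "Node/Retransmission rate",
    "total_sessions": None,
    "readwrite_sessions": None,
    "readonly_sessions": None,
}

# Hypothetical scope keys chosen for this sketch.
CATALOG = {"tenant": TENANT_METRICS, "node": NODE_METRICS}


def metric_name(scope: str, metric: str) -> Optional[str]:
    """Return the metric name, or None if the metric does not support alerting.

    Raises KeyError for an unknown scope or metric identifier.
    """
    return CATALOG[scope.lower()][metric.lower()]


def supports_alerting(scope: str, metric: str) -> bool:
    """Return True if the tables list a metric name for this metric."""
    return metric_name(scope, metric) is not None
```

For example, `supports_alerting("node", "io_await")` is true, while `supports_alerting("tenant", "total_sessions")` is false.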

Alert level

Each alert item has an alert level.

Level | Meaning | Alert method | Description
1 | Critical | Phone call + SMS message + Email + DingTalk Chatbot | System availability is decreasing. The system is still available but is about to become unavailable, so you must take measures to prevent it from becoming completely unavailable. For example, the server memory usage exceeds the threshold of 90% and this condition has lasted for three minutes.
2 | Warning | SMS message + Email + DingTalk Chatbot | The trend shows that important performance metrics of the system are declining. You can troubleshoot to locate potential problems and keep alerts from being triggered. This alert level is reserved, and no alert currently matches it.
3 | Reminder | Email + DingTalk Chatbot | Technically, a reminder is not an alert. It usually indicates that an administrator has performed an important operation, for example, deleting a cluster. After alerts at this level are cleared, no notification is generated.
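The mapping from alert level to notification channels in the table above can be expressed as a small lookup. This is an illustrative sketch: the `AlertLevel` enum, the `CHANNELS` mapping, and `channels_for` are hypothetical names, and the channel strings are copied from the Alert method column.

```python
from enum import IntEnum
from typing import Dict, Tuple


class AlertLevel(IntEnum):
    """Alert levels as numbered in the table."""
    CRITICAL = 1
    WARNING = 2
    REMINDER = 3


# Notification channels per level, taken from the "Alert method" column.
CHANNELS: Dict[AlertLevel, Tuple[str, ...]] = {
    AlertLevel.CRITICAL: ("Phone call", "SMS message", "Email", "DingTalk Chatbot"),
    AlertLevel.WARNING: ("SMS message", "Email", "DingTalk Chatbot"),
    AlertLevel.REMINDER: ("Email", "DingTalk Chatbot"),
}


def channels_for(level: int) -> Tuple[str, ...]:
    """Return the notification channels for a level number (1, 2, or 3).

    Raises ValueError if the number is not a defined alert level.
    """
    return CHANNELS[AlertLevel(level)]


print(channels_for(1))
print(channels_for(3))
```

Each lower-severity level uses a subset of the channels of the level above it, so only level 1 alerts result in a phone call.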