When key performance metrics, such as CPU utilization or memory usage, cross critical thresholds, or when instance events such as master-replica switchovers occur, CloudMonitor sends alerts to your designated contacts. Configure alert rules to catch issues before they affect your application.
Prerequisites
Before you begin, ensure that you have:
A Tair (Redis OSS-compatible) instance
An alert contact and an alert contact group in CloudMonitor. For details, see Create an alert contact or alert contact group.
Configure alert rules
Log on to the console and go to the Instances page.
In the top navigation bar, select the region where your instance resides.
Find the instance and click its instance ID.
In the left-side navigation pane, click Alert Settings.
On the Alert Settings page, review the current metrics for your instance.
Click Alert Settings in the upper-right corner to open the CloudMonitor console, then add or manage alert rules using one of the following methods:
Create an alert rule: Sends an alert when a metric value exceeds a defined threshold. For example, trigger an alert when CPU utilization exceeds 90%.
Subscribe to event notifications: Sends an alert when an instance event occurs, such as a master-replica switchover, an instance exception or failure, or a proactive O&M task such as instance migration (reported as the InstanceMaintenance event).
Note: When creating an alert rule, select the service type that matches your instance architecture. Selecting the wrong type means your alert rules won't apply to the correct metrics.
| Instance type | Service types |
|---|---|
| Tair DRAM-based instances and Redis Open-Source Edition instances | Redis/Tair DRAM(Standard), Redis/Tair DRAM(Cluster), Redis/Tair DRAM(Read/Write Splitting) |
| Tair persistent memory-optimized instances | Tair Persistent Memory(Standard), Tair Persistent Memory(Cluster), Tair Persistent Memory(Read/Write Splitting) |
| Tair ESSD/SSD-based instances | Tair ESSD/SSD(Standard), Tair ESSD/SSD Cluster |
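If you track many instances, it can help to encode the service-type mapping once so that you pick the same value every time. The following is a minimal sketch, not part of any CloudMonitor SDK: the dictionary keys (`dram`, `persistent_memory`, `essd_ssd`, `standard`, `cluster`, `read_write_splitting`) and the `service_type` helper are hypothetical names chosen for illustration, and only the service type strings come from the table above.

```python
# Service type names copied from the table above.
# The lookup keys are hypothetical labels, not CloudMonitor identifiers.
SERVICE_TYPES = {
    "dram": {
        "standard": "Redis/Tair DRAM(Standard)",
        "cluster": "Redis/Tair DRAM(Cluster)",
        "read_write_splitting": "Redis/Tair DRAM(Read/Write Splitting)",
    },
    "persistent_memory": {
        "standard": "Tair Persistent Memory(Standard)",
        "cluster": "Tair Persistent Memory(Cluster)",
        "read_write_splitting": "Tair Persistent Memory(Read/Write Splitting)",
    },
    "essd_ssd": {
        "standard": "Tair ESSD/SSD(Standard)",
        "cluster": "Tair ESSD/SSD Cluster",
    },
}


def service_type(family, architecture):
    """Return the service type to select for an instance family and architecture.

    Raises ValueError when the table lists no service type for the combination.
    """
    try:
        return SERVICE_TYPES[family][architecture]
    except KeyError:
        raise ValueError(
            f"no service type listed for {family!r} / {architecture!r}"
        ) from None
```

For example, `service_type("dram", "cluster")` returns `Redis/Tair DRAM(Cluster)`, while `service_type("essd_ssd", "read_write_splitting")` raises `ValueError` because the table lists no such service type.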
Recommended alert thresholds
The following metrics are most sensitive to performance degradation. Configure alert rules for these metrics as a baseline.
| Metric | Recommended threshold | Why it matters |
|---|---|---|
| CPU utilization | Greater than 60% | Sustained high CPU usage means the Redis server is struggling to process requests, which leads to increased latency and potential timeouts. |
| Memory usage | Greater than 80% | When memory is nearly full, Redis may start evicting keys or rejecting writes, depending on your eviction policy. |
| Inbound bandwidth usage | Greater than 80% | Inbound saturation delays client requests, especially large writes, from reaching the instance, causing request timeouts. |
| Outbound bandwidth usage | Greater than 80% | Bandwidth saturation delays data delivery to clients, causing request timeouts. |
For the full list of available metrics, see Appendix 1: Metrics.
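The baseline thresholds can also be checked outside the console, for example in a script that reviews exported metric samples. The following is a minimal sketch under stated assumptions: the metric keys (`cpu_utilization`, `memory_usage`, and so on) and the `breached_metrics` helper are hypothetical names, not CloudMonitor metric identifiers, and the sample values are invented for illustration.

```python
# Baseline thresholds from the table above, in percent.
# The dictionary keys are hypothetical labels, not CloudMonitor metric names.
RECOMMENDED_THRESHOLDS = {
    "cpu_utilization": 60.0,
    "memory_usage": 80.0,
    "inbound_bandwidth_usage": 80.0,
    "outbound_bandwidth_usage": 80.0,
}


def breached_metrics(sample, thresholds=RECOMMENDED_THRESHOLDS):
    """Return the sorted names of metrics whose value is strictly greater
    than the threshold.

    Metrics missing from the sample are skipped rather than treated as breaches.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    )


if __name__ == "__main__":
    # Invented readings, in percent.
    sample = {
        "cpu_utilization": 72.5,
        "memory_usage": 80.0,
        "outbound_bandwidth_usage": 91.0,
    }
    print(breached_metrics(sample))
    # prints ['cpu_utilization', 'outbound_bandwidth_usage']
```

Memory usage at exactly 80% is not reported because the recommended condition is "greater than", not "greater than or equal to".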
FAQ
What does the Blocked Clients metric mean?
The Node/Blocked Clients metric reports the number of client connections that are currently blocked while waiting for a blocking command to return. Blocking commands include BRPOP, BLPOP, BZPOPMIN, BZPOPMAX, and XREAD (when called with the BLOCK option). A sustained increase in this metric indicates that clients are piling up behind long-running blocking operations, which can degrade overall instance responsiveness.
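To cross-check the metric from a client, you can read the `blocked_clients` field that open-source Redis reports in the output of `INFO clients`. The following is a minimal sketch, assuming the console metric reflects that same counter (the source does not confirm this). The helper names `parse_blocked_clients` and `is_sustained_increase` are hypothetical, and the three-sample window is an arbitrary choice.

```python
def parse_blocked_clients(info_text):
    """Extract the blocked_clients counter from the text returned by INFO.

    Returns None if the field is absent.
    """
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Clients".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep and key == "blocked_clients":
            return int(value)
    return None


def is_sustained_increase(samples, min_points=3):
    """Return True if the last min_points samples are strictly increasing."""
    if len(samples) < min_points:
        return False
    recent = samples[-min_points:]
    return all(a < b for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    # Abbreviated, invented INFO output.
    info = "# Clients\r\nconnected_clients:12\r\nblocked_clients:4\r\n"
    print(parse_blocked_clients(info))           # prints 4
    print(is_sustained_increase([1, 2, 4, 7]))   # prints True
    print(is_sustained_increase([5, 5, 6]))      # prints False
```

A short-lived spike is normal for workloads built on blocking queues; the point of comparing several consecutive samples is to flag only growth that persists.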