Microservices Engine: Configure default alert rules

Last Updated: May 29, 2024

This topic describes how to configure default alert rules for Microservices Engine (MSE) instances.

Procedure

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Registry > Instances.

  3. On the Instances page, find the instance that you want to manage, and choose More > Configure Default Alert in the Actions column.

  4. In the Configure Default Alert dialog box, select an alert contact group for the Alert Contact Group parameter and click OK.

    After you click OK, the default alert rules that are described in the following tables are automatically added for the selected contact group.

    Microservices Registry

    | Instance version | Alert rule name | Description | Solution |
    | --- | --- | --- | --- |
    | Basic Edition, Developer Edition, and Professional Edition | Excessively High CPU Load in Instances | The CPU utilization of a node in an instance exceeds 80%. | The version of the instance has defects or the capacity of the instance is insufficient. Check the risk items on the Risk Management page and resolve the issue based on the suggestions. If the alert persists after the issue is resolved, scale out the instance. |
    | Basic Edition, Developer Edition, and Professional Edition | Excessively High Memory Usage in Instances | The memory usage of a node in an instance exceeds 90%. | Same as the preceding solution. |
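
    Both rules are static thresholds that MSE evaluates on the server side. For illustration only, the following Python sketch reproduces the two checks against node metrics that you collect yourself; the node ID and metric values are sample inputs, not data from an MSE API.

    ```python
    # Sketch of the two Microservices Registry default thresholds from the
    # table above. Metric collection is out of scope; the caller supplies
    # per-node CPU and memory values as fractions (0.0 to 1.0).

    CPU_THRESHOLD = 0.80     # alert: CPU utilization of a node exceeds 80%
    MEMORY_THRESHOLD = 0.90  # alert: memory usage of a node exceeds 90%

    def check_node(node_id: str, cpu: float, memory: float) -> list[str]:
        """Return the default alerts that these node metrics would trigger."""
        alerts = []
        if cpu > CPU_THRESHOLD:
            alerts.append(f"{node_id}: Excessively High CPU Load ({cpu:.0%})")
        if memory > MEMORY_THRESHOLD:
            alerts.append(f"{node_id}: Excessively High Memory Usage ({memory:.0%})")
        return alerts

    # Example usage with sample values:
    for alert in check_node("node-1", cpu=0.85, memory=0.72):
        print(alert)  # -> node-1: Excessively High CPU Load (85%)
    ```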

    ZooKeeper

    | Instance version | Alert rule name | Description | Solution |
    | --- | --- | --- | --- |
    | Basic Edition, Developer Edition, and Professional Edition | Excessive CMS GC Occurrences in ZooKeeper Instances | The number of Concurrent Mark Sweep (CMS) garbage collection (GC) runs in an instance exceeds five in 1 minute. | The version of the instance has defects or the capacity of the instance is insufficient. Check the risk items on the Risk Management page and resolve the issue based on the suggestions. If the alert persists after the issue is resolved, scale out the instance. |
    | Basic Edition, Developer Edition, and Professional Edition | Excessively Long CMS GC Duration in ZooKeeper Instances | The total duration of CMS GC in an instance exceeds 6 seconds in 1 minute. | Same as the preceding solution. |
    | Serverless | Snapshot Throttling | The size of a snapshot exceeds 20 MB and is close to the 25 MB upper limit. | The size of a snapshot cannot exceed 25 MB. If you need more space to save the snapshot, submit a ticket. |
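
    The two GC rules are rate checks over a 1-minute window. JVM GC metrics are cumulative counters, so the check amounts to differencing counter samples taken one minute apart. A minimal sketch follows, assuming you sample the counters yourself; MSE performs this collection on the server side for managed instances.

    ```python
    # Sketch: apply the ZooKeeper GC thresholds to cumulative GC counters
    # sampled over time. The sampling source is an assumption; MSE collects
    # these metrics itself for managed instances.

    from collections import deque

    GC_COUNT_LIMIT = 5        # more than 5 CMS GC runs in 1 minute
    GC_TIME_LIMIT_MS = 6_000  # more than 6 seconds of CMS GC in 1 minute

    class CmsGcWatcher:
        def __init__(self):
            # (timestamp_s, cumulative_gc_count, cumulative_gc_time_ms)
            self.samples = deque()

        def observe(self, ts: float, count: int, time_ms: int) -> list[str]:
            """Record a sample and return any alerts for the past minute."""
            self.samples.append((ts, count, time_ms))
            # Discard samples that fall outside the 1-minute window.
            while self.samples and ts - self.samples[0][0] > 60:
                self.samples.popleft()
            _, old_count, old_time = self.samples[0]
            alerts = []
            if count - old_count > GC_COUNT_LIMIT:
                alerts.append("Excessive CMS GC Occurrences")
            if time_ms - old_time > GC_TIME_LIMIT_MS:
                alerts.append("Excessively Long CMS GC Duration")
            return alerts
    ```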

    Nacos

    | Instance version | Alert rule name | Description | Solution |
    | --- | --- | --- | --- |
    | Basic Edition, Developer Edition, and Professional Edition | Excessive Full GC Occurrences in Nacos Instances | The number of full GC runs in an instance exceeds two in 1 minute. | The capacity of the instance is insufficient. Check whether an issue, such as connection leaks, duplicate registration, or duplicate subscription, occurs due to misconfiguration on the client. If no such issue occurs, scale out or upgrade the instance in a timely manner. |
    | Basic Edition, Developer Edition, and Professional Edition | Excessively Long Full GC Duration in Nacos Instances | The total duration of full GC in an instance exceeds 5 seconds in 1 minute. | Same as the preceding solution. |
    | Basic Edition, Developer Edition, Professional Edition, and Serverless Edition | Excessively High Nacos Service Usage | The usage of services exceeds 90%. | Same as the preceding solution. |
    | Basic Edition, Developer Edition, Professional Edition, and Serverless Edition | Excessively High Nacos Service Provider Usage | The usage of service providers exceeds 90%. | Same as the preceding solution. |
    | Basic Edition, Developer Edition, Professional Edition, and Serverless Edition | Excessively High Nacos Connection Usage | The usage of connections exceeds 90%. | Same as the preceding solution. |
    | Basic Edition, Developer Edition, Professional Edition, and Serverless Edition | Excessively High Nacos Configuration Usage | The usage of configurations exceeds 90%. | Same as the preceding solution. |
    | Basic Edition, Developer Edition, Professional Edition, and Serverless Edition | Excessively High Nacos Long Polling Usage | The usage of long polling for configurations exceeds 90%. | Same as the preceding solution. |
    | Basic Edition, Developer Edition, Professional Edition, and Serverless Edition | Excessive Decrease of Proportion of Nacos Service Providers | The number of service providers that are registered with a Nacos instance has decreased by more than 50% compared with 3 minutes ago. When this alert is triggered, upstream services may fail to identify downstream service providers. | A large number of services are disconnected within a short period of time due to application failures or the release of a large number of applications. Check whether business applications are being released. If no application is being released, check whether resources such as CPU, memory, GC, and network are healthy for the business applications. |
    | Serverless | TPS Throttling | TPS throttling is triggered in an instance. | Submit a ticket. |
    | Serverless | Service Capacity Limit | The service capacity exceeds the upper limit in an instance. | Same as the preceding solution. |
    | Serverless | Connection Limit | The number of connections exceeds the upper limit in an instance. | Same as the preceding solution. |
    | Serverless | Configuration Capacity Limit | The configuration capacity exceeds the upper limit in an instance. | Same as the preceding solution. |
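
    The provider-decrease rule is the only default rule that compares a metric with its own history: the current number of registered providers against the number 3 minutes earlier. A minimal sketch of that comparison follows, assuming provider counts are sampled periodically by the caller; this is not an MSE or Nacos API.

    ```python
    # Sketch: detect the "Excessive Decrease of Proportion of Nacos Service
    # Providers" condition, i.e. the current provider count is more than 50%
    # lower than the count 3 minutes ago. The periodic sampling of provider
    # counts is assumed; this is not an MSE or Nacos API.

    from collections import deque

    class ProviderDropDetector:
        WINDOW_S = 180    # compare with the count 3 minutes ago
        DROP_RATIO = 0.5  # alert on a decrease of more than 50%

        def __init__(self):
            self.history = deque()  # (timestamp_s, provider_count)

        def observe(self, ts: float, count: int) -> bool:
            """Record a sample; return True if the alert condition is met."""
            self.history.append((ts, count))
            # Keep history[0] as the newest sample that is >= 3 minutes old.
            while len(self.history) > 1 and ts - self.history[1][0] >= self.WINDOW_S:
                self.history.popleft()
            base_ts, baseline = self.history[0]
            if ts - base_ts < self.WINDOW_S:
                return False  # not enough history yet
            return baseline > 0 and count < baseline * (1 - self.DROP_RATIO)
    ```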

    Ingress

    | Instance version | Alert rule name | Description | Solution |
    | --- | --- | --- | --- |
    | Professional Edition | Excessively High CPU Load in Instances | The CPU utilization in an instance exceeds 80%. | The capacity of the instance is insufficient. Check whether an issue such as a plug-in memory leak or logic error occurs. If such an issue does not occur, scale out the instance in a timely manner. |
    | Professional Edition | Excessively High Memory Usage in Instances | The memory usage in an instance exceeds 80%. | Same as the preceding solution. |
    | Professional Edition and Serverless Edition | Low Gateway Accuracy Rate | The overall gateway accuracy rate is lower than 80%. | The overall gateway accuracy rate is abnormal. Check whether gateway configuration issues or gateway business exceptions occur. |
    | Professional Edition and Serverless Edition | Custom Gateway Plug-in Exception (Recovered) | A custom gateway plug-in is abnormal. The plug-in has been automatically recovered. | The custom plug-in fails. Check the plug-in logic. |
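
    The accuracy-rate rule is again a plain threshold. This topic does not define how the rate is computed, so the sketch below assumes it is the share of requests that the gateway handled correctly in an observation interval; that formula is an assumption for illustration, not the documented MSE definition.

    ```python
    # Sketch of the Low Gateway Accuracy Rate check. Defining the rate as
    # correct_requests / total_requests is an assumption for illustration;
    # this topic does not specify the exact formula MSE uses.

    ACCURACY_THRESHOLD = 0.80  # alert when the overall rate drops below 80%

    def accuracy_alert(correct_requests: int, total_requests: int) -> bool:
        """Return True if the sampled interval should raise the alert."""
        if total_requests == 0:
            return False  # no traffic to evaluate
        return correct_requests / total_requests < ACCURACY_THRESHOLD

    # Example usage with sample counts:
    print(accuracy_alert(750, 1000))  # True: 75% is below the 80% threshold
    print(accuracy_alert(990, 1000))  # False: 99%
    ```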