
Microservices Engine: Configure default alert rules

Last Updated: Mar 11, 2026

Microservices Engine (MSE) provides built-in alert rules that monitor CPU utilization, memory usage, garbage collection (GC) performance, and capacity limits across your MSE instances. Enable these rules to notify a contact group when any metric breaches its threshold, so you can detect and resolve issues before they affect production traffic.

Prerequisites

Before you begin, make sure that you have:

  • An MSE instance (Microservices Registry, Nacos, ZooKeeper, or Ingress gateway)

  • At least one alert contact group

Enable default alert rules

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Registry > Instances.

  3. On the Instances page, find the target instance and choose More > Configure Default Alert in the Actions column.

  4. In the Configure Default Alert dialog box, select a contact group for Alert Contact Group and click OK.

After you click OK, MSE adds the default alert rules for the selected contact group. The rules vary by instance type and edition. See the following sections for details.
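All of the default rules described below follow the same pattern: a metric, a threshold, an evaluation timeframe, and a contact group to notify. As a rough illustration of that semantics only (hypothetical names; this is not the MSE implementation or API), the evaluation can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    # Hypothetical structure for illustration; not the MSE API.
    name: str
    threshold: float      # alert fires when the metric exceeds this value
    contact_group: str    # group notified on breach

def breaches(rule: AlertRule, metric_value: float) -> bool:
    """Return True when the metric breaches the rule's threshold,
    i.e. when a notification would be sent to the contact group."""
    return metric_value > rule.threshold

cpu_rule = AlertRule("Excessively High CPU Load in Instances",
                     threshold=80.0, contact_group="ops-oncall")
print(breaches(cpu_rule, 85.0))  # breached
print(breaches(cpu_rule, 40.0))  # healthy
```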

Default alert rules

Microservices Registry

Applies to Basic Edition, Developer Edition, and Professional Edition instances.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| Excessively High CPU Load in Instances | CPU utilization > 80% per node | Continuous | High CPU usage may indicate version defects or insufficient capacity. | 1. Check the Risk Management page and follow the suggested fixes. 2. If the alert persists, scale out the instance. |
| Excessively High Memory Usage in Instances | Memory usage > 90% per node | Continuous | High memory usage can lead to Out-of-Memory (OOM) errors and service disruption. | 1. Check the Risk Management page and follow the suggested fixes. 2. If the alert persists, scale out the instance. |

ZooKeeper

Applies to Basic Edition, Developer Edition, and Professional Edition instances.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| Excessive CMS GC Occurrences in ZooKeeper Instances | Concurrent Mark Sweep (CMS) GC count > 5 | 1 minute | Frequent CMS GC cycles indicate memory pressure or insufficient instance capacity. | 1. Scale out the instance. 2. If the alert persists, check whether the instance version has known defects and upgrade if needed. |
| Excessively Long CMS GC Duration in ZooKeeper Instances | CMS GC duration > 6 s | 1 minute | Long GC pauses can cause request timeouts and session disconnections. | 1. Scale out the instance. 2. If the alert persists, check whether the instance version has known defects and upgrade if needed. |

Applies to Serverless Edition instances.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| Snapshot Throttling | Snapshot size > 20 MB (limit: 25 MB) | Continuous | The maximum snapshot size is 25 MB. Exceeding 20 MB means the instance is approaching the limit. | Reduce the data stored in ZooKeeper. If you need a higher limit, submit a ticket. |
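The snapshot rule fires before the hard limit is reached: the 20 MB alert threshold leaves 5 MB of headroom below the 25 MB cap, giving you time to reduce stored data. A minimal sketch of that check (hypothetical helper, not the MSE implementation):

```python
SNAPSHOT_LIMIT_MB = 25.0   # hard limit from the table above
ALERT_THRESHOLD_MB = 20.0  # default alert fires above this size

def snapshot_status(size_mb: float) -> str:
    """Classify a ZooKeeper snapshot size against the Serverless limits."""
    if size_mb > SNAPSHOT_LIMIT_MB:
        return "over-limit"  # the hard cap has been exceeded
    if size_mb > ALERT_THRESHOLD_MB:
        return "alert"       # approaching the limit: reduce stored data
    return "ok"

print(snapshot_status(18.0))  # ok
print(snapshot_status(22.5))  # alert
```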

Nacos

Applies to Basic Edition, Developer Edition, and Professional Edition instances.

These rules detect GC performance issues that indicate insufficient heap memory.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| Excessive Full GC Occurrences in Nacos Instances | Full GC count > 2 | 1 minute | Frequent full GC runs indicate insufficient heap memory or client-side misconfigurations. | 1. Check for connection leaks, duplicate registration, or duplicate subscription caused by client misconfiguration. 2. If no such issues exist, scale out or upgrade the instance. |
| Excessively Long Full GC Duration in Nacos Instances | Full GC duration > 5 s | 1 minute | Long full GC pauses block all application threads, causing request failures. | 1. Check for connection leaks, duplicate registration, or duplicate subscription caused by client misconfiguration. 2. If no such issues exist, scale out or upgrade the instance. |
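One common client-side cause listed above is duplicate registration, for example creating a fresh client (and connection) for every registration instead of reusing one. A generic sketch of the safe pattern (hypothetical client class purely for illustration; the real Nacos SDK API differs):

```python
class NacosClientSketch:
    """Hypothetical stand-in for a Nacos naming client, used only to
    illustrate deduplicated registration; not the real SDK."""

    def __init__(self) -> None:
        self.registered: set[tuple[str, str, int]] = set()

    def register_instance(self, service: str, ip: str, port: int) -> bool:
        key = (service, ip, port)
        if key in self.registered:
            return False  # already registered: skip the duplicate call
        self.registered.add(key)
        return True

# Reuse ONE client for the process lifetime instead of constructing a
# new client (and a new server connection) per registration attempt.
client = NacosClientSketch()
print(client.register_instance("orders", "10.0.0.1", 8080))  # True
print(client.register_instance("orders", "10.0.0.1", 8080))  # False (deduplicated)
```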

Applies to Basic Edition, Developer Edition, Professional Edition, and Serverless Edition instances.

These capacity alerts trigger when resource usage approaches the instance limit.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| Excessively High Nacos Service Usage | Service usage > 90% | Continuous | The number of registered services is approaching the instance quota. | Scale out or upgrade the instance to increase the service quota. |
| Excessively High Nacos Service Provider Usage | Service provider usage > 90% | Continuous | The number of service providers is approaching the instance quota. | Scale out or upgrade the instance to increase the provider quota. |
| Excessively High Nacos Connection Usage | Connection usage > 90% | Continuous | The number of connections is approaching the instance quota. | Scale out or upgrade the instance to increase the connection quota. |
| Excessively High Nacos Configuration Usage | Configuration usage > 90% | Continuous | The number of configurations is approaching the instance quota. | Scale out or upgrade the instance to increase the configuration quota. |
| Excessively High Nacos Long Polling Usage | Long polling usage > 90% | Continuous | The number of long polling connections is approaching the instance quota. | Scale out or upgrade the instance to increase the long polling quota. |
| Excessive Decrease of Proportion of Nacos Service Providers | Provider count drops > 50% vs. 3 min ago | 3 minutes | A sudden drop in provider count may cause upstream services to lose connectivity with downstream providers. | 1. Check whether applications are being released or restarted. 2. If no deployment is in progress, verify that CPU, memory, GC, and network resources are healthy for your applications. |
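The provider-drop rule compares the current provider count with the count from three minutes earlier and fires when it has fallen by more than half. A minimal sketch of that comparison (hypothetical function name, not the MSE implementation):

```python
def provider_drop_alert(count_3m_ago: int, count_now: int,
                        max_drop_ratio: float = 0.5) -> bool:
    """Return True when the provider count dropped by more than
    max_drop_ratio (default 50%) relative to the earlier sample."""
    if count_3m_ago <= 0:
        return False  # no baseline to compare against
    drop_ratio = (count_3m_ago - count_now) / count_3m_ago
    return drop_ratio > max_drop_ratio

print(provider_drop_alert(100, 40))  # True: a 60% drop
print(provider_drop_alert(100, 60))  # False: a 40% drop
```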

Applies to Serverless Edition instances.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| TPS Throttling | TPS throttling triggered | Continuous | Transactions-per-second (TPS) throttling has activated on the instance. | Submit a ticket to request a higher TPS limit. |
| Service Capacity Limit | Service capacity exceeded | Continuous | The number of services exceeds the instance limit. | Submit a ticket to request a higher service capacity. |
| Connection Limit | Connection count exceeded | Continuous | The number of connections exceeds the instance limit. | Submit a ticket to request a higher connection limit. |
| Configuration Capacity Limit | Configuration capacity exceeded | Continuous | The number of configurations exceeds the instance limit. | Submit a ticket to request a higher configuration capacity. |

Ingress gateway

Applies to Professional Edition instances.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| Excessively High CPU Load in Instances | CPU utilization > 80% | Continuous | High CPU usage may indicate plug-in issues or insufficient capacity. | 1. Check for plug-in memory leaks or logic errors. 2. If no such issues exist, scale out the instance. |
| Excessively High Memory Usage in Instances | Memory usage > 80% | Continuous | High memory usage may indicate plug-in issues or insufficient capacity. | 1. Check for plug-in memory leaks or logic errors. 2. If no such issues exist, scale out the instance. |

Applies to Professional Edition and Serverless Edition instances.

| Alert rule | Threshold | Timeframe | Description | Solution |
| --- | --- | --- | --- | --- |
| Low Gateway Accuracy Rate | Accuracy rate < 80% | Continuous | A low accuracy rate indicates that a significant portion of requests are failing. | Check for gateway configuration errors or application-level exceptions. |
| Custom Gateway Plug-in Exception (Recovered) | Plug-in exception detected | Continuous | A custom gateway plug-in encountered an error and was automatically recovered. | Review the plug-in logic and fix the root cause to prevent recurrence. |
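The accuracy-rate rule is a ratio of successful responses to total requests over the evaluation window. A minimal sketch (hypothetical functions; the gateway's exact success criteria are not specified here):

```python
def gateway_accuracy(successful: int, total: int) -> float:
    """Fraction of requests that succeeded, as a percentage."""
    if total == 0:
        return 100.0  # no traffic in the window: treat as healthy
    return 100.0 * successful / total

def low_accuracy_alert(successful: int, total: int,
                       threshold: float = 80.0) -> bool:
    """True when the accuracy rate falls below the alert threshold."""
    return gateway_accuracy(successful, total) < threshold

print(low_accuracy_alert(700, 1000))  # True: 70% accuracy < 80%
print(low_accuracy_alert(950, 1000))  # False: 95% accuracy
```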