Network Intelligence Service: Event center

Last Updated: Apr 19, 2024

Network Intelligence Service (NIS) provides an event center that helps you monitor resources based on events. You can view the resources that are exposed to potential risks and configure alert rules for specific events, so that you can handle these events promptly and prevent your business from being impaired.

Scenarios

Alibaba Cloud defines NIS events to record and report information about cloud network resources, such as the execution status of O&M tasks, resource exceptions, and resource status changes.

  • Notification of risks and exceptions

    If events that are related to instance availability degradation or performance degradation occur, Alibaba Cloud pushes these events to the event center in the NIS console. Such events include instance performance degradation caused by excess instance usage, business availability degradation caused by packet loss on Internet connections, and instance expiration alerts. We recommend that you handle these events at your earliest opportunity to prevent your business from being impaired.

  • Automated O&M

    Alibaba Cloud defines the status of the events that are displayed in the NIS console. This helps you understand the execution status of system O&M tasks. Meanwhile, new events and status changes of events are reported to CloudMonitor. This allows you to build an event-driven automated O&M system based on your business requirements.
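The event-driven approach described above can be sketched as a small dispatcher that maps event names to handlers. This is a minimal sketch and not the CloudMonitor API: the payload fields `name` and `status`, the handler functions, and the returned action strings are all hypothetical. Only the two event names come from the Issue events table in this topic.

```python
from typing import Callable, Dict

# Hypothetical payload: a dict with "name" (event name in CloudMonitor)
# and "status". The real payload format is not described in this topic.
Event = Dict[str, str]
Handler = Callable[[Event], str]


def raise_bandwidth(event: Event) -> str:
    # Placeholder action; a real handler would call your own tooling.
    return "increase-peak-bandwidth"


def upgrade_nat_gateway(event: Event) -> str:
    # Placeholder action for excess NAT sessions.
    return "upgrade-nat-gateway"


# Event names taken from the Issue events table in this topic.
HANDLERS: Dict[str, Handler] = {
    "problem-clb-bandwidthOverLimit": raise_bandwidth,
    "problem-nat-sessionOverLimit": upgrade_nat_gateway,
}


def dispatch(event: Event) -> str:
    """Run the handler registered for the event, or fall back to manual review."""
    handler = HANDLERS.get(event.get("name", ""))
    if handler is None:
        return "manual-review"
    return handler(event)
```

Events without a registered handler fall through to manual review, so adding automation for a new event only requires a new entry in `HANDLERS`.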

Limits

Retired instance families do not support events. For more information, see the end-of-sale notice for each Alibaba Cloud service.

Basic information

Event types

Alibaba Cloud defines events to record and report information about cloud network resources. Events are categorized into the following types based on event causes.

  • Issue event

    Exceptions that have impaired business and have been in the In Progress state for 7 days. Examples:

      • Packet loss due to excess public bandwidth usage

      • Instance suspension due to overdue payments

  • Risk event

    Exceptions that may impair business and have been in the In Progress state for 7 days. Examples:

      • Risk of affecting business due to packet loss on physical connections

      • Risk of failure due to sudden changes in traffic usage

      • Risk of instance suspension due to overdue payments

Event levels

Alibaba Cloud defines the following levels for events based on their impacts on the operation of instances:

  • Critical: Events at this level may result in instance unavailability and must be handled at your earliest opportunity.

  • Warn: Events at this level have affected your business. You must pay close attention to these events or handle them at an appropriate point in time.

  • Info: You can decide whether to pay attention to the events at this level.

Note

For more information about the codes, names, descriptions, and handling suggestions for events, see the Events section in this topic.
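When you filter events in your own tooling, the three levels can be modeled as an ordered enumeration. The following is a minimal sketch: the numeric values and the function names are assumptions made for illustration, and only the level names Critical, Warn, and Info come from this topic.

```python
from enum import IntEnum


class EventLevel(IntEnum):
    # Numeric values are an assumption used only for ordering.
    INFO = 1
    WARN = 2
    CRITICAL = 3


def parse_level(text: str) -> EventLevel:
    """Map a level name such as "Warn" to an EventLevel member."""
    return EventLevel[text.strip().upper()]


def needs_attention(level_text: str, minimum: EventLevel = EventLevel.WARN) -> bool:
    """Return True if the event is at or above the minimum level."""
    return parse_level(level_text) >= minimum
```

With the default minimum, Critical and Warn events are selected and Info events are skipped.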

Events

This section summarizes the events supported by NIS and provides suggestions on how to handle these events.

Note

Issue events do not apply to shared-resource Classic Load Balancer (CLB) instances.

Issue events

Event code

Event name

Event level

Event name in CloudMonitor

Event description and impact

Suggestion for users

Internet-facing instance

problem-internetBandwidthOverlimit

Packet loss due to excess bandwidth usage

Critical

problem-internetBandwidthOverLimit

Packets are lost because the bandwidth of an Internet-facing instance exceeds the peak bandwidth.

An instance that generates Internet data transfer is called an Internet-facing instance, such as an elastic IP address (EIP), bandwidth plan, or CLB instance.

Increase the peak bandwidth.

Internet NAT gateway

problem-nat-sessionOverLimit

Connection drop caused by excess NAT sessions

Critical

problem-nat-sessionOverLimit

The number of sessions on an Internet NAT gateway exceeds the upper limit. As a result, new sessions fail and over 100 packets are lost per second.

Upgrade the Internet NAT gateway or create multiple Internet NAT gateways. For more information, see Manage NAT Gateway quotas and Create and manage Internet NAT gateways.

problem-nat-sessionNewOverLimit

Connection drop caused by excess new NAT sessions

Critical

problem-nat-sessionNewOverLimit

The number of new sessions on an Internet NAT gateway per second exceeds the upper limit. As a result, new sessions fail and over 100 packets are lost per second.

problem-nat-portAllocationError

Allocation failure of SNAT source ports

Critical

problem-nat-portAllocationError

The EIPs bound to an Internet NAT gateway are insufficient. As a result, source ports fail to be allocated and more than 10 packets are lost per second.

Note

You cannot configure subscription policies for this event.

Increase the number of EIPs that are bound to the Internet NAT gateway. For more information, see the Associate an EIP with an Internet NAT gateway section of the Create and manage Internet NAT gateways topic.

CLB instance

problem-clb-connectionOverLimit

Discarded new connections caused by excess CLB sessions

Critical

problem-clb-connectionOverLimit

The number of new connections or concurrent connections of a CLB instance exceeds the upper limit. As a result, new sessions fail and the number of dropped connections per second is large.

Upgrade the CLB instance to a Network Load Balancer (NLB) instance or an Application Load Balancer (ALB) instance.

For more information, see Manage CLB quotas. For more information about NLB and ALB, see What is NLB? and What is ALB?

problem-clb-bandwidthOverLimit

Packet loss due to excess bandwidth usage of CLB instances

Critical

problem-clb-bandwidthOverLimit

Packet loss occurs because the bandwidth of a CLB instance exceeds the peak bandwidth.

Increase the peak bandwidth. For more information, see FAQ about CLB.

problem-clb-connectionFail

Sharp increase in failed CLB connections

Critical

problem-clb-connectionFail

The number of failed connections of a CLB instance increases sharply because the CLB instance has too many backend servers, the backend servers are overloaded, or the services on the backend servers are abnormal.

Upgrade the backend servers, upgrade the CLB instance, or check the service status of the backend servers.

For more information, see Manage CLB quotas.

NLB instance

problem-nlb-connectionFail

Sharp increase in failed NLB connections

Critical

problem-nlb-connectionFail

The number of failed connections between the virtual IP addresses of NLB instances and Elastic Compute Service (ECS) instances increases sharply for 10 consecutive minutes. Possible causes:

  • Network jitter

  • Poor performance of backend servers

Check the bandwidth usage and service status of the backend servers.

problem-nlb-newConnectionSurge

Discarded new NLB connections

Critical

problem-nlb-newConnectionSurge

The number of new connections between the virtual IP addresses of NLB instances and ECS instances increases sharply. As a result, new connection requests are continuously discarded for a period of milliseconds or seconds.

Purchase multiple NLB instances to distribute traffic to the NLB instances or submit a ticket to your customer manager.

problem-nlb-newConnectionOverLimit

Excess new NLB connections

Critical

problem-nlb-newConnectionOverLimit

The number of new connections per second between a virtual IP address of an NLB instance and ECS instances exceeds the upper limit. As a result, new connection requests are continuously discarded for a period of milliseconds or seconds.

problem-nlb-concurrentConnectionOverLimit

Excess concurrent NLB connections

Critical

problem-nlb-concurrentConnectionOverLimit

The number of concurrent connections between a virtual IP address of an NLB instance and ECS instances per second exceeds the upper limit. As a result, new connection requests are continuously discarded for a period of milliseconds or seconds.

ALB instance

problem-alb-intranetBandwidthOverLimit

Packet loss due to excess private bandwidth usage of ALB instances

Critical

problem-alb-intranetBandwidthOverLimit

The outbound or inbound bandwidth on a virtual IP address of an ALB instance exceeds the upper limit. A domain name points to this IP address.

Add a canonical name (CNAME) record for the ALB instance. For more information, see Add a CNAME record to an ALB instance.

problem-alb-sessionOverLimit

Discarded new connections caused by excess ALB sessions

Critical

problem-alb-sessionOverLimit

The number of new or concurrent connections that are established between a virtual IP address of an ALB instance and ECS instances exceeds the upper limit. As a result, new sessions fail. A domain name points to this IP address.

problem-alb-qpsOverLimit

503 error code returned because the number of QPS sent to a virtual IP address of an ALB instance exceeds the upper limit

Critical

problem-alb-qpsOverLimit

The number of queries per second (QPS) received by a virtual IP address of an ALB instance exceeds the upper limit. A domain name points to this IP address.

Cloud Enterprise Network (CEN) instance

problem-cen-routeOverLimit

Excess CEN routes

Critical

problem-cen-routeOverLimit

The number of CEN routes exceeds the quota, which may result in network issues.

Upgrade transit routers. For more information, see Upgrade Basic Edition transit routers.

Transit router

problem-cen-vpcAttachBandwidthOverLimit

Packet loss due to excess usage of VPC connection bandwidth

Critical

problem-cen-vpcAttachBandwidthOverLimit

Packet loss occurs because the bandwidth of CEN transit routers exceeds the peak bandwidth.

Increase the peak bandwidth. For more information, see Manage CEN quotas.

problem-cen-peerAttachBandwidthOverLimit

Packet loss due to excess usage of inter-region connection bandwidth

Critical

problem-cen-peerAttachBandwidthOverLimit

Packet loss occurs because the bandwidth of CEN transit routers exceeds the peak bandwidth.

Increase the peak bandwidth. For more information, see Manage CEN quotas.
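In the tables in this topic, issue event codes begin with `problem-` and risk event codes begin with `risk-`. Assuming this convention holds for every code (the topic does not state it as a rule), a script can classify an event by its prefix. The sketch below also compares codes case-insensitively, because the table lists `problem-internetBandwidthOverlimit` as the event code and `problem-internetBandwidthOverLimit` as the event name in CloudMonitor. The function names are hypothetical.

```python
def event_type(code: str) -> str:
    """Classify an event code by its prefix (assumed naming convention)."""
    normalized = code.strip().lower()
    if normalized.startswith("problem-"):
        return "Issue event"
    if normalized.startswith("risk-"):
        return "Risk event"
    return "Unknown"


def same_event(code_a: str, code_b: str) -> bool:
    """Compare two event codes while ignoring case and surrounding spaces."""
    return code_a.strip().lower() == code_b.strip().lower()
```

Codes that match neither prefix are reported as Unknown rather than guessed.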

Risk events

Event code

Event name

Event level

Event name in CloudMonitor

Event description and impact

Suggestion for users

Internet-facing instance

risk-internetPacketLoss

Risk of Internet connection packet loss

Warn

risk-internetPacketLoss

If a packet loss alert is triggered for a physical connection of an Internet service provider (ISP) between two regions of Alibaba Cloud, data transfer over the connection may be affected. In the next 10 minutes, the bandwidth of instances within the current Alibaba Cloud account on the connection may exceed 0.5 Mbit/s or the packet loss rate of the connection may exceed 50%.

Important

Before you monitor this event, you must enable the Internet traffic analysis capability in specific regions or for specific IP addresses. For more information, see the Enable the Internet traffic analysis capability section of the Work with the Internet traffic analysis capability topic.

Check whether the bandwidth of the instances on this physical connection meets your business requirements. For more information, see the 5-tuple data on the Internet Traffic page of the NIS console. If an exception occurs, you can migrate critical business data to other regions. If no exception occurs, ignore this alert.

risk-internetBandwidthOverlimit

Packet loss risk due to excess bandwidth usage

Warn

risk-internetBandwidthOverlimit

Based on historical data, the actual bandwidth of the instances is predicted to exceed the peak bandwidth at a future point in time with a probability greater than 90%.

Take note of the bandwidth. If the peak bandwidth is exceeded, increase the peak bandwidth.

VPN gateway

risk-vpn-bpsOverLimit

Excess usage risk of VPN connection bandwidth

Warn

risk-vpn-bpsOverLimit

The bandwidth utilization of a VPN connection has exceeded 90% three times in the last 10 minutes.

Check whether the bandwidth of the VPN connection meets your business requirements. We recommend that you change the configuration of the VPN gateway or purchase one or more new VPN gateways to increase bandwidth. If no exception occurs, ignore this alert.

risk-vpn-bgpRouteLimit

Risk of excess BGP routes

Warn

risk-vpn-bgpRouteLimit

The number of routes that a VPN gateway has automatically learned by using Border Gateway Protocol (BGP) dynamic routing in the last 10 minutes exceeds 90% of the BGP route quota.

Take note of the number. If the quota is exceeded, we recommend that you aggregate the CIDR blocks of the VPN gateway based on your network planning.
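The trigger condition described for risk-vpn-bpsOverLimit in the table above (bandwidth utilization above 90% three times in the last 10 minutes) can be sketched as a sliding-window check. This only illustrates the stated rule and is not the detection logic that NIS uses: the sample format, the function name, and the reading of "three times" as "at least three samples" are assumptions.

```python
from typing import Iterable, Tuple

WINDOW_SECONDS = 10 * 60  # "the last 10 minutes"
THRESHOLD = 0.90          # "exceeded 90%"
MIN_BREACHES = 3          # "three times" (assumed to mean at least three samples)


def vpn_bandwidth_at_risk(samples: Iterable[Tuple[float, float]], now: float) -> bool:
    """Check the rule on (timestamp_seconds, utilization) samples.

    Utilization is a fraction between 0.0 and 1.0.
    """
    breaches = sum(
        1
        for ts, utilization in samples
        if now - WINDOW_SECONDS <= ts <= now and utilization > THRESHOLD
    )
    return breaches >= MIN_BREACHES
```

For example, with samples at 0, 100, and 300 seconds above 90%, the check passes when evaluated at 600 seconds and fails at 700 seconds, because the first sample has left the window.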

Related operations

Operation

Description and references

View events

You can view events in the following ways:

Subscribe to an event

You can configure event subscription policies in the CloudMonitor console. After you configure the policies, you are notified of the occurrence and updates of events by phone call, text message, or email in a timely manner. For more information, see Configure event subscription policies.

Handle events

After you view events, you can resolve the issues based on the suggestions. For more information, see the Events section in this topic.