Network Intelligence Service (NIS) provides the event center to help you monitor resources based on events. You can view the resources that are exposed to potential risks and configure alert rules for specific events. This way, you can handle these events at your earliest opportunity to prevent your business from being impaired.
Scenarios
Alibaba Cloud defines NIS events to record and report information about cloud network resources, such as the execution status of O&M tasks, resource exceptions, and resource status changes.
Notification of risks and exceptions
If events that are related to instance availability degradation or performance degradation occur, Alibaba Cloud pushes these events to the event center in the NIS console. Such events include instance performance degradation caused by excess instance usage, business availability degradation caused by packet loss on Internet connections, and instance expiration alerts. We recommend that you handle these events at your earliest opportunity to prevent your business from being impaired.
Automated O&M
Alibaba Cloud defines the status of the events that are displayed in the NIS console. This helps you understand the execution status of system O&M tasks. Meanwhile, new events and status changes of events are reported to CloudMonitor. This allows you to build an event-driven automated O&M system based on your business requirements.
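The event-driven pattern described above can be sketched as a small dispatcher that routes each incoming event to a handler by event name. This is a minimal illustration, not the CloudMonitor API: the event shape (a dict with `name`, `level`, and `resource_id` keys), the `on_event` decorator, and the handler body are all hypothetical, and the real event payload may use different field names.

```python
from typing import Callable, Dict

# Hypothetical event shape: a plain dict with "name", "level", and "resource_id".
# The real payload delivered by CloudMonitor may differ.
Event = Dict[str, str]
Handler = Callable[[Event], str]

HANDLERS: Dict[str, Handler] = {}


def on_event(name: str) -> Callable[[Handler], Handler]:
    """Register a handler for one event name (as listed in the Events section)."""
    def register(func: Handler) -> Handler:
        HANDLERS[name] = func
        return func
    return register


@on_event("problem-nat-sessionOverLimit")
def handle_nat_sessions(event: Event) -> str:
    # Placeholder action: a real system might open a ticket or start a scaling workflow.
    return f"review NAT gateway capacity for {event['resource_id']}"


def dispatch(event: Event) -> str:
    """Route an event to its registered handler, or report that none exists."""
    handler = HANDLERS.get(event["name"])
    if handler is None:
        return f"no handler registered for {event['name']}"
    return handler(event)
```

Keeping the handlers in a registry keyed by event name means that adding a response to a new event only requires one more decorated function.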
Limits
Retired instance families do not support events. For more information, see the end-of-sale notice for each Alibaba Cloud service.
Basic information
Event types
Alibaba Cloud defines events to record and report information about cloud network resources. Events are categorized by cause into the types described in the following table.
Type | Description | Example |
Issue event | Exceptions that have impaired business and have been in the In Progress state for 7 days. | |
Risk event | Exceptions that may impair business and have been in the In Progress state for 7 days. | |
Event levels
Alibaba Cloud defines the following levels for events based on their impacts on the operation of instances:
Critical: Events at this level may result in instance unavailability and must be handled at your earliest opportunity.
Warn: Events at this level have affected your business. You must pay close attention to these events or handle them at an appropriate point in time.
Info: You can decide whether to pay attention to the events at this level.
For more information about the codes, names, descriptions, and handling suggestions for events, see the Events section in this topic.
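If you process events programmatically, the three levels above can be modeled as an ordered type so that the most urgent events are handled first. The following is a minimal sketch. The `Level` enum, the numeric ordering, and the `level` key on each event dict are assumptions made for illustration and are not part of NIS.

```python
from enum import IntEnum
from typing import Dict, List


class Level(IntEnum):
    # Numeric values are an assumption used only to order events by urgency.
    INFO = 1
    WARN = 2
    CRITICAL = 3


def parse_level(text: str) -> Level:
    """Map a level name such as "Critical", "Warn", or "Info" to a Level."""
    return Level[text.strip().upper()]


def sort_by_urgency(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return events ordered from most to least urgent (hypothetical "level" key)."""
    return sorted(events, key=lambda e: parse_level(e["level"]), reverse=True)
```

For example, a batch that contains Info, Critical, and Warn events is returned with the Critical event first and the Info event last.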
Events
This section summarizes the events supported by NIS and provides suggestions on how to handle these events.
Issue events do not apply to shared-resource Classic Load Balancer (CLB) instances.
Issue events
Event code | Event name | Event level | Event name in CloudMonitor | Event description and impact | Suggestion for users |
Internet-facing instance | |||||
problem-internetBandwidthOverlimit | Packet loss due to excess bandwidth usage | Critical | problem-internetBandwidthOverLimit | Packets are lost because the bandwidth of an Internet-facing instance exceeds the peak bandwidth. An instance that generates Internet data transfer is called an Internet-facing instance, such as an elastic IP address (EIP), bandwidth plan, or CLB instance. | Increase the peak bandwidth. |
Internet NAT gateway | |||||
problem-nat-sessionOverLimit | Connection drop caused by excess NAT sessions | Critical | problem-nat-sessionOverLimit | The number of sessions on an Internet NAT gateway exceeds the upper limit. As a result, new sessions fail and over 100 packets are lost per second. | Upgrade the Internet NAT gateway or create multiple Internet NAT gateways. For more information, see Manage NAT Gateway quotas and Create and manage Internet NAT gateways. |
problem-nat-sessionNewOverLimit | Connection drop caused by excess new NAT sessions | Critical | problem-nat-sessionNewOverLimit | The number of new sessions on an Internet NAT gateway per second exceeds the upper limit. As a result, new sessions fail and over 100 packets are lost per second. | |
problem-nat-portAllocationError | Allocation failure of SNAT source ports | Critical | problem-nat-portAllocationError | The EIPs bound to an Internet NAT gateway are insufficient. As a result, source ports fail to be allocated and more than 10 packets are lost per second. Note You cannot configure subscription policies for this event. | Increase the number of EIPs that are bound to the Internet NAT gateway. For more information, see the Associate an EIP with an Internet NAT gateway section of the Create and manage Internet NAT gateways topic. |
CLB instance | |||||
problem-clb-connectionOverLimit | Discarded new connections caused by excess CLB sessions | Critical | problem-clb-connectionOverLimit | The number of new connections or concurrent connections of a CLB instance exceeds the upper limit. As a result, new sessions fail and a large number of connections are dropped per second. | Upgrade the CLB instance to a Network Load Balancer (NLB) instance or an Application Load Balancer (ALB) instance. For more information, see Manage CLB quotas. For more information about NLB and ALB, see What is NLB? and What is ALB? |
problem-clb-bandwidthOverLimit | Packet loss due to excess bandwidth usage of CLB instances | Critical | problem-clb-bandwidthOverLimit | Packet loss occurs because the bandwidth of a CLB instance exceeds the peak bandwidth. | Increase the peak bandwidth. For more information, see FAQ about CLB. |
problem-clb-connectionFail | Sharp increase in failed CLB connections | Critical | problem-clb-connectionFail | The number of failed connections of a CLB instance increases sharply due to the excess number, excess workload, or service exceptions of its backend servers. | Upgrade the backend servers, upgrade the CLB instance, or check the service status of the backend servers. For more information, see Manage CLB quotas. |
NLB instance | |||||
problem-nlb-connectionFail | Sharp increase in failed NLB connections | Critical | problem-nlb-connectionFail | The number of failed connections between the virtual IP addresses of NLB instances and Elastic Compute Service (ECS) instances increases sharply for 10 consecutive minutes. Possible causes: | Check the bandwidth usage and service status of the backend servers. |
problem-nlb-newConnectionSurge | Discarded new NLB connections | Critical | problem-nlb-newConnectionSurge | The number of new connections between the virtual IP addresses of NLB instances and ECS instances increases sharply. As a result, new connection requests are discarded for several consecutive milliseconds or seconds. | Purchase multiple NLB instances to distribute traffic to the NLB instances or submit a ticket to your customer manager. |
problem-nlb-newConnectionOverLimit | Excess new NLB connections | Critical | problem-nlb-newConnectionOverLimit | The number of new connections between a virtual IP address of an NLB instance and ECS instances per second exceeds the upper limit. As a result, new connection requests are discarded for several consecutive milliseconds or seconds. | |
problem-nlb-concurrentConnectionOverLimit | Excess concurrent NLB connections | Critical | problem-nlb-concurrentConnectionOverLimit | The number of concurrent connections between a virtual IP address of an NLB instance and ECS instances per second exceeds the upper limit. As a result, new connection requests are discarded for several consecutive milliseconds or seconds. | |
ALB instance | |||||
problem-alb-intranetBandwidthOverLimit | Packet loss due to excess private bandwidth usage of ALB instances | Critical | problem-alb-intranetBandwidthOverLimit | The outbound or inbound bandwidth on a virtual IP address of an ALB instance exceeds the upper limit. A domain name points to the IP address. | Add a canonical name (CNAME) record for the ALB instance. For more information, see Add a CNAME record to an ALB instance. |
problem-alb-sessionOverLimit | Discarded new connections caused by excess ALB sessions | Critical | problem-alb-sessionOverLimit | The number of new or concurrent connections that are established between a virtual IP address of an ALB instance and ECS instances exceeds the upper limit. As a result, new sessions fail. A domain name points to the IP address. | |
problem-alb-qpsOverLimit | 503 error code returned because the QPS on a virtual IP address of an ALB instance exceeds the upper limit | Critical | problem-alb-qpsOverLimit | The number of queries per second (QPS) received by a virtual IP address of an ALB instance exceeds the upper limit. A domain name points to the IP address. | |
Cloud Enterprise Network (CEN) instance | |||||
problem-cen-routeOverLimit | Excess CEN routes | Critical | problem-cen-routeOverLimit | The number of CEN routes exceeds the quota, which may result in network issues. | Upgrade transit routers. For more information, see Upgrade Basic Edition transit routers. |
Transit router | |||||
problem-cen-vpcAttachBandwidthOverLimit | Packet loss due to excess usage of VPC connection bandwidth | Critical | problem-cen-vpcAttachBandwidthOverLimit | Packet loss occurs because the bandwidth of CEN transit routers exceeds the peak bandwidth. | Increase the peak bandwidth. For more information, see Manage CEN quotas. |
problem-cen-peerAttachBandwidthOverLimit | Packet loss due to excess usage of inter-region connection bandwidth | Critical | problem-cen-peerAttachBandwidthOverLimit | Packet loss occurs because the bandwidth of CEN transit routers exceeds the peak bandwidth. | Increase the peak bandwidth. For more information, see Manage CEN quotas. |
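The event codes in this topic appear to follow a naming pattern: a `problem` or `risk` prefix, an optional product segment such as `nat` or `clb`, and a condition name. The following sketch splits a code into those parts, which can help when you group or filter events in your own tooling. The pattern is inferred from the tables in this topic rather than from a published specification, and the `EventCode` type and `parse_event_code` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class EventCode(NamedTuple):
    category: str           # "problem" (issue event) or "risk" (risk event)
    product: Optional[str]  # for example "nat" or "clb"; None if the code has no product segment
    condition: str          # for example "sessionOverLimit"


def parse_event_code(code: str) -> EventCode:
    """Split an event code on hyphens, assuming the inferred naming pattern."""
    parts = code.split("-")
    if parts[0] not in ("problem", "risk") or len(parts) not in (2, 3):
        raise ValueError(f"unrecognized event code: {code}")
    if len(parts) == 2:
        return EventCode(parts[0], None, parts[1])
    return EventCode(parts[0], parts[1], parts[2])
```

Codes without a product segment, such as `problem-internetBandwidthOverlimit`, are returned with `product` set to `None`.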
Risk events
Event code | Event name | Event level | Event name in CloudMonitor | Event description and impact | Suggestion for users |
Internet-facing instance | |||||
risk-internetPacketLoss | Risk of Internet connection packet loss | Warn | risk-internetPacketLoss | If a packet loss alert is triggered for a physical connection of an Internet service provider (ISP) between two regions of Alibaba Cloud, data transfer over the connection may be affected. In the next 10 minutes, the bandwidth of instances within the current Alibaba Cloud account on the connection may exceed 0.5 Mbit/s or the packet loss rate of the connection may exceed 50%. Important Before you monitor this event, you must enable the Internet traffic analysis capability in specific regions or for specific IP addresses. For more information, see the Enable the Internet traffic analysis capability section of the Work with the Internet traffic analysis capability topic. | Check whether the bandwidth of the instances on this physical connection meets your business requirements. For more information, see the 5-tuple data on the Internet Traffic page of the NIS console. If an exception occurs, you can migrate critical business data to other regions. If no exception occurs, ignore this alert. |
risk-internetBandwidthOverlimit | Packet loss risk due to excess bandwidth usage | Warn | risk-internetBandwidthOverlimit | Based on historical data, the actual bandwidth of instances is predicted to exceed the peak bandwidth at a future point in time with a probability greater than 90%. | Take note of the bandwidth. If the peak bandwidth is exceeded, increase the peak bandwidth. |
VPN gateway | |||||
risk-vpn-bpsOverLimit | Excess usage risk of VPN connection bandwidth | Warn | risk-vpn-bpsOverLimit | The bandwidth utilization of a VPN connection has exceeded 90% three times in the last 10 minutes. | Check whether the bandwidth of the VPN connection meets your business requirements. We recommend that you change the configuration of the VPN gateway or purchase one or more new VPN gateways to increase bandwidth. If no exception occurs, ignore this alert. |
risk-vpn-bgpRouteLimit | Risk of excess BGP routes | Warn | risk-vpn-bgpRouteLimit | The number of routes that a VPN gateway has automatically learned by using Border Gateway Protocol (BGP) dynamic routing in the last 10 minutes exceeds 90% of the BGP route quota. | Take note of the number. If the quota is exceeded, we recommend that you aggregate the CIDR blocks of the VPN gateway based on your network planning. |
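The two VPN gateway risk conditions in the preceding table are threshold rules, and you can mirror them in your own monitoring as a rough check. The following sketch assumes that you already collect bandwidth utilization samples for the last 10 minutes and know the learned route count and quota. The function names, the sample format (ratios from 0.0 to 1.0), and the reading of "three times" as "at least three samples" are assumptions. How NIS itself samples and evaluates these metrics is not described in this topic.

```python
from typing import Sequence

UTILIZATION_THRESHOLD = 0.90  # 90% bandwidth utilization, per the table above


def vpn_bandwidth_at_risk(samples: Sequence[float], min_breaches: int = 3) -> bool:
    """Return True if at least min_breaches samples exceed 90% utilization.

    samples: utilization ratios (0.0 to 1.0) observed in the last 10 minutes.
    """
    breaches = sum(1 for value in samples if value > UTILIZATION_THRESHOLD)
    return breaches >= min_breaches


def bgp_routes_at_risk(learned_routes: int, route_quota: int) -> bool:
    """Return True if the learned routes exceed 90% of the BGP route quota."""
    if route_quota <= 0:
        raise ValueError("route_quota must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return learned_routes * 10 > route_quota * 9
```

A value that sits exactly at 90% does not count as a breach in this sketch, because the table describes the conditions as exceeding 90%.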
Related operations
Operation | Description and references |
View events | You can view events in the following ways: |
Subscribe to an event | You can configure event subscription policies in the CloudMonitor console. After you configure the policies, you are notified of the occurrence and updates of events by phone call, text message, or email in a timely manner. For more information, see Configure event subscription policies. |
Handle events | After you view events, you can resolve the issues based on the suggestions. For more information, see the Events section of the Event center topic. |