Network Intelligence Service (NIS) provides the network inspection feature, which diagnoses your cloud networks in terms of stability, security, performance, cost optimization, and operational excellence. Network inspection gives you observability into your cloud network architecture so that you can identify exceptions and obtain optimization suggestions.
Scenarios
When you deploy or maintain networks or resources, your network configurations may not follow best practices if you are unfamiliar with the cloud services that you use. After repeated network optimizations, you may also need to manage a large number of network instances, and configuring, verifying, and inspecting these resources requires considerable manual effort. The network inspection feature addresses this challenge by diagnosing your network architecture and the resources deployed in the network and by providing network optimization suggestions.
Inspection items
Inspected resource | Inspection category | Inspection item | Description | Risk | Severity | Optimization suggestion |
EIP | Network stability | Elastic IP address (EIP) bandwidth usage check | Check the bandwidth usage of EIPs and the frequency of packet loss due to high or excess usage of bandwidth within an inspection cycle. This helps you assess whether the current bandwidth usage meets the business development requirement and identify network risks that may cause business interruptions due to insufficient bandwidth. | An alert is triggered within the most recent inspection cycle to indicate that the usage of Internet bandwidth is about to exceed the upper limit. | Medium | Upgrade the EIP bandwidth specification. For more information, see Modify the configuration of a subscription EIP or Modify the configuration of a pay-as-you-go EIP. |
An alert is triggered within the most recent inspection cycle to indicate that packet loss occurs because the usage of Internet bandwidth exceeds the upper limit. | High | Upgrade the EIP bandwidth specification. For more information, see Modify the configuration of a subscription EIP or Modify the configuration of a pay-as-you-go EIP. | ||||
EIP status check | Check whether EIPs run as expected. | The EIP is in the Disabled or Inactive state. | Low | Check whether the EIP is in an intermediate state or another abnormal state. | ||
Network cost optimization | Idle EIP check | Check whether idle EIPs exist. | An EIP is not associated with instances. | Low | The EIP is not associated with instances but fees are still charged for the EIP. Check whether the EIP can be released to reduce costs. For more information, see Release a pay-as-you-go EIP. | |
NAT | Network stability | NAT gateway load check | Check the loads of NAT gateways within an inspection cycle, including the number of concurrent connections, number of new connections, traffic processing rate, and loads of SNAT source ports. This helps you assess whether the current resource configuration meets the business development requirements and identify network risks that arise from insufficient resources and may cause business interruptions. | An alert is triggered within the most recent inspection cycle to indicate that connections are dropped because the number of NAT sessions exceeds the upper limit. | Medium | Upgrade the specification of the NAT gateways or change the billing method of the NAT gateways to pay-as-you-go. For more information, see the following topics: |
An alert is triggered within the most recent inspection cycle to indicate that new NAT sessions are dropped because the number of new NAT sessions exceeds the upper limit. | High | Reallocate the traffic through NAT gateways or change the billing method of NAT gateways to pay-as-you-go. For more information, see the following topics: | ||||
An alert is triggered within the most recent inspection cycle to indicate an SNAT source port allocation failure. | High | Add more EIPs to the SNAT IP address pool. For more information, see Create and manage an Internet NAT gateway. | ||||
CEN | Network stability | Inter-region bandwidth usage check | Check the usage of the inter-region bandwidth of Cloud Enterprise Network (CEN) instances and collect statistics on the frequencies of packet loss due to high or excessive bandwidth usage within an inspection cycle. This helps you assess whether the current bandwidth meets your business requirements and identify network risks that arise from insufficient bandwidth and may cause business interruptions. | An alert is triggered within the most recent inspection cycle to indicate that packet loss occurs because the bandwidth usage of inter-region connections exceeds the upper limit. | High | Increase bandwidth of inter-region connections. For more information, see Modify the maximum bandwidth value of an inter-region connection. |
Packet loss occurs because traffic throttling is triggered by the quality of service (QoS) queues of inter-region connections. | High | Increase bandwidth of inter-region connections or modify the traffic scheduling configuration of inter-region connections. For more information, see Modify the maximum bandwidth value of an inter-region connection or Use traffic scheduling to limit bandwidth for inter-region connections. | ||||
Transit router connection high availability check | Check the high availability of connections between network instances and transit routers. To ensure network high availability, after you connect a network instance to a transit router, configure redundant connections on the transit router. | Only one zone (vSwitch) of the virtual private cloud (VPC) is connected to a transit router. When the zone is down, you cannot switch to other zones. This may cause business interruptions. | High | To ensure network high availability, after you connect a VPC to a transit router, make sure that the transit router has a redundant connection. When you create VPC connections, specify a vSwitch in each zone supported by Enterprise Edition transit routers to implement zone-disaster recovery and reduce the distance of data transfer. | ||
Transit router routing check | Check whether potential risks exist in the routing configuration of the current transit router and provide suggestions on how to optimize the configuration. | The number of routes in the route table of the Basic Edition transit router has reached 80% of the quota limit. When the quota limit is reached, routes can no longer be added to the route table. This may cause network failures. | Medium | Upgrade transit routers from Basic Edition to Enterprise Edition. For more information, see Upgrade transit routers from Basic Edition to Enterprise Edition. Compared with Basic Edition transit routers, each Enterprise Edition transit router supports a quota of 10,000 routes, custom route tables, and flow logs. | ||
VPC-to-TR route check | Check whether route conflicts or risks exist in VPC-to-TR connections and provide suggestions on how to optimize the routes. | If VPCs connected to the same CEN instance have overlapping private CIDR blocks, the CEN instance may experience a route conflict. | Medium | Make sure that the CIDR blocks of VPCs and vSwitches attached to the same CEN instance do not overlap. | ||
VPC connection bandwidth usage check | Check the bandwidth usage of connections between VPCs and CEN instances and the frequency of packet loss due to excessive bandwidth usage within an inspection cycle. This helps you assess whether the current bandwidth meets the business development requirements and identify network risks that may cause business interruptions due to insufficient bandwidth. | An alert is triggered within the most recent inspection cycle to indicate that packet loss occurs because the bandwidth usage of VPC connections exceeds the upper limit. | High | Enable the flow log feature for VPC connections and analyze whether the percentage of business traffic is as expected based on flow logs. For more information, see Configure a flow log. | ||
Virtual private network (VPN) | Network stability | VPN gateway load check | Check the loads of VPN gateways, the risk of excessive bandwidth usage, and the frequency of Border Gateway Protocol (BGP) route advertisement overages within an inspection cycle. This helps you assess the health of VPN gateways and identify network risks that arise from insufficient resources and may cause business interruptions. | An alert is triggered within the most recent inspection cycle to indicate a BGP route advertisement overage. | High | Take note of the risk. If the quota is exceeded, aggregate the CIDR blocks of the peer VPN device based on your network planning. |
An alert is triggered within the most recent inspection cycle to indicate that the bandwidth usage of the VPN gateway exceeds the upper limit. | Medium | Check whether the bandwidth of the instances on this physical connection meets your business requirements. Increase the bandwidth of the VPN gateway or purchase one or more new VPN gateways to increase bandwidth. If no exception occurs, ignore this alert. For more information, see Modify the configuration of a VPN gateway or Create and manage a VPN gateway. | ||||
VPN redundancy check | Check the VPN redundancy configuration. | One tunnel of an IPsec-VPN connection in dual-tunnel mode fails to be negotiated. Consequently, cross-zone high availability does not take effect. | High | Add all tunnels of the VPN gateway to the IPsec-VPN connection to ensure cross-zone high availability. For more information, see Create and manage IPsec-VPN connections in dual-tunnel mode. | ||
The VPN gateway is deployed in one zone. Therefore, it does not support cross-zone high availability for disaster recovery. | High | Enable cross-zone high availability for the VPN gateway and enable the dual-tunnel mode. For more information, see Upgrade a VPN gateway to enable the dual-tunnel mode. | ||||
ALB | Network stability | ALB instance VIP load check | Check the loads of the virtual IP addresses (VIPs) of ALB instances within an inspection cycle, including sessions, connections, queries per second (QPS), and bandwidth. This helps you assess whether the current resource configuration meets the business development requirements and identify network risks that arise from insufficient resources and may cause business interruptions. | An alert is triggered within the most recent inspection cycle to indicate that new connections are dropped because the number of ALB sessions exceeds the upper limit. | High | ALB domain name resolution has limits for new connections on a single VIP address. Use ALB based on CNAME resolution. For more information, see Add a CNAME record to an ALB instance. |
An alert is triggered within the most recent inspection cycle to indicate that the QPS of the ALB instance exceeds the upper limit. | High | ALB domain name resolution has limits for QPS on a single VIP address. Use ALB based on CNAME resolution. For more information, see Add a CNAME record to an ALB instance. | ||||
An alert is triggered within the most recent inspection cycle to indicate that packet loss occurs because the private bandwidth usage of the ALB instance exceeds the upper limit. | High | ALB domain name resolution has limits for bandwidth on a single VIP address. Use ALB based on CNAME resolution. For more information, see Add a CNAME record to an ALB instance. | ||||
ALB deployment high availability check | Check whether the backend servers associated with the ALB listener are spread across zones to ensure the high availability of the application. | The backend servers of the ALB listener are deployed in one zone (default backend server group). | Medium | The deployment architecture of the current ALB listener has zone risks. If the zone is down, your service will be interrupted. Spread the backend servers of the listener and forwarding rules across at least two zones to minimize the impact of a single point of failure. If you want to migrate your servers across zones, see Migration guide. | ||
Network Load Balancer (NLB) | Network stability | NLB instance VIP load check | Check the loads of the VIPs of NLB instances within an inspection cycle, including new connections and concurrent connections. This helps you assess whether the current resource configuration meets the business development requirements and identify network risks that arise from insufficient resources and may cause business interruptions. | An alert is triggered within the most recent inspection cycle to indicate that the number of NLB connection failures sharply increases. | High | The issue may occur due to the following reasons:
|
An alert is triggered within the most recent inspection cycle to indicate that new NLB connections are dropped. | High | The issue may occur due to the following reasons:
| ||||
An alert is triggered within the most recent inspection cycle to indicate that the number of new NLB connections exceeds the upper limit. | High | The number of new connections on a single VIP address of an NLB instance exceeds the upper limit. As a result, new connection requests are dropped for consecutive milliseconds or seconds. Purchase multiple NLB instances or contact your account manager. | ||||
An alert is triggered within the most recent inspection cycle to indicate that the number of concurrent connections on an NLB instance exceeds the upper limit. | High | The number of concurrent connections on a single VIP address of an NLB instance exceeds the upper limit. As a result, new connection requests are dropped for consecutive milliseconds or seconds. Purchase multiple NLB instances or contact your account manager. | ||||
NLB deployment high availability check | Check whether the backend servers associated with the NLB listener are spread across zones to ensure the high availability of the application. | Multiple backend servers of an NLB listener are deployed in a single zone. | Medium | The deployment architecture of the current NLB listener has zone risks. If the zone is down, your service will be interrupted. Spread the backend servers of the listener across at least two zones to minimize the impact of a single point of failure. If you want to migrate your servers across zones, see Migration guide. | ||
Classic Load Balancer (CLB) | Network stability | CLB instance load check | Check the loads of CLB instances within an inspection cycle, including sessions, connections, and bandwidth. This helps you assess whether the current resource configuration meets the business development requirements and identify network risks that arise from insufficient resources and may cause business interruptions. | An alert is triggered within the most recent inspection cycle to indicate that packet loss occurs because the bandwidth usage of the CLB instance exceeds the upper limit. | High | Increase the bandwidth of the CLB instance. For more information, see Modify the configurations of a pay-as-you-go CLB instance. |
An alert is triggered within the most recent inspection cycle to indicate that new connections are dropped because the number of CLB sessions exceeds the upper limit. | High | Upgrade the configurations of the CLB instance or migrate the CLB instance to an ALB instance or an NLB instance. For more information, see the following topics: | ||||
An alert is triggered within the most recent inspection cycle to indicate that the number of CLB connection failures sharply increases. | High | This issue may be caused by the insufficient performance, excess loads, or business exceptions of the backend servers of CLB instances. Check the status of the backend servers. | ||||
VBR | Network stability | BGP connection status check | Check the status of BGP connections created over Express Connect circuits and the frequency of Express Connect circuit failures within an inspection cycle. This helps you monitor the quality of leased lines and identify stability risks at the earliest opportunity. | An alert is triggered within the most recent inspection cycle to indicate a BGP connection failure. | High | Contact the ISP of the line to check whether the Express Connect circuit is abnormal. |
Express Connect circuit check | Check the status of Express Connect circuits and the frequency of BGP connection failures within an inspection cycle. This helps you monitor the quality of leased lines and identify stability risks at the earliest opportunity. | An alert is triggered within the most recent inspection cycle to indicate an Express Connect circuit or connection failure. | High | Contact the ISP of the line to check whether the Express Connect circuit is abnormal. | ||
Health configuration check for static VBR routes | Check whether health checks are configured for VBR connections. | The CEN instance has a static route configured that points to the VBR, but the CEN instance has no health checks configured. | High | After you connect a VBR to a CEN instance, you can use the health check feature of CEN to probe the connectivity of the Express Connect circuits associated with the VBR. For more information, see Configure and manage health checks. If the CEN instance and a data center have redundant routes, automatic failover to an available route is supported in the event of an Express Connect circuit failure. This keeps your business uninterrupted. | ||
Health checks are not configured for the VBR-to-VPC connection. | High | When you connect a data center to a VPC on Alibaba Cloud by using redundant Express Connect circuits, configure health checks in the data center and on Alibaba Cloud to probe the connectivity of the Express Connect circuits. For more information, see Configure and manage health checks. If one of the Express Connect circuits is detected as unhealthy, the system automatically routes network traffic over another Express Connect circuit that works as expected. | ||||
VBR connection redundancy check | Check the coverage of VBR connection redundancy to identify stability risks in scenarios in which Express Connect circuits are used. | No redundant routing is configured for the entire VPC-to-VBR connection. | Low | Select a routing redundancy solution based on your business requirements. For more information, see Connect a data center to Alibaba Cloud by creating a VBR-to-VPC connection. | ||
No redundant routing is configured for some CIDR blocks of the VPC-to-VBR connection. | Low | Check whether business traffic is forwarded to the CIDR blocks for which no redundant routing is configured. If yes, configure a redundant routing for the CIDR blocks. Select a routing redundancy solution based on your business requirements. For more information, see Connect a data center to Alibaba Cloud by creating a VBR-to-VPC connection. | ||||
No redundant routing is configured for some CIDR blocks of the TR-to-VBR connection. | Low | Check whether business traffic is forwarded to the CIDR blocks for which no redundant routing is configured. If yes, configure a redundant routing for the CIDR blocks. Select a routing redundancy solution based on your business requirements. For more information, see Connect a data center to Alibaba Cloud by using CEN. | ||||
No redundant routing is configured for the entire TR-to-VBR connection. | Low | Select a routing redundancy solution based on your business requirements. For more information, see Connect a data center to Alibaba Cloud by using CEN. |
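The VPC-to-TR route check in the preceding table flags VPCs whose private CIDR blocks overlap within the same CEN instance. The following minimal sketch shows how you could run a similar overlap check locally before you attach VPCs. It uses only the Python standard library. The VPC names and CIDR blocks are made-up examples, and the function name find_overlapping_cidrs is hypothetical. It is not part of NIS or any Alibaba Cloud SDK.

```python
import ipaddress
from itertools import combinations


def find_overlapping_cidrs(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap.

    vpc_cidrs maps a VPC label (hypothetical) to a CIDR string.
    """
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()
    }
    conflicts = []
    # Compare every unordered pair of VPCs exactly once.
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            conflicts.append((name_a, name_b))
    return conflicts


# Illustrative values only; replace them with your planned CIDR blocks.
vpcs = {
    "vpc-a": "10.0.0.0/16",
    "vpc-b": "10.0.128.0/17",
    "vpc-c": "192.168.0.0/24",
}
print(find_overlapping_cidrs(vpcs))  # [('vpc-a', 'vpc-b')]
```

An empty result means that no two of the listed CIDR blocks overlap. The sketch compares only the blocks that you pass in, so include vSwitch CIDR blocks as well if you want to cover them.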
Disable a network inspection task
You cannot create custom network inspection tasks. By default, NIS creates a free network inspection task for you. The task inspects your network on a weekly basis and generates reports.
You can disable the default network inspection task.
Log on to the NIS console.
In the left-side navigation pane, click Network Inspection.
On the Network Inspection page, find the default network inspection task and click Stop Inspection in the Actions column.
In the message that appears, click OK.
View network inspection reports
NIS retains your network inspection reports for one year.
Log on to the NIS console.
In the left-side navigation pane, click Network Inspection.
On the Network Inspection page, find the default network inspection task. Then, you can perform the following operations.
View the details of the latest report.
In the Newest Inspection Report column, click View the report to obtain network optimization suggestions.
On the report details page, you can view Basic Information, Inspection Summary, and Inspection Details.
In the Inspection Details section, you can view abnormal inspection items, optimization suggestions, and affected resources.
View historical network inspection reports.
In the Newest Inspection Report column, click View historical reports.
In the Historical Inspection Reports section of the Historical Reports page, find the report that you want to view and click its ID. You can also click View Report in the Actions column of the report.
On the report details page, you can view Basic Information, Inspection Summary, and Inspection Details.
In the Inspection Details section, you can view abnormal inspection items, optimization suggestions, and affected resources.
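If you copy the abnormal inspection items from a report into your own notes or tooling, you may want to handle high-severity items first. The following sketch sorts findings by the severity levels used in the inspection items table (High, Medium, Low). The record layout with the keys item, severity, and resource is an assumption made for illustration, as are the resource IDs. NIS does not necessarily expose reports in this form.

```python
# Lower rank means more urgent. The levels match the Severity column.
SEVERITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def sort_findings(findings):
    """Return findings ordered from High to Low severity.

    sorted() is stable, so findings of equal severity keep their
    original relative order.
    """
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])


# Hypothetical records; the resource IDs are placeholders.
findings = [
    {"item": "Idle EIP check", "severity": "Low", "resource": "eip-example"},
    {"item": "NAT gateway load check", "severity": "High", "resource": "ngw-example"},
    {"item": "Transit router routing check", "severity": "Medium", "resource": "tr-example"},
]

for finding in sort_findings(findings):
    print(finding["severity"], finding["item"], finding["resource"])
```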