Network Intelligence Service (NIS) provides a library of cloud network diagnostics that cover stability, security, performance, cost optimization, and operational excellence. Through network inspections, NIS gives you visibility into your cloud network architecture, detects anomalies, and provides optimization suggestions.
Scenarios
When you build and maintain large-scale networks and resources, configurations may deviate from best practices because of a limited understanding of cloud products. As the architecture evolves and the number of network resource instances grows, it becomes difficult to manually verify how these resources are configured and used. Network inspection examines your entire network architecture and its resources and provides optimization suggestions.
Inspection items
Resource type | Inspection category | Inspection item | Description | Risk description | Risk level | Optimization suggestions |
EIP | Network Stability | EIP bandwidth usage check | Checks the EIP bandwidth usage during the inspection period. It counts the frequency of high bandwidth utilization or packet loss due to bandwidth limits. This helps you assess whether the current resource bandwidth meets business requirements and identify network risks caused by insufficient bandwidth. | A risk alert for imminent public bandwidth overage was triggered during the last inspection period. | Medium | Change the EIP bandwidth. For more information, see Modify the bandwidth of a subscription EIP or Modify the bandwidth of a pay-as-you-go EIP. |
A packet loss alert was triggered during the last inspection period because the public bandwidth was overused. | High | Change the EIP bandwidth. For more information, see Modify the bandwidth of a subscription EIP or Modify the bandwidth of a pay-as-you-go EIP. | ||||
EIP running status check | Checks whether any EIPs are running abnormally. | The EIP is disabled or inactive. | Low | The EIP is disabled or inactive. Check whether the EIP instance is in a transitional state or another abnormal state. | ||
Network cost optimization | Idle EIP check | Checks for idle EIPs. | The EIP is not attached to an instance. | Low | The EIP is not attached to an instance but still incurs charges. To save costs, release the EIP instance if it is no longer needed. For more information, see Release an EIP instance. | |
NAT | Network Stability | NAT processing usage check | Checks the processing usage of the NAT Gateway during the inspection period. This includes identifying the overuse of concurrent connections, new connections, traffic processing rates, and SNAT source ports. This helps you assess whether the current resource configuration meets business requirements and identify network risks caused by insufficient resources. | Connections were dropped because the NAT session limit was exceeded during the last inspection period. | Medium | Upgrade the NAT Gateway instance type or change the billing method to pay-as-you-go. For more information, see: |
An alert for dropped new sessions was triggered during the last inspection period because the NAT limit was exceeded. | High | Reallocate the traffic that flows through the NAT Gateway instance or change the billing method to pay-as-you-go to increase traffic processing capacity. For more information, see: | ||||
An alert for failed SNAT source port allocation was triggered during the last inspection period. | High | Increase the number of EIPs in the address pool of the SNAT rule. For more information, see Internet NAT Gateway. | ||||
CEN | Network Stability | Inter-region bandwidth usage check | Checks the usage of CEN inter-region bandwidth during the inspection period. It counts the frequency of high bandwidth utilization or packet loss due to bandwidth limits. This helps you assess whether the current resource bandwidth meets business requirements and identify network risks caused by insufficient bandwidth. | A packet loss alert was triggered during the last inspection period because the inter-region connection bandwidth limit was exceeded. | High | Upgrade the inter-region connection bandwidth. |
Packets were dropped because of rate limiting in the traffic rerouting queue of the inter-region connection. | High | Upgrade the inter-region connection bandwidth or adjust the traffic rerouting configuration for the inter-region connection. | ||||
TR connection high availability check | Checks for potential risks where a lack of high availability for network instances connected to a transit router (TR) could cause service unavailability during a failure. For high availability, the best practice is to ensure that redundant links are configured under the TR after a network instance is connected. | A VPC is connected to a TR using resources in only a single zone. If this zone fails, a switch to another zone is not possible, which may cause service failures. | High | For high availability, after you connect a VPC to a transit router, ensure that redundant links are configured under the transit router. When you create a VPC connection, specify a vSwitch in each zone that is supported by the Enterprise Edition transit router. This provides zone-level disaster recovery for the VPC connection and reduces traffic detours. | ||
TR route configuration risk check | Checks for risks in the current TR routing configuration and provides optimization suggestions. | The number of routes in the route table of a Basic Edition TR has reached 80% of its quota. If the quota is exceeded, new routes cannot be added to the TR route table, which may cause a network disconnection. | Medium | We recommend that you upgrade to an Enterprise Edition transit router. Compared with the Basic Edition, the Enterprise Edition provides a quota of 40,000 route entries and rich features, such as custom route tables and flow logs. | ||
VPC-to-TR connection route risk check | Checks for route access conflicts and risks when a VPC is connected to a TR and provides configuration optimization suggestions. | The private CIDR blocks of VPCs connected to the same CEN overlap. This may cause route conflicts in the CEN. | Medium | Plan your VPC CIDR blocks so that the VPCs and vSwitches that are added to the same CEN use non-overlapping CIDR blocks. | ||
VPC connection bandwidth usage check | Checks the usage of CEN VPC connection bandwidth during the inspection period. It counts the frequency of packet loss due to bandwidth limits. This helps you assess whether the current resource bandwidth meets business requirements and identify network risks caused by insufficient bandwidth. | A packet loss alert was triggered during the last inspection period because the VPC connection bandwidth limit was exceeded. | High | We recommend that you enable the flow log feature for your VPC connection and use the flow logs to analyze whether the proportion of service traffic meets expectations. | ||
VPN | Network Stability | VPN usage limit check | Checks the VPN service usage during the inspection period. It counts the frequency of bandwidth overage risks and BGP dynamic routing propagation limit overages. This helps you assess the health of the VPN service and identify network risks caused by insufficient resource configuration. | A risk alert for exceeding the BGP dynamic routing limit was triggered during the last inspection period. | High | Monitor the number of BGP dynamic routes that are propagated. If the limit is exceeded, aggregate routes on the peer VPN device based on your network planning. |
A risk alert for exceeding the VPN bandwidth limit was triggered during the last inspection period. | Medium | Check whether the instance bandwidth on this link meets your business requirements. If the bandwidth is insufficient, we recommend that you upgrade the VPN bandwidth specification or purchase a new instance to increase the VPN bandwidth. Otherwise, you can ignore this alert. | ||||
VPN redundancy check | Checks the VPN redundancy configuration. | One of the two tunnels in the VPN connection failed to negotiate, which caused the zone-level high availability to fail. | High | Establish an IPsec-VPN connection between all tunnels of the instance and the peer device to restore zone-level high availability. For more information, see IPsec-VPN connection (VPN Gateway). | ||
The VPN instance is deployed in a single zone and does not have multi-zone disaster recovery capabilities. This poses a significant risk. | High | For the VPN instance, enable AZ high availability and dual tunnels. | ||||
ALB | Network Stability | ALB virtual IP processing usage check | Checks the load on the ALB virtual IP address (VIP) during the inspection period. This includes identifying the load on sessions, connections, QPS, and bandwidth. This helps you assess whether the current resource configuration meets business requirements and identify network risks caused by insufficient resources. | An alert for lost new connections was triggered during the last inspection period because the ALB session limit was exceeded. | High | A single VIP resolved from an ALB domain name has a limit on new connections. Use ALB by configuring a CNAME record for domain name resolution. For more information, see Configure CNAME resolution for an ALB instance. |
An alert for the ALB QPS limit being exceeded was triggered during the last inspection period. | High | A single VIP resolved from an ALB domain name has a QPS limit. Use ALB by configuring a CNAME record for domain name resolution. For more information, see Configure CNAME resolution for an ALB instance. | ||||
A packet loss alert was triggered during the last inspection period because the ALB private bandwidth limit was exceeded. | High | A single VIP resolved from an ALB domain name has a bandwidth limit. Use ALB by configuring a CNAME record for domain name resolution. For more information, see Configure CNAME resolution for an ALB instance. | ||||
ALB high availability deployment check | Checks whether the backend servers under an ALB listener are deployed in multiple zones to ensure high availability of the listener service. | Multiple backend servers of an ALB listener are deployed in a single zone (for the default forwarding server group). | Medium | The current deployment architecture of the ALB listener has a zone-level risk. If a zone-level failure occurs, the service will become unavailable. Deploy the backend servers for the listener and forwarding rules across two or more zones to reduce the failure blast radius. To migrate servers across zones, see the migration guide. | ||
NLB | Network Stability | NLB virtual IP processing usage check | Checks the load on the NLB VIP during the inspection period. This includes identifying the load on new connections and concurrent connections. This helps you assess whether the current resource configuration meets business requirements and identify network risks caused by insufficient resources. | An alert for a sudden increase in failed NLB connections was triggered during the last inspection period. | High | Possible causes:
|
An alert for dropped new NLB connections was triggered during the last inspection period. | High | Possible causes:
| ||||
An alert for the NLB new connection limit being exceeded was triggered during the last inspection period. | High | The auto-scaling limit of a single NLB VIP is exceeded, and new connection requests are continuously dropped. Split the workload across multiple NLB instances or contact your account manager to request a quota increase. | ||||
An alert for the NLB concurrent connection limit being exceeded was triggered during the last inspection period. | High | The auto-scaling limit of a single NLB VIP is exceeded, and new connection requests are continuously dropped. Split the workload across multiple NLB instances or contact your account manager to request a quota increase. | ||||
NLB high availability deployment check | Checks whether the backend servers under an NLB listener are deployed in multiple zones to ensure high availability of the listener service. | Multiple backend servers of an NLB listener are deployed in a single zone. | Medium | The current deployment architecture of the NLB listener has a zone-level risk. If a zone-level failure occurs, the service will become unavailable. Deploy the backend servers for the listener across two or more zones to reduce the failure blast radius. To migrate servers across zones, see the migration guide. | ||
CLB | Network Stability | CLB processing usage check | Checks the load on the CLB instance during the inspection period. This includes identifying the load on sessions, connections, and bandwidth. This helps you assess whether the current resource configuration meets business requirements and identify network risks caused by insufficient resources. | A packet loss alert was triggered during the last inspection period because the CLB bandwidth limit was exceeded. | High | Upgrade the CLB instance bandwidth. For more information, see Upgrade or downgrade a pay-as-you-go CLB instance. |
An alert for lost new connections was triggered during the last inspection period because the CLB session limit was exceeded. | High | Upgrade the CLB instance or migrate the CLB instance to an ALB or NLB instance. For more information, see: | ||||
An alert for a sudden increase in failed CLB connections was triggered during the last inspection period. | High | Common causes for this issue include exceeded backend server specifications, high load, or service anomalies. Check the status of your backend services. | ||||
VBR | Network Stability | BGP connection status check | Checks the running status of the BGP connection for the leased line during the inspection period. It counts the frequency of BGP connection anomalies. This helps you monitor the quality of the carrier's leased line link and promptly detect stability risks. | An alert for a BGP connection failure was triggered during the last inspection period. | High | Contact the carrier to check the Express Connect circuit for anomalies. |
Express Connect circuit port check | Checks the running status of the Express Connect circuit port during the inspection period. It counts the frequency of dedicated connection port anomalies. This helps you monitor the quality of the carrier's leased line link and promptly detect stability risks. | An alert for a dedicated connection port or link failure was triggered during the last inspection period. | High | Contact the carrier to check the Express Connect circuit for anomalies. | ||
VBR static route health configuration check | Checks whether a health check is configured for the VBR connection. | A static route that points to the VBR is configured on the CEN, but no corresponding health check is configured on the CEN. | High | After you connect a VBR to CEN, you can use the CEN health check feature to probe the connectivity of the associated Express Connect circuit. If the CEN instance and a data center have redundant routes, the system can automatically fail over to an available route in the event of an Express Connect circuit failure. This ensures that your business is not interrupted. | ||
No health check is configured for the VBR uplink. | High | When you connect an on-premises data center to a VPC using redundant Express Connect circuits, we recommend that you configure health checks in your on-premises data center and on the Alibaba Cloud side to monitor the connectivity of the Express Connect circuits. If one of the Express Connect circuits is detected as unhealthy, the system automatically routes network traffic over another Express Connect circuit that is working as expected. | ||||
VBR redundancy check | Checks the completeness of the VBR redundancy configuration to identify stability risks in leased line scenarios. | No redundant lines are configured from the VPC to the VBR. | Low | No redundant lines are configured from the VPC to the VBR. You can select a line redundancy solution as needed. For more information, see Connect an on-premises data center to the cloud through a VBR. | ||
Redundant lines are not configured for some CIDR blocks from the VPC to the VBR. | Low | Confirm whether there is service traffic in the route CIDR blocks without redundancy. If so, configure redundant lines. You can select a line redundancy solution as needed. For more information, see Connect an on-premises data center to the cloud through a VBR. | ||||
Redundant lines are not configured for some CIDR blocks from the TR to the VBR. | Low | Confirm whether there is service traffic in the route CIDR blocks without redundancy. If so, configure redundant lines. You can select a line redundancy solution as needed. For more information, see Connect an on-premises data center to the cloud through an ECR. | ||||
No redundant lines are configured from the TR to the VBR. | Low | No redundant lines are configured from the TR to the VBR. You can select a line redundancy solution as needed. For more information, see Connect an on-premises data center to the cloud through an ECR. | ||||
PrivateLink | Network Stability | PrivateLink endpoint high availability deployment check | PrivateLink helps users securely and stably access services that are deployed in other VPCs from within their VPCs and on-premises data centers over a private network. This check verifies whether PrivateLink interface endpoints and endpoint services are deployed in multiple zones to ensure high availability for service access. Note: This check can inspect only the zone-level high availability risk of the link from the interface endpoint to the endpoint service. It cannot determine the zone-level high availability risk of the service itself that is accessed through the interface endpoint. | The interface endpoint instance is in a single zone. | High | Add a new zone to the interface endpoint instance to ensure multi-zone disaster recovery. For more information, see Create and manage endpoint ENIs. Note: An interface endpoint instance that includes one endpoint ENI in one zone is a billable instance. Adding a zone increases costs. |
PrivateLink endpoint service high availability deployment check | The endpoint service instance is in a single zone. | High | Add service resources to the endpoint service so that it can provide services in multiple zones. |
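The VPC-to-TR connection route risk check in the table above flags overlapping private CIDR blocks among VPCs attached to the same CEN. Before you attach a VPC, you can run a similar check locally. The following is a minimal sketch that uses Python's standard ipaddress module. It is not how NIS implements the inspection. The function name, the VPC names, and the CIDR blocks are hypothetical and serve only as an illustration, and the sketch assumes that all CIDR blocks use the same IP version.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlapping_cidrs(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap.

    vpc_cidrs maps a (hypothetical) VPC name to its CIDR block string.
    All blocks are assumed to use the same IP version.
    """
    networks = {name: ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    conflicts = []
    # Compare every unordered pair of VPCs once.
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical VPCs planned for the same CEN instance.
planned_vpcs = {
    "vpc-prod": "10.0.0.0/16",
    "vpc-test": "10.0.128.0/20",  # falls inside vpc-prod's block
    "vpc-dev": "172.16.0.0/12",
}

conflicts = find_overlapping_cidrs(planned_vpcs)
print(conflicts)  # [('vpc-prod', 'vpc-test')]
```

An empty result means that no pair of blocks overlaps. Any pair that is returned should be replanned before the VPCs are connected, in line with the optimization suggestion for that inspection item.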
View network inspection reports
By default, Network Intelligence Service enables a free basic network inspection task for you. This task performs a comprehensive network inspection once a week and provides a report. Creating custom network inspection tasks is not supported.
Inspection reports are retained for one year.
Log on to the NIS console.
In the navigation pane on the left, choose Network Inspection.
On the Network Inspection page, find the default task and perform the following operations.
View the details of the latest inspection report
In the Latest Inspection Report column, click View the report to obtain network optimization suggestions>>.
On the inspection report details page, you can view the Basic Information, Inspection Summary, and Inspection Details sections.
In the Inspection Details section, you can review abnormal results, optimization suggestions, and affected resources.
View the details of historical reports
In the Latest Inspection Report column, click View historical reports>>.
In the Historical Reports section on the Historical Inspection Reports page, find the target report and click its report ID or View Report in the Actions column.
On the inspection report details page, you can view the Basic Information, Inspection Summary, and Inspection Details sections.
In the Inspection Details section, you can review abnormal results, optimization suggestions, and affected resources.
Manage network inspection tasks
Rerun a network inspection task
If your resources have changed, you can rerun the network inspection task. Before you start, make sure the inspection task is Enabled.
On the Network Inspection page, find the target network inspection task and click View the report to obtain network optimization suggestions>> in the Latest Inspection Report column. On the inspection report details page, click Rerun Inspection in the upper-right corner.
Disable or enable a network inspection task
On the Network Inspection page, find the default network inspection task and click Disable Inspection or Enable Inspection in the Actions column.
Delete a network inspection task
You must disable an inspection task before you can delete it. When you delete a network inspection task, its reports are also deleted.
On the Network Inspection page, find the default network inspection task and click Delete in the Actions column.