Link test methods for packet loss or failure of the ping command

Last Updated: Jan 14, 2021

Disclaimer: This article may contain information about third-party products. Such information is for reference only. Alibaba Cloud does not make any guarantee, express or implied, with respect to the performance and reliability of third-party products, as well as potential impacts of operations on the products.

Introduction

If a client experiences packet loss when it pings the target server, or cannot reach the target server at all, you can use tracert or mtr to run a link test and locate the cause. This article describes how to run the link test and analyze the results.

Background

Alibaba Cloud reminds you that:

  • Before you perform operations that may cause risks, such as modifying instance configurations or data, we recommend that you check the disaster recovery and fault tolerance capabilities of the instances to ensure data security.
  • If you modify the configurations and data of instances including but not limited to ECS and RDS instances, we recommend that you create snapshots or enable RDS log backup.
  • If you have authorized or submitted security information such as the logon account and password in the Alibaba Cloud Management console, we recommend that you modify such information in a timely manner.

If the ping command shows packet loss or no connectivity, run a link test and handle the problem according to the following procedure. For more information about the tools used in the link test, see the References section.

Link test steps

Step 1: Obtain the public IP address of your on-premises network

From the local network of the client, visit the Taobao IP address library to obtain the public IP address of that network.

Step 2: Test the forward link (ping and mtr)

Run the following tests from the client to the target server:

  • Run a continuous ping test from the client to the domain name or IP address of the target server. We recommend that you send at least 100 packets and record the test results.
  • Depending on the operating system of the client, use WinMTR or mtr to run a link test with the domain name or IP address of the target server as the destination, and record the test results.
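As a rough sketch of the forward test above, the following Python helper only builds the command lines; it does not run them. The function name `forward_test_commands` and the address `203.0.113.10` are hypothetical. It assumes `ping -c` on Linux, `ping -n` on Windows, and the mtr options `--report`, `--report-cycles`, and `--no-dns` listed in the mtr usage later in this article.

```python
import platform


def forward_test_commands(target, count=100, system=None):
    """Return (ping_argv, mtr_argv) for a link test to `target`.

    mtr_argv is None on Windows, where the graphical WinMTR is used instead.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows ping takes the packet count with -n.
        return ["ping", "-n", str(count), target], None
    # Linux ping takes the packet count with -c.
    ping_argv = ["ping", "-c", str(count), target]
    # --report prints a summary after --report-cycles probes;
    # --no-dns skips reverse DNS lookups.
    mtr_argv = ["mtr", "--report", f"--report-cycles={count}", "--no-dns", target]
    return ping_argv, mtr_argv


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address used as a placeholder target.
    for argv in forward_test_commands("203.0.113.10", system="Linux"):
        print(" ".join(argv))
```

The returned lists can be passed to `subprocess.run` if you want to run the tests from a script and save the output for later comparison with the reverse test.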

Step 3: Test the reverse link (ping and mtr)

Run the following tests from the target server to the client:

  • Run a continuous ping test from the target server to the client IP address obtained in step 1. We recommend that you send at least 100 packets and record the test results.
  • Depending on the operating system of the target server, use WinMTR or mtr to run a link test with the client IP address as the destination, and record the test results.

Step 4: Analyze the test results

Analyze the results as described in the "Brief analysis of test results" section to identify the abnormal node. After you confirm the node, visit the following links or other websites that can query IP address locations to obtain the operator information of the abnormal node. If the abnormal node is in the local network of the client, troubleshoot the local network. If the abnormal node belongs to an operator, report the problem to that operator. The query result is as follows:

Brief analysis of test results

Because mtr (WinMTR) produces more accurate results, this article uses its test results as the example. Consider the following key points when you analyze them.

Key point 1: network area

Normally, the entire link from the client to the target server contains the following network areas:

  • Local client network: the local area network and the network of the local network provider, shown as Area A in the preceding figure. If the exception occurs on a node in the local area network of the client, troubleshoot the local network. If the exception occurs in the network of the local network provider, report the problem to that provider.
  • Backbone network of service providers: shown as Area B in the preceding figure. If an exception occurs in this area, you can look up the operator based on the IP address of the abnormal node and report the exception directly to that operator. You can also send feedback to the operator through Alibaba Cloud Technical support.
  • Local network of the target server: the network of the provider to which the target server belongs, shown as Area C in the preceding figure. If an exception occurs in this area, report the exception to the network provider of the target server.

Key point 2: link load balancing

If link load balancing is enabled for part of the intermediate link, as in Area D in the preceding figure, mtr numbers and collects probe statistics only for the first and last nodes of that part. For the nodes in between, only the IP addresses or domain names are displayed.

Key point 3: comprehensive judgment based on Avg (average) and StDev (standard deviation)

Because of link jitter and other factors, the Best and Worst values of a node may differ greatly. Avg is the average over all probes since the link test started, so it better reflects the network quality of the node. StDev measures dispersion: the higher the StDev value, the more the latency values of individual packets on that node differ from each other. StDev therefore helps you judge whether Avg actually reflects the network quality of the node. For example, if the standard deviation is large, the packet latency is unpredictable: some packets may have a low latency such as 25 ms while others have a high latency such as 350 ms, yet the average may still look normal. In that case, Avg does not reflect the actual network quality well.

Based on the above, the analysis criteria are as follows:

  • If StDev is high, check both the Best and Worst values of the node to determine whether the node has an exception.
  • If StDev is not high, use Avg to determine whether the node has an exception.
    Note: There is no fixed threshold for whether StDev is high. Evaluate it relative to the latency values in the other columns of the same node. For example, if Avg is 30 ms, a StDev of 25 ms is a high deviation. However, if Avg is 325 ms, a StDev of 25 ms is not a high deviation.
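The relative evaluation described in the note can be sketched in Python as follows. The function names and the cut-off `ratio=0.5` are assumptions made for illustration only; the article itself does not define a numeric threshold.

```python
def stdev_is_high(avg_ms, stdev_ms, ratio=0.5):
    """Return True when StDev is large relative to Avg.

    `ratio` is an assumed cut-off, not a value from the article:
    StDev counts as high when it exceeds ratio * Avg.
    """
    if avg_ms <= 0:
        # Without a usable average, any spread at all counts as high.
        return stdev_ms > 0
    return stdev_ms / avg_ms > ratio


def columns_to_check(avg_ms, stdev_ms, ratio=0.5):
    """Return the mtr columns to rely on for one node."""
    if stdev_is_high(avg_ms, stdev_ms, ratio):
        return ["Best", "Wrst"]
    return ["Avg"]


if __name__ == "__main__":
    print(columns_to_check(30, 25))   # the Avg 30 ms / StDev 25 ms case
    print(columns_to_check(325, 25))  # the Avg 325 ms / StDev 25 ms case
```

With this assumed ratio, the two cases from the note fall on opposite sides of the cut-off, which matches the judgment in the text.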

Key point 4: judgment of Loss% (packet loss rate)

If the Loss% (packet loss rate) of a node is not zero, the network at that hop may have a problem. Packet loss on a node usually has one of the following causes:

  • The operator limits the ICMP transmission rate on the node for security or performance reasons, which results in packet loss.
  • The node actually has an exception, which results in packet loss.

To determine which cause applies, compare the packet loss of the abnormal node with that of the subsequent nodes:

  • If no packet loss occurs on the subsequent nodes, the packet loss on the abnormal node is generally caused by the rate limit policy of the operator and can be ignored, as shown at hop 2 in the preceding figure.
  • If packet loss also occurs on the subsequent nodes, the abnormal node generally does have a network exception that causes packet loss, as shown at hop 5 in the preceding figure.

Both situations may also occur at the same time, that is, a node has both a policy rate limit and a network exception. In this case, if the abnormal node and the subsequent nodes continuously lose packets but at different rates, use the packet loss rate of the last few hops as the reference. In the preceding figure, packet loss occurs at hops 5, 6, and 7, so the 40% loss at hop 7 is used as the final packet loss rate.
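The decision rule above can be sketched in Python. This is a simplification under stated assumptions: "subsequent nodes" is reduced to the next hop only, the function names are hypothetical, and the sample loss values are invented for illustration except for the 40% at hop 7 mentioned in the text.

```python
def classify_loss(loss_by_hop):
    """Classify each hop that shows packet loss.

    loss_by_hop: Loss% values ordered from hop 1 to the last hop.
    Returns {hop_number: verdict} for hops with non-zero loss.
    Simplification: only the next hop is inspected.
    """
    verdicts = {}
    for index, loss in enumerate(loss_by_hop):
        if loss <= 0:
            continue
        is_last = index == len(loss_by_hop) - 1
        if not is_last and loss_by_hop[index + 1] == 0:
            # Loss does not carry over to the next hop.
            verdicts[index + 1] = "likely ICMP rate limit"
        else:
            verdicts[index + 1] = "likely real loss"
    return verdicts


def reference_loss(loss_by_hop):
    """Return the loss rate of the final hop, used as the reference value."""
    return loss_by_hop[-1] if loss_by_hop else 0.0


if __name__ == "__main__":
    sample = [0.0, 10.0, 0.0, 0.0, 60.0, 50.0, 40.0]  # illustrative values
    print(classify_loss(sample))
    print(reference_loss(sample))
```

Treating loss at the final hop as real loss is a deliberate choice in this sketch, because there is no later hop that could show the loss disappearing.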

Key point 5: latency

There are two scenarios for latency:

Scenario 1: latency jump

If the latency increases sharply after a hop, the node at that hop generally has a network exception. In the preceding figure, the latency of the nodes after hop 5 increases sharply, which suggests a network exception on the hop 5 node. However, high latency does not necessarily mean that the node has an exception. In the preceding figure, although the latency increases sharply after hop 5, the test data still reaches the target host normally. The high latency may therefore be caused by the return path, so analyze it together with the reverse link test.

Scenario 2: increased latency caused by ICMP rate limits

An ICMP rate limit policy may also cause a sharp increase in latency on a node, but the subsequent nodes usually return to normal. In the preceding figure, the packet loss rate at hop 3 is 100% and the latency also increases sharply, but the latency of the following nodes immediately returns to normal. The sharp increase in latency and the packet loss on that node are therefore attributed to the rate limit policy.
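The two latency scenarios can be told apart with a small heuristic, sketched here in Python. The function name and the threshold `jump_ms=100.0` are assumptions for illustration; the article gives no numeric definition of a sharp increase, and the sample latencies are invented.

```python
def classify_latency_jumps(avg_by_hop, jump_ms=100.0):
    """Classify sharp latency increases between consecutive hops.

    avg_by_hop: Avg latency in ms, ordered from hop 1 to the last hop.
    jump_ms: assumed threshold for a "sharp" increase over the previous hop.
    Returns {hop_number: "transient" | "sustained"}.
    """
    verdicts = {}
    for index in range(1, len(avg_by_hop)):
        baseline = avg_by_hop[index - 1]
        if avg_by_hop[index] - baseline <= jump_ms:
            continue
        later = avg_by_hop[index + 1:]
        if later and all(value - baseline <= jump_ms for value in later):
            # Later hops fall back to the old level: scenario 2 (rate limit).
            verdicts[index + 1] = "transient"
        else:
            # Later hops stay high: scenario 1 (check the reverse link too).
            verdicts[index + 1] = "sustained"
    return verdicts


if __name__ == "__main__":
    print(classify_latency_jumps([1.0, 2.0, 300.0, 4.0, 5.0]))
    print(classify_latency_jumps([1.0, 2.0, 3.0, 4.0, 200.0, 210.0, 205.0]))
```

A "sustained" verdict only marks where the increase begins. As the text notes, the cause may still lie on the return path, so the reverse link test remains necessary.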

Solutions after testing

Common link exceptions

Common link exceptions and test reports are as follows:

Scenario 1: improper network configuration of the target host

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys: Help Display mode Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. ???
2. ???
3. 1XX.X.X.X 0.0% 10 521.3 90.1 2.7 521.3 211.3
4. 11X.X.X.X 0.0% 10 2.9 4.7 1.6 10.6 3.9
5. 2X.X.X.X 80.0% 10 3.0 3.0 3.0 3.0 0.0
6. 2X.XX.XX.XX 0.0% 10 1.7 7.2 1.6 34.9 13.6
7. 1XX.1XX.XX.X 0.0% 10 5.2 5.2 5.1 5.2 0.0
8. 2XX.XX.XX.XX 0.0% 10 5.3 5.2 5.1 5.3 0.1
9. 173.194.200.105 100.0% 10 0.0 0.0 0.0 0.0 0.0

In this example, the packet loss rate at the destination address is 100%. Judging from the data alone, the packets appear not to have arrived. In fact, the more likely cause is that ICMP is disabled by a security policy on the target server (such as a firewall or iptables), so the target host does not send any response. In this scenario, check the security policy configuration of the target server.
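A minimal Python sketch of this check follows. The function name is hypothetical, and the rule (100% loss at the final hop with no loss at the hop before it) is an assumed simplification of the reasoning above rather than a definitive test.

```python
def destination_probably_filters_icmp(loss_by_hop):
    """Return True when only the destination appears to drop probes.

    loss_by_hop: Loss% values for the hops that reported statistics,
    ordered from first to last. Assumed rule: the final hop shows 100%
    loss while the hop directly before it shows none.
    """
    if len(loss_by_hop) < 2:
        return False
    return loss_by_hop[-1] == 100.0 and loss_by_hop[-2] == 0.0


if __name__ == "__main__":
    # Loss% of hops 3 to 9 in the report above.
    print(destination_probably_filters_icmp([0.0, 0.0, 80.0, 0.0, 0.0, 0.0, 100.0]))
```

A positive result is only a hint. Confirming it still requires checking the firewall or iptables rules on the target server.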

Scenario 2: ICMP speed limits

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys: Help Display mode Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. 63.247.X.X 0.0% 10 0.3 0.6 0.3 1.2 0.3
2. 63.247.X.XX 0.0% 10 0.4 1.0 0.4 6.1 1.8
3. 209.51.130.213 0.0% 10 0.8 2.7 0.8 19.0 5.7
4. aix.pr1.atl.google.com 0.0% 10 6.7 6.8 6.7 6.9 0.1
5. 72.14.233.56 60.0% 10 27.2 25.3 23.1 26.4 2.9
6. 209.85.254.247 0.0% 10 39.1 39.4 39.1 39.7 0.2
7. 64.233.174.46 0.0% 10 39.6 40.4 39.4 46.9 2.3
8. gw-in-f147.1e100.net 0.0% 10 39.6 40.5 39.5 46.7 2.2

In this example, obvious packet loss occurs at hop 5, but no exception is found on the subsequent nodes. The packet loss is therefore attributed to an ICMP rate limit on that node. This scenario does not affect data transmission from the client to the target server, so it can be ignored during analysis.

Scenario 3: Loop

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys: Help Display mode Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. 63.247.7X.X 0.0% 10 0.3 0.6 0.3 1.2 0.3
2. 63.247.6X.X 0.0% 10 0.4 1.0 0.4 6.1 1.8
3. 209.51.130.213 0.0% 10 0.8 2.7 0.8 19.0 5.7
4. aix.pr1.atl.google.com 0.0% 10 6.7 6.8 6.7 6.9 0.1
5. 72.14.233.56 0.0% 10 0.0 0.0 0.0 0.0 0.0
6. 72.14.233.57 0.0% 10 0.0 0.0 0.0 0.0 0.0
7. 72.14.233.56 0.0% 10 0.0 0.0 0.0 0.0 0.0
8. 72.14.233.57 0.0% 10 0.0 0.0 0.0 0.0 0.0
9. ??? 0.0% 10 0.0 0.0 0.0 0.0 0.0

In this example, the packets loop between two nodes after hop 5 and never reach the target server. This is usually caused by an incorrect route configuration on an operator node. In this scenario, contact the operator to which the node belongs.
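A loop of this kind can be spotted automatically by looking for a host that reappears at a later, non-adjacent hop. The following Python sketch uses a hypothetical function name and takes the Host column as a plain list.

```python
def find_routing_loop(hosts):
    """Return the first host that reappears at a non-adjacent later hop.

    hosts: Host column values ordered from hop 1 to the last hop.
    "???" entries (no response) are ignored. Returns None if no loop is seen.
    """
    last_seen = {}
    for hop, host in enumerate(hosts, start=1):
        if host == "???":
            continue
        if host in last_seen and hop - last_seen[host] > 1:
            return host
        last_seen[host] = hop
    return None


if __name__ == "__main__":
    # Host column of the report above.
    report_hosts = [
        "63.247.7X.X", "63.247.6X.X", "209.51.130.213",
        "aix.pr1.atl.google.com", "72.14.233.56", "72.14.233.57",
        "72.14.233.56", "72.14.233.57", "???",
    ]
    print(find_routing_loop(report_hosts))
```

The same host appearing at two adjacent hops is not treated as a loop in this sketch, which is a conservative choice to avoid false positives.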

Scenario 4: link interruption

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys: Help Display mode Packets Pings
Host Loss% Snt Last Avg Best Wrst StDev
1. 63.247.7X.X 0.0% 10 0.3 0.6 0.3 1.2 0.3
2. 63.247.6X.X 0.0% 10 0.4 1.0 0.4 6.1 1.8
3. 209.51.130.213 0.0% 10 0.8 2.7 0.8 19.0 5.7
4. aix.pr1.atl.google.com 0.0% 10 6.7 6.8 6.7 6.9 0.1
5. ??? 0.0% 10 0.0 0.0 0.0 0.0 0.0
6. ??? 0.0% 10 0.0 0.0 0.0 0.0 0.0
7. ??? 0.0% 10 0.0 0.0 0.0 0.0 0.0
8. ??? 0.0% 10 0.0 0.0 0.0 0.0 0.0
9. ??? 0.0% 10 0.0 0.0 0.0 0.0 0.0

In this example, no response is received after hop 4. This is usually caused by a link interruption at the corresponding node. We recommend that you also run the reverse link test for further confirmation. In this scenario, contact the corresponding operator.
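To locate where the responses stop, you can scan the Host column from the end. The Python sketch below uses hypothetical function names and assumes that unanswered hops appear as "???", as in the report above.

```python
def last_responding_hop(hosts):
    """Return the 1-based number of the last hop that answered, or 0 if none."""
    for hop in range(len(hosts), 0, -1):
        if hosts[hop - 1] != "???":
            return hop
    return 0


def ends_without_response(hosts):
    """Return True when the trace ends in an unanswered hop."""
    return bool(hosts) and hosts[-1] == "???"


if __name__ == "__main__":
    # Host column of the report above.
    report_hosts = [
        "63.247.7X.X", "63.247.6X.X", "209.51.130.213",
        "aix.pr1.atl.google.com", "???", "???", "???", "???", "???",
    ]
    print(last_responding_hop(report_hosts), ends_without_response(report_hosts))
```

The hop number returned is the last node known to be reachable, so the interruption lies at or beyond the next hop. That is the node to look up before you contact the operator.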

References

The tools used for link test vary with operating systems. These tools are described as follows:

Linux:

Two link test tools are briefly introduced here:

Tool 1: mtr command

mtr (My traceroute) is a network testing tool that is pre-installed in almost all Linux distributions. It combines the functions of ping and traceroute, which makes it more powerful than either. By default, mtr sends ICMP data packets for link detection. You can also use the -u parameter to send UDP data packets instead. traceroute traces the link only once, whereas mtr continuously probes the nodes on the link and provides statistics. mtr therefore reduces the impact of node fluctuations on the test results and produces more accurate results. We recommend that you use it first.

Usage instructions

mtr [-BfhvrwctglxspQomniuT46] [--help] [--version] [--report]
[--report-wide] [--report-cycles=COUNT] [--curses] [--gtk]
[--csv|-C] [--raw] [--xml] [--split] [--mpls] [--no-dns] [--show-ips]
[--address interface] [--filename=FILE|-F]
[--ipinfo=item_no|-y item_no]
[--aslookup|-z]
[--psize=bytes/-s bytes] [--order fields]
[--report-wide|-w] [--inet] [--inet6] [--max-ttl=NUM] [--first-ttl=NUM]
[--bitpattern=NUM] [--tos=NUM] [--udp] [--tcp] [--port=PORT] [--timeout=SECONDS]
[--interval=SECONDS] HOSTNAME

Description of common optional parameters

  • --report: outputs the results as a report.
  • --split: lists the results of each trace separately instead of aggregating them.
  • --psize: specifies the size of the ping packet.
  • --no-dns: does not perform reverse DNS resolution on IP addresses.
  • --address: specifies the IP address used to send data packets when the host has multiple IP addresses.
  • -4: uses only the IPv4 protocol.
  • -6: uses only the IPv6 protocol.

In addition, while mtr is running, you can press the following keys to quickly switch the mtr mode.

  • ? or h: displays the help menu.
  • d: switches the display mode.
  • n: enables or disables DNS resolution.
  • u: switches between ICMP and UDP data packets for detection.

Sample command output

Return values

The following list describes the columns in the returned results based on the default configuration.

  • Column 1 (Host): the IP address or domain name of the node. Press the n key to switch the display.
  • Column 2 (Loss%): the packet loss rate of the node.
  • Column 3 (Snt): the number of packets sent. The default value is 10, which can be changed with the -c parameter.
  • Column 4 (Last): the latency of the most recent probe.
  • Columns 5, 6, and 7 (Avg, Best, and Wrst): the average, minimum, and maximum probe latency, respectively.
  • Column 8 (StDev): the standard deviation. The larger the standard deviation, the more unstable the node.
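The column layout above can be turned into a small parser. The Python sketch below handles only the row layout shown in the sample reports in this article (for example `4. 11X.X.X.X 0.0% 10 2.9 4.7 1.6 10.6 3.9`); other mtr versions may print rows differently, so treat the format as an assumption. The `Hop` type and `parse_hop` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    host: str
    loss_pct: float
    sent: int
    last: float
    avg: float
    best: float
    worst: float
    stdev: float


def parse_hop(line: str) -> Optional[Hop]:
    """Parse one report row; return None for headers or rows without statistics."""
    parts = line.split()
    if len(parts) != 9:
        return None
    try:
        return Hop(
            int(parts[0].rstrip(".")),      # hop number, written as "4."
            parts[1],                       # Host
            float(parts[2].rstrip("%")),    # Loss%
            int(parts[3]),                  # Snt
            *(float(value) for value in parts[4:]),  # Last Avg Best Wrst StDev
        )
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_hop("4. 11X.X.X.X 0.0% 10 2.9 4.7 1.6 10.6 3.9"))
    print(parse_hop("1. ???"))
```

Rows such as `1. ???` and the header line do not have nine fields, so the parser skips them instead of guessing at missing values.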

Tool 2: traceroute command

traceroute is pre-installed in almost all Linux distributions. It is used to track the path that an Internet Protocol (IP) data packet takes to a target address.

  1. traceroute sends small UDP probe packets. The TTL value of the first probe is 1, and the TTL value increases by 1 for each subsequent probe, up to the maximum TTL value (Max_TTL).
  2. It listens for the ICMP TIME_EXCEEDED responses from the nodes along the link, starting from the gateway, until it receives the ICMP PORT_UNREACHABLE message from the destination.
    Note:
    • This process repeats until the destination is reached or the maximum TTL value is tested.
    • traceroute sends UDP data packets for link detection by default. You can specify -I to use ICMP data packets for link detection.
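The TTL mechanism can be illustrated with a small simulation. The Python sketch below sends no packets: it models a hypothetical path as a list of addresses and shows which node would answer each probe. The function name and the addresses are invented for illustration.

```python
def simulate_traceroute(path, max_ttl=30):
    """Simulate which node answers each probe on a known path.

    path: router addresses in order, ending with the destination
          (a hypothetical topology, no real packets are sent).
    Returns a list of (ttl, responder, icmp_message) tuples.
    """
    results = []
    for ttl in range(1, min(max_ttl, len(path)) + 1):
        # The probe's TTL reaches 0 at the ttl-th node, which replies.
        responder = path[ttl - 1]
        if ttl == len(path):
            # The destination itself answers the UDP probe.
            results.append((ttl, responder, "PORT_UNREACHABLE"))
        else:
            # An intermediate router drops the probe and reports it.
            results.append((ttl, responder, "TIME_EXCEEDED"))
    return results


if __name__ == "__main__":
    for row in simulate_traceroute(["10.0.0.1", "198.51.100.1", "203.0.113.10"]):
        print(row)
```

Setting `max_ttl` lower than the path length models a trace that stops before it reaches the destination, which corresponds to the case where the maximum TTL value is tested first.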

Usage instructions

traceroute [-I] [ -m Max_ttl ] [ -n ] [ -p Port ] [ -q Nqueries ] [ -r ] [ -s SRC_Addr ] [ -t TypeOfService ] [ -f flow ] [ -v ] [ -w WaitTime ] Host [ PacketSize ]

Description of common optional parameters

  • -d: enables socket-level debugging.
  • -f: sets the TTL value for the first probe packet.
  • -F: sets the "don't fragment" flag.
  • -g: sets source routing gateways. A maximum of eight gateways can be set.
  • -i: sends data packets through the specified network interface when the host has multiple network interfaces.
  • -I: uses ICMP data packets instead of UDP data packets for detection.
  • -m: specifies the maximum TTL of the probe packets.
  • -n: uses the IP address directly instead of the hostname (DNS lookup is disabled).
  • -p: sets the communication port of the UDP transmission protocol.
  • -r: bypasses the normal routing table and sends data packets directly to the target host.
  • -s: specifies the source IP address from which the local host sends data packets.
  • -t: sets the type of service (TOS) value of the probe packets.
  • -v: displays the command execution process in detail.
  • -w: sets the time to wait for the remote host to respond.
  • -x: enables or disables data packet verification.

Sample command output

For more information about how to use traceroute, see the traceroute man page.

Windows:

Two link test tools are briefly introduced here:

Tool 1: WinMTR (recommended)

WinMTR is a graphical implementation of the mtr tool for Windows. Its functions are simplified, and it supports only some of the mtr parameters. Like mtr, WinMTR sends ICMP data packets for detection by default, but it cannot switch to another protocol. Compared with tracert, WinMTR reduces the impact of node fluctuations on the test results and produces more accurate results. Therefore, when WinMTR is available, we recommend that you use it first for link testing.

 

Usage instructions

WinMTR does not need to be installed. You can decompress it and run it directly. To use it, perform the following steps:

  1. As shown in the following figure, after you run the program, enter the domain name or IP address of the target server in the Host field. Do not include spaces.
  2. Click Start to start the test. After the test starts, the button changes to Stop.
  3. After a period of time, click Stop to stop the test.
  4. Other options are described as follows.
    • Copy Text to clipboard: copies the test results to the clipboard in text format.
    • Copy HTML to clipboard: copies the test results to the clipboard in HTML format.
    • Export TEXT: exports the test results to a specified file in text format.
    • Export HTML: exports the test results to a specified file in HTML format.
    • Options: optional parameters, including the following.
      • Interval(sec): the interval between probes. The default value is 1 second.
      • Ping size(bytes): the size of the data packet used for ping detection. The default value is 64 bytes.
      • Max hosts in LRU list: the maximum number of hosts supported by the LRU list. The default value is 128.
      • Resolve names: displays the nodes by domain name based on reverse lookup of IP addresses.
Return values

The following list describes the columns in the returned results based on the default configuration.

  • Column 1 (Hostname): the IP address or domain name of the node.
  • Column 2 (Nr): the node number.
  • Column 3 (Loss%): the packet loss rate of the node.
  • Column 4 (Sent): the number of packets sent.
  • Column 5 (Recv): the number of packets successfully received.
  • Columns 6, 7, 8, and 9 (Best, Avg, Worst, and Last): the minimum, average, maximum, and most recent latency to the node, respectively.

Tool 2: tracert command line tool

tracert (Trace Route) is a network diagnostic command-line program that comes with Windows. It tracks the path that an Internet Protocol (IP) data packet takes to the target address. tracert sends ICMP data packets to determine the route to the target address, and it sends these packets with different IP time to live (TTL) values. Because each router along the way must decrease the TTL by at least 1 before it forwards a packet, the TTL is effectively a hop counter. When the TTL of a packet reaches 0, the corresponding node sends an ICMP timeout message back to the source computer.

tracert first sends packets with a TTL of 1 and then increases the TTL by 1 in each subsequent transmission until the target address responds or the maximum TTL value is reached. The ICMP timeout messages sent back from intermediate routers contain information about the corresponding nodes.

Usage instructions

tracert [-d] [-h maximum_hops] [-j host-list] [-w timeout] [-R] [-S srcaddr] [-4] [-6] target_name

Description of common optional parameters

  • -d: does not resolve addresses to host names (reverse DNS lookup is disabled).
  • -h: specifies the maximum number of hops used to search for the target address.
  • -j: specifies a loose source route along the host list.
  • -w: specifies the timeout period, in milliseconds, to wait for each reply.
  • -R: traces the round-trip path (IPv6 only).
  • -S: specifies the source address to use, which is srcaddr (IPv6 only).
  • -4: forces the use of IPv4.
  • -6: forces the use of IPv6.
  • target_name: the domain name or IP address of the target host.

Sample command output

C:\> tracert -d 223.5.5.5
Routes to 223.5.5.5 are tracked through up to 30 hops
1 Request Timeout.
2 9 ms 3 ms 12 ms 192.168.X.X
3 4 ms 9 ms 2 ms X.X.X.X
4 9 ms 2 ms 1 ms XX.XX.XX.XX
5 11 ms 211.XX.X.XX
6 3 ms 2 ms 2 ms 2XX.XX.1XX.XX
7 2 ms 2 ms 1 ms 42.XX.2XX.1XX
8 32 ms 4 ms 3 ms 42.XX.2XX.2XX
9 Request Timeout.
10 3 ms 2 ms 2 ms 223.5.5.5
The tracing is complete.

Application scope

  • ECS