
Method for testing links after a packet loss or failure occurs when the ping command is used

Last Updated: May 12, 2022

Disclaimer: This article may contain information about third-party products. Such information is for reference only. Alibaba Cloud does not make any guarantee, express or implied, with respect to the performance and reliability of third-party products, as well as potential impacts of operations on the products.

Overview

If packet loss or a failure occurs when the client pings the destination server, you can use Trace Route (traceroute, or tracert on Windows) or My traceroute (mtr) to test the link and locate the root cause. This topic describes how to use these tools to perform and analyze link tests.

Background information

Alibaba Cloud reminds you that:

  • Before you perform operations that may cause risks, such as modifying instance configurations or data, we recommend that you check the disaster recovery and fault tolerance capabilities of the instances to ensure data security.
  • If you modify the configurations and data of instances including but not limited to ECS and ApsaraDB RDS (RDS) instances, we recommend that you create snapshots or enable RDS log backup.
  • If you have authorized or submitted security information such as the logon account and password in the Alibaba Cloud Management Console, we recommend that you modify such information in a timely manner.

If a packet loss or failure occurs when the ping command is used, you can refer to the following operations to perform a link test. For more information about the tools used for link tests, see Other considerations.

Procedure

Step 1: Obtain the public IP address of the local network

From the local network of the client, visit ip.taobao.com to obtain the public IP address of the client.

Step 2: Perform a forward link test by using ping and mtr

Perform the following tests on the destination server from the client:

  • Continuously ping the domain name or IP address of the destination server from the client. We recommend that you send at least 100 packets and record the test results.
  • Use WinMTR or mtr based on the operating system of your client. Set the destination address to the domain name or IP address of the destination server. Then, perform link tests and record the test results.
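The ping part of Steps 2 and 3 can be scripted. The following Python sketch builds a ping command that sends 100 packets and computes the loss rate from the summary that ping prints. It assumes the Linux/macOS ping (-c for the count) and the English Windows ping (-n for the count) summary formats; the function names are hypothetical helpers, not part of any Alibaba Cloud tool.

```python
import re
import sys


def ping_command(host: str, count: int = 100,
                 windows: bool = sys.platform.startswith("win")) -> list:
    """Build a ping command that sends `count` echo requests to `host`."""
    # Windows ping sets the count with -n; Linux and macOS ping use -c.
    flag = "-n" if windows else "-c"
    return ["ping", flag, str(count), host]


def loss_percent(summary: str) -> float:
    """Return the packet loss rate (%) computed from a ping summary."""
    # Linux: "100 packets transmitted, 97 received, 3% packet loss, ..."
    # macOS: "100 packets transmitted, 97 packets received, 3.0% packet loss"
    match = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", summary)
    if match is None:
        # English Windows: "Packets: Sent = 100, Received = 97, Lost = 3 (3% loss),"
        match = re.search(r"Sent = (\d+), Received = (\d+)", summary)
    if match is None:
        raise ValueError("no ping summary found in the output")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent
```

For example, pass the stdout of subprocess.run(ping_command("www.example.com"), capture_output=True, text=True) to loss_percent. Keep the raw output as well, because the test results should be recorded in full.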

Step 3: Perform a reverse link test by using ping and mtr

Log on to the destination server and perform the following tests:

  • From the destination server, continuously ping the public IP address of the client that you obtained in Step 1. We recommend that you send at least 100 packets and record the test results.
  • Use WinMTR or mtr, depending on the operating system of the destination server. Set the destination address to the IP address of the client. Then, perform link tests and record the test results.

Step 4: Analyze the test results

Analyze the test results as described in Analysis of the test results. After you identify the abnormal node, look up its IP address in the Taobao IP address database or on another IP lookup website to find the carrier that operates the node. If the exception occurs in the local network of the client, troubleshoot the local network. If the exception occurs in the carrier network, report the issue to the carrier.

Analysis of the test results

This section uses the results of mtr or WinMTR as examples, because these tools provide more accurate results than traceroute or tracert. Consider the following key items.

Item 1: Network areas

Typically, a link from the client to the destination server includes the following network areas:

  • Local network of the client: the local area network (LAN) and the network of the local network provider, as shown in Section A of the preceding figure. If an exception occurs in the LAN, troubleshoot the LAN. If an exception occurs in the network of the local network provider, report the issue to the local carrier.
  • Backbone network of the carrier: the network shown in Section B of the preceding figure. If an exception occurs in this area, look up the carrier based on the IP address of the abnormal node and report the issue to that carrier. You can also contact Alibaba Cloud technical support to report the issue to the carrier.
  • Local network of the destination server: the network of the provider to which the destination server belongs, as shown in Section C of the preceding figure. If an exception occurs in this area, report the issue to the carrier that provides the network of the destination server.

Item 2: Link load balancing

Section D of the preceding figure shows link load balancing. If part of an intermediate link uses load balancing, mtr numbers and collects probe statistics only for the first and last nodes of that part. For the intermediate nodes, mtr displays only their IP addresses or domain names.

Item 3: Comprehensive judgment based on the average value (Avg) and standard deviation (StDev)

Due to factors such as link jitter, the Best and Worst values of a node may vary greatly. Avg is the average of all latency samples collected since the test started, so it better reflects the network quality of the node. StDev measures how widely the latency samples of the node are spread: a greater StDev indicates that the latency values are more dispersed. StDev therefore helps determine whether Avg truly reflects the network quality of the node. For example, if StDev is large, the latency of packets is unpredictable: some packets may have a low latency, such as 25 ms, while others have a high latency, such as 350 ms, yet the resulting average latency may look normal. In this case, Avg does not properly reflect the actual network quality.
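The effect described above can be reproduced with two made-up sets of latency samples that have the same average. The Python sketch below computes mtr-style Avg, Best, Worst, and StDev values for them. It uses the population standard deviation from the standard library; this is an assumption, and mtr's own computation may differ slightly.

```python
from statistics import mean, pstdev


def summarize(samples):
    """Return mtr-style Avg, Best, Worst, and StDev for latency samples in ms."""
    return {
        "Avg": mean(samples),
        "Best": min(samples),
        "Worst": max(samples),
        # Population standard deviation; mtr's exact formula may differ.
        "StDev": pstdev(samples),
    }


# Hypothetical samples for two nodes, chosen to have the same average latency.
steady = [57, 58] * 5        # every reply takes 57 ms or 58 ms
jittery = [25] * 9 + [350]   # nine fast replies and one very slow reply

for name, samples in (("steady", steady), ("jittery", jittery)):
    s = summarize(samples)
    print(f"{name:8} Avg={s['Avg']:.1f}  Best={s['Best']}  "
          f"Worst={s['Worst']}  StDev={s['StDev']:.1f}")
```

Both nodes report Avg=57.5 ms, but StDev is 0.5 ms for the steady node and 97.5 ms for the jittery one, which is why Avg alone cannot be trusted for the second node.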

Based on the preceding description, we recommend that you use the following analysis criteria:

  • If StDev is high, examine the Best and Worst values of the node together to determine whether the node has an exception.
  • If StDev is not high, use Avg to determine whether the node has an exception.
    Note: Whether StDev is high is judged relative to the other latency values of the same node, not against a fixed threshold. For example, if Avg is 30 ms and StDev is 25 ms, StDev is considered high. If Avg is 325 ms and StDev is 25 ms, StDev is considered not high.
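The criteria above can be sketched as a small helper that compares StDev with the Avg of the same node. The document gives no numeric cut-off, so the 0.5 ratio below is a hypothetical value chosen only so that the two examples in the note come out as described; adjust it to your own judgment.

```python
def stdev_is_high(avg_ms: float, stdev_ms: float, ratio: float = 0.5) -> bool:
    """Judge StDev relative to the same node's Avg instead of a fixed threshold.

    The 0.5 ratio is a hypothetical cut-off used for illustration only.
    """
    if avg_ms <= 0:
        return stdev_ms > 0
    return stdev_ms / avg_ms > ratio


def metric_to_check(avg_ms: float, stdev_ms: float, ratio: float = 0.5) -> str:
    """Return which values to examine for a node, following the criteria above."""
    if stdev_is_high(avg_ms, stdev_ms, ratio):
        return "Best and Worst"
    return "Avg"


print(metric_to_check(30, 25))    # first example in the note
print(metric_to_check(325, 25))   # second example in the note
```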

Item 4: Judgement about the packet loss rate (Loss%)

If the Loss% of a node is not zero, a problem may exist at that hop. Typical causes of packet loss on a node include:

  • The ICMP transmission rate of the node is limited by the carrier due to security or performance reasons.
  • An exception exists in the node.

Determine the cause based on the packet loss of the abnormal node and its subsequent nodes:

  • If no packet loss occurs on the subsequent nodes, the packet loss on the abnormal node is caused by a carrier policy and can be ignored, as shown in the second hop of the preceding figure.
  • If packet loss also occurs on the subsequent nodes, a network error exists on the abnormal node, as shown in the fifth hop of the preceding figure.

Both situations may occur at the same time: the ICMP rate of the node is limited by a carrier policy, and the node also has a network error. In this case, if packet loss occurs on the abnormal node and its subsequent nodes with a different loss rate on each node, use the loss rate of the last few hops as the reference. In the preceding figure, packet loss occurs in the fifth, sixth, and seventh hops, so the 40% loss rate of the seventh hop is used as the reference.
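The Loss% rules can be expressed as a short Python sketch that takes per-hop loss rates from the first hop to the destination. It is a hypothetical helper and interprets "no packet loss on the subsequent nodes" as "the loss does not continue to the last hop"; that interpretation is an assumption. The first sample uses the Loss% column of Scenario 2 below; the second list is made up, except for the 40% at the seventh hop quoted above.

```python
def analyze_loss(losses):
    """Apply the Loss% rules to per-hop loss rates listed from hop 1 to the destination.

    Returns (rate_limited_hops, fault), where fault is (hop, reference_loss) or None.
    """
    # Walk back from the last hop to find where uninterrupted loss begins.
    start = len(losses)
    while start > 0 and losses[start - 1] > 0:
        start -= 1
    # Loss that does not persist to the last hop is treated as ICMP rate limiting.
    rate_limited = [hop + 1 for hop in range(start) if losses[hop] > 0]
    if start == len(losses):
        return rate_limited, None
    # Loss persists from this hop onward; the last hop's rate is the reference.
    return rate_limited, (start + 1, losses[-1])


scenario_2 = [0.0, 0.0, 0.0, 0.0, 60.0, 0.0, 0.0, 0.0]   # Loss% column of Scenario 2
figure_like = [0.0, 20.0, 0.0, 0.0, 10.0, 30.0, 40.0]    # hypothetical values
print(analyze_loss(scenario_2))
print(analyze_loss(figure_like))
```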

Item 5: Latency

The following section describes scenarios with regard to latency:

Scenario 1: Latency hopping

If the latency increases sharply at a hop, the node at that hop may have a network error. In the preceding figure, the latency increases sharply from the fifth hop onward, which suggests a network error on the fifth hop. However, a high latency does not necessarily mean that the node is abnormal. In the preceding figure, although the latency increases sharply after the fifth hop, the probes still reach the destination host, so the high latency may instead be introduced on the return path. Perform a reverse link test to analyze the problem.

Scenario 2: Increased latency due to ICMP rate limiting

ICMP rate limiting by a policy may also cause a sharp increase in latency on a node, but the latency of the subsequent nodes returns to normal. In the preceding figure, the third hop has a packet loss rate of 100% and a sharp increase in latency, but the latency of the subsequent nodes immediately returns to normal. Therefore, the latency increase and the packet loss on that node are caused by policy-based rate limiting.
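The two latency scenarios differ in whether the increase persists to the later hops. The Python sketch below flags hops whose Avg rises sharply over the previous hop and labels the rise as persistent (Scenario 1) or transient (Scenario 2). The 50 ms jump threshold and the sample Avg values are hypothetical; the document does not define a numeric threshold.

```python
def classify_latency_jumps(avgs, jump_ms=50.0):
    """Flag hops whose Avg exceeds the previous hop's Avg by more than jump_ms.

    avgs lists Avg values (ms) from hop 1 to the destination. jump_ms is a
    hypothetical threshold chosen for illustration.
    """
    findings = []
    for i in range(1, len(avgs)):
        baseline = avgs[i - 1]
        if avgs[i] - baseline <= jump_ms:
            continue
        later = avgs[i + 1:]
        if not later:
            kind = "destination"   # jump at the last hop: check the return path
        elif all(a - baseline > jump_ms for a in later):
            kind = "persistent"    # Scenario 1: possible fault; confirm with a reverse test
        else:
            kind = "transient"     # Scenario 2: likely ICMP rate limiting on that node
        findings.append((i + 1, kind))
    return findings


print(classify_latency_jumps([1, 2, 3, 4, 200, 210, 205]))  # hypothetical Scenario 1 shape
print(classify_latency_jumps([1, 2, 300, 5, 6]))            # hypothetical Scenario 2 shape
```

A "persistent" result is only a hint: as described in Scenario 1, the extra latency may be introduced on the return path, so confirm it with a reverse link test.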

Solutions

Common scenarios for link exceptions

The following section describes the common scenarios for link exceptions and the related test reports:

Scenario 1: Inappropriate network configurations of the destination host

Sample data:

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys:  Help   Display mode
                              Packets                 Pings
    Host                     Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. ???
 2. ???
 3. 1XX.X.X.X                 0.0%   10  521.3   90.1    2.7  521.3  211.3
 4. 11X.X.X.X                 0.0%   10    2.9    4.7    1.6   10.6    3.9
 5. 2X.X.X.X                 80.0%   10    3.0    3.0    3.0    3.0    0.0
 6. 2X.XX.XX.XX               0.0%   10    1.7    7.2    1.6   34.9   13.6
 7. 1XX.1XX.XX.X              0.0%   10    5.2    5.2    5.1    5.2    0.0
 8. 2XX.XX.XX.XX              0.0%   10    5.3    5.2    5.1    5.3    0.1
 9. 173.194.200.105         100.0%   10    0.0    0.0    0.0    0.0    0.0

In this example, the packet loss rate at the destination address is 100%, which means that the destination does not respond to any probe. ICMP may be blocked by a security policy of the destination server, such as a firewall or iptables rules, so that the destination host does not send responses. Therefore, check the security policy configurations of the destination server.
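To check saved mtr results for this pattern, you can parse the hop rows and test whether earlier hops answer while the last hop loses every probe. The Python sketch below assumes rows in the column order of the samples in this topic (Loss% Snt Last Avg Best Wrst StDev); Hop, parse_mtr, and destination_unresponsive are hypothetical names, and SAMPLE reuses the hop rows of the Scenario 1 output.

```python
import re
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    host: str              # IP address, host name, or ??? for a silent hop
    loss: Optional[float]  # Loss%; None when mtr printed no statistics
    avg: Optional[float]   # Avg latency in ms; None when absent


# Hop number (trailing dot optional), host, then Loss% Snt Last Avg Best Wrst StDev.
ROW = re.compile(
    r"^\s*(\d+)\.?\s+(\S+)"
    r"(?:\s+([\d.]+)%\s+\d+\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s+[\d.]+)?\s*$"
)


def parse_mtr(text: str) -> List[Hop]:
    """Extract hop rows from mtr output; banner and header lines are skipped."""
    hops = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            loss = float(m.group(3)) if m.group(3) else None
            avg = float(m.group(4)) if m.group(4) else None
            hops.append(Hop(int(m.group(1)), m.group(2), loss, avg))
    return hops


def destination_unresponsive(hops: List[Hop]) -> bool:
    """True if earlier hops answer but the last hop loses every probe."""
    if not hops:
        return False
    answered = any(h.loss is not None and h.loss < 100 for h in hops[:-1])
    return answered and hops[-1].loss == 100.0


# Hop rows of the Scenario 1 sample (addresses masked as in the original).
SAMPLE = """\
1. ???
2. ???
3. 1XX.X.X.X 0.0% 10 521.3 90.1 2.7 521.3 211.3
4. 11X.X.X.X 0.0% 10 2.9 4.7 1.6 10.6 3.9
5. 2X.X.X.X 80.0% 10 3.0 3.0 3.0 3.0 0.0
6. 2X.XX.XX.XX 0.0% 10 1.7 7.2 1.6 34.9 13.6
7. 1XX.1XX.XX.X 0.0% 10 5.2 5.2 5.1 5.2 0.0
8. 2XX.XX.XX.XX 0.0% 10 5.3 5.2 5.1 5.3 0.1
9. 173.194.200.105 100.0% 10 0.0 0.0 0.0 0.0 0.0
"""

if __name__ == "__main__":
    print(destination_unresponsive(parse_mtr(SAMPLE)))
```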

Scenario 2: ICMP rate limiting

Sample data:

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys:  Help   Display mode
                              Packets                 Pings
    Host                     Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. 63.247.X.X                0.0%   10    0.3    0.6    0.3    1.2    0.3
 2. 63.247.X.XX               0.0%   10    0.4    1.0    0.4    6.1    1.8
 3. 209.51.130.213            0.0%   10    0.8    2.7    0.8   19.0    5.7
 4. aix.pr1.atl.google.com    0.0%   10    6.7    6.8    6.7    6.9    0.1
 5. 72.14.233.56             60.0%   10   27.2   25.3   23.1   26.4    2.9
 6. 209.85.254.247            0.0%   10   39.1   39.4   39.1   39.7    0.2
 7. 64.233.174.46             0.0%   10   39.6   40.4   39.4   46.9    2.3
 8. gw-in-f147.1e100.net      0.0%   10   39.6   40.5   39.5   46.7    2.2

In this example, packet loss occurs in the fifth hop, but no packet loss occurs on the subsequent nodes. Therefore, the loss is caused by ICMP rate limiting on that node. You can ignore it during analysis because it does not affect data transmission between the client and the destination server.

Scenario 3: Loop

Sample data:

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys:  Help   Display mode
                              Packets                 Pings
    Host                     Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. 63.247.7X.X               0.0%   10    0.3    0.6    0.3    1.2    0.3
 2. 63.247.6X.X               0.0%   10    0.4    1.0    0.4    6.1    1.8
 3. 209.51.130.213            0.0%   10    0.8    2.7    0.8   19.0    5.7
 4. aix.pr1.atl.google.com    0.0%   10    6.7    6.8    6.7    6.9    0.1
 5. 72.14.233.56              0.0%   10    0.0    0.0    0.0    0.0    0.0
 6. 72.14.233.57              0.0%   10    0.0    0.0    0.0    0.0    0.0
 7. 72.14.233.56              0.0%   10    0.0    0.0    0.0    0.0    0.0
 8. 72.14.233.57              0.0%   10    0.0    0.0    0.0    0.0    0.0
 9. ???                       0.0%   10    0.0    0.0    0.0    0.0    0.0

In this example, starting from the fifth hop, the packet loops between two nodes (72.14.233.56 and 72.14.233.57) and never reaches the destination server. Typically, this is caused by an error in the routing configuration of the carrier's nodes. Contact the carrier to which the corresponding nodes belong.
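A routing loop like the one above shows up as the same address at two different hops. The Python sketch below checks a list of hop hosts for that pattern. find_routing_loop is a hypothetical helper; it skips ??? hops and compares masked addresses as literal strings, so two different masked addresses could in theory collide.

```python
def find_routing_loop(hosts):
    """Return (first_hop, repeat_hop, host) for the first address seen twice, else None."""
    seen = {}
    for hop, host in enumerate(hosts, start=1):
        if host == "???":
            continue  # silent hops carry no address to compare
        if host in seen:
            return seen[host], hop, host
        seen[host] = hop
    return None


# Host column of the Scenario 3 sample.
scenario_3 = ["63.247.7X.X", "63.247.6X.X", "209.51.130.213", "aix.pr1.atl.google.com",
              "72.14.233.56", "72.14.233.57", "72.14.233.56", "72.14.233.57", "???"]
print(find_routing_loop(scenario_3))
```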

Scenario 4: Link interruption

Sample data:

[root@mycentos6 ~]# mtr --no-dns www.google.com
My traceroute [v0.75]
mycentos6.6 (0.0.0.0) Wed Jun 15 19:06:29 2016
Keys:  Help   Display mode
                              Packets                 Pings
    Host                     Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. 63.247.7X.X               0.0%   10    0.3    0.6    0.3    1.2    0.3
 2. 63.247.6X.X               0.0%   10    0.4    1.0    0.4    6.1    1.8
 3. 209.51.130.213            0.0%   10    0.8    2.7    0.8   19.0    5.7
 4. aix.pr1.atl.google.com    0.0%   10    6.7    6.8    6.7    6.9    0.1
 5. ???                       0.0%   10    0.0    0.0    0.0    0.0    0.0
 6. ???                       0.0%   10    0.0    0.0    0.0    0.0    0.0
 7. ???                       0.0%   10    0.0    0.0    0.0    0.0    0.0
 8. ???                       0.0%   10    0.0    0.0    0.0    0.0    0.0
 9. ???                       0.0%   10    0.0    0.0    0.0    0.0    0.0

In this example, no node responds after the fourth hop. Typically, this is caused by an interruption at the corresponding node. We recommend that you confirm the result by performing a reverse link test, and then contact the carrier to which the corresponding node belongs.
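The interruption pattern, in which every hop after a certain point shows ???, can be located with the Python sketch below. last_responding_hop is a hypothetical helper that works on the host column only; it returns None when the final hop responds, as in Scenario 2.

```python
def last_responding_hop(hosts):
    """Return the last hop that answered when all later hops are silent, else None."""
    if not hosts or hosts[-1] != "???":
        return None  # the final hop responded, so the link is not cut off
    for hop in range(len(hosts), 0, -1):
        if hosts[hop - 1] != "???":
            return hop
    return 0  # no hop responded at all


# Host column of the Scenario 4 sample.
scenario_4 = ["63.247.7X.X", "63.247.6X.X", "209.51.130.213", "aix.pr1.atl.google.com",
              "???", "???", "???", "???", "???"]
print(last_responding_hop(scenario_4))
```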

Other considerations

The tools used for link tests vary with operating systems. The following section describes the tools:

Linux

The following section describes the link test tools for Linux:

Tool 1: mtr

mtr is a network test tool that is available in almost all Linux distributions. It combines the features of ping and traceroute. By default, mtr sends ICMP packets for link probing. You can use the -u parameter to send UDP packets instead. traceroute traces the link only once, whereas mtr continuously probes the nodes on a link and collects statistics. Therefore, mtr reduces the impact of node fluctuations on the test results and provides more accurate results. We recommend that you use mtr.

Usage notes

mtr [-BfhvrwctglxspQomniuT46] [--help] [--version] [--report]
[--report-wide] [--report-cycles=COUNT] [--curses] [--gtk]
[--csv|-C] [--raw] [--xml] [--split] [--mpls] [--no-dns] [--show-ips]
[--address interface] [--filename=FILE|-F]
[--ipinfo=item_no|-y item_no]
[--aslookup|-z]
[--psize=bytes/-s bytes] [--order fields]
[--report-wide|-w] [--inet] [--inet6] [--max-ttl=NUM] [--first-ttl=NUM]
[--bitpattern=NUM] [--tos=NUM] [--udp] [--tcp] [--port=PORT] [--timeout=SECONDS]
[--interval=SECONDS] HOSTNAME

Description of common optional parameters

  • --report: shows the output in report mode.
  • --split: lists the results of each trace separately instead of all the results.
  • --psize: specifies the size of the probe packet, in bytes.
  • --no-dns: displays IP addresses without resolving them to host names (no reverse DNS lookup).
  • --address: specifies the IP address for sending packets when the host has multiple IP addresses.
  • -4: uses only IPv4.
  • -6: uses only IPv6.
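For repeatable tests, mtr can be run in report mode from a script. The Python sketch below builds an mtr command from options listed in the usage notes (--report, --report-cycles, --no-dns, -4, --udp) and runs it with subprocess. The helper names are hypothetical, and depending on how mtr is installed, it may require root privileges.

```python
import shutil
import subprocess


def mtr_report_command(host, cycles=100, no_dns=True, ipv4=False, udp=False):
    """Build an mtr report-mode command; 100 cycles matches the 100-packet recommendation."""
    cmd = ["mtr", "--report", f"--report-cycles={cycles}"]
    if no_dns:
        cmd.append("--no-dns")  # show numeric IP addresses only
    if ipv4:
        cmd.append("-4")        # use IPv4 only
    if udp:
        cmd.append("--udp")     # probe with UDP instead of ICMP
    cmd.append(host)
    return cmd


def run_mtr(host, **options):
    """Run mtr and return its report text; raises if mtr is missing or fails."""
    if shutil.which("mtr") is None:
        raise FileNotFoundError("mtr is not installed")
    result = subprocess.run(mtr_report_command(host, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Report mode prints the statistics once after all cycles finish, which makes the output easy to save with the ping results for Step 4.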

While mtr is running, you can press the following keys to change its behavior:

  • ? or h: shows the Help menu.
  • d: switches the display mode.
  • n: enables or disables DNS resolution of host names.
  • u: switches between ICMP and UDP packets for probing.

Sample response

Response description

The following section describes each data column in the response when the default configuration is used:

  • The first column (Host): the IP address or domain name of the node. Press n to switch between IP addresses and domain names.
  • The second column (Loss%): the packet loss rate of the node.
  • The third column (Snt): the number of packets sent. Default value in report mode: 10. The value can be specified by using the -c parameter.
  • The fourth column (Last): the latency of the most recent probe.
  • The fifth, sixth, and seventh columns (Avg, Best, and Worst): the average, minimum, and maximum values of the probe latency.
  • The eighth column (StDev): the standard deviation of the latency. A greater StDev value indicates that the latency of the node is less stable.

Tool 2: traceroute

traceroute is a network test tool that is available in almost all Linux distributions. It traces the path that IP packets take to a destination.

  1. traceroute sends UDP probe packets with a small time to live (TTL) value, starting at 1.
  2. Each router that decrements the TTL of a probe to 0 discards the probe and returns an ICMP TIME_EXCEEDED message, which identifies that hop. traceroute increases the TTL by 1 for each subsequent probe until it receives an ICMP PORT_UNREACHABLE message or reaches the maximum TTL (Max_ttl).
    Note:
    • An ICMP PORT_UNREACHABLE message indicates that the probe has reached the destination host. The trace also ends when the maximum TTL is reached.
    • By default, traceroute sends UDP packets for link probing. You can use the -I parameter to send ICMP packets instead.
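The TTL mechanism in steps 1 and 2 can be illustrated with a small simulation. This is not a real traceroute: no packets are sent, the router addresses are hypothetical, and every router is assumed to answer. A probe with TTL n expires at the nth router; once the TTL exceeds the number of routers, the probe reaches the destination, which answers with PORT_UNREACHABLE because no process listens on the probe port.

```python
def simulate_traceroute(routers, destination, max_ttl=30):
    """Return (ttl, responder, message) for each probe as the TTL grows from 1."""
    replies = []
    for ttl in range(1, max_ttl + 1):
        if ttl <= len(routers):
            # Router number `ttl` decrements the TTL to 0, drops the probe, and reports back.
            replies.append((ttl, routers[ttl - 1], "TIME_EXCEEDED"))
        else:
            # The probe reaches the destination, which rejects the unused UDP port.
            replies.append((ttl, destination, "PORT_UNREACHABLE"))
            break
    return replies


for reply in simulate_traceroute(["10.0.0.1", "10.0.1.1"], "192.0.2.10"):
    print(*reply)
```

If max_ttl is smaller than the path length, the loop ends without a PORT_UNREACHABLE reply, which corresponds to the case in which the trace stops at the maximum TTL.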

Usage notes

traceroute [-I] [ -m Max_ttl ] [ -n ] [ -p Port ] [ -q Nqueries ] [ -r ] [ -s SRC_Addr ] [ -t TypeOfService ] [ -f flow ] [ -v ] [ -w WaitTime ] Host [ PacketSize ]

Description of common optional parameters

  • -d: enables socket-level debugging.
  • -f: sets the TTL value for the first probe packet.
  • -F: sets the Don't Fragment bit so that probe packets are not fragmented.
  • -g: specifies the source routing gateways. A maximum of eight gateways can be specified.
  • -i: uses the specified network interface controller (NIC) to send packets when the host has multiple NICs.
  • -I: uses ICMP packets instead of UDP packets for probing.
  • -m: specifies the maximum TTL of the probe packet.
  • -n: displays IP addresses instead of host names, which disables reverse DNS lookup.
  • -p: sets the UDP destination port for probing.
  • -r: bypasses the normal routing tables and sends packets directly to a host on an attached network.
  • -s: sets the source IP address of the probe packets.
  • -t: sets the TOS value for the probe packet.
  • -v: shows the command execution process in detail.
  • -w: sets the time to wait for the remote host to respond to a probe.
  • -x: enables or disables IP checksums.

Sample response

For more information about how to use the traceroute tool, visit traceroute(8) - Linux man page.

Windows

The following section describes the link test tools for Windows:

Tool 1: (Recommended) WinMTR

WinMTR is a graphical Windows implementation of mtr. It provides simplified features and supports only some of the mtr parameters. Like mtr, WinMTR sends ICMP packets for probing by default; unlike mtr, it cannot switch to another packet type. Compared with tracert, WinMTR reduces the impact of node fluctuations on the test results and provides more accurate results. Therefore, we recommend that you use WinMTR for link tests when it is available.

Usage notes

WinMTR requires no installation: decompress the package and start the program. To run a test, perform the following operations:

  1. As shown in the following figure, start WinMTR and enter the domain name or IP address of the destination server in the Host field. Do not enter spaces.
  2. Click Start to perform a test. After the test starts, the corresponding button changes to Stop.
  3. Click Stop to stop the test from running.
  4. Take note of the following items:
    • Copy Text to clipboard: copies the test results in the text format to the clipboard.
    • Copy HTML to clipboard: copies the test results in the HTML format to the clipboard.
    • Export TEXT: exports the test results in the text format to a specified file.
    • Export HTML: exports the test results in the HTML format to a specified file.
    • Options: the optional parameters, including the following ones:
      • Interval (sec): the interval between probes, which is also the expiration time of each probe. Default value: 1. Unit: seconds.
      • ping size(bytes): the size of the packet used for the ping probe. Default value: 64. Unit: bytes.
      • Max hosts in LRU list: the maximum number of hosts supported by the LRU list. Default value: 128.
      • Resolve names: resolves the IP addresses of nodes to domain names by reverse DNS lookup.

Response description

The following section describes each data column in the response when the default configuration is used:

  • The first column (Hostname): the IP address or domain name of the node.
  • The second column (Nr): the serial number of the node.
  • The third column (Loss%): the packet loss rate of the node.
  • The fourth column (Sent): the number of packets that have been sent.
  • The fifth column (Recv): the number of packets that have been received.
  • The sixth, seventh, eighth, and ninth columns (Best, Avg, Worst, and Last): the minimum, average, maximum, and last latency values on the corresponding node.

Tool 2: tracert

tracert is a Windows command line tool for network diagnostics. It traces the path that IP packets take to a destination. tracert determines the route to the destination IP address by sending ICMP packets with increasing TTL values. Each router in the path must decrement the TTL value of a packet by at least 1 before it forwards the packet, so the TTL functions as a hop counter. When the TTL of a packet reaches 0, the router returns an ICMP TIME_EXCEEDED message to the source computer.

tracert sends the first packet with a TTL value of 1 and increases the value by 1 on each subsequent transmission until the destination responds or the maximum number of hops is reached. The ICMP TIME_EXCEEDED messages returned by the intermediate routers identify those routers.

Usage notes

tracert [-d] [-h maximum_hops] [-j host-list] [-w timeout] [-R] [-S srcaddr] [-4] [-6] target_name

Description of common optional parameters

  • -d: disables reverse DNS lookup, so IP addresses are not resolved to host names.
  • -h maximum_hops: specifies the maximum number of hops to search for the destination.
  • -j host-list: specifies a loose source route along the host list.
  • -w timeout: specifies the time to wait for each reply. Unit: milliseconds.
  • -R: traces the round-trip path. This parameter applies only to IPv6.
  • -S srcaddr: specifies the source address to use. This parameter applies only to IPv6.
  • -4: uses only IPv4.
  • -6: uses only IPv6.
  • target_name: specifies the domain name or IP address of the destination host.

Sample response

C:\> tracert -d 223.5.5.5
Tracing route to 223.5.5.5 over a maximum of 30 hops
2 9 ms 3 ms 12 ms 192.168.X.X
3 4 ms 9 ms 2 ms X.X.X.X
4 9 ms 2 ms 1 ms XX.XX.XX.XX
5 11 ms 211.XX.X.XX
6 3 ms 2 ms 2 ms 2XX.XX.1XX.XX
7 2 ms 2 ms 1 ms 42.XX.2XX.1XX
8 32 ms 4 ms 3 ms 42.XX.2XX.2XX
9 Request timed out.
10 3 ms 2 ms 2 ms 223.5.5.5
Trace complete.

Application scope

  • ECS