Kenan
Assistant Engineer

Troubleshooting guide of ECS server access exceptions

Posted time: Dec 12, 2016 14:09

When you access services on ECS via a private network or the public network, access exceptions may occur for many reasons. This article explains the symptoms and causes of access exceptions along the entire link, describes how to troubleshoot them and the appropriate solutions, and lists what to prepare before submitting a ticket.
Note: Exceptions caused by Alibaba Cloud CDN or any third-party CDN are not described here.
Schematic Diagram of Symptoms and Causes for ECS Access Exceptions
The major causes of access exceptions along the entire link, from the client to the server, are shown in the Schematic Diagram of Symptoms and Causes for ECS Access Exceptions below:


The major symptoms are shown in the Diagram of ECS Access Exception Symptoms below:
 
Causes of ECS access exceptions
Causes of access exceptions via private networks
If the client accesses the ECS server via a private network, the link is relatively simple. Possible causes of access exceptions and the resulting symptoms at the client are described below:
1. Source server internal configurations
 Cause description: Access exceptions may be caused by security policies of the firewall or security software in the source server, or computer viruses and other internal problems of the operating system.
 Possible symptoms and causes:
 Ping packet loss: Network exceptions are caused by computer viruses and other internal problems of the operating system in the source server.
 Ping failure: Outbound ping is prohibited by security policies, such as security software, on the source server.
 All port connections via Telnet fail: Network exceptions are caused by computer viruses and other internal problems of the operating system in the source server.
 Some port connections via Telnet fail: Access to these ports is prohibited by the security policy, such as security software, of the source server.
2. Security group configuration of the source server
 Cause description: Access to the target server is blocked by rules of the security group of the source server.
 Possible symptoms and causes:
 Ping failure: Ping is disabled by the security group rules of the source server.
 All port connections via Telnet fail: A Drop rule is configured for all ports in the source server's security group.
 Some port connections via Telnet fail: A Drop rule is configured for the specified ports in the source server's security group.
3. Server Load Balancer white list
 Cause description: If the target server is behind Server Load Balancer and a white list is enabled on the listener port, only the specified IP addresses or IP address segments can access the server.
 Possible symptoms and causes:
 Some port connections via Telnet fail: The source server's IP address is not in the white list, so it cannot access the listener port.
4. Security group configurations of the target server
 Cause description: Access from the source server is blocked by rules of the security group to which the target server belongs.
 Possible symptoms and causes:
 Ping failure: Ping is disabled by the security group rules of the target server.
 All port connections via Telnet fail: A Drop rule is configured for all ports in the target server's security group.
 Some port connections via Telnet fail: A Drop rule is configured for the specified ports in the target server's security group.
5. Internal configurations of the target server
 Cause description: Access exceptions may be caused by security policies of the firewall or security software in the target server, or computer viruses and other problems of the operating system.
 Possible symptoms and causes:
 Ping packet loss: Access exceptions are caused by computer viruses and other problems of the operating system on the target server.
 Ping failure: The ping rule is disabled by security policies, such as security software, of the target server.
 All port connections via Telnet fail: Access exceptions are caused by computer viruses and other problems of the operating system on the target server.
 Some port connections via Telnet fail: Access to these ports is prohibited by the security policy, such as security software, of the target server.
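The Telnet tests above only check whether a TCP connection to each port can be established. Where Telnet is not installed, a minimal Python sketch like the one below performs the same check and maps the result onto this article's symptom names. The function names are hypothetical, and the sketch tests TCP ports only; ping (ICMP) still has to be tested separately.

```python
import socket
from typing import Dict, Iterable


def probe_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Attempt a TCP connection to each port, as `telnet host port` would.

    Returns {port: True} when the TCP handshake completes, {port: False} otherwise.
    """
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection completes the TCP handshake; the socket is closed right away.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused connections, timeouts and unreachable networks all raise OSError subclasses.
            results[port] = False
    return results


def classify_port_symptom(results: Dict[int, bool]) -> str:
    """Translate probe results into the symptom names used in this article."""
    if not results:
        raise ValueError("no ports were probed")
    failed = [port for port, ok in results.items() if not ok]
    if not failed:
        return "All port connections succeed"
    if len(failed) == len(results):
        return "All port connections via Telnet fail"
    return "Some port connections via Telnet fail"
```

For example, `classify_port_symptom(probe_ports("192.168.0.10", [22, 80, 443]))`, where `192.168.0.10` is a placeholder for the target server's private IP address. Run it from the source server so the result reflects the same path the failing service uses.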


Causes of access exceptions via public networks
Client's network environment
Possible causes for access exceptions related to the client's network environment and symptoms at the client are described below:
1. User's local network
 Cause description: Some or all clients on the user's local network cannot access the ECS server because of exceptions in that network.
 Possible symptoms and causes:
 Non-Alibaba Cloud IP addresses cannot be accessed either: The client cannot access the target ECS server, and it cannot access other, non-Alibaba Cloud IP addresses either.
2. Local DNS hijacking
 Cause description: If the user's local network or local operator performs DNS hijacking, the webpage may be redirected incorrectly or injected with advertisements when you access services on the target server.
 Possible symptoms and causes:
 Abnormal jump: Because of DNS hijacking, the browser is redirected to an unrelated webpage when you access services on the target server.
 Advertisement injected: Because of DNS hijacking, advertisements are injected into the webpage when you access services on the target server.
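One quick way to look for DNS hijacking is to compare what the local resolver returns for your domain with the public IP address actually bound to the ECS server. The minimal sketch below does this with Python's standard `socket.getaddrinfo`; the function names, domain, and addresses are hypothetical. A mismatch is only a hint: a CDN, a load balancer, or a recent DNS change can also return a different address.

```python
import socket
from typing import Iterable, Set


def resolve_ipv4(domain: str) -> Set[str]:
    """Resolve a domain with the system (local) DNS resolver and return its IPv4 addresses."""
    infos = socket.getaddrinfo(domain, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def suspect_hijack(resolved: Iterable[str], expected_ips: Iterable[str]) -> bool:
    """True if none of the resolved addresses is one of the server's known public IPs."""
    return not (set(resolved) & set(expected_ips))
```

For example, `suspect_hijack(resolve_ipv4("www.example.com"), {"198.51.100.20"})`, with both values as placeholders. Running it again after switching the local DNS server address shows whether the answer depends on the resolver in use.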


Operator's network environment
Possible causes for access exceptions related to the operator's network environment and symptoms at the client are described below:
1. Operator’s network policy
 Cause description: The operator's policies may involve DNS hijacking or block access to some IP addresses, domain names, or ports.
 Possible symptoms and causes:
 Advertisement injected: Because of DNS hijacking, advertisements are injected into the webpage when you access services on the target server.
 Access via domain name fails but access via IP address succeeds: The operator blocks access to some illegal domain names.
 All port connections via Telnet fail: The operator blocks access to some illegal IP addresses.
 Some port connections via Telnet fail: The operator blocks access to some high-risk ports.
2. Record filing
 Cause description: Websites on servers located in mainland China must have their domain names filed (ICP record filing) according to administrative regulations.
 Possible symptoms and causes:
 Abnormal jump: The domain name of the target server is not filed, so the webpage jumps to the record filing alert webpage when you access services on the target server.
 Access via domain name fails but access via IP address succeeds: The domain name of the target server is not filed, so access via the domain name is redirected to the record filing alert webpage, while access via the IP address is not affected.
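When comparing domain-name access with IP access, it helps to see where a request actually lands. The sketch below uses only Python's standard library: `final_url` follows HTTP redirects via `urllib.request.urlopen`, `is_abnormal_jump` flags a redirect that leaves the requested host, and `domain_vs_ip_symptom` encodes the "domain fails, IP works" pattern described above. All three names are illustrative. A legitimate redirect (for example, from `example.com` to `www.example.com`) is also flagged, so treat a hit as a cue to inspect the page that was served, not as proof of hijacking or a filing problem.

```python
import urllib.request
from urllib.parse import urlsplit


def final_url(url: str, timeout: float = 10.0) -> str:
    """Fetch url, following HTTP redirects, and return the URL that was finally served."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.geturl()


def is_abnormal_jump(requested: str, landed: str) -> bool:
    """True if the final page is on a different host from the one requested."""
    return urlsplit(requested).hostname != urlsplit(landed).hostname


def domain_vs_ip_symptom(domain_ok: bool, ip_ok: bool) -> str:
    """Name the pattern seen when the same service is requested by domain name and by IP."""
    if ip_ok and not domain_ok:
        # The article's "domain fails, IP works" symptom.
        return "domain-only failure: check record filing status and operator domain blocking"
    if not ip_ok and not domain_ok:
        return "both fail: continue with the ping and Telnet tests"
    return "no domain-only failure"
```

For example, `is_abnormal_jump(url, final_url(url))` for a placeholder `url` such as `"http://www.example.com/"`.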


Alibaba Cloud network environment
Possible causes for access exceptions related to the Alibaba Cloud network environment and symptoms at the client are described below:
1. Alibaba Cloud Security - Bot-caused shutdown
 Cause description: Infected by bots or viruses, the target server continuously attacks hosts on the internet, so it is shut down by Alibaba Cloud Security.
 Possible symptoms and causes:
 Ping failure: The server is shut down, so it cannot be pinged.
 All port connections via Telnet fail: The server is shut down, so connections to all ports fail.
2. Alibaba Cloud Security - Access interception
 Cause description: The source server continuously performs scanning, probing, or attacks, so it is blocked by Alibaba Cloud Security.
Notes: When the source server's local network reaches the public network through a shared NAT address, the attacks do not necessarily come from your server; they may come from other servers in the same network. If the shared public IP address is blocked by Alibaba Cloud Security, access from the source server is blocked as well.
 Possible symptoms and causes:
 Ping failure: The source server's IP address is intercepted by Alibaba Cloud Security, so pings from the source server fail.
 All port connections via Telnet fail: The source server's IP address is intercepted by Alibaba Cloud Security, so connections to all ports fail.
3. AliGreenNet - Violation-caused shielding
 Cause description: The target server's web pages contain illegal content, so access to them is blocked.
 Possible symptoms and causes:
 Abnormal jump: The origin site has service exceptions, and the webpage jumps to the Anti-DDoS Service or Web Application Firewall origin-site exception page.
 Access to some URLs fails: If a request matches Web Application Firewall rules, access from the client fails and the webpage jumps to the block alert page.
4. Cloud Anti-DDoS Service and Web Application Firewall
 Cause description: The target server's service is abnormal, or access from the source server is intercepted by Anti-DDoS Service or Web Application Firewall rules, so access fails.
 Possible symptoms and causes:
 Ping failure: The server is shut down, so it cannot be pinged.
 All port connections via Telnet fail: The server is shut down, so connections to all ports fail.
5. Server Load Balancer white list
 Cause description: If the target server is behind Server Load Balancer and a white list is enabled on the listener port, only the specified IP addresses or IP address segments can access the server.
 Possible symptoms and causes:
 Some port connections via Telnet fail: The source server's IP address is not in the white list, and thus cannot access the listener port.
6. Security group configurations of the target server
 Cause description: Access from the source server is blocked by rules of the security group to which the target server belongs.
 Possible symptoms and causes:
 Ping failure: Ping is disabled by the security group rules of the target server.
 All port connections via Telnet fail: A Drop rule is configured for all ports in the target server's security group.
 Some port connections via Telnet fail: A Drop rule is configured for the specified ports in the target server's security group.
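Whether a source address passes a listener white list comes down to a membership test against single IP addresses and IP address segments (CIDR blocks). The sketch below shows that test with Python's `ipaddress` module; the function name and the whitelist entries are hypothetical, and the real list is managed in the Server Load Balancer console. Remember that the listener sees the client's public address, which may be a shared NAT address rather than the address of the client machine itself.

```python
import ipaddress
from typing import Iterable


def allowed_by_whitelist(source_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if source_ip matches any whitelist entry (single IP or CIDR segment)."""
    addr = ipaddress.ip_address(source_ip)
    for entry in whitelist:
        # strict=False accepts entries such as "10.0.0.5/24" that have host bits set;
        # a bare address like "192.168.1.10" becomes a /32 network.
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:  # an IPv4 address is never "in" an IPv6 network, and vice versa
            return True
    return False
```

The same check applies when the service software on the target server restricts access by source IP address.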


Internal environment of the target ECS server
Causes of access exceptions related to the environment of the target ECS server and the resulting symptoms are described below:
1. Service suspension of the target server due to overdue payment
 Cause description: The target server has been suspended due to overdue payment, so it cannot be accessed.
 Possible symptoms and causes:
 Ping failure: The target server is suspended due to overdue payment, so it cannot be pinged.
 All port connections via Telnet fail: The target server is suspended due to overdue payment, so connections to all ports fail.
2. Internal configurations of the target server
 Cause description: Access exceptions may be caused by security policies of the firewall or security software in the target server, or computer viruses and other problems of the operating system.
 Possible symptoms and causes:
 Ping packet loss: Access exceptions are caused by computer viruses and other problems of the operating system on the target server.
 Ping failure: The ping rule is disabled by security policies, such as security software, of the target server.
 All port connections via Telnet fail: Access exceptions are caused by computer viruses and other problems of the operating system on the target server.
 Some port connections via Telnet fail: Access to these ports is prohibited by the security policy, such as security software, of the target server.
3. Source address access control in the service software
 Cause description: The service software on the target server restricts access by source IP address, so access from the source server fails.
 Possible symptoms and causes:
 Some port connections via Telnet fail: The service software on these ports restricts access by source IP address, so access from the source server is blocked.


Troubleshooting flowchart for ECS server access exceptions
To resolve ECS access exceptions, refer to the Troubleshooting Flowchart for ECS Server Access Exceptions below:




Troubleshooting and countermeasures for ECS access exceptions
Troubleshooting access exceptions via private networks
If the client accesses the ECS server via a private network, you can take the following steps to judge, analyze and handle the access exception:
1. Does the exception occur whenever any server accesses the target server?
That is, compare the access status when different servers access the target server at the same time.
 (1-A) Yes (Access from all servers to the target server is abnormal):
If access from all servers is abnormal, the cause may be the rules of the security group to which the target server belongs, or exceptions on the target server itself. Further analysis is needed.
 1-A.1 Is the internal access to the server normal? Log in to the server through Management Terminal, and access the server via 127.0.0.1, to check whether the server can be accessed.
 (1-A.1-A) No (Internal access to the server is also abnormal): If internal access to the target server is also abnormal, contact the service provider or service O&M personnel to check the code configurations and software running status.
 (1-A.1-B) Yes (Internal access to the server is normal): If internal access to the target server is normal, check whether access from the source server is blocked by the security group to which the target server belongs or by the security software in the operating system. If the exception persists after you have checked the security group and the security software configurations, capture packets simultaneously on the client and the server, submit the capture results, and contact After-sales Technical Support.
 (1-B) No (Only access from the source server to the target server is abnormal): If only access from the source server is abnormal, the cause may be the rules of the security group to which the source server belongs, exceptions on the source server, or exceptions in the network between the source server and the target server. Further analysis is needed.
 1-B.1 Is Telnet access normal? Check whether the source server fails only the ping test while access to the ports succeeds.
(1-B.1-A) Yes (The source server cannot ping the target server, but Telnet access to the ports is normal): In this case, check whether ping from the source server is blocked by the security group to which the target server belongs or by the security software in the operating system.
 (1-B.1-B) No (Both the ping test and the Telnet port test from the source server to the target server fail): Further troubleshooting is needed:
1-B.1-B.1 Can the source server ping its gateway? Check whether the source server's gateway can be pinged through the source server.
(1-B.1-B.1-A) No (The source server cannot ping its own gateway): If the source server cannot ping its own gateway (the ping fails or loses packets), analyze the system logs to check the running status of the source server, for example, its load or network configurations.
 (1-B.1-B.1-B) Yes (The source server can ping its own gateway): In such a case, further troubleshooting is needed:
1-B.1-B.1-B.1 Can the source server ping the target server's gateway? Check whether the source server can ping the target server's gateway.
(1-B.1-B.1-B.1-A) Yes (The source server can ping the target server's gateway): If the source server can ping both its own gateway and the target server's gateway, analyze the system logs to check the running status of the target server, for example, its load or network configurations.
 (1-B.1-B.1-B.1-B) No (The source server cannot ping the target server's gateway): If the source server can ping its own gateway but cannot ping the target server's gateway (the ping fails or loses packets), the cause may be exceptions in the intermediate network. In this case, capture packets simultaneously on the client and the server, submit the capture results, and contact After-sales Technical Support.
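The branching above is easy to lose track of when several servers and tests are involved. As a memory aid, here is a minimal Python sketch that encodes the same decision tree: each field holds the result of one test (or `None` if it has not been run yet), and the function returns the next test or the recommended action. The class, function, and constant names are hypothetical, and the returned strings paraphrase this article rather than any Alibaba Cloud tool.

```python
from dataclasses import dataclass
from typing import Optional

CONTACT_OWNER = "Contact the service provider or O&M staff to check code configuration and software status"
CAPTURE = "capture packets on the client and the server simultaneously, then contact After-sales Technical Support"


@dataclass
class PrivateObservations:
    all_sources_fail: bool                     # Step 1: does every server see the exception?
    loopback_ok: Optional[bool] = None         # 1-A.1: access via 127.0.0.1 on the target server
    telnet_ok: Optional[bool] = None           # 1-B.1: ports reachable even though ping fails?
    own_gateway_ok: Optional[bool] = None      # 1-B.1-B.1: source server pings its own gateway?
    target_gateway_ok: Optional[bool] = None   # 1-B.1-B.1-B.1: source server pings the target's gateway?


def diagnose_private(o: PrivateObservations) -> str:
    """Walk the private-network troubleshooting steps and return the next action."""
    if o.all_sources_fail:
        if o.loopback_ok is None:
            return "Next test: access the target server via 127.0.0.1 from Management Terminal"
        if not o.loopback_ok:
            return CONTACT_OWNER
        return "Check the target's security group and OS security software; if it persists, " + CAPTURE
    if o.telnet_ok is None:
        return "Next test: Telnet to the target server's ports"
    if o.telnet_ok:
        return "Check whether the target's security group or OS security software blocks ping from the source"
    if o.own_gateway_ok is None:
        return "Next test: ping the source server's own gateway"
    if not o.own_gateway_ok:
        return "Analyze the source server's system logs (load, network configuration)"
    if o.target_gateway_ok is None:
        return "Next test: ping the target server's gateway from the source server"
    if o.target_gateway_ok:
        return "Analyze the target server's system logs (load, network configuration)"
    return "Suspect the intermediate network: " + CAPTURE
```

Fill in fields as tests complete and call `diagnose_private` again; the first `None` it reaches tells you which test to run next.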


Troubleshooting access exceptions via public networks
If the client accesses the ECS server via a public network, you can take the following steps to judge, analyze and process the access exception:
1. URL-based access exception diagnosis:
1.1 Is the webpage injected with advertisements? Check whether advertisements are injected into the webpage when the client accesses services on the target server.
 (1.1-A) Yes (The webpage is injected with advertisements): In this case, take the steps below for further troubleshooting:
 1.1-A.1 Is the internal access to the system normal? Log in to the target server through Management Terminal, and access the server via 127.0.0.1, to check whether the server can be accessed.
 (1.1-A.1-A) No (Internal access to the target server is also abnormal): If internal access to the target server is also abnormal, contact the service provider or service O&M personnel to check the code configurations and software running status.
 (1.1-A.1-B) Yes (Internal access to the target server is normal): In this case, check whether the problem is caused by exceptions in the local network or hijacking by the local operator. Change the local DNS server address and check whether the problem is resolved. If the problem persists, contact the local network department for troubleshooting or submit feedback to the local operator.
 (1.1-B) No (The webpage is not injected with advertisements): In this case, take the steps below for further troubleshooting.
 1.1-B.1 Does the webpage jump abnormally? Check whether the webpage at the relevant URL jumps abnormally when the client accesses services on the target server.
 (1.1-B.1-A) Yes (The webpage jumps abnormally): In this case, take the steps below for further troubleshooting:
 1.1-B.1-A.1 Is the internal access to the system normal? Log in to the target server through Management Terminal, and access the server via 127.0.0.1, to check whether the server can be accessed.
 (1.1-B.1-A.1-A) No (Internal access to the target server is also abnormal): If internal access to the target server is also abnormal, contact the service provider or service O&M personnel to check the code configurations and software running status.
 (1.1-B.1-A.1-B) Yes (Internal access to the target server is normal): In this case, handle the problem based on the page displayed, for example, the record filing alert page or the block alert page.
 (1.1-B.1-B) No (The webpage does not jump abnormally): Continue with the problem scope judgment below.
2. Problem scope judgment:
In case of non-URL access exceptions, you need to determine the problem scope through comparative analysis:
2.1 Is access from all networks abnormal? Use a third-party dial testing platform to compare access to the target server from networks across the country, and check whether access from all networks is abnormal.
 (2.1-A) Yes (Access from all networks is abnormal): If access from all external networks is abnormal in the test, you need to take the steps below for further troubleshooting:
 2.1-A.1 Is the internal access to the system normal? Log in to the target server through Management Terminal, and access the server via 127.0.0.1, to check whether the server can be accessed.
 (2.1-A.1-A) No (Internal access to the server is also abnormal): If internal access to the target server is still abnormal, you need to contact the service provider or service O&M personnel to check the code configurations and software running status.
 (2.1-A.1-B) Yes (Internal access to the target server is normal): In this case, check whether the target server's security group or system security configurations restrict access from the source server.
 (2.1-B) No (Only access from the source server to the target server is abnormal): In this case, you need to take the steps below for further troubleshooting.
3. Symptoms judgment:
If only access from the source server is abnormal, you need to perform the ping test or Telnet test for further troubleshooting.
3.1 Is ping normal? Check whether the client can ping to the target server's IP address.
 (3.1-A) No (The target server cannot be pinged): If the client cannot ping the target server or packet loss occurs, the cause may be exceptions in the intermediate link or on the target server. Run an MTR test on the link for further troubleshooting.
 (3.1-B) Yes (The target server can be pinged, but access from ports fails): In this case, you need to take the steps below for further troubleshooting.
 3.1-B.1 Is the port intercepted by the target server? Check whether access from the client to the corresponding port is blocked by policies set in the target server's security group or the system security configurations.
 (3.1-B.1-A) Yes (Access from the client to some ports is blocked by the target server): Modify the policies that block access from the source server.
 (3.1-B.1-B) No (The target server has no access blocking policy): If the target server has no policy that blocks access from the source server, the access may be intercepted by the operator. Use tracetcp or similar tools to trace the affected ports and analyze where they are blocked.
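Steps 1 to 3 can likewise be captured as a small decision function, which is handy when several people collect results for the same incident. This is a minimal sketch under the same conventions as above: every field is one observation from this section (`None` means not yet tested), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CONTACT_OWNER = "Contact the service provider or O&M staff to check code configuration and software status"


@dataclass
class PublicObservations:
    ads_injected: bool = False                     # 1.1: advertisements injected into the page?
    abnormal_jump: bool = False                    # 1.1-B.1: page jumps abnormally?
    loopback_ok: Optional[bool] = None             # access via 127.0.0.1 on the target server
    all_networks_fail: Optional[bool] = None       # 2.1: dial tests fail from all networks?
    ping_ok: Optional[bool] = None                 # 3.1: client can ping the target IP?
    port_blocked_by_target: Optional[bool] = None  # 3.1-B.1: target policies block the port?


def _loopback_gate(o: PublicObservations) -> Optional[str]:
    """Return an action if the 127.0.0.1 test is missing or failed, else None."""
    if o.loopback_ok is None:
        return "Next test: access the target server via 127.0.0.1 from Management Terminal"
    if not o.loopback_ok:
        return CONTACT_OWNER
    return None


def diagnose_public(o: PublicObservations) -> str:
    """Walk steps 1-3 of the public-network troubleshooting and return the next action."""
    # Step 1: URL-based symptoms.
    if o.ads_injected or o.abnormal_jump:
        gate = _loopback_gate(o)
        if gate is not None:
            return gate
        if o.ads_injected:
            return "Change the local DNS server; if it persists, contact the local network department or operator"
        return "Handle the problem based on the page displayed"
    # Step 2: problem scope.
    if o.all_networks_fail is None:
        return "Next test: compare access from many networks with a third-party dial testing platform"
    if o.all_networks_fail:
        gate = _loopback_gate(o)
        if gate is not None:
            return gate
        return "Check the target's security group and system security configuration for source access control"
    # Step 3: symptoms seen from the affected client only.
    if o.ping_ok is None:
        return "Next test: ping the target server's IP address"
    if not o.ping_ok:
        return "Run an MTR test on the link to locate loss in the intermediate network or the target server"
    if o.port_blocked_by_target is None:
        return "Next check: do the target's security group or system security settings block the port?"
    if o.port_blocked_by_target:
        return "Modify the policies that block access from the source"
    return "Suspect operator port blocking; trace the port with tracetcp or a similar tool"
```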


Tips for submitting tickets related to ECS access exceptions
Access from the client via private networks
For access from the client via a private network, perform the test according to the following steps and record the results:
1. Access the target server in the same way from different servers, and check whether the same exception symptoms occur.
2. Check whether the client can ping the target server's IP address.
3. Check whether the client can access Telnet ports of the target server.
4. Check whether the source server can ping its own gateway.
5. Check whether the target server can ping its own gateway.
6. Check whether the source server can ping the target server's gateway.
7. Check whether the target server can ping the source server's gateway.
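Steps 4 to 7 require each server's gateway address. On a Linux ECS instance, one way to find it without extra tools is to read the kernel routing table in `/proc/net/route`; the minimal sketch below does that and wraps the system `ping` command. It assumes a Linux guest with the iputils `ping` (`-c` sets the packet count); on Windows, `route print` and `ping -n` serve the same purpose. The function names are hypothetical.

```python
import socket
import struct
import subprocess
from typing import Optional

RTF_GATEWAY = 0x2  # route flag: the destination is reached through a gateway


def parse_default_gateway(route_text: str) -> Optional[str]:
    """Extract the IPv4 default gateway from the contents of /proc/net/route (Linux)."""
    for line in route_text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        if destination == "00000000" and flags & RTF_GATEWAY:
            # The kernel prints addresses as hex in host byte order; "=L" packs them back natively.
            return socket.inet_ntoa(struct.pack("=L", int(gateway, 16)))
    return None


def ping_ok(host: str, count: int = 4) -> bool:
    """Run the system ping (Linux iputils syntax) and report whether any reply arrived."""
    result = subprocess.run(["ping", "-c", str(count), host], capture_output=True, text=True)
    # iputils ping exits 0 if at least one reply was received, so partial loss still
    # returns True; read the summary line in result.stdout for the loss percentage.
    return result.returncode == 0
```

Typical use: `with open("/proc/net/route") as f: gw = parse_default_gateway(f.read())`, then `ping_ok(gw)`. Record the gateway address and the full ping output for the ticket.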


Access from the client via public networks
For access from the client via a public network, perform the test according to the following steps and record the results:
1. Access the target server from different regions and network environments, and check whether the same exception symptoms occur.
2. Check whether the webpage is injected with advertisements.
3. Check whether the webpage jumps abnormally.
4. Check whether the client can ping the target server.
5. Check whether the client can access Telnet ports of the target server.
6. In case of ping failures (packet loss or interruption), run an MTR test and record the data.
7. If the server can be pinged but access to ports fails, trace the ports with tracetcp or a similar tool and record the data.
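Steps 6 and 7 ask for test data. One way to gather it in a single file is the minimal sketch below, which runs `ping` and an `mtr` report toward the target and, for each failing port, a TCP traceroute. Assumptions: a Linux client with iputils `ping`, `mtr` (`--report`, `--report-cycles`, `--no-dns`), and the Linux `traceroute` package installed. `traceroute -T -p PORT` (TCP SYN probes, which usually need root) is swapped in here as a Linux counterpart to tracetcp, which is a Windows tool. The function names, target address, ports, and file name are placeholders.

```python
import subprocess
from typing import List


def build_commands(target: str, ports: List[int]) -> List[List[str]]:
    """Commands whose output is useful in a ticket (Linux tool syntax assumed)."""
    cmds = [
        ["ping", "-c", "20", target],                                      # loss / interruption
        ["mtr", "--report", "--report-cycles", "20", "--no-dns", target],  # per-hop loss
    ]
    # TCP trace per failing port, in place of tracetcp on Windows.
    cmds += [["traceroute", "-T", "-p", str(port), target] for port in ports]
    return cmds


def collect(target: str, ports: List[int], path: str) -> None:
    """Run each command and append its command line and output to a report file."""
    with open(path, "w", encoding="utf-8") as report:
        for cmd in build_commands(target, ports):
            report.write("$ " + " ".join(cmd) + "\n")
            try:
                out = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
                report.write(out.stdout + out.stderr + "\n")
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Tool missing (FileNotFoundError) or hung: record that instead of aborting.
                report.write(f"failed to run: {exc}\n")
```

For example, `collect("203.0.113.10", [80, 443], "ecs_access_report.txt")`, then attach the file to the ticket.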


Record the test results and data from the steps above, and contact After-sales Technical Support.