If a 500 Internal Server Error, 502 Bad Gateway, or 504 Gateway Timeout error occurs when you access a service through Server Load Balancer (SLB), the possible causes and the steps you can take to resolve them are as follows:
- Blocked by the security protection software of the backend ECS instance
- Parameter error of the Linux kernel of the backend ECS instance
- Performance bottleneck of the backend ECS instance
- SLB reports 502 errors due to health check failures
- The health check is normal but the web application reports 502 errors
- The HTTP header is too long
- Problem with service access logic
- Troubleshooting procedures
- Open a ticket
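Cause: Requests are blocked by the security protection software of the backend ECS instance.
SLB uses the IP address ranges 100.64.0.0/10, 10.158.0.0/16, 10.159.0.0/16, and 10.49.0.0/16 to perform health checks and forward requests. If the security protection software (for example, a firewall) on the backend ECS instance blocks these ranges, 500 or 502 errors may occur.
Resolution: Add these IP address ranges to the firewall exceptions (allowlist) of the security protection software to prevent 500 or 502 errors.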
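To tell whether connections blocked in a firewall or security-software log come from SLB, you can test their source addresses against these ranges. The following is a minimal sketch using Python's standard ipaddress module; the function name is_slb_source is hypothetical, and the range list is copied from this article, so verify it against the current SLB documentation for your region.

```python
import ipaddress

# Ranges listed in this article as used by SLB for health checks and forwarding.
SLB_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("100.64.0.0/10", "10.158.0.0/16", "10.159.0.0/16", "10.49.0.0/16")
]


def is_slb_source(ip: str) -> bool:
    """Return True if ip lies in one of the SLB ranges (IPv6 addresses never match)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SLB_RANGES)
```

For example, is_slb_source("100.100.1.2") is True because 100.64.0.0/10 covers 100.64.0.0 through 100.127.255.255.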
Cause: Parameter error of the Linux kernel of the backend ECS instance
Resolution: If the backend ECS instance runs Linux, disable the rp_filter feature in the kernel parameters when you change a Layer-7 listener to a Layer-4 listener.
Set the following parameters in the system configuration file /etc/sysctl.conf to 0, and then run sysctl -p to apply the changes:
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
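Before and after editing /etc/sysctl.conf, it can help to check which of these keys are still missing or not set to 0. The following is a minimal sketch; the function names are hypothetical, it assumes the simple "key = value" syntax with # or ; comments, and it assumes interface names without dots (such as eth0) when mapping a key to its /proc/sys path.

```python
from pathlib import Path

# Keys that this article says must be set to 0 in /etc/sysctl.conf.
REQUIRED_KEYS = (
    "net.ipv4.conf.default.rp_filter",
    "net.ipv4.conf.all.rp_filter",
    "net.ipv4.conf.eth0.rp_filter",
)


def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def keys_not_disabled(text: str) -> list:
    """Return the required rp_filter keys that are missing or not set to 0."""
    settings = parse_sysctl_conf(text)
    return [key for key in REQUIRED_KEYS if settings.get(key) != "0"]


def runtime_value(key: str) -> str:
    """Read the value currently in effect from /proc/sys (Linux only)."""
    return Path("/proc/sys", *key.split(".")).read_text().strip()
```

On the instance, keys_not_disabled(Path("/etc/sysctl.conf").read_text()) lists the lines still to fix, and runtime_value shows whether sysctl -p has taken effect.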
Cause: Performance bottleneck of the backend ECS instance
High CPU utilization or insufficient bandwidth on the backend ECS instance may cause access exceptions.
Resolution: Check the performance of the backend ECS instance and resolve any performance bottlenecks. If the overall system capacity is insufficient, add more backend ECS instances.
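When adding instances, a rough estimate of how many are needed can start from measured throughput. The following is a minimal sketch under stated assumptions: rps_per_instance is the throughput you measured for one instance at saturation, load spreads evenly across instances, and the 70% target utilization is an assumed headroom value, not an SLB recommendation. The function name is hypothetical.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, target_utilization: float = 0.7) -> int:
    """Estimate the instance count that keeps each instance at or below target_utilization.

    Assumes requests are spread evenly and rps_per_instance is measured at 100% load.
    """
    if rps_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("rps_per_instance must be > 0 and target_utilization in (0, 1]")
    return math.ceil(peak_rps / (rps_per_instance * target_utilization))
```

With a hypothetical peak of 2,000 requests per second, 500 requests per second per instance, and the 70% target, the estimate is 6 instances (2000 / 350 is about 5.7, rounded up).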
Cause: SLB reports 502 errors due to health check failures.
If the SLB health check on a backend server fails, a 502 error may occur. A failed health check usually means that the web service on the backend server cannot process HTTP requests. For more information, see Resolve health check failures.
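To reproduce an HTTP health check by hand, you can request the health check path on a backend server from a machine in the same network and look at the status code. The following is a minimal sketch, assuming that 2xx and 3xx count as healthy (check the status codes actually configured on your listener); it sends GET, whereas SLB may use a different method. The function name and any URL you pass, such as http://<backend-ip>/health, are hypothetical.

```python
import urllib.error
import urllib.request


def http_health_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url ends with a 2xx or 3xx status code."""
    # Ignore proxy environment variables so the probe reaches the backend directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            # The opener follows redirects, so this is the status of the final response.
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # 4xx and 5xx responses are raised as HTTPError: treat them as unhealthy.
        return False
    except OSError:
        # Connection refused, DNS failure, or timeout: the check fails.
        return False
```

If this probe fails from inside the network, the problem is on the backend server rather than in SLB.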
Cause: The health check is normal but the web application reports 502 errors.
A 502 Bad Gateway error indicates that SLB can forward requests from the client to the backend servers, but the web application on the backend ECS instance cannot process them. Therefore, check the configuration and running status of the web application on the backend server.
For example, the web application may take longer to process an HTTP request than the SLB timeout allows. For Layer-7 listeners, if the backend server takes longer than the proxy_read_timeout of 60 seconds to process a PHP request, SLB returns 504 Gateway Time-out. For Layer-4 listeners, the timeout is 900 seconds.
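To see whether a slow page comes close to these limits, you can time how long a backend server takes to start responding when requested directly. The following is a minimal sketch: it measures the time until the response status and headers arrive, which only approximates how SLB applies its timeout, and the 60 s and 900 s values are the ones stated in this article. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Timeouts stated in this article, in seconds.
SLB_TIMEOUT_SECONDS = {"layer7": 60, "layer4": 900}


def seconds_to_response_headers(url: str, socket_timeout: float = 1000.0) -> float:
    """Time a direct GET to url until the status line and headers arrive.

    socket_timeout applies per socket operation; exceeding it raises an error.
    Redirects are followed, so their time is included.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        response = opener.open(url, timeout=socket_timeout)
    except urllib.error.HTTPError as err:
        # A 4xx/5xx response still arrived, so its timing is still meaningful.
        response = err
    elapsed = time.monotonic() - start
    response.close()
    return elapsed


def exceeds_slb_timeout(elapsed: float, listener: str = "layer7") -> bool:
    """True if elapsed seconds exceed the SLB timeout for the listener type."""
    return elapsed > SLB_TIMEOUT_SECONDS[listener]
```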
Resolution: Make sure that the web service and related services run normally. Check whether PHP requests are processed properly, and optimize how the backend server processes them. For example, if the backend runs Nginx with php-fpm, check the following:
The number of PHP requests being processed has reached the limit.
If the number of PHP requests being processed on the server has reached the limit set in php-fpm and more PHP requests arrive, 502 or 504 errors may occur:
- If the existing PHP requests are processed in time, the new PHP requests are processed in turn.
- If the existing PHP requests are not processed in time, the new PHP requests keep waiting. If the wait exceeds the fastcgi_read_timeout value of Nginx, a 504 Gateway Time-out error occurs. If it exceeds the request_terminate_timeout value of php-fpm, a 502 Bad Gateway error occurs.
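Whether php-fpm reaches its worker limit can be estimated with Little's law: average concurrency equals the arrival rate multiplied by the average processing time. The following is a minimal sketch, assuming the limit in question is the pm.max_children setting of the php-fpm pool and that you have measured the arrival rate and average processing time yourself (for example, from access logs). The function names are hypothetical.

```python
def fpm_pool_saturated(arrival_rps: float, avg_seconds: float, max_children: int) -> bool:
    """Estimate whether a php-fpm pool is saturated, using Little's law.

    Average busy workers = arrival rate (req/s) * average processing time (s).
    When this reaches pm.max_children, new requests must wait for a free worker.
    """
    busy_workers = arrival_rps * avg_seconds
    return busy_workers >= max_children


def max_sustainable_rps(avg_seconds: float, max_children: int) -> float:
    """Throughput ceiling: each of max_children workers completes 1/avg_seconds req/s."""
    return max_children / avg_seconds
```

For example, at 100 requests per second and 0.5 s per request the pool needs about 50 busy workers on average, so a pool with pm.max_children = 50 has no headroom and further requests wait.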
The PHP script execution time exceeds the limit.
If the time php-fpm spends executing a PHP script exceeds the request_terminate_timeout value of php-fpm, php-fpm kills the worker process, a 502 error occurs, and Nginx writes an error log entry like the following:
[error] 1760#0: *251777 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: localhost, request: "GET /timeoutmore.php HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000"
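To tell the two timeout cases apart, you can count the matching messages in the Nginx error log. The following is a minimal sketch: the first pattern is the message quoted in this article, while the second is, as an assumption, the message Nginx typically logs when fastcgi_read_timeout expires; verify both against your own logs. The function and key names are hypothetical.

```python
import re

# Nginx error-log messages for two upstream failures.
PATTERNS = {
    # Quoted in this article: php-fpm dropped the connection (502).
    "reset_by_peer_502": re.compile(
        r"recv\(\) failed \(104: Connection reset by peer\) while reading response header from upstream"
    ),
    # Assumed typical message when fastcgi_read_timeout expires (504).
    "upstream_timed_out_504": re.compile(
        r"upstream timed out \(110: Connection timed out\) while reading response header from upstream"
    ),
}


def classify_error_log(lines) -> dict:
    """Count how many log lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Pass an open file object to classify_error_log. The error log path depends on your Nginx configuration (the error_log directive).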
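If the health check is performed on a static page, it can pass even when the process that handles dynamic requests is abnormal, for example, when php-fpm is not running. In this case, requests for dynamic pages return errors although the health check reports the server as healthy.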
Cause: The HTTP header is too long.
If the HTTP header is too long, SLB may be unable to process the request, resulting in a 502 error.
Resolution: Reduce the amount of data in the header, or change the Layer-7 listener to a Layer-4 listener.
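To find out how large a request header is and which fields to trim, you can add up the bytes of the request head. The following is a minimal sketch: it counts the request line, one "Name: value" line per header, a CRLF after each, and the blank line that ends the head. This article does not state the SLB size limit, so compare the result with the limit documented for your listener. The function names are hypothetical.

```python
def request_head_size(headers: dict, request_line: str = "GET / HTTP/1.1") -> int:
    """Approximate byte size of an HTTP/1.1 request head (text encoded as UTF-8)."""
    crlf = 2
    size = len(request_line.encode("utf-8")) + crlf
    for name, value in headers.items():
        size += len(f"{name}: {value}".encode("utf-8")) + crlf
    return size + crlf  # the empty line that terminates the head


def largest_headers(headers: dict, n: int = 3) -> list:
    """Return the n largest (name, bytes) pairs: the first candidates to trim."""
    sizes = [(name, len(f"{name}: {value}".encode("utf-8"))) for name, value in headers.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:n]
```

Large Cookie headers are a common culprit. largest_headers shows which fields dominate.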
Cause: Problem with service access logic.
Make sure that no backend ECS instance of SLB accesses the public IP address of SLB. When a backend server accesses its own service port through the IP address of SLB, SLB may schedule the request to that same server according to its scheduling rules. This creates a loop, and the request fails with a 500 or 502 error.
Resolution: Make sure that SLB is used correctly and that no backend ECS instance accesses the public IP address of SLB.
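To audit this, you can check whether the hostnames that a backend application calls resolve to the SLB public IP address. The following is a minimal sketch using the standard socket.getaddrinfo. The function name is hypothetical, you supply the set of SLB public IPs, and the 203.0.113.x and 198.51.100.x addresses in the example are documentation addresses.

```python
import socket


def resolves_to_slb(host: str, slb_public_ips: set) -> bool:
    """Return True if host (a name or an IP literal) resolves to any SLB public IP."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return any(info[4][0] in slb_public_ips for info in infos)
```

Run it on each backend ECS instance for the hostnames in the application configuration. For example, resolves_to_slb("203.0.113.10", {"203.0.113.10"}) is True, which would flag a backend calling the service through SLB.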
Troubleshoot 5XX errors as follows:
Check the error page (for example, a screenshot of the 500, 502, or 504 error) to determine where the error comes from: SLB, Anti-DDoS or QuickShield, or the configuration of the backend ECS instances.
If Anti-DDoS or QuickShield is used, make sure that the Layer-7 forwarding rules are correctly configured.
Check whether the problem occurs on all clients. If not, check whether the affected client has been blocked by Alibaba Cloud Security, and whether the domain name or IP address of SLB is blocked by the carrier.
Check the status of SLB and whether any backend ECS instance fails health checks. If so, resolve the health check failures.
On the client, use the hosts file to point the service domain name directly to the IP address of a backend server, bypassing SLB. If 5XX errors occur intermittently, a backend ECS instance may be configured incorrectly.
Change the Layer-7 listener to a Layer-4 listener and check whether the problem persists.
Check the performance of the backend ECS instances and whether there is a bottleneck in CPU, memory, disk, or bandwidth.
If the error is confirmed to be caused by the backend server, check the web server logs on the backend ECS instance for related errors. Check whether the web service is running normally and whether the web access logic is correct. As a test, uninstall the antivirus software on the server and restart the server.
Check whether the TCP kernel parameters of the Linux system on the backend ECS instance are correctly configured.
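The hosts-file step can also be scripted for a single request by connecting to a backend IP address and setting the Host header yourself, which bypasses SLB in the same way. The following is a minimal sketch using the standard http.client module. The function names are hypothetical, and it assumes plain HTTP on the given port.

```python
import http.client


def fetch_via_backend(backend_ip: str, host_header: str, path: str = "/",
                      port: int = 80, timeout: float = 10.0) -> int:
    """GET path from backend_ip:port with an explicit Host header; return the status code."""
    conn = http.client.HTTPConnection(backend_ip, port, timeout=timeout)
    try:
        # Supplying Host stops http.client from adding one based on backend_ip.
        conn.request("GET", path, headers={"Host": host_header})
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


def count_5xx(backend_ip: str, host_header: str, attempts: int = 20, **kwargs) -> int:
    """Repeat the request to catch intermittent errors; return how many were 5XX."""
    return sum(
        1 for _ in range(attempts)
        if fetch_via_backend(backend_ip, host_header, **kwargs) >= 500
    )
```

Running count_5xx against each backend IP in turn shows whether intermittent 5XX errors come from one particular instance.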
Perform the troubleshooting procedures step by step and record the test results in detail. Provide the test results when you open the ticket so that our after-sales technical support can help you solve the problem as soon as possible.