This topic provides answers to some frequently asked questions about health checks of Classic Load Balancer (CLB).
What are the recommended configurations for health checks in SLB?
What happens if I set the weight of an ECS instance to zero?
What is the IP address that HTTP listeners use to perform health checks on ECS instances?
How do I handle a health check failure caused by a faulty backend database?
Why does a backend server that works as expected fail a health check?
How do health checks work?
In a health check, CLB periodically sends requests to backend servers to check their status.
CLB instances are deployed in clusters. Nodes in the clusters are responsible for forwarding network traffic and performing health checks. If a backend server fails health checks performed by a node in such a cluster, all nodes in the cluster stop distributing requests to the unhealthy backend server.
CLB health checks use the CIDR block 100.64.0.0/10. Backend servers must not block this CIDR block. You do not need to configure a security group rule to allow access from 100.64.0.0/10. However, if security rules such as iptables are configured on a backend server, make sure that they allow access from this CIDR block. Permitting 100.64.0.0/10 does not increase potential risks because the CIDR block is reserved by Alibaba Cloud. IP addresses within the CIDR block are not allocated to users.
For more information, see CLB health checks.
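When you review firewall rules or access logs, it can help to test whether a given source address belongs to the health check CIDR block. The following is a minimal sketch that uses the Python standard library. The function name is_clb_health_check_source is hypothetical and is not part of any CLB tooling.

```python
import ipaddress

# CIDR block that this topic states CLB uses for health checks.
CLB_HEALTH_CHECK_NET = ipaddress.ip_network("100.64.0.0/10")


def is_clb_health_check_source(addr: str) -> bool:
    """Return True if addr falls inside 100.64.0.0/10 (hypothetical helper)."""
    return ipaddress.ip_address(addr) in CLB_HEALTH_CHECK_NET


if __name__ == "__main__":
    # First and last addresses covered by the block.
    print(CLB_HEALTH_CHECK_NET[0], "-", CLB_HEALTH_CHECK_NET[-1])
    print(is_clb_health_check_source("100.100.1.5"))
```

The block covers 100.64.0.0 through 100.127.255.255, so an address such as 100.128.0.1 is outside it.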
What are the recommended settings for health checks?
Refer to the following table:
| Parameter | TCP, HTTP, and HTTPS listeners | UDP listeners |
| Health Check Response Timeout | 5 seconds | 10 seconds |
| Health Check Interval | 2 seconds | 5 seconds |
| Healthy Threshold | 3 times | 3 times |
| Unhealthy Threshold | 3 times | 3 times |
CLB declares a backend server healthy or unhealthy only after the backend server passes or fails health checks consecutively for the specified number of times within the specified time window. For more details, see Configure and manage CLB health checks.
We recommend that you use these settings to ensure that your service recovers immediately after a backend server fails health checks. You can specify a shorter response timeout period as needed. However, you must ensure that the specified timeout period is longer than the normal response time of your backend server.
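The consecutive-threshold rule described above can be illustrated with a small state tracker. This is only a sketch of the counting logic and not CLB's actual implementation. The class name HealthTracker, the assumption that a server starts in the healthy state, and the reset-on-opposite-result behavior are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Toy model of consecutive-threshold health state (assumed behavior)."""

    healthy_threshold: int = 3    # consecutive passes needed to become healthy
    unhealthy_threshold: int = 3  # consecutive failures needed to become unhealthy
    healthy: bool = True          # assumption: the server starts as healthy
    _streak: int = 0              # consecutive results that contradict the current state

    def record(self, passed: bool) -> bool:
        """Record one health check result and return the resulting state."""
        if passed == self.healthy:
            # The result agrees with the current state, so the streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = passed
                self._streak = 0
        return self.healthy
```

With the recommended thresholds, two consecutive failures leave the server marked healthy, and the third consecutive failure marks it unhealthy. A single success in between resets the count.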
Can I disable the health check feature?
Yes, you can. For more information, see Disable health checks.
If you disable the health check feature, requests may be distributed to unhealthy backend servers. This can cause service interruptions.
If your business is highly sensitive to traffic fluctuations, frequent health checks may affect the availability of your business. To reduce the impact of health checks on your business, you can lower the health check frequency by increasing the health check interval, or change Layer 7 health checks to Layer 4 health checks. To ensure business continuity, we recommend that you keep the health check feature enabled.
How do TCP listeners perform health checks?
TCP listeners support HTTP and TCP health checks.
TCP health checks: Listeners check the availability of backend ports by sending SYN packets.
HTTP health checks: Listeners check the availability of backend servers by sending HEAD or GET requests, which is similar to the way in which a browser accesses servers.
A TCP health check consumes fewer server resources. If the workloads on your backend servers are heavy and you only want to check whether backend ports are open, you can configure TCP health checks. If you want more precise health check results, configure HTTP health checks.
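To see what a port-level check can and cannot tell you, the following sketch tests whether a TCP port accepts connections. It is an approximation written for illustration: it performs a full connect and a normal close through the operating system, whereas the TCP health check described above is performed by CLB itself. The function name tcp_port_open is hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake; the connection is
        # closed again when the with block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A result of True only shows that the port is open. It does not show that the application behind the port returns correct responses, which is why an HTTP health check gives more precise results.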
What happens if I set the weight of an ECS instance to zero?
If you set the weight of an ECS instance to zero, CLB no longer forwards network traffic to the ECS instance. However, this does not affect the health check result.
After you set the weight of an ECS instance to zero, the ECS instance no longer serves your workloads. You can set the weight of an ECS instance to zero when you restart or modify the configuration of the ECS instance.
What method does an HTTP listener use to perform health checks on backend ECS instances?
HTTP listeners perform health checks by sending HEAD requests.
ECS instances that do not support HEAD requests fail health checks. To check whether an ECS instance supports HEAD requests, we recommend that you run the following command on the instance. Replace IP and port with the address and port that you want to test:
curl -v -0 -I -H "Host:" -X HEAD http://IP:port
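If curl is not available, a similar check can be sketched with the Python standard library. This is not an exact equivalent of the command above: http.client sends an HTTP/1.1 request and adds a default Host header, so treat the result as a rough indication only. The function name head_status is hypothetical.

```python
import http.client
from typing import Optional


def head_status(host: str, port: int, path: str = "/",
                timeout: float = 5.0) -> Optional[int]:
    """Send a HEAD request and return the status code, or None on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return None
    finally:
        conn.close()
```

A status code such as 405 or 501 suggests that the server does not accept HEAD requests on that path.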
What is the IP address that HTTP listeners use to perform health checks on ECS instances?
CLB uses 100.64.0.0/10 for health checks. Make sure that requests sent from this CIDR block are allowed by the ECS instances. You do not need to configure a security group rule to allow access from the CIDR block 100.64.0.0/10 unless security rules such as iptables are configured. Permitting 100.64.0.0/10 does not increase potential risks because this CIDR block is reserved by Alibaba Cloud. IP addresses within the CIDR block are not allocated to users.
Why are the health check rates recorded in web logs different from the health check configurations in the console?
Health checks are performed by groups of servers to prevent single points of failure. CLB is deployed across multiple servers. Each server performs health checks independently, which increases the total number of health checks that are actually performed. Therefore, the health check rates recorded in logs are different from the configurations in the console.
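As a rough illustration of the arithmetic, the sketch below estimates how many health check requests a backend server might log per minute when several nodes check it independently. The node count is a hypothetical input. This topic does not state how many nodes perform health checks, and the sketch assumes that every node uses the same interval.

```python
def expected_checks_per_minute(interval_seconds: float, node_count: int) -> float:
    """Estimate logged health checks per minute (node_count is an assumption)."""
    # Each node sends one check per interval, independently of the others.
    return node_count * 60.0 / interval_seconds
```

For example, with an interval of 2 seconds, a single checker would produce 30 requests per minute, while four independent nodes would produce 120.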
How do I handle a health check failure caused by a faulty backend database?
Problem
The static website www.example.com and the dynamic website aliyundoc.com are deployed on an ECS instance. CLB is used to provide load balancing services for the websites. The backend database is down. As a result, the HTTP 502 error occurs when www.example.com is accessed.
Possible causes
The health check domain name is set to aliyundoc.com. When the ApsaraDB RDS instance or self-managed database is down, access to aliyundoc.com fails, which causes the health check failure.
Solutions
Change the health check domain name to www.example.com.
Why does the log data indicate connection failure to a backend port even though the backend port has passed TCP health checks?
Problem
The log data shown in the following figure indicates frequent connection failures to a backend port of a TCP listener. A packet capture tool is used to identify the source of the connection requests. The result shows that the connection requests are sent by CLB. The packet capture tool has also captured RST packets sent by CLB.
Possible causes
The issue is related to the health check mechanism.
In a TCP health check on a backend port, CLB establishes a connection with the backend port, completes a three-way handshake, and sends an RST packet to close the connection. The process is as follows:
CLB sends a SYN packet.
The backend port returns a SYN-ACK packet.
After CLB receives the response, CLB considers the backend port to be reachable. In this case, the health check succeeds.
Then, CLB sends an RST packet to close the connection instead of sending service requests through the connection.
After CLB completes the health check, the TCP connection is closed. The status of the TCP connection is not updated to the connection pool of services at the application layer, for example, Java connection pools. Therefore, the Connection reset by peer error occurs.
Solutions
Configure CLB to perform HTTP health checks instead of TCP health checks.
In addition, screen out the log entries that record requests from the CIDR block of CLB and ignore the related error messages.
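Screening out the health check entries can be sketched as follows. The log format is an assumption made for illustration: each line is expected to start with the source IP address, followed by whitespace. Adapt the parsing to your actual log format. The function name drop_clb_entries is hypothetical.

```python
import ipaddress

# CIDR block that this topic states CLB uses for health checks.
CLB_NET = ipaddress.ip_network("100.64.0.0/10")


def drop_clb_entries(lines):
    """Return the lines whose leading source IP is outside 100.64.0.0/10."""
    kept = []
    for line in lines:
        parts = line.split(maxsplit=1)
        try:
            src = ipaddress.ip_address(parts[0]) if parts else None
        except ValueError:
            src = None  # first field is not an IP address; keep the line
        if src is not None and src in CLB_NET:
            continue  # entry produced by a CLB health check
        kept.append(line)
    return kept
```

Lines that do not start with a parseable IP address are kept unchanged, so unrelated log entries are not lost.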
Why does a backend server that works as expected fail a health check?
Problem
The HTTP health check always fails, but when you run the curl -I command, the status code returned is normal.
Possible causes
If the returned status code is not among the status codes specified as healthy in the health check configuration in the console, the backend server fails the health check. For example, if you specified the HTTP 2xx status code in the console and another status code is returned, the backend server fails the health check.
When you run the curl command on Tengine or NGINX, the result shows that the destination is reachable. However, when you run the echo command to access the test file test.html, you are directed to the default website and the HTTP 404 error code is returned.
Solutions
Modify the main configuration file and comment out the default site.
Add the domain name that is used for health checks to the health check configuration.
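The first possible cause, a returned status code that falls outside the configured classes, can be illustrated with a short sketch. The labels such as "http_2xx" are notation assumed for this example, and the function name status_passes is hypothetical.

```python
def status_passes(status: int, accepted=("http_2xx",)) -> bool:
    """Return True if the status code belongs to one of the accepted classes."""
    # Map a status code such as 404 to a class label such as "http_4xx".
    status_class = f"http_{status // 100}xx"
    return status_class in accepted
```

With only the 2xx class accepted, a 302 redirect or a 404 response counts as a failed health check even though the server is reachable.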