Server Load Balancer: FAQ about health checks

Last Updated: Feb 02, 2024

This topic provides answers to some frequently asked questions about health checks of Classic Load Balancer (CLB).

How do health checks work?

CLB instances are deployed in clusters. Nodes in Layer 4 or Layer 7 clusters are responsible for forwarding network traffic and performing health checks.

Nodes in Layer 4 clusters are independent of each other, and forward network traffic and perform health checks based on forwarding rules. If a backend server fails health checks performed by a node in a Layer 4 cluster, the backend server is declared unhealthy. Requests destined for the backend server are distributed to other backend servers. All nodes in the Layer 4 cluster stop distributing requests to the unhealthy backend server.

CLB health checks originate from the CIDR block 100.64.0.0/10, which backend servers must not block, as shown in the following figure. You do not need to configure a security group rule to allow access from 100.64.0.0/10. However, if host-level security rules such as iptables are configured on a backend server, make sure that they allow access from this CIDR block. Allowing 100.64.0.0/10 does not introduce additional risk because the CIDR block is reserved by Alibaba Cloud and its IP addresses are not allocated to users.

[Figure: CLB health checks sent from the CIDR block 100.64.0.0/10 to backend servers]

For more information, see How CLB health checks work.
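When you write host-level firewall rules, it can help to see exactly which addresses the CIDR block 100.64.0.0/10 covers. The following is a minimal sketch that uses the Python standard library; the constant and function names are hypothetical and chosen only for illustration.

```python
import ipaddress

# CIDR block that the text above says CLB health checks originate from
CLB_HEALTH_CHECK_NET = ipaddress.ip_network("100.64.0.0/10")

def is_health_check_source(ip):
    """Return True if ip belongs to the CLB health check CIDR block."""
    return ipaddress.ip_address(ip) in CLB_HEALTH_CHECK_NET

# First and last address covered by the block
print(CLB_HEALTH_CHECK_NET[0], CLB_HEALTH_CHECK_NET[-1])  # 100.64.0.0 100.127.255.255
print(is_health_check_source("100.100.1.2"))   # True
print(is_health_check_source("192.168.1.10"))  # False
```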

What are the recommended configurations for health checks?

Network traffic frequently switching between backend servers may compromise service availability. To prevent this issue, network traffic is switched between backend servers only after the backend servers consecutively pass or fail health checks during the specified time period. For more information, see Configure and manage CLB health checks.

The following table describes the recommended health check configurations for TCP, HTTP, and HTTPS listeners.

| Parameter | Recommended value |
| --- | --- |
| Health Check Response Timeout | 5 seconds |
| Health Check Interval | 2 seconds |
| Healthy Threshold | 3 times |
| Unhealthy Threshold | 3 times |

The following table describes the recommended health check configurations for UDP listeners.

| Parameter | Recommended value |
| --- | --- |
| Health Check Response Timeout | 10 seconds |
| Health Check Interval | 5 seconds |
| Healthy Threshold | 3 times |
| Unhealthy Threshold | 3 times |

Important

We recommend these settings so that your service recovers quickly after a backend server fails health checks. You can specify a shorter response timeout if needed, but it must be longer than the normal response time of your backend server.
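As a rough illustration of how the four parameters interact, the sketch below estimates the longest time a failed backend server may take to be declared unhealthy. The formula is an assumption made for illustration (every failed check waits for the full response timeout, and one interval separates consecutive checks), not a documented guarantee, and the function name is hypothetical.

```python
def failure_detection_window(timeout_s, interval_s, unhealthy_threshold):
    """Estimate the worst-case seconds before a server is declared unhealthy."""
    # Each failed check is assumed to wait for the full response timeout
    waiting_on_timeouts = timeout_s * unhealthy_threshold
    # One interval is assumed between consecutive checks
    gaps_between_checks = interval_s * (unhealthy_threshold - 1)
    return waiting_on_timeouts + gaps_between_checks

# Recommended TCP/HTTP/HTTPS settings: 5 s timeout, 2 s interval, threshold 3
print(failure_detection_window(5, 2, 3))   # 19
# Recommended UDP settings: 10 s timeout, 5 s interval, threshold 3
print(failure_detection_window(10, 5, 3))  # 40
```

Under these assumptions, a shorter response timeout shrinks the window more than any other single parameter, which is why the timeout is the value to tune first.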

Can I disable the health check feature?

Yes, you can disable the health check feature. For more information, see Disable the health check feature.

Important
  • If you disable the health check feature, requests may be distributed to unhealthy ECS instances. This can cause service interruptions. Therefore, we recommend that you enable the health check feature.

  • If your business is highly sensitive to traffic fluctuations, frequent health checks may affect its availability. To reduce the impact, you can lower the health check frequency by increasing the health check interval, or change Layer 7 health checks to Layer 4 health checks. To ensure business continuity, we still recommend that you keep the health check feature enabled.

How do TCP listeners perform health checks?

TCP listeners support HTTP and TCP health checks.

  • TCP health checks: Listeners check the availability of backend ports by sending SYN packets.

  • HTTP health checks: Listeners check the availability of backend servers by sending HEAD or GET requests, which is similar to the way in which a browser accesses servers.

A TCP health check consumes fewer server resources. If the workloads on your backend servers are heavy, you can configure TCP health checks. Otherwise, you can configure HTTP health checks.
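To make the TCP health check more concrete, here is a minimal sketch of a port probe written from the client side. It is not CLB's implementation: an application-level `connect()` always completes the full three-way handshake, and the reset on close relies on the `SO_LINGER` socket option with a zero timeout, which is assumed to behave as on Linux. The function name is hypothetical.

```python
import socket
import struct

def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # The kernel sends SYN and waits for SYN-ACK within the timeout
        sock.connect((host, port))
    except OSError:
        # Refused, unreachable, or timed out: treat the port as unavailable
        sock.close()
        return False
    # l_onoff=1, l_linger=0: close() aborts the connection with RST, not FIN
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
    return True
```

A probe like this only proves that the port accepts connections. It says nothing about whether the application behind the port can serve requests, which is the gap that HTTP health checks cover.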

What happens if I set the weight of an ECS instance to zero?

If you set the weight of an ECS instance to zero, CLB no longer forwards network traffic to the ECS instance. However, this does not affect the health check result.

After you set the weight of an ECS instance to zero, the ECS instance no longer serves your workloads. You can set the weight of an ECS instance to zero when you restart or modify the configuration of the ECS instance.

What method does an HTTP listener use to perform health checks on backend ECS instances?

HTTP listeners perform health checks by sending HEAD requests.

ECS instances that do not support HEAD requests fail health checks. To check whether an ECS instance supports HEAD requests, run the following command on the instance, replacing IP and port with the address and port of the service that you want to test:

curl -v -0 -I -H "Host:" -X HEAD http://IP:port
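The same check can be scripted. The sketch below mirrors the curl command above under stated assumptions: it sends a bare HTTP/1.0 HEAD request with no Host header over a plain socket and returns the status code from the response. The function name is hypothetical, and the script only illustrates the request shape; it does not reproduce CLB's health check logic.

```python
import socket

def head_status(host, port, path="/", timeout=5.0):
    """Send a bare HTTP/1.0 HEAD request and return the response status code."""
    request = f"HEAD {path} HTTP/1.0\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        with sock.makefile("rb") as reply:
            # The status line looks like: HTTP/1.0 200 OK
            status_line = reply.readline()
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"unexpected response: {status_line!r}")
    return int(parts[1])
```

For example, `head_status("127.0.0.1", 80)` returns the status code that a web server listening on local port 80 sends in reply to a HEAD request. A server that rejects HEAD typically answers with a 4xx or 5xx code instead of 200.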

What is the IP address that HTTP listeners use to perform health checks on ECS instances?

CLB sends health check requests from IP addresses in the CIDR block 100.64.0.0/10. Make sure that the ECS instances allow requests from this CIDR block. You do not need to configure a security group rule to allow access from 100.64.0.0/10. However, if host-level security rules such as iptables are configured on an ECS instance, make sure that they allow access from this CIDR block. Allowing 100.64.0.0/10 does not introduce additional risk because the CIDR block is reserved by Alibaba Cloud and its IP addresses are not allocated to users.

Why are the health check rates recorded in web logs different from the health check configurations in the console?

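To prevent single points of failure, CLB is deployed across multiple servers, and health checks are performed by a group of these servers rather than by a single one. Each server sends its own health check requests. Therefore, the health check rate recorded in the web logs of a backend server differs from, and is typically higher than, the rate implied by the interval configured in the console.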
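The effect on logs can be estimated with simple arithmetic. The sketch below assumes that every checking server probes the backend independently at the configured interval. The number of checking servers is not stated in this topic, so `node_count` is a purely hypothetical input.

```python
def checks_per_minute(interval_s, node_count):
    """Estimate health check requests per minute seen by one backend server."""
    # Each node is assumed to send one check every interval_s seconds
    return node_count * 60 / interval_s

# A single prober with a 2-second interval
print(checks_per_minute(2, 1))  # 30.0
# Four independent probers (hypothetical count) with the same interval
print(checks_per_minute(2, 4))  # 120.0
```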

How do I handle a health check failure caused by a faulty backend database?

  • Problem

    The static website www.example.com and the dynamic website aliyundoc.com are deployed on an ECS instance. CLB is used to provide load balancing services for the websites. The backend database is down. As a result, the HTTP 502 error occurs when www.example.com is accessed.

  • Possible causes

    The health check domain name is set to aliyundoc.com. When the ApsaraDB RDS instance or self-managed database is down, access to aliyundoc.com fails, which causes the health check failure.

  • Solutions

    Change the health check domain name to www.example.com.

Why does the log data indicate connection failure to a backend port even though the backend port has passed TCP health checks?

  • Problem

    The log data indicates frequent connection failure to a backend port of a TCP listener. A packet capture tool is used to identify the source of the connection requests. The result shows that the connection requests are sent by CLB. The packet capture tool has also captured RST packets sent by CLB.

  • Possible causes

    The issue is related to the health check mechanism.

    To perform a TCP health check, CLB sends a SYN packet to check the availability of a backend port. If a SYN-ACK packet is returned, CLB considers the backend port to be reachable and sends an RST packet to close the connection after the three-way handshake is completed. This reduces the load on backend servers and prevents service interruptions. However, the application layer is not aware of the status of the TCP connection. The data exchange proceeds as follows:

    1. CLB sends a SYN packet to the backend port.

    2. If the backend port works as expected, a SYN-ACK packet is returned.

    3. After CLB receives the response from the backend port, CLB considers the backend port to be reachable. In this case, the health check succeeds.

    4. Then, CLB sends an RST packet to the backend port to close the connection instead of sending service requests through the connection.

    After CLB completes the health check, the TCP connection is closed. However, the closed status of the connection is not propagated to the application-layer connection pool, such as a Java connection pool. As a result, the application reports the Connection reset by peer error.

  • Solutions

    • Configure CLB to perform HTTP health checks instead of TCP health checks.

    • In addition, filter out the log entries that record requests from the CLB CIDR block (100.64.0.0/10) and ignore the related error messages.
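The log filtering suggested in the solutions above can be sketched as follows. The sketch assumes that each log line starts with the client IP address followed by whitespace, which is a hypothetical format; adapt the parsing to your own log layout. The function name is also hypothetical.

```python
import ipaddress

# CIDR block used by CLB health checks, as stated in this topic
CLB_NET = ipaddress.ip_network("100.64.0.0/10")

def drop_clb_entries(lines):
    """Return the log lines whose leading client IP is outside the CLB block."""
    kept = []
    for line in lines:
        fields = line.split(maxsplit=1)
        try:
            from_clb = bool(fields) and ipaddress.ip_address(fields[0]) in CLB_NET
        except ValueError:
            from_clb = False  # first field is not an IP address; keep the line
        if not from_clb:
            kept.append(line)
    return kept

sample = [
    "100.97.0.12 connection reset by peer",
    "203.0.113.7 connection reset by peer",
    "startup complete",
]
print(drop_clb_entries(sample))
# ['203.0.113.7 connection reset by peer', 'startup complete']
```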

Why does a backend server that works as expected fail a health check?

  • Problem

    The HTTP health check always fails, but when you run the curl -I command, the status code returned is normal.

  • Possible causes

    If the status code that the backend server returns is not among the status codes selected in the health check configuration in the console, the backend server fails the health check. For example, if only HTTP 2xx is selected in the console and the backend server returns a different status code, the health check fails.

    When you run the curl command against Tengine or NGINX, the result shows that the destination is reachable. However, when you use the echo command to request the test file test.html, the request is directed to the default website and the HTTP 404 status code is returned.

  • Solutions

    • Modify the main configuration file and comment out the default site.

    • Add the domain name that is used for health checks to the health check configuration.
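The status code rule described under the possible causes above can be expressed as a small check. This is an illustrative sketch: the labels `http_2xx`, `http_3xx`, and so on are assumed names for the status code classes selected in the console, and the function name is hypothetical.

```python
def passes_health_check(status, accepted_classes):
    """Return True if the status code falls in one of the accepted classes."""
    # Map a status code such as 404 to its class label, such as "http_4xx"
    return f"http_{status // 100}xx" in accepted_classes

print(passes_health_check(200, {"http_2xx"}))              # True
print(passes_health_check(404, {"http_2xx"}))              # False
print(passes_health_check(302, {"http_2xx", "http_3xx"}))  # True
```

In the scenario above, the 404 returned by the default website fails the check whenever only the 2xx class is accepted, even though the server itself is reachable.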