This topic provides answers to some frequently asked questions about the health check feature of Classic Load Balancer (CLB).

How do health checks work?

CLB performs health checks to determine whether backend Elastic Compute Service (ECS) instances are available. If an ECS instance is unhealthy, CLB stops forwarding requests to it until it recovers.

CLB uses 100.64.0.0/10 for health checks. Make sure that requests sent from this CIDR block are allowed by the ECS instances. You do not need to configure a security group rule to allow access from 100.64.0.0/10 unless you have configured security rules such as iptables.
Note 100.64.0.0/10 is reserved by Alibaba Cloud and IP addresses from this CIDR block are not allocated to users. You can allow access from this CIDR block. This does not increase the potential for security risks.

For more information, see Health check overview.

What are the recommended configurations for health checks?

Network traffic frequently switching between backend servers may compromise service availability. To prevent this issue, network traffic is switched between backend servers only after the backend servers consecutively pass or fail health checks during the specified time period. For more information, see Configure health check.
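The consecutive pass/fail rule described above can be sketched as a small state tracker. This is an illustrative model only: the class name `BackendState`, the default thresholds, and the assumption that a backend starts out healthy are all hypothetical and are not taken from the CLB implementation.

```python
from dataclasses import dataclass


@dataclass
class BackendState:
    """Tracks one backend's health using consecutive-result thresholds."""
    healthy_threshold: int = 3    # consecutive passes needed to mark healthy
    unhealthy_threshold: int = 3  # consecutive failures needed to mark unhealthy
    healthy: bool = True          # assumed initial state
    streak: int = 0               # consecutive results that contradict the current state

    def record(self, passed: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if passed == self.healthy:
            # The result agrees with the current state, so any opposing streak is broken.
            self.streak = 0
            return self.healthy
        self.streak += 1
        limit = self.healthy_threshold if passed else self.unhealthy_threshold
        if self.streak >= limit:
            # Enough consecutive contradicting results: flip the state.
            self.healthy = passed
            self.streak = 0
        return self.healthy


state = BackendState()
results = [False, True, False, False, False, True, True, True]
history = [state.record(r) for r in results]
```

In this sketch a single failed probe followed by a pass does not change the state; only three failures in a row mark the backend unhealthy, and three passes in a row are then needed to bring it back.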

The following table describes the recommended health check configurations for TCP, HTTP, and HTTPS listeners.

Parameter | Recommended value
--- | ---
Response Timeout | 5 seconds
Health Check Interval | 2 seconds
Unhealthy Threshold | 3

The following table describes the recommended health check configurations for UDP listeners.

Parameter | Recommended value
--- | ---
Response Timeout | 10 seconds
Health Check Interval | 5 seconds
Unhealthy Threshold | 3
Healthy Threshold | 3
Note We recommend that you use these settings to ensure that your service recovers immediately after a backend server fails health checks. You can specify a shorter response timeout period as needed. However, you must make sure that the specified timeout period is longer than the normal response time of your backend server.
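As a rough worked example of how these values translate into detection time, the sketch below assumes a worst-case model in which every failed probe waits out the full response timeout and consecutive probes are separated by the health check interval. The function name `failure_window` and the model itself are illustrative assumptions, not a description of how CLB schedules probes internally.

```python
def failure_window(timeout_s: float, interval_s: float, unhealthy_threshold: int) -> float:
    """Estimate the seconds needed to mark a backend unhealthy (assumed worst-case model)."""
    # Each of the failing probes is assumed to take the full timeout,
    # with one interval between consecutive probes.
    return timeout_s * unhealthy_threshold + interval_s * (unhealthy_threshold - 1)


tcp_http_window = failure_window(timeout_s=5, interval_s=2, unhealthy_threshold=3)
udp_window = failure_window(timeout_s=10, interval_s=5, unhealthy_threshold=3)
print(tcp_http_window, udp_window)
```

Under this model the recommended values give about 19 seconds for TCP, HTTP, and HTTPS listeners and about 40 seconds for UDP listeners. Shortening the response timeout reduces this window, which is why the timeout must still exceed the normal response time of the backend server.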

Can I disable the health check feature?

Yes, you can disable the health check feature. For more information, see Disable the health check feature.

Note If you disable the health check feature, requests may be distributed to unhealthy ECS instances. This may cause service interruptions. Therefore, we recommend that you enable the health check feature.

How do TCP listeners perform health checks?

TCP listeners support HTTP and TCP health checks.

  • TCP health checks: Listeners check the availability of backend ports by sending SYN packets.
  • HTTP health checks: Listeners check the availability of backend servers by sending HEAD or GET requests, which is similar to the way in which a browser accesses servers.

A TCP health check consumes fewer server resources. If the workloads on your backend servers are heavy, you can configure TCP health checks. Otherwise, you can configure HTTP health checks.
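To make the TCP case concrete, here is a minimal Python sketch of a port-level probe. It is only an approximation: `socket.create_connection` opens a normal connection and closes it gracefully, whereas the TCP health check described above probes the port with SYN packets. The function name `tcp_port_open` is hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # Only the connection attempt matters; no application data is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Because no request is sent over the connection, a probe like this tells you only that the port accepts connections, not that the application behind it returns valid responses.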

What happens if I set the weight of an ECS instance to zero?

If you set the weight of an ECS instance to zero, CLB no longer forwards network traffic to the ECS instance. However, this does not affect the health check result.

After you set the weight of an ECS instance to zero, the ECS instance no longer serves your workloads. You can set the weight of an ECS instance to zero when you restart or modify the configuration of the ECS instance.

How do HTTP listeners perform health checks?

HTTP listeners perform health checks by sending HEAD requests.

ECS instances that do not support HEAD requests fail the health checks. To check whether an ECS instance supports HEAD requests, we recommend that you run the following command on the instance, replacing IP and port with the IP address and port of the instance:
curl -v -0 -I -H "Host:" -X HEAD http://IP:port
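If you prefer to script this check, a roughly equivalent probe can be written with Python's standard `http.client` module. This is a sketch under assumptions: the function name `head_status` and its parameters are hypothetical, and unlike the curl command above it sends an HTTP/1.1 request that always carries a Host header (the connection address by default, or `domain` if you pass one).

```python
import http.client
from typing import Optional


def head_status(host: str, port: int, path: str = "/",
                domain: Optional[str] = None, timeout: float = 5.0) -> int:
    """Send a HEAD request to host:port and return the HTTP status code."""
    # Override the Host header only when a health check domain is supplied.
    headers = {"Host": domain} if domain else {}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()
```

A server that does not implement HEAD typically answers with an error status such as 405 or 501, which would not match a health check that expects a 2xx status code.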

What is the IP address that HTTP listeners use to perform health checks on ECS instances?

CLB uses 100.64.0.0/10 for health checks. Make sure that requests sent from this CIDR block are allowed by the ECS instances. You do not need to configure a security group rule to allow access from 100.64.0.0/10 unless you have configured security rules such as iptables.
Note 100.64.0.0/10 is reserved by Alibaba Cloud and IP addresses from this CIDR block are not allocated to users. You can allow access from this CIDR block. This does not increase the potential for security risks.

Why are the health check rates recorded in web logs different from the health check configurations in the console?

Health checks are performed by a group of servers to prevent single points of failure. Because CLB is deployed across multiple servers and each server sends its own health check requests, the health check rate recorded in backend logs differs from the interval configured in the console.

Do health checks consume Internet resources?

No, health checks do not consume Internet resources because CLB uses private IP addresses to perform health checks.

How do I handle a health check failure caused by a faulty backend database?

  • Description

    The static website www.example.com and the dynamic website aliyundoc.com are deployed on an ECS instance. CLB is used to provide load balancing services for the websites. The backend database is down. As a result, the HTTP 502 error occurs when www.example.com is accessed.

  • Causes

    The health check domain name is set to aliyundoc.com. When the ApsaraDB RDS instance or self-managed database is down, access to aliyundoc.com fails, which causes the health check failure.

  • Solutions

    Change the health check domain name to www.example.com.

Why does the log data indicate connection failure to a backend port even though the backend port has passed TCP health checks?

  • Description

    The log data indicates frequent connection failure to a backend port of a TCP listener. A packet capture tool is used to identify the source of the connection requests. The result shows that the connection requests are sent by CLB. The packet capture tool has also captured RST packets sent by CLB.

  • Causes

    The issue is related to the health check mechanism.

    To perform a TCP health check, CLB sends a SYN packet to check the availability of a backend port. If a SYN-ACK packet is returned, CLB considers the backend port to be reachable and sends a RST packet to close the connection after the three-way handshake is completed. This reduces the loads on backend servers and prevents service interruptions. However, the application layer is not aware of the status of the TCP connection. The following section describes the data exchange process:
    1. CLB sends a SYN packet to the backend port.
    2. If the backend port works as expected, a SYN-ACK packet is returned.
    3. After CLB receives the response from the backend port, CLB considers the backend port to be reachable. In this case, the health check succeeds.
    4. Then, CLB sends a RST packet to the backend port to close the connection instead of sending service requests through the connection.

    After CLB completes the health check, the TCP connection is closed. The status of the TCP connection is not updated to the connection pool of services at the application layer, for example, Java applications. Therefore, the Connection reset by peer error occurs.

  • Solutions
    • Configure CLB to perform HTTP health checks instead of TCP health checks.
    • In addition, filter out the log entries for requests that originate from the CLB health check CIDR block (100.64.0.0/10), and ignore the related error messages.
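For the log-filtering approach, a minimal Python sketch could look like the following. It assumes that each log line starts with the client IP address as its first whitespace-separated field, which depends on your log format; the function names are hypothetical.

```python
import ipaddress

# CIDR block that CLB uses for health checks, as stated earlier in this topic.
CLB_HEALTH_CHECK_NET = ipaddress.ip_network("100.64.0.0/10")


def is_health_check_source(ip: str) -> bool:
    """Return True if ip is a valid address inside the CLB health check CIDR block."""
    try:
        return ipaddress.ip_address(ip) in CLB_HEALTH_CHECK_NET
    except ValueError:
        # Not a parseable IP address, so it cannot be a health check source.
        return False


def drop_health_check_lines(lines):
    """Return the log lines whose first field is not a CLB health check address."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and is_health_check_source(fields[0]):
            continue
        kept.append(line)
    return kept
```

The block covers 100.64.0.0 through 100.127.255.255, so an address such as 100.128.0.1 is not treated as a health check source.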

Why does a backend server that works as expected fail a health check?

  • Description
    A backend server consecutively fails HTTP health checks. However, when you run the curl -I command against the backend server, the returned status code indicates that the backend server works as expected. The following echo command sends a HEAD request for test.html to the backend server:
    echo -e 'HEAD /test.html HTTP/1.0\r\n\r\n' | nc -t 192.168.0.1 80
  • Causes

    If the returned status code is not among the status codes specified in the health check configuration in the console, the backend server fails the health check. For example, if you specified the HTTP 2xx status code in the console and another status code is returned, the backend server fails the health check.

    When you run the curl command on Tengine or NGINX, the result shows that the destination is reachable. However, when you run the echo command to access the test file test.html, you are directed to the default website and the HTTP 404 error code is returned. The following figure shows the result.

  • Solutions
    • Remove the default website from the main configuration file by commenting out the corresponding code block.
    • Add the domain name that is used for health checks in the health check configuration.
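The status code rule from the Causes section can be illustrated with a short Python sketch. The class labels such as `http_2xx` and both function names are used here for illustration only; check the console for the exact options available in your health check configuration.

```python
def status_class(status: int) -> str:
    """Map an HTTP status code to a class label, for example 404 to 'http_4xx'."""
    return f"http_{status // 100}xx"


def passes_health_check(status: int, accepted=("http_2xx", "http_3xx")) -> bool:
    """Return True if the status code falls in one of the accepted classes."""
    return status_class(status) in accepted
```

With only 2xx and 3xx accepted, the HTTP 404 response returned by the default website fails the check even though the server itself is reachable, which matches the behavior described above.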