Stress testing methods for NLB

Network Load Balancer (NLB) provides ultra-high performance and supports automatic scaling. This topic describes how to perform stress testing on NLB instances.

Stress testing frameworks

This section describes the standard stress testing framework and the single-virtual IP address (VIP) stress testing framework. In most scenarios, the standard stress testing framework is used. You can use the single-VIP stress testing framework only in a few scenarios. For more information, see Stress testing methods and Possible causes of low scores on stress tests.

Standard stress testing framework
Single-VIP stress testing framework

Stress testing methods

Stress testing metrics

NLB is tested based on three metrics: the number of new connections, the number of concurrent connections, and the data forwarding capacity (including both requests and responses). Each metric requires a different stress testing method.

We recommend that you use short-lived connections to test the number of new connections that can be established between an NLB instance and its backend servers.
We recommend that you test against a simple, lightweight service, such as a heartbeat service, to prevent high bandwidth consumption. If you use short-lived connections in a stress test, make sure that enough ports are available for clients to connect to the NLB instance.
We recommend that you use persistent connections to test the number of concurrent connections that can be maintained between an NLB instance and its backend servers.
We recommend that each persistent connection use a simple heartbeat mechanism to keep the connection alive. If you use persistent connections in a stress test, make sure that enough ports are available for clients to connect to the NLB instance.
We recommend that you use persistent connections to test the forwarding capacity of an NLB instance. Persistent connections are suitable for testing the bandwidth limit or specific services.
Set the timeout period on the stress testing tool to a proper value, such as 5 seconds. If you set the timeout period to a large value, the average response time of stress tests increases, which makes it difficult to determine whether the NLB instance can withstand the load. If you set the timeout period to a small value, you can determine whether the NLB instance can withstand the load based on the request success rate shown in the test results.
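The recommendations above for short-lived connections and the timeout period can be sketched in Python as follows. This is a minimal illustration, not a replacement for a stress testing tool: the function names are hypothetical, the connections are opened one after another from a single client, and the 5-second default only mirrors the example value mentioned above.

```python
import socket
import time


def run_short_connection_test(host, port, attempts, timeout=5.0):
    """Open `attempts` short-lived TCP connections, one after another.

    Returns (successes, failures, elapsed_seconds).
    """
    successes = 0
    failures = 0
    started = time.monotonic()
    for _ in range(attempts):
        try:
            # Each iteration performs a full TCP handshake and then closes
            # the connection; the timeout bounds how long a handshake may take.
            with socket.create_connection((host, port), timeout=timeout):
                successes += 1
        except OSError:
            # Timeouts, refused connections, and local port exhaustion
            # are all counted as failures.
            failures += 1
    elapsed = time.monotonic() - started
    return successes, failures, elapsed


def success_rate(successes, failures):
    """Fraction of attempts that succeeded (0.0 if nothing was attempted)."""
    total = successes + failures
    return successes / total if total else 0.0
```

With a short timeout, the success rate returned by success_rate() is the value to watch. A real test would run many such loops in parallel from several clients against the address of the NLB instance.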

Suggestions on listener configurations

We recommend that you use the following listener configurations:

Associate at least five elastic IP addresses (EIPs) with the NLB instance if the NLB instance supports up to 5,000 concurrent connections.

Suggestions on server group configurations

We recommend that you use the following server group configurations:

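We recommend that you do not use consistent hashing that is based on source IP addresses. Otherwise, all requests from the same client IP address are forwarded to the same backend server, and the load is unevenly distributed when only a few clients are used.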
We recommend that you disable health checks to reduce the number of requests sent to backend servers.
We recommend that you use the single-VIP stress testing framework if you enable the client IP preservation feature. If you use the standard stress testing framework, the possibility of session conflicts between backend servers increases when the number of clients that are used in the stress test is small or consistent hashing based on source IP addresses is used.
After you disable the client IP preservation feature, the NLB instance uses its own IP addresses to establish connections with backend servers. Because the system allocates only a limited number of source IP addresses to each NLB instance, the number of concurrent connections that each backend server can maintain is limited. To test a large number of concurrent connections, make sure that the server group contains a sufficient number of backend servers.
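The effect of hashing on source IP addresses with a small number of clients can be illustrated with a short Python sketch. The hash-and-modulo scheme below is a simplified stand-in chosen for illustration, not the algorithm that NLB actually uses, and the function names are hypothetical.

```python
import hashlib


def pick_backend(source_ip, backends):
    """Map a source IP address to one backend by using a stable hash."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def backends_in_use(client_ips, backends):
    """Return the set of backends that receive traffic from these clients."""
    return {pick_backend(ip, backends) for ip in client_ips}
```

Because each client IP address always maps to the same backend server, a test that uses two clients can load at most two backend servers, no matter how many servers the server group contains.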

Stress testing tools

We recommend that you do not use Apache ab to perform stress tests. In high-concurrency scenarios, the waiting time of Apache ab increases in increments of 3 seconds, such as 3 seconds, 6 seconds, and 9 seconds. In addition, Apache ab determines whether requests are successful based on the specified content length. If the NLB instance that you want to benchmark is associated with multiple backend servers, the actual length of the response content returned by these backend servers may differ from the specified content length, which makes the stress testing results inaccurate.

Possible causes of low scores on stress tests

A low score on a stress test may be caused by one of the following issues:

Insufficient client ports
During a stress test, clients fail to establish connections with the NLB instance if the clients run out of available ports. By default, the NLB instance removes the timestamp option from TCP connections. As a result, the tw_reuse setting in the Linux protocol stack, which relies on TCP timestamps to reuse connections that are in the TIME_WAIT state, does not take effect. Connections in the TIME_WAIT state then accumulate and occupy the client ports.
Solution: Use persistent connections instead of short-lived connections on clients. In addition, use Reset (RST) packets to close connections by setting the SO_LINGER socket option.
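The SO_LINGER setting mentioned in the solution can be sketched in Python as follows. This is a minimal example that assumes a Linux client, where the linger structure consists of two C integers. The function names are hypothetical.

```python
import socket
import struct


def enable_rst_on_close(sock):
    """Set SO_LINGER with l_onoff=1 and l_linger=0.

    With these values, close() aborts the connection with an RST packet
    instead of a FIN, so the socket does not enter the TIME_WAIT state.
    """
    # struct linger is two C ints on Linux; other platforms may differ.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def connect_and_abort(host, port, timeout=5.0):
    """Open one connection and close it with an RST packet."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        enable_rst_on_close(sock)
    finally:
        sock.close()
```

Closing a connection with an RST packet discards any data that has not been sent yet, so this setting is suitable for stress testing clients rather than for production traffic.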
Full accept queues on backend servers
If the accept queue on the backend server is full, the backend server can no longer return SYN-ACK packets. As a result, connection requests from clients time out.
Solution: Run the sysctl -w net.core.somaxconn=1024 command to change the value of net.core.somaxconn and restart the application on the backend server. On Linux kernels earlier than 5.4, the default value of net.core.somaxconn is 128.
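The application must be restarted because the queue length is fixed when the application calls listen(), and Linux caps the backlog that the application requests at the value of net.core.somaxconn. The following Python sketch reads the current value and computes the effective backlog. It assumes a Linux backend server, and the function names are hypothetical.

```python
from pathlib import Path

# Linux exposes the kernel parameter at this path.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the current net.core.somaxconn value, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Return the backlog that listen(requested) actually gets on Linux."""
    return min(requested, somaxconn)
```

For example, an application that calls listen() with a backlog of 1024 still gets a queue of 128 entries if net.core.somaxconn is 128. Both values must be large enough for the change to take effect.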
Insufficient backend servers
After you disable the client IP preservation feature, the NLB instance uses its own IP address to establish connections with the backend servers. By default, each backend server supports up to 120,000 concurrent connections. If the number of backend servers is small, the number of concurrent connections is limited.
Solution: If the NLB instance uses TCP or UDP listeners, we recommend that you enable the client IP preservation feature. If the NLB instance uses listeners that use SSL over TCP, we recommend that you add more backend servers to the NLB instance or enable auto scaling.
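The number of backend servers that a test requires can be estimated from the per-server limit described above. The following Python sketch assumes the default limit of 120,000 concurrent connections per backend server that is quoted in this topic. The function name is hypothetical.

```python
# Default limit per backend server, as quoted in this topic.
DEFAULT_CONNECTIONS_PER_BACKEND = 120_000


def backend_servers_needed(target_connections,
                           per_backend=DEFAULT_CONNECTIONS_PER_BACKEND):
    """Return the minimum number of backend servers for a connection target."""
    if target_connections < 0 or per_backend <= 0:
        raise ValueError("target must be >= 0 and per_backend must be > 0")
    # Integer ceiling division: round up without using floating-point math.
    return -(-target_connections // per_backend)
```

For example, a target of 1,000,000 concurrent connections requires at least nine backend servers when client IP preservation is disabled.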
The dependency of the application on the backend server becomes the performance bottleneck
The traffic loads on the backend server are below the performance limit of the backend server. However, the application on the backend server may depend on another application, such as a database. Therefore, the dependency may also limit the performance of the NLB instance in stress tests.
Solution: Clear applications that are no longer in use on the backend servers.
Unhealthy backend servers
If a backend server is declared unhealthy or the health status of the backend server changes frequently, the performance of the NLB instance in stress tests may be degraded.
Solution: Disable health checks to reduce the number of requests sent to the backend servers.
Session conflicts on backend servers
If you use the standard stress testing framework, the possibility of session conflicts between backend servers increases when the number of clients used in the stress test is small or consistent hashing based on source IP addresses is used. As a result, the backend servers frequently send RST packets to close connections, as shown in the following figure.
Solution: Use the single-VIP stress testing framework. For more information, see Stress testing frameworks.