When you remove a backend server or it fails a health check, established connections are not immediately terminated. These persistent connections can prevent services from shutting down gracefully or lead to request errors. To avoid this, you can use the connection draining feature of Application Load Balancer (ALB). When a backend server is removed or becomes unhealthy, connection draining allows existing connections to continue processing for a configured period. After this period, ALB actively closes the connections, ensuring a graceful service shutdown.
Use cases
Use connection draining in the following two scenarios.
Removing a backend server: Before you remove a backend server, set a longer connection draining timeout. This allows in-flight requests to be completed.
Health check failure: When a backend server fails a health check, set a shorter timeout. This helps terminate faulty connections quickly and prevents clients from experiencing errors.
Both scenarios are governed by the same connection draining timeout setting, so choose a timeout that balances these needs based on your business requirements.
Scenario 1: Removing a backend server
This topic uses the scenario shown in the following figure. When you remove the backend server ECS01, ALB stops sending new requests to it. ECS01 then only processes in-flight requests and does not accept new requests.
If connection draining is disabled, connections to ECS01 are terminated abruptly, which may interrupt in-flight requests.
If connection draining is enabled and a timeout is set:
If the backend server ECS01 has in-flight requests, ALB closes the existing connections on ECS01 when the connection draining timeout is reached.
If the backend server ECS01 has no in-flight requests or active connections, ALB immediately completes the removal process without waiting for the connection draining timeout to elapse.
If the backend server ECS01 is still processing a request when the connection draining timeout is reached, the connection is terminated and the client receives a 500-level error response. For example, if the connection draining timeout is set to 15 seconds but the request requires 30 seconds to process, the connection is terminated before ECS01 can send the response.
Note: If you re-add ECS01 to the server group after removing it, the draining process for connections on the removed instance is not affected. Its state remains unchanged: it only processes in-flight requests and accepts no new ones. ALB closes these existing connections on ECS01 when the connection draining timeout is reached.
The following figure shows the state transition of ECS01 after it is removed from a server group where connection draining is enabled.
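The removal rules above can be summarized in a small model. The following Python code is a minimal sketch, not ALB's implementation. It assumes that each in-flight request is described only by the number of seconds after removal that the backend still needs to finish it, and that ALB closes every connection that is still busy when the timeout is reached. The names simulate_removal and RemovalResult and the request IDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RemovalResult:
    finished_after_s: float  # seconds after removal at which draining ends
    completed: list[str] = field(default_factory=list)  # requests that finish in time
    cut_off: list[str] = field(default_factory=list)  # connections closed early (client sees a 500-level error)


def simulate_removal(in_flight: dict[str, float], drain_timeout_s: float) -> RemovalResult:
    """Model what happens to in-flight requests when a backend server is removed.

    in_flight maps a hypothetical request ID to the number of seconds, counted
    from the moment of removal, that the backend still needs to finish it.
    """
    if not in_flight:
        # No in-flight requests or active connections: removal completes at once.
        return RemovalResult(finished_after_s=0.0)
    # Assumption: a request that needs exactly the timeout counts as completed.
    completed = [rid for rid, need in in_flight.items() if need <= drain_timeout_s]
    cut_off = [rid for rid, need in in_flight.items() if need > drain_timeout_s]
    # ALB closes the remaining connections when the timeout is reached.
    return RemovalResult(float(drain_timeout_s), completed, cut_off)


# The example from the text: a 15-second timeout and a request that needs 30 seconds.
result = simulate_removal({"req-30s": 30.0}, drain_timeout_s=15)
print(result.cut_off)  # ['req-30s']
```

With the values from the example above, the request lands in cut_off, which corresponds to the 500-level error response that the client receives.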
Scenario 2: Health check failure
When backend server ECS01 fails a health check, ALB stops sending new requests to it. At this point, ECS01 only processes in-flight requests and does not accept new ones.
If connection draining is disabled, ECS01 remains in this state until it passes the health check again, at which point it will start accepting new requests.
If connection draining is enabled and a timeout is set:
ALB closes the existing connections on ECS01 when the connection draining timeout is reached.
If the server group is updated (for example, the configuration of ECS01 is changed), the connection state of ECS01 does not change. It continues to only process in-flight requests and does not accept new ones. Even if ECS01 passes the health check after the update, ALB still closes the existing connections on ECS01 when the connection draining timeout is reached.
Note: When ALB closes the existing connections on ECS01 after the connection draining timeout is reached, ECS01 can accept new requests if it passes the health check. If it fails the health check, ECS01 does not accept new requests.
Connection draining is not triggered if a health check failure is caused by a configuration update. The system only initiates connection draining for health check failures that are caused by backend service issues.
The following figure shows the state transition of ECS01 after it fails a health check.
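The health check failure sequence can be sketched as a small state function. This is an illustrative model under stated assumptions: the three state names are hypothetical labels rather than values shown in the ALB console, and the model covers only health check failures caused by backend service issues, because failures caused by configuration updates do not trigger connection draining.

```python
from enum import Enum


class State(Enum):
    # Hypothetical labels for the states described in Scenario 2.
    SERVING = "accepts new requests"
    IN_FLIGHT_ONLY = "processes in-flight requests only"
    NOT_ACCEPTING = "connections closed; no new requests"


def state_after_failure(draining_enabled: bool, drain_timeout_s: float,
                        elapsed_s: float, healthy_now: bool) -> State:
    """State of a backend server elapsed_s seconds after it fails a health check."""
    if not draining_enabled:
        # Without draining, the server stays in in-flight-only mode until it
        # passes a health check again.
        return State.SERVING if healthy_now else State.IN_FLIGHT_ONLY
    if elapsed_s < drain_timeout_s:
        # Passing a health check or updating the server group during the
        # timeout window does not end draining early.
        return State.IN_FLIGHT_ONLY
    # Timeout reached: ALB has closed the existing connections, and the
    # health check result now decides whether new requests are accepted.
    return State.SERVING if healthy_now else State.NOT_ACCEPTING


print(state_after_failure(True, 300, 120, True).value)  # processes in-flight requests only
```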
You can configure connection draining based on your business scenario. This topic uses Scenario 1: Removing a backend server as an example and demonstrates how ALB interrupts WebSocket and HTTP sessions.
Usage notes
Only Standard and WAF-enabled ALB instances support connection draining. Basic ALB instances do not support this feature.
Server groups of the Function Compute type do not support connection draining.
You can enable connection draining when you use the WebSocket protocol. In HTTP scenarios, each request is also subject to the connection request timeout. We recommend that you set the connection draining timeout to a value greater than the connection request timeout of the ALB instance so that HTTP requests are not terminated prematurely. The default connection draining timeout set by ALB is already greater than the connection request timeout. For more information about how to set the connection request timeout, see Add an HTTP listener.
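If you manage several server groups, the usage notes above can be turned into a pre-flight check. The following sketch is hypothetical: the edition and server group type strings are plain labels chosen for this example, not ALB API values, and the timeout comparison encodes the recommendation that the connection draining timeout exceed the connection request timeout.

```python
from __future__ import annotations

# Hypothetical labels for this sketch; they are not ALB API values.
SUPPORTED_EDITIONS = {"Standard", "WAF-enabled"}
UNSUPPORTED_GROUP_TYPES = {"Function Compute"}


def preflight_check(edition: str, server_group_type: str,
                    drain_timeout_s: int, request_timeout_s: int) -> list[str]:
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    if edition not in SUPPORTED_EDITIONS:
        problems.append(f"{edition} instances do not support connection draining.")
    if server_group_type in UNSUPPORTED_GROUP_TYPES:
        problems.append(f"{server_group_type} server groups do not support connection draining.")
    if drain_timeout_s <= request_timeout_s:
        problems.append(
            f"Connection draining timeout ({drain_timeout_s} s) should be greater than "
            f"the connection request timeout ({request_timeout_s} s)."
        )
    return problems


for problem in preflight_check("Standard", "Server", drain_timeout_s=15, request_timeout_s=60):
    print(problem)
```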
Prerequisites
You have created a Standard or WAF-enabled ALB instance and a server group of type Server for the instance. This topic uses a Standard ALB instance as an example. For more information, see Create and manage an ALB instance and Create and manage a server group.
You have configured an HTTP listener for the ALB instance on port 80 and associated the listener with the server group. For more information, see Add an HTTP listener.
You have created two backend servers: ECS01 and ECS02. For more information, see Create an instance by using the wizard.
You have added the ECS02 instance to the server group, and clients can access the service running on ECS02. For more information, see Use ALB to achieve load balancing for IPv4 services and Use ALB to achieve load balancing for IPv6 services.
Note: This topic uses an Alibaba Cloud Linux 3.2104 64-bit operating system for the client. Make sure that Python is installed on your client and on the backend server ECS01. If Python is not installed on your operating system, see the official Python website for installation instructions. This topic uses Python 3.x as an example.
In this topic, the backend server ECS02 runs a representative service. If you already have a similar server, you are not required to create a new one.
Procedure
This topic demonstrates how ALB, with connection draining enabled, handles WebSocket and HTTP requests.
WebSocket session interruption behavior
Step 1: Enable connection draining
This step describes how to enable connection draining for the server group from the prerequisites. You can also enable this feature when creating a server group.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, choose .
On the Server Groups page, find the target server group and click its ID.
On the Details tab, find the Basic Information section and click Modify Basic Information.
In the Modify Basic Information dialog box, click Advanced Settings, and then enable Connection Draining.
Set Timeout Period to 300 seconds and click Save.
Step 2: Verify the results
Configure the server
Log on to the ECS01 instance. For more information, see Connection methods.
Run the following commands to create a directory named WebSocket and navigate to it.
mkdir WebSocket
cd WebSocket
Run the following commands to install the required dependencies.
pip install tornado
pip install websocket-client
Run the following command to create and edit the server.py file.
vim server.py
Press the i key to enter edit mode and add the following code to start a WebSocket service.
#!/usr/bin/env python3
# encoding=utf-8
import tornado.websocket
import tornado.ioloop
import tornado.web
from datetime import datetime


# WebSocket handler: logs when a connection opens, receives a message, and closes
class WebSocketHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "WebSocket connection opened")

    def on_message(self, message):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "Received message:", message)
        # Echo the message back so that the client can confirm the round trip
        self.write_message("Server received your message: " + message)

    def on_close(self):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "WebSocket connection closed")


# Routing: serve the WebSocket endpoint at /websocket
application = tornado.web.Application([
    (r"/websocket", WebSocketHandler),
])

if __name__ == "__main__":
    print("WebSocket Server Start on 8080 ...")
    application.listen(8080)
    tornado.ioloop.IOLoop.current().start()
After you add the code, press the Esc key, enter :wq, and press the Enter key to save and close the file.
Navigate to the directory where server.py is located and run the following command to start the WebSocket server.
python3 server.py
The following output indicates that the WebSocket backend service has started.
WebSocket Server Start on 8080 ...
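Optionally, before you add ECS01 to the server group, you can confirm from a second terminal on ECS01 that the service is listening on port 8080. The following sketch uses only the Python standard library. The function name port_is_listening is hypothetical, and a successful TCP connection shows only that the port is open, not that the WebSocket endpoint works.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout
        return False


if __name__ == "__main__":
    # Run on ECS01 while server.py is running in another terminal.
    print("Port 8080 listening:", port_is_listening("127.0.0.1", 8080))
```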
Add backend server ECS01 to the server group
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, choose .
On the Server Groups page, find the target server group and click Modify Backend Server in the Actions column.
On the Backend Servers tab, click Add Backend Server. In the Add Backend Server panel, select ECS01 and click Next.
In the Ports/Weights step, select the ECS01 instance, set the port to 8080, and then click OK.
Configure the client
Log on to the client and open a command-line interface. Run the following commands to create a directory named WebSocket and navigate to it.
mkdir WebSocket
cd WebSocket
Run the following command to install the required dependency.
pip install websocket-client
Run the following command to edit the client.py file.
vim client.py
Press the i key to enter edit mode and add the following code, which creates a WebSocket client that accesses the service.
#!/usr/bin/env python3
# encoding=utf-8
import websocket
import time
from datetime import datetime


def on_message(ws, message):
    print("Received message from server:", message)


if __name__ == "__main__":
    ws = websocket.WebSocket()
    ws.connect("ws://<domain_name>:80/websocket")  # Replace <domain_name> with your actual domain name.
    print("WebSocket connection opened")
    try:
        # Send a message every second until the connection is closed.
        while True:
            current_time = datetime.now()
            formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
            print("Sending time:", formatted_time)
            ws.send("Hello, Server!")
            result = ws.recv()
            on_message(ws, result)
            time.sleep(1)
    except Exception:
        # Any error, including ALB closing the connection, ends the loop.
        print("WebSocket connection closed")
After you add the code, press the Esc key, enter :wq, and press the Enter key to save and close the file.
Navigate to the directory where client.py is located and run the following command to access the server ECS01.
python3 client.py
The following output indicates that the access is successful.
WebSocket connection opened
Sending time: 2024-04-28 17:00:53
Received message from server: Server received your message: Hello, Server!
Sending time: 2024-04-28 17:00:54
Received message from server: Server received your message: Hello, Server!
The following output is displayed on ECS01.
WebSocket Server Start on 8080 ...
Time: 2024-04-28 17:00:53 WebSocket connection opened
Time: 2024-04-28 17:00:53 Received message: Hello, Server!
Time: 2024-04-28 17:00:54 Received message: Hello, Server!
Remove the backend server
You must set the connection draining timeout before you remove the backend server.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, choose .
Find the target server group and click its ID.
Click the Backend Servers tab, find the target backend server ECS01, and then click Remove in the Actions column.
In the Remove dialog box, click OK.
Wait for connection draining to complete
The connection draining timeout is set to 300 seconds in this topic. The test results show that ALB terminates the connection approximately 300 seconds after ECS01 is removed.
In the test results, the WebSocket connection on ECS01 opens at 17:00:53 and closes at 17:06:23, a difference of 330 seconds. This interval is longer than the timeout because it also includes the time between opening the connection and removing ECS01. The connection draining timeout covers only the period from when ECS01 is removed until the WebSocket connection is closed, which is approximately 300 seconds.
The client displays the following output.
Sending time: 2024-04-28 17:06:23
Received message from server: Server received your message: Hello, Server!
Sending time: 2024-04-28 17:06:24
WebSocket connection closed
The following output is displayed on ECS01.
Time: 2024-04-28 17:06:22 Received message: Hello, Server!
Time: 2024-04-28 17:06:23 Received message: Hello, Server!
Time: 2024-04-28 17:06:23 WebSocket connection closed
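The 330-second figure can be reproduced from the timestamps that ECS01 printed. The following sketch also estimates the removal time by subtracting the 300-second timeout from the close time. This estimate assumes that draining ran for the full timeout; the removal time was not measured directly in the test.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
opened = datetime.strptime("2024-04-28 17:00:53", FMT)  # ECS01: "WebSocket connection opened"
closed = datetime.strptime("2024-04-28 17:06:23", FMT)  # ECS01: "WebSocket connection closed"
drain_timeout = timedelta(seconds=300)  # Timeout Period set in Step 1

lifetime = closed - opened
print("Connection lifetime:", lifetime.total_seconds(), "seconds")  # Connection lifetime: 330.0 seconds

# Assumption: draining ran for the full timeout, so removal happened about 300 s before the close.
estimated_removal = closed - drain_timeout
print("Estimated removal time:", estimated_removal)  # Estimated removal time: 2024-04-28 17:01:23
```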
HTTP session interruption behavior
In HTTP scenarios, the response received by the client depends on the connection draining timeout, connection request timeout, and the backend server processing time.
If connection draining timeout < backend server processing time, the connection is interrupted before ECS01 finishes sending the response. In this case, the client receives a 500-level error response.
If backend server processing time > connection request timeout, the request to ECS01 times out. In this case, the client receives a 504 error response.
This topic uses a connection draining timeout of 15 seconds and a backend server processing time of 30 seconds. In this scenario, the connection to ECS01 is interrupted before the response is sent, and the client receives a 500-level error response.
In Step 2: Set the ALB connection request timeout, the connection request timeout is set to 60 seconds (the default value), which is greater than the backend server processing time (30 seconds). Therefore, a 504 error is not returned. Instead, a 500-level error is returned because the connection draining timeout (15 seconds) is less than the backend server processing time (30 seconds).
In this topic, the time.sleep function in the Python code is used to simulate the backend server processing time.
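The two rules above can be combined into a rough predictor. This is a simplified model, not documented ALB logic. It assumes that all times are measured from when the request reaches ECS01, that the connection draining timer starts when ECS01 is removed, and that whichever limit expires first determines what the client sees. The function name predict_response is hypothetical.

```python
from __future__ import annotations


def predict_response(processing_s: float, request_timeout_s: float,
                     drain_timeout_s: float, removed_after_s: float | None = 0.0) -> str:
    """Predict what the client sees for a single HTTP request.

    All times are in seconds from when the request reaches the backend server.
    removed_after_s is when the server is removed; None means it is never removed.
    """
    events = [
        (processing_s, "200: response delivered"),
        (request_timeout_s, "504: connection request timeout"),
    ]
    if removed_after_s is not None:
        events.append((removed_after_s + drain_timeout_s, "5xx: connection closed by draining"))
    # The earliest event wins; on a tie, min() keeps the first entry in the list.
    return min(events, key=lambda event: event[0])[1]


print(predict_response(processing_s=30, request_timeout_s=60, drain_timeout_s=15))
# 5xx: connection closed by draining
```

With this topic's values (30-second processing time, 60-second connection request timeout, 15-second connection draining timeout, and ECS01 removed as the request arrives), the model reports a draining-related 500-level error, which is consistent with the test result.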
Step 1: Enable connection draining
This step describes how to enable connection draining for the server group from the prerequisites. You can also enable this feature when creating a server group.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, choose .
On the Server Groups page, find the target server group and click its ID.
On the Details tab, find the Basic Information section and click Modify Basic Information.
In the Modify Basic Information dialog box, click Advanced Settings, and then enable Connection Draining.
Set Timeout Period to 15 seconds and click Save.
Step 2: Set the ALB connection request timeout
Log on to the ALB console.
In the top navigation bar, select the region where the ALB instance is deployed.
On the Instances page, find the target ALB instance and click its ID.
Click the Listener tab, find the target HTTP listener, and then click the listener ID.
In the Basic Information area, click Modify Listener.
In the Modify Listener dialog box, click Modify next to Advanced Settings.
Set the Connection Request Timeout to 60 seconds (the default value), and then click Save.
Step 3: Configure domain name resolution
In production environments, we recommend that you use a custom domain name and point it to the ALB instance domain name by creating a CNAME record.
In the left-side navigation pane, choose .
On the Instances page, copy the domain name of the ALB instance.
Perform the following steps to create a CNAME record:
Note: If your domain name is not registered by using Alibaba Cloud Domains, you must add your domain name to Alibaba Cloud DNS before you can configure a DNS record. For more information, see Manage domain names.
Log on to the Alibaba Cloud DNS console.
On the Authoritative DNS Resolution page, find your domain name and click Settings in the Operations column.
On the Settings tab of the domain name details page, click Add Record.
In the Add Record panel, configure the parameters and click OK. The following table describes the parameters.
Record Type: Select CNAME from the drop-down list.
Hostname: Enter the prefix of the domain name. In this example, @ is entered. Note: If you use a root domain name, enter @.
Query Source: Select Default.
Record Value: Enter the CNAME, which is the domain name of the ALB instance.
TTL: Select a time-to-live (TTL) value for the CNAME record to be cached on the DNS server. In this example, the default value is used.
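After the record takes effect, you can check that the custom domain name resolves to the same addresses as the ALB instance domain name. The following sketch is a heuristic under stated assumptions: it uses socket.gethostbyname_ex, which returns IPv4 addresses only, and treats any shared address as a match. The resolve parameter exists so that the function can be tested without DNS; the function name resolves_to_alb is hypothetical.

```python
import socket
from typing import Callable

# gethostbyname_ex returns (canonical_name, alias_list, ip_address_list).
Resolver = Callable[[str], tuple]


def resolves_to_alb(custom_domain: str, alb_domain: str,
                    resolve: Resolver = socket.gethostbyname_ex) -> bool:
    """Return True if custom_domain shares at least one IPv4 address with alb_domain."""
    _, _, custom_ips = resolve(custom_domain)
    _, _, alb_ips = resolve(alb_domain)
    return bool(set(custom_ips) & set(alb_ips))


# Usage (replace both placeholders; the record works only after DNS propagation):
# print(resolves_to_alb("<domain_name>", "<ALB instance domain name>"))
```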
Step 4: Verify the results
Log in and configure the server
Log on to the ECS01 instance. For more information, see Select a method to connect to an ECS instance.
Run the following commands to create a directory named http and navigate to it.
mkdir http
cd http
Run the following command to create and edit the http_server.py file.
vim http_server.py
Press the i key to enter edit mode, and then add the following code to start an HTTP server.
#!/usr/bin/env python3
# encoding=utf-8
from http.server import SimpleHTTPRequestHandler, HTTPServer
from datetime import datetime
import time


# Handler that waits 30 seconds before serving each GET request
class DelayedHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        current_time = datetime.now()
        formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")
        print("Time:", formatted_time, "Received a GET request. Responding after a 30-second delay....")
        time.sleep(30)  # Use the time.sleep function to simulate the processing time of the backend server.
        super().do_GET()


PORT = 8080
server = HTTPServer(("", PORT), DelayedHTTPRequestHandler)
print(f"Serving HTTP on 0.0.0.0 port {PORT} (http://0.0.0.0:{PORT}/) ...")
server.serve_forever()
After you finish editing, press the Esc key, enter :wq, and press the Enter key to save and close the file.
Go to the directory where http_server.py is located. Run the following command to start the HTTP server.
python3 http_server.py
If the following output is displayed, the HTTP backend server has started.
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
Add backend server ECS01 to the server group
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
On the Server Groups page, find the target server group and click Modify Backend Server in the Actions column.
On the Backend Servers tab, click Add Backend Server. In the Add Backend Server panel, select ECS01 and click Next.
In the Ports/Weights step, select the ECS01 instance, set the port to 8080, and then click OK.
Log in and configure the client
Log on to the client. Open a command-line window. Run the following command to access the backend server ECS01.
curl http://<domain_name>:80/ -v
The following output indicates that the request has reached the backend service through ALB.
* About to connect() to www.example.com port 80 (#0)
* Trying 10.X.X.225...
* Connected to www.example.com (10.X.X.225) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: www.example.com
> Accept: */*
The following output is displayed on ECS01.
Serving HTTP on 0.0.0.0 port 8080 (http://0.0.0.0:8080/) ...
Time: 2024-02-07 13:57:33 Received a GET request. Responding after a 30-second delay....
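If curl is not available on the client, a standard-library Python sketch can send the same request and also record how long the client waited, which makes it easier to compare the result against the timeouts in this topic. The function name timed_get is hypothetical, and the 120-second client timeout is an assumed value chosen to be longer than both the 60-second connection request timeout and the 30-second processing time, so that the client does not give up first.

```python
import http.client
import time
import urllib.error
import urllib.request


def timed_get(url: str, timeout: float = 120.0, opener=urllib.request.urlopen):
    """Send one GET request and return (status_code, elapsed_seconds).

    status_code is None when no HTTP response arrives at all, for example
    when the connection is reset or the client-side timeout expires.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 500 or 504 are raised as HTTPError.
        status = err.code
    except (OSError, http.client.HTTPException):
        # URLError, connection resets, timeouts, and truncated responses
        status = None
    return status, time.monotonic() - start


# Usage (replace the placeholder):
# status, elapsed = timed_get("http://<domain_name>/")
# print(status, round(elapsed, 1))
```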
Remove the backend server
You must set the connection draining timeout before you remove the backend server.
Log on to the ALB console.
In the top navigation bar, select the region where the server group is deployed.
In the left-side navigation pane, choose .
Find the target server group and click the server group ID.
Click the Backend Servers tab, find the target backend server ECS01, and then in the Actions column, click Remove.
In the Remove dialog box, click OK.
Wait for connection draining to complete
The test results show that when the connection draining timeout of ALB (15 seconds) is shorter than the backend server processing time (30 seconds), the client receives a 500 error response.
* About to connect() to www.example.com port 80 (#0)
* Trying 10.X.X.224...
* Connected to www.example.com (10.XX.XX.224) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: www.example.com
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Date: Wed, 07 Feb 2024 06:02:24 GMT
< Content-Type: text/html
< Content-Length: 186
< Connection: close
< Via: HTTP/1.1 SLB.87
<
<html>
<head><title>500 Internal Server Error</title></head>
<body bgcolor="white">
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Closing connection 0
Related documents
To enable connection draining when you create a server group, see Create and manage server groups.
To gracefully roll out your services, enable the slow start feature. For more information, see Achieve graceful service rollouts with ALB slow start.
To learn about the WebSocket and HTTP protocols, see Add an HTTP listener and Push real-time information by using the WebSocket protocol with ALB.