Dedicated gateway timeout configuration best practices - Platform For AI

Properly configuring gateway timeouts is crucial for dedicated gateway stability and a positive user experience. Improper settings can cause unexpected connection resets, request backlogs, or even cascading failures. This topic covers two core timeout types for dedicated gateways: Idle Connection Timeout and Request Timeout. It provides client configuration recommendations, code samples for Java, Go, and the Alibaba Cloud Elastic Algorithm Service (EAS) SDK, and an analysis of common timeout issues to help you build more stable and efficient services.

Background information

Why configure timeouts?

In a distributed system, services often have long call chains. Latency or failure at any point can cause the entire request chain to slow down or fail. Timeout mechanisms help solve these problems. By setting appropriate timeouts, you can:

Prevent resource exhaustion: Stop clients or gateways from waiting indefinitely for unresponsive backend services. This practice avoids tying up resources, such as connections and threads, which can lead to cascading failures.
Improve user experience: A fail-fast approach is better than a long, unresponsive wait. This approach gives users immediate feedback so they can retry or take other actions.
Ensure system stability: By releasing resources promptly, timeouts prevent slow or failing services from exhausting system resources and protect the overall stability of the system.

Configure the idle connection timeout

The Idle Connection Timeout specifies the maximum time that an inactive persistent connection can remain open before the gateway closes it. This setting is essential for managing the connection pool and releasing inactive resources.

Configuration recommendations

The dedicated gateway has the following fixed idle connection timeouts, which you cannot modify:

Gateway as a server (facing the client): Fixed at 600 seconds. The gateway closes a connection between a client and the gateway if it is idle for longer than this period.
Gateway as a client (facing the model service): Fixed at 30 seconds. The gateway closes a connection between the gateway and a backend model service if it is idle for longer than this period.

Set the client's idle connection timeout to a value less than the dedicated gateway's idle connection timeout of 600 seconds. This practice ensures that the client actively manages and closes connections, which prevents errors that occur when the client tries to use a connection that the gateway has already closed.

The following diagram illustrates the connection path:

Client configuration examples

The following examples show how to manage or set the client idle connection timeout in different programming languages.

Important

These examples are for reference only and must not be used directly in a production environment. You must configure parameters based on your system's traffic, load, and the specific version of the client that you are using.

Java

If you use Apache HttpClient 4.x: The connection manager does not evict idle connections on its own. To release them, call its closeIdleConnections() method periodically, typically from a separate background thread.

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.client.config.RequestConfig;
import java.util.concurrent.TimeUnit;

public class HttpClientIdleTimeout {
    public static void main(String[] args) throws InterruptedException {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setDefaultMaxPerRoute(20); // Example value: Maximum connections per route
        cm.setMaxTotal(100); // Example value: Maximum total connections

        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectTimeout(5000) // Example value: Connection timeout 5 seconds
                .setSocketTimeout(10000) // Example value: Read data timeout 10 seconds
                .build();

        try (CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(cm)
                .setDefaultRequestConfig(requestConfig)
                .build()) {

            // Start a background thread to clean up idle connections periodically
            Thread cleanerThread = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(5000); // Example value: Check every 5 seconds
                        // Close connections that have been idle for more than 500 seconds (less than the gateway's idle timeout of 600 seconds)
                        cm.closeIdleConnections(500, TimeUnit.SECONDS);
                        // Close expired connections (for example, connections closed by the server)
                        cm.closeExpiredConnections();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // Reset the interrupt status
                }
            });
            cleanerThread.setDaemon(true); // Set as a daemon thread to exit when the main thread exits
            cleanerThread.start();

            // Execute HTTP requests...
            // For example: httpClient.execute(new HttpGet("http://your-gateway-url"));

            // Simulate the program running for a while
            Thread.sleep(60000); // Example value: Run for 1 minute

            // Stop the cleaner thread (in a real application, you should stop it gracefully when the application shuts down)
            cleanerThread.interrupt();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Go

The net/http package in Go provides fine-grained control over the connection pool through the Transport struct. Ensure that the IdleConnTimeout value is less than 600 seconds.

package main

import (
    "net/http"
    "time"
)

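func main() {
    // Create a custom Transport
    tr := &http.Transport{
        MaxIdleConns:      100,               // Maximum number of idle connections
        IdleConnTimeout:   500 * time.Second, // Idle connection timeout, for example, 500 seconds (less than 600 seconds)
        DisableKeepAlives: false,             // Enable Keep-Alive
    }

    // Use the Transport in a client so that all requests share its connection pool
    client := &http.Client{Transport: tr}
    _ = client // Send requests with client.Get, client.Do, and so on
}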

EAS SDK (Java)

For the Alibaba Cloud EAS SDK, you can enable idle connection cleanup and set the idle connection timeout in the client configuration.

import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;

public class EasSdkTimeoutJava {
    public static void main(String[] args) {
        // 1. Global client configuration
        HttpConfig httpConfig = new HttpConfig();
        // Enable idle connection cleanup and set the cleanup interval, in milliseconds. Choose a value that suits your client.
        httpConfig.setConnectionCleanupInterval(5000);
        // Set the idle connection timeout to be less than the gateway's idle timeout of 600 seconds
        httpConfig.setIdleConnectionTimeout(500);

        PredictClient client = new PredictClient(httpConfig);
        client.setEndpoint("your-eas-service-endpoint");
        client.setModelName("your-model-name");
        // client.setToken("your-token"); // If authentication is required
        // ...
    }
}

Configure the request timeout

The Request Timeout covers the entire lifecycle of a request, from establishing a TCP connection and sending the request to receiving the complete response. This is the most common client-side timeout.

Configuration recommendations

You can adjust the request timeout based on your business needs. The following recommendations are for reference only and are not strict rules. In a production environment, you must balance factors such as fault tolerance, response time, and potential network jitter to find the optimal setting for your business scenario.

Dedicated gateway request timeout configuration: Set a reasonable timeout based on the business logic and expected processing time of the service.
- Default value: 10 minutes (600 seconds).
- User-defined settings (not currently supported by Application Load Balancer (ALB)-based dedicated gateways): In the service configuration file, you can customize the request timeout for a specific service by setting the metadata.rpc.keepalive field. The gateway reads this value and applies it as the request timeout. For more information, see metadata parameter descriptions.
  Important
  Short-lived connections are constrained by both request timeouts and idle connection timeouts. Because the idle connection timeout for a dedicated gateway is fixed at 600 seconds, setting a request timeout greater than 600 seconds has no effect. For long-running tasks that exceed 10 minutes, you can use one of the following solutions:
  - Streaming: for scenarios such as large file downloads or AI content generation.
  - WebSocket: for real-time, bidirectional communication scenarios.
Client request timeout configuration: To avoid false timeout errors that occur when the client times out while the server is still processing the request, you can set the client request timeout to be slightly longer than the server request timeout.
- A common recommendation is Client timeout = Server timeout + a small buffer (for example, 1 to 5 seconds).
- The buffer accounts for minor network latency and fluctuations in the server-side processing time.

The following diagram illustrates the connection path:

Client configuration examples

The following examples show how to configure the client-side request timeout in different programming languages.

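Important

These examples are for reference only and must not be used directly in a production environment. You must configure parameters based on your system's traffic, load, and the specific version of the client that you are using.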

Java

Using Apache HttpClient 4.x:

import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class ApacheHttpClientTimeout {
    public static void main(String[] args) {
        // The recommended client request timeout should be slightly greater than or equal to the gateway request timeout (600 seconds by default). Here, it is set to 610 seconds.
        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectTimeout(5000) // Example value: Connection timeout, in milliseconds
                .setSocketTimeout(610000) // Data transmission timeout (read timeout), in milliseconds (610 seconds)
                .build();
    }
}

Go

The net/http package in Go provides several ways to set a request timeout. The most common methods are setting the Timeout field on an http.Client or using context.WithTimeout to set a timeout for a single request.

package main

import (
    "context"
    "fmt"
    "io"
    "net/http"
    "time"
)

func main() {
    // The recommended client request timeout should be slightly greater than or equal to the gateway request timeout (600 seconds by default). Here, it is set to 610 seconds.
    client := &http.Client{
        Timeout: 610 * time.Second, // Timeout for the entire request
    }

    req, err := http.NewRequest("GET", "http://your-gateway-url", nil)
    if err != nil {
        fmt.Println("Error creating request:", err)
        return
    }

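    // You can also set a per-request deadline with a context. If both are set, the earlier of
    // this deadline and Client.Timeout applies. Use a smaller value here to shorten a single request.
    ctx, cancel := context.WithTimeout(req.Context(), 610*time.Second) // 610 seconds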
    defer cancel()
    req = req.WithContext(ctx)

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Error sending request:", err)
        // Check if it is a timeout error
        if t, ok := err.(interface{ Timeout() bool }); ok && t.Timeout() {
            fmt.Println("Request timed out!")
        }
        return
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Println("Error reading response body:", err)
        return
    }
    fmt.Printf("Response Status: %s\n", resp.Status)
    fmt.Printf("Response Body: %s\n", body)
}

EAS SDK (Java)

For the Alibaba Cloud EAS SDK, you can set the connection and read timeouts in either the global client configuration or at the request level.

import com.aliyun.openservices.eas.predict.http.PredictClient;
import com.aliyun.openservices.eas.predict.http.HttpConfig;

public class EasSdkTimeoutJava {
    public static void main(String[] args) {
        // 1. Global client configuration
        HttpConfig httpConfig = new HttpConfig();
        // Connection timeout. Confirm the time unit that your SDK version expects before you use these example values.
        httpConfig.setConnectTimeout(5);
        // Set the client read timeout slightly above the gateway request timeout (600 seconds by default). Here, it is set to 610.
        httpConfig.setReadTimeout(610); // Read timeout
    }
}

FAQ

Q: Improper idle connection timeout settings

Scenario 1: The client idle connection timeout is greater than the dedicated gateway's server-side idle connection timeout (client idle timeout > 600 seconds)

Problem: The client assumes that a connection is still valid, but the dedicated gateway has already closed the connection due to inactivity. When the client attempts to reuse this connection, the request fails.
Common client errors:
- Connection reset by peer
- Broken pipe
- java.net.SocketException: Connection reset (Java)
- requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) (Python requests)
- read: connection reset by peer (Go)
- The client might receive an HTTP 503 Service Unavailable error. This error typically occurs when the client attempts to reuse a closed connection and the gateway cannot establish a new connection or forward the request to the backend service in time.

Scenario 2: The dedicated gateway's client-side idle connection timeout is greater than the model service's idle connection timeout (model service idle timeout < 30 seconds)

Problem: The dedicated gateway assumes that a connection is still valid, but the backend model service has already closed the connection due to inactivity. When the gateway tries to reuse this connection to forward a request, it discovers that the connection is broken.
Potential dedicated gateway error: The gateway logs show connection reset errors that are similar to the client errors described in Scenario 1.
Potential client error: The client typically receives an HTTP 503 error code from the dedicated gateway. This error indicates that the gateway could not forward the request to the backend service.

Q: Improper request timeout settings

Scenario 1: The client request timeout is shorter than the dedicated gateway's request timeout

Problem: The client's configured request timeout is too short. This causes the client to abort the connection before it receives a complete response, which leads to frequent false timeout errors that disrupt business operations.
Important
In some scenarios, a client intentionally sets a short request timeout to implement a fail-fast strategy, so the client timeout does not always need to be greater than the server-side timeout. Treat the recommended value as a starting point for common use cases and adjust it to your specific business context.
Common client errors:
The client frequently throws its own timeout exceptions, such as:
- java.net.http.HttpTimeoutException (Java java.net.http.HttpClient)
- java.net.SocketTimeoutException: Read timed out (Java Apache HttpClient)
- requests.exceptions.ReadTimeout (Python requests)
- context deadline exceeded (Go, triggered by context.WithTimeout or Client.Timeout)

Scenario 2: The dedicated gateway request timeout is shorter than the actual model service processing time

Problem: The timeout that is configured on the dedicated gateway for forwarding requests to a model service (default 10 minutes or a custom value) is shorter than the time that the model service actually needs to process the request. As a result, the gateway stops waiting before the model service finishes and returns a timeout error to the client.
Client-side error: The client typically receives an HTTP 504 Gateway Timeout error from the dedicated gateway. This error indicates that the gateway timed out while it was waiting for a response from the backend service. The client might also receive an HTTP 502 Bad Gateway or HTTP 500 Internal Server Error response, depending on how the model service handles its internal timeout and reports errors to the gateway.
Model service status: The model service might still be processing the request. The logs of the model service do not show a timeout error but might indicate that the request was interrupted if the gateway closed the connection to the model service after the timeout.
Troubleshooting recommendations: To troubleshoot these errors, you can check the logs on the client, gateway, and model service.
- Check client logs: Review the specific exception types and stack traces that are thrown by the client.
- Check dedicated gateway logs: Examine the request logs of the gateway and pay special attention to error codes and forwarding statuses that are related to the failed request.
- Check model service logs: If the gateway returns a 5xx error code, you can inspect the model service logs to confirm whether an internal error occurred or processing took too long.