When an application fails to connect to a Microservices Engine (MSE) Nacos instance, the following error messages may appear in the application logs:
Connection timed out
Read Timeout
TimeoutException: Waited 3000 milliseconds
Use the following sections to diagnose and resolve each root cause, from network failures to client-side resource exhaustion.
Before you begin
Determine whether the timeout affects a single client node or all client nodes. This distinction narrows the scope of your investigation:
Single node -- The issue is likely network-related or caused by resource exhaustion on that node. Start with Check network connectivity.
All nodes -- The issue is likely server-side: capacity limits, resource saturation, or access control misconfiguration. Start with Check server-side metrics.
Check network connectivity
If only one client node reports the timeout, verify that it can reach the MSE Nacos instance.
Run the following commands on the affected node. Replace ${mse.nacos.host} with the endpoint of your MSE Nacos instance.
ping ${mse.nacos.host}
telnet ${mse.nacos.host} 8848
curl ${mse.nacos.host}:8848/nacos/v1/ns/service/list
Interpreting the results:
| Command | Success | Failure |
|---|---|---|
| ping | Replies with round-trip times | Request timeout or 100% packet loss |
| telnet | Connected to ... | Connection timed out or Connection refused |
| curl | Returns a JSON response with a service list | curl: (7) Failed to connect or curl: (28) Connection timed out |
If any command fails, the network path between the client and the MSE Nacos instance is broken. Proceed to the following sections to identify the cause.
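The HTTP check above can be scripted so that it runs the same way on every node. The following is a minimal sketch, assuming bash and curl are installed. The function names `classify_curl_exit` and `check_nacos_http` and the 5-second limits are illustrative choices, not part of any MSE tooling.

```shell
#!/usr/bin/env bash
# Sketch: probe an MSE Nacos endpoint over HTTP and translate the curl exit code.
# Function names and the 5-second limits are illustrative assumptions.

# Map a curl exit code to a short diagnosis label.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;               # an HTTP response was received
    6)  echo "dns-failure" ;;      # host name could not be resolved
    7)  echo "connect-failed" ;;   # connection refused or host unreachable
    28) echo "timeout" ;;          # operation timed out
    *)  echo "other-error-$1" ;;
  esac
}

# Request the same service-list path as the manual curl check.
# Usage: check_nacos_http <host> [port]
check_nacos_http() {
  local host="$1" port="${2:-8848}" rc=0
  curl --silent --output /dev/null --connect-timeout 5 --max-time 5 \
    "http://${host}:${port}/nacos/v1/ns/service/list" || rc=$?
  classify_curl_exit "$rc"
}
```

Call `check_nacos_http` with the endpoint of your instance as the first argument. The sketch reports `ok` for any HTTP response, including error statuses, because it tests only whether the network path works.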
Public endpoint blocked by an ACL
If the application connects through a public endpoint, an access control list (ACL) may block the traffic.
Verify that the client IP address is included in the whitelist. For instructions, see Configure a public IP address whitelist.
VPC or region mismatch (internal endpoint)
If the application connects through an internal endpoint, it must be deployed in the same VPC and the same region as the MSE Nacos instance.
To verify the region and VPC ID of your MSE Nacos instance, go to the Basic information page of the MSE console.
If the application and the MSE Nacos instance are in different VPCs or regions, choose one of the following approaches:
Redeploy the application to the VPC and region where the MSE Nacos instance resides.
Use Cloud Enterprise Network (CEN) to connect the two VPCs. For more information, see What is CEN?
Invalid VPN settings
If a virtual private network (VPN) is used, verify that the VPN tunnel is active and the routing rules are correct. If the settings are invalid, disable the VPN or update the configuration, and then retry the connection.
Client-side resource exhaustion
Even when the network is healthy, the client may fail to process responses in time due to local resource pressure. Check the following metrics on the affected node:
CPU load -- Sustained high CPU usage delays packet processing.
Full garbage collection (GC) -- Frequent full GCs in the Java Virtual Machine (JVM) cause long pause times.
Out of memory (OOM) -- An OOM error prevents the client from allocating buffers for incoming data.
If any of these conditions exist, resolve the resource issue on the client node before further investigation.
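To check for full GCs on the client, you can sample the JVM with `jstat`, which ships with the JDK. The following is a minimal sketch, assuming bash and awk are available. The helper name `full_gc_count` is hypothetical. It finds the FGC column by header name because the column layout of `jstat -gcutil` differs between JDK versions.

```shell
#!/usr/bin/env bash
# Sketch: read the full GC count (FGC column) from `jstat -gcutil` output on stdin.
# The helper name is an illustrative assumption.

full_gc_count() {
  awk '
    # First line is the header: remember which field is named FGC.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FGC") col = i; next }
    # Every later line is a sample: keep the most recent FGC value.
    col     { last = $col }
    # Fail if no sample with an FGC column was seen.
    END     { if (last == "") exit 1; print last }
  '
}
```

For example, `jstat -gcutil <pid> 1000 5 | full_gc_count` takes five samples one second apart and prints the cumulative full GC count from the last one. A count that rises between runs suggests that the client is spending time in full GC pauses.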
Check server-side metrics
If all client nodes report timeouts, the MSE Nacos instance itself may be under pressure. Open the Monitoring Center page in the MSE console to inspect the following metrics. For details, see Monitor engines.
QPS and operations per second
On the Overview tab, check whether Queries per second or Operations per second exceeds the transactions per second (TPS) limit for your instance specifications.
For TPS limits by specification, see Estimate instance capabilities.
Connection count
On the Number of connections monitoring tab, check whether Number of client versions or Number of long links exceeds the connection limit for your instance specifications.
For connection limits by specification, see Estimate instance capabilities.
JVM health
On the jvm Monitoring tab, check whether full GCs are performed frequently.
If No data is displayed, no full GCs have occurred.
Network bandwidth (Internet network type only)
If the MSE Nacos instance uses the Internet network type, open the Resource monitoring tab and check whether inbound traffic or outbound traffic exceeds the bandwidth purchased for the instance.
CPU and memory utilization
On the Resource monitoring tab, check whether memory usage or CPU load is close to or exceeds 100%. High utilization triggers throttling and causes timeouts.
To resolve this, upgrade the instance specifications. For instructions, see Change instance specifications.
Configure timeout parameters for intermittent timeouts
If the timeout occurs only occasionally and the root causes above have been ruled out, increase the client-side timeout values.
The JVM parameters differ by Nacos Java client version.
Version 1.0.0 to 1.4.x
Add the following JVM parameters to the application process:
-Dcom.alibaba.nacos.client.naming.ctimeout=<connection-timeout-ms>
-Dcom.alibaba.nacos.client.naming.rtimeout=<request-timeout-ms>
-DNACOS.CONNECT.TIMEOUT=<config-center-connection-timeout-ms>
| Parameter | Description | Default |
|---|---|---|
| com.alibaba.nacos.client.naming.ctimeout | Connection timeout for the service registry (ms) | 3000 |
| com.alibaba.nacos.client.naming.rtimeout | Request timeout for the service registry (ms) | 50000 |
| NACOS.CONNECT.TIMEOUT | Connection timeout for the configuration center (ms) | 1000 |
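As an illustration, the three flags for 1.x clients can be assembled in a startup script. This is a minimal sketch, assuming bash. The function name `nacos_v1_timeout_opts` and the values in the usage note are placeholders, not recommended settings.

```shell
#!/usr/bin/env bash
# Sketch: assemble the Nacos 1.x client timeout flags for a java command line.
# The function name is an illustrative assumption; all values are in milliseconds.

# Usage: nacos_v1_timeout_opts <naming-connect-ms> <naming-request-ms> <config-connect-ms>
nacos_v1_timeout_opts() {
  local ctimeout="$1" rtimeout="$2" config_ctimeout="$3"
  printf '%s %s %s\n' \
    "-Dcom.alibaba.nacos.client.naming.ctimeout=${ctimeout}" \
    "-Dcom.alibaba.nacos.client.naming.rtimeout=${rtimeout}" \
    "-DNACOS.CONNECT.TIMEOUT=${config_ctimeout}"
}
```

A launch line might then read `java $(nacos_v1_timeout_opts 5000 60000 3000) -jar app.jar`, where `app.jar` stands for your application.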
Version 2.0.0 to 2.1.1
These versions do not support configurable timeout parameters. Upgrade the Java client to version 2.1.2 or later, then configure the parameters described in the next section.
Version 2.1.2 or later
Add the following JVM parameters to the application process:
-Dnacos.remote.client.grpc.timeout=<request-timeout-ms>
-Dnacos.remote.client.grpc.server.check.timeout=<server-health-check-timeout-ms>
-Dnacos.remote.client.grpc.health.timeout=<connection-health-check-timeout-ms>
| Parameter | Description | Default |
|---|---|---|
| nacos.remote.client.grpc.timeout | gRPC request timeout (ms) | 3000 |
| nacos.remote.client.grpc.server.check.timeout | Server health check timeout (ms). If the server is unhealthy, the client reconnects. | 3000 |
| nacos.remote.client.grpc.health.timeout | Connection health check timeout (ms). If the connection is unhealthy, the client reconnects. | 3000 |
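Because the applicable parameters depend on the client version, a deployment script can select the right set before starting the application. The following is a minimal sketch of the version ranges described in this section. It assumes bash and a `sort` that supports `-V` (GNU coreutils). The names `version_ge` and `timeout_param_family` and the returned labels are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which timeout parameter family applies to a Nacos Java client version.
# Function names and labels are illustrative assumptions; requires `sort -V`.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

timeout_param_family() {
  local v="$1"
  if version_ge "$v" "2.1.2"; then
    echo "grpc"      # nacos.remote.client.grpc.* parameters
  elif version_ge "$v" "2.0.0"; then
    echo "upgrade"   # 2.0.0 to 2.1.1: timeouts are not configurable, upgrade first
  elif version_ge "$v" "1.0.0"; then
    echo "http"      # com.alibaba.nacos.client.naming.* and NACOS.CONNECT.TIMEOUT
  else
    echo "unknown"
  fi
}
```

For example, `timeout_param_family 2.0.4` prints `upgrade`, which signals that the client must be upgraded before any timeout can be tuned.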