Microservices Engine: What do I do if the connection between an application and an MSE Nacos instance times out?

Last Updated: May 11, 2023

This topic describes how to troubleshoot a connection timeout issue that may occur when an application attempts to connect to a Microservices Engine (MSE) Nacos instance.

Problem description

When an application attempts to connect to an MSE Nacos instance, the following error messages may be returned:

  • Connection timed out

  • Read Timeout

  • TimeoutException: Waited 3000 milliseconds

Possible causes

The connection timeout issue may have the following causes:

  • The network transmission between the client and the server fails, so requests from the client cannot reach the server, or responses from the server cannot reach the client. The client also reports a timeout if the server processes requests too slowly to respond within the timeout period.

  • Access to the public endpoint is blocked by a network access control list (ACL).

  • A virtual private network (VPN) is used, but the VPN settings are invalid.

  • The processing thread of the client is blocked or abnormal, or the client cannot process data packets from the server in time because of a full garbage collection (GC), an out-of-memory (OOM) error, or CPU resource preemption. As a result, the client reports a connection timeout.

Solutions

  • If the connection timeout error is reported on only one client node, the network between that node and the MSE Nacos instance may be faulty, or the client node itself may be abnormal or blocked.

    In this case, run commands such as ping, telnet, or curl on the faulty client node to access the MSE Nacos instance. If the commands fail, the network between the node and the instance is faulty. Also view the metric data of the client to check for issues such as excessively high CPU load, frequent full GCs, or OOM errors. If these issues occur, the client node is abnormal.

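    ping ${mse.nacos.host}
    telnet ${mse.nacos.host} 8848
    curl "http://${mse.nacos.host}:8848/nacos/v1/ns/service/list?pageNo=1&pageSize=10"
    ## Java clients 2.x also connect to the gRPC port, which is the HTTP port plus 1000.
    telnet ${mse.nacos.host} 9848
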
  • If a public endpoint is used, check whether access to the public endpoint is blocked by an ACL.

    For more information about how to check whether access to a public endpoint is blocked by an ACL, see Configure a public IP address whitelist.

  • If you use an internal endpoint, check whether the application is deployed in the same region and the same VPC as the MSE Nacos instance. If they are deployed in different regions or belong to different VPCs, deploy the application in the region and VPC where the MSE Nacos instance resides, or use Cloud Enterprise Network (CEN) to connect the VPCs. For more information about CEN, see What is CEN?

    On the Basic information page of the MSE console, you can view the region where the MSE Nacos instance resides and the VPC ID that corresponds to the internal endpoint.

  • If a VPN is used, check whether the VPN settings are valid. If the VPN settings are invalid, disable the VPN or modify the VPN settings and try again.

  • If the connection timeout error is reported for all client nodes, go to the Monitoring Center page in the MSE console, and view the metric data of the MSE Nacos instance.

    For more information about how to view the metric data of an MSE Nacos instance on the Monitoring Center page, see Monitor engines.

    • On the Overview tab, check whether the Queries per second or Operations per second value of the MSE Nacos instance exceeds the transactions per second (TPS) limit of the instance specifications.

      For the TPS values that correspond to different specifications, see Estimate instance capabilities.

    • On the Number of connections monitoring tab, check whether the Number of client versions or Number of long links value exceeds the maximum number of connections supported by the instance specifications.

      For the number of connections that correspond to different specifications, see Estimate instance capabilities.

    • On the jvm Monitoring tab, check whether full GCs are frequently performed.

      Note

      If No data is displayed, no full GC has occurred.

    • If the network type of the MSE Nacos instance is Internet, click the Resource monitoring tab and check whether the inbound or outbound traffic exceeds the bandwidth that you specified when you purchased the MSE Nacos instance.

    • On the Resource monitoring tab, check whether the memory usage or CPU load is close to or exceeds 100%. If it is, throttling is triggered. In this case, change the instance specifications to upgrade the MSE Nacos instance.

      For more information about how to change the instance specifications, see Change instance specifications.

    • If the connection timeout issue occurs only occasionally, configure a longer timeout period to reduce how often it occurs.

      • If the version of your Java client is 1.0.0 to 1.4.X, add the following parameters to the Java virtual machine (JVM) parameters of the application processes:

        -Dcom.alibaba.nacos.client.naming.ctimeout=${Connection timeout of the registry client. Unit: milliseconds. Default value: 3000}
        -Dcom.alibaba.nacos.client.naming.rtimeout=${Read timeout of requests sent by the registry client. Unit: milliseconds. Default value: 50000}
        -DNACOS.CONNECT.TIMEOUT=${Connection timeout of the configuration center client. Unit: milliseconds. Default value: 1000}

      • If the version of your Java client is 2.0.0 to 2.1.1, upgrade the version of the Java client to 2.1.2 or later and then configure the timeout period.

      • If the version of your Java client is 2.1.2 or later, add the following parameters to the JVM parameters of the application processes:

        -Dnacos.remote.client.grpc.timeout=${Request timeout period. Unit: milliseconds. Default value: 3000}
        ## Checks whether the connected server is healthy. If the server is unhealthy, the client reconnects.
        -Dnacos.remote.client.grpc.server.check.timeout=${Timeout period of a server health check. Unit: milliseconds. Default value: 3000}
        ## Checks whether the connection is healthy. If the connection is unhealthy, the client reconnects.
        -Dnacos.remote.client.grpc.health.timeout=${Timeout period of a connection health check. Unit: milliseconds. Default value: 3000}
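When only one client node reports timeouts, the ping and telnet checks described above can also be run from inside a JVM, which helps when those tools are not installed in the container that runs the application. The following is a minimal sketch that uses only the JDK. The class name NacosPortProbe, the default host, and the 3000 ms timeout are illustrative assumptions; port 9848 matters only for Java clients 2.x.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class NacosPortProbe {

    // Opens a TCP connection to host:port, giving up after timeoutMillis.
    // Returns the elapsed time in milliseconds, or -1 if the connection failed
    // (timed out, refused, or the host name could not be resolved).
    public static long probe(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default host; pass the MSE Nacos endpoint as the first argument.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        // 8848: HTTP port. 9848: gRPC port used by Java clients 2.x (HTTP port + 1000).
        int[] ports = {8848, 9848};
        for (int port : ports) {
            long millis = probe(host, port, 3000);
            if (millis < 0) {
                System.out.printf("%s:%d not reachable within 3000 ms%n", host, port);
            } else {
                System.out.printf("%s:%d connected in %d ms%n", host, port, millis);
            }
        }
    }
}
```

A successful TCP connection only shows that the port is reachable; a connect time close to the timeout still points to network latency between the node and the instance.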
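To check the client-side causes listed in this topic (frequent full GCs, memory pressure, or CPU contention) on a node without a monitoring agent, the JVM's own management beans expose the relevant counters. This is a minimal sketch, not an MSE tool: the class name and the 10-second sampling interval are assumptions, and collector names depend on the garbage collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

public class ClientHealthSnapshot {

    // Fraction of the maximum heap currently in use, or -1 if the JVM reports no maximum.
    public static double heapUsageRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    // One-minute system load average per available CPU, or -1 if the platform
    // does not provide a load average (for example, on Windows).
    public static double loadPerCpu() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();
        return load < 0 ? -1.0 : load / os.getAvailableProcessors();
    }

    // Total collection count across all collectors; undefined counts (-1) are skipped.
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        double heap = heapUsageRatio();
        System.out.println(heap < 0 ? "heap usage: no maximum defined"
                : String.format("heap usage: %.1f%%", heap * 100));
        double load = loadPerCpu();
        System.out.println(load < 0 ? "load per CPU: not available"
                : String.format("load per CPU: %.2f", load));
        // Print per-collector counters twice, 10 s apart, so the GC rate is visible.
        // Old-generation collectors such as "G1 Old Generation" or "PS MarkSweep"
        // are the ones whose counts usually reflect full GCs.
        for (int round = 0; round < 2; round++) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: count=%d, time=%d ms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            if (round == 0) {
                Thread.sleep(10_000);
            }
        }
    }
}
```

If the old-generation count rises between the two samples, or the load per CPU stays well above 1, the client may be too busy to process responses from the MSE Nacos instance in time.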
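If the startup command of the application cannot be changed, the timeout parameters listed above can also be set as system properties in code. This sketch assumes that the Nacos client reads these properties when its classes are loaded or its instances are created, so the code must run before any Nacos client is created; passing -D flags on the command line avoids this ordering concern. The class and method names are hypothetical, and the 6000 ms values are examples only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NacosTimeoutProperties {

    // Properties for Java clients 1.0.0 to 1.4.x (names copied from the JVM parameters above).
    public static Map<String, String> forClient1x(long namingConnectMs, long namingReadMs, long configConnectMs) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("com.alibaba.nacos.client.naming.ctimeout", Long.toString(namingConnectMs));
        props.put("com.alibaba.nacos.client.naming.rtimeout", Long.toString(namingReadMs));
        props.put("NACOS.CONNECT.TIMEOUT", Long.toString(configConnectMs));
        return props;
    }

    // Properties for Java clients 2.1.2 and later.
    public static Map<String, String> forClient2x(long requestMs, long serverCheckMs, long healthCheckMs) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("nacos.remote.client.grpc.timeout", Long.toString(requestMs));
        props.put("nacos.remote.client.grpc.server.check.timeout", Long.toString(serverCheckMs));
        props.put("nacos.remote.client.grpc.health.timeout", Long.toString(healthCheckMs));
        return props;
    }

    // Sets each property only if it was not already passed with -D,
    // so values from the command line take precedence.
    public static void applyIfAbsent(Map<String, String> props) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (System.getProperty(e.getKey()) == null) {
                System.setProperty(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // Example values only. Run this before any Nacos client is created.
        applyIfAbsent(forClient2x(6000, 6000, 6000));
        // ... create the Nacos naming or configuration client here as usual.
        System.out.println("grpc timeout = " + System.getProperty("nacos.remote.client.grpc.timeout"));
    }
}
```

Because applyIfAbsent skips properties that are already set, operators can still override the in-code defaults with -D flags without rebuilding the application.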