Microservices Engine: How to implement graceful release or graceful start and shutdown

Last Updated: Aug 04, 2025

This topic describes how to gracefully release, start, or shut down a microservice application using Microservices Governance.

Problem description

When a downstream microservice application is released or restarted, an upstream application may still send calls to a downstream instance that is shutting down. These calls fail and surface as service traffic errors, such as connection timeouts and business errors.

Possible causes

  • The downstream application is shut down after the call is initiated. As a result, the downstream application does not respond to the call request.

  • The downstream application takes a long time to shut down because of complex shutdown logic. This delays its deregistration from the registry.

  • The downstream service stops as expected, but the upstream service fails to promptly process the new list of downstream service addresses from the registry. This can be caused by network faults, insufficient resources, or abnormal processing logic.

  • The client in use is an earlier version with a flawed update mechanism, so it does not promptly remove the IP addresses of downstream instances that have been shut down.

Solutions

The best solution is to use the graceful release feature of Microservices Governance. This feature provides a comprehensive solution for graceful release, start, and shutdown, which eliminates complex troubleshooting and resolution procedures. For more information, see Configure graceful rolling deployment.

If you cannot use the graceful release feature of Microservices Governance, view the Nacos client logs of the upstream service. Search the logs for entries that contain both the name of the downstream service and the keyword current ips. Then compare three timestamps: the time when the downstream service was changed, the time of the Nacos client log message, and the time when the upstream service reported an error.

  • If the three timestamps are closely aligned (meaning the upstream service starts reporting errors after the downstream service change is initiated, but stops after the Nacos client log message appears), this indicates that the program is behaving as expected. You can use the common solution described below. For more information, see Common solution.

  • If the time when the downstream service was changed and the time of the Nacos client log message are closely aligned, but the error from the upstream service persists, this indicates that the Nacos server pushed the address correctly and the Nacos client received it, but the application did not use it. You can perform the following steps to troubleshoot this issue.

    • If you do not use an open source framework, check the application logic to determine whether a cache mechanism is used and whether a cache update fails.

    • If you are using an open source framework, you can seek help from the corresponding community.

  • If the time when the downstream service was changed and the time of the Nacos client log message are closely aligned, but the upstream service is very slow to recover from the error, this indicates that the Nacos server pushed the address correctly and the Nacos client received it, but the application did not use it immediately. You can perform the following steps to troubleshoot this issue.

    1. If you are not using an open source framework, check whether your application has a caching mechanism and whether the cache update is delayed.

    2. Check whether you are using auxiliary frameworks such as Ribbon, Feign, or LoadBalancer. These frameworks keep their own address list cache, which updates slowly. You can modify the cache refresh configuration for the specific framework.

    3. If you are using an open source framework, you can seek help from the corresponding community.

    4. If the issue persists after you take the preceding steps, use the common solution described below. For more information, see Common solution.

  • If there is a large discrepancy between the time when the downstream service was changed and either the time of the Nacos client log message or the time when the upstream service reported an error, this indicates that the Nacos client did not detect the downstream service change. You can perform the following steps to troubleshoot this issue.

    1. Upgrade the Nacos clients for the upstream and downstream services to version 2.X or later.

    2. Check if the upstream service has issues such as network faults or insufficient resources.

    3. Check for any blocking logic in the downstream service's shutdown process. Such logic can leave the application unable to respond to requests while it is still registered in the registry.

    4. If the issue persists after you take the preceding steps, use the common solution described below. For more information, see Common solution.
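
The timestamp comparison above can be sketched as a small triage helper. This is only an illustration under assumptions: the function names, the `still_failing` flag, the returned labels, and the 10-second tolerance are hypothetical and not part of MSE or Nacos, and you must extract the timestamps from your own logs because the log layout depends on the client version.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def find_push_lines(log_lines: Iterable[str], service_name: str) -> List[str]:
    """Keep log lines that mention both the downstream service and 'current ips'."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if service_name in line and "current ips" in line
    ]


def classify(
    changed_at: datetime,
    client_log_at: Optional[datetime],
    last_error_at: Optional[datetime],
    still_failing: bool = False,
    tolerance: timedelta = timedelta(seconds=10),  # assumed threshold, tune it
) -> str:
    """Map the three timestamps to one of the four cases described above."""
    # No client log entry, or one far from the change: the client missed the change.
    if client_log_at is None or abs(client_log_at - changed_at) > tolerance:
        return "client did not detect the change"
    # The client logged the new list, but upstream errors never stopped.
    if still_failing:
        return "address pushed but not used"
    # Errors stopped, but only long after the client logged the new list.
    if last_error_at is not None and last_error_at - client_log_at > tolerance:
        return "address pushed but used late"
    return "expected behavior"
```

For example, a change at 12:00:00, a client log entry at 12:00:02, and a last upstream error at 12:00:03 would be classified as expected behavior, which points to the common solution.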

Common solution

  1. Before stopping the downstream service, use the Nacos OpenAPI to update the instance by setting enabled=false, or unpublish the instance in the MSE console. Then, use monitoring and logs to confirm that the downstream service node is no longer receiving requests. For more information, see OpenAPI and Unpublish an application instance.

  2. Stop the downstream service node and execute the change.

  3. After you confirm that the change to the downstream service node is complete and that the node is providing services correctly, use the Nacos OpenAPI to update the instance by setting enabled=true, or publish the instance in the MSE console. For more information, see OpenAPI and Publish an application instance.
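
Steps 1 and 3 can be sketched as follows. This is a minimal sketch, assuming the open source Nacos v1 OpenAPI endpoint for updating an instance (`PUT /nacos/v1/ns/instance` with `serviceName`, `ip`, `port`, and `enabled`) and an "ok" response body on success. The function names, server address, and instance values are hypothetical, authentication is not handled, and an update may overwrite other instance attributes such as weight or metadata, so verify the behavior against the OpenAPI documentation for your Nacos version before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_instance_update(
    server_addr: str,
    service_name: str,
    ip: str,
    port: int,
    enabled: bool,
    namespace_id: Optional[str] = None,
    group_name: Optional[str] = None,
) -> Request:
    """Build (but do not send) the PUT request that sets the instance's enabled flag."""
    params = {
        "serviceName": service_name,
        "ip": ip,
        "port": str(port),
        "enabled": "true" if enabled else "false",
    }
    # Optional parameters are only sent when provided.
    if namespace_id:
        params["namespaceId"] = namespace_id
    if group_name:
        params["groupName"] = group_name
    url = f"http://{server_addr}/nacos/v1/ns/instance?{urlencode(params)}"
    return Request(url, method="PUT")


def set_instance_enabled(server_addr: str, service_name: str, ip: str,
                         port: int, enabled: bool) -> bool:
    """Send the update and report whether the server answered 'ok' (assumed reply)."""
    request = build_instance_update(server_addr, service_name, ip, port, enabled)
    with urlopen(request, timeout=5) as response:
        return response.read().decode().strip() == "ok"
```

A release script could call `set_instance_enabled(addr, "order-service", "10.0.0.5", 8080, False)` before stopping the node, wait until monitoring shows no inbound requests, perform the change, and then call it again with `True`.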