Application graceful start

Releases, scale-outs, and restarts are unavoidable for any online application. The graceful start feature in Microservices Engine (MSE) protects applications during these operations through three capabilities: delayed service registration, low-traffic prefetching, and service readiness probes. This topic describes how to use the graceful start feature in MSE.

Feature overview

Delayed registration

A microservice provider instance registers with a service registry during application startup. After registration, consumer applications can subscribe to and call the provider. For Java applications built on the Spring framework, registration usually occurs after the Spring context is refreshed. If a provider registers before its asynchronous initialization logic is complete, calls from consumers may fail. For example, a big data service may need to pull hundreds of megabytes of data from Object Storage Service (OSS) before it can serve requests. If the application registers immediately after startup, incoming requests fail because the data is not yet loaded. The delayed registration feature lets you postpone service registration by a configured duration, so that the application finishes initializing before it registers with the service registry and accepts traffic.
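The effect of delayed registration can be sketched in a few lines of Python. This is an illustration only, not how the MSE agent is implemented: DelayedRegistrar, register_fn, and the injectable clock are hypothetical names introduced for this example.

```python
import time
from typing import Callable


class DelayedRegistrar:
    """Registers a provider only after a fixed delay has elapsed since startup.

    Hypothetical helper for illustration; with MSE, the agent applies the delay.
    """

    def __init__(self, delay_s: float, register_fn: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay_s = delay_s
        self.register_fn = register_fn  # assumed callback that registers with the registry
        self.clock = clock
        self.started_at = clock()
        self.registered = False

    def tick(self) -> bool:
        """Call periodically. Registers once the delay has passed; returns the state."""
        if not self.registered and self.clock() - self.started_at >= self.delay_s:
            self.register_fn()
            self.registered = True
        return self.registered
```

Until tick() returns True, the instance is invisible to consumers, which gives initialization work such as loading data from OSS time to finish.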

Low-traffic prefetching

A newly started instance is often in a "cold" state. In this state, the instance needs to perform operations such as lazy loading of connection pools, cache warming, and just-in-time (JIT) compilation of hot spot code. As a result, these new instances can handle far fewer requests than instances that have been running for a while. Without intervention, the system's overall average response time (RT) can increase when a new instance comes online. In severe cases, the service may hang, causing many requests to time out or fail.

The following figure compares the request call durations for an instance that requires resource loading, before and after the resources are loaded. If many requests arrive at the instance during resource loading, they may all be blocked:

Low-traffic prefetching controls the amount of traffic that consumer applications send to a new service instance right after it is published. This prevents the limited request-handling capability of a cold Java application from increasing the overall system RT. It also protects the new instance from being overwhelmed by a surge in traffic. Traffic to the instance gradually increases over time according to a predefined rule. When the configured ramp-up period is over, the prefetching process ends, and the instance receives traffic normally.

Note

Low-traffic prefetching uses traffic from online consumers. Therefore, the consumer applications for the service must also be connected to MSE Microservices Governance. For more information about the principles of this feature, see Principles of low-traffic prefetching.

Service readiness probe

Kubernetes provides a readiness probe mechanism. During a service release, after a new instance passes the readiness probe, the old instance is taken offline. The exact behavior depends on the configured release policy. However, Kubernetes cannot determine when a microservice is truly ready. It often considers an application ready as soon as its port starts listening. As a result, a new instance can be marked as ready before it has registered with the service registry. Kubernetes then proceeds with the release and takes the old instance offline, and consumer calls fail with errors indicating that no provider or instance is available for the service.

The service readiness probe feature provides a non-intrusive HTTP endpoint through an agent. This endpoint checks whether the application has completed its registration with the service registry. If the application is not registered, the endpoint returns an HTTP 500 status code. After the application is registered, it returns an HTTP 200 status code. If you configure your application's readiness probe to use this endpoint, Kubernetes can accurately determine whether the application is ready. Consumers then always have an available provider during a service release in Kubernetes, which prevents provider-not-found errors.
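The contract of the readiness endpoint (HTTP 500 before registration, HTTP 200 after) can be reproduced with a small stand-in server. The following Python sketch is not the agent's code: the registered flag, ReadinessHandler, start_server, and probe are hypothetical names, and the server binds to a random local port rather than the agent's real port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

registered = threading.Event()  # set once the (simulated) registration completes


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/readiness":
            self.send_response(404)
        else:
            # 200 only after registration; 500 before, as described above.
            self.send_response(200 if registered.is_set() else 500)
        self.end_headers()

    def log_message(self, *args) -> None:  # keep the demo quiet
        pass


def start_server() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), ReadinessHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host: str, port: int, path: str = "/readiness") -> int:
    """Return the status code of a GET request, as an HTTP probe would see it."""
    conn = http.client.HTTPConnection(host, port, timeout=2)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

A Kubernetes HTTP probe treats status codes from 200 through 399 as success, so the 500 response keeps the pod out of the ready state until registration is complete.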

Use graceful start

Prerequisites

Notes

Graceful start is supported only for instances that use a service registry, such as Nacos, for service registration and discovery. It is not supported for microservice instances that rely on the Kubernetes Service discovery mechanism.
For Spring Cloud applications, low-traffic prefetching is supported only for applications that use Nacos, ZooKeeper, or Eureka as the service registry.
The low-traffic prefetching feature for Spring Cloud is based on the default ZoneAwareLoadBalancer, RoundRobinLoadBalancer, or RandomLoadBalancer load balancers of the Spring Cloud framework. If you modify the application's load balancer configuration, this feature will not work.
Low-traffic prefetching takes effect only when MSE Microservices Governance is enabled for both the provider and the consumer. For example, a gateway application that exposes APIs directly receives its traffic from external callers, which are not connected to MSE Microservices Governance. Therefore, the low-traffic prefetching feature of MSE does not apply to this type of application.

Procedure

Step 1: Enable graceful start

Log on to the MSE console, and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance. On the page that appears, click the resource card of the application that you want to manage.
On the application details page, click Traffic management in the left-side navigation pane, and click the Graceful Start/Shutdown tab.
In the Configuration section, click Modify. Then, turn on the Graceful Start switch and click OK.

Step 2: Configure the Kubernetes service readiness probe

Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, click the target cluster. In the navigation pane on the left, choose Workloads > Stateless.
Find the application deployment and click Edit in the Actions column.
In the Health Check section, click Enable to the right of Readiness Probe, configure the following parameters, and then click Update.
- Path: /readiness. If the agent version of your application is earlier than 4.1.10, set Path to /health. To view the agent version, log on to the MSE console, choose Governance Center > Application Governance, click the target application, and then click Node Details. The agent version is displayed on the right.
- Port: 55199.
- Initial Delay (s): We recommend a value greater than the sum of the application startup time and the Delayed Registration Duration (s) configured for the graceful start feature (default: 0s). The feature still works if you do not follow this recommendation.
- Other parameters: See Create a stateless workload (Deployment).
The configuration takes effect after the application restarts. The readiness probe passes only after service registration is complete.

Important

This operation immediately restarts the application. If you are in a production environment, perform this operation during a scheduled release window.
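If you manage the Deployment as a manifest rather than in the console, the same readiness probe settings can be expressed in the container's readinessProbe field. The following Python sketch builds that fragment as a dictionary and prints it as JSON. httpGet, path, port, and initialDelaySeconds are standard Kubernetes probe fields. The helper names, the safety margin, and the assumption that the agent version is a purely numeric dotted string are illustrative.

```python
import json


def parse_version(version: str) -> tuple:
    """Parse a dotted version such as '4.1.10' (assumed to be purely numeric)."""
    return tuple(int(part) for part in version.split("."))


def readiness_probe(agent_version: str, startup_s: int,
                    delayed_registration_s: int = 0, margin_s: int = 5) -> dict:
    """Build a readinessProbe fragment that follows the recommendations above."""
    # Agents earlier than 4.1.10 expose /health instead of /readiness.
    path = "/readiness" if parse_version(agent_version) >= (4, 1, 10) else "/health"
    return {
        "httpGet": {"path": path, "port": 55199},
        # Greater than startup time + delayed registration duration;
        # margin_s is an arbitrary illustrative buffer.
        "initialDelaySeconds": startup_s + delayed_registration_s + margin_s,
    }


if __name__ == "__main__":
    # Example values: 40s startup time, 20s delayed registration.
    print(json.dumps(readiness_probe("4.1.10", 40, 20), indent=2))
```

The startup time of 40 seconds in the usage line is an example value. Measure the startup time of your own application before you choose the initial delay.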

(Optional) Step 3: Configure the delayed registration duration

This step is optional. For more information, see Delayed registration. The procedure is as follows:

Enable the graceful start feature and configure the Kubernetes service readiness probe as described in the previous steps.
In the Configuration section for graceful start and shutdown, click the arrow to the left of the graceful start section to expand the options. Set Delayed Registration Duration (s), and then click OK.

Note

The setting takes effect the next time the application starts.

(Optional) Step 4: Adjust the low-traffic prefetching duration

After you enable graceful start, low-traffic prefetching is enabled automatically. The default ramp-up period is 120 seconds. You can adjust it as needed:

Enable the graceful start feature and configure the Kubernetes service readiness probe as described in the previous steps.
In the Configuration section for graceful start and shutdown, click the arrow to the left of the graceful start section to expand the options. Click Advanced Options, set Low-traffic Prefetching Duration (s), and then click OK.
If the caller of the service is an MSE cloud-native gateway, the prefetching configuration on this page does not take effect. Instead, configure prefetching for the service in the MSE cloud-native gateway console. To do this, click the target gateway instance. In the navigation pane on the left, choose Routing Management > Services. Find the service and click More > Policy Configuration in the Actions column. On the Policy Configuration tab, in the Traffic Management > Load Balancing Configuration section, click Edit and adjust the Prefetch Time setting. Note that a cloud-native gateway ramps up queries per second (QPS) along a linear curve by default, whereas MSE Microservices Governance uses a quadratic curve. In practice, the effects are similar.
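To see how a linear ramp and a quadratic ramp differ, the following Python sketch compares the fraction of normal traffic that a new instance would receive under each curve. The exact formulas used by the cloud-native gateway and by MSE Microservices Governance are not given in this topic, so the two functions below are assumed textbook forms for illustration only.

```python
def ramp_ratio(uptime_s: float, warmup_s: float = 120.0) -> float:
    """Fraction of the warm-up period that has elapsed, clamped to [0, 1]."""
    if warmup_s <= 0:
        return 1.0
    return min(max(uptime_s / warmup_s, 0.0), 1.0)


def linear_weight(uptime_s: float, warmup_s: float = 120.0) -> float:
    """Assumed linear ramp: traffic share grows in proportion to uptime."""
    return ramp_ratio(uptime_s, warmup_s)


def quadratic_weight(uptime_s: float, warmup_s: float = 120.0) -> float:
    """Assumed quadratic ramp: slower at first, catching up near the end."""
    return ramp_ratio(uptime_s, warmup_s) ** 2


if __name__ == "__main__":
    for t in (0, 30, 60, 90, 120):
        print(f"{t:>4}s  linear={linear_weight(t):.4f}  quadratic={quadratic_weight(t):.4f}")
```

Under these assumptions, the quadratic ramp sends a quarter of normal traffic to the instance halfway through the period, while the linear ramp sends half. Both reach full traffic when the period ends.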

Note

The setting takes effect the next time the application starts.
The low-traffic prefetching method works on the service consumer side. It calculates weights for each provider instance based on its startup time. It uses a load balancing algorithm to control the traffic to a newly started application, allowing the traffic to gradually increase to the normal level. This helps warm up the new service instance. This also requires the service consumer to be connected to MSE Microservices Governance.
When you first use the low-traffic prefetching feature, we recommend that you keep the default ramp-up period. If prefetching turns out to be insufficient and requests are lost, adjust this parameter.
To ensure complete prefetching, see Best practices for low-traffic prefetching.
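The consumer-side mechanism described in the note above (a weight per provider instance derived from its startup time, fed into a load balancing algorithm) can be sketched as weighted random selection. This Python example is a simplified illustration, not the MSE agent's algorithm: the Instance type, the quadratic weight formula, and the minimum weight floor are assumptions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    address: str
    started_at: float  # startup timestamp in seconds


def warmup_weight(instance: Instance, now: float,
                  warmup_s: float = 120.0, floor: float = 0.01) -> float:
    """Weight in (0, 1]: small right after startup, 1.0 once warmed up."""
    uptime = max(now - instance.started_at, 0.0)
    if uptime >= warmup_s:
        return 1.0
    # Assumed quadratic ramp; the floor keeps a trickle of traffic flowing
    # so that the new instance can actually warm up.
    return max((uptime / warmup_s) ** 2, floor)


def pick(instances: List[Instance], now: float,
         rng: Optional[random.Random] = None) -> Instance:
    """Choose a provider at random, in proportion to its warm-up weight."""
    rng = rng or random.Random()
    weights = [warmup_weight(i, now) for i in instances]
    return rng.choices(instances, weights=weights, k=1)[0]
```

With one long-running instance and one instance that started 30 seconds ago, the new instance has a weight of 0.0625 under these assumptions and receives roughly 6% of requests. Because the consumer computes the weights, prefetching works only when the consumer is also connected to MSE Microservices Governance.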

Observe graceful start

After you complete the configuration and the application restarts, the Graceful Start/Shutdown tab shows when each application instance went online and offline, together with its QPS curve during that period:

Log on to the MSE console, and select a region in the top navigation bar.
In the left-side navigation pane, choose Microservices Governance > Application Governance. On the page that appears, click the resource card of the application that you want to manage.
On the application details page, click Traffic management in the left-side navigation pane, and click the Graceful Start/Shutdown tab.
In the Start and Shutdown Overview, click an instance in the list on the left. On the right, you can view the QPS changes and related events for the instance during its startup phase.

The overview shows that the service registration, prefetching started, and prefetching ended events occur in sequence. The Kubernetes readiness probe passed event also occurs after the service registration event. The QPS curve shows a gradual increase to its maximum value over the ramp-up period (default: 120s), rather than a sharp spike. If the event sequence or the shape of the QPS curve does not meet your expectations when your application starts, see the FAQ for troubleshooting information.
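The expected event order can also be written down as a simple check. The following Python sketch assumes that you have noted the event timestamps from the overview by hand. The event keys (service_registered, prefetch_started, prefetch_ended, readiness_passed) are hypothetical labels for this example, not identifiers exposed by the console.

```python
from typing import List, Mapping


def check_startup_events(events: Mapping[str, float]) -> List[str]:
    """Return a list of ordering problems; an empty list means the order is as expected.

    `events` maps each of the four event labels to a timestamp in seconds.
    """
    problems = []
    # Registration, prefetching started, and prefetching ended occur in sequence.
    order = ["service_registered", "prefetch_started", "prefetch_ended"]
    for earlier, later in zip(order, order[1:]):
        if events[later] < events[earlier]:
            problems.append(f"{later} occurred before {earlier}")
    # The readiness probe must pass only after registration.
    if events["readiness_passed"] < events["service_registered"]:
        problems.append("readiness_passed occurred before service_registered")
    return problems
```

For example, a startup with registration at 35s, readiness passed at 36s, prefetching started at 36s, and prefetching ended at 156s produces no problems. A readiness event that precedes registration suggests that the readiness probe is not pointing at the agent endpoint.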

Note

In the application shown in the example figure, the Kubernetes readiness probe uses port 55199 and the path /readiness, and the minimum ready time (minReadySeconds) is set to 120 seconds, which matches the default prefetching duration.

References

Configure graceful start and shutdown using YAML