EAS service rolling updates and graceful shutdown - Platform For AI

Rolling updates replace old service replicas with new ones in batches during restarts or configuration changes, so your service stays available throughout the update. Two parameters control update speed and safety; two more control how replicas handle in-flight requests during termination.

How it works

When you trigger an update, EAS replaces replicas in batches rather than all at once:

1. EAS creates a batch of new replicas based on rolling_strategy.max_surge.
2. Traffic continues routing to the old replicas while new replicas start up.
3. Once new replicas pass health checks, EAS stops a batch of old replicas based on rolling_strategy.max_unavailable.
4. EAS repeats steps 1–3 until all replicas are replaced.

If a new replica fails to start, the update pauses. Failed replicas receive no traffic, and the remaining old replicas continue serving requests. You can roll back or start a new update. A new update automatically cleans up failed replicas from the previous incomplete update.
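The batch loop above can be approximated with a small simulation. This is only a sketch, not the EAS scheduler: it assumes new replicas pass health checks instantly, that the total replica count never exceeds the target plus `max_surge`, and that the running count never drops below the target minus `max_unavailable`. The function name `simulate_rolling_update` is hypothetical.

```python
from typing import List, Tuple


def simulate_rolling_update(
    replicas: int, max_surge: int, max_unavailable: int
) -> List[Tuple[int, int]]:
    """Return (old, new) replica counts after each simulated batch."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if min(max_surge, max_unavailable) < 0 or max_surge + max_unavailable == 0:
        raise ValueError("need non-negative values with at least one above zero")

    old, new = replicas, 0
    history = []
    while old > 0:
        # Assumption: keep at least (replicas - max_unavailable) replicas running.
        floor = replicas - max_unavailable
        old -= max(0, min(old, old + new - floor))
        # Assumption: never run more than (replicas + max_surge) replicas in total.
        ceiling = replicas + max_surge
        new += max(0, min(replicas - new, ceiling - (old + new)))
        history.append((old, new))
    return history
```

Under these assumptions, `simulate_rolling_update(4, 1, 0)` steps through `(4, 1)`, `(3, 2)`, `(2, 3)`, `(1, 4)`, `(0, 4)`: one extra replica is added first, and the service never runs with fewer than four replicas.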

Rolling update parameters

`rolling_strategy.max_surge`

Label in console: Exceeds the expected number of replicas

Maximum number of extra replicas allowed to run during the update. Set as an integer or percentage.

Default: 2% of total replicas, minimum 1.

Example: Setting this to 20 for a 100-replica service creates 20 new replicas simultaneously in the first batch.

Tradeoff: Higher values speed up updates but increase resource usage. Setting this too high launches many new replicas at once, and these then replace a large share of the old replicas in a single step. Without prewarming, the sudden traffic shift may destabilize the service.
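As a rough illustration of how an integer or percentage setting translates into a replica count, the helper below resolves a `max_surge` value for a given service size. The function name, the `"20%"` string syntax, and rounding fractions up are assumptions; the text above only states the default (2% of total replicas, minimum 1) and that both forms are accepted.

```python
import math
from typing import Optional, Union


def resolve_max_surge(
    total_replicas: int, max_surge: Optional[Union[int, str]] = None
) -> int:
    """Return how many extra replicas may run during an update."""
    if total_replicas < 1:
        raise ValueError("total_replicas must be at least 1")
    if max_surge is None:
        # Documented default: 2% of total replicas, minimum 1.
        # Assumption: fractional results are rounded up.
        return max(1, math.ceil(total_replicas * 2 / 100))
    if isinstance(max_surge, str):
        # Assumption: percentages are written as strings such as "20%".
        percent = float(max_surge.rstrip("%"))
        return math.ceil(total_replicas * percent / 100)
    return int(max_surge)
```

For a 100-replica service, both `resolve_max_surge(100, 20)` and `resolve_max_surge(100, "20%")` give a first batch of 20 new replicas, matching the example above.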

`rolling_strategy.max_unavailable`

Label in console: Maximum unavailable replicas

Maximum number of replicas that can be unavailable during the update. Setting this above zero lets EAS free resources from old replicas before creating new ones, preventing resource starvation.

Default:

| Resource group | Before September 1, 2025 | After September 1, 2025 |
| --- | --- | --- |
| Dedicated resource group | 1 | 0 (with elastic pool) / 1 (without elastic pool) |
| Public resource group | 0 | 0 |
| Lingjun Intelligent Computing Quota | 0 | 2% of replicas, minimum 1 |
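The defaults table can be read as a simple lookup. The sketch below encodes it, with several assumptions: the function name and the short resource group labels (`"dedicated"`, `"public"`, `"lingjun"`) are hypothetical, `after_cutoff` merely selects the "After September 1, 2025" column, and the 2% value is rounded up.

```python
import math


def default_max_unavailable(
    resource_group: str,
    after_cutoff: bool,
    replicas: int,
    has_elastic_pool: bool = False,
) -> int:
    """Look up the default max_unavailable from the table above."""
    if resource_group == "public":
        return 0
    if resource_group == "dedicated":
        if not after_cutoff:
            return 1
        return 0 if has_elastic_pool else 1
    if resource_group == "lingjun":
        if not after_cutoff:
            return 0
        # Assumption: fractional results are rounded up.
        return max(1, math.ceil(replicas * 2 / 100))
    raise ValueError(f"unknown resource group: {resource_group!r}")
```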

Example: Setting this to N stops N old replicas immediately when the update starts.

Usage notes:

For single-replica services with this value set to 1, the old replica stops before the new one starts, causing brief unavailability.
Setting this too high terminates too many replicas at once and may overwhelm the remaining replicas.

Graceful shutdown parameters

Graceful shutdown parameters control how replicas handle in-flight requests during termination.

`eas.termination_grace_period`

Label in console: Graceful shutdown time

Grace period in seconds before a replica is forcibly terminated. When a replica enters the Terminating state, EAS stops routing traffic to it and waits this duration for pending requests to complete. When the grace period expires, EAS forcibly terminates the replica regardless of pending requests.

Default: 30 seconds.

Increase this value for long-running requests. Setting it too low drops in-flight requests; setting it too high slows down updates unnecessarily. Only change this value when the default does not fit your workload.

`rpc.enable_sigterm`

Label in console: Send SIGTERM

Controls whether EAS sends a SIGTERM signal to the replica process at the start of the grace period.

Default: false.

| Value | Behavior |
| --- | --- |
| `false` | EAS waits for the grace period, then forcibly terminates the replica. No SIGTERM is sent. |
| `true` | EAS sends SIGTERM immediately when the replica enters the Terminating state. Your application must handle SIGTERM, drain in-flight requests, and exit cleanly before the grace period expires. If the process is still running when the grace period ends, EAS forcibly terminates it. |

SIGTERM is disabled by default because most application containers do not implement SIGTERM handlers. An unhandled SIGTERM causes immediate process termination, bypassing any cleanup logic.

When to enable SIGTERM

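Enable SIGTERM when request durations vary widely, for example from a few seconds to 30 minutes. A fixed grace period cannot efficiently accommodate both short and long requests: a period long enough for the longest request makes every replica wait the full duration, even when its requests finish in seconds. With SIGTERM enabled: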

1. EAS sends SIGTERM when the replica enters the Terminating state.
2. Your custom shutdown logic runs: drain in-flight requests and release resources.
3. The process exits cleanly.
4. If the process has not exited when eas.termination_grace_period expires, EAS forcibly terminates it.

Make sure your shutdown logic completes within the eas.termination_grace_period window. If your shutdown logic needs more time, increase eas.termination_grace_period accordingly.
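A minimal sketch of the shutdown sequence above in a Python service, using only the standard library. The `GracefulShutdown` class and its method names are hypothetical, and how you connect it to your serving framework is left open. The handler only sets a flag; draining happens in normal code so that it can be bounded by a timeout shorter than eas.termination_grace_period.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks in-flight requests and drains them after SIGTERM."""

    def __init__(self) -> None:
        self.stopping = False
        self._inflight = 0
        self._cond = threading.Condition()

    def install(self) -> None:
        # Python only allows signal handlers to be registered
        # from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: only record that shutdown has begun.
        self.stopping = True

    def begin_request(self) -> bool:
        """Return False once shutdown has started, so callers reject new work."""
        with self._cond:
            if self.stopping:
                return False
            self._inflight += 1
            return True

    def end_request(self) -> None:
        with self._cond:
            self._inflight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Wait until no requests are in flight; False means the timeout hit."""
        with self._cond:
            return self._cond.wait_for(lambda: self._inflight == 0, timeout=timeout)
```

One way to use it: call `install()` at startup, wrap each request in `begin_request()` and `end_request()`, and have the main thread poll `stopping`. Once it is set, call `drain()` with a timeout a few seconds below eas.termination_grace_period, release resources, and exit.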

Note

Asynchronous inference services do not require SIGTERM. The EAS control plane automatically handles shutdown by rejecting new requests and waiting for existing requests to complete before terminating replicas.