EAS provides release strategies, canary traffic shifting, and version management to safely roll out service changes, validate new versions on production traffic, and quickly roll back when issues arise.
Overview
Release management covers the process of rolling out service changes — new images, parameters, models, and more — to production. It includes three capabilities:
- Publish changes to production: Control the pace of updates with release strategies (rolling, batched, or paused) to avoid request interruptions. By default, rolling updates gradually replace replicas while graceful shutdown drains in-flight requests. For services with dozens or hundreds of replicas that need staged rollouts with checkpoints, use update plans for finer control.
- Validate on production traffic: Use canary releases to group a production service and a canary service, split real traffic between them by ratio, and gradually shift all traffic to the canary once it proves healthy.
- View history and roll back: Every update creates a version snapshot. You can view the deployment history, compare versions, and roll back to any previous version if issues arise.
Publish changes to production
Choose a release strategy based on the risk level of the change:
- Rolling update (default): The underlying mechanism for all updates. The system gradually creates new replicas and replaces old ones based on max_surge (replicas allowed above the desired count) and max_unavailable (maximum unavailable replicas). Old replicas gracefully shut down after draining in-flight requests. This is the default behavior for service restarts and parameter updates. For details, see Rolling updates and graceful shutdown.
- Update plan: Adds batching and pause capabilities on top of rolling updates. Use update plans when the service has dozens or hundreds of replicas and you need to advance in stages, verifying business metrics between batches. Both manual and automatic batching modes are supported, and you can pause, adjust, or roll back at any time. For details, see Update plan.
Use the default rolling strategy for routine updates. For large-scale, high-risk, or staged rollouts, add an update plan.
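To make the effect of max_surge and max_unavailable concrete, the following is a minimal simulation sketch, not EAS code. It assumes both parameters are absolute replica counts, that new replicas become ready immediately, and that the scheduler alternates between adding new replicas and removing old ones; the function name rolling_update_steps is hypothetical.

```python
def rolling_update_steps(desired, max_surge, max_unavailable):
    """Simulate a rolling update and return (old, new) replica counts per step.

    Assumptions (illustrative only, not the EAS scheduler):
    - max_surge and max_unavailable are absolute replica counts
    - a new replica is ready as soon as it is created
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = desired, 0
    steps = []
    while old > 0 or new < desired:
        # Create new replicas without exceeding desired + max_surge in total.
        can_add = min(desired - new, desired + max_surge - (old + new))
        new += max(can_add, 0)
        # Remove old replicas while keeping at least
        # desired - max_unavailable replicas running.
        can_remove = min(old, (old + new) - (desired - max_unavailable))
        old -= max(can_remove, 0)
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    # 4 replicas, one extra replica allowed, none may be unavailable.
    print(rolling_update_steps(4, max_surge=1, max_unavailable=0))
    # -> [(3, 1), (2, 2), (1, 3), (0, 4)]
```

In this sketch, a larger max_surge shortens the rollout at the cost of temporary extra resources, while a larger max_unavailable shortens it at the cost of temporarily reduced capacity.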
Validate on production traffic
Canary releases work through service groups: a production service and a canary service share a single access endpoint that distributes traffic by replica count or custom weight. Route a small share of traffic to the canary first, verify that it is healthy, then gradually increase the ratio until the canary handles all traffic. For details, see Canary release.
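The split described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the EAS routing implementation: it assumes traffic is proportional to replica counts unless custom weights are given, and the function name traffic_shares and its parameters are hypothetical.

```python
def traffic_shares(prod_replicas, canary_replicas, weights=None):
    """Return (production_share, canary_share) as fractions that sum to 1.

    Without custom weights, traffic is assumed proportional to replica count.
    `weights` is an optional (production_weight, canary_weight) pair.
    """
    prod, canary = weights if weights is not None else (prod_replicas, canary_replicas)
    if prod < 0 or canary < 0 or prod + canary == 0:
        raise ValueError("weights must be non-negative and not both zero")
    total = prod + canary
    return prod / total, canary / total


if __name__ == "__main__":
    # 9 production replicas and 1 canary replica: about 10% goes to the canary.
    print(traffic_shares(9, 1))
    # Custom weights override replica counts: shift half of the traffic.
    print(traffic_shares(9, 1, weights=(50, 50)))
    # Final stage: the canary handles all traffic.
    print(traffic_shares(9, 1, weights=(0, 100)))
```

Ramping the canary weight in small steps and checking service health between steps mirrors the gradual shift described above.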
View history and roll back
In the inference service list, find the target service and click the current version in the Version column to open the version list. The version list supports:
- View deployment configuration: View the deployment parameters of a version, including the image, resource specifications, environment variables, and startup command.
- View the creator: Identify who made each change for audit purposes.
- Compare versions: Select two versions and compare their deployment configurations to pinpoint what changed. This helps assess the impact before a rollback or troubleshoot issues after an update.
- Roll back to this version: Restore the service to the deployment configuration of a selected version. The button is grayed out for the current version.
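Conceptually, comparing two versions amounts to a field-by-field diff of their deployment configurations. The sketch below shows that idea on plain dictionaries. The keys (image, instance_type, env, command) and the sample values are hypothetical placeholders and do not reflect the actual EAS configuration schema.

```python
def diff_configs(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes


if __name__ == "__main__":
    # Hypothetical version snapshots, for illustration only.
    v1 = {"image": "app:1.0", "instance_type": "small", "env": {"LOG": "info"}}
    v2 = {"image": "app:1.1", "instance_type": "small", "env": {"LOG": "debug"},
          "command": "serve --port 8000"}
    for field, (before, after) in diff_configs(v1, v2).items():
        print(f"{field}: {before!r} -> {after!r}")
```

A field that appears in only one version shows up with None on the other side, which makes added or removed parameters easy to spot.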
Two things to keep in mind about rollbacks:
- Cross-version compatibility: A rollback requires the target version's image, model files, environment variables, and other resources to still be available. If the image has been deleted or external dependencies have changed, the rollback may fail or cause service issues.
- Choosing between two rollback methods:
  - If the update has fully completed and you need to switch back to an older version, use the version list rollback. It follows the rolling update strategy and isn't affected by update plans.
  - If the update is still in progress (some replicas have switched but not all) and the service has an update plan enabled, use the update plan rollback instead (manual batching with target replica count set to 0). Already-updated replicas roll back gradually, which is faster.
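The choice between the two rollback methods can be summarized as a small decision function. This is only a sketch of the guidance above: the function name and return strings are hypothetical, and falling back to the version list rollback when an update is in progress without an update plan is an assumption, since that case is not covered here.

```python
def choose_rollback_method(update_in_progress, update_plan_enabled):
    """Pick a rollback method following the guidance above.

    Returns "update plan rollback" only when the update is still in progress
    and an update plan is enabled; otherwise "version list rollback".
    The in-progress-without-update-plan case is an assumed fallback.
    """
    if update_in_progress and update_plan_enabled:
        # Manual batching with the target replica count set to 0.
        return "update plan rollback"
    return "version list rollback"


if __name__ == "__main__":
    print(choose_rollback_method(update_in_progress=False, update_plan_enabled=True))
    print(choose_rollback_method(update_in_progress=True, update_plan_enabled=True))
```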