When you deploy a new model version, routing all traffic to it at once means that any problem with the model affects every user and can cause downtime. Canary releases let you shift a small percentage of traffic to the new version first and validate its behavior. You can then either promote it or roll back, while most users continue to be served by the current version.
How it works
KServe tracks two revisions during a canary release:
| Revision | KServe term | Role |
|---|---|---|
| Current (stable) | LatestRolledoutRevision | The revision that was last promoted to receive 100% of traffic |
| Canary (new) | LatestReadyRevision | The newest revision that is ready to serve traffic |
The canaryTrafficPercent field in the InferenceService spec controls the percentage of traffic routed to the canary revision. KServe routes the remainder to the current revision automatically. For example, canaryTrafficPercent: 10 sends 10% to the canary and 90% to the current revision.
If a revision is unhealthy or defective, KServe does not route traffic to it.
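The arithmetic behind the split can be sketched as a small shell function. `traffic_split` is a hypothetical helper, not part of KServe. It only mirrors the rule described above: the canary gets `canaryTrafficPercent` and the current revision gets the remainder. It assumes the value must be a whole number from 0 to 100.

```shell
# Hypothetical helper (not part of KServe): print how a given
# canaryTrafficPercent value divides traffic between the two revisions.
traffic_split() {
  # Accept only plain whole numbers without leading zeros (0, 10, 100, ...).
  case "$1" in
    ''|*[!0-9]*|0?*)
      echo "canaryTrafficPercent must be a whole number from 0 to 100" >&2
      return 1 ;;
  esac
  if [ "$1" -gt 100 ]; then
    echo "canaryTrafficPercent must be a whole number from 0 to 100" >&2
    return 1
  fi
  # The canary gets the requested share; the current revision gets the rest.
  echo "canary=$1% current=$(( 100 - $1 ))%"
}
```

For example, `traffic_split 10` prints `canary=10% current=90%`, which matches the example above.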
Prerequisites
Before you begin, make sure that you have:
- A working inference service, deployed by following the topic Integrate the cloud-native inference service KServe with ASM
Verify initial traffic distribution
After deploying the inference service, confirm that 100% of traffic goes to the initial revision.
Check the sklearn-iris inference service:
```shell
kubectl get isvc -n kserve-test sklearn-iris
```

Expected output:

```
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION          LATESTREADYREVISION            AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                                     sklearn-iris-predictor-00001   79s
```

The `LATEST` column shows `100`, which confirms that all traffic goes to the single existing revision (`sklearn-iris-predictor-00001`).
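If you prefer a compact per-revision view over the wide table, kubectl's JSONPath output can print the traffic targets directly. The sketch below is a hypothetical helper named `show_traffic`. It reads the `status.components.predictor.traffic` list (the same section shown in the tag-routing example later in this topic) and assumes that kubectl is already configured for your cluster.

```shell
# Hypothetical helper: print each traffic target of an InferenceService as
# "<revisionName> <percent>", one per line, using kubectl JSONPath output.
# Assumes kubectl is configured for the cluster used in this topic.
show_traffic() {
  local name="${1:-sklearn-iris}"
  local namespace="${2:-kserve-test}"
  kubectl get isvc "$name" -n "$namespace" \
    -o jsonpath='{range .status.components.predictor.traffic[*]}{.revisionName}{" "}{.percent}{"\n"}{end}'
}
```

On the initial deployment, `show_traffic` should list only `sklearn-iris-predictor-00001`, which is consistent with the `LATEST` column above.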
Configure a canary release
Split traffic between the current and canary revisions by adding the canaryTrafficPercent field to the InferenceService spec.
Apply the following configuration to route 10% of traffic to a new model version:
```shell
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
EOF
```

This sets `canaryTrafficPercent` to `10`, which tells KServe to send 10% of traffic to the canary revision and 90% to the current revision. The `storageUri` points to the updated model (`model-2`).

Verify the traffic split:
```shell
kubectl get isvc -n kserve-test sklearn-iris
```

Expected output:

```
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION          LATESTREADYREVISION            AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True    90     10       sklearn-iris-predictor-00001   sklearn-iris-predictor-00002   11m
```

`PREV` shows `90` (current revision) and `LATEST` shows `10` (canary revision).

Confirm that both revisions are running:
```shell
kubectl get pod -n kserve-test
```

Expected output:

```
NAME                                                       READY   STATUS    RESTARTS   AGE
sklearn-iris-predictor-00001-deployment-7965bcc66-grdbq    2/2     Running   0          12m
sklearn-iris-predictor-00002-deployment-6744dbbd8c-wfghv   2/2     Running   0          86s
```

Two pods are running, one for each revision. The revision number in each pod name identifies its revision: `predictor-00001` is the current revision and `predictor-00002` is the canary.
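To raise the canary's share in steps before promoting it (for example, from 10 to 50), you can patch only `canaryTrafficPercent` instead of re-applying the whole manifest. The sketch below assumes this workflow. `set_canary_percent` is a hypothetical helper. It uses `kubectl patch --type=merge`, because strategic merge patches are not supported for custom resources such as InferenceService.

```shell
# Hypothetical helper: set spec.predictor.canaryTrafficPercent with a JSON
# merge patch, leaving the model and storageUri untouched.
# Usage: set_canary_percent <0-100> [name] [namespace]
set_canary_percent() {
  local percent="$1"
  local name="${2:-sklearn-iris}"
  local namespace="${3:-kserve-test}"
  # Refuse anything that is not a whole number from 0 to 100.
  case "$percent" in
    ''|*[!0-9]*|0?*)
      echo "percent must be a whole number from 0 to 100" >&2
      return 1 ;;
  esac
  if [ "$percent" -gt 100 ]; then
    echo "percent must be a whole number from 0 to 100" >&2
    return 1
  fi
  kubectl patch isvc "$name" -n "$namespace" --type=merge \
    -p "{\"spec\":{\"predictor\":{\"canaryTrafficPercent\":$percent}}}"
}
```

After each step, rerun `kubectl get isvc -n kserve-test sklearn-iris` and check that the `PREV` and `LATEST` columns reflect the new split.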
Promote the canary to receive all traffic
After you validate the canary revision, promote it by removing the canaryTrafficPercent field and reapplying the InferenceService configuration.
Apply the configuration without `canaryTrafficPercent`:

```shell
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
EOF
```

Without `canaryTrafficPercent`, KServe routes 100% of traffic to the latest ready revision.

Verify that all traffic goes to the new revision:
```shell
kubectl get isvc -n kserve-test sklearn-iris
```

Expected output:

```
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION          LATESTREADYREVISION            AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                                     sklearn-iris-predictor-00002   18m
```

All traffic now goes to `sklearn-iris-predictor-00002`.
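Instead of re-applying the full manifest, you can also remove the field in place with a JSON Patch `remove` operation. The following is a sketch under that assumption. `promote_canary` is a hypothetical helper, and the resulting spec should match what the `kubectl apply` above produces.

```shell
# Hypothetical helper: promote the canary by deleting
# spec.predictor.canaryTrafficPercent with a JSON Patch (RFC 6902).
# Usage: promote_canary [name] [namespace]
promote_canary() {
  local name="${1:-sklearn-iris}"
  local namespace="${2:-kserve-test}"
  kubectl patch isvc "$name" -n "$namespace" --type=json \
    -p '[{"op":"remove","path":"/spec/predictor/canaryTrafficPercent"}]'
}
```

JSON Patch requires the removed path to exist. If the field is already gone, running `promote_canary` is therefore expected to return an error rather than succeed silently.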
Roll back to the previous revision
If the canary revision has issues, set canaryTrafficPercent to 0 to shift all traffic back to the previous stable revision.
Set `canaryTrafficPercent` to `0`:

```shell
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    canaryTrafficPercent: 0
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
EOF
```

Verify the rollback:
```shell
kubectl get isvc -n kserve-test sklearn-iris
```

Expected output:

```
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION          LATESTREADYREVISION            AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True    100    0        sklearn-iris-predictor-00001   sklearn-iris-predictor-00002   22m
```

`PREV` shows `100` and `LATEST` shows `0`. All traffic goes to the previous revision (`sklearn-iris-predictor-00001`), and the canary revision (`sklearn-iris-predictor-00002`) receives none.
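In a script, you may want to confirm the rollback automatically instead of reading the table. The sketch below is a hypothetical helper, `has_all_traffic`. It reads `<revisionName> <percent>` lines on standard input and succeeds only if the named revision has 100%. It assumes those lines come from a kubectl JSONPath query over `status.components.predictor.traffic`, as in the usage comment.

```shell
# Hypothetical helper: succeed only if the named revision receives 100% of
# traffic. Reads "<revisionName> <percent>" lines on standard input.
has_all_traffic() {
  local revision="$1" name percent
  # "|| [ -n "$name" ]" also handles a final line without a trailing newline.
  while read -r name percent || [ -n "$name" ]; do
    if [ "$name" = "$revision" ] && [ "$percent" = "100" ]; then
      return 0
    fi
  done
  return 1
}

# Example usage (requires access to the cluster):
# kubectl get isvc sklearn-iris -n kserve-test \
#   -o jsonpath='{range .status.components.predictor.traffic[*]}{.revisionName}{" "}{.percent}{"\n"}{end}' \
#   | has_all_traffic sklearn-iris-predictor-00001 && echo "rollback complete"
```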
Route traffic to a specific revision by tag
Tag-based routing lets you send requests directly to a specific revision by URL, independent of the percentage-based traffic split. Use this to test a canary revision in isolation before routing live traffic to it.
Enable tag-based routing by adding the `serving.kserve.io/enable-tag-routing` annotation:

```shell
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  annotations:
    serving.kserve.io/enable-tag-routing: "true"
spec:
  predictor:
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
EOF
```

Check the tagged URLs in the InferenceService status:
```shell
kubectl get isvc -n kserve-test sklearn-iris -oyaml
```

The `status.components.predictor.traffic` section shows two tagged entries:

```yaml
traffic:
- latestRevision: true
  percent: 10
  revisionName: sklearn-iris-predictor-00003
  tag: latest
  url: http://latest-sklearn-iris-predictor.kserve-test.example.com
- latestRevision: false
  percent: 90
  revisionName: sklearn-iris-predictor-00001
  tag: prev
  url: http://prev-sklearn-iris-predictor.kserve-test.example.com
```

Each revision gets a dedicated URL whose hostname is prefixed with its tag:
| Revision | Tag | URL |
|---|---|---|
| Canary | latest | http://latest-sklearn-iris-predictor.kserve-test.example.com |
| Current | prev | http://prev-sklearn-iris-predictor.kserve-test.example.com |

Send a request to a specific revision by setting the `Host` header to its tagged hostname (the URL without `http://`):

```shell
curl -v \
  -H "Host: latest-sklearn-iris-predictor.kserve-test.example.com" \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict \
  -d @./iris-input.json
```

| Variable | Description |
|---|---|
| `${INGRESS_HOST}` | Hostname or IP address of the ASM ingress gateway |
| `${INGRESS_PORT}` | Port of the ASM ingress gateway |

To route the request to the current revision instead, change the `Host` header to `prev-sklearn-iris-predictor.kserve-test.example.com`.
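To compare the two revisions side by side, you can send the same request to each tag in a loop. The following is a minimal sketch. `compare_tagged_revisions` is a hypothetical helper. It assumes that `INGRESS_HOST` and `INGRESS_PORT` are set as described in the table above and that `./iris-input.json` exists from the prerequisite setup.

```shell
# Hypothetical helper (not part of KServe or ASM): send the same prediction
# request to each tagged revision and print both responses.
compare_tagged_revisions() {
  local tag
  if [ -z "${INGRESS_HOST:-}" ] || [ -z "${INGRESS_PORT:-}" ]; then
    echo "INGRESS_HOST and INGRESS_PORT must be set" >&2
    return 1
  fi
  for tag in latest prev; do
    echo "== ${tag} =="
    # The tag prefix in the Host header selects the revision.
    curl -s \
      -H "Host: ${tag}-sklearn-iris-predictor.kserve-test.example.com" \
      "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict" \
      -d @./iris-input.json
    echo
  done
}
```

The traffic split does not affect these requests, because the tagged hostname pins each one to a single revision.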