In Raw Deployment mode, you can implement a gateway-based canary release for an inference service. This topic describes how to implement a canary release for an inference service and update the service from version v1 to v2. In this example, the NGINX Ingress controller is used as the gateway.
Prerequisites
The Arena client is installed.
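The NGINX Ingress controller is deployed in the cluster. The commands in this topic assume that its Service is named nginx-ingress-lb in the kube-system namespace; you can verify that the Service exists and has an external IP address:
kubectl -n kube-system get svc nginx-ingress-lb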
Procedure
In this topic, two inference services of different versions, v1 and v2, are deployed from a model named canary. This topic describes two canary release policies for upgrading the inference service from v1 to v2:
Traffic splitting based on client requests: requests whose foo header is set to bar are forwarded to the model-v2-svc Service. All other requests are forwarded to the model-svc Service by default.
Traffic splitting based on Service weights: 20% of requests are forwarded to the model-v2-svc Service, and the remaining requests are forwarded to the model-svc Service.
For more information about how to use the NGINX Ingress controller to implement canary releases, see Use the NGINX Ingress controller to implement canary releases and blue-green releases.
Step 1: Deploy and verify inference services
Deploy two inference services, model-v1 and model-v2, and create the model-svc and model-v2-svc Services that expose them. Then, send requests to each Service to verify that both versions return inference results.
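For reference, the following is a minimal sketch of the model-v2-svc Service, assuming that the v2 inference service pods carry the serving.kserve.io/inferenceservice: model-v2 label. The model-svc Service shown in Step 4 follows the same structure, with model-v1 as the selector value.
apiVersion: v1
kind: Service
metadata:
  name: model-v2-svc
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    serving.kserve.io/inferenceservice: model-v2
  type: ClusterIP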
Step 2: Create an Ingress
Create a file named model-ingress.yaml.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: model-ingress
spec:
  rules:
  - host: model.example.com # Replace the value with your hostname.
    http:
      paths:
      # Information about the Service that is created for the inference service of v1.
      - path: /
        backend:
          service:
            name: model-svc
            port:
              number: 80
        pathType: ImplementationSpecific
Create an Ingress.
kubectl apply -f model-ingress.yaml
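Before you configure the canary release policies, you can confirm that the Ingress was created and has been assigned an address. The exact output columns depend on your cluster:
kubectl get ingress model-ingress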
Step 3: Create and verify canary release policies
Scenario 1: Traffic splitting based on client requests
Create a file named gray-release-canary.yaml.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gray-release-canary
  annotations:
    # Enable the canary release feature.
    nginx.ingress.kubernetes.io/canary: "true"
    # Match requests based on the foo header.
    nginx.ingress.kubernetes.io/canary-by-header: "foo"
    # Requests whose foo header is set to bar are routed to the new service version (model-v2).
    nginx.ingress.kubernetes.io/canary-by-header-value: "bar"
spec:
  rules:
  - host: model.example.com
    http:
      paths:
      # Information about the Service that is created for model-v2.
      - path: /
        backend:
          service:
            name: model-v2-svc
            port:
              number: 80
        pathType: ImplementationSpecific
Deploy the canary release policy.
kubectl apply -f gray-release-canary.yaml
Check whether the Service of the default version returns responses for requests without specific headers.
# Replace the hostname in the following sample code with the hostname that is specified in the Ingress.
curl -H "Host: model.example.com" -H "Content-Type: application/json" \
  http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
  -d '{"data": "test"}'
Expected output:
{"id":"4d8c110d-c291-4670-ad0a-1a30bf8e314c","model_name":"canary","model_version":null,"outputs":[{"name":"output-0","shape":[1,1],"datatype":"STR","data":["model-v1"]}]}%
The output shows that model-v1 returns the response. This indicates that, by default, requests without the specified header are routed to model-v1 and the service returns inference results as expected.
Run the following command to verify whether client requests whose foo header is set to bar are forwarded to model-v2.
curl -H "Host: model.example.com" -H "Content-Type: application/json" \
  -H "foo: bar" \
  http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
  -d '{"data": "test"}'
Expected output:
{"id":"4d3efc12-c8bd-40f8-898f-7983377db7bd","model_name":"canary","model_version":null,"outputs":[{"name":"output-0","shape":[1,1],"datatype":"STR","data":["model-v2"]}]}%
The output shows that model-v2 returns the response. This indicates that requests that carry the foo: bar header are routed to model-v2 as expected. The canary release policy has taken effect.
Scenario 2: Traffic splitting based on Service weights
Create a file named gray-release-canary.yaml.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gray-release-canary
  annotations:
    # Enable the canary release feature.
    nginx.ingress.kubernetes.io/canary: "true"
    # Forward only 20% of requests to model-v2.
    # The default total weight is 100.
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  rules:
  - host: model.example.com
    http:
      paths:
      # Information about the Service that is created for the inference service of v2.
      - path: /
        backend:
          service:
            name: model-v2-svc
            port:
              number: 80
        pathType: ImplementationSpecific
Deploy the canary release policy.
kubectl apply -f gray-release-canary.yaml
Check traffic distribution.
curl -H "Host: model.example.com" -H "Content-Type: application/json" \
  http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
  -d '{"data": "test"}'
Run the preceding command multiple times. The results show that about 20% of the requests are forwarded to model-v2, and the remaining requests are forwarded to the earlier service version (model-v1). This indicates that the canary release policy has taken effect. You can automate this check with a loop, as shown in the sketch below.
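A minimal shell sketch to observe the traffic distribution, assuming the same endpoint as in the preceding command. The version is read from the data field of each response, which contains model-v1 or model-v2:
# Send 50 requests and count how many are answered by each version.
INGRESS_IP=$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}')
for i in $(seq 1 50); do
  curl -s -H "Host: model.example.com" -H "Content-Type: application/json" \
    http://${INGRESS_IP}:80/v1/models/canary:predict -X POST -d '{"data": "test"}'
  echo
done | grep -o 'model-v[12]' | sort | uniq -c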
Step 4: Switch traffic to the new service version
If model-v2 runs as expected for a period of time, you can take model-v1 offline and provide only the new service version for access.
Update the model-svc.yaml file to specify model-v2 as the backend application of the model-svc Service.
apiVersion: v1
kind: Service
metadata:
  name: model-svc
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    serving.kserve.io/inferenceservice: model-v2 # Replace model-v1 with model-v2.
  type: ClusterIP
Run the following command to redeploy the Service:
kubectl apply -f model-svc.yaml
Test access to the Ingress.
curl -H "Host: model.example.com" -H "Content-Type: application/json" \
  http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
  -d '{"data": "test"}'
Expected output:
{"id":"a13f2089-73ce-41e3-989e-e58457d14fed","model_name":"canary","model_version":null,"outputs":[{"name":"output-0","shape":[1,1],"datatype":"STR","data":["model-v2"]}]}%
Run the preceding command multiple times. The results show that all requests are forwarded to model-v2.
Run the following command to delete model-v1 and the relevant resources:
kubectl delete ingress gray-release-canary
arena serve delete model-v1
kubectl delete svc model-v2-svc
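To confirm the cleanup, you can list the remaining serving jobs and resources. The model-v1 serving job, the gray-release-canary Ingress, and the model-v2-svc Service should no longer appear:
arena serve list
kubectl get ingress,svc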