
Container Service for Kubernetes: Implement a canary release for an inference service based on the NGINX Ingress controller

Last Updated: May 07, 2025

In Raw Deployment mode, you can implement a canary release for an application based on a gateway. This topic describes how to implement a canary release for an inference service and update the inference service from version v1 to v2. In this example, the NGINX Ingress controller is used as the gateway.

Prerequisites

Procedure

In this topic, two inference services of different versions, v1 and v2, are deployed from a model named canary. This topic describes two canary release policies for upgrading the inference service from v1 to v2:

  • Traffic splitting based on requests

    Requests whose foo header is set to bar are forwarded to the model-v2-svc Service. All other requests are forwarded to the model-svc Service by default.

  • Traffic splitting based on Service weights

    20% of requests are forwarded to the model-v2-svc Service, and the remaining requests are forwarded to the model-svc Service.

For more information about how to use the NGINX Ingress controller to implement canary releases, see Use the NGINX Ingress controller to implement canary releases and blue-green releases.

Step 1: Deploy and verify inference services

v1

  1. Deploy an inference service of v1 from the canary model.

    arena serve kserve \
        --name=model-v1 \
        --image=kube-ai-registry.cn-shanghai.cr.aliyuncs.com/ai-sample/kserve-canary:1.0.0 \
        --cpu=1 \
        --memory=2Gi \
        "python app.py --model_name=canary"
  2. Create a Service for the inference service of v1.

    1. Create a file named model-svc.yaml.

      apiVersion: v1
      kind: Service
      metadata:
        name: model-svc
      spec:
        ports:
        - port: 80
          protocol: TCP
          targetPort: 8080
        selector:
          serving.kserve.io/inferenceservice: model-v1
        type: ClusterIP
    2. Create the Service.

      kubectl apply -f model-svc.yaml
  3. Access the inference service named model-v1 by using an NGINX Ingress to verify whether it is correctly deployed.

    curl -H "Host: $(kubectl get inferenceservice model-v1 -o jsonpath='{.status.url}' | cut -d "/" -f 3)" \
         -H "Content-Type: application/json" \
         http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
         -d '{"data": "test"}'

v2

  1. Deploy an inference service of v2 from the canary model.

    arena serve kserve \
        --name=model-v2 \
        --image=kube-ai-registry.cn-shanghai.cr.aliyuncs.com/ai-sample/kserve-canary:1.0.0 \
        --cpu=1 \
        --memory=2Gi \
        "python app-v2.py --model_name=canary"
  2. Create a Service for the inference service of v2.

    1. Create a file named model-v2-svc.yaml.

      apiVersion: v1
      kind: Service
      metadata:
        name: model-v2-svc
      spec:
        ports:
        - port: 80
          protocol: TCP
          targetPort: 8080
        selector:
          serving.kserve.io/inferenceservice: model-v2
        type: ClusterIP
    2. Create the Service.

      kubectl apply -f model-v2-svc.yaml
  3. Access the inference service named model-v2 by using an NGINX Ingress to verify whether it is correctly deployed.

    curl -H "Host: $(kubectl get inferenceservice model-v2 -o jsonpath='{.status.url}' | cut -d "/" -f 3)" \
         -H "Content-Type: application/json" \
         http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
         -d '{"data": "test"}'

Step 2: Create an Ingress

  1. Create a file named model-ingress.yaml.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: model-ingress
    spec:
      rules:
      - host: model.example.com # Replace the value with your hostname. 
        http:
          paths:
          # Information about the Service that is created for the inference service of v1. 
          - path: /
            backend:
              service: 
                name: model-svc
                port:
                  number: 80
            pathType: ImplementationSpecific
  2. Create the Ingress.

    kubectl apply -f model-ingress.yaml
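You can optionally confirm that the Ingress is created before you configure the canary release policies. The check assumes the nginx-ingress-lb Service in the kube-system namespace that is also used in the preceding curl commands:

    # The HOSTS column should show model.example.com, and the ADDRESS column
    # should show the IP address of the nginx-ingress-lb Service.
    kubectl get ingress model-ingress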

Step 3: Create and verify canary release policies

Scenario 1: Traffic splitting based on client requests

  1. Create a file named gray-release-canary.yaml.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: gray-release-canary
      annotations:
        # Enable the canary release feature. 
        nginx.ingress.kubernetes.io/canary: "true"
        # Route requests based on the value of the foo request header. 
        nginx.ingress.kubernetes.io/canary-by-header: "foo"
        # Requests whose foo header is set to bar are routed to the new service version (model-v2). 
        nginx.ingress.kubernetes.io/canary-by-header-value: "bar"
    spec:
      rules:
      - host: model.example.com
        http:
          paths:
          # Information about the Service that is created for model-v2. 
          - path: /
            backend:
              service: 
                name: model-v2-svc
                port:
                  number: 80
            pathType: ImplementationSpecific
  2. Deploy the canary release policy.

    kubectl apply -f gray-release-canary.yaml
  3. Verify that requests without the foo header are routed to the Service of the default version (model-v1).

    # Replace the hostname in the following sample code with the hostname that is specified in the Ingress. 
    curl -H "Host: model.example.com" -H "Content-Type: application/json" \
         http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
         -d '{"data": "test"}'

    Expected output:

    {"id":"4d8c110d-c291-4670-ad0a-1a30bf8e314c","model_name":"canary","model_version":null,"outputs":[{"name":"output-0","shape":[1,1],"datatype":"STR","data":["model-v1"]}]}%  

    The output shows that the response is returned by model-v1. This indicates that requests without the foo header are routed to the default version, model-v1, as expected.

  4. Run the following command to verify that client requests whose foo header is set to bar are forwarded to model-v2:

    curl -H "Host: model.example.com" -H "Content-Type: application/json" \
         -H "foo: bar" \
         http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
         -d '{"data": "test"}'

    Expected output:

    {"id":"4d3efc12-c8bd-40f8-898f-7983377db7bd","model_name":"canary","model_version":null,"outputs":[{"name":"output-0","shape":[1,1],"datatype":"STR","data":["model-v2"]}]}%   

    The output shows that the response is returned by model-v2. This indicates that requests whose foo header is set to bar are routed to model-v2 as expected. The canary release policy has taken effect.
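You can also check both routing paths in one pass by printing only the version string in each response. The following commands are a minimal sketch that assumes the responses contain the model-v1 and model-v2 strings shown in the preceding expected output:

    # IP address of the NGINX Ingress controller.
    LB_IP=$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}')

    # Request without the foo header. model-v1 is expected.
    curl -s -H "Host: model.example.com" -H "Content-Type: application/json" \
         -X POST -d '{"data": "test"}' http://$LB_IP:80/v1/models/canary:predict | grep -o 'model-v[12]'

    # Request with the foo header set to bar. model-v2 is expected.
    curl -s -H "Host: model.example.com" -H "Content-Type: application/json" -H "foo: bar" \
         -X POST -d '{"data": "test"}' http://$LB_IP:80/v1/models/canary:predict | grep -o 'model-v[12]'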

Scenario 2: Traffic splitting based on Service weights

  1. Create a file named gray-release-canary.yaml.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: gray-release-canary
      annotations:
        # Enable the canary release feature. 
        nginx.ingress.kubernetes.io/canary: "true"
        # Forward only 20% of requests to model-v2. 
        # The default total weight is 100. 
        nginx.ingress.kubernetes.io/canary-weight: "20"
    spec:
      rules:
      - host: model.example.com
        http:
          paths:
          # Information about the Service that is created for the inference service of v2. 
          - path: /
            backend:
              service: 
                name: model-v2-svc
                port:
                  number: 80
            pathType: ImplementationSpecific
  2. Deploy the canary release policy.

    kubectl apply -f gray-release-canary.yaml
  3. Check traffic distribution.

    curl -H "Host: model.example.com" -H "Content-Type: application/json" \
         http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
         -d '{"data": "test"}'

    Run the preceding command multiple times. The results show that about 20% of the requests are forwarded to model-v2, and the remaining requests are forwarded to the earlier service version (model-v1). This indicates that the canary release policy has taken effect.
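To estimate the traffic split instead of inspecting responses one by one, you can send a batch of requests and count the responses by version. The following loop is a minimal sketch that assumes the responses contain the model-v1 and model-v2 strings shown earlier:

    # Send 50 requests and count how many are answered by each version.
    # With a canary weight of 20, about 10 responses are expected from model-v2.
    LB_IP=$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}')
    for i in $(seq 1 50); do
      curl -s -H "Host: model.example.com" -H "Content-Type: application/json" \
           -X POST -d '{"data": "test"}' http://$LB_IP:80/v1/models/canary:predict | grep -o 'model-v[12]'
    done | sort | uniq -c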

Step 4: Switch traffic to the new service version

If model-v2 runs as expected for a period of time, you can bring model-v1 offline and make only the new service version available for access.

  1. Update the model-svc.yaml file to specify model-v2 as the backend application of the model-svc Service.

    apiVersion: v1
    kind: Service
    metadata:
      name: model-svc
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 8080
      selector:
        serving.kserve.io/inferenceservice: model-v2 # Replace model-v1 with model-v2. 
      type: ClusterIP
  2. Run the following command to redeploy the Service:

    kubectl apply -f model-svc.yaml 
  3. Test access to the Ingress.

    curl -H "Host: model.example.com" -H "Content-Type: application/json" \
         http://$(kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'):80/v1/models/canary:predict -X POST \
         -d '{"data": "test"}'

    Expected output:

    {"id":"a13f2089-73ce-41e3-989e-e58457d14fed","model_name":"canary","model_version":null,"outputs":[{"name":"output-0","shape":[1,1],"datatype":"STR","data":["model-v2"]}]}%  

    Run the preceding command multiple times. The results show that all requests are forwarded to model-v2.

  4. Run the following commands to delete model-v1 and the resources that are no longer needed:

    kubectl delete ingress gray-release-canary
    arena serve delete model-v1
    kubectl delete svc model-v2-svc
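After the cleanup, you can optionally confirm the final state: the model-ingress Ingress and the model-svc Service remain and route all traffic to model-v2, while the canary Ingress, the model-v2-svc Service, and the model-v1 serving job are deleted. The arena serve list command in the following check assumes that the Arena client is still configured for this cluster:

    # Only model-ingress should remain.
    kubectl get ingress

    # model-svc should still exist and select the model-v2 pods.
    kubectl get svc model-svc -o wide

    # Only the model-v2 serving job should be listed.
    arena serve list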