Container Service for Kubernetes: Quickly deploy an inference service based on KServe

Last Updated: Feb 14, 2025

KServe is a Kubernetes-based machine learning model serving framework. It lets you deploy one or more trained models through Kubernetes CustomResourceDefinitions (CRDs) to model serving runtimes such as TFServing, TorchServe, and Triton Inference Server. This simplifies and accelerates the processes of deploying, updating, and scaling models. This topic describes how to quickly deploy an inference service with KServe in Knative to help you serve machine learning models effectively in production environments.

After you use a KServe InferenceService to deploy models in Knative, you can use the following serverless features provided by KServe (see the configuration sketch after this list):

  • Scaling to zero

  • Auto scaling based on requests per second (RPS), concurrency, and CPU and GPU metrics

  • Version management

  • Traffic management

  • Security authentication

  • Out-of-the-box metrics
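To make these features concrete, the following sketch shows how scale-to-zero and request-based auto scaling can be configured on an InferenceService. The field names (minReplicas, maxReplicas, scaleMetric, and scaleTarget) and the example values are assumptions based on the open source KServe specification; verify them against the KServe version installed in your cluster before you use them.

    kubectl apply -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris-autoscaling"    # hypothetical name used only for this sketch
    spec:
      predictor:
        minReplicas: 0           # allow the predictor to scale to zero when it receives no requests
        maxReplicas: 3           # upper bound for auto scaling
        scaleMetric: concurrency # scale on concurrent requests; rps, cpu, and memory are also defined by the spec
        scaleTarget: 10          # target value per pod for the selected metric
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF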

KServe introduction

  • ModelServer and MLServer

    ModelServer and MLServer are two model serving runtimes that KServe uses to deploy and manage machine learning models, and both provide out-of-the-box model serving. ModelServer is a Python model serving runtime that implements KServe prediction protocol v1. MLServer implements KServe prediction protocol v2 with REST and gRPC (see the request examples after this list). You can also build custom model servers for complex use cases. In addition, KServe provides basic API primitives that make it easy to build custom model serving runtimes, and you can use other tools such as BentoML to build custom model serving images.

  • KServe Controller

    The KServe Controller is a key component of KServe. It manages custom InferenceService resources, and creates and deploys Knative Services to automate resource scaling. It scales the Deployment that backs a Knative Service based on traffic volume, and automatically scales the Service pods to zero when no requests are sent to the Knative Service. Auto scaling uses model serving resources more efficiently and prevents resource waste.

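The two prediction protocols expect different request formats. The v1 format is the {"instances": [...]} body that is used in Step 3 of this topic. For comparison, the following sketch shows the general shape of a v2 (Open Inference Protocol) request for the same sample; the endpoint path /v2/models/<model-name>/infer, the tensor name input-0, and the file name are assumptions based on the open source protocol definition, not values taken from this topic.

    # v2-style request body (for runtimes such as MLServer that implement prediction protocol v2)
    cat <<EOF > "./v2-iris-input.json"
    {
      "inputs": [
        {
          "name": "input-0",
          "shape": [1, 4],
          "datatype": "FP32",
          "data": [6.8, 2.8, 4.8, 1.4]
        }
      ]
    }
    EOF
    # The request would then be sent to /v2/models/<model-name>/infer instead of /v1/models/<model-name>:predict.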

Prerequisites

Knative is deployed in your cluster. For more information, see Deploy and manage Knative.

Step 1: Deploy the KServe component

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Knative.

  3. On the Components tab, find KServe and click Deploy in the Actions column. Complete the deployment as prompted.

    If Deployed is displayed in the Status column of the KServe component, the component is installed. You can also verify the installation from the command line, as shown in the sketch below.
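    The following sketch checks that the InferenceService CRD is registered and that a KServe controller pod is running. The exact namespace and pod names depend on how the component is installed, so the broad grep filters are intentional.

    kubectl get crd | grep serving.kserve.io      # the inferenceservices.serving.kserve.io CRD should be listed
    kubectl get pods -A | grep kserve             # a KServe controller pod should be in the Running state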

Step 2: Deploy an inference service

First, deploy a predictive inference service that uses a scikit-learn model trained on the Iris dataset. The dataset covers three Iris species: Iris Setosa (index 0), Iris Versicolour (index 1), and Iris Virginica (index 2). You can send inference requests to the Service to predict the species of an Iris sample.

Note

The Iris dataset contains 50 samples for each species. Each sample has four features: sepal length, sepal width, petal length, and petal width.

  1. Run the following command to deploy an inference service named sklearn-iris:

    kubectl apply -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
  2. Run the following command to query the status of the Service:

    kubectl get inferenceservices sklearn-iris

    Expected output:

    NAME           URL                                                         READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
    sklearn-iris   http://sklearn-iris-predictor-default.default.example.com   True           100                              sklearn-iris-predictor-default-00001   51s
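    Optionally, you can inspect the resources that KServe created for the Service. The label key serving.kserve.io/inferenceservice in the following sketch is an assumption based on the open source KServe release; adjust it if your pods carry different labels.

    kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris   # predictor pods created for the InferenceService
    kubectl get ksvc | grep sklearn-iris                                  # the underlying Knative Service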

Step 3: Access the Service

The IP address and access method of the Service vary based on the gateway that is used.
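If you are not sure which gateway your Knative deployment uses, you can inspect the Knative networking configuration. The following sketch assumes that the standard config-network ConfigMap from open source Knative exists in the knative-serving namespace; the exact key name for the ingress class varies across Knative versions, so the output is simply filtered for ingress-related entries.

    kubectl -n knative-serving get cm config-network -o yaml | grep -i ingress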

ALB

  1. Run the following command to query the address of the ALB gateway:

    kubectl get albconfig knative-internet                         

    Expected output:

    NAME               ALBID                    DNSNAME                                              PORT&PROTOCOL   CERTID   AGE
    knative-internet   alb-hvd8nngl0l*******   alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com                               2
  2. Run the following command to create a file named ./iris-input.json that contains the inference requests:

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Run the following command to access the Service:

    INGRESS_DOMAIN=$(kubectl get albconfig knative-internet -o jsonpath='{.status.loadBalancer.dnsname}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_DOMAIN}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    Expected output:

    *   Trying 120.77.XX.XX...
    * TCP_NODELAY set
    * Connected to alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com (120.77.XX.XX) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    > 
    * upload completely sent off: 76 out of 76 bytes
    < HTTP/1.1 200 OK
    < Date: Thu, 13 Jul 2023 01:48:44 GMT
    < Content-Type: application/json
    < Content-Length: 21
    < Connection: keep-alive
    < 
    * Connection #0 to host alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com left intact
    {"predictions":[1,1]}

    {"predictions": [1, 1]} is returned, which indicates that both samples sent to the inference service match index is 1. This means that the Irises in both samples are Iris Versicolour.

MSE

  1. Run the following command to query the address of the MSE gateway:

    kubectl -n knative-serving get ing stats-ingress

    Expected output:

    NAME            CLASS                  HOSTS   ADDRESS                         PORTS   AGE
    stats-ingress   knative-ingressclass   *       192.168.XX.XX,47.107.XX.XX      80      15d

    47.107.XX.XX in the ADDRESS column is the public IP address of the MSE gateway, which is used to access the inference Service. The order in which the public and private IP addresses of the MSE gateway are listed is not fixed. In some cases, the public IP address comes first, for example, 47.107.XX.XX,192.168.XX.XX. A sketch that selects the public IP address automatically is provided at the end of this procedure.

  2. Run the following command to create a file named ./iris-input.json that contains the inference requests:

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Run the following command to access the Service:

    # The order of the public and private IP addresses of the MSE gateway is not fixed. In this example, the public IP address is used to access the inference Service. Use ingress[1] if the public IP address is listed second (for example, 192.168.XX.XX,47.107.XX.XX) and ingress[0] if it is listed first. Choose the index based on the actual order of the IP addresses.
    INGRESS_HOST=$(kubectl -n knative-serving get ing stats-ingress -o jsonpath='{.status.loadBalancer.ingress[1].ip}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    Expected output:

    *   Trying 47.107.XX.XX... # 47.107.XX.XX is the public IP address of the MSE gateway. 
    * TCP_NODELAY set
    * Connected to 47.107.XX.XX (47.107.XX.XX) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    > 
    * upload completely sent off: 76 out of 76 bytes
    < HTTP/1.1 200 OK
    < content-length: 21
    < content-type: application/json
    < date: Tue, 11 Jul 2023 09:56:00 GMT
    < server: istio-envoy
    < req-cost-time: 5
    < req-arrive-time: 1689069360639
    < resp-start-time: 1689069360645
    < x-envoy-upstream-service-time: 4
    < 
    * Connection #0 to host 47.107.XX.XX left intact
    {"predictions":[1,1]}

    {"predictions": [1, 1]} is returned, which indicates that both samples sent to the inference Service match index is 1. This means that the Irises in both samples are Iris Versicolour.

Kourier

  1. Run the following command to query the address of the Kourier gateway:

    kubectl -n knative-serving get svc kourier

    Expected output:

    NAME      TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)                      AGE
    kourier   LoadBalancer   192.168.XX.XX   121.40.XX.XX  80:31158/TCP,443:32491/TCP   49m

    The EXTERNAL-IP column shows that the public IP address of the Kourier gateway is 121.40.XX.XX. The Service ports are HTTP 80 and HTTPS 443.

  2. Run the following command to create a file named ./iris-input.json that contains the inference requests:

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Run the following command to access the Service:

    INGRESS_HOST=$(kubectl -n knative-serving get service kourier -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    Expected output:

    *   Trying 121.40.XX.XX...
    * TCP_NODELAY set
    * Connected to 121.40.XX.XX (121.40.XX.XX) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    > 
    * upload completely sent off: 76 out of 76 bytes
    < HTTP/1.1 200 OK
    < content-length: 21
    < content-type: application/json
    < date: Wed, 12 Jul 2023 08:23:13 GMT
    < server: envoy
    < x-envoy-upstream-service-time: 4
    < 
    * Connection #0 to host 121.40.XX.XX left intact
    {"predictions":[1,1]}

    {"predictions": [1, 1]} is returned, which indicates that both samples sent to the inference Service match index is 1. This means that the Irises in both samples are Iris Versicolour.
