Container Compute Service: Quickly deploy an inference service using KServe

Last Updated: Jan 19, 2026

You can use KServe to deploy AI models as serverless inference services. KServe provides key features such as auto-scaling, multi-version management, and canary releases.
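
These features surface directly in the InferenceService API. The following is a minimal sketch, not part of this walkthrough: minReplicas, maxReplicas, and canaryTrafficPercent are standard KServe v1beta1 fields, while the service name and storage URI below are placeholders.

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "my-model"                # placeholder name
    spec:
      predictor:
        minReplicas: 1                # auto-scaling lower bound
        maxReplicas: 5                # auto-scaling upper bound
        canaryTrafficPercent: 10      # route 10% of traffic to the latest revision
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://example-bucket/models/my-model"   # placeholder URI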

Prerequisites

The KServe component must be deployed. For more information, see Deploy the KServe component.
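
To verify the installation, you can list the KServe controller pods. The namespace depends on how the component was deployed; the upstream default is kserve:

    kubectl get pods -n kserve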

Step 1: Deploy the InferenceService

First, deploy an InferenceService with prediction capabilities. This service uses a scikit-learn model trained on the iris dataset. The dataset has three output classes: Iris Setosa (index: 0), Iris Versicolour (index: 1), and Iris Virginica (index: 2). You can then send inference requests to the deployed model to predict the corresponding iris plant class.

Note

The iris dataset consists of 50 data points for each of three types of iris flowers. Each sample contains four features: the length and width of its sepals and petals.
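
For reference, each sample in the request payloads used below is an array of the four features in the standard scikit-learn ordering. The example values here are the first record of the dataset, an Iris Setosa sample:

    # [sepal length, sepal width, petal length, petal width], in centimeters
    [5.1, 3.5, 1.4, 0.2]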

  1. Run the following command to create an InferenceService named sklearn-iris.

    kubectl apply -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
  2. Run the following command to check the service status.

    kubectl get inferenceservices sklearn-iris

    Expected output:

    NAME           URL                                                         READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
    sklearn-iris   http://sklearn-iris-predictor-default.default.example.com   True           100                              sklearn-iris-predictor-default-00001   51s
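
    If you script this step, you can instead block until the service reports Ready. This is a generic kubectl pattern rather than part of this topic; the 300-second timeout is an arbitrary choice.

    kubectl wait --for=condition=Ready inferenceservice/sklearn-iris --timeout=300s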

Step 2: Access the service

The IP address and access method vary depending on the service gateway. Select the appropriate method for your gateway.
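
If you are not sure which gateway is installed, you can check for the two resources this topic relies on. These checks assume the default names used below:

    # A Kourier installation exposes this Service; an ALB installation is described by an AlbConfig.
    kubectl -n knative-serving get svc kourier
    kubectl get albconfig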

ALB

  1. Run the following command to retrieve the Application Load Balancer (ALB) endpoint.

    kubectl get albconfig knative-internet

    Expected output:

    NAME               ALBID                   DNSNAME                                             PORT&PROTOCOL   CERTID   AGE
    knative-internet   alb-hvd8nngl0l*******   alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com                            2
  2. Run the following command to write the JSON input for the inference request to the ./iris-input.json file.

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Run the following command to access the service.

    INGRESS_DOMAIN=$(kubectl get albconfig knative-internet -o jsonpath='{.status.loadBalancer.dnsname}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_DOMAIN}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    Expected output:

    *   Trying 120.77.XX.XX...
    * TCP_NODELAY set
    * Connected to alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com (120.77.XX.XX) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    > 
    * upload completely sent off: 76 out of 76 bytes
    < HTTP/1.1 200 OK
    < Date: Thu, 13 Jul 2023 01:48:44 GMT
    < Content-Type: application/json
    < Content-Length: 21
    < Connection: keep-alive
    < 
    * Connection #0 to host alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com left intact
    {"predictions":[1,1]}

    The response {"predictions": [1, 1]} indicates that the model classifies both submitted samples as Iris Versicolour (index 1).
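
    As an optional sanity check of the class mapping (not part of the original procedure), you can send the first record of the iris dataset, an Iris Setosa sample; a model trained on this dataset should return index 0:

    curl -s -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_DOMAIN}/v1/models/sklearn-iris:predict" -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
    # Should return {"predictions":[0]}, that is, Iris Setosa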

Kourier

  1. Run the following command to retrieve the Kourier service endpoint.

    kubectl -n knative-serving get svc kourier

    Expected output:

    NAME      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
    kourier   LoadBalancer   192.168.XX.XX   121.40.XX.XX   80:31158/TCP,443:32491/TCP   49m

    The access IP address for the service is 121.40.XX.XX, and the ports are 80 (HTTP) and 443 (HTTPS).

  2. Run the following command to write the JSON input for the inference request to the ./iris-input.json file.

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Run the following command to access the service.

    INGRESS_HOST=$(kubectl -n knative-serving get service kourier -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    Expected output:

    *   Trying 121.40.XX.XX...
    * TCP_NODELAY set
    * Connected to 121.40.XX.XX (121.40.XX.XX) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    > 
    * upload completely sent off: 76 out of 76 bytes
    < HTTP/1.1 200 OK
    < content-length: 21
    < content-type: application/json
    < date: Wed, 12 Jul 2023 08:23:13 GMT
    < server: envoy
    < x-envoy-upstream-service-time: 4
    < 
    * Connection #0 to host 121.40.XX.XX left intact
    {"predictions":[1,1]}

    The response {"predictions": [1, 1]} indicates that the model classifies both submitted samples as Iris Versicolour (index 1).
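
When you no longer need the example service, you can delete it and its revisions with a single command:

    kubectl delete inferenceservice sklearn-iris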