Container Compute Service: Quickly deploy an inference service using KServe

Last Updated: Mar 26, 2026

Use KServe on ACK Serverless Knative to deploy AI models as serverless inference services with automatic scaling, multi-version management, and phased releases.

Prerequisites

Before you begin, ensure that you have an ACK cluster with Knative Serving deployed and KServe installed (so that the InferenceService CRD is available), an ingress gateway for Knative (ALB or Kourier, as used in Step 2), and a kubectl client that is connected to the cluster.

Step 1: Deploy the InferenceService

Because the model is deployed as a KServe InferenceService rather than a plain Kubernetes Service, you only need to provide a storageUri: auto-scaling, traffic management, and versioning are handled automatically.

The following example deploys a scikit-learn model trained on the iris dataset. The model accepts four-feature inputs (sepal length, sepal width, petal length, petal width) and predicts one of three iris classes: Iris Setosa (index 0), Iris Versicolour (index 1), or Iris Virginica (index 2).

Note

The iris dataset contains 50 samples per class, each with four measurements.
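
The storageUri in the manifest below points to a public example bucket maintained by the KServe project, from which the sklearn runtime loads a serialized model (typically a model.joblib file). As an optional inspection step, you can list the artifact with the Google Cloud SDK's gsutil; note that gsutil may require authentication to be configured even for public buckets.

    # Optional: list the example model artifact behind the storageUri.
    gsutil ls gs://kfserving-examples/models/sklearn/1.0/model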

  1. Create an InferenceService named sklearn-iris.

    kubectl apply -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
  2. Verify that the service is ready.

    kubectl get inferenceservices sklearn-iris

    The expected output is similar to:

    NAME           URL                                                         READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
    sklearn-iris   http://sklearn-iris-predictor-default.default.example.com   True           100                              sklearn-iris-predictor-default-00001   51s

    The service is ready when the READY column shows True.
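
If you prefer to block until the service becomes ready instead of polling, or need to debug a service that stays not ready, the following sketch may help. It assumes the default namespace and relies on the Ready condition that KServe sets on the InferenceService.

    # Block until the InferenceService reports Ready (give up after 5 minutes).
    kubectl wait --for=condition=Ready inferenceservice/sklearn-iris --timeout=300s

    # If it never becomes Ready, the resource's conditions usually explain why.
    kubectl describe inferenceservice sklearn-iris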

Step 2: Send an inference request

The access method depends on your ingress gateway. Select the section that matches your setup.
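
If you are not sure which gateway your cluster uses, a quick probe against the same resources the sections below query can tell you. This is a convenience sketch, not an official check.

    # Probe for the ALB gateway configuration used in the ALB section.
    kubectl get albconfig knative-internet >/dev/null 2>&1 && echo "ALB gateway found"
    # Probe for the Kourier LoadBalancer Service used in the Kourier section.
    kubectl -n knative-serving get svc kourier >/dev/null 2>&1 && echo "Kourier gateway found"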

Application Load Balancer (ALB)

  1. Get the ALB endpoint.

    kubectl get albconfig knative-internet

    The expected output is similar to:

    NAME               ALBID                    DNSNAME                                              PORT&PROTOCOL   CERTID   AGE
    knative-internet   alb-hvd8nngl0l*******   alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com                               2
  2. Create the input file for the inference request.

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Send the inference request.

    INGRESS_DOMAIN=$(kubectl get albconfig knative-internet -o jsonpath='{.status.loadBalancer.dnsname}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_DOMAIN}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    The expected output is similar to:

    *   Trying 120.77.XX.XX...
    * TCP_NODELAY set
    * Connected to alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com (120.77.XX.XX) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    >
    * upload completely sent off: 76 out of 76 bytes
    < HTTP/1.1 200 OK
    < Date: Thu, 13 Jul 2023 01:48:44 GMT
    < Content-Type: application/json
    < Content-Length: 21
    < Connection: keep-alive
    <
    * Connection #0 to host alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com left intact
    {"predictions":[1,1]}

    Both data points are predicted as index 1, which corresponds to Iris Versicolour.
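
Beyond :predict, KServe's V1 inference protocol also exposes a model status endpoint over the same gateway. A ready model typically returns a small JSON document containing the model name and a ready flag. A minimal check, reusing the variables set in the previous step:

    # Query model status through the ALB gateway (KServe V1 protocol).
    curl -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_DOMAIN}/v1/models/sklearn-iris"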

Kourier

  1. Get the Kourier service endpoint.

    kubectl -n knative-serving get svc kourier

    The expected output is similar to:

    NAME      TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
    kourier   LoadBalancer   192.168.XX.XX   121.40.XX.XX     80:31158/TCP,443:32491/TCP   49m

    The external IP (121.40.XX.XX) is the access address. The service listens on ports 80 (HTTP) and 443 (HTTPS). If the EXTERNAL-IP column is empty or shows a hostname instead of an IP address, see the sketch after this procedure.

  2. Create the input file for the inference request.

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Send the inference request.

    INGRESS_HOST=$(kubectl -n knative-serving get service kourier -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}/v1/models/sklearn-iris:predict" -d @./iris-input.json

    The expected output is similar to:

    *   Trying 121.40.XX.XX...
    * TCP_NODELAY set
    * Connected to 121.40.XX.XX (121.40.XX.XX) port 80 (#0)
    > POST /v1/models/sklearn-iris:predict HTTP/1.1
    > Host: sklearn-iris-predictor-default.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    >
    * upload completely sent off: 76 out of 76 bytes
    < HTTP/1.1 200 OK
    < content-length: 21
    < content-type: application/json
    < date: Wed, 12 Jul 2023 08:23:13 GMT
    < server: envoy
    < x-envoy-upstream-service-time: 4
    <
    * Connection #0 to host 121.40.XX.XX left intact
    {"predictions":[1,1]}

    Both data points are predicted as index 1, which corresponds to Iris Versicolour.
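
Some cloud providers publish a hostname instead of an IP address for LoadBalancer Services. If the EXTERNAL-IP column in step 1 was empty or showed a hostname, the following sketch falls back to the hostname field of the Service status before sending the request:

    # Prefer the LoadBalancer IP; fall back to its hostname if no IP is set.
    INGRESS_HOST=$(kubectl -n knative-serving get service kourier -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    if [ -z "${INGRESS_HOST}" ]; then
      INGRESS_HOST=$(kubectl -n knative-serving get service kourier -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
    fi
    echo "Using ingress endpoint: ${INGRESS_HOST}"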