You can use KServe to deploy AI models as serverless inference services. This provides key features such as auto-scaling, multi-version management, and canary releases.
Prerequisites
The KServe component must be deployed. For more information, see Deploy the KServe component.
Step 1: Deploy the InferenceService
First, deploy an InferenceService with prediction capabilities. This service uses a scikit-learn model trained on the iris dataset. The dataset has three output classes: Iris Setosa (index: 0), Iris Versicolour (index: 1), and Iris Virginica (index: 2). You can then send inference requests to the deployed model to predict the corresponding iris plant class.
The iris dataset consists of 50 data points for each of three types of iris flowers. Each sample contains four features: the length and width of its sepals and petals.
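For reference when reading the prediction results later in this topic, the integer class indices returned by the model can be mapped back to species names. The sketch below is a hypothetical helper (the `CLASS_NAMES` list and `interpret` function are illustrative, not part of the service API) showing the feature order of each sample and the index-to-name mapping.

```python
# Hypothetical helper for interpreting predictions from the sklearn-iris model.
# Feature order per sample: sepal length, sepal width, petal length, petal width.
CLASS_NAMES = ["Iris Setosa", "Iris Versicolour", "Iris Virginica"]

def interpret(predictions):
    """Map the integer class indices returned by the model to species names."""
    return [CLASS_NAMES[i] for i in predictions]

# A response of {"predictions": [1, 1]} maps to two Versicolour flowers:
print(interpret([1, 1]))  # ['Iris Versicolour', 'Iris Versicolour']
```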
Run the following command to create an InferenceService named sklearn-iris.
```
kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
```

Run the following command to check the service status.

```
kubectl get inferenceservices sklearn-iris
```

Expected output:

```
NAME           URL                                                         READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris-predictor-default.default.example.com   True           100                              sklearn-iris-predictor-default-00001   51s
```
Step 2: Access the service
The IP address and access method vary depending on the service gateway. Select the appropriate method for your gateway.
ALB
Run the following command to retrieve the Application Load Balancer (ALB) endpoint.
```
kubectl get albconfig knative-internet
```

Expected output:

```
NAME               ALBID                   DNSNAME                                             PORT&PROTOCOL   CERTID   AGE
knative-internet   alb-hvd8nngl0l*******   alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com                            2
```

Run the following command to write the following JSON content to the ./iris-input.json file to prepare the input for the inference request.

```
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
```

Run the following command to access the service.

```
INGRESS_DOMAIN=$(kubectl get albconfig knative-internet -o jsonpath='{.status.loadBalancer.dnsname}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_DOMAIN}/v1/models/sklearn-iris:predict" -d @./iris-input.json
```

Expected output:

```
*   Trying 120.77.XX.XX...
* TCP_NODELAY set
* Connected to alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com (120.77.XX.XX) port 80 (#0)
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris-predictor-default.default.example.com
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 76 out of 76 bytes
< HTTP/1.1 200 OK
< Date: Thu, 13 Jul 2023 01:48:44 GMT
< Content-Type: application/json
< Content-Length: 21
< Connection: keep-alive
<
* Connection #0 to host alb-hvd8nngl0l******.cn-<region>.alb.aliyuncs.com left intact
{"predictions":[1,1]}
```

The output is {"predictions": [1, 1]}. This indicates that for the two data points sent for inference, the model predicts that both flowers are Iris Versicolour (index 1).
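If you prefer to call the service from code rather than curl, the sketch below shows an equivalent request using only the Python standard library. The gateway endpoint and service hostname are assumptions you must supply (here read from the hypothetical `INGRESS_DOMAIN` and `SERVICE_HOSTNAME` environment variables, matching the shell variables above); the request is only sent when both are set.

```python
import json
import os
import urllib.request

# Assumed values: fill in the gateway endpoint and InferenceService host
# retrieved with the kubectl commands above.
INGRESS_DOMAIN = os.environ.get("INGRESS_DOMAIN", "")      # e.g. the ALB DNS name
SERVICE_HOSTNAME = os.environ.get("SERVICE_HOSTNAME", "")  # e.g. sklearn-iris-predictor-default.default.example.com

# Same payload as ./iris-input.json: two samples, four features each.
payload = {"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}
body = json.dumps(payload).encode("utf-8")

def predict(domain, hostname):
    """Send a V1 predict request through the gateway, routing by Host header."""
    req = urllib.request.Request(
        f"http://{domain}/v1/models/sklearn-iris:predict",
        data=body,
        headers={"Host": hostname, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if INGRESS_DOMAIN and SERVICE_HOSTNAME:
    print(predict(INGRESS_DOMAIN, SERVICE_HOSTNAME))  # e.g. {"predictions": [1, 1]}
```

The same sketch works for the Kourier gateway: point the domain variable at the load balancer IP address instead of the ALB DNS name.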
Kourier
Run the following command to retrieve the Kourier service endpoint.
```
kubectl -n knative-serving get svc kourier
```

Expected output:

```
NAME      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
kourier   LoadBalancer   192.168.XX.XX   121.40.XX.XX   80:31158/TCP,443:32491/TCP   49m
```

The access IP address for the service is 121.40.XX.XX, and the ports are 80 (HTTP) and 443 (HTTPS).

Run the following command to write the following JSON content to the ./iris-input.json file to prepare the input for the inference request.

```
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
```

Run the following command to access the service.

```
INGRESS_HOST=$(kubectl -n knative-serving get service kourier -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" "http://${INGRESS_HOST}/v1/models/sklearn-iris:predict" -d @./iris-input.json
```

Expected output:

```
*   Trying 121.40.XX.XX...
* TCP_NODELAY set
* Connected to 121.40.XX.XX (121.40.XX.XX) port 80 (#0)
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris-predictor-default.default.example.com
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 76 out of 76 bytes
< HTTP/1.1 200 OK
< content-length: 21
< content-type: application/json
< date: Wed, 12 Jul 2023 08:23:13 GMT
< server: envoy
< x-envoy-upstream-service-time: 4
<
* Connection #0 to host 121.40.XX.XX left intact
{"predictions":[1,1]}
```

The output is {"predictions": [1, 1]}. This indicates that for the two data points sent for inference, the model predicts that both flowers are Iris Versicolour (index 1).