KServe, formerly known as KFServing, is a model server and inference engine for cloud-native environments. It supports automatic scaling, scale-to-zero, and canary deployments. This topic describes how to deploy KServe to provide AI services based on Alibaba Cloud Service Mesh (ASM) and Container Service for Kubernetes (ACK).

Prerequisites

  - An ASM instance is created, and an ACK cluster is added to the ASM instance as the data plane.
  - kubectl is configured to connect to the ACK cluster on the data plane.

Background information

As a model server, KServe supports the deployment of machine learning and deep learning models at scale. KServe can run in traditional Kubernetes Deployment mode or in serverless mode with support for scale-to-zero. It provides traffic-based automatic scaling and blue-green or canary deployments for models.
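
For example, a canary rollout is configured directly on the InferenceService resource. The following is a minimal sketch, separate from the walkthrough below, that uses the KServe canaryTrafficPercent field to shift 10% of traffic to an updated model; the service name and storage URI are placeholders:
  apiVersion: "serving.kserve.io/v1beta1"
  kind: "InferenceService"
  metadata:
    name: "my-model" # Hypothetical service name.
  spec:
    predictor:
      # Route 10% of traffic to the revision created by this update;
      # the rest stays on the previous ready revision.
      canaryTrafficPercent: 10
      model:
        modelFormat:
          name: sklearn
        storageUri: "gs://example-bucket/models/sklearn/v2" # Placeholder URI.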

Step 1: Install the KServe component cert-manager

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Ecosystem > KServe on ASM.
  3. On the KServe on ASM page, click Enable KServe on ASM.
    KServe depends on the cert-manager component. When you install KServe, the cert-manager component is automatically installed. If you want to use your own cert-manager component, turn off Automatically install the CertManager component in the cluster.
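
After KServe is enabled, you can verify from the command line that cert-manager is running. The following check is a sketch that assumes cert-manager was installed into the default cert-manager namespace of the data-plane cluster:
  # The cert-manager, cert-manager-cainjector, and cert-manager-webhook pods should be in the Running state.
  kubectl get pods -n cert-manager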

Step 2: Query the IP address of the ASM instance's ingress gateway

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose ASM Gateways > Ingress Gateway.
  3. On the Ingress Gateway page, view and save the service address of the ASM instance's ingress gateway.
    The service address is the IP address of the ingress gateway.
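
Alternatively, you can query the same address with kubectl. The following sketch assumes the default ingress gateway service name istio-ingressgateway in the istio-system namespace of the data-plane cluster:
  # Read the external IP address of the ingress gateway service.
  kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}'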

Step 3: Create an inference service

In this topic, a trained scikit-learn model is used for testing.

  1. Use kubectl to connect to the ACK cluster on the data plane, and run the following command to create a namespace in which KServe resources will be deployed:
    kubectl create namespace kserve-test
  2. Create an inference service named sklearn-iris.
    1. Create a file named isvc.yaml that contains the following content:
      apiVersion: "serving.kserve.io/v1beta1"
      kind: "InferenceService"
      metadata:
        name: "sklearn-iris"
      spec:
        predictor:
          model:
            modelFormat:
              name: sklearn
            storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    2. Run the following command to create the sklearn-iris service in the kserve-test namespace:
      kubectl apply -f isvc.yaml -n kserve-test
  3. Run the following command to check whether the sklearn-iris service is created:
    kubectl get inferenceservices sklearn-iris -n kserve-test
    Expected output:
    NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION            AGE
    sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-00001   3h26m
    The output shows that the value of READY is True, which indicates that the sklearn-iris service is created. A scripted alternative to this check is sketched after this list.
  4. Optional: View the virtual service and gateway created for the scikit-learn model.
    After the sklearn-iris service is created, a virtual service and a gateway are automatically created for the scikit-learn model. To view the created virtual service and gateway, perform the following steps:
    1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
    2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > VirtualService.
    3. On the VirtualService page, click the Refresh icon next to Namespace and select kserve-test from the Namespace drop-down list to view the created virtual service in the service list.
    4. In the left-side navigation pane, choose ASM Gateways > Gateway.
    5. In the upper part of the Gateway page, select knative-serving from the Namespace drop-down list to view the created gateway in the gateway list.
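
Instead of repeatedly running the readiness query above, you can block until the service becomes ready. This is a minimal sketch that relies on the Ready condition that KServe sets on the InferenceService resource:
  # Wait up to 5 minutes for the InferenceService to report Ready.
  kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n kserve-test --timeout=300s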

Step 4: Access the service provided by the scikit-learn model

The following section describes how to access the service provided by the scikit-learn model on Linux and macOS.

  1. Run the following command to create an input file for the scikit-learn model:
    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  2. Test access to the service provided by the scikit-learn model through the ingress gateway.
    1. Run the following command to obtain the value of SERVICE_HOSTNAME:
      SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
      echo $SERVICE_HOSTNAME
      Expected output:
      sklearn-iris.kserve-test.example.com
    2. Run the following command to access the service. In this step, set ASM_GATEWAY to the IP address of the ingress gateway queried in Step 2.
      ASM_GATEWAY="XXXX" # Replace XXXX with the IP address of the ingress gateway. 
      curl -H "Host: ${SERVICE_HOSTNAME}" http://${ASM_GATEWAY}:80/v1/models/sklearn-iris:predict -d @./iris-input.json
      Expected output:
      {"predictions": [1, 1]}
  3. Test the performance of the service provided by the scikit-learn model.
    1. Run the following command to deploy an application for stress testing:
      kubectl create -f https://alibabacloudservicemesh.oss-cn-beijing.aliyuncs.com/kserve/v0.7/loadtest.yaml
    2. Run the following command to query the names of pods:
      kubectl get pod
      Expected output:
      NAME                                                       READY   STATUS      RESTARTS   AGE
      load-testxhwtq-pj9fq                                       0/1     Completed   0          3m24s
      sklearn-iris-predictor-00001-deployment-857f9bb56c-vg8tf   2/2     Running     0          51m
    3. Run the following command to view the test result logs:
      kubectl logs load-testxhwtq-pj9fq # Replace load-testxhwtq-pj9fq with the name of the stress testing pod in your cluster. 
      Expected output:
      Requests      [total, rate, throughput]         30000, 500.02, 500.01
      Duration      [total, attack, wait]             59.999s, 59.998s, 1.352ms
      Latencies     [min, mean, 50, 90, 95, 99, max]  1.196ms, 1.463ms, 1.378ms, 1.588ms, 1.746ms, 2.99ms, 18.873ms
      Bytes In      [total, mean]                     690000, 23.00
      Bytes Out     [total, mean]                     2460000, 82.00
      Success       [ratio]                           100.00%
      Status Codes  [code:count]                      200:30000
      Error Set:
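
After the stress test completes and traffic stops, you can also observe the scale-to-zero behavior described in the background information, provided that the service runs in serverless mode. The following sketch assumes the default Knative autoscaling settings, under which an idle predictor is scaled down after its stable window elapses:
  # Watch the predictor pods. With no incoming traffic, they are
  # expected to terminate once the autoscaler's idle window passes.
  kubectl get pods -n kserve-test -l serving.kserve.io/inferenceservice=sklearn-iris -w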