KServe, formerly known as KFServing, is a model server and inference engine for cloud-native environments. It supports automatic scaling, scale-to-zero, and canary deployment. This topic describes how to deploy KServe to provide AI services based on Service Mesh (ASM) and Alibaba Cloud Container Service for Kubernetes (ACK).
Prerequisites
- An ASM instance whose version is 1.17.2.7 or later is created. For more information, see Create an ASM instance or Update an ASM instance.
- A cluster is added to the ASM instance. For more information, see Add a cluster to an ASM instance.
- The feature that allows you to access Istio resources of the ASM instance by using the Kubernetes API of the data-plane cluster is enabled. For more information, see Enable the feature that allows Istio resources to be accessed by using the Kubernetes API of clusters.
- Related Knative components are deployed in the ACK cluster and the Knative on ASM feature is enabled. For more information, see Use Knative on ASM to deploy a serverless application.
- An ingress gateway service is deployed.
Background information
As a model server, KServe supports the deployment of machine learning and deep learning models at scale. KServe can run in traditional Kubernetes Deployment mode or in serverless mode with support for scale-to-zero. It provides automatic traffic-based scaling and blue-green or canary deployment for models.

Step 1: Install KServe and the cert-manager component
- Log on to the ASM console. In the left-side navigation pane, choose Mesh Management.
- On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose KServe on ASM.
- On the KServe on ASM page, click Enable KServe on ASM. KServe depends on the cert-manager component. When you install KServe, the cert-manager component is automatically installed. If you want to use your own cert-manager component, turn off Automatically install the CertManager component in the cluster.
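After KServe on ASM is enabled, you can verify the installation from the data-plane cluster. The following check is a minimal sketch; it assumes that the components run in the cert-manager and kserve namespaces, which may differ in your environment:
# Check that the cert-manager and KServe pods are running (assumed namespaces).
kubectl get pods -n cert-manager
kubectl get pods -n kserve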
Step 2: Query the IP address of the ASM instance's ingress gateway
- Log on to the ASM console. In the left-side navigation pane, choose Mesh Management.
- On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Ingress Gateway.
- On the Ingress Gateway page, view and save the service address of the ASM instance's ingress gateway. The service address is the IP address of the ingress gateway.
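Alternatively, you can query the address with kubectl on the data-plane cluster. This sketch assumes that the ingress gateway is exposed as a LoadBalancer service named istio-ingressgateway in the istio-system namespace; adjust the names to match your deployment:
# Print the external IP address of the ingress gateway service (assumed names).
kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}'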
Step 3: Create an inference service
In this topic, a sample model trained with scikit-learn is used for testing.
- Use kubectl to connect to the ACK cluster on the data plane, and run the following command to create a namespace in which KServe resources will be deployed:
kubectl create namespace kserve-test
- Create an inference service named sklearn-iris.
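The following manifest is a minimal sketch based on the public KServe scikit-learn sample; the storageUri points to the sample iris model published by the KServe project, and you can replace it with the path to your own model. Save the manifest to a file, for example sklearn-iris.yaml, and apply it by running kubectl apply -f sklearn-iris.yaml -n kserve-test.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Sample model from the public KServe examples bucket; replace as needed.
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"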
- Run the following command to query whether the sklearn-iris service is successfully created:
kubectl get inferenceservices sklearn-iris -n kserve-test
Expected output:
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION            AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-00001   3h26m
The output shows that the value of READY is True. This indicates that the sklearn-iris service is successfully created.
- Optional: View the virtual service and gateway created for the scikit-learn model. After the sklearn-iris service is created, a virtual service and a gateway are automatically created for the scikit-learn model. To view them, perform the following steps:
- Log on to the ASM console. In the left-side navigation pane, choose Mesh Management.
- On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose VirtualService.
- On the VirtualService page, select kserve-test from the Namespace drop-down list to view the created virtual service in the service list.
- In the left-side navigation pane, choose Gateway.
- In the upper part of the Gateway page, select knative-serving from the Namespace drop-down list to view the created gateway in the gateway list.
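Because access to Istio resources by using the Kubernetes API is enabled (see the prerequisites), you can also list the generated resources with kubectl. This sketch assumes the namespaces used in this topic:
# The virtual service is created in the namespace of the inference service.
kubectl get virtualservice -n kserve-test
# The gateway is created in the knative-serving namespace.
kubectl get gateway -n knative-serving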
Step 4: Access the service provided by the scikit-learn model
The following section describes how to access the service provided by the scikit-learn model on Linux and macOS.
- Run the following command to create an input file of the scikit-learn model:
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
- Test access to the service provided by the scikit-learn model through the ingress gateway.
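The following commands are a sketch that follows the common KServe access pattern: resolve the host name from the status of the inference service, then send the input file to the ingress gateway with the Host header set. Replace ASM_GATEWAY_IP with the address that you obtained in Step 2; the variable names are used only for illustration.
# Extract the host name (for example, sklearn-iris.kserve-test.example.com) from the service URL.
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
# Send the prediction request through the ingress gateway.
curl -H "Host: ${SERVICE_HOSTNAME}" "http://ASM_GATEWAY_IP:80/v1/models/sklearn-iris:predict" -d @./iris-input.json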
- Test the performance of the service provided by the scikit-learn model.
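One way to run a simple load test is the open source hey load generator; this tool choice is an assumption, and any HTTP benchmarking tool works. The following sketch sends POST requests for 30 seconds with 5 concurrent workers, reusing SERVICE_HOSTNAME and the gateway address from the previous step:
# Load test: 30 seconds, 5 concurrent workers, request body from the input file.
hey -z 30s -c 5 -m POST -host "${SERVICE_HOSTNAME}" -D ./iris-input.json "http://ASM_GATEWAY_IP:80/v1/models/sklearn-iris:predict"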