The Definition of the New Service Mesh-Driven Scenario: AI Model Services - Model Mesh

This article describes how to use Alibaba Cloud Service Mesh (ASM) and Alibaba Cloud Container Service for Kubernetes (ACK) for deployment.

By Xining Wang

KServe (originally called KFServing) is a model server and inference engine for cloud-native environments. It supports automatic scaling, scale-to-zero, and canary deployment. As a model server, KServe provides the foundation for serving machine learning and deep learning models at scale. KServe can be deployed as a standard Kubernetes deployment or as a Serverless deployment that supports scale-to-zero. The Serverless deployment uses Istio and Knative Serving and features automatic traffic-based scaling together with blue/green and canary deployment of models.


Before you begin, complete the following prerequisites:


  • Create an Alibaba Cloud Service Mesh (ASM) instance whose Istio version is or later
  • Create a Container Service for Kubernetes (ACK) cluster.
  • Add the ACK cluster to the ASM instance.
  • Enable KubeAPI access to the data plane for the ASM instance.
  • Install Knative v0.26 in the ACK cluster on the data plane. Please see Serverless Containers and Automatic Scaling Based on Traffic Patterns for more information.
  • Use KServe v0.7, the version used in this article.
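Part of this list can be checked from the command line before continuing. The following is a minimal sketch, not an official check: it assumes that kubectl is configured with the kubeconfig of the data plane cluster, and the helper name require_namespaces is hypothetical. It only confirms that the istio-system and knative-serving namespaces referenced later in this article exist.

```shell
# Hypothetical helper: returns 0 only if every named namespace exists
# in the cluster that the current kubeconfig context points to.
require_namespaces() {
  for ns in "$@"; do
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      echo "missing namespace: $ns" >&2
      return 1
    fi
  done
}

# Example call (needs a kubeconfig for the data plane cluster):
#   require_namespaces istio-system knative-serving && echo "namespaces present"
```

A passing check does not verify component versions. It only shows that the namespaces used by the gateway and by Knative are present.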

Install KServe Components

1. Install Cert Manager

KServe depends on the Cert Manager component. v1.8.0 or later is recommended, and this article uses v1.8.0 as an example. Run the following command to install it:

kubectl apply -f  https://alibabacloudservicemesh.oss-cn-beijing.aliyuncs.com/certmanager/v1.8.0/cert-manager.yaml

2. Install KServe

Before applying kserve.yaml, make sure that the apiVersion value of the following resources is changed from cert-manager.io/v1alpha2 to cert-manager.io/v1:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert
  namespace: kserve
spec:
  commonName: kserve-webhook-server-service.kserve.svc
  dnsNames:
  - kserve-webhook-server-service.kserve.svc
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: kserve-webhook-server-cert
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: kserve
spec:
  selfSigned: {}
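If you start from a manifest that still uses the old API version, the change can be applied with a text substitution. The sketch below is one possible way to do it, assuming a local copy of the manifest. The helper name patch_cert_manager_api and the file name kserve-upstream.yaml are hypothetical.

```shell
# Rewrites the old cert-manager API version in a local manifest.
# The patched manifest goes to stdout, so the original file stays untouched.
patch_cert_manager_api() {
  sed 's#cert-manager\.io/v1alpha2#cert-manager.io/v1#g' "$1"
}

# Example call (kserve-upstream.yaml is a hypothetical local file name):
#   patch_cert_manager_api kserve-upstream.yaml > kserve.yaml
```

Writing to stdout rather than editing in place makes it easy to diff the two files before applying the result.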

The kserve.yaml file hosted at the following address already contains the corrected apiVersion, so you can run the following command directly to install KServe:

kubectl apply -f https://alibabacloudservicemesh.oss-cn-beijing.aliyuncs.com/kserve/v0.7/kserve.yaml

The execution result is similar to the following:

namespace/kserve created
customresourcedefinition.apiextensions.k8s.io/inferenceservices.serving.kserve.io created
customresourcedefinition.apiextensions.k8s.io/trainedmodels.serving.kserve.io created
role.rbac.authorization.k8s.io/leader-election-role created
clusterrole.rbac.authorization.k8s.io/kserve-manager-role created
clusterrole.rbac.authorization.k8s.io/kserve-proxy-role created
rolebinding.rbac.authorization.k8s.io/leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/kserve-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/kserve-proxy-rolebinding created
configmap/inferenceservice-config created
configmap/kserve-config created
secret/kserve-webhook-server-secret created
service/kserve-controller-manager-metrics-service created
service/kserve-controller-manager-service created
service/kserve-webhook-server-service created
statefulset.apps/kserve-controller-manager created
certificate.cert-manager.io/serving-cert created
issuer.cert-manager.io/selfsigned-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io created
validatingwebhookconfiguration.admissionregistration.k8s.io/inferenceservice.serving.kserve.io created
validatingwebhookconfiguration.admissionregistration.k8s.io/trainedmodel.serving.kserve.io created
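The output above only confirms that the resources were created, not that the controller is running. As a hedged sketch, the following helper waits for the kserve-controller-manager StatefulSet listed above to finish rolling out. The helper name and the default timeout of 120s are assumptions made for illustration.

```shell
# Blocks until the KServe controller StatefulSet has rolled out,
# or until the timeout (first argument, default 120s) expires.
wait_for_kserve_controller() {
  kubectl rollout status statefulset/kserve-controller-manager \
    --namespace kserve --timeout="${1:-120s}"
}

# Example call (needs access to the data plane cluster):
#   wait_for_kserve_controller 300s
```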

Create an ASM Gateway

If you have already created an ASM gateway, skip this step.

In the ASM console, click Create to create an ASM gateway. Keep port 80 available because the applications in the following steps use it. Please see this link for more information.

Obtain the external IP address by running the following command:

kubectl --namespace istio-system get service istio-ingressgateway
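The later curl command reads the gateway address from an ASM_GATEWAY variable. One way to fill it is sketched below, under the assumption that the gateway service exposes an IP address (rather than a hostname) in its load balancer status. The helper name get_asm_gateway_ip is hypothetical.

```shell
# Prints the first external IP address reported for the ingress gateway service.
# Assumption: the load balancer status carries an "ip" field, not a "hostname".
get_asm_gateway_ip() {
  kubectl --namespace istio-system get service istio-ingressgateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Example call (needs access to the data plane cluster):
#   ASM_GATEWAY=$(get_asm_gateway_ip)
```

If the command prints nothing, the external address has probably not been assigned yet, and the query can be repeated after a short wait.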

Create the First Inference Service

This example uses a trained scikit-learn model for testing.

Create a Namespace

First, create a namespace for deploying KServe resources:

kubectl create namespace kserve-test

Create an InferenceService

kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    sklearn:
      storageUri: "https://alibabacloudservicemesh.oss-cn-beijing.aliyuncs.com/kserve/v0.7/model.joblib"
EOF

Check the Creation Status

Use the kubeconfig file of the data plane cluster and run the following command to query the status of the sklearn-iris InferenceService:

kubectl get inferenceservices sklearn-iris -n kserve-test

The execution result is similar to the following:

NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-default-00001   7m8s
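In a script, the READY column can be checked instead of read by eye. The following sketch parses the table format shown above. The helper name isvc_ready is hypothetical, and it assumes that the URL column is already populated, so that READY is the third field.

```shell
# Succeeds only if the named InferenceService shows READY=True.
# $1: InferenceService name; stdin: output of `kubectl get inferenceservices`.
isvc_ready() {
  awk -v name="$1" '
    NR > 1 && $1 == name { ready = $3 }
    END { if (ready == "True") exit 0; exit 1 }
  '
}

# Example call (needs access to the data plane cluster):
#   kubectl get inferenceservices -n kserve-test | isvc_ready sklearn-iris && echo "ready"
```

While the URL column is still empty, the fields shift and the check reports "not ready", which is the conservative outcome.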

When the installation completes, the virtual service that corresponds to the model configuration is created automatically. You can view it in the ASM console.

In addition, the gateway rule that corresponds to Knative is created. Note that it is in the knative-serving namespace.

Access Model Services

Create a Model Input File

cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}
EOF

Access through the ASM Gateway

Obtain the service hostname:

SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
echo $SERVICE_HOSTNAME

The running result is similar to the following:

sklearn-iris.kserve-test.example.com


Set the ASM_GATEWAY environment variable to the external IP address of the ASM gateway created earlier, and then run the following command to access the sample model service:

curl  -H "Host: ${SERVICE_HOSTNAME}" http://${ASM_GATEWAY}:80/v1/models/sklearn-iris:predict -d @./iris-input.json

The running result is similar to the following:

curl  -H "Host: ${SERVICE_HOSTNAME}" http://${ASM_GATEWAY}:80/v1/models/sklearn-iris:predict -d @./iris-input.json
{"predictions": [1, 1]}
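The hostname extraction and the request can be combined into two small helpers. This is an illustrative sketch rather than part of the original procedure, and the function names host_from_url and predict are hypothetical. The request path and the Host header follow the curl command shown above.

```shell
# Extracts the host part of a URL such as http://host/path
# (the third "/"-separated field, as in the cut command above).
host_from_url() {
  printf '%s\n' "$1" | cut -d "/" -f 3
}

# Sends the input file to the model through the gateway.
# $1: gateway address, $2: InferenceService URL, $3: model name, $4: JSON input file
predict() {
  curl -s -H "Host: $(host_from_url "$2")" \
    "http://${1}:80/v1/models/${3}:predict" -d "@${4}"
}

# Example call (ASM_GATEWAY must hold the external IP address of the gateway):
#   predict "$ASM_GATEWAY" http://sklearn-iris.kserve-test.example.com sklearn-iris ./iris-input.json
```

The Host header matters because the gateway routes requests by hostname while the request itself is sent to the gateway IP address.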

Performance Test

Run the following command to test the performance of the model service deployed above:

kubectl create -f https://alibabacloudservicemesh.oss-cn-beijing.aliyuncs.com/kserve/v0.7/loadtest.yaml -n kserve-test


After the test completes, view the logs of the load test pod. The result is similar to the following:

kubectl logs -n kserve-test load-testchzwx--1-kf29t
Requests      [total, rate, throughput]         30000, 500.02, 500.01
Duration      [total, attack, wait]             59.999s, 59.998s, 1.352ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.196ms, 1.463ms, 1.378ms, 1.588ms, 1.746ms, 2.99ms, 18.873ms
Bytes In      [total, mean]                     690000, 23.00
Bytes Out     [total, mean]                     2460000, 82.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:30000
Error Set:
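The figures in this report can be cross-checked with simple arithmetic: 30000 requests over roughly 60 seconds is about 500 requests per second, and the mean byte counts are the totals divided by the number of requests. The sketch below automates two such checks. The helper names report_ok and mean_bytes are hypothetical, and the parsing assumes the report layout shown above.

```shell
# Succeeds only if the report on stdin shows a success ratio of 100.00%.
report_ok() {
  awk '/^Success/ { ratio = $NF } END { if (ratio == "100.00%") exit 0; exit 1 }'
}

# Prints total bytes divided by request count (integer division).
# $1: total bytes, $2: number of requests
mean_bytes() {
  echo $(( $1 / $2 ))
}

# Example calls:
#   kubectl logs -n kserve-test <load-test-pod-name> | report_ok && echo "all requests succeeded"
#   mean_bytes 690000 30000     # prints 23, matching "Bytes In [total, mean]"
#   mean_bytes 2460000 30000    # prints 82, matching "Bytes Out [total, mean]"
```

The pod name in the first example is a placeholder because the load test pod gets a generated name.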


This capability was developed in response to customer demand. Customers want to run KServe on top of service mesh technology to implement AI services. KServe runs smoothly on the service mesh and supports blue/green and canary deployment of model services, traffic distribution between revisions, and more. It also supports auto-scaling Serverless inference workloads, high scalability, and concurrency-based intelligent load routing.

As the industry's first fully managed Istio-compatible service mesh product, Alibaba Cloud Service Mesh (ASM) has been architecturally consistent with the community and industry trends from the very beginning. The control plane components are hosted on the Alibaba Cloud side and are independent of the user clusters on the data plane. ASM is customized and implemented based on community Istio. It provides components on the managed control plane that support fine-grained traffic management and security management. The managed mode decouples the lifecycle management of Istio components from the managed Kubernetes clusters, making the architecture more flexible and the system more scalable.
