Alibaba Cloud Service Mesh: Integrate KServe with ASM to implement inference services based on cloud-native AI models

Last Updated: Jan 17, 2024

KServe (formerly KFServing) is a model server and inference engine for cloud-native environments. It supports features such as automatic scaling, scale-to-zero, and canary deployment. By integrating KServe with Alibaba Cloud Service Mesh (ASM), developers can quickly deploy and manage inference services based on machine learning models in cloud-native applications. This reduces manual configuration and maintenance work and improves development efficiency.

Prerequisites

An ASM instance is created, and a Container Service for Kubernetes (ACK) cluster is added to the instance as the data plane. kubectl is configured to connect to the ACK cluster.

Introduction to KServe

As a model server, KServe supports the deployment of machine learning and deep learning models at scale. KServe can run in traditional Kubernetes Deployment mode or in serverless mode with support for scale-to-zero. It provides automatic traffic-based scaling and blue-green or canary deployments for models. For more information, see KServe.

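As an illustration of these capabilities, the following sketch shows an InferenceService that enables scale-to-zero and shifts a fraction of traffic to the latest revision. The minReplicas and canaryTrafficPercent fields are standard KServe v1beta1 fields; the name sklearn-iris-canary is a hypothetical example, and the model URI is the public example model that is also used later in this topic.

    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris-canary"   # Hypothetical example name.
    spec:
      predictor:
        minReplicas: 0            # Serverless mode: the revision can scale to zero when idle.
        canaryTrafficPercent: 10  # Route 10% of traffic to the latest revision, 90% to the previous one.
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"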

Step 1: Install the KServe component

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Ecosystem > KServe on ASM.

  3. On the KServe on ASM page, click Enable KServe on ASM.

    KServe depends on the cert-manager component. When you install KServe, the cert-manager component is automatically installed. If you want to use your own cert-manager component, turn off Automatically install the CertManager component in the cluster.
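After the installation is complete, you can verify it from the data-plane cluster. The following sketch assumes that the components run in the default cert-manager and kserve namespaces; the namespace names may differ in your environment.

    # Check that the cert-manager pods are running.
    kubectl get pods -n cert-manager

    # Check that the KServe controller is running.
    kubectl get pods -n kserve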

Step 2: Obtain the IP address of the ASM ingress gateway

  1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

  2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose ASM Gateways > Ingress Gateway.

  3. On the Ingress Gateway page, view and record the Service address of the ASM ingress gateway.
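If you prefer the command line, the address can usually also be read from the gateway's LoadBalancer Service on the data-plane cluster. The following sketch assumes the default Service name istio-ingressgateway in the istio-system namespace; adjust both names to match your gateway deployment.

    # Read the external IP address of the ASM ingress gateway Service.
    kubectl get service istio-ingressgateway -n istio-system \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}'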

Step 3: Create an inference service

In this example, a trained scikit-learn model is used for testing.

  1. Use kubectl to connect to the ACK cluster on the data plane. Then, run the following command to create a namespace for deploying KServe resources:

    kubectl create namespace kserve-test
  2. Create an inference service.

    1. Use the following content to create an isvc.yaml file:

      apiVersion: "serving.kserve.io/v1beta1"
      kind: "InferenceService"
      metadata:
        name: "sklearn-iris"
      spec:
        predictor:
          model:
            modelFormat:
              name: sklearn
            storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    2. Run the following command to create the sklearn-iris service in the kserve-test namespace:

      kubectl apply -f isvc.yaml -n kserve-test
  3. Run the following command to query whether the sklearn-iris service is successfully created:

    kubectl get inferenceservices sklearn-iris -n kserve-test

    Expected output:

    NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION            AGE
    sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-00001   3h26m

    The value of READY is True, which indicates that the sklearn-iris service is successfully created.

  4. (Optional) View the virtual service and Istio gateway.

    After the sklearn-iris service is created, a virtual service and an Istio gateway are automatically created for the scikit-learn model. To view them in the console, perform the following steps. (A kubectl-based alternative, including a readiness wait, is sketched after this list.)

    1. Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.

    2. On the Mesh Management page, click the name of the ASM instance. In the left-side navigation pane, choose Traffic Management Center > VirtualService.

    3. On the VirtualService page, click the Refresh icon next to Namespace and select kserve-test from the Namespace drop-down list to view the created virtual service.

    4. In the left-side navigation pane, choose ASM Gateways > Gateway.

    5. In the upper part of the Gateway page, select knative-serving from the Namespace drop-down list to view the created Istio gateway.
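Alternatively, you can inspect the same resources with kubectl, assuming the Istio CRDs are registered in the data-plane cluster. The kubectl wait command blocks until the InferenceService reports the Ready condition.

    # Wait until the InferenceService reports the Ready condition (times out after 5 minutes).
    kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n kserve-test --timeout=300s

    # List the virtual service that was generated for the model.
    kubectl get virtualservices.networking.istio.io -n kserve-test

    # List the Istio gateway that was generated in the knative-serving namespace.
    kubectl get gateways.networking.istio.io -n knative-serving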

Step 4: Access the service provided by the scikit-learn model

The following section describes how to access the service provided by the scikit-learn model on Linux and macOS.

  1. Run the following command to create an input file for the scikit-learn model. Each instance contains the four iris features in order: sepal length, sepal width, petal length, and petal width:

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  2. Test access to the service provided by the scikit-learn model through the ingress gateway.

    1. Run the following commands to obtain the value of SERVICE_HOSTNAME:

      SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
      echo $SERVICE_HOSTNAME

      Expected output:

      sklearn-iris.kserve-test.example.com
    2. Run the following commands to access the service. Set ASM_GATEWAY to the IP address of the ingress gateway that you obtained in Step 2. (A readiness check over the same gateway is sketched after this list.)

      ASM_GATEWAY="XXXX" # Replace XXXX with the IP address of the ASM ingress gateway. 
      curl -H "Host: ${SERVICE_HOSTNAME}" http://${ASM_GATEWAY}:80/v1/models/sklearn-iris:predict -d @./iris-input.json

      Expected output:

      {"predictions": [1, 1]}
  3. Test the performance of the service provided by the scikit-learn model.

    1. Run the following command to deploy an application for stress testing:

      kubectl create -f https://alibabacloudservicemesh.oss-cn-beijing.aliyuncs.com/kserve/v0.7/loadtest.yaml
    2. Run the following command to query the names of pods:

      kubectl get pod

      Expected output:

      NAME                                                       READY   STATUS      RESTARTS   AGE
      load-testxhwtq-pj9fq                                       0/1     Completed   0          3m24s
      sklearn-iris-predictor-00001-deployment-857f9bb56c-vg8tf   2/2     Running     0          51m
    3. Run the following command to view the test result logs:

      kubectl logs load-testxhwtq-pj9fq # Replace load-testxhwtq-pj9fq with the name of the stress test pod queried in the previous step.

      Expected output:

      Requests      [total, rate, throughput]         30000, 500.02, 500.01
      Duration      [total, attack, wait]             59.999s, 59.998s, 1.352ms
      Latencies     [min, mean, 50, 90, 95, 99, max]  1.196ms, 1.463ms, 1.378ms, 1.588ms, 1.746ms, 2.99ms, 18.873ms
      Bytes In      [total, mean]                     690000, 23.00
      Bytes Out     [total, mean]                     2460000, 82.00
      Success       [ratio]                           100.00%
      Status Codes  [code:count]                      200:30000
      Error Set:
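In addition to the :predict route used above, the KServe V1 inference protocol exposes a per-model route that reports whether the model is loaded and ready. A quick check, assuming SERVICE_HOSTNAME and ASM_GATEWAY are still set from the previous steps, might look like this:

    # Query the model's readiness over the KServe V1 protocol.
    curl -H "Host: ${SERVICE_HOSTNAME}" http://${ASM_GATEWAY}:80/v1/models/sklearn-iris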
