
Alibaba Cloud Service Mesh:Use InferenceService to deploy a transformer

Last Updated: Jan 12, 2024

A transformer is an InferenceService component that performs pre-processing and post-processing alongside model inference. InferenceService communicates with the transformer over the REST protocol. A transformer converts raw input data into the format required by the model server, so that end-to-end data processing and model inference can be implemented.
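
A transformer is typically implemented with the KServe Python SDK by subclassing kserve.Model and overriding the preprocess and postprocess methods. The following minimal sketch is modeled on the custom image transformer example in the KServe repository; the image_transform helper, the PIL/torchvision-based normalization, and the argument parsing are illustrative assumptions rather than the exact contents of the image used in this topic.

    import argparse
    import base64
    import io
    from typing import Dict

    import kserve
    from PIL import Image
    from torchvision import transforms


    def image_transform(instance: Dict) -> list:
        # Illustrative helper (assumption): decode the Base64 image and normalize it
        # the way an MNIST classifier typically expects.
        byte_array = base64.b64decode(instance["image"]["b64"])
        image = Image.open(io.BytesIO(byte_array))
        preprocessing = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ])
        return preprocessing(image).tolist()


    class ImageTransformer(kserve.Model):
        def __init__(self, name: str, predictor_host: str):
            super().__init__(name)
            self.predictor_host = predictor_host
            self.ready = True

        def preprocess(self, payload: Dict, headers: Dict[str, str] = None) -> Dict:
            # Convert raw Base64 images into the tensor format expected by the predictor.
            return {"instances": [image_transform(i) for i in payload["instances"]]}

        def postprocess(self, response: Dict, headers: Dict[str, str] = None) -> Dict:
            # Pass the predictor response through unchanged.
            return response


    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--model_name", required=True)
        parser.add_argument("--predictor_host", required=True)
        args, _ = parser.parse_known_args()
        model = ImageTransformer(args.model_name, predictor_host=args.predictor_host)
        kserve.ModelServer().start([model])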

Prerequisites

A basic inference service can run as expected in your environment. For more information, see Integrate the cloud-native inference service KServe with ASM.

Note

Different KServe versions may require different input data formats. In this example, KServe 0.10 is used. For more information, see Deploy Transformer with InferenceService.

Step 1: Create a transformer Docker image

  • Method 1: Build a transformer Docker image from the python directory of the KServe repository on GitHub by using the provided Dockerfile.

    cd python
    docker build -t $DOCKER_USER/image-transformer:latest -f custom_transformer.Dockerfile .
    
    docker push $DOCKER_USER/image-transformer:latest
  • Method 2: Use an existing image.

     asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10

Step 2: Use REST predictor to deploy InferenceService

By default, InferenceService uses TorchServe to serve PyTorch models, and the models are loaded from a model repository. In this example, the model repository contains an MNIST model.

  1. Create a transformer-new.yaml file that contains the following content:

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      transformer:
        containers:
          - image: asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist
  2. Run the following command to deploy InferenceService:

    kubectl apply -f transformer-new.yaml
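
    After the deployment is submitted, you can optionally confirm that the InferenceService is ready before sending requests. A quick sanity check (assuming the default namespace) looks like the following:

      kubectl get inferenceservice torch-transformer
      # The READY column should show True once the predictor and transformer are up.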

Step 3: Run a prediction

  1. Prepare the request input payload.

    Encode the content of the following sample image to Base64 and save it as an input.json file in the following format. A minimal sketch for generating the file from a local image follows the JSON example.

    [Sample image: a handwritten digit used as input to the MNIST model]

    {
        "instances":[
           {
              "image":{
                "b64": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC"
              }
           }
        ]
    }
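
    If you want to generate input.json yourself, the following minimal Python sketch shows one way to do it; the local file name image.png is an assumption.

      import base64
      import json

      # Read the sample digit image and Base64-encode it (the file name is an assumption).
      with open("image.png", "rb") as f:
          b64 = base64.b64encode(f.read()).decode("utf-8")

      # Wrap the encoded image in the payload format shown above.
      payload = {"instances": [{"image": {"b64": b64}}]}
      with open("input.json", "w") as f:
          json.dump(payload, f)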
  2. Access the model service over an ingress gateway.

    1. Run the following command to obtain the value of SERVICE_HOSTNAME:

      SERVICE_NAME=torch-transformer
      SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
      echo $SERVICE_HOSTNAME

      Expected output:

      torch-transformer.default.example.com
    2. Run the following command to access the model service.

      For more information about how to obtain the IP address of the ingress gateway, see substep 1 "Obtain the IP address of the ingress gateway" of Step 3 in the "Use Istio resources to route traffic to different versions of a service" topic.

      MODEL_NAME=mnist
      INPUT_PATH=@./input.json
      ASM_GATEWAY="XXXX" # Replace XXXX with the IP address of the ingress gateway. 
      curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${ASM_GATEWAY}/v1/models/$MODEL_NAME:predict
      

      Expected output:

      > POST /v1/models/mnist:predict HTTP/1.1
      > Host: torch-transformer.default.example.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > Content-Length: 427
      > Content-Type: application/x-www-form-urlencoded
      > 
      * Mark bundle as not supporting multiuse
      < HTTP/1.1 200 OK
      < content-length: 19
      < content-type: application/json
      < date: Mon, 13 Nov 2023 05:53:15 GMT
      < server: istio-envoy
      < x-envoy-upstream-service-time: 119
      < 
      * Connection #0 to host xxxx left intact
      {"predictions":[2]}%                           

      The output indicates that the model service was accessed successfully.