
Alibaba Cloud Service Mesh:Use InferenceService to deploy a transformer

Last Updated: Jan 12, 2024

A transformer is an InferenceService component that performs pre-processing and post-processing alongside model inference. InferenceService communicates with the transformer over the REST protocol. A transformer converts raw input data into the format required by the model server, so that end-to-end data processing and model inference can be implemented.
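
A transformer is typically implemented with the KServe Python SDK by subclassing kserve.Model and overriding the preprocess and postprocess methods. The following minimal sketch is modeled on the custom image transformer example in the KServe repository; the image_transform helper, the PIL/torchvision-based normalization, and the argument parsing are illustrative assumptions rather than the exact contents of the image used in this topic.

    import argparse
    import base64
    import io
    from typing import Dict

    import kserve
    from PIL import Image
    from torchvision import transforms


    def image_transform(instance: Dict) -> list:
        # Illustrative helper (assumption): decode the Base64 image and normalize it
        # the way an MNIST classifier typically expects.
        byte_array = base64.b64decode(instance["image"]["b64"])
        image = Image.open(io.BytesIO(byte_array))
        preprocessing = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,)),
        ])
        return preprocessing(image).tolist()


    class ImageTransformer(kserve.Model):
        def __init__(self, name: str, predictor_host: str):
            super().__init__(name)
            self.predictor_host = predictor_host
            self.ready = True

        def preprocess(self, payload: Dict, headers: Dict[str, str] = None) -> Dict:
            # Convert raw Base64 images into the tensor format expected by the predictor.
            return {"instances": [image_transform(i) for i in payload["instances"]]}

        def postprocess(self, response: Dict, headers: Dict[str, str] = None) -> Dict:
            # Pass the predictor response through unchanged.
            return response


    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--model_name", required=True)
        parser.add_argument("--predictor_host", required=True)
        args, _ = parser.parse_known_args()
        model = ImageTransformer(args.model_name, predictor_host=args.predictor_host)
        kserve.ModelServer().start([model])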

Prerequisites

A basic inference service can run as expected in your environment. For more information, see Integrate the cloud-native inference service KServe with ASM.

Note

Different KServe versions may require different input data formats. In this example, KServe 0.10 is used. For more information, see Deploy Transformer with InferenceService.

Step 1: Create a transformer Docker image

  • Method 1: Build a transformer Docker image from the python directory of the KServe repository on GitHub by using the provided Dockerfile.

    cd python
    docker build -t $DOCKER_USER/image-transformer:latest -f custom_transformer.Dockerfile .
    
    docker push $DOCKER_USER/image-transformer:latest
  • Method 2: Use an existing image.

     asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10

Step 2: Use REST predictor to deploy InferenceService

By default, InferenceService uses TorchServe to serve PyTorch models, and the models are loaded from a model repository. In this example, the model repository contains an MNIST model.

  1. Create a transformer-new.yaml file that contains the following content:

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      transformer:
        containers:
          - image: asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist
  2. Run the following command to deploy InferenceService:

    kubectl apply -f transformer-new.yaml
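
    After the deployment is submitted, you can optionally confirm that the InferenceService is ready before sending requests. A quick sanity check (assuming the default namespace) looks like the following:

      kubectl get inferenceservice torch-transformer
      # The READY column should show True once the predictor and transformer are up.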

Step 3: Run a prediction

  1. Prepare the request input payload.

    Encode the content of the following sample image to Base64 and save it as an input.json file in the following format. A minimal sketch for generating the file from a local image follows the JSON example.

    [Sample image: a handwritten digit used as input to the MNIST model]

    {
        "instances":[
           {
              "image":{
                "b64": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC"
              }
           }
        ]
    }
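
    If you want to generate input.json yourself, the following minimal Python sketch shows one way to do it; the local file name image.png is an assumption.

      import base64
      import json

      # Read the sample digit image and Base64-encode it (the file name is an assumption).
      with open("image.png", "rb") as f:
          b64 = base64.b64encode(f.read()).decode("utf-8")

      # Wrap the encoded image in the payload format shown above.
      payload = {"instances": [{"image": {"b64": b64}}]}
      with open("input.json", "w") as f:
          json.dump(payload, f)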
  2. Access the model service over an ingress gateway.

    1. Run the following command to obtain the value of SERVICE_HOSTNAME:

      SERVICE_NAME=torch-transformer
      SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
      echo $SERVICE_HOSTNAME

      Expected output:

      torch-transformer.default.example.com
    2. Run the following command to access the model service.

      For more information about how to obtain the IP address of the ingress gateway, see substep 1 "Obtain the IP address of the ingress gateway" of Step 3 in the "Use Istio resources to route traffic to different versions of a service" topic.

      MODEL_NAME=mnist
      INPUT_PATH=@./input.json
      ASM_GATEWAY="XXXX" # Replace XXXX with the IP address of the ingress gateway. 
      curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${ASM_GATEWAY}/v1/models/$MODEL_NAME:predict
      

      Expected output:

      > POST /v1/models/mnist:predict HTTP/1.1
      > Host: torch-transformer.default.example.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > Content-Length: 427
      > Content-Type: application/x-www-form-urlencoded
      > 
      * Mark bundle as not supporting multiuse
      < HTTP/1.1 200 OK
      < content-length: 19
      < content-type: application/json
      < date: Mon, 13 Nov 2023 05:53:15 GMT
      < server: istio-envoy
      < x-envoy-upstream-service-time: 119
      < 
      * Connection #0 to host xxxx left intact
      {"predictions":[2]}%                           

      The output indicates that the model service was accessed successfully.