Alibaba Cloud Service Mesh: Deploy a transformer with InferenceService

Last Updated: Mar 11, 2026

When your model server expects pre-processed input, such as tensors instead of raw Base64 images, a transformer bridges the gap. It runs three handlers in sequence within a single InferenceService:

  1. The preprocess handler converts raw input into the format the model expects, for example by decoding a Base64 image into tensors.

  2. The predict handler forwards the processed data to the predictor for inference.

  3. The postprocess handler transforms the predictor's response before returning it to the client.

The transformer communicates with the predictor over REST by default.
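For reference, the following condensed Python sketch, modeled on the custom_transformer example in the KServe repository, shows how these handlers map onto a kserve.Model subclass. The class name, normalization constants, and argument handling follow the upstream example but are illustrative rather than the exact contents of the pre-built image used later in this topic.

import argparse
import base64
import io
from typing import Dict

from PIL import Image
from torchvision import transforms
from kserve import Model, ModelServer, model_server

def image_transform(instance: Dict):
    # Decode the Base64 payload and normalize it as the MNIST model expects.
    image = Image.open(io.BytesIO(base64.b64decode(instance["image"]["b64"])))
    preprocessing = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),  # MNIST mean/std
    ])
    return preprocessing(image).tolist()

class ImageTransformer(Model):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.ready = True

    def preprocess(self, payload: Dict, headers: Dict[str, str] = None) -> Dict:
        # Handler 1: convert raw Base64 images into tensor lists.
        return {"instances": [image_transform(i) for i in payload["instances"]]}

    # Handler 2 (predict) is inherited from Model: it forwards the
    # pre-processed payload to predictor_host over REST.

    def postprocess(self, infer_response: Dict, headers: Dict[str, str] = None) -> Dict:
        # Handler 3: pass the predictor's response through unchanged.
        return infer_response

if __name__ == "__main__":
    # KServe injects --predictor_host into the transformer container at deploy time.
    parser = argparse.ArgumentParser(parents=[model_server.parser])
    parser.add_argument("--predictor_host", required=True,
                        help="Host of the predictor service")
    parser.add_argument("--model_name", required=True,
                        help="Name the model is served under")
    args, _ = parser.parse_known_args()
    ModelServer().start([ImageTransformer(args.model_name,
                                          predictor_host=args.predictor_host)])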

Prerequisites

Before you begin, make sure you have:

  • A working KServe environment on Service Mesh (ASM). For setup instructions, see Integrate KServe with ASM

  • kubectl configured to access your cluster

  • (Optional) Docker installed, if you plan to build a custom transformer image

  • (Optional) Access to a container registry, if you plan to push custom images

Note

This tutorial uses KServe 0.10. Different versions may require different input data formats. For the upstream reference, see Deploy Transformer with InferenceService.

Create a transformer Docker image

Choose one of the following methods:

  • Build from source: when you need to customize the transformer logic.

  • Pre-built image: when you want to use the default image transformer without modifications.

Build from source

Clone the KServe GitHub repository, then build the image from its python directory:

cd python
docker build -t <your-registry-url>/image-transformer:latest -f custom_transformer.Dockerfile .
docker push <your-registry-url>/image-transformer:latest

Replace <your-registry-url> with your container registry URL, for example registry.example.com/ml-models.

Use the pre-built image

Use the following image directly in your InferenceService YAML:

asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10

Deploy InferenceService with a transformer

By default, InferenceService uses TorchServe to serve PyTorch models. This example deploys a pre-trained MNIST handwritten digit classifier with a custom image transformer.

  1. Create a file named transformer-new.yaml:

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      transformer:
        containers:
          - image: asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist

    Key fields:

      • spec.predictor.model.storageUri: the Google Cloud Storage path to the MNIST PyTorch model, served by TorchServe.

      • spec.transformer.containers[].image: the transformer image that pre-processes incoming image data.

      • spec.transformer.containers[].args: --model_name mnist tells the transformer which model to target.
  2. Deploy the InferenceService:

    kubectl apply -f transformer-new.yaml
  3. Wait for the InferenceService to become ready:

    kubectl get inferenceservice torch-transformer

    The READY column should show True before you proceed.

Run a prediction

Prepare the input data

Encode a sample image as Base64 and save it in a file named input.json. The following example uses a handwritten digit image:

{
    "instances": [
        {
            "image": {
                "b64": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC"
            }
        }
    ]
}
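
To generate input.json from your own image rather than copying the sample above, a short Python helper is enough. The file name digit.png below is a placeholder for a local handwritten-digit image:

import base64
import json

# "digit.png" is a placeholder; MNIST models expect a 28x28 grayscale digit.
with open("digit.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {"instances": [{"image": {"b64": encoded}}]}
with open("input.json", "w") as f:
    json.dump(payload, f)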

Send a prediction request

  1. Get the service hostname:

SERVICE_NAME=torch-transformer
    SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    echo $SERVICE_HOSTNAME

    Expected output:

torch-transformer.default.example.com
  2. Send a request through the ASM ingress gateway:

    To find the ingress gateway IP address, see Obtain the IP address of the ingress gateway.

    MODEL_NAME=mnist
    INPUT_PATH=@./input.json
    ASM_GATEWAY="<ingress-gateway-ip>"  # Replace with your ingress gateway IP address
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${ASM_GATEWAY}/v1/models/$MODEL_NAME:predict
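
As an alternative to curl, the same request can be sent with Python's requests library. The gateway IP is a placeholder, and the hostname below assumes the torch-transformer service deployed earlier:

import json
import requests

ASM_GATEWAY = "<ingress-gateway-ip>"  # replace with your ingress gateway IP
SERVICE_HOSTNAME = "torch-transformer.default.example.com"  # from the previous step
MODEL_NAME = "mnist"

with open("input.json") as f:
    payload = json.load(f)

# The Host header routes the request through the ASM ingress gateway,
# exactly like curl's -H "Host: ..." option.
response = requests.post(
    f"http://{ASM_GATEWAY}/v1/models/{MODEL_NAME}:predict",
    headers={"Host": SERVICE_HOSTNAME},
    json=payload,
)
print(response.status_code, response.json())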

Verify the result

A successful response returns HTTP 200 with a JSON prediction:

< HTTP/1.1 200 OK
< content-length: 19
< content-type: application/json
< server: istio-envoy
<
{"predictions":[2]}

The prediction [2] means the model classified the input image as the digit 2. This confirms that the transformer decoded the Base64 image, converted it to tensors, and forwarded it to the MNIST predictor.

What's next