Alibaba Cloud Service Mesh: Deploy a transformer with InferenceService

Last Updated: Mar 11, 2026

When your model server expects pre-processed input, such as tensors instead of raw Base64 images, a transformer bridges the gap. It runs three handlers in sequence within a single InferenceService:

  1. The preprocess handler converts raw input into the format the model expects, for example by decoding a Base64 image into tensors.

  2. The predict handler forwards the processed data to the predictor for inference.

  3. The postprocess handler transforms the predictor's response before returning it to the client.

The transformer communicates with the predictor over REST by default.
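For reference, the following condensed Python sketch, modeled on the custom_transformer example in the KServe repository, shows how these handlers map onto a kserve.Model subclass. The class name, normalization constants, and argument handling follow the upstream example but are illustrative rather than the exact contents of the pre-built image used later in this topic.

import argparse
import base64
import io
from typing import Dict

from PIL import Image
from torchvision import transforms
from kserve import Model, ModelServer, model_server

def image_transform(instance: Dict):
    # Decode the Base64 payload and normalize it as the MNIST model expects.
    image = Image.open(io.BytesIO(base64.b64decode(instance["image"]["b64"])))
    preprocessing = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),  # MNIST mean/std
    ])
    return preprocessing(image).tolist()

class ImageTransformer(Model):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.ready = True

    def preprocess(self, payload: Dict, headers: Dict[str, str] = None) -> Dict:
        # Handler 1: convert raw Base64 images into tensor lists.
        return {"instances": [image_transform(i) for i in payload["instances"]]}

    # Handler 2 (predict) is inherited from Model: it forwards the
    # pre-processed payload to predictor_host over REST.

    def postprocess(self, infer_response: Dict, headers: Dict[str, str] = None) -> Dict:
        # Handler 3: pass the predictor's response through unchanged.
        return infer_response

if __name__ == "__main__":
    # KServe injects --predictor_host into the transformer container at deploy time.
    parser = argparse.ArgumentParser(parents=[model_server.parser])
    parser.add_argument("--predictor_host", required=True,
                        help="Host of the predictor service")
    parser.add_argument("--model_name", required=True,
                        help="Name the model is served under")
    args, _ = parser.parse_known_args()
    ModelServer().start([ImageTransformer(args.model_name,
                                          predictor_host=args.predictor_host)])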

Prerequisites

Before you begin, make sure you have:

  • A working KServe environment on Service Mesh (ASM). For setup instructions, see Integrate KServe with ASM

  • kubectl configured to access your cluster

  • (Optional) Docker installed, if you plan to build a custom transformer image

  • (Optional) Access to a container registry, if you plan to push custom images

Note

This tutorial uses KServe 0.10. Different versions may require different input data formats. For the upstream reference, see Deploy Transformer with InferenceService.

Create a transformer Docker image

Choose one of the following methods:

  • Build from source: when you need to customize the transformer logic.

  • Pre-built image: when you want to use the default image transformer without modifications.

Build from source

Clone the KServe GitHub repository, then build the image from its python directory:

cd python
docker build -t <your-registry-url>/image-transformer:latest -f custom_transformer.Dockerfile .
docker push <your-registry-url>/image-transformer:latest

Replace <your-registry-url> with your container registry URL, for example registry.example.com/ml-models.

Use the pre-built image

Use the following image directly in your InferenceService YAML:

asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10

Deploy InferenceService with a transformer

By default, InferenceService uses TorchServe to serve PyTorch models. This example deploys a pre-trained MNIST handwritten digit classifier with a custom image transformer.

  1. Create a file named transformer-new.yaml:

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      transformer:
        containers:
          - image: asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist

    Key fields:

      • spec.predictor.model.storageUri: the Google Cloud Storage path to the MNIST PyTorch model, served by TorchServe.

      • spec.transformer.containers[].image: the transformer image that pre-processes incoming image data.

      • spec.transformer.containers[].args: --model_name mnist tells the transformer which model to target.
  2. Deploy the InferenceService:

    kubectl apply -f transformer-new.yaml
  3. Wait for the InferenceService to become ready:

    kubectl get inferenceservice torch-transformer

    The READY column should show True before you proceed.

Run a prediction

Prepare the input data

Encode a sample image as Base64 and save it in a file named input.json. The following example uses a handwritten digit image:

{
    "instances": [
        {
            "image": {
                "b64": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC"
            }
        }
    ]
}
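
To generate input.json from your own image rather than copying the sample above, a short Python helper is enough. The file name digit.png below is a placeholder for a local handwritten-digit image:

import base64
import json

# "digit.png" is a placeholder; MNIST models expect a 28x28 grayscale digit.
with open("digit.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {"instances": [{"image": {"b64": encoded}}]}
with open("input.json", "w") as f:
    json.dump(payload, f)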

Send a prediction request

  1. Get the service hostname:

SERVICE_NAME=torch-transformer
    SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    echo $SERVICE_HOSTNAME

    Expected output:

torch-transformer.default.example.com
  2. Send a request through the ASM ingress gateway:

    To find the ingress gateway IP address, see Obtain the IP address of the ingress gateway.

    MODEL_NAME=mnist
    INPUT_PATH=@./input.json
    ASM_GATEWAY="<ingress-gateway-ip>"  # Replace with your ingress gateway IP address
    curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${ASM_GATEWAY}/v1/models/$MODEL_NAME:predict
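
As an alternative to curl, the same request can be sent with Python's requests library. The gateway IP is a placeholder, and the hostname below assumes the torch-transformer service deployed earlier:

import json
import requests

ASM_GATEWAY = "<ingress-gateway-ip>"  # replace with your ingress gateway IP
SERVICE_HOSTNAME = "torch-transformer.default.example.com"  # from the previous step
MODEL_NAME = "mnist"

with open("input.json") as f:
    payload = json.load(f)

# The Host header routes the request through the ASM ingress gateway,
# exactly like curl's -H "Host: ..." option.
response = requests.post(
    f"http://{ASM_GATEWAY}/v1/models/{MODEL_NAME}:predict",
    headers={"Host": SERVICE_HOSTNAME},
    json=payload,
)
print(response.status_code, response.json())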

Verify the result

A successful response returns HTTP 200 with a JSON prediction:

< HTTP/1.1 200 OK
< content-length: 19
< content-type: application/json
< server: istio-envoy
<
{"predictions":[2]}

The prediction [2] means the model classified the input image as the digit 2. This confirms that the transformer decoded the Base64 image, converted it to tensors, and forwarded it to the MNIST predictor.

What's next