
Platform for AI: Warm up model services

Last Updated: Apr 01, 2026

When a model service receives its first request, the runtime may defer initialization — loading model files, JIT-compiling graph operations, or cold-starting a Java Virtual Machine (JVM). This deferred initialization can cause higher latency and 408 timeout or 450 errors on initial requests.

The model warm-up feature of Elastic Algorithm Service (EAS) in Platform for AI (PAI) eliminates this cold-start latency. EAS sends warm-up requests to the service engine before the model service goes live. The model service is considered fully started only after these warm-up requests complete successfully.

How it works

The warm-up flow:

  1. Generate a warm-up request file using an EAS SDK.

  2. Upload the file to Object Storage Service (OSS).

  3. Reference the file in the model service JSON configuration.

  4. Deploy or update the service — EAS sends the warm-up requests automatically during startup.

Prerequisites

Before you begin, ensure that you have:

  • An OSS bucket to store the warm-up request file

  • The EAS SDK for Python or SDK for Java installed

  • A deployed or ready-to-deploy EAS model service

Generate a warm-up request file

A warm-up request file contains one or more sample requests that represent actual production traffic. EAS replays these requests during startup to initialize the model.

Two file formats are supported:

  • Binary (`.bin`): For TensorFlow and other structured input models. Built using the EAS SDK.

  • Text (`.txt`): For string-input models. Each line is one request. EAS detects the format automatically.
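For a string-input model, the text file can be assembled with plain file I/O. A minimal sketch; the request bodies below are hypothetical placeholders, not a real model's input format:

```python
# Each line of a .txt warm-up file is one complete request body.
# The request strings below are hypothetical placeholders; copy
# real bodies from your production traffic instead.
requests = [
    '{"sentence": "sample input one"}',
    '{"sentence": "sample input two"}',
]

# Write one request per line, as EAS expects for the text format.
with open("warm_up.txt", "w") as f:
    for body in requests:
        f.write(body + "\n")
```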

Important

The inputs and outputs in the warm-up request must exactly match those used in production. Any difference in inputs or outputs causes TensorFlow to reload model files, which defeats the purpose of warm-up.

The following examples show how to build a `.bin` warm-up request file for a TensorFlow model.

Use the SDK for Python

#!/usr/bin/env python

from eas_prediction import TFRequest

# Build a warm-up request that matches your production request exactly.
# The inputs and outputs must be identical to those used after the service goes live.
# Each add_feed call takes the input name, its shape, its data type, and the
# flattened data; each add_fetch names an output to retrieve.
req = TFRequest('serving_default')
req.add_feed('sentence1', [200, 15], TFRequest.DT_INT32, [1] * 200 * 15)
req.add_feed('sentence2', [200, 15], TFRequest.DT_INT32, [1] * 200 * 15)
req.add_feed('y', [200, 2], TFRequest.DT_INT32, [2] * 200 * 2)
req.add_feed('keep_rate', [], TFRequest.DT_FLOAT, [0.2])
req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
req.add_fetch('sorted_labels')
req.add_fetch('sorted_probs')

# Save the warm-up request to a binary file.
with open("warm_up.bin", "wb") as fw:
    fw.write(req.to_string())

Use the SDK for Java

Add the eas-sdk dependency to your pom.xml:

<dependency>
  <groupId>com.aliyun.openservices.eas</groupId>
  <artifactId>eas-sdk</artifactId>
  <version>2.0.13</version>
</dependency>

Check the Maven repository for the latest version.

import java.io.File;
import com.aliyun.openservices.eas.predict.request.TFDataType;
import com.aliyun.openservices.eas.predict.request.TFRequest;
import org.apache.commons.io.FileUtils;

public class TestTf {

    public static void main(String[] args) throws Exception {
        // Build a warm-up request that matches your production request exactly.
        TFRequest request = new TFRequest();
        request.setSignatureName("predict_images");

        float[] content = new float[784];
        for (int i = 0; i < content.length; i++) {
            content[i] = (float) 0.0;
        }
        request.addFeed("images", TFDataType.DT_FLOAT, new long[]{1, 784}, content);
        request.addFetch("scores");

        // Write the request to a binary file.
        File writename = new File("/path/to/warm_up1.bin");
        FileUtils.writeByteArrayToFile(writename, request.getRequest().toByteArray());
    }
}

Replace /path/to/warm_up1.bin with the actual path where you want to save the file.

Verify the request file

Before uploading the file to OSS, verify it produces a valid response.

Send a test request with curl

curl --data-binary @"</path/to/warmup.bin>" \
  -H 'Authorization: <your-token>' \
  <service-address>

Replace the placeholders as follows:

  • </path/to/warmup.bin>: Path to the warm-up request file.

  • <your-token>: The token used to access the model service.

  • <service-address>: The endpoint of the model service.

If the response is too large to display in the terminal, add --output <file-path> to save it to a file.
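If you prefer to test from Python instead of curl, the request can be sent with the standard library. A minimal sketch; the service URL, token, and file path are placeholders for your own values:

```python
import urllib.request

def send_warmup_request(service_url, token, bin_path):
    """Send a warm-up request file to a model service endpoint.

    A stdlib alternative to the curl command above. service_url,
    token, and bin_path are placeholders for your service's
    endpoint, access token, and local warm-up file.
    """
    with open(bin_path, "rb") as f:
        body = f.read()
    # Attaching a body makes urllib issue a POST, matching curl --data-binary.
    req = urllib.request.Request(
        service_url,
        data=body,
        headers={"Authorization": token},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()
```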

Parse the file directly

Use this method to inspect the request content without a running service.

Python

from eas_prediction import TFRequest

req = TFRequest()
with open('/path/to/warm_up1.bin', 'rb') as wm:
    req.request_data.ParseFromString(wm.read())
    print(req.request_data)

Java

import com.aliyun.openservices.eas.predict.proto.PredictProtos;
import org.apache.commons.io.FileUtils;
import java.io.File;

public class Test {

    public static void main(String[] args) throws Exception {
        File refile = new File("/path/to/warm_up1.bin");
        byte[] data = FileUtils.readFileToByteArray(refile);
        PredictProtos.PredictRequest pb = PredictProtos.PredictRequest.parseFrom(data);
        System.out.println(pb);
    }
}

Configure and deploy the model service

  1. Upload the warm-up request file to OSS.

  2. Add the warm-up parameters to the model service JSON configuration:

    {
        "name": "warm_up_demo",
        "model_path": "oss://path/to/model",
        "warm_up_data_path": "oss://path/to/warm_up_test.bin",
        "processor": "tensorflow_cpu_1.15",
        "metadata": {
            "cpu": 2,
            "instance": 1,
            "rpc": {
                "warm_up_count": 5
            }
        }
    }
    Parameters:

      • warm_up_data_path: OSS path of the warm-up request file. EAS locates and uses this file automatically during startup.

      • warm_up_count: Number of times each warm-up request is sent. Default: 5.

    For all other configuration parameters, see Parameters of model services.

  3. Deploy or update the model service. For details, see Create a service and Modify a service.

EAS sends the warm-up requests during startup. The model service is considered fully started only after all warm-up requests succeed.

FAQ

Why does my TensorFlow model still reload files after warm-up?

Issue

In real business scenarios, updating TensorFlow models can destabilize the service even when the warm-up feature is enabled in the processor, because every distinct input-output signature forces the model to reload its files before serving that signature.

Cause

This issue arises because the TensorFlow function session->Run(inputs, output_tensor_names, {}, &outputs) performs hash validation on inputs and output_tensor_names. If there are changes in the inputs or outputs, it triggers a reload.

For example, consider a TensorFlow model with the following inputs and outputs:

Inputs:
  threshold: []; DT_FLOAT
  model_id: []; DT_STRING
  input_holder: [-1]; DT_STRING

Outputs:
  model_version_id: []; DT_STRING
  sorted_labels: [-1, 3]; DT_STRING
  sorted_probs: [-1, 3]; DT_FLOAT

If the warm-up request fetches only sorted_labels and sorted_probs:

request.addFeed("input_holder", TFDataType.DT_STRING, new long[]{1}, input);
request.addFeed("threshold", TFDataType.DT_FLOAT, new long[] {}, th);
request.addFeed("model_id", TFDataType.DT_STRING, new long[]{}, model_name);

request.addFetch("sorted_labels");
request.addFetch("sorted_probs");

But production requests also fetch model_version_id:

request.addFeed("input_holder", TFDataType.DT_STRING, new long[]{1}, input);
request.addFeed("threshold", TFDataType.DT_FLOAT, new long[] {}, th);
request.addFeed("model_id", TFDataType.DT_STRING, new long[]{}, model_name);

request.addFetch("sorted_labels");
request.addFetch("sorted_probs");
request.addFetch("model_version_id"); // An additional output is fetched.

That extra output changes the hash, causing a reload even after warm-up.
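TensorFlow's caching happens internally in the runtime, but the idea can be sketched with a toy cache key derived from the feed and fetch names. The `signature_key` helper below is illustrative only, not TensorFlow's actual hashing:

```python
import hashlib

def signature_key(feeds, fetches):
    """Illustrative only: derive a cache key from a request's input
    and output names, sketching how a distinct input-output
    signature maps to a distinct cached entry."""
    payload = ",".join(sorted(feeds)) + "|" + ",".join(sorted(fetches))
    return hashlib.sha256(payload.encode()).hexdigest()

feeds = ["input_holder", "threshold", "model_id"]

# Warm-up request: fetches only the labels and probabilities.
warmup_key = signature_key(feeds, ["sorted_labels", "sorted_probs"])

# Production request: additionally fetches model_version_id.
prod_key = signature_key(
    feeds, ["sorted_labels", "sorted_probs", "model_version_id"]
)

# The keys differ, so the warmed-up signature is not reused.
print(warmup_key == prod_key)  # False
```

Because the production key does not match the warm-up key, the first production request pays the reload cost even though the service was "warmed up".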

Solution

Each service must use real business requests for warming up. Build the warm-up file from an actual production request, with exactly the same inputs and outputs as your live traffic. A single warm-up file is sufficient: once session->Run executes successfully with a matching request, the signature is warmed up.

Next steps