When a model service receives its first request, the runtime may defer initialization — loading model files, JIT-compiling graph operations, or cold-starting a Java Virtual Machine (JVM). This deferred initialization can cause high latency on initial requests and lead to 408 timeout errors or 450 errors.
The model warm-up feature of Elastic Algorithm Service (EAS) in Platform for AI (PAI) eliminates this cold-start latency. EAS sends warm-up requests to the service engine before the model service goes live. The model service is considered fully started only after these warm-up requests complete successfully.
How it works
The warm-up flow:
Generate a warm-up request file using an EAS SDK.
Upload the file to Object Storage Service (OSS).
Reference the file in the model service JSON configuration.
Deploy or update the service — EAS sends the warm-up requests automatically during startup.
Prerequisites
Before you begin, ensure that you have:
An OSS bucket to store the warm-up request file
The EAS SDK for Python or SDK for Java installed
A deployed or ready-to-deploy EAS model service
Generate a warm-up request file
A warm-up request file contains one or more sample requests that represent actual production traffic. EAS replays these requests during startup to initialize the model.
Two file formats are supported:
Binary (`.bin`): For TensorFlow and other structured input models. Built using the EAS SDK.
Text (`.txt`): For string-input models. Each line is one request. EAS detects the format automatically.
The inputs and outputs in the warm-up request must exactly match those used in production. Any difference in inputs or outputs causes TensorFlow to reload model files, which defeats the purpose of warm-up.
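For a string-input model, a .txt warm-up file can be produced with a few lines of Python. A minimal sketch (the JSON payloads below are made-up placeholders; in practice, copy real production requests so the warm-up traffic matches live traffic):

```python
# Each line in a .txt warm-up file is one complete request body.
# The payloads below are illustrative placeholders only.
samples = [
    '{"sentence": "sample input one"}',
    '{"sentence": "sample input two"}',
]
with open("warm_up.txt", "w") as f:
    f.write("\n".join(samples) + "\n")
```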
The following examples show how to build a .bin warm-up request file for a TensorFlow model.
Use the SDK for Python
#!/usr/bin/env python
from eas_prediction import TFRequest
# Build a warm-up request that matches your production request exactly.
# The inputs and outputs must be identical to those used after the service goes live.
req = TFRequest('serving_default')
req.add_feed('sentence1', [200, 15], TFRequest.DT_INT32, [1] * (200 * 15))
req.add_feed('sentence2', [200, 15], TFRequest.DT_INT32, [1] * (200 * 15))
req.add_feed('y', [200, 2], TFRequest.DT_INT32, [2] * (200 * 2))
req.add_feed('keep_rate', [], TFRequest.DT_FLOAT, [0.2])
req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
req.add_fetch('sorted_labels')
req.add_fetch('sorted_probs')
# Save the warm-up request to a binary file.
with open("warm_up.bin", "wb") as fw:
    fw.write(req.to_string())
Use the SDK for Java
Add the eas-sdk dependency to your pom.xml:
<dependency>
<groupId>com.aliyun.openservices.eas</groupId>
<artifactId>eas-sdk</artifactId>
<version>2.0.13</version>
</dependency>
Check the Maven repository for the latest version.
import java.io.File;
import com.aliyun.openservices.eas.predict.request.TFDataType;
import com.aliyun.openservices.eas.predict.request.TFRequest;
import org.apache.commons.io.FileUtils;
public class TestTf {
    public static void main(String[] args) throws Exception {
        // Build a warm-up request that matches your production request exactly.
        TFRequest request = new TFRequest();
        request.setSignatureName("predict_images");
        float[] content = new float[784];
        for (int i = 0; i < content.length; i++) {
            content[i] = (float) 0.0;
        }
        request.addFeed("images", TFDataType.DT_FLOAT, new long[]{1, 784}, content);
        request.addFetch("scores");
        // Write the request to a binary file.
        File writename = new File("/path/to/warm_up1.bin");
        FileUtils.writeByteArrayToFile(writename, request.getRequest().toByteArray());
    }
}
Replace /path/to/warm_up1.bin with the actual path where you want to save the file.
Verify the request file
Before uploading the file to OSS, verify it produces a valid response.
Send a test request with curl
curl --data-binary @"</path/to/warmup.bin>" \
-H 'Authorization: <your-token>' \
<service-address>

| Placeholder | Description |
|---|---|
| </path/to/warmup.bin> | Path to the warm-up request file |
| <your-token> | Token used to access the model service |
| <service-address> | Endpoint of the model service |
If the response is too large to display in the terminal, add --output <file-path> to save it to a file.
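The same check can be scripted with Python's standard library. A minimal sketch mirroring the curl command; the file path, service address, and token are placeholders you must replace:

```python
import urllib.request

def build_warmup_request(path, url, token):
    # Read the warm-up file and wrap it in a POST request with the
    # service token in the Authorization header, as in the curl example.
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url, data=body, headers={"Authorization": token}, method="POST")

# req = build_warmup_request("/path/to/warmup.bin", "<service-address>", "<your-token>")
# print(urllib.request.urlopen(req).read())  # inspect the model response
```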
Parse the file directly
Use this method to inspect the request content without a running service.
Python
from eas_prediction import TFRequest
req = TFRequest()
with open('/path/to/warm_up1.bin', 'rb') as wm:
    req.request_data.ParseFromString(wm.read())
print(req.request_data)
Java
import com.aliyun.openservices.eas.predict.proto.PredictProtos;
import org.apache.commons.io.FileUtils;
import java.io.File;
public class Test {
    public static void main(String[] args) throws Exception {
        File refile = new File("/path/to/warm_up1.bin");
        byte[] data = FileUtils.readFileToByteArray(refile);
        PredictProtos.PredictRequest pb = PredictProtos.PredictRequest.parseFrom(data);
        System.out.println(pb);
    }
}
Configure and deploy the model service
Upload the warm-up request file to OSS.
Add the warm-up parameters to the model service JSON configuration:
{
  "name": "warm_up_demo",
  "model_path": "oss://path/to/model",
  "warm_up_data_path": "oss://path/to/warm_up_test.bin",
  "processor": "tensorflow_cpu_1.15",
  "metadata": {
    "cpu": 2,
    "instance": 1,
    "rpc": {
      "warm_up_count": 5
    }
  }
}

| Parameter | Description | Default |
|---|---|---|
| warm_up_data_path | OSS path of the warm-up request file. EAS locates and uses this file automatically during startup. | — |
| warm_up_count | Number of times each warm-up request is sent. | 5 |

For all other configuration parameters, see Parameters of model services.
Deploy or update the model service. For details, see Create a service and Modify a service.
EAS sends the warm-up requests during startup. The model service is considered fully started only after all warm-up requests succeed.
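If you manage service configurations in code, the warm-up parameters can be merged into an existing configuration programmatically. A sketch using the placeholder OSS paths from the example configuration:

```python
import json

# Start from an existing service configuration (values are placeholders
# matching the example above).
config = {
    "name": "warm_up_demo",
    "model_path": "oss://path/to/model",
    "processor": "tensorflow_cpu_1.15",
    "metadata": {"cpu": 2, "instance": 1},
}

# Merge in the warm-up parameters.
config["warm_up_data_path"] = "oss://path/to/warm_up_test.bin"
config["metadata"].setdefault("rpc", {})["warm_up_count"] = 5

with open("service.json", "w") as f:
    json.dump(config, f, indent=2)
```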
FAQ
Why does my TensorFlow model still reload files after warm-up?
Issue
In production, updating a TensorFlow model can cause service instability even when the processor's warm-up feature is enabled: each distinct combination of inputs and outputs can trigger a fresh reload of the model files.
Cause
This issue arises because the TensorFlow function session->Run(inputs, output_tensor_names, {}, &outputs) performs hash validation on inputs and output_tensor_names. If there are changes in the inputs or outputs, it triggers a reload.
For example, consider a TensorFlow model with the following inputs and outputs:
Inputs:
threshold: []; DT_FLOAT
model_id: []; DT_STRING
input_holder: [-1]; DT_STRING
Outputs:
model_version_id: []; DT_STRING
sorted_labels: [-1, 3]; DT_STRING
sorted_probs: [-1, 3]; DT_FLOAT
If the warm-up request fetches only sorted_labels and sorted_probs:
request.addFeed("input_holder", TFDataType.DT_STRING, new long[]{1}, input);
request.addFeed("threshold", TFDataType.DT_FLOAT, new long[] {}, th);
request.addFeed("model_id", TFDataType.DT_STRING, new long[]{}, model_name);
request.addFetch("sorted_labels");
request.addFetch("sorted_probs");
But production requests also fetch model_version_id:
request.addFeed("input_holder", TFDataType.DT_STRING, new long[]{1}, input);
request.addFeed("threshold", TFDataType.DT_FLOAT, new long[] {}, th);
request.addFeed("model_id", TFDataType.DT_STRING, new long[]{}, model_name);
request.addFetch("sorted_labels");
request.addFetch("sorted_probs");
request.addFetch("model_version_id"); // An additional output is fetched.
That extra output changes the hash, causing a reload even after warm-up.
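The effect can be pictured with a toy hash over the request signature. This is only an illustration, not TensorFlow's actual validation logic: any change to the input names or fetched output names yields a different key, so the warm-up result no longer matches.

```python
import hashlib

def signature_key(input_names, output_names):
    # Toy stand-in for the internal validation: the key covers both the
    # input names and the fetched output names.
    payload = ",".join(sorted(input_names)) + "|" + ",".join(sorted(output_names))
    return hashlib.sha256(payload.encode()).hexdigest()

warm_up = signature_key(["input_holder", "threshold", "model_id"],
                        ["sorted_labels", "sorted_probs"])
production = signature_key(["input_holder", "threshold", "model_id"],
                           ["sorted_labels", "sorted_probs", "model_version_id"])
assert warm_up != production  # one extra fetch yields a different key
```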
Solution
Each service must use real business requests for warming up. Upload a warm-up file built from an actual production request, with the exact same inputs and outputs. A single warm-up file is sufficient: session->Run only needs to execute successfully once with a matching request.
Next steps
SDKs: Learn how to use EAS SDKs to build and send requests.
Parameters of model services: Explore the full set of JSON deployment parameters.