
Platform for AI: Warm up model services

Last Updated: Apr 24, 2025

Elastic Algorithm Service (EAS) of Platform for AI (PAI) provides the model warm-up feature to reduce the time required to process the initial requests that are sent to an online model service. The feature warms up a model before the model service is published online, which ensures that the service works as expected immediately after it goes online. This topic describes how to use the model warm-up feature.

Background information

When a request is sent to a model for the first time, different runtimes may perform various initialization tasks, which leads to higher latency for the initial requests and possible timeouts. For example, with the Java processor, the cold start of the Java virtual machine (JVM) can cause the initial requests to take a long time. Similarly, some TensorFlow models must load model files or parameters into memory during the first call, which can be time-consuming, result in a high response time (RT) for the first few requests, and potentially cause a 408 timeout or a 450 error. To address this issue, EAS provides the model warm-up feature, which warms up a model service before it goes online so that the service can process requests promptly after deployment.

Before a model service goes online, the EAS service engine sends warm-up requests to itself. The initial warm-up requests may take a long time, but subsequent requests can be completed within a short period of time.

To use the EAS model warm-up feature, first generate a warm-up request file. Then, specify the request file in the JSON file that is used to deploy the model service. When the model service is deployed or updated, the EAS service engine sends the warm-up requests. The model service is considered fully started only after the warm-up requests are successfully sent.

Use the model warm-up feature

To generate a warm-up request file, construct requests based on the requests that will be sent after the model service is published online. The warm-up request file is read and sent during the warm-up process. You can use the EAS SDKs that are used to call services to construct the request file. For more information, see SDKs. The following example describes how to use the model warm-up feature to warm up a TensorFlow model.

  1. Generate a warm-up request file.

    In the following examples, the EAS SDK for Python and the EAS SDK for Java are used to construct request files for TensorFlow model services. You can use similar methods for other types of models. For model services that use strings as inputs, you can store the requests as strings in a TXT file, with each request occupying one row. EAS automatically identifies the file format and sends the warm-up requests in the corresponding format.
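
    For example, the following minimal sketch writes a warm-up file for a hypothetical string-input service. The request strings are placeholders; replace them with the request bodies that your own service expects.

      # Each row in the TXT file is treated as one warm-up request.
      requests = [
          '{"feature1": 1.0, "feature2": "a"}',  # placeholder request body
          '{"feature1": 2.0, "feature2": "b"}',  # placeholder request body
      ]
      with open("warm_up.txt", "w") as fw:
          for line in requests:
              fw.write(line + "\n")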

    Important

    The inputs and outputs of the requests used to warm up a TensorFlow model must be the same as those of the requests sent after the model service is published online.

    The following sample code provides an example on how to use SDKs to construct a request file for a TensorFlow model service.

    • Use the SDK for Python

      #!/usr/bin/env python
      
      from eas_prediction import PredictClient
      from eas_prediction import StringRequest
      from eas_prediction import TFRequest
      
      if __name__ == '__main__':
          # The sample warm-up request. Construct a warm-up request based on your actual requirements. The inputs and outputs of the warm-up requests must be the same as those of the requests that are sent after the model service is published online. 
          req = TFRequest('serving_default')
          req.add_feed('sentence1', [200, 15], TFRequest.DT_INT32, [1] * 200 * 15)
          req.add_feed('sentence2', [200, 15], TFRequest.DT_INT32, [1] * 200 * 15)
          req.add_feed('y', [200, 2], TFRequest.DT_INT32, [2] * 200 * 2)
          req.add_feed('keep_rate', [], TFRequest.DT_FLOAT, [0.2])
          req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
          req.add_fetch('sorted_labels')
          req.add_fetch('sorted_probs')
          # print(req.request_data)  # Display the request data.
          # Save the serialized request to warm_up.bin and use the file as the warm-up request file.
          with open("warm_up.bin", "wb") as fw:
              fw.write(req.to_string())

    • Use the SDK for Java

      To use EAS SDK for Java in a Maven project, you must add the eas-sdk dependency to the <dependencies> section of the pom.xml file. Use the latest version that is available in the Maven repository. Sample code:

      <dependency>
        <groupId>com.aliyun.openservices.eas</groupId>
        <artifactId>eas-sdk</artifactId>
        <version>2.0.13</version>
      </dependency>

      Sample code of SDK for Java:

      import java.io.File;
      import com.aliyun.openservices.eas.predict.request.TFDataType;
      import com.aliyun.openservices.eas.predict.request.TFRequest;
      import org.apache.commons.io.FileUtils;
      
      public class TestTf {
      
          public static void main(String[] args) throws Exception{
              // The sample warm-up request. Construct a warm-up request based on your actual requirements. 
              TFRequest request = new TFRequest();
              request.setSignatureName("predict_images");
              float[] content = new float[784];
              for (int i = 0; i < content.length; i++){
                content[i] = (float)0.0;
              }
              request.addFeed("images", TFDataType.DT_FLOAT, new long[]{1, 784}, content);
              request.addFetch("scores");
              
              try {
                  // Write the serialized request to a file. If the file does not exist, it is created. 
                  File writename = new File("/path/to/warm_up1.bin");
                  FileUtils.writeByteArrayToFile(writename, request.getRequest().toByteArray());
              } catch (Exception ex) {
                  // Print the error if the file cannot be written.
                  ex.printStackTrace();
              }
          }
      }
  2. Verify the request file.

    You can use one of the following methods to verify the request file:

    • Method 1: Send a service request for verification

      Run the following command to send a request to the model service. If the returned content is too large to be printed on the terminal, you can add the --output <filePath> option to store the result in a file.

      curl  --data-binary @"</path/to/warmup.bin>" -H 'Authorization: <yourToken>' <serviceAddress>

      Replace the following parameters with actual values:

      • </path/to/warmup.bin>: the path of the warm-up request file that is generated in the preceding step.

      • <yourToken>: the token that is used to access the model service.

      • <serviceAddress>: the endpoint of the model service.

    • Method 2: Parse the request file for verification

      • Use Python

        from eas_prediction import TFRequest
        
        req = TFRequest()
        with open('/path/to/warm_up1.bin', 'rb') as wm:
            req.request_data.ParseFromString(wm.read())
            print(req.request_data)
        
      • Use Java

        import com.aliyun.openservices.eas.predict.proto.PredictProtos;
        import org.apache.commons.io.FileUtils;
        import java.io.File;
        
        public class Test {
        
            public static void main(String[] args) throws Exception {
        
                File refile = new File("/path/to/warm_up1.bin");
                byte[] data = FileUtils.readFileToByteArray(refile);
                PredictProtos.PredictRequest pb = PredictProtos.PredictRequest.parseFrom(data);
                System.out.println(pb);
            }
        }
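
    Alternatively, you can combine both methods: parse the warm-up file with the EAS SDK and send it to the deployed service. The following Python sketch assumes placeholder values for the service endpoint, service name, and token.

      from eas_prediction import PredictClient
      from eas_prediction import TFRequest
      
      if __name__ == '__main__':
          # Load the previously generated warm-up request from the file.
          req = TFRequest()
          with open('/path/to/warm_up1.bin', 'rb') as wm:
              req.request_data.ParseFromString(wm.read())
      
          # <serviceEndpoint>, <serviceName>, and <yourToken> are placeholders for
          # the endpoint, name, and token of your deployed service.
          client = PredictClient('<serviceEndpoint>', '<serviceName>')
          client.set_token('<yourToken>')
          client.init()
          resp = client.predict(req)
          print(resp)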
  3. Configure the model service.

    1. Upload the generated warm-up request file to Object Storage Service (OSS). You can use the OSS console or the OSS SDK for Python, as shown in the sketch after the parameter descriptions below.

    2. Configure the parameters of the model service.

      In the model description file in the JSON format, configure the parameters of the model service.

      {
          "name":"warm_up_demo",
          "model_path":"oss://path/to/model", 
          "warm_up_data_path":"oss://path/to/warm_up_test.bin", // The path of the warm-up request file in OSS. 
          "processor":"tensorflow_cpu_1.15",
          "metadata":{
              "cpu":2,
              "instance":1,
              "rpc": {
                  "warm_up_count": 5, // The number of times each warm-up request is sent. If you do not specify a value for this parameter, 5 is used as the default value. 
              }
          }
      }

      The following warm-up parameters are used. For information about other parameters, see Parameters for JSON deployment.

      • warm_up_data_path: the path of the warm-up request file in OSS. The system automatically searches for the file and warms up the model by using the file before the model service is published online.

      • warm_up_count: the number of times each warm-up request is sent. If you do not specify a value for this parameter, 5 is used as the default value.
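
      If you want to upload the warm-up request file programmatically in sub-step 1 instead of through the OSS console, you can use the OSS SDK for Python (oss2), as shown in the following sketch. The AccessKey pair, endpoint, and bucket name are placeholders or example values.

      import oss2
      
      # Placeholders: replace with your own AccessKey pair, OSS endpoint, and bucket name.
      auth = oss2.Auth('<yourAccessKeyId>', '<yourAccessKeySecret>')
      bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', '<yourBucketName>')
      
      # Upload the local warm-up file. The object key must match the path that is
      # referenced by warm_up_data_path in the service configuration.
      bucket.put_object_from_file('path/to/warm_up_test.bin', 'warm_up.bin')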

  4. Deploy or update the model service. For more information, see Create a service or Modify a service.

    When you deploy or update the model service, the service engine of EAS sends warm-up requests to warm up the model service.

FAQ about warming up TensorFlow models

  • Issue

    In real business scenarios, updating TensorFlow models can cause service instability. Even if warm-up logic is added to the processor, the issue may persist. Tests show that each distinct combination of inputs and outputs causes the model to reload files for warm-up. Even after the model is warmed up with all signatures, some requests may still trigger a time-consuming reload.

  • Causes

    This issue occurs because the TensorFlow function session->Run(inputs, output_tensor_names, {}, &outputs) performs hash validation on inputs and output_tensor_names. If the inputs or outputs change, a reload is triggered.

    The following sample code shows the inputs of a sample TensorFlow model:

    Inputs:
      threshold: []; DT_FLOAT
      model_id: []; DT_STRING
      input_holder: [-1]; DT_STRING

    The following sample code shows the outputs of the TensorFlow model:

    Outputs:
      model_version_id: []; DT_STRING
      sorted_labels: [-1, 3]; DT_STRING
      sorted_probs: [-1, 3]; DT_FLOAT

    The following sample warm-up requests are sent:

    request.addFeed("input_holder",TFDataType.DT_STRING, new long[]{1}, input);
    request.addFeed("threshold", TFDataType.DT_FLOAT, new long[] {}, th);
    request.addFeed("model_id", TFDataType.DT_STRING, new long[]{}, model_name);
    
    request.addFetch("sorted_labels");
    request.addFetch("sorted_probs");

    After the TensorFlow model is warmed up, the following requests are sent. Compared with the warm-up requests, these requests fetch an additional output. In this case, the model files must be reloaded:

    request.addFeed("input_holder",TFDataType.DT_STRING, new long[]{1}, input);
    request.addFeed("threshold", TFDataType.DT_FLOAT, new long[] {}, th);
    request.addFeed("model_id", TFDataType.DT_STRING, new long[]{}, model_name);
    
    request.addFetch("sorted_labels");
    request.addFetch("sorted_probs");
    request.addFetch("model_version_id"); // An additional parameter is returned.

  • Solution

    Each service must be warmed up with real business requests, and the warm-up applies only to the specific inputs and outputs of those requests. Therefore, the model warm-up feature of EAS requires you to upload actual request data. A single successful session->Run call with a real request is sufficient for warm-up. You can upload only one warm-up file, but make sure that the warm-up requests use exactly the same inputs and outputs as the actual calls.
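
    For the example model above, a warm-up request that avoids the extra reload fetches exactly the outputs that production requests fetch, including model_version_id. The following Python sketch assumes the signature name serving_default and passes string tensors as bytes; adjust these assumptions to your model and SDK version.

    from eas_prediction import TFRequest
    
    # Build a warm-up request with the same feeds and the full set of fetches that
    # production requests use, so that session->Run sees an identical signature
    # and no additional reload is triggered.
    req = TFRequest('serving_default')   # assumed signature name
    req.add_feed('input_holder', [1], TFRequest.DT_STRING, [b'example input'])
    req.add_feed('threshold', [], TFRequest.DT_FLOAT, [0.5])
    req.add_feed('model_id', [], TFRequest.DT_STRING, [b'example_model'])
    req.add_fetch('sorted_labels')
    req.add_fetch('sorted_probs')
    req.add_fetch('model_version_id')    # also fetched by production requests
    
    with open('warm_up.bin', 'wb') as fw:
        fw.write(req.to_string())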