
Platform for AI: Optimize a TensorFlow model

Last Updated: Mar 15, 2026

This topic describes how to optimize TensorFlow models by using the PAI-Blade Python APIs. All performance numbers in this topic were measured on an NVIDIA T4 GPU.

Prerequisites

  • TensorFlow and PAI-Blade wheel packages are installed. For more information, see Install Blade.

  • A trained TensorFlow model is available. This topic uses a public Mask R-CNN model with a ResNet50 backbone.

Procedure

This topic demonstrates the optimization procedure on a public Mask R-CNN (ResNet50) detection model. Adapt these steps for your own TensorFlow models.

  1. Import PAI-Blade and dependencies.

    import os
    import numpy as np
    import tensorflow.compat.v1 as tf
    import blade
  2. Write a function to download the model and test data.

    PAI-Blade supports optimization without test data (zero-input optimization), but results are more accurate with real input data. Provide test data when possible. The following sample code downloads the model and test data.

    def _wget_demo_tgz():
        # Download a public Mask R-CNN (ResNet50) model.
        url = 'http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/demo/mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz'
        local_tgz = os.path.basename(url)
        local_dir = local_tgz.split('.')[0]
        if not os.path.exists(local_dir):
            blade.util.wget_url(url, local_tgz)
            blade.util.unpack(local_tgz)
        model_path = os.path.abspath(os.path.join(local_dir, "frozen_inference_graph.pb"))
        graph_def = tf.GraphDef()
        with open(model_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        # Use random numbers as test data.
        test_data = np.random.rand(1, 800, 1000, 3)
        return graph_def, {'image_tensor:0': test_data}
    
    graph_def, test_data = _wget_demo_tgz()
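
    When you replace the random array with real inputs, the feed dict must map the graph's input tensor names to NHWC batches. The following helper is a sketch, not a PAI-Blade API; 'image_tensor:0' is the demo model's input name, and make_feed is a name invented for this example.

```python
import numpy as np

# Hypothetical helper (not part of PAI-Blade): wrap a raw image array into
# the {tensor_name: NHWC batch} feed-dict format used by blade.optimize.
def make_feed(image, tensor_name='image_tensor:0'):
    arr = np.asarray(image)
    if arr.ndim == 3:
        # A single HWC image: add a batch dimension.
        arr = arr[np.newaxis, ...]
    if arr.ndim != 4:
        raise ValueError('expected an HWC or NHWC array, got shape %r' % (arr.shape,))
    return {tensor_name: arr}

feed = make_feed(np.random.rand(800, 1000, 3))
print(feed['image_tensor:0'].shape)  # (1, 800, 1000, 3)
```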
  3. Call blade.optimize to optimize the model. For parameter details, see Python API reference.

    input_nodes = ['image_tensor']
    output_nodes = ['detection_boxes', 'detection_scores', 'detection_classes', 'num_detections', 'detection_masks']
    
    optimized_model, opt_spec, report = blade.optimize(
        graph_def,                 # Model to optimize. In this example, a tf.GraphDef object. Can also be a SavedModel path.
        'o1',                      # Optimization level. Valid values: o1 and o2.
        device_type='gpu',         # Target device. Valid values: gpu, cpu, and edge.
        inputs=input_nodes,        # Input nodes. Optional. PAI-Blade automatically infers if not specified.
        outputs=output_nodes,      # Output nodes.
        test_data=[test_data]      # Test data.
    )

    blade.optimize returns three objects:

    • optimized_model: The optimized model. A tf.GraphDef object in this example.

    • opt_spec: Configuration information, environment variables, and resource files required to reproduce optimization results. Apply using a with statement.

    • report: Optimization report. Print directly to view. For parameter details, see Optimization report.

    Progress information during optimization:

    [Progress] 5%, phase: user_test_data_validation.
    [Progress] 10%, phase: test_data_deduction.
    [Progress] 15%, phase: CombinedSwitch_1.
    [Progress] 24%, phase: TfStripUnusedNodes_22.
    [Progress] 33%, phase: TfStripDebugOps_23.
    [Progress] 42%, phase: TfFoldConstants_24.
    [Progress] 51%, phase: CombinedSequence_7.
    [Progress] 59%, phase: TfCudnnrnnBilstm_25.
    [Progress] 68%, phase: TfFoldBatchNorms_26.
    [Progress] 77%, phase: TfNonMaxSuppressionOpt_27.
    [Progress] 86%, phase: CombinedSwitch_20.
    [Progress] 95%, phase: model_collecting.
    [Progress] 100%, Finished!
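
    For long optimizations you may want to track progress programmatically. The progress lines above follow a simple "[Progress] N%, phase: name" pattern; the following parser is a sketch inferred from the sample output, not a documented PAI-Blade API.

```python
import re

# Format inferred from the sample output above: "[Progress] N%, phase: name."
# The phase part is absent on the final "Finished!" line.
PROGRESS_RE = re.compile(r'\[Progress\] (\d+)%(?:, phase: (\w+))?')

def parse_progress(line):
    """Return (percent, phase_or_None), or None if the line does not match."""
    m = PROGRESS_RE.search(line)
    return (int(m.group(1)), m.group(2)) if m else None

print(parse_progress('[Progress] 42%, phase: TfFoldConstants_24.'))
# (42, 'TfFoldConstants_24')
print(parse_progress('[Progress] 100%, Finished!'))
# (100, None)
```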
  4. Print the optimization report.

    print("Report: {}".format(report))

    The report shows which optimization items contribute most to performance improvement.

    Report: {
      // ......
      "optimizations": [
        // ......
        {
          "name": "TfNonMaxSuppressionOpt",
          "status": "effective",
          "speedup": "1.58",        // Acceleration ratio.
          "pre_run": "522.74 ms",   // Latency before optimization.
          "post_run": "331.45 ms"   // Latency after optimization.
        },
        {
          "name": "TfAutoMixedPrecisionGpu",
          "status": "effective",
          "speedup": "2.43",
          "pre_run": "333.30 ms",
          "post_run": "136.97 ms"
        }
        // ......
      ],
      // End-to-end optimization results.
      "overall": {
        "baseline": "505.91 ms",    // Latency of the original model.
        "optimized": "136.83 ms",   // Latency of the optimized model.
        "speedup": "3.70"           // Acceleration ratio.
      },
      // ......
    }
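
    The printed report suggests a JSON-like structure. Assuming the report parses into a dict of this shape (the // comment lines in the sample above are annotations, not part of the data), a small sketch that ranks the optimization items by speedup:

```python
# Sample data copied from the printed report above; top_optimizations is a
# helper written for this example, not a PAI-Blade API.
sample_report = {
    "optimizations": [
        {"name": "TfNonMaxSuppressionOpt", "status": "effective",
         "speedup": "1.58", "pre_run": "522.74 ms", "post_run": "331.45 ms"},
        {"name": "TfAutoMixedPrecisionGpu", "status": "effective",
         "speedup": "2.43", "pre_run": "333.30 ms", "post_run": "136.97 ms"},
    ],
    "overall": {"baseline": "505.91 ms", "optimized": "136.83 ms", "speedup": "3.70"},
}

def top_optimizations(report_dict):
    """Return the effective optimization items, largest speedup first."""
    items = [it for it in report_dict.get("optimizations", [])
             if it.get("status") == "effective"]
    return sorted(items, key=lambda it: float(it["speedup"]), reverse=True)

print([it["name"] for it in top_optimizations(sample_report)])
# ['TfAutoMixedPrecisionGpu', 'TfNonMaxSuppressionOpt']
```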
  5. Compare the performance before and after optimization.

    import time
    
    def benchmark(model):
        tf.reset_default_graph()
        with tf.Session() as sess:
            tf.import_graph_def(model, name="")
            # Fetch the output tensors, not the input tensor, so that the
            # graph actually executes end to end.
            fetches = [name + ':0' for name in output_nodes]
            # Warm up.
            for i in range(1000):
                sess.run(fetches, test_data)
            # Benchmark.
            num_runs = 1000
            start = time.time()
            for i in range(num_runs):
                sess.run(fetches, test_data)
            elapsed = time.time() - start
            rt_ms = elapsed / num_runs * 1000.0
            # Print the result.
            print("Latency of model: {:.2f} ms.".format(rt_ms))
    
    # Test the speed of the original model.
    benchmark(graph_def)
    
    # Test the speed of the optimized model.
    with opt_spec:
        benchmark(optimized_model)

    Sample output. The measured latencies are consistent with the end-to-end values in the optimization report.

    Latency of model: 530.26 ms.
    Latency of model: 148.40 ms.
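
A single mean latency can hide run-to-run variance. The following framework-independent sketch also reports median and p99 latency; run_once stands in for the sess.run call in the benchmark above and benchmark_detailed is a name invented for this example.

```python
import time
import statistics

def benchmark_detailed(run_once, num_runs=100):
    """Time run_once() num_runs times and report mean/median/p99 in ms."""
    times = []
    for _ in range(num_runs):
        start = time.perf_counter()
        run_once()
        times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return {
        "mean_ms": statistics.fmean(times),
        "median_ms": statistics.median(times),
        "p99_ms": times[int(0.99 * (len(times) - 1))],
    }

# Dummy workload for illustration; pass a closure over sess.run in practice.
stats = benchmark_detailed(lambda: sum(range(1000)), num_runs=50)
print(sorted(stats))  # ['mean_ms', 'median_ms', 'p99_ms']
```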

Extensions

The model parameter of blade.optimize supports multiple input formats. For TensorFlow models, pass the model in one of these ways:

  • Pass a tf.GraphDef object.

  • Load a frozen model in PB or PBTXT format from a file.

  • Import a SavedModel from a specified path.

This example passes a tf.GraphDef object in memory to blade.optimize. The following sample code shows the other two methods:

  • Pass a frozen PB file

    optimized_model, opt_spec, report = blade.optimize(
        './path/to/frozen_pb.pb',  # File can also be in .pbtxt format.
        'o1',
        device_type='gpu',
    )
  • Pass a SavedModel path

    optimized_model, opt_spec, report = blade.optimize(
        './path/to/saved_model_directory/',
        'o1',
        device_type='gpu',
    )
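
Which form you pass determines how PAI-Blade loads the model. The following helper is hypothetical (not part of the PAI-Blade API) and simply mirrors that dispatch for local paths:

```python
import os

# Hypothetical helper: decide how a local model argument would be
# interpreted. Directories are treated as SavedModels; .pb/.pbtxt files
# as frozen graphs.
def classify_model_input(path):
    if os.path.isdir(path):
        return 'saved_model'
    if path.endswith(('.pb', '.pbtxt')):
        return 'frozen_graph'
    raise ValueError('unrecognized model input: %r' % (path,))

print(classify_model_input('frozen_inference_graph.pb'))  # frozen_graph
```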

Troubleshooting

GPU optimization fails with "pipeline TfGpuO1Pipeline is not registered"

Root cause: GPU optimization requires additional dependencies that are not installed or the GPU runtime environment is not properly configured.

Solution:

  1. Verify CUDA and cuDNN are installed and match the TensorFlow version requirements. For example, TensorFlow 2.x typically requires CUDA 11.2+ and cuDNN 8.1+.

  2. Check that the TensorFlow GPU version is installed, not the CPU-only version:

    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

    If this returns an empty list, reinstall TensorFlow with GPU support. For TensorFlow 1.x, the GPU build is the separate tensorflow-gpu package; TensorFlow 2.x ships GPU support in the tensorflow package itself (the separate tensorflow-gpu package was removed in TensorFlow 2.12):

    # TensorFlow 1.x
    pip install tensorflow-gpu
    # TensorFlow 2.x
    pip install tensorflow
  3. Ensure PAI-Blade is installed with GPU support. Reinstall if necessary:

    pip install --upgrade blade-gpu
  4. If GPU optimization is not required, use device_type='cpu' instead:

    optimized_model, opt_spec, report = blade.optimize(
        graph_def,
        'o1',
        device_type='cpu',  # Use CPU optimization
        outputs=output_nodes,
        test_data=[test_data]
    )
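
If optimization should degrade gracefully rather than fail outright, you can wrap the call in a GPU-to-CPU fallback. This is a sketch of the pattern only: the optimize function is injected so it can be demonstrated without PAI-Blade installed, and in practice you would pass blade.optimize.

```python
# Hedged sketch: try GPU optimization first, fall back to CPU if the GPU
# pipeline is unavailable. optimize_fn is injected for illustration; the
# real call would use blade.optimize.
def optimize_with_fallback(optimize_fn, model, level='o1', **kwargs):
    try:
        return optimize_fn(model, level, device_type='gpu', **kwargs)
    except RuntimeError:
        # e.g. "pipeline TfGpuO1Pipeline is not registered"
        return optimize_fn(model, level, device_type='cpu', **kwargs)

# Stand-in for blade.optimize that simulates a missing GPU pipeline.
def fake_optimize(model, level, device_type=None, **kwargs):
    if device_type == 'gpu':
        raise RuntimeError('pipeline TfGpuO1Pipeline is not registered')
    return 'optimized-on-' + device_type

print(optimize_with_fallback(fake_optimize, 'graph'))  # optimized-on-cpu
```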

GPU optimization requirements

GPU optimization (device_type='gpu') requires:

  • NVIDIA GPU with CUDA Compute Capability 6.0 or higher (Pascal architecture or later)

  • CUDA Toolkit 11.2 or later

  • cuDNN 8.1 or later

  • TensorFlow GPU version matching the CUDA/cuDNN versions

  • PAI-Blade GPU package (blade-gpu)

CPU optimization (device_type='cpu') does not require CUDA or GPU hardware and uses oneDNN optimization libraries instead.

Next steps

After optimizing a model, execute it directly in Python or deploy it as an EAS service. PAI-Blade also provides a C++ SDK to integrate optimized models into applications. For more information, see Use an SDK to deploy a TensorFlow model for inference.