Optimize TensorFlow models using PAI-Blade Python APIs. All results measured on NVIDIA T4 GPU.
Prerequisites
- TensorFlow and PAI-Blade wheel packages are installed. For more information, see Install Blade.
- A trained TensorFlow model is available. This topic uses a public ResNet50 model.
Procedure
This topic demonstrates how to optimize a TensorFlow model using a public ResNet50 model. Adapt these steps for your own TensorFlow models.
- Import PAI-Blade and dependencies.

  ```python
  import os
  import numpy as np
  import tensorflow.compat.v1 as tf
  import blade
  ```

- Write a function to download the model and test data.

  PAI-Blade supports optimization without test data (zero-input optimization), but results are more accurate with real input data. Provide test data when possible. The following sample code downloads the model and test data.

  ```python
  def _wget_demo_tgz():
      # Download a public ResNet50 model.
      url = 'http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/demo/mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz'
      local_tgz = os.path.basename(url)
      local_dir = local_tgz.split('.')[0]
      if not os.path.exists(local_dir):
          blade.util.wget_url(url, local_tgz)
          blade.util.unpack(local_tgz)
      model_path = os.path.abspath(os.path.join(local_dir, "frozen_inference_graph.pb"))
      graph_def = tf.GraphDef()
      with open(model_path, 'rb') as f:
          graph_def.ParseFromString(f.read())
      # Use random numbers as test data.
      test_data = np.random.rand(1, 800, 1000, 3)
      return graph_def, {'image_tensor:0': test_data}

  graph_def, test_data = _wget_demo_tgz()
  ```
- Call `blade.optimize` to optimize the model. For parameter details, see Python API reference.

  ```python
  input_nodes = ['image_tensor']
  output_nodes = [
      'detection_boxes',
      'detection_scores',
      'detection_classes',
      'num_detections',
      'detection_masks'
  ]
  optimized_model, opt_spec, report = blade.optimize(
      graph_def,              # Model to optimize. In this example, a tf.GraphDef object. Can also be a SavedModel path.
      'o1',                   # Optimization level. Valid values: o1 and o2.
      device_type='gpu',      # Target device. Valid values: gpu, cpu, and edge.
      inputs=input_nodes,     # Input nodes. Optional; PAI-Blade infers them automatically if not specified.
      outputs=output_nodes,   # Output nodes.
      test_data=[test_data]   # Test data.
  )
  ```

  `blade.optimize` returns three objects:

  - `optimized_model`: The optimized model, a `tf.GraphDef` object in this example.
  - `opt_spec`: The configuration information, environment variables, and resource files required to reproduce the optimization results. Apply it with a `with` statement.
  - `report`: The optimization report, which can be printed directly. For parameter details, see Optimization report.

  The following progress information is displayed during optimization:

  ```
  [Progress] 5%, phase: user_test_data_validation.
  [Progress] 10%, phase: test_data_deduction.
  [Progress] 15%, phase: CombinedSwitch_1.
  [Progress] 24%, phase: TfStripUnusedNodes_22.
  [Progress] 33%, phase: TfStripDebugOps_23.
  [Progress] 42%, phase: TfFoldConstants_24.
  [Progress] 51%, phase: CombinedSequence_7.
  [Progress] 59%, phase: TfCudnnrnnBilstm_25.
  [Progress] 68%, phase: TfFoldBatchNorms_26.
  [Progress] 77%, phase: TfNonMaxSuppressionOpt_27.
  [Progress] 86%, phase: CombinedSwitch_20.
  [Progress] 95%, phase: model_collecting.
  [Progress] 100%, Finished!
  ```
- Print the optimization report.

  ```python
  print("Report: {}".format(report))
  ```

  The report shows which optimization items contribute most to the performance improvement.

  ```
  Report: {
    // ......
    "optimizations": [
      // ......
      {
        "name": "TfNonMaxSuppressionOpt",
        "status": "effective",
        "speedup": "1.58",        // Acceleration ratio.
        "pre_run": "522.74 ms",   // Latency before optimization.
        "post_run": "331.45 ms"   // Latency after optimization.
      },
      {
        "name": "TfAutoMixedPrecisionGpu",
        "status": "effective",
        "speedup": "2.43",
        "pre_run": "333.30 ms",
        "post_run": "136.97 ms"
      }
      // ......
    ],
    // End-to-end optimization results.
    "overall": {
      "baseline": "505.91 ms",    // Latency of the original model.
      "optimized": "136.83 ms",   // Latency of the optimized model.
      "speedup": "3.70"           // Acceleration ratio.
    },
    // ......
  }
  ```
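The per-item entries in the report can also be inspected programmatically, for example to rank the effective optimization items by speedup. The snippet below is an illustrative sketch that operates on a hand-written dict mirroring the report structure above; it does not call any PAI-Blade API.

```python
# Illustrative only: a dict mirroring the structure of the printed report.
report_dict = {
    "optimizations": [
        {"name": "TfNonMaxSuppressionOpt", "status": "effective", "speedup": "1.58"},
        {"name": "TfAutoMixedPrecisionGpu", "status": "effective", "speedup": "2.43"},
    ],
    "overall": {"baseline": "505.91 ms", "optimized": "136.83 ms", "speedup": "3.70"},
}

def rank_optimizations(rep):
    """Return effective optimization items, most impactful first."""
    items = [o for o in rep["optimizations"] if o["status"] == "effective"]
    return sorted(items, key=lambda o: float(o["speedup"]), reverse=True)

for item in rank_optimizations(report_dict):
    print("{}: {}x".format(item["name"], item["speedup"]))
```

In this sample report, `TfAutoMixedPrecisionGpu` ranks first with a 2.43x speedup.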
- Compare the performance before and after optimization.

  ```python
  import time

  def benchmark(model):
      tf.reset_default_graph()
      with tf.Session() as sess:
          sess.graph.as_default()
          tf.import_graph_def(model, name="")
          # Fetch the model outputs, not the input tensor, so that inference actually runs.
          fetches = ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0',
                     'num_detections:0', 'detection_masks:0']
          # Warm up.
          for i in range(0, 1000):
              sess.run(fetches, test_data)
          # Benchmark.
          num_runs = 1000
          start = time.time()
          for i in range(0, num_runs):
              sess.run(fetches, test_data)
          elapsed = time.time() - start
          rt_ms = elapsed / num_runs * 1000.0
          # Show the result.
          print("Latency of model: {:.2f} ms.".format(rt_ms))

  # Test the speed of the original model.
  benchmark(graph_def)

  # Test the speed of the optimized model.
  with opt_spec:
      benchmark(optimized_model)
  ```

  The performance test results match the values in the optimization report:

  ```
  Latency of model: 530.26 ms.
  Latency of model: 148.40 ms.
  ```
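As a quick sanity check, the measured latencies imply an end-to-end speedup close to the report's overall ratio of 3.70; a small gap is expected because the report and the benchmark are separate measurement runs.

```python
baseline_ms = 530.26   # Measured latency of the original model.
optimized_ms = 148.40  # Measured latency of the optimized model.

speedup = baseline_ms / optimized_ms
print("Measured speedup: {:.2f}x".format(speedup))
# Prints: Measured speedup: 3.57x
```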
Extensions
The model parameter of blade.optimize supports multiple input formats. For TensorFlow models, pass the model in one of these ways:
- Pass a `tf.GraphDef` object.
- Load a frozen model in PB or PBTXT format from a file.
- Import a SavedModel from a specified path.
This example passes a tf.GraphDef object in memory to blade.optimize. The following sample code shows the other two methods:
- Pass a frozen PB file:

  ```python
  optimized_model, opt_spec, report = blade.optimize(
      './path/to/frozen_pb.pb',  # The file can also be in .pbtxt format.
      'o1',
      device_type='gpu',
  )
  ```

- Pass a SavedModel path:

  ```python
  optimized_model, opt_spec, report = blade.optimize(
      './path/to/saved_model_directory/',
      'o1',
      device_type='gpu',
  )
  ```
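Because `blade.optimize` accepts either a frozen-graph file or a SavedModel directory as its first argument, callers that handle both can distinguish them by path type. The helper below is a hypothetical sketch for illustration; `classify_model_path` is not part of the PAI-Blade API.

```python
import os

def classify_model_path(path):
    """Classify a model path as a SavedModel directory or a frozen graph file.

    Hypothetical helper for illustration only; blade.optimize accepts
    either form directly as its first argument.
    """
    if os.path.isdir(path):
        return 'saved_model'       # SavedModel directory.
    ext = os.path.splitext(path)[1].lower()
    if ext in ('.pb', '.pbtxt'):
        return 'frozen_graph'      # Frozen graph in PB or PBTXT format.
    raise ValueError("Unsupported model path: {}".format(path))
```

For example, `classify_model_path('./path/to/frozen_pb.pb')` returns `'frozen_graph'`, while a path to an existing directory returns `'saved_model'`.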
Troubleshooting
GPU optimization fails with "pipeline TfGpuO1Pipeline is not registered"
Root cause: GPU optimization requires additional dependencies that are not installed or the GPU runtime environment is not properly configured.
Solution:
- Verify that CUDA and cuDNN are installed and match the TensorFlow version requirements. For example, TensorFlow 2.x typically requires CUDA 11.2 or later and cuDNN 8.1 or later.
- Check that the GPU-enabled TensorFlow build is installed, not the CPU-only version:

  ```shell
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
  ```

  If this command returns an empty list, reinstall TensorFlow with GPU support:

  ```shell
  pip install tensorflow-gpu
  ```

- Ensure that PAI-Blade is installed with GPU support. Reinstall it if necessary:

  ```shell
  pip install --upgrade blade-gpu
  ```

- If GPU optimization is not required, use `device_type='cpu'` instead:

  ```python
  optimized_model, opt_spec, report = blade.optimize(
      graph_def,
      'o1',
      device_type='cpu',  # Use CPU optimization.
      outputs=output_nodes,
      test_data=[test_data]
  )
  ```
GPU optimization requirements
GPU optimization (`device_type='gpu'`) requires:

- An NVIDIA GPU with CUDA compute capability 6.0 or higher (Pascal architecture or later)
- CUDA Toolkit 11.2 or later
- cuDNN 8.1 or later
- A GPU-enabled TensorFlow build matching the CUDA and cuDNN versions
- The PAI-Blade GPU package (`blade-gpu`)
CPU optimization (device_type='cpu') does not require CUDA or GPU hardware and uses oneDNN optimization libraries instead.
Next steps
After optimizing a model, execute it directly in Python or deploy it as an EAS service. PAI-Blade also provides a C++ SDK to integrate optimized models into applications. For more information, see Use an SDK to deploy a TensorFlow model for inference.