Optimize TensorFlow models using PAI-Blade Python APIs. All results measured on NVIDIA T4 GPU.
Prerequisites
- TensorFlow and PAI-Blade wheel packages are installed. For more information, see Install Blade.
- A trained TensorFlow model is available. This topic uses a public ResNet50 model.
Procedure
This topic demonstrates how to optimize a TensorFlow model using a public ResNet50 model. Adapt these steps for your own TensorFlow models.
- Import PAI-Blade and dependencies.

  ```python
  import os
  import numpy as np
  import tensorflow.compat.v1 as tf
  import blade
  ```

- Write a function to download the model and test data.

  PAI-Blade supports optimization without test data (zero-input optimization), but results are more accurate with real input data. Provide test data when possible. The following sample code downloads the model and test data.

  ```python
  def _wget_demo_tgz():
      # Download a public ResNet50 model.
      url = 'http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/demo/mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz'
      local_tgz = os.path.basename(url)
      local_dir = local_tgz.split('.')[0]
      if not os.path.exists(local_dir):
          blade.util.wget_url(url, local_tgz)
          blade.util.unpack(local_tgz)
      model_path = os.path.abspath(os.path.join(local_dir, "frozen_inference_graph.pb"))
      graph_def = tf.GraphDef()
      with open(model_path, 'rb') as f:
          graph_def.ParseFromString(f.read())
      # Use random numbers as test data.
      test_data = np.random.rand(1, 800, 1000, 3)
      return graph_def, {'image_tensor:0': test_data}

  graph_def, test_data = _wget_demo_tgz()
  ```
- Call `blade.optimize` to optimize the model. For parameter details, see Python API reference.

  ```python
  input_nodes = ['image_tensor']
  output_nodes = [
      'detection_boxes',
      'detection_scores',
      'detection_classes',
      'num_detections',
      'detection_masks'
  ]
  optimized_model, opt_spec, report = blade.optimize(
      graph_def,              # Model to optimize. In this example, a tf.GraphDef object. Can also be a SavedModel path.
      'o1',                   # Optimization level. Valid values: o1 and o2.
      device_type='gpu',      # Target device. Valid values: gpu, cpu, and edge.
      inputs=input_nodes,     # Input nodes. Optional; PAI-Blade infers them automatically if not specified.
      outputs=output_nodes,   # Output nodes.
      test_data=[test_data]   # Test data.
  )
  ```

  `blade.optimize` returns three objects:

  - `optimized_model`: The optimized model, a `tf.GraphDef` object in this example.
  - `opt_spec`: The configuration information, environment variables, and resource files required to reproduce the optimization results. Apply it with a `with` statement.
  - `report`: The optimization report, which can be printed directly. For parameter details, see Optimization report.

  The following progress information is displayed during optimization:

  ```
  [Progress] 5%, phase: user_test_data_validation.
  [Progress] 10%, phase: test_data_deduction.
  [Progress] 15%, phase: CombinedSwitch_1.
  [Progress] 24%, phase: TfStripUnusedNodes_22.
  [Progress] 33%, phase: TfStripDebugOps_23.
  [Progress] 42%, phase: TfFoldConstants_24.
  [Progress] 51%, phase: CombinedSequence_7.
  [Progress] 59%, phase: TfCudnnrnnBilstm_25.
  [Progress] 68%, phase: TfFoldBatchNorms_26.
  [Progress] 77%, phase: TfNonMaxSuppressionOpt_27.
  [Progress] 86%, phase: CombinedSwitch_20.
  [Progress] 95%, phase: model_collecting.
  [Progress] 100%, Finished!
  ```
- Print the optimization report.

  ```python
  print("Report: {}".format(report))
  ```

  The report shows which optimization items contribute most to the performance improvement.

  ```
  Report: {
    // ......
    "optimizations": [
      // ......
      {
        "name": "TfNonMaxSuppressionOpt",
        "status": "effective",
        "speedup": "1.58",        // Acceleration ratio.
        "pre_run": "522.74 ms",   // Latency before optimization.
        "post_run": "331.45 ms"   // Latency after optimization.
      },
      {
        "name": "TfAutoMixedPrecisionGpu",
        "status": "effective",
        "speedup": "2.43",
        "pre_run": "333.30 ms",
        "post_run": "136.97 ms"
      }
      // ......
    ],
    // End-to-end optimization results.
    "overall": {
      "baseline": "505.91 ms",    // Latency of the original model.
      "optimized": "136.83 ms",   // Latency of the optimized model.
      "speedup": "3.70"           // Acceleration ratio.
    },
    // ......
  }
  ```
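The per-item entries in the report can also be inspected programmatically, for example to rank the effective optimization items by speedup. The snippet below is an illustrative sketch that operates on a hand-written dict mirroring the report structure above; it does not call any PAI-Blade API.

```python
# Illustrative only: a dict mirroring the structure of the printed report.
report_dict = {
    "optimizations": [
        {"name": "TfNonMaxSuppressionOpt", "status": "effective", "speedup": "1.58"},
        {"name": "TfAutoMixedPrecisionGpu", "status": "effective", "speedup": "2.43"},
    ],
    "overall": {"baseline": "505.91 ms", "optimized": "136.83 ms", "speedup": "3.70"},
}

def rank_optimizations(rep):
    """Return effective optimization items, most impactful first."""
    items = [o for o in rep["optimizations"] if o["status"] == "effective"]
    return sorted(items, key=lambda o: float(o["speedup"]), reverse=True)

for item in rank_optimizations(report_dict):
    print("{}: {}x".format(item["name"], item["speedup"]))
```

In this sample report, `TfAutoMixedPrecisionGpu` ranks first with a 2.43x speedup.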
- Compare the performance before and after optimization.

  ```python
  import time

  def benchmark(model):
      tf.reset_default_graph()
      with tf.Session() as sess:
          sess.graph.as_default()
          tf.import_graph_def(model, name="")
          # Fetch the model outputs, not the input tensor, so that inference actually runs.
          fetches = ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0',
                     'num_detections:0', 'detection_masks:0']
          # Warm up.
          for i in range(0, 1000):
              sess.run(fetches, test_data)
          # Benchmark.
          num_runs = 1000
          start = time.time()
          for i in range(0, num_runs):
              sess.run(fetches, test_data)
          elapsed = time.time() - start
          rt_ms = elapsed / num_runs * 1000.0
          # Show the result.
          print("Latency of model: {:.2f} ms.".format(rt_ms))

  # Test the speed of the original model.
  benchmark(graph_def)

  # Test the speed of the optimized model.
  with opt_spec:
      benchmark(optimized_model)
  ```

  The performance test results match the values in the optimization report:

  ```
  Latency of model: 530.26 ms.
  Latency of model: 148.40 ms.
  ```
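As a quick sanity check, the measured latencies imply an end-to-end speedup close to the report's overall ratio of 3.70; a small gap is expected because the report and the benchmark are separate measurement runs.

```python
baseline_ms = 530.26   # Measured latency of the original model.
optimized_ms = 148.40  # Measured latency of the optimized model.

speedup = baseline_ms / optimized_ms
print("Measured speedup: {:.2f}x".format(speedup))
# Prints: Measured speedup: 3.57x
```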
Extensions
The model parameter of blade.optimize supports multiple input formats. For TensorFlow models, pass the model in one of these ways:
- Pass a `tf.GraphDef` object.
- Load a frozen model in PB or PBTXT format from a file.
- Import a SavedModel from a specified path.
This example passes a tf.GraphDef object in memory to blade.optimize. The following sample code shows the other two methods:
- Pass a frozen PB file:

  ```python
  optimized_model, opt_spec, report = blade.optimize(
      './path/to/frozen_pb.pb',  # The file can also be in .pbtxt format.
      'o1',
      device_type='gpu',
  )
  ```

- Pass a SavedModel path:

  ```python
  optimized_model, opt_spec, report = blade.optimize(
      './path/to/saved_model_directory/',
      'o1',
      device_type='gpu',
  )
  ```
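Because `blade.optimize` accepts either a frozen-graph file or a SavedModel directory as its first argument, callers that handle both can distinguish them by path type. The helper below is a hypothetical sketch for illustration; `classify_model_path` is not part of the PAI-Blade API.

```python
import os

def classify_model_path(path):
    """Classify a model path as a SavedModel directory or a frozen graph file.

    Hypothetical helper for illustration only; blade.optimize accepts
    either form directly as its first argument.
    """
    if os.path.isdir(path):
        return 'saved_model'       # SavedModel directory.
    ext = os.path.splitext(path)[1].lower()
    if ext in ('.pb', '.pbtxt'):
        return 'frozen_graph'      # Frozen graph in PB or PBTXT format.
    raise ValueError("Unsupported model path: {}".format(path))
```

For example, `classify_model_path('./path/to/frozen_pb.pb')` returns `'frozen_graph'`, while a path to an existing directory returns `'saved_model'`.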
Troubleshooting
GPU optimization fails with "pipeline TfGpuO1Pipeline is not registered"
Root cause: GPU optimization requires additional dependencies that are not installed or the GPU runtime environment is not properly configured.
Solution:
- Verify that CUDA and cuDNN are installed and match the TensorFlow version requirements. For example, TensorFlow 2.x typically requires CUDA 11.2 or later and cuDNN 8.1 or later.
- Check that the GPU-enabled TensorFlow build is installed, not the CPU-only version:

  ```shell
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
  ```

  If this command returns an empty list, reinstall TensorFlow with GPU support:

  ```shell
  pip install tensorflow-gpu
  ```

- Ensure that PAI-Blade is installed with GPU support. Reinstall it if necessary:

  ```shell
  pip install --upgrade blade-gpu
  ```

- If GPU optimization is not required, use `device_type='cpu'` instead:

  ```python
  optimized_model, opt_spec, report = blade.optimize(
      graph_def,
      'o1',
      device_type='cpu',  # Use CPU optimization.
      outputs=output_nodes,
      test_data=[test_data]
  )
  ```
GPU optimization requirements
GPU optimization (`device_type='gpu'`) requires:

- An NVIDIA GPU with CUDA compute capability 6.0 or higher (Pascal architecture or later)
- CUDA Toolkit 11.2 or later
- cuDNN 8.1 or later
- A GPU-enabled TensorFlow build matching the CUDA and cuDNN versions
- The PAI-Blade GPU package (`blade-gpu`)
CPU optimization (device_type='cpu') does not require CUDA or GPU hardware and uses oneDNN optimization libraries instead.
Next steps
After optimizing a model, execute it directly in Python or deploy it as an EAS service. PAI-Blade also provides a C++ SDK to integrate optimized models into applications. For more information, see Use an SDK to deploy a TensorFlow model for inference.