Machine Learning Platform for AI (PAI)-Blade allows you to optimize models in various ways. You only need to install the PAI-Blade wheel packages in your local environment. Then, you can optimize models by calling a Python method. This topic describes how to optimize a PyTorch model by using PAI-Blade. In this example, NVIDIA Tesla T4 GPUs are used.

Prerequisites

  • PyTorch and the wheel packages of PAI-Blade are installed.
  • A PyTorch model is trained. In this example, an open source ResNet50 model from the torchvision library is used.

Optimize a PyTorch model

  1. Import PAI-Blade and other dependency libraries.
    import os
    import time
    import torch
    import torchvision.models as models
    import blade
  2. Load a ResNet50 model from the torchvision library. PAI-Blade supports only ScriptModules. Therefore, the ResNet50 model must be converted into a ScriptModule.
    model = models.resnet50().float().cuda()  # Prepare the model. 
    model = torch.jit.script(model).eval()    # Convert the model into a ScriptModule. 
    dummy = torch.rand(1, 3, 224, 224).cuda() # Construct test data. 
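    Optionally, you can run the ScriptModule on the test data to confirm that it works before optimization. This sanity check is a sketch and is not required by PAI-Blade:
    with torch.no_grad():
        output = model(dummy)  # Expected output shape for ResNet50: [1, 1000].
    print(output.shape)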
  3. Call the blade.optimize method to optimize the ResNet50 model. The following sample code provides an example of how to optimize the model. If you have questions during model optimization, you can join the DingTalk group of PAI-Blade users and consult the technical support staff.
    optimized_model, opt_spec, report = blade.optimize(
        model,                 # The model to be optimized. 
        'o1',                  # The optimization level. Valid values: o1 and o2. 
        device_type='gpu',     # The type of the device on which the model is run. Valid values: gpu and cpu. 
        test_data=[(dummy,)],  # The test data. The test data used for a PyTorch model is a list of tuples of tensors. 
    )
    The blade.optimize method returns the following objects:
    • optimized_model: the optimized model. In this example, a torch.jit.ScriptModule object is returned.
    • opt_spec: the external dependencies that are required to reproduce the optimization results. The external dependencies include the configuration information, environment variables, and resource files. You can use the with statement in Python to make the external dependencies take effect, as shown in the sketch at the end of this step.
    • report: the optimization report, which can be directly displayed. For more information about the parameters in the optimization report, see Optimization report.
    During model optimization, the optimization progress is displayed, as shown in the following example:
    [Progress] 5%, phase: user_test_data_validation.
    [Progress] 10%, phase: test_data_deduction.
    [Progress] 15%, phase: CombinedSwitch_4.
    [Progress] 95%, phase: model_collecting.
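    If you run the optimized model and want the external dependencies recorded in opt_spec to take effect, you can wrap the call in the with statement mentioned above. The following minimal sketch reuses the test data from the preceding steps:
    with opt_spec:  # Make the external dependencies in opt_spec take effect.
        output = optimized_model(dummy)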
  4. Display the optimization report.
    print("Report: {}".format(report))
    In the optimization report, you can view the optimization items that took effect. The following example shows a report:
    Report: {
      // ......
      "optimizations": [
        {
          "name": "PtTrtPassFp32",
          "status": "effective",
          "speedup": "1.50",     // The acceleration ratio. 
          "pre_run": "5.29 ms",  // The latency before acceleration. 
          "post_run": "3.54 ms"  // The latency after acceleration. 
        }
      ],
      // The end-to-end optimization results. 
      "overall": {
        "baseline": "5.30 ms",    // The latency of the original model. 
        "optimized": "3.59 ms",   // The latency of the optimized model. 
        "speedup": "1.48"         // The acceleration ratio. 
      },
      // ......
    }
  5. Compare the performance before and after model optimization.
    @torch.no_grad()
    def benchmark(model, inp):
        # Warm up.
        for i in range(100):
            model(inp)
        # Wait for pending GPU work so that the timing below is accurate.
        torch.cuda.synchronize()
        start = time.time()
        for i in range(200):
            model(inp)
        torch.cuda.synchronize()
        elapsed_ms = (time.time() - start) * 1000
        print("Latency: {:.2f} ms".format(elapsed_ms / 200))
    
    # Measure the speed of the original model. 
    benchmark(model, dummy)
    
    # Measure the speed of the optimized model. 
    benchmark(optimized_model, dummy)
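    Optionally, you can also confirm that the optimized model stays numerically close to the original model. This comparison is a sketch and is not part of the original example; small differences can occur depending on the optimizations applied:
    # Compare the outputs of the original and optimized models on the test data.
    with torch.no_grad():
        ref = model(dummy)
        out = optimized_model(dummy)
    print("Max absolute difference: {:.6f}".format((ref - out).abs().max().item()))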

Extended information

When you call the blade.optimize method, you can specify the model to be optimized in multiple ways. To optimize a PyTorch model, you can specify the model in one of the following ways:
  • Specify a torch.jit.ScriptModule object.
  • Load a torch.jit.ScriptModule object from a model file saved by using the torch.jit.save method.
In this example, a torch.jit.ScriptModule object in memory is passed to the blade.optimize method. The following sample code provides an example of how to load a model from a model file instead:
optimized_model, opt_spec, report = blade.optimize(
    'path/to/torch_model.pt',
    'o1',
    device_type='gpu'
)
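For reference, such a model file is an ordinary TorchScript file. The following sketch shows how one can be produced by using the torch.jit.save method; the path is the placeholder from the preceding example:
# Save the ScriptModule so that blade.optimize can later load it by file path.
torch.jit.save(model, 'path/to/torch_model.pt')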

What to do next

After the model is optimized by using PAI-Blade, you can run the optimized model in Python or deploy the optimized model as a service in Elastic Algorithm Service (EAS) of PAI. PAI-Blade also provides an SDK for C++ to help you integrate the optimized model into your own application.
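For example, because the optimized model is a torch.jit.ScriptModule, it can be persisted and reloaded with the standard TorchScript APIs. The following minimal sketch uses a placeholder file name; depending on the optimizations applied, the PAI-Blade runtime may need to be available when the model is loaded:
import blade  # Assumption: may be required to register optimized operators at load time.

torch.jit.save(optimized_model, 'optimized_resnet50.pt')  # The file name is a placeholder.
reloaded = torch.jit.load('optimized_resnet50.pt').cuda().eval()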