Machine Learning Platform for AI (PAI)-Blade provides an SDK for C++ that you can use to deploy optimized models for inference. This topic describes how to use the SDK to deploy an optimized TensorFlow model for inference.

Prerequisites

  • A TensorFlow model is optimized by using PAI-Blade. For more information, see Optimize a TensorFlow model.
  • An SDK is installed, and an authentication token is obtained. For more information, see Install PAI-Blade. In this example, GNU Compiler Collection (GCC) 4.8 is used. Therefore, an SDK that uses the Pre-CXX11 application binary interface (ABI) is required. To meet this requirement, the V3.7.0 RPM package is used. A quick way to check the ABI that your compiler uses is shown after this list.
    Note A model that is optimized by using PAI-Blade can run properly only if the corresponding SDK is installed.
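    The ABI of the SDK must match the ABI that your compiler uses. The following check is a minimal sketch that assumes GCC is already installed. GCC 4.x supports only the Pre-CXX11 ABI, whereas GCC 5 and later default to the CXX11 ABI unless the -D_GLIBCXX_USE_CXX11_ABI=0 compiler flag is specified.
    # Check the GCC version to determine which SDK package to use. 
    gcc --version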

Prepare the environment

This section describes how to prepare the environment that is required to deploy a TensorFlow model for inference by using the PAI-Blade SDK. In this example, CentOS 7 is used as the operating system.

  1. Prepare the server.
    Prepare an Elastic Compute Service (ECS) instance that is configured with the following specifications:
    • Instance type: ecs.gn6i-c4g1.xlarge, NVIDIA Tesla T4 GPU
    • Operating system: CentOS 7.9 64-bit
    • CUDA version: 10.0
    • GPU driver version: 440.64.00
    • cuDNN version: 7.6.5
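    You can verify that the GPU driver, CUDA, and cuDNN versions of the instance match the preceding specifications. The following commands are a sketch that assumes the CUDA toolkit is installed in the default /usr/local/cuda directory and that its bin directory is on your PATH.
    # Check the GPU model and the driver version. 
    nvidia-smi
    
    # Check the CUDA version. 
    nvcc --version
    
    # Check the cuDNN version. The header path assumes a default cuDNN installation. 
    grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h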
  2. Install GCC.
    In this example, GCC 4.8, which is the default GCC version for CentOS 7, is used. To install GCC 4.8, run the following command:
    yum install -y gcc-c++
  3. Install Python 3.
    # Install Python 3. On CentOS 7.7 and later, the python3 package is available from the default repositories. 
    yum install -y python3
    
    # Update pip. 
    python3 -m pip install --upgrade pip
    
    # Install virtualenv and create a virtual environment in which you can install TensorFlow. 
    pip3 install virtualenv==16.0
    python3 -m virtualenv venv
    
    # This step is important. Activate virtualenv. 
    source venv/bin/activate
  4. Install TensorFlow and download relevant libraries.
    If you use a TensorFlow model for inference, two dynamic-link libraries, libtensorflow_framework.so and libtensorflow_cc.so, are required. In actual production scenarios, you must install a TensorFlow wheel that provides the libtensorflow_framework.so library, and the wheel and the libtensorflow_cc.so library must be built with the same configurations, environment, and compiler version. For demonstration purposes, TensorFlow Community Edition and pre-compiled libraries are used in this example. Do not use the pre-compiled libraries in the production environment. You can check where the installed wheel places the libtensorflow_framework.so library, as shown after the following commands.
    # Install TensorFlow. 
    pip3 install tensorflow-gpu==1.15.0
    
    # Download libtensorflow_cc.so. 
    wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/demo/sdk/tensorflow/libtensorflow_cc.so
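    You can use the following commands to confirm where the installed TensorFlow wheel places the libtensorflow_framework.so library. The commands use the public tf.sysconfig API. The exact file name, such as libtensorflow_framework.so.1, depends on the TensorFlow version.
    # Print the library directory of the installed TensorFlow wheel. 
    python3 -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())'
    
    # Confirm that the directory contains the libtensorflow_framework.so library. 
    ls "$(python3 -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')" | grep libtensorflow_framework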

Deploy a model for inference

To load and run an optimized model by using the PAI-Blade SDK, you only need to link the libraries in the SDK when you compile the inference code. You do not need to modify the original inference logic.

  1. Prepare the model.
    In this example, an optimized sample model is used for demonstration purposes. Run the following command to download the sample model. You can also use your own optimized model. For more information about how to optimize a model by using PAI-Blade, see Optimize a TensorFlow model.
    wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/demo/asr_frozen.pb
  2. Download and view the inference code.
    You can run a TensorFlow model optimized by using PAI-Blade in the same way as a regular TensorFlow model. You do not need to write extra code or set extra parameters. You can run the following command to download the inference code used in this example:
    wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/demo/sdk/tensorflow/tf_sdk_demo.cc
    View the downloaded tf_sdk_demo.cc file. The file contains only the common inference logic for TensorFlow models and does not contain any PAI-Blade-specific code. You can confirm this as shown below.
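    Because the file contains only regular TensorFlow inference code, a search for PAI-Blade references is expected to return no matches.
    # Confirm that the inference code does not reference PAI-Blade. 
    grep -i blade tf_sdk_demo.cc || echo "No PAI-Blade-specific code found."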
  3. Compile the inference code.
    To run a model that is optimized by using PAI-Blade, you only need to link the inference code against the libtf_blade.so library, which resides in the /usr/local/lib directory. Run the following commands to compile the code:
    # Obtain the compiler flags of TensorFlow. 
    TF_COMPILE_FLAGS=$(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))')
    
    # Obtain the linker flags of TensorFlow. 
    TF_LD_FLAGS=$(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))')
    
    # The libtensorflow_cc.so file is in the current directory. 
    TF_CC_PATH='.'
    
    g++ -std=c++11 tf_sdk_demo.cc \
        ${TF_COMPILE_FLAGS} \
        ${TF_LD_FLAGS} \
        -L ${TF_CC_PATH} \
        -L /usr/local/lib \
        -ltensorflow_cc \
        -ltf_blade \
        -ltao_ops \
        -o demo_cpp_sdk.bin
    You can modify the following parameters based on your business requirements:
    • tf_sdk_demo.cc: the name of the file that specifies the inference code.
    • /usr/local/lib: the installation path of the SDK. In most cases, you do not need to modify this parameter.
    • demo_cpp_sdk.bin: the name of the executable program generated after compilation.
    Note Compared with the compilation of a regular TensorFlow model, the libtf_blade.so and libtao_ops.so libraries are additionally linked in this example. The libtao_ops.so library contains the optimization operators. You can check the libraries that the executable program requires as shown below.
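    After the compilation succeeds, you can check the dynamic libraries that the executable program requires. The following check uses the standard ldd tool. Libraries that are not on the default search path are displayed as "not found" until you set LD_LIBRARY_PATH, as shown in the next step.
    # List the TensorFlow and PAI-Blade libraries that the executable program requires. 
    ldd ./demo_cpp_sdk.bin | grep -E 'tensorflow|tf_blade|tao_ops'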
  4. Run the model for inference on a local device.
    You can use the compiled executable program demo_cpp_sdk.bin to load and run the sample model asr_frozen.pb, which is optimized by using PAI-Blade. To do so, run the following commands:
    # Required. Contact the PAI service team to obtain the value. 
    export BLADE_REGION=<region>
    # Required. Contact the PAI service team to obtain the value. 
    export BLADE_TOKEN=<token>
    TF_LD_FLAGS=$(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))')
    TF_FRAMEWORK_PATH=`echo $TF_LD_FLAGS | awk '{print $1}' | sed "s/-L//g"`
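    # TF_CC_PATH is set in the previous step. If you run these commands in a new shell, set it again, for example, TF_CC_PATH='.'. 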
    LD_LIBRARY_PATH=${TF_FRAMEWORK_PATH}:${TF_CC_PATH}:/usr/local/lib:${LD_LIBRARY_PATH} ./demo_cpp_sdk.bin asr_frozen.pb
    Modify the following parameters based on your business requirements:
    • <region>: the region in which you use PAI-Blade. You can join the DingTalk group of PAI-Blade users to obtain the regions in which PAI-Blade can be used. For more information about the QR code of the DingTalk group, see Obtain an authentication token.
    • <token>: the authentication token that is required to use PAI-Blade. You can join the DingTalk group of PAI-Blade users to obtain the authentication token. For more information about the QR code of the DingTalk group, see Obtain an authentication token.
    • /usr/local/lib: the installation path of the SDK. In most cases, you do not need to modify this parameter.
    • demo_cpp_sdk.bin: the executable program that is compiled in the previous step.
    • asr_frozen.pb: the TensorFlow model that is optimized by using PAI-Blade. In this example, the sample model that is downloaded in Step 1 is used.
    If the system displays information similar to the following output, the model is running as expected.
    ...
    2020-11-20 16:41:41.263192: I demo_cpp_sdk.cpp:96] --- Execution uses: 41.995 ms
    2020-11-20 16:41:41.305550: I demo_cpp_sdk.cpp:96] --- Execution uses: 42.334 ms
    2020-11-20 16:41:41.347772: I demo_cpp_sdk.cpp:96] --- Execution uses: 42.195 ms
    2020-11-20 16:41:41.390894: I demo_cpp_sdk.cpp:96] --- Execution uses: 43.09 ms
    2020-11-20 16:41:41.434968: I demo_cpp_sdk.cpp:96] --- Execution uses: 44.047 ms
    ...