AIACC-Inference can optimize models that are exported in the Open Neural Network Exchange (ONNX) format to improve inference performance. This topic describes how to manually install AIACC-Inference in ONNX version and test the demo.

Prerequisites

  • An Alibaba Cloud GPU-accelerated instance is created.
    • Instance specification: NVIDIA P100, V100, or T4 GPU
      Note For more information, see Instance family.
    • The image used by the instance: Ubuntu 16.04 LTS or CentOS 7.x
  • The following software is installed on the GPU-accelerated instance:
    • Python 3.6
    • CUDA 10.0, 10.2, or 11.0
    • cuDNN 7.6 or later
    • TensorRT that matches the installed CUDA version (CUDA 10.0, 10.2, or 11.0)

Background information

ONNX is an open source format in which trained models are stored. You can convert models of different frameworks to the ONNX format. This facilitates testing models of different frameworks in the same environment.

AIACC-Inference in ONNX version prunes the computational graphs of models, fuses layers, and uses high-performance operators to improve inference performance. The optimization interface for ONNX models provided by AIACC-Inference in ONNX version allows you to optimize and perform inference on deep learning models that are developed based on PyTorch, MXNet, and other frameworks that can export models in the ONNX format.

AIACC-Inference in ONNX version provides the following model optimization options: FP32 and FP16. FP16 models can use the Tensor Cores of the NVIDIA Volta and Turing architectures to further improve inference performance on V100 and T4 GPUs.
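FP16 halves the storage per value and unlocks Tensor Cores, at the cost of reduced numeric precision and range. The following NumPy check is illustrative only (the actual conversion is performed inside AIACC-Inference) and shows the rounding involved:

```python
import numpy as np

# FP32 values, as they would be stored in the original model.
w32 = np.array([0.1, 1e-4, 65504.0, 1e5], dtype=np.float32)

# Convert to FP16, as the FP16 optimization option would.
w16 = w32.astype(np.float16)

for a, b in zip(w32, w16):
    print(f"fp32={a!r} -> fp16={b!r}")

# FP16 keeps roughly 3 decimal digits of precision and has a
# maximum representable value of 65504; 1e5 overflows to infinity.
print("max fp16:", np.finfo(np.float16).max)
print("1e5 in fp16:", np.float16(1e5))
```

For most vision models the accuracy loss from FP16 is negligible, which is why it is a common default on V100 and T4 GPUs.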


In this example, inference tasks are performed to classify randomly generated images based on a ResNet50 model. The optimization reduces the time required to perform inference on a task from 6.4 ms to less than 1.5 ms.

Environment configurations:
  • Instance specification: ecs.gn6i-c4g1.xlarge, which is configured with a T4 GPU
  • The image used by the instance: 64-bit Ubuntu 16.04
  • Python 3.6.12
  • CUDA 10.0.130
  • cuDNN 7.6.5
  • TensorRT
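The benchmark in Step 2 follows a standard timing pattern: a few warm-up runs, then a timed loop. Because the loop runs 1,000 times, the total elapsed time in seconds is numerically equal to the average time per inference in milliseconds. A sketch of that pattern with a placeholder function (a real session's `run()` call would go where `infer()` is):

```python
import time

def infer():
    # Placeholder for a model inference call, e.g. a
    return sum(i * i for i in range(1000))

# Warm-up runs let caches and lazily initialized GPU kernels settle,
# so the timed loop measures steady-state performance.
for _ in range(10):
    infer()

# Timed loop: total seconds over 1,000 runs == average ms per run.
runs = 1000
start = time.time()
for _ in range(runs):
    infer()
elapsed = time.time() - start

avg_ms = elapsed / runs * 1000
print(f"average time per inference: {avg_ms:.3f} ms")
```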

Step 1: Install the AIACC-Inference package in ONNX version

  1. Connect to the instance. For more information, see Connect to a Linux instance by using Workbench.
  2. Download the package.
    In this example, the package name is
  3. Install the unzip utility and decompress the package.
    apt-get install unzip
  4. Install the package.
    pip3 install aiacc_inference_onnx_latest/cuda10.0-TensorRT-*-cp36-cp36m-linux_x86_64.whl
    If the Command "python egg_info" failed with error code 1 in /tmp/pip-build-uihnmvc2/onnx/ error is reported when you install the package, update Setuptools and pip.
    pip3 install --upgrade setuptools
    python3 -m pip install --upgrade pip

Step 2: Use Python to perform inference

  1. Download the ResNet50 model.
    1. Download the file package of the ResNet50 model.
    2. Decompress the file package and go to the directory where the model file is located.
      tar -xvf resnet50-v1-7.tar.gz && cd resnet50v1/

      View the files in the directory. resnet50-v1-7.onnx is the model file used in this example.

  2. Create a file and add the sample code.
    import numpy as np
    import aiaccix as aix
    import time

    # Initialize the inference session.
    sess = aix.InferenceSession("./resnet50-v1-7.onnx")
    input_name = sess.get_inputs()[0].name
    input_shape = sess.get_inputs()[0].shape
    print("input_name is %s, input_shape is %s" % (input_name, str(input_shape)))

    # Generate a test image of size [1,3,224,224] and warm up the session.
    input_image = np.random.random((1, 3, 224, 224)).astype(np.float32)
    for _ in range(10):
      pred_onnx =, {input_name: input_image})

    # Test the inference time of input_image. Over 1,000 runs, the total
    # elapsed seconds equal the average milliseconds per inference.
    start = time.time()
    for _ in range(1000):
      pred_onnx =, {input_name: input_image})
    end = time.time()
    print('shape is ', input_image.shape, ', average time per inference: ', end - start, ' ms')
  3. Run the code to complete inference.
  4. View the inference result.
    The output shows that the inference takes about 1.48 ms per task.
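The sample code classifies a random tensor. To run inference on a real image, the input must be preprocessed into the same [1, 3, 224, 224] NCHW layout that ResNet50 expects, typically with ImageNet normalization. The following sketch uses only NumPy and simulates the decoded image; in practice you would load and resize a real image with a library such as Pillow or OpenCV:

```python
import numpy as np

# Simulate a decoded 224x224 RGB image (HWC layout, uint8 pixels).
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

# Scale pixels to [0, 1] and apply ImageNet mean/std normalization.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
x = (image.astype(np.float32) / 255.0 - mean) / std

# HWC -> CHW, then add the batch dimension: [1, 3, 224, 224].
x = np.transpose(x, (2, 0, 1))[np.newaxis, :]
print("input shape:", x.shape, "dtype:", x.dtype)

# The resulting array can be fed to the session in place of input_image:
# pred =, {input_name: x})
```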