AIACC-Training accelerates the training of models built on mainstream AI computing frameworks, including TensorFlow, PyTorch, MXNet, and Caffe, and improves training performance. This topic describes how to manually install AIACC-Training for the TensorFlow, PyTorch, and MXNet frameworks.

Prerequisites

An Alibaba Cloud GPU-accelerated instance is created and meets the following requirements (commands to verify them follow this list):
  • The instance uses a CentOS 7.x or Ubuntu 16.04 image.
  • CUDA 10.1, 10.0, or 9.0 is installed.
  • Python 3.6 or 2.7 is installed.
  • An AI computing framework is installed.
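You can check these requirements from a shell on the instance. The following is a minimal sketch, assuming nvcc is on the PATH and TensorFlow is the installed framework (substitute torch or mxnet as appropriate):
  # Check the CUDA toolkit version (assumes nvcc is on the PATH)
  nvcc --version
  # Check the Python version
  python3 --version
  # Check the framework version (TensorFlow shown as an example)
  python3 -c "import tensorflow as tf; print(tf.__version__)"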

Background information

AIACC-Training is deeply optimized for executing training tasks on Alibaba Cloud IaaS resources, and uses a single set of core code to accelerate training tasks across mainstream AI computing frameworks.

Procedure

  1. Connect to the instance. For more information, see Connect to a Linux instance by using Workbench.
  2. Install OpenMPI 4, the dependency common to all frameworks.
    • CentOS 7.x
      # install OpenMPI4
      wget https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/openmpi-4.0.3-1.el7.x86_64.rpm
      rpm -ivh openmpi-4.0.3-1.el7.x86_64.rpm
    • Ubuntu 16.04
      # install OpenMPI4
      wget https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/openmpi_4.0.3-1_amd64.deb
      dpkg -i openmpi_4.0.3-1_amd64.deb
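    To confirm that OpenMPI installed correctly on either distribution, check the version reported by mpirun:
      # Verify the OpenMPI installation; the output should report version 4.0.3
      mpirun --version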
  3. Install the AI computing framework support package.
    The support package for each AI computing framework is provided as a WHL file. Choose the download address that matches your combination of CUDA version, framework version, and Python version, as listed below. In the address formats, ${CUDA_VER} is the CUDA version written without the dot and prefixed with cuda (for example, cuda100 for CUDA 10.0), and ${TENSORFLOW_VER}, ${TORCH_VER}, and ${MXNET_VER} are the framework versions. A worked example of composing an address follows the list.
    TensorFlow
      Supported versions:
      • CUDA 9.0 + TensorFlow 1.12
      • CUDA 10.0 + TensorFlow 1.14
      • CUDA 10.0 + TensorFlow 1.15
      • CUDA 10.1 + TensorFlow 2.1
      Download address format (identical for Python 3 and Python 2):
      https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/${CUDA_VER}/perseus_tensorflow-1.3.2%2B${TENSORFLOW_VER}-py2.py3-none-manylinux1_x86_64.whl
      Example (CUDA 10.0 + TensorFlow 1.14):
      https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/cuda100/perseus_tensorflow-1.3.2%2B1.14-py2.py3-none-manylinux1_x86_64.whl
    PyTorch
      Supported versions:
      • CUDA 9.0 + PyTorch 1.2
      • CUDA 10.0 + PyTorch 1.3
      • CUDA 10.0 + PyTorch 1.4
      Download address formats:
      • Python 3.6: https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/${CUDA_VER}/perseus_torch-1.3.2.post2%2B${TORCH_VER}-cp36-cp36m-linux_x86_64.whl
      • Python 2.7: https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/${CUDA_VER}/perseus_torch-1.3.2.post2%2B${TORCH_VER}-cp27-cp27mu-linux_x86_64.whl
      Examples:
      • CUDA 10.0 + PyTorch 1.3 + Python 3.6: https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/cuda100/perseus_torch-1.3.2.post2%2B1.3-cp36-cp36m-linux_x86_64.whl
      • CUDA 10.0 + PyTorch 1.3 + Python 2.7: https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/cuda100/perseus_torch-1.3.2.post2%2B1.3-cp27-cp27mu-linux_x86_64.whl
    MXNet
      Supported versions:
      • CUDA 9.0 + MXNet 1.4
      • CUDA 9.0 + MXNet 1.5
      • CUDA 10.0 + MXNet 1.4
      • CUDA 10.0 + MXNet 1.5
      Download address format (identical for Python 3 and Python 2):
      https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/${CUDA_VER}/perseus_mxnet-1.3.2%2B${MXNET_VER}-py2.py3-none-manylinux1_x86_64.whl
      Example (CUDA 10.0 + MXNet 1.5):
      https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/cuda100/perseus_mxnet-1.3.2%2B1.5-py2.py3-none-manylinux1_x86_64.whl
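    For instance, the address for CUDA 10.0 + PyTorch 1.3 + Python 3.6 can be composed from the Python 3.6 format above by substituting the variables. A minimal sketch, assuming the cuda100 naming pattern shown in the examples:
      # Substitute the CUDA and framework versions into the address format
      CUDA_VER=cuda100
      TORCH_VER=1.3
      wget "https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/${CUDA_VER}/perseus_torch-1.3.2.post2%2B${TORCH_VER}-cp36-cp36m-linux_x86_64.whl"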
    The following commands provide an example of how to download and install the framework support package for CUDA 10.0 + TensorFlow 1.14:
    wget https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/cuda100/perseus_tensorflow-1.3.2%2B1.14-py2.py3-none-manylinux1_x86_64.whl
    # wget saves the file with %2B in the URL decoded to a literal +
    pip3 install perseus_tensorflow-1.3.2+1.14-py2.py3-none-manylinux1_x86_64.whl
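    To verify the installation, you can query pip for the package; the distribution name below is inferred from the wheel file name:
    # Confirm the support package is installed (pip treats perseus_tensorflow and perseus-tensorflow as the same name)
    pip3 show perseus-tensorflow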
    Note To use AIACC-Training to accelerate training, you only need to make minimal modifications to the model code. For more information, see Adapt the model code to AIACC-Training.