Apsara AI Accelerator (AIACC) is an AI acceleration suite developed by Alibaba Cloud. It consists of a training accelerator (AIACC-Training) and an inference accelerator (AIACC-Inference), and accelerates mainstream AI frameworks such as TensorFlow, PyTorch, MXNet, and Caffe. A Python conda environment that contains AIACC-Training can be configured automatically.

Background information

Conda is an open source, cross-platform package and environment management system. In the preconfigured Python conda environment, you can install and switch between deep learning frameworks, and significantly improve training performance by using AIACC-Training.

AIACC-Training has the following acceleration features:
  • Uses gradient fusion communication, with adaptive multi-stream fusion and adaptive gradient fusion, to improve the training performance of bandwidth-intensive network models by up to 300%.
  • Uses a decentralized gradient negotiation mechanism to reduce gradient negotiation traffic on large-scale nodes by up to two orders of magnitude.
  • Uses a hierarchical allreduce algorithm that supports FP16 gradient compression and mixed-precision compression.
  • Allows you to enable NaN checks during training and, on GPU architectures of SM60 or later, locate the gradient from which a NaN originates.
  • Provides API extensions for MXNet to support data parallelism and model parallelism for InsightFace-type models.
  • Provides deep optimizations for RDMA networks.
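
The gradient fusion feature above can be sketched in plain Python: many small per-layer gradients are packed into a single buffer so that one collective call replaces many small ones. The sketch below only simulates the communication pattern; the function names and the flat-list representation are illustrative and not the AIACC API.

```python
# Illustrative sketch of gradient fusion: pack many small gradients into
# one buffer, run a single (simulated) allreduce, then unpack.
# This is NOT the AIACC API; it only shows the communication pattern.

def fuse(gradients):
    """Concatenate per-layer gradient lists into one flat buffer."""
    offsets, buffer = [], []
    for g in gradients:
        offsets.append((len(buffer), len(g)))
        buffer.extend(g)
    return buffer, offsets

def allreduce_mean(buffers):
    """Simulated allreduce: element-wise mean across workers."""
    n = len(buffers)
    return [sum(vals) / n for vals in zip(*buffers)]

def unfuse(buffer, offsets):
    """Split the reduced buffer back into per-layer gradients."""
    return [buffer[start:start + length] for start, length in offsets]

# Two workers, each with two small per-layer gradients.
w0 = [[1.0, 2.0], [3.0]]
w1 = [[3.0, 4.0], [5.0]]
buf0, offs = fuse(w0)
buf1, _ = fuse(w1)
reduced = allreduce_mean([buf0, buf1])   # one collective instead of two
print(unfuse(reduced, offs))             # [[2.0, 3.0], [4.0]]
```

Fusing reduces per-message launch and latency overhead, which is why it helps most on bandwidth-intensive models with many small gradient tensors.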

Automatically install AIACC-Training

AIACC depends on the GPU driver, CUDA, and cuDNN. When you create a GPU-accelerated instance, select Auto-install GPU Driver and then select GPU Cloud Accelerator. After the instance is created, a Python conda environment that contains AIACC-Training V1.3.0 is configured based on the CUDA version that you selected. For more information about how to create a GPU-accelerated instance, see Create a compute optimized instance with GPU capabilities.

The Python conda environment contains dependency packages such as AIACC-Training and OpenMPI, but does not contain deep learning frameworks. For more information about how to install a deep learning framework, see Install the deep learning framework.

The CUDA version determines the versions of supported deep learning frameworks. The following table describes the mappings.
CUDA version | Default conda environment | Supported deep learning frameworks
CUDA 10.1 | tf2.1_cu10.1_py36 | TensorFlow 2.1
CUDA 10.0 | tf1.15_tr1.4.0_mx1.5.0_cu10.0_py36 |
  • TensorFlow 1.15 + PyTorch 1.4.0 + MXNet 1.5.0
  • TensorFlow 1.14 + PyTorch 1.3.0 + MXNet 1.4.0
CUDA 9.0 | tf1.12_tr1.3.0_mx1.5.0_cu9.0_py36 | TensorFlow 1.12 + PyTorch 1.3.0 + MXNet 1.5.0
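
For scripting purposes, the mapping above can be expressed as a small lookup. The helper below is hypothetical (not part of AIACC); the dict simply mirrors the table.

```python
# Hypothetical helper that mirrors the mapping table: given the CUDA
# version selected at instance creation, return the default conda
# environment name. The data comes from the table; the helper is illustrative.

DEFAULT_CONDA_ENV = {
    "10.1": "tf2.1_cu10.1_py36",
    "10.0": "tf1.15_tr1.4.0_mx1.5.0_cu10.0_py36",
    "9.0":  "tf1.12_tr1.3.0_mx1.5.0_cu9.0_py36",
}

def default_env(cuda_version: str) -> str:
    """Look up the default conda environment for a CUDA version."""
    try:
        return DEFAULT_CONDA_ENV[cuda_version]
    except KeyError:
        raise ValueError(f"No default conda environment for CUDA {cuda_version}")

print(default_env("10.1"))  # tf2.1_cu10.1_py36
```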

Install the deep learning framework

  1. Connect to a Linux instance from the console.
  2. View the automatically activated conda environment.
    View the environment name displayed before the username, as shown in the following figure. The conda environment name
    tf2.1_cu10.1_py36 indicates:
    • TensorFlow 2.1
    • CUDA 10.1
    • Python 3.6
  3. Optional: If you do not need to use the automatically activated conda environment, activate a different conda environment.
    1. Run the following command to view all conda environments:
      conda env list
      The following figure shows an example command output.
    2. Run the following command to activate the required conda environment:
      conda activate [version number]
      The following figure shows an example command output.
  4. Run the following command to install the deep learning framework:
    install_frameworks.sh
    The following figure shows an example command output.
  5. Test the demo.
    The TensorFlow demo is tested in this example.
    • For TensorFlow 2.1, perform the following operations:
      1. Decompress the demo test package.
        tar -xvf ali-perseus-demos.tgz
      2. Go to the directory of the TensorFlow demo.
        cd ali-perseus-demos/tensorflow2-examples
      3. Run the test script in the directory.

        Sample command:

        python tensorflow2_keras_mnist_perseus.py
        This demo uses the Modified National Institute of Standards and Technology (MNIST) dataset for training. AIACC-Training preserves the precision of your benchmark code while improving training performance. The following figure shows an example training result.
    • For TensorFlow 1.14, perform the following operations:
      1. Decompress the demo test package.
        tar -xvf ali-perseus-demos.tgz
      2. Go to the directory of the TensorFlow demo.
        cd ali-perseus-demos/tensorflow-benchmarks
      3. View the test command in README.txt.
      4. Go to the directory where the test script of the corresponding version resides.
        Sample command:
        cd benchmarks-tf1.14
      5. Modify and run the test command based on the number of GPUs that the specified instance type is equipped with.
        Sample command:
        mpirun --allow-run-as-root --bind-to none -np 1 -npernode 1  \
               --mca btl_tcp_if_include eth0  \
               --mca orte_keep_fqdn_hostnames t   \
               -x NCCL_SOCKET_IFNAME=eth0   \
               -x LD_LIBRARY_PATH   \
               ./config-fp16-tf.sh
        This demo uses synthetic data for training to test the training speed. The following figure shows an example training result.
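
In the sample command above, -np is the total number of processes and -npernode is the number of processes per node (typically one per GPU), so both must change with the instance's GPU count. The sketch below builds the command for a given node and GPU count; the helper itself is hypothetical, while the flags are taken from the sample command.

```python
# Illustrative helper that assembles the mpirun command from the sample
# above for a given number of nodes and GPUs per node. One process is
# launched per GPU: -np is the total count, -npernode the per-node count.
# The helper is hypothetical; the flags mirror the sample command.

def build_mpirun(nodes: int, gpus_per_node: int,
                 script: str = "./config-fp16-tf.sh") -> str:
    total = nodes * gpus_per_node
    return (
        f"mpirun --allow-run-as-root --bind-to none "
        f"-np {total} -npernode {gpus_per_node} "
        f"--mca btl_tcp_if_include eth0 "
        f"--mca orte_keep_fqdn_hostnames t "
        f"-x NCCL_SOCKET_IFNAME=eth0 "
        f"-x LD_LIBRARY_PATH "
        f"{script}"
    )

# Single node with 8 GPUs -> -np 8 -npernode 8
print(build_mpirun(1, 8))
```

For multi-node runs you would additionally supply a hostfile or host list to mpirun; the sample command covers the single-node case.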