Apsara AI Accelerator (AIACC) is an artificial intelligence (AI) accelerator developed
by Alibaba Cloud. It consists of a training accelerator (AIACC-Training) and an inference
accelerator (AIACC-Inference). AIACC-Training can accelerate major AI computing frameworks
such as TensorFlow, PyTorch, MXNet, and Caffe to achieve significant gains in training
performance.
Background information
Conda is an open source package and environment management system that runs on multiple
platforms. Miniconda is a minimal installer for conda. When you create a GPU-accelerated
instance, you can have a conda environment that contains AIACC-Training installed
automatically. You can use Miniconda to select a conda environment. Then, in that
environment, you can install and switch between deep learning frameworks and use
AIACC-Training to improve training performance.
AIACC-Training has the following acceleration features:
- Gradient fusion communication: supports adaptive multi-stream fusion and adaptive
gradient fusion to improve the training performance of bandwidth-intensive network
models by up to 300%.
- Decentralized gradient-based negotiation: reduces the traffic of gradient-based negotiation
on large-scale nodes by up to two orders of magnitude.
- Hierarchical Allreduce algorithm: supports FP16 gradient compression and mixed precision
compression.
- NaN check: You can enable NaN checks during training to determine which gradient a
NaN value originates from (supported on SM60 and later GPU architectures).
- API extensions for MXNet: support data parallelism and model parallelism of the InsightFace
type.
- Deep optimization for remote direct memory access (RDMA) networks.
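As a conceptual illustration of the first feature, gradient fusion packs many small gradient tensors into one flat buffer so that a single allreduce replaces many small ones. The following is a plain-Python sketch of the idea, not AIACC's actual implementation:

```python
# Conceptual sketch of gradient fusion (NOT AIACC's implementation):
# pack many small gradients into one flat buffer, perform a single
# "allreduce" (here: an element-wise sum across workers), then unpack.

def fuse(grads):
    """Flatten a list of gradient tensors into one buffer plus sizes."""
    buffer, sizes = [], []
    for g in grads:
        buffer.extend(g)
        sizes.append(len(g))
    return buffer, sizes

def unfuse(buffer, sizes):
    """Split the flat buffer back into per-tensor gradients."""
    out, i = [], 0
    for n in sizes:
        out.append(buffer[i:i + n])
        i += n
    return out

def allreduce_sum(buffers):
    """One fused allreduce: element-wise sum over all workers' buffers."""
    return [sum(vals) for vals in zip(*buffers)]

# Two workers, each holding two small gradient tensors.
worker_a = [[1.0, 2.0], [3.0]]
worker_b = [[0.5, 0.5], [1.0]]

buf_a, sizes = fuse(worker_a)
buf_b, _ = fuse(worker_b)
reduced = allreduce_sum([buf_a, buf_b])   # a single communication step
print(unfuse(reduced, sizes))             # [[1.5, 2.5], [4.0]]
```

Fusing the buffers amortizes the fixed per-message cost of communication, which is why the gain is largest for bandwidth-intensive models with many small gradients.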
Automatically install AIACC-Training
AIACC-Training depends on the GPU driver, CUDA, and cuDNN. When you create a GPU-accelerated
instance, select Auto-install GPU Driver and then select Auto-install AIACC-Training.
For more information about how to create a GPU-accelerated instance, see
Create an NVIDIA GPU-accelerated instance.

Conda environments contain dependency packages such as AIACC-Training and OpenMPI,
but do not contain deep learning frameworks. For more information about how to install
a deep learning framework, see Select a conda environment and install a deep learning framework.
CUDA versions determine the versions of deep learning frameworks that can be installed.
The following table describes the mappings between CUDA versions and versions of deep
learning frameworks.
| CUDA version | Default conda environment | Deep learning framework versions that can be installed |
| --- | --- | --- |
| CUDA 10.1 | tf2.1_cu10.1_py36 | TensorFlow 2.1 |
| CUDA 10.0 | tf1.15_tr1.4.0_mx1.5.0_cu10.0_py36 | TensorFlow 1.15 + PyTorch 1.4.0 + MXNet 1.5.0, or TensorFlow 1.14 + PyTorch 1.3.0 + MXNet 1.4.0 |
| CUDA 9.0 | tf1.12_tr1.3.0_mx1.5.0_cu9.0_py36 | TensorFlow 1.12 + PyTorch 1.3.0 + MXNet 1.5.0 |
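The environment names in the table encode the framework, CUDA, and Python versions. As a hypothetical helper (the prefix meanings are inferred from the table, not a documented naming scheme), a name can be decoded like this:

```python
# Decode a conda environment name such as
# "tf1.15_tr1.4.0_mx1.5.0_cu10.0_py36" into its components.
# Prefix meanings are inferred from the table above:
# tf = TensorFlow, tr = PyTorch (torch), mx = MXNet, cu = CUDA, py = Python.
PREFIXES = {
    "tf": "TensorFlow",
    "tr": "PyTorch",
    "mx": "MXNet",
    "cu": "CUDA",
    "py": "Python",
}

def decode_env_name(name):
    """Map each underscore-separated token to a component and version."""
    parts = {}
    for token in name.split("_"):
        for prefix, label in PREFIXES.items():
            if token.startswith(prefix):
                parts[label] = token[len(prefix):]
                break
    return parts

print(decode_env_name("tf1.15_tr1.4.0_mx1.5.0_cu10.0_py36"))
# {'TensorFlow': '1.15', 'PyTorch': '1.4.0', 'MXNet': '1.5.0',
#  'CUDA': '10.0', 'Python': '36'}
```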
Select a conda environment and install a deep learning framework
- Connect to the instance. For more information, see Overview.
- Select a conda environment.
- Initialize Miniconda.
. /root/miniconda/etc/profile.d/conda.sh
- View all conda environments.
conda env list
The following figure shows an example command output.

- Select a conda environment.
conda activate <environment_name>
The following figure shows an example command output.

aiacct_tf2.1_cu10.1_py36 indicates:
- TensorFlow 2.1
- CUDA 10.1
- Python 3.6
- Install a deep learning framework.
install_frameworks.sh
The install_frameworks.sh script includes the commands used to install deep learning
frameworks that are compatible with the selected conda environment. The following figure
shows a sample script.

The following figure shows an example script output.

- Test the demo.
By default, the ali-perseus-demos.tgz demo file is located in the /root directory. In this example, the TensorFlow demo is tested.
- For TensorFlow 2.1, perform the following operations:
- Decompress the demo test package.
tar -xvf ali-perseus-demos.tgz
- Go to the directory of the TensorFlow demo.
cd ali-perseus-demos/tensorflow2-examples
- Run the test script in the directory.
Sample command:
python tensorflow2_keras_mnist_perseus.py
This demo trains on the Modified National Institute of Standards and Technology (MNIST)
dataset to demonstrate the gains in training performance while reaching the same precision
as your benchmark code. The following figure shows an example training result.

- For TensorFlow 1.14, perform the following operations:
- Decompress the demo test package.
tar -xvf ali-perseus-demos.tgz
- Go to the directory of the TensorFlow demo.
cd ali-perseus-demos/tensorflow-benchmarks
- View the test command in README.txt.
- Go to the directory where the test script of the corresponding version resides.
Sample command:
cd benchmarks-tf1.14
- Modify the test command based on the number of GPUs of the specified instance type,
and then run it.
Sample command:
mpirun --allow-run-as-root --bind-to none -np 1 -npernode 1 \
--mca btl_tcp_if_include eth0 \
--mca orte_keep_fqdn_hostnames t \
-x NCCL_SOCKET_IFNAME=eth0 \
-x LD_LIBRARY_PATH \
./config-fp16-tf.sh
This demo uses synthetic data for training to test the training speed. The following
figure shows an example training result.
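The -np and -npernode values in the command above scale with the GPU count. Assuming one training process per GPU (a common MPI convention for this kind of benchmark, not something the demo states explicitly), the prefix can be derived as follows:

```python
# Build the mpirun prefix for `nodes` machines with `gpus_per_node` GPUs
# each, assuming one training process per GPU.
def mpirun_prefix(nodes, gpus_per_node):
    np_total = nodes * gpus_per_node  # total number of processes
    return (f"mpirun --allow-run-as-root --bind-to none "
            f"-np {np_total} -npernode {gpus_per_node}")

print(mpirun_prefix(1, 1))
# mpirun --allow-run-as-root --bind-to none -np 1 -npernode 1
print(mpirun_prefix(2, 8))
# mpirun --allow-run-as-root --bind-to none -np 16 -npernode 8
```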

Delete Miniconda
You can delete Miniconda if you no longer need AIACC-Training. By default, the root
user can install and delete Miniconda.
- Delete the miniconda folder.
rm -rf /root/miniconda
- Delete the relevant environment variables and output.
- Modify the /root/.bashrc file and comment out the environment variables and output related to Miniconda and
AIACC-Training, as shown in the following figure.
- Make the changes to the environment variables take effect.
source /root/.bashrc
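The clean-up of /root/.bashrc can be sketched in Python. This is a hypothetical helper that operates on an in-memory copy of the file's lines; review the result (and back up the file) before changing the real /root/.bashrc:

```python
# Sketch: comment out Miniconda/AIACC-related lines in .bashrc-style
# content. Hypothetical helper, not part of the AIACC tooling.
def comment_out(lines, keywords=("miniconda", "conda", "aiacc")):
    out = []
    for line in lines:
        stripped = line.strip()
        if (stripped and not stripped.startswith("#")
                and any(k in stripped.lower() for k in keywords)):
            out.append("# " + line)   # disable the line
        else:
            out.append(line)          # leave everything else untouched
    return out

sample = [
    "export PATH=/usr/local/bin:$PATH",
    ". /root/miniconda/etc/profile.d/conda.sh",
    "conda activate aiacct_tf2.1_cu10.1_py36",
]
for line in comment_out(sample):
    print(line)
# export PATH=/usr/local/bin:$PATH
# # . /root/miniconda/etc/profile.d/conda.sh
# # conda activate aiacct_tf2.1_cu10.1_py36
```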