All Products
Search
Document Center

:What is AIACC?

Last Updated:Dec 15, 2023

Apsara AI Accelerator (AIACC) is an AI acceleration engine developed based on Alibaba Cloud IaaS resources. AIACC is used to optimize models based on mainstream AI computing frameworks, accelerate deep learning applications, and improve model training and inference performance. You can use AIACC together with the resource management tool FastGPU to create AI computing tasks to improve research and development efficiency.

Use AIACC to accelerate an application in deep learning scenarios

AIACC consists of a training accelerator AIACC-Training and an inference accelerator AIACC-Inference. The following infrastructure shows how to use AIACC in deep learning scenarios.

image

Layer

Description

Resource layer

(Alibaba Cloud IaaS resources)

Uses Alibaba Cloud IaaS resources at the resource layer. The resources can be enabled on demand to meet the requirements of large-scale GPU clusters for elastic computing, storage, and network resources.

Scheduling layer

(AI acceleration resource management)

Uses FastGPU to create AI computing tasks and manage the resources of large-scale GPU clusters at the scheduling layer. For more information, see What is FastGPU?

Framework layer

(AI acceleration engine)

Uses AIACC to accelerate multiple frameworks in a centralized manner at the framework layer. AIACC uses the performance optimization technology based on data communication. When AIACC implements distributed training, AIACC must exchange data between machines and between GPUs to ensure the acceleration effect. For more information, see AIACC-Training and AIACC-Inference.

Application layer

(AI acceleration solution)

Implements deep learning in various application scenarios such as image recognition, object detection, video recognition, click-through rate (CTR) prediction, natural language understanding, and speech recognition. AIACC is used to implement centralized acceleration for multiple frameworks at the framework layer. You need to only make minimal modifications to the code to improve application performance.

Benefits

AIACC provides the following benefits:

  • AIACC is based on Alibaba Cloud IaaS resources that are stable and easy to use.

  • You can use AIACC together with FastGPU to create training tasks. This reduces the time to create and configure resources, and improves GPU resource utilization to reduce costs.

  • AIACC supports centralized acceleration for multiple frameworks. This allows smooth collaboration between the frameworks and improves training and inference performance.

  • AI algorithms are developed in shorter verification cycles, which ensures faster model iteration. This makes research and development more efficient.

AIACC-Training

AIACC-Training (formerly known as Ali-Perseus or Perseus-Training) is developed and maintained by the Alibaba Cloud AIACC team based on Alibaba Cloud IaaS resources to achieve efficient acceleration for AI distributed training. AIACC-Training is designed to be compatible with open source systems and to accelerate your distributed training tasks without manual intervention.

The following figure shows the architecture of AIACC-Training.

image

Layer

Description

Mainstream AI computing framework

AIACC-Training allows you to accelerate distributed training tasks by using models that are built based on mainstream AI computing frameworks such as TensorFlow, PyTorch, MXNet, and Caffe

Interface layer

The interface layer provides unified interfaces and components that are used to interact and communicate with AIACC-Training system. The interface layer includes unified communication interface classes, unified basic component classes, unified basic communication classes, and unified gradient entry layer. AIACC-Training is compatible with the APIs of PyTorch Distributed Data Parallel (DDP) and Horovod and can accelerate the performance of native distributed training without manual intervention.

Underlying acceleration layer

The underlying acceleration layer uses high-performance distributed communication libraries to implement model performance optimization in a unified manner. It works with gradient negotiation optimization, gradient fusion optimization, gradient compression optimization, and communication operation optimization. AIACC-Training optimizes the features of the Alibaba Cloud network infrastructure and the policy of AI data-parallel distributed training to achieve significant performance improvements.

The following content lists the acceleration features of AIACC-Training:

  • Gradient fusion communication allows you to use adaptive multi-stream fusion and adaptive gradient fusion to improve the training performance of bandwidth-intensive network models by 50% to 300%.

  • Highly optimized online and offline gradient-based negotiation reduces the overhead of gradient-based negotiation on large-scale nodes by up to two orders of magnitude.

  • Hierarchical Allreduce algorithm supports FP16 gradient compression and mixed precision compression.

  • Gradient compression based on gossip is supported.

  • Gradient communication optimization based on multistep is supported.

  • Deep optimization for remote direct memory access (RDMA) and elastic RDMA (eRDMA) networks.

  • API extensions for MXNet support data parallelism and model parallelism of the InsightFace type and enhance the performance of Synchronized Batch Normalization (SyncBN) operators.

  • Group communication operators provided by GroupComm allow you to build complex training tasks that implement the communication in both data parallelism and model parallelism.

AIACC-Training provides significant benefits in training speed and cost. For more information about test data, visit Stanford DAWNBench.

The following table provides typical optimization cases of distributed training.

Customer

Model

Framework

Number of GPUs

Training speed increase

An AI chip manufacturer

Image classification

MXNet

256

100%

An AI chip manufacturer

Face Service

MXNet

256

200%

A car manufacturer

FaceNet

PyTorch

32

100%

A mobile phone manufacturer

BERT

TensorFlow

32

30%

A mobile phone manufacturer

GPT2

PyTorch

32

30%

An AI company

Faster-RCNN

MXNet, Horovod, and BytePS

128

30%

An AI company

InsightFace

MXNet, Horovod, and BytePS

128

200%

An online education platform

ESPnet

PyTorch-DP

16

30%

An online education platform

ESPnet2

PyTorch-DDP

16

30%

An online education platform

CTR

PyTorch

32

80%

An online education platform

OCR

PyTorch

32

30%

A mobile phone manufacturer

Image classification

PyTorch

128

25%

A mobile phone manufacturer

MAE

PyTorch

32

30%

A research institute

GPT2

PyTorch+Megatron

32

30%

A social media platform

MMDetection2

PyTorch

32

30%

A financial intelligence company

InsightFace

PyTorch

32

50%

A mobile phone manufacturer

Detection2

PyTorch

64

25%

A visual team

insightface

MXNet

64

50%

A game vendor

ResNet

PyTorch

32

30%

A city brain project

InsightFace

MXNet

16

42%

A pharmaceutical technology company

Autoencoder

PyTorch

32

30%

An autonomous driving company

swin-transformer

PyTorch

32

70%

AIACC-Inference

AIACC-Inference provides significant benefits in inference speed and cost. For more information about test data, visit Stanford DAWNBench.

The high-performance operator acceleration library finds the optimal operator among the self-developed high-performance operators and NVIDIA operators. Then, the library generates a list of high-performance operators for the acceleration engine to split and pass subgraphs.

For more information about how to install and use AIACC-Inference, see the following topics: