Elastic GPU Service provides extensive service coverage, superior computing power and network performance, and flexible purchase methods. DeepGPU is a free toolkit provided by Alibaba Cloud to enhance GPU computing capabilities of Elastic GPU Service. This topic describes the benefits of Elastic GPU Service and DeepGPU.
Elastic GPU Service
Extensive service coverage
Elastic GPU Service supports large-scale deployment in 17 regions around the world. It also provides flexible delivery methods, such as auto provisioning and auto scaling, to meet sudden surges in business demand.
Superior computing power
Elastic GPU Service provides GPUs that have superior computing power. When you use Elastic GPU Service together with a high-performance CPU platform, a GPU-accelerated instance can deliver up to 1,000 trillion floating-point operations per second (1,000 TFLOPS) of mixed-precision computing performance.
Excellent network performance
GPU-accelerated instances use virtual private clouds (VPCs) that support packet forwarding rates of up to 4.5 million packets per second (pps) and up to 32 Gbit/s of internal bandwidth. You can use GPU-accelerated instances together with Super Computing Cluster (SCC) to build a Remote Direct Memory Access (RDMA) network that provides up to 50 Gbit/s of bandwidth between nodes, which meets the low-latency and high-bandwidth requirements of inter-node data transmission.
Flexible purchase methods
Elastic GPU Service supports various billing methods, including subscription, pay-as-you-go, preemptible instances, reserved instances, and storage capacity units (SCUs). You can select the billing method that best fits your business requirements to avoid paying for idle resources.
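For example, a preemptible GPU-accelerated instance can be requested through the ECS API. The following is a minimal sketch that assumes the aliyun-python-sdk-ecs package; the instance type and all resource IDs are placeholders, and the exact parameters you need depend on your account and region.

```python
# Minimal sketch: request a preemptible (spot) GPU-accelerated instance
# through the ECS RunInstances API. Assumes the aliyun-python-sdk-ecs
# package; all resource IDs below are placeholders.
from aliyunsdkcore.client import AcsClient
from aliyunsdkecs.request.v20140526.RunInstancesRequest import RunInstancesRequest

client = AcsClient("<access-key-id>", "<access-key-secret>", "cn-hangzhou")

request = RunInstancesRequest()
request.set_InstanceType("ecs.gn6i-c4g1.xlarge")   # a GPU-accelerated instance family
request.set_ImageId("<image-id>")
request.set_SecurityGroupId("<security-group-id>")
request.set_VSwitchId("<vswitch-id>")
request.set_InstanceChargeType("PostPaid")         # pay-as-you-go, required for spot
request.set_SpotStrategy("SpotAsPriceGo")          # bill the instance as preemptible

response = client.do_action_with_exception(request)
print(response)
```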
DeepGPU
DeepGPU includes the following components: Deepytorch, DeepNCCL, DeepGPU-LLM, FastGPU, and cGPU. The following sections describe the core benefits of each component.
Deepytorch
Deepytorch is an AI accelerator developed by Alibaba Cloud to accelerate training and inference in generative AI and large language model (LLM) scenarios. Deepytorch provides high performance and ease of use for training and inference tasks, and consists of the Deepytorch Training and Deepytorch Inference software packages.
Significant performance improvement
Deepytorch Training integrates distributed communication and computational graph compilation optimizations to significantly improve end-to-end training performance, which accelerates model training iterations and reduces costs.
Deepytorch Inference uses compilation acceleration to reduce the latency of model inference tasks and improve model response speed, which significantly improves overall inference performance.
Ease of use
Deepytorch Training is fully compatible with open source ecosystems, including mainstream PyTorch versions and mainstream distributed training frameworks such as DeepSpeed, PyTorch Fully Sharded Data Parallel (FSDP), and Megatron-LM.
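As an illustration of this compatibility claim, the following is an ordinary PyTorch DistributedDataParallel training step that a drop-in accelerator is meant to leave unchanged. The commented-out deepytorch import is an assumption about how such an accelerator might be enabled, not a confirmed interface.

```python
# An ordinary PyTorch DistributedDataParallel training step. Deepytorch
# Training is described as drop-in compatible, so the model and optimizer
# code below would stay unchanged. Run with: torchrun --nproc_per_node=N
# NOTE: the commented import is an illustrative assumption, not a confirmed API.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# import deepytorch  # hypothetical: enabling the accelerator (assumption)

dist.init_process_group(backend="nccl")            # standard NCCL process group
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda()
model = DDP(model, device_ids=[local_rank])        # unchanged DDP wrapping
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).square().mean()                # toy loss for illustration
    optimizer.zero_grad()
    loss.backward()                                # gradient all-reduce happens here
    optimizer.step()
```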
Deepytorch Inference supports instant compilation and does not require you to specify the precision or input size in advance, which reduces manual changes to code, improves usability, and lowers code complexity and maintenance costs.
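A minimal sketch of what this looks like in practice is shown below. The deepytorch_inference names in the comments are assumptions used only for illustration; the point is that the model is compiled just in time, with no precision or input shape declared up front.

```python
# Sketch of instant (just-in-time style) compilation for inference.
# The deepytorch_inference calls in the comments are hypothetical
# placeholders; note that no dtype or input shape is specified anywhere.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda().eval()

# import deepytorch_inference                    # hypothetical (assumption)
# model = deepytorch_inference.compile(model)    # no precision/shape arguments

with torch.inference_mode():
    out = model(torch.randn(8, 1024, device="cuda"))  # shapes resolved at run time
```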
DeepNCCL
DeepNCCL is an AI communication acceleration library developed by Alibaba Cloud for heterogeneous computing products based on the SHENLONG architecture to support multi-GPU communication. You can use DeepNCCL to accelerate communication in AI distributed training tasks and multi-GPU inference tasks.
Optimized communication efficiency
DeepNCCL optimizes both single-machine and cross-machine communication and delivers more than 20% higher performance than native NCCL.
Imperceptible acceleration
DeepNCCL supports multi-GPU communication and can be used to accelerate distributed training tasks and multi-GPU inference tasks without business interruptions.
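In other words, the acceleration is transparent to application code. The sketch below is a standard PyTorch all-reduce over the NCCL backend; with DeepNCCL installed on the instance, this script would stay unchanged and only the underlying communication library would differ.

```python
# A standard NCCL-backed all-reduce. DeepNCCL is described as accelerating
# this transparently: the script does not change, only the communication
# library on the instance does. Run with: torchrun --nproc_per_node=N
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")            # same backend name as before
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

t = torch.ones(1 << 20, device="cuda")             # 1M-element tensor per rank
dist.all_reduce(t, op=dist.ReduceOp.SUM)           # accelerated underneath
print(f"rank {rank}: each element now equals the world size: {t[0].item()}")
```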
DeepGPU-LLM
DeepGPU-LLM is an LLM inference engine developed by Alibaba Cloud based on Elastic GPU Service to provide high-performance inference capabilities in processing LLM tasks.
High performance and low latency
DeepGPU-LLM supports tensor parallelism and communication optimization across GPUs to improve the efficiency and speed of multi-GPU parallel computing.
Support for mainstream models
DeepGPU-LLM supports mainstream models, such as Tongyi Qianwen, Llama, ChatGLM, and Baichuan, to meet model inference requirements in different scenarios.
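DeepGPU-LLM's internals are not shown in this topic, but the core idea of tensor parallelism can be sketched in plain PyTorch: a layer's weight matrix is split across GPUs and the partial results are gathered. The helper below is purely conceptual and is not DeepGPU-LLM's implementation.

```python
# Conceptual sketch of tensor parallelism: split a weight matrix column-wise
# across devices, compute partial results, then gather them. DeepGPU-LLM
# applies this kind of sharding (with optimized communication) internally;
# this standalone helper is for illustration only.
import torch

def column_parallel_linear(x, weight, devices):
    """Compute x @ weight with weight sharded column-wise across devices."""
    shards = weight.chunk(len(devices), dim=1)         # one column block per device
    outs = [(x.to(dev) @ w.to(dev)).to(devices[0])     # partial result per device
            for dev, w in zip(devices, shards)]
    return torch.cat(outs, dim=1)                      # gather the column blocks

x = torch.randn(2, 1024)
w = torch.randn(1024, 4096)
devices = (["cuda:0", "cuda:1"] if torch.cuda.device_count() >= 2
           else ["cpu", "cpu"])                        # fall back to CPU for the demo
y = column_parallel_linear(x, w, devices).cpu()
assert torch.allclose(y, x @ w, atol=1e-3)             # matches the unsharded result
```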
FastGPU
FastGPU is a fast cluster deployment tool that allows you to run AI computing tasks without manually deploying computing, storage, or network resources at the IaaS layer. Only simple configurations are required to deploy clusters, which saves time and reduces costs.
High efficiency
You can quickly deploy clusters without separately deploying computing, storage, and network resources at the IaaS layer. The time required to deploy a cluster is reduced to as little as 5 minutes.
You can manage tasks and resources in a convenient and efficient manner by using APIs and the command line.
Cost-effectiveness
A GPU-accelerated instance is purchased only after the dataset is prepared and the training or inference task is triggered, and the instance is automatically released after the task ends. FastGPU synchronizes the resource lifecycle with the task lifecycle to reduce costs, as the sketch after this list illustrates.
You can create preemptible instances.
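The lifecycle coupling described above can be pictured with a small sketch. The provision and release helpers below are hypothetical placeholders, not FastGPU's actual interface; they only illustrate how resource lifetime is tied to task lifetime.

```python
# Hypothetical sketch of lifecycle-synchronized provisioning: resources exist
# only while the task runs. provision() and release() are illustrative
# placeholders, not FastGPU's actual interface.
import contextlib

def provision(instance_type: str, count: int) -> list:
    """Hypothetical stand-in for purchasing GPU-accelerated instances."""
    print(f"purchasing {count} x {instance_type}")
    return [f"{instance_type}-{i}" for i in range(count)]

def release(cluster: list) -> None:
    """Hypothetical stand-in for releasing the instances."""
    print(f"releasing {cluster}")

@contextlib.contextmanager
def task_scoped_cluster(instance_type: str, count: int):
    # Instances are created when the task starts and always released when it
    # ends, so the resource lifecycle tracks the task lifecycle.
    cluster = provision(instance_type, count)
    try:
        yield cluster
    finally:
        release(cluster)

with task_scoped_cluster("gpu-instance", count=4) as cluster:
    print(f"training on {cluster}")                # the task runs here
```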
Ease of use
All resources are deployed at the IaaS layer and remain accessible, so you can log on to them and debug them directly.
FastGPU meets visualization and log management requirements and ensures that tasks are traceable.
cGPU
cGPU allows you to flexibly allocate resources and isolate your business data. You can use cGPU to reduce costs and improve security.
Cost-effectiveness
With the continuous development of GPUs and advances in semiconductor manufacturing, a single GPU provides ever higher computing power, but also comes at a higher price. In most business scenarios, an AI application does not require an entire GPU. cGPU allows multiple containers to share one GPU while keeping their business data isolated, which improves security, increases GPU utilization, and reduces costs.
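As a sketch of this sharing model, the following starts two containers that split one physical GPU, each limited to a slice of GPU memory. It assumes a cGPU-enabled host and the Docker SDK for Python; the environment variable names follow Alibaba Cloud's published cGPU examples, and the image name and memory sizes are placeholders.

```python
# Sketch: two containers sharing one physical GPU under cGPU, each limited
# to a slice of GPU memory. Assumes a cGPU-enabled host and the Docker SDK
# for Python (pip install docker). The environment variable names follow
# Alibaba Cloud's published cGPU examples; the image and sizes are placeholders.
import docker

client = docker.from_env()

for name, mem_gib in [("worker-a", 6), ("worker-b", 8)]:
    client.containers.run(
        "nvidia/cuda:12.2.0-base-ubuntu22.04",     # placeholder image
        name=name,
        detach=True,
        runtime="nvidia",
        environment={
            "ALIYUN_COM_GPU_MEM_DEV": "16",        # total memory of the GPU (GiB)
            "ALIYUN_COM_GPU_MEM_CONTAINER": str(mem_gib),  # slice for this container
            "NVIDIA_VISIBLE_DEVICES": "0",         # both containers share GPU 0
        },
        command="sleep infinity",
    )
```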
Flexible resource allocation
cGPU allows you to flexibly allocate physical GPU resources based on a specific ratio.
You can flexibly allocate resources by GPU memory or computing power.
cGPU also allows you to flexibly configure computing power scheduling policies. You can switch between three scheduling policies in real time to meet the requirements of AI workloads during peak and off-peak hours.