Elastic GPU Service is suitable for scenarios such as video transcoding, image rendering, AI training, AI inference, and cloud graphics workstations. DeepGPU provides enhanced GPU computing capabilities and is suitable for AI training and AI inference. This topic describes the scenarios of Elastic GPU Service and DeepGPU.
Scenarios of Elastic GPU Service
Real-time video transcoding
During the Double 11 Global Shopping Festival gala in 2019, Elastic GPU Service was used to transcode videos at resolutions of 1080p, 2K, and 4K in real time. Elastic GPU Service delivered high image quality and definition in real time while consuming minimal bandwidth. The following section provides details:
Elastic GPU Service supported high-concurrency real-time transcoding of more than 5,000 video channels, which gradually increased to a peak of 6,200 channels per minute, and smoothly handled the traffic peak.
Elastic GPU Service also supported tasks such as real-time rendering of home furnishing images. A large number of ECS Bare Metal instances of the ebmgn6v instance type, which deliver powerful computing capacity, were provided for the first time to support Taobao renderers. The instances improved performance by dozens of times, achieved real-time rendering within seconds, and rendered more than 5,000 home furnishing images in total.
AI training
The GPU-accelerated compute optimized instance families gn6v and gn6e provide excellent general-purpose GPU acceleration capabilities and are suitable as acceleration engines for deep learning. The following section provides details:
The gn6v and gn6e instance families use NVIDIA V100 GPUs with 16 GB and 32 GB of memory, respectively, and can provide mixed-precision computing capacity of up to 1,000 TFLOPS per node.
The gn6v and gn6e instances can be integrated into an elastic computing ecosystem to provide solutions that are suitable for online and offline computing scenarios.
You can use the instances together with container services to simplify deployment and O&M and schedule resources.
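The mixed-precision capacity cited above comes from computing in FP16 while keeping FP32 master copies of the weights. The following standard-library-only sketch (illustrative only, no GPU or framework required) shows why the FP32 master copy matters:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision ('e' format)."""
    return struct.unpack('e', struct.pack('e', x))[0]

# A small gradient update near 1.0 is below half an FP16 ulp (~4.9e-4),
# so it vanishes if the weight itself is stored in FP16 ...
w_fp16 = to_fp16(to_fp16(1.0) + to_fp16(1e-4))
# ... but survives in the FP32 master weight that mixed-precision training keeps.
w_fp32 = 1.0 + 1e-4

print(w_fp16 == 1.0, w_fp32 > 1.0)  # True True: update lost in FP16, kept in FP32
```

This is why mixed-precision training applies updates to FP32 master weights and casts to FP16 only for the forward and backward passes.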
AI inference
The gn6i instance family provides excellent inference capabilities that meet the computing requirements of deep learning scenarios, especially AI inference. The following section provides details:
The gn6i instances use NVIDIA Tesla T4 GPUs to provide single-precision floating-point computing capacity of up to 8.1 TFLOPS and INT8 fixed-point computing capacity of up to 130 TOPS. The instances also support mixed precision.
Additionally, each GPU consumes only 75 W of power while maintaining high performance.
The gn6i instances can be integrated into an elastic computing ecosystem to provide solutions that are suitable for online and offline computing scenarios.
You can use the instances together with container services to simplify deployment and O&M and schedule resources.
Alibaba Cloud Marketplace provides a gn6i instance image that is preinstalled with an NVIDIA GPU driver and a deep learning framework to simplify development.
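The INT8 throughput above is usually exploited through quantization, which maps floating-point weights and activations to 8-bit integers. The following standard-library-only sketch shows symmetric linear quantization (illustrative only; production toolchains also handle calibration and per-channel scales):

```python
def quantize_int8(values):
    """Map floats to int8 with a single symmetric scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)   # q == [50, -127, 2, 100]
restored = dequantize(q, scale)     # close to the original weights
```

Each value now fits in one byte instead of four, and the matrix multiplications can run on the GPU's INT8 units, at the cost of a small, bounded rounding error.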
Cloud graphics workstations
The gn6i instances use NVIDIA Tesla T4 GPU accelerators based on the Turing architecture and provide excellent graphics computing capacity. You can use gn6i instances together with WUYING Workspace to provide cloud graphics workstation services. The services can be used in scenarios such as film and television animation design, industrial design, medical imaging, and high-performance computing result presentation.
Scenarios of DeepGPU
DeepGPU includes the following components: Apsara AI Accelerator (AIACC, which includes AIACC-Training and AIACC-Inference), AIACC 2.0-AIACC Communication Speeding (AIACC-ACSpeed), AIACC Graph Speeding (AIACC-AGSpeed), FastGPU, and cGPU. You can use DeepGPU in AI training and AI inference scenarios. The following section provides details:
AI training
AIACC is suitable for AI training and AI inference scenarios. AIACC-ACSpeed (ACSpeed) and AIACC-AGSpeed (AGSpeed) are suitable for AI training based on the PyTorch framework and provide optimizations for PyTorch.
The following table describes the AI training scenarios of AIACC.
| Scenario | Applicable model | Storage |
| --- | --- | --- |
| Image classification and image recognition | MXNet models | Cloud Parallel File System (CPFS) |
| CTR prediction | Wide&Deep models of TensorFlow | Hadoop Distributed File System (HDFS) |
| Natural Language Processing (NLP) | Transformer and BERT models of TensorFlow | CPFS |
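Distributed training jobs in these scenarios spend much of their time synchronizing gradients across workers, and that collective communication is what AIACC-Training accelerates. Conceptually, the operation being accelerated is an all-reduce average, sketched here as a single-process simulation (not the AIACC API):

```python
def allreduce_mean(grads_per_worker):
    """Average each gradient element across workers, as an all-reduce would."""
    n = len(grads_per_worker)
    return [sum(column) / n for column in zip(*grads_per_worker)]

# Two workers, each holding local gradients for the same two parameters:
merged = allreduce_mean([[1.0, 2.0], [3.0, 4.0]])
print(merged)  # [2.0, 3.0] — every worker applies the same averaged update
```

Because this exchange happens once per training step for every parameter, speeding it up directly shortens the end-to-end training time.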
The following table describes the AI training scenarios of ACSpeed.
| Scenario | Applicable model | Storage |
| --- | --- | --- |
| Image classification and image recognition | Neural network models such as ResNet and VGG-16, and AIGC models such as Stable Diffusion | CPFS |
| CTR prediction | Wide&Deep model | HDFS |
| NLP | Transformer and BERT models | CPFS |
| Pretraining and fine-tuning of large models | Large language models (LLMs) based on frameworks such as Megatron-LM and DeepSpeed | CPFS |
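ACSpeed targets the communication layer of these PyTorch training jobs. A classic bandwidth-optimal pattern at that layer is the ring all-reduce, sketched below as a single-process simulation (illustrative of the general technique, not ACSpeed's implementation):

```python
def ring_allreduce(chunks):
    """Simulate ring all-reduce: chunks[w][c] is chunk c held by worker w
    (scalars for simplicity). Each step, every worker passes one chunk to
    its ring neighbor, so per-worker traffic stays constant as workers scale."""
    n = len(chunks)
    data = [row[:] for row in chunks]
    # Phase 1, reduce-scatter: after n-1 steps, worker w holds the full
    # sum of chunk (w + 1) % n.
    for step in range(n - 1):
        sends = [((w + 1) % n, (w - step) % n, data[w][(w - step) % n])
                 for w in range(n)]
        for dst, c, val in sends:
            data[dst][c] += val
    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [((w + 1) % n, (w + 1 - step) % n, data[w][(w + 1 - step) % n])
                 for w in range(n)]
        for dst, c, val in sends:
            data[dst][c] = val
    return data

# Three workers, gradient vectors split into three chunks each:
result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result[0])  # [12, 15, 18] — every worker ends with the same sums
```

Splitting the vector into chunks keeps all links in the ring busy at once, which is why this pattern (and refinements of it) dominates large-scale data-parallel training.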
The following table describes the AI training scenarios of AGSpeed.
| Scenario | Applicable model |
| --- | --- |
| Image classification | ResNet and MobileNet models |
| Image segmentation | Unet3D models |
| NLP | BERT, GPT-2, and T5 models |
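AGSpeed works at the compute-graph level, and a core transformation in graph optimizers of this kind is operator fusion: merging adjacent elementwise operators so the data is traversed once instead of twice. A toy illustration of the idea (not AGSpeed's actual rewrite rules):

```python
def scale_then_shift_unfused(xs, a, b):
    """Two separate 'kernels': each makes a full pass over the data
    and the first materializes a temporary intermediate."""
    scaled = [x * a for x in xs]
    return [s + b for s in scaled]

def scale_then_shift_fused(xs, a, b):
    """One fused 'kernel': same result, a single pass, no temporary."""
    return [x * a + b for x in xs]

xs = [1.0, 2.0, 3.0]
print(scale_then_shift_fused(xs, 2.0, 1.0))  # [3.0, 5.0, 7.0]
```

On a GPU the unfused version launches two kernels and writes the intermediate to memory; fusing halves the memory traffic, which is often the real bottleneck for elementwise operators.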
AI inference
AIACC is suitable for AI inference scenarios. The following table describes the AI inference scenarios of AIACC.
| Scenario | Applicable model | Specification | Optimization |
| --- | --- | --- | --- |
| Video Ultra HD inference | Ultra HD models | T4 GPU | Performance improved by 1.7 times: video decoding ported to the GPU; preprocessing and postprocessing ported to the GPU; the data set size automatically obtained from a single operation; deep optimization of convolutions |
| Online inference of image synthesis | GAN models | T4 GPU | Performance improved by 3 times: preprocessing and postprocessing ported to the GPU; the data set size automatically obtained from a single operation; deep optimization of convolutions |
| CTR prediction and inference | Wide&Deep model | M40 GPU | Performance improved by 5.1 times: pipeline optimization; model splitting; separate optimization of the child models |
| NLP inference | BERT models | T4 GPU | Performance improved by 2.3 times: pipeline optimization of preprocessing and postprocessing; the data set size automatically obtained from a single operation; deep kernel optimization |
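Several rows above cite pipeline optimization of preprocessing and postprocessing. The idea is to overlap preprocessing of the next batch with inference on the current one so the GPU is never idle waiting for input. A standard-library-only sketch with stand-in preprocess/infer functions (not AIACC code):

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(frame):   # stand-in for decode/resize of the next batch
    return frame * 2

def infer(batch):        # stand-in for the GPU inference call
    return batch + 1

def pipelined(frames):
    """Preprocess batch i+1 in the background while 'inferring' batch i."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(preprocess, frames[0])
        for nxt in frames[1:]:
            batch = pending.result()                 # wait for preprocessing
            pending = pool.submit(preprocess, nxt)   # overlap with inference
            results.append(infer(batch))
        results.append(infer(pending.result()))
    return results

print(pipelined([1, 2, 3]))  # [3, 5, 7]
```

When preprocessing and inference take comparable time, this overlap can approach a 2x throughput gain; porting the preprocessing itself to the GPU, as the table describes, removes the CPU stage entirely.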