Elastic GPU Service supports video transcoding, image rendering, AI training, AI inference, and cloud graphics workstations. DeepGPU extends these capabilities with enhanced GPU acceleration for AI training and AI inference.
Elastic GPU Service use cases
Video transcoding
Applicable instance type: ebmgn6v (ECS Bare Metal)
GPU-accelerated transcoding handles high-concurrency, real-time video streams at 1080P, 2K, and 4K resolutions while keeping bandwidth consumption low.
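To make the bandwidth claim concrete, a back-of-envelope estimate of aggregate egress for a fleet of concurrent streams can be sketched as below. The per-resolution bitrates are illustrative assumptions, not Elastic GPU Service specifications.

```python
# Rough aggregate bandwidth for concurrent transcoded streams.
# Per-resolution bitrates below are illustrative assumptions only.
BITRATE_MBPS = {"1080p": 5.0, "2k": 10.0, "4k": 20.0}

def aggregate_bandwidth_gbps(channels: dict) -> float:
    """Sum per-channel bitrates (Mbps) and convert to Gbps."""
    total_mbps = sum(BITRATE_MBPS[res] * count for res, count in channels.items())
    return total_mbps / 1000.0

# 5,000 concurrent 1080p channels at an assumed 5 Mbps each -> 25 Gbps egress.
print(aggregate_bandwidth_gbps({"1080p": 5000}))
```

Lowering the per-channel bitrate through better GPU-side encoding is what keeps this aggregate figure manageable at high concurrency.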
During the 2019 Double 11 Global Shopping Festival gala, Elastic GPU Service:
- Supported real-time video streaming across more than 5,000 concurrent channels, peaking at 6,200 channels per minute
- Rendered more than 5,000 household images in real time, each within seconds, using ebmgn6v ECS Bare Metal instances to power Taobao renderers and improving rendering performance by dozens of times
AI training
Applicable instance families: gn6v, gn6e | GPU: NVIDIA V100
The gn6v and gn6e instance families are built for deep learning acceleration. gn6v provides 16 GB of GPU memory per card; gn6e provides 32 GB. Both deliver up to 1,000 TFLOPS (teraflops) of mixed-precision computing per node.
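The 1,000 TFLOPS per-node figure decomposes cleanly, assuming the commonly cited V100 mixed-precision Tensor Core peak of 125 TFLOPS per GPU and the largest instance size in these families (8 GPUs per node):

```python
# How "up to 1,000 TFLOPS per node" decomposes. The 125 TFLOPS figure is
# the commonly cited V100 Tensor Core mixed-precision peak (an assumption
# here, not a number taken from the instance specifications above).
V100_MIXED_PRECISION_TFLOPS = 125
GPUS_PER_NODE = 8  # largest gn6v/gn6e instance size

node_peak_tflops = V100_MIXED_PRECISION_TFLOPS * GPUS_PER_NODE
print(node_peak_tflops)  # 1000
```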
These instance families integrate with container services to simplify deployment, operations, and resource scheduling across online and offline computing environments.
AI inference
Applicable instance family: gn6i | GPU: NVIDIA Tesla T4
The gn6i instance family uses the NVIDIA Tesla T4 GPU, which delivers:
- Up to 8.1 TFLOPS of single-precision floating-point performance
- Up to 130 TOPS (tera-operations per second) of INT8 fixed-point performance for quantized inference
- Mixed-precision support
- 75 W power consumption per GPU, delivering high throughput at low power draw
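The INT8 figure matters because quantized inference maps float weights and activations onto 8-bit integers before running the integer math the GPU accelerates. A minimal sketch of symmetric per-tensor quantization, in pure Python for illustration only:

```python
# Minimal sketch of symmetric INT8 quantization: scale floats into
# [-127, 127], run integer math on the hardware, then dequantize.

def quantize(values, scale):
    """Map floats to the int8 range [-127, 127] using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    return [q * scale for q in qvalues]

weights = [0.5, -1.2, 0.03, 1.27]
scale = max(abs(w) for w in weights) / 127  # symmetric per-tensor scale
restored = dequantize(quantize(weights, scale), scale)
# Rounding keeps the per-element error within half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

The accuracy cost is the rounding error bounded above; the payoff is the ~16x throughput gap between the T4's 130 TOPS INT8 path and its 8.1 TFLOPS FP32 path.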
Like the training families, gn6i integrates with container services for simplified deployment and resource scheduling. Pre-built images with NVIDIA GPU drivers and popular deep learning frameworks are available on Alibaba Cloud Marketplace.
Cloud graphics workstations
Applicable instance family: gn6i | GPU: NVIDIA Tesla T4 (Turing architecture)
Pair gn6i instances with WUYING Workspace to deliver cloud-based GPU graphics workstations. This setup is suited for graphics-intensive workflows across industries such as:
- Film and television animation design
- Industrial design
- Medical imaging
- High-performance computing result presentation
DeepGPU use cases
DeepGPU bundles enhanced GPU acceleration tools for AI workloads. Its components include:
- Apsara AI Accelerator (AIACC): Includes AIACC-Training and AIACC-Inference
- AIACC-ACSpeed: Communication acceleration optimized for PyTorch-based training
- AIACC-AGSpeed: Graph optimization for PyTorch-based training
- FastGPU: A tool set for quickly deploying AI training and inference tasks on Alibaba Cloud
- cGPU: GPU sharing and isolation that lets multiple containers share a single GPU
AI training
AIACC
| Scenario | Applicable model | Storage |
|---|---|---|
| Image classification and image recognition | MXNet models | Cloud Parallel File Storage (CPFS) |
| Click-through rate (CTR) prediction | TensorFlow Wide&Deep models | Hadoop Distributed File System (HDFS) |
| Natural language processing (NLP) | TensorFlow Transformer and BERT models | CPFS |
AIACC-ACSpeed
AIACC-ACSpeed optimizes distributed training communication for PyTorch workloads, including large-model pretraining and fine-tuning.
| Scenario | Applicable model | Storage |
|---|---|---|
| Image classification and image recognition | Neural network models such as ResNet and VGG-16, and AIGC models such as Stable Diffusion | CPFS |
| CTR prediction | Wide&Deep model | HDFS |
| NLP | Transformer and BERT models | CPFS |
| Pretraining and fine-tuning of large models | Large language models (LLMs) built on frameworks such as Megatron-LM and DeepSpeed | CPFS |
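The communication that ACSpeed optimizes is dominated by the gradient all-reduce performed after every training step. The standard bandwidth-optimal schedule is ring all-reduce: a reduce-scatter phase followed by an all-gather, in which each rank transfers only 2(N-1)/N of the gradient vector regardless of cluster size. A toy single-process simulation (not AIACC's implementation) of that schedule:

```python
# Toy single-process simulation of ring all-reduce, the collective used to
# sum gradients across workers in distributed training. Each "rank" holds
# one vector; afterwards every rank holds the element-wise sum.

def ring_all_reduce(vectors):
    """vectors: one equal-length list per rank, length divisible by rank count."""
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "toy version: vector length must divide evenly"
    c = size // n
    # Split each rank's vector into n chunks.
    chunks = [[v[i * c:(i + 1) * c] for i in range(n)] for v in vectors]

    # Reduce-scatter: after n-1 steps, rank r owns the full sum of chunk (r+1) % n.
    for s in range(n - 1):
        for r in range(n):
            idx, dst = (r - s) % n, (r + 1) % n
            chunks[dst][idx] = [x + y for x, y in zip(chunks[dst][idx], chunks[r][idx])]

    # All-gather: circulate the reduced chunks until every rank holds all of them.
    for s in range(n - 1):
        for r in range(n):
            idx, dst = (r + 1 - s) % n, (r + 1) % n
            chunks[dst][idx] = list(chunks[r][idx])

    return [[x for chunk in ch for x in chunk] for ch in chunks]
```

Because per-rank traffic stays nearly constant as N grows, the collective scales well; libraries like ACSpeed then tune how this schedule maps onto the actual NICs and NVLink topology.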
AIACC-AGSpeed
AIACC-AGSpeed optimizes the computational graph for PyTorch workloads.
| Scenario | Applicable model |
|---|---|
| Image classification | ResNet and MobileNet models |
| Image segmentation | Unet3D models |
| NLP | BERT, GPT-2, and T5 models |
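A representative graph optimization is elementwise-operator fusion: instead of materializing one intermediate tensor per op, the compiler rewrites the graph so the whole chain runs in a single pass. A toy illustration of the rewrite (not AGSpeed's actual compiler), in pure Python:

```python
# Toy illustration of operator fusion, a typical compute-graph rewrite:
# fuse a chain of elementwise ops into one pass instead of one pass
# (and one intermediate buffer) per op.

def run_unfused(x, ops):
    """One full pass, and one intermediate list, per op."""
    for op in ops:
        x = [op(v) for v in x]
    return x

def run_fused(x, ops):
    """A single pass applying the whole op chain to each element."""
    def fused(v):
        for op in ops:
            v = op(v)
        return v
    return [fused(v) for v in x]

ops = [lambda v: v + 1, lambda v: v * 2]  # e.g. bias-add then scale
data = [0.0, 1.0, 2.0]
assert run_fused(data, ops) == run_unfused(data, ops)
```

On a GPU the fused version saves kernel launches and round trips to device memory, which is where the training speedup comes from.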
AI inference
| Scenario | Applicable model | GPU | Performance improvement | Optimization |
|---|---|---|---|---|
| Video Ultra HD inference | Ultra HD models | T4 | 1.7x | Video decoding ported to GPU; preprocessing and postprocessing ported to GPU; dataset size automatically obtained from a single operation; deep convolution optimization |
| Online inference of image synthesis | Generative adversarial network (GAN) models | T4 | 3x | Preprocessing and postprocessing ported to GPU; dataset size automatically obtained from a single operation; deep convolution optimization |
| CTR prediction and inference | Wide&Deep model | M40 | 5.1x | Pipeline optimization; model splitting; split submodels optimized separately |
| NLP inference | BERT models | T4 | 2.3x | Pipeline optimization of preprocessing and postprocessing; dataset size automatically obtained from a single operation; deep kernel optimization |
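The "pipeline optimization" rows follow a simple model: once preprocessing, GPU inference, and postprocessing are overlapped, steady-state time per batch is bounded by the slowest stage rather than the sum of all stages. A sketch of that arithmetic, with illustrative stage times rather than measured numbers:

```python
# Pipeline speedup model: overlapped stages emit one batch per
# slowest-stage interval after the pipeline fills. Stage times below
# are illustrative assumptions, not measurements.

def sequential_time(stages, n_batches):
    return n_batches * sum(stages)

def pipelined_time(stages, n_batches):
    # Fill the pipeline once, then one batch per slowest-stage interval.
    return sum(stages) + (n_batches - 1) * max(stages)

stages = [2.0, 5.0, 1.0]  # assumed ms: preprocess, inference, postprocess
n = 100
speedup = sequential_time(stages, n) / pipelined_time(stages, n)
print(round(speedup, 2))
```

This also shows why porting preprocessing and postprocessing to the GPU helps: shrinking the non-inference stages both shortens the pipeline fill and keeps inference the bottleneck stage.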