Elastic GPU Service supports video transcoding, image rendering, AI training, AI inference, and cloud graphics workstations. DeepGPU extends these capabilities with enhanced GPU acceleration for AI training and AI inference.
Elastic GPU Service use cases
Video transcoding
Applicable instance type: ebmgn6v (ECS Bare Metal)
GPU-accelerated transcoding handles high-concurrency, real-time video streams at 1080P, 2K, and 4K resolutions while keeping bandwidth consumption low.
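To make the bandwidth claim concrete, a back-of-envelope estimate of aggregate egress for a fleet of concurrent streams can be sketched as below. The per-resolution bitrates are illustrative assumptions, not Elastic GPU Service specifications.

```python
# Rough aggregate bandwidth for concurrent transcoded streams.
# Per-resolution bitrates below are illustrative assumptions only.
BITRATE_MBPS = {"1080p": 5.0, "2k": 10.0, "4k": 20.0}

def aggregate_bandwidth_gbps(channels: dict) -> float:
    """Sum per-channel bitrates (Mbps) and convert to Gbps."""
    total_mbps = sum(BITRATE_MBPS[res] * count for res, count in channels.items())
    return total_mbps / 1000.0

# 5,000 concurrent 1080p channels at an assumed 5 Mbps each -> 25 Gbps egress.
print(aggregate_bandwidth_gbps({"1080p": 5000}))
```

Lowering the per-channel bitrate through better GPU-side encoding is what keeps this aggregate figure manageable at high concurrency.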
During the 2019 Double 11 Global Shopping Festival gala, Elastic GPU Service:
- Supported real-time video streaming across more than 5,000 concurrent channels, peaking at 6,200 channels per minute
- Rendered more than 5,000 household images in real time, each within seconds, using ebmgn6v ECS Bare Metal instances to power Taobao renderers and improving rendering performance by dozens of times
AI training
Applicable instance families: gn6v, gn6e | GPU: NVIDIA V100
The gn6v and gn6e instance families are built for deep learning acceleration. gn6v provides 16 GB of GPU memory per card; gn6e provides 32 GB. Both deliver up to 1,000 TFLOPS (teraflops) of mixed-precision computing per node.
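The 1,000 TFLOPS per-node figure decomposes cleanly, assuming the commonly cited V100 mixed-precision Tensor Core peak of 125 TFLOPS per GPU and the largest instance size in these families (8 GPUs per node):

```python
# How "up to 1,000 TFLOPS per node" decomposes. The 125 TFLOPS figure is
# the commonly cited V100 Tensor Core mixed-precision peak (an assumption
# here, not a number taken from the instance specifications above).
V100_MIXED_PRECISION_TFLOPS = 125
GPUS_PER_NODE = 8  # largest gn6v/gn6e instance size

node_peak_tflops = V100_MIXED_PRECISION_TFLOPS * GPUS_PER_NODE
print(node_peak_tflops)  # 1000
```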
These instance families integrate with container services to simplify deployment, operations, and resource scheduling across online and offline computing environments.
AI inference
Applicable instance family: gn6i | GPU: NVIDIA Tesla T4
The gn6i instance family uses the NVIDIA Tesla T4 GPU, which delivers:
- Up to 8.1 TFLOPS of single-precision floating-point performance
- Up to 130 TOPS (tera-operations per second) of INT8 fixed-point performance for quantized inference
- Mixed-precision support
- 75 W power consumption per GPU, delivering high throughput at low power draw
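The INT8 figure matters because quantized inference maps float weights and activations onto 8-bit integers before running the integer math the GPU accelerates. A minimal sketch of symmetric per-tensor quantization, in pure Python for illustration only:

```python
# Minimal sketch of symmetric INT8 quantization: scale floats into
# [-127, 127], run integer math on the hardware, then dequantize.

def quantize(values, scale):
    """Map floats to the int8 range [-127, 127] using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    return [q * scale for q in qvalues]

weights = [0.5, -1.2, 0.03, 1.27]
scale = max(abs(w) for w in weights) / 127  # symmetric per-tensor scale
restored = dequantize(quantize(weights, scale), scale)
# Rounding keeps the per-element error within half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

The accuracy cost is the rounding error bounded above; the payoff is the ~16x throughput gap between the T4's 130 TOPS INT8 path and its 8.1 TFLOPS FP32 path.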
Like the training families, gn6i integrates with container services for simplified deployment and resource scheduling. Pre-built images with NVIDIA GPU drivers and popular deep learning frameworks are available on Alibaba Cloud Marketplace.
Cloud graphics workstations
Applicable instance family: gn6i | GPU: NVIDIA Tesla T4 (Turing architecture)
Pair gn6i instances with WUYING Workspace to deliver cloud-based GPU graphics workstations. This setup is suited for graphics-intensive workflows across industries such as:
- Film and television animation design
- Industrial design
- Medical imaging
- High-performance computing result presentation
DeepGPU use cases
DeepGPU bundles enhanced GPU acceleration tools for AI workloads. Its components include:
- Apsara AI Accelerator (AIACC): Includes AIACC-Training and AIACC-Inference
- AIACC-ACSpeed: Communication acceleration optimized for PyTorch-based training
- AIACC-AGSpeed: Graph optimization for PyTorch-based training
- FastGPU: A tool set for quickly deploying AI training and inference tasks on Alibaba Cloud
- cGPU: GPU sharing and isolation that lets multiple containers share a single GPU
AI training
AIACC
| Scenario | Applicable model | Storage |
|---|---|---|
| Image classification and image recognition | MXNet models | Cloud Parallel File Storage (CPFS) |
| Click-through rate (CTR) prediction | TensorFlow Wide&Deep models | Hadoop Distributed File System (HDFS) |
| Natural language processing (NLP) | TensorFlow Transformer and BERT models | CPFS |
AIACC-ACSpeed
AIACC-ACSpeed optimizes distributed training communication for PyTorch workloads, including large-model pretraining and fine-tuning.
| Scenario | Applicable model | Storage |
|---|---|---|
| Image classification and image recognition | Neural network models such as ResNet and VGG-16, and AIGC models such as Stable Diffusion | CPFS |
| CTR prediction | Wide&Deep model | HDFS |
| NLP | Transformer and BERT models | CPFS |
| Pretraining and fine-tuning of large models | Large language models (LLMs) built on frameworks such as Megatron-LM and DeepSpeed | CPFS |
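The communication that ACSpeed optimizes is dominated by the gradient all-reduce performed after every training step. The standard bandwidth-optimal schedule is ring all-reduce: a reduce-scatter phase followed by an all-gather, in which each rank transfers only 2(N-1)/N of the gradient vector regardless of cluster size. A toy single-process simulation (not AIACC's implementation) of that schedule:

```python
# Toy single-process simulation of ring all-reduce, the collective used to
# sum gradients across workers in distributed training. Each "rank" holds
# one vector; afterwards every rank holds the element-wise sum.

def ring_all_reduce(vectors):
    """vectors: one equal-length list per rank, length divisible by rank count."""
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "toy version: vector length must divide evenly"
    c = size // n
    # Split each rank's vector into n chunks.
    chunks = [[v[i * c:(i + 1) * c] for i in range(n)] for v in vectors]

    # Reduce-scatter: after n-1 steps, rank r owns the full sum of chunk (r+1) % n.
    for s in range(n - 1):
        for r in range(n):
            idx, dst = (r - s) % n, (r + 1) % n
            chunks[dst][idx] = [x + y for x, y in zip(chunks[dst][idx], chunks[r][idx])]

    # All-gather: circulate the reduced chunks until every rank holds all of them.
    for s in range(n - 1):
        for r in range(n):
            idx, dst = (r + 1 - s) % n, (r + 1) % n
            chunks[dst][idx] = list(chunks[r][idx])

    return [[x for chunk in ch for x in chunk] for ch in chunks]
```

Because per-rank traffic stays nearly constant as N grows, the collective scales well; libraries like ACSpeed then tune how this schedule maps onto the actual NICs and NVLink topology.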
AIACC-AGSpeed
AIACC-AGSpeed optimizes the computational graph for PyTorch workloads.
| Scenario | Applicable model |
|---|---|
| Image classification | ResNet and MobileNet models |
| Image segmentation | Unet3D models |
| NLP | BERT, GPT-2, and T5 models |
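A representative graph optimization is elementwise-operator fusion: instead of materializing one intermediate tensor per op, the compiler rewrites the graph so the whole chain runs in a single pass. A toy illustration of the rewrite (not AGSpeed's actual compiler), in pure Python:

```python
# Toy illustration of operator fusion, a typical compute-graph rewrite:
# fuse a chain of elementwise ops into one pass instead of one pass
# (and one intermediate buffer) per op.

def run_unfused(x, ops):
    """One full pass, and one intermediate list, per op."""
    for op in ops:
        x = [op(v) for v in x]
    return x

def run_fused(x, ops):
    """A single pass applying the whole op chain to each element."""
    def fused(v):
        for op in ops:
            v = op(v)
        return v
    return [fused(v) for v in x]

ops = [lambda v: v + 1, lambda v: v * 2]  # e.g. bias-add then scale
data = [0.0, 1.0, 2.0]
assert run_fused(data, ops) == run_unfused(data, ops)
```

On a GPU the fused version saves kernel launches and round trips to device memory, which is where the training speedup comes from.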
AI inference
| Scenario | Applicable model | GPU | Performance improvement | Optimization |
|---|---|---|---|---|
| Video Ultra HD inference | Ultra HD models | T4 | 1.7x | Video decoding ported to GPU; preprocessing and postprocessing ported to GPU; dataset size automatically obtained from a single operation; deep convolution optimization |
| Online inference of image synthesis | Generative adversarial network (GAN) models | T4 | 3x | Preprocessing and postprocessing ported to GPU; dataset size automatically obtained from a single operation; deep convolution optimization |
| CTR prediction and inference | Wide&Deep model | M40 | 5.1x | Pipeline optimization; model splitting; split submodels optimized separately |
| NLP inference | BERT models | T4 | 2.3x | Pipeline optimization of preprocessing and postprocessing; dataset size automatically obtained from a single operation; deep kernel optimization |
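The "pipeline optimization" rows follow a simple model: once preprocessing, GPU inference, and postprocessing are overlapped, steady-state time per batch is bounded by the slowest stage rather than the sum of all stages. A sketch of that arithmetic, with illustrative stage times rather than measured numbers:

```python
# Pipeline speedup model: overlapped stages emit one batch per
# slowest-stage interval after the pipeline fills. Stage times below
# are illustrative assumptions, not measurements.

def sequential_time(stages, n_batches):
    return n_batches * sum(stages)

def pipelined_time(stages, n_batches):
    # Fill the pipeline once, then one batch per slowest-stage interval.
    return sum(stages) + (n_batches - 1) * max(stages)

stages = [2.0, 5.0, 1.0]  # assumed ms: preprocess, inference, postprocess
n = 100
speedup = sequential_time(stages, n) / pipelined_time(stages, n)
print(round(speedup, 2))
```

This also shows why porting preprocessing and postprocessing to the GPU helps: shrinking the non-inference stages both shortens the pipeline fill and keeps inference the bottleneck stage.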