Elastic GPU Service: Scenarios

Last Updated: Oct 16, 2023

Elastic GPU Service is suitable for scenarios such as video transcoding, image rendering, AI training, AI inference, and cloud graphics workstations. DeepGPU provides enhanced GPU computing capabilities and is suitable for AI training and AI inference. This topic describes the scenarios of Elastic GPU Service and DeepGPU.

Scenarios of Elastic GPU Service

  • Real-time video transcoding

    During the Double 11 Global Shopping Festival gala in 2019, Elastic GPU Service was used to support real-time video transcoding at 1080p, 2K, and 4K resolutions. Elastic GPU Service transcoded videos in real time with high image quality and definition while consuming minimal bandwidth. The following section provides details:

    • Elastic GPU Service supported high-concurrency real-time streaming of more than 5,000 video channels, which gradually increased to a peak of 6,200 channels per minute, and handled the traffic peak smoothly.

    • Elastic GPU Service was also used for tasks such as real-time rendering of home furnishing images. For the first time, a large number of ECS Bare Metal instances of the ebmgn6v instance type, which provide powerful computing capacity, were deployed to support Taobao renderers. The instances improved rendering performance by dozens of times, achieved real-time rendering within seconds, and rendered more than 5,000 home furnishing images in total.

  • AI training

    The GPU-accelerated compute optimized instance families gn6v and gn6e provide excellent general-purpose GPU acceleration capabilities and are suitable for use as acceleration engines for deep learning. The following section provides details:

    • The gn6v and gn6e instance families use NVIDIA V100 GPUs with 16 GB and 32 GB of memory, respectively, and can deliver up to 1,000 TFLOPS of mixed-precision computing capacity per node. (A mixed-precision training sketch follows this list.)

    • The gn6v and gn6e instances can be integrated into an elastic computing ecosystem to provide solutions that are suitable for online and offline computing scenarios.

    • You can use the instances together with container services to simplify deployment, O&M, and resource scheduling.

  • AI inference

    The gn6i instance family provides excellent AI inference capabilities that meet the computing requirements of deep learning scenarios. The following section provides details:

    • The gn6i instances use NVIDIA Tesla T4 GPUs to provide up to 8.1 TFLOPS of single-precision floating-point computing capacity and up to 130 TOPS of INT8 fixed-point processing capacity. The instances also support mixed-precision computing.

    • A single T4 GPU consumes only 75 W of power while maintaining high performance output.

    • The gn6i instances can be integrated into an elastic computing ecosystem to provide solutions that are suitable for online and offline computing scenarios.

    • You can use the instances together with container services to simplify deployment, O&M, and resource scheduling.

    • Alibaba Cloud Marketplace provides an image for gn6i instances that comes with an NVIDIA GPU driver and a deep learning framework to simplify development.

  • Cloud graphics workstations

    The gn6i instances use NVIDIA Tesla T4 GPU accelerators based on the Turing architecture and provide excellent graphics computing capacity. You can use gn6i instances together with WUYING Workspace to provide cloud graphics workstation services. The services can be used in scenarios such as film and television animation design, industrial design, medical imaging, and high-performance computing result presentation.
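The mixed-precision capability that the V100-based and T4-based instance families expose is typically driven from the training framework. The following is a minimal sketch of mixed-precision training with PyTorch's automatic mixed precision (AMP) API on a single GPU; the model, batch data, and hyperparameters are illustrative placeholders, not part of Elastic GPU Service.

```python
# Minimal sketch: single-GPU mixed-precision training with PyTorch AMP.
# The model, data, and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

for step in range(100):
    inputs = torch.randn(64, 1024, device=device)         # dummy batch
    targets = torch.randint(0, 10, (64,), device=device)  # dummy labels
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # eligible ops run in FP16 on Tensor Cores
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then runs the optimizer step
    scaler.update()                # adjusts the loss scale for the next iteration
```

On V100 and T4 GPUs, the autocast region runs eligible operations in FP16 on Tensor Cores, which is the capability that the mixed-precision TFLOPS figures above describe.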

Scenarios of DeepGPU

DeepGPU includes the following components: Apsara AI Accelerator (AIACC, which includes AIACC-Training and AIACC-Inference), AIACC 2.0-AIACC Communication Speeding (AIACC-ACSpeed), AIACC Graph Speeding (AIACC-AGSpeed), FastGPU, and cGPU. You can use DeepGPU in AI training and AI inference scenarios. The following section provides details:

  • AI training

    AIACC is suitable for both AI training and AI inference scenarios. AIACC-ACSpeed (ACSpeed) and AIACC-AGSpeed (AGSpeed) are suitable for AI training jobs that are based on the PyTorch framework and provide optimizations for it. (A distributed training sketch appears at the end of this topic.)

    • The following table describes the AI training scenarios of AIACC.

      | Scenario | Applicable model | Storage |
      |----------|------------------|---------|
      | Image classification and image recognition | MXNet models | Cloud Parallel File System (CPFS) |
      | CTR prediction | Wide&Deep models of TensorFlow | Hadoop Distributed File System (HDFS) |
      | Natural Language Processing (NLP) | Transformer and BERT models of TensorFlow | CPFS |

    • The following table describes the AI training scenarios of ACSpeed.

      | Scenario | Applicable model | Storage |
      |----------|------------------|---------|
      | Image classification and image recognition | Neural network models, such as ResNet and VGG-16, and AIGC models, such as Stable Diffusion | CPFS |
      | CTR prediction | Wide&Deep model | HDFS |
      | NLP | Transformer and BERT models | CPFS |
      | Pretraining and fine-tuning of large models | Large language models (LLMs) trained with frameworks such as Megatron-LM and DeepSpeed | CPFS |

    • The following table describes the AI training scenarios of AGSpeed.

      | Scenario | Applicable model |
      |----------|------------------|
      | Image classification | ResNet and MobileNet models |
      | Image segmentation | Unet3D models |
      | NLP | BERT, GPT-2, and T5 models |

  • AI inference

    AIACC is suitable for AI inference scenarios. The following table describes the AI inference scenarios of AIACC.

    | Scenario | Applicable model | Specification | Optimization |
    |----------|------------------|---------------|--------------|
    | Video Ultra HD inference | Ultra HD models | T4 GPU | Performance is improved by 1.7 times based on the following optimizations: video decoding is ported to the GPU; preprocessing and postprocessing are ported to the GPU; the data set size is automatically obtained in a single operation; convolutions are deeply optimized. |
    | Online inference of image synthesis | GAN models | T4 GPU | Performance is improved by 3 times based on the following optimizations: preprocessing and postprocessing are ported to the GPU; the data set size is automatically obtained in a single operation; convolutions are deeply optimized. |
    | CTR prediction and inference | Wide&Deep model | M40 GPU | Performance is improved by 5.1 times based on the following optimizations: the pipeline is optimized; the model is split; the child models are separately optimized. |
    | NLP inference | BERT models | T4 GPU | Performance is improved by 2.3 times based on the following optimizations: the preprocessing and postprocessing pipeline is optimized; the data set size is automatically obtained in a single operation; kernels are deeply optimized. |
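ACSpeed works at the communication layer of PyTorch distributed training, so its typical target workload is an ordinary distributed data-parallel (DDP) job. The following minimal sketch shows such a job in plain PyTorch; it uses no AIACC-specific API, and the model and data are illustrative placeholders.

```python
# Minimal sketch: PyTorch distributed data-parallel (DDP) training, the type of
# workload whose inter-GPU communication ACSpeed targets. This is plain PyTorch
# with no AIACC-specific API; the model and data are illustrative placeholders.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).cuda()
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across GPUs
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    for step in range(100):
        inputs = torch.randn(64, 1024, device="cuda")         # dummy batch
        targets = torch.randint(0, 10, (64,), device="cuda")  # dummy labels
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()   # gradient all-reduce (the communication hot path) runs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with a command such as `torchrun --nproc_per_node=8 train.py` on a multi-GPU instance, the gradient all-reduce that runs during `loss.backward()` is the kind of communication path that ACSpeed is designed to accelerate.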