Elastic GPU Service is an elastic computing service from Alibaba Cloud that provides ready-to-use, scalable GPU-accelerated computing resources. It combines the computing power of GPUs and CPUs to address scenarios such as AI, high-performance computing, and professional graphics and image processing.
Why Elastic GPU Service?
GPU-accelerated instances are computing servers that combine GPUs and CPUs. GPUs are well suited to mathematical and geometric computing, especially floating-point and parallel computing. For such workloads, GPUs can provide up to 100 times the computing power of their CPU counterparts. GPUs have the following features:
- Have a large number of arithmetic logic units (ALUs), which are suited to large-scale parallel computing.
- Support high-throughput scenarios where multiple threads run in parallel to process computing tasks.
- Have simple logic control units (LCUs).
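The data-parallel pattern behind these features can be sketched on a CPU. The following is a minimal illustration, not GPU code: it applies the same arithmetic (`a * x + y`) to independent chunks of data, which is the kind of work a GPU spreads across many ALUs. The names `saxpy_chunk` and `parallel_saxpy` and the `workers` parameter are hypothetical and chosen only for this example. Python threads are used here to show the structure of the computation, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, xs, ys):
    """Compute a*x + y for one chunk; every element is independent of the others."""
    return [a * x + y for x, y in zip(xs, ys)]


def parallel_saxpy(a, xs, ys, workers=4):
    """Split the inputs into chunks and process the chunks concurrently.

    This mirrors how a parallel device assigns slices of the data to
    separate execution units that all run the same simple operation.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Ceiling division so that every element lands in some chunk.
    size = max(1, -(-len(xs) // workers))
    starts = range(0, len(xs), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the chunks reassemble correctly.
        parts = pool.map(
            lambda i: saxpy_chunk(a, xs[i:i + size], ys[i:i + size]),
            starts,
        )
        return [value for part in parts for value in part]
```

For example, `parallel_saxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5)` returns the same list as the serial computation. Because no element depends on another, the chunks can be processed in any order, which is what makes this kind of workload a good fit for hardware with many simple compute units.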
The following table compares GPU-accelerated instances provided by Elastic GPU Service and self-managed GPU-accelerated servers.
|Item|GPU-accelerated instance|Self-managed GPU-accelerated server|
|---|---|---|
|Ease of use| | |
|Disaster recovery and backup| | |
Elastic GPU Service platform
A GPU is a computing chip that provides real-time, high-speed parallel computing and floating-point computing capabilities. Elastic GPU Service combines Alibaba Cloud elastic computing services with high-speed parallel heterogeneous accelerators of GPUs to provide GPU-accelerated elastic computing services.
Alibaba Cloud launched GPU-accelerated instances based on the Elastic GPU Service platform. GPU-accelerated instances provide GPU acceleration capabilities and can be managed in the same manner as common Elastic Compute Service (ECS) instances. To create GPU-accelerated instances, select an enterprise-level heterogeneous computing instance type. For more information about instance types, see Instance families.
- High elasticity
Provides a series of instance families. GPU-accelerated instances can be created within minutes and scaled horizontally and vertically.
- High performance and high security
Supports GPUDirect for GPU-to-GPU communication. GPUs can communicate with each other over high-bandwidth, low-latency NVLink connections without CPU intervention. GPUs provide elastic security isolation among tenants. Hypervisors can be used to authorize tenants to use and manage GPUs. You can configure high-speed communication between isolated GPUs in a secure manner.
- Easy deployment
Deeply integrated with the Alibaba Cloud ecosystem. You can combine Elastic GPU Service with other Alibaba Cloud services to build applications. For example, you can combine Elastic GPU Service with Object Storage Service (OSS) or Apsara File Storage NAS to meet storage requirements and with E-MapReduce (EMR) to preprocess deep learning data. You can also combine Elastic GPU Service with Container Service for Kubernetes (ACK) to make services easier to deliver to users.
- Easy monitoring
Provides comprehensive monitoring across dimensions such as GPUs, instances, and groups to reduce your O&M workload. For more information, see GPU monitoring.
Elastic GPU Service resources are billed in the same way as ECS resources. Computing resources (vCPUs and memory), images, Elastic Block Storage (EBS) devices, public bandwidth, and snapshots are billable resources in Elastic GPU Service. The following billing methods and discount options are available:
- Subscription: a billing method in which you pay for resources upfront and use them over a period of time.
- Pay-as-you-go: a billing method in which you pay for resources after you use them.
- Preemptible instance: You can bid for unused computing resources to create preemptible instances. Preemptible instances provide significant cost savings compared with pay-as-you-go instances, but they are subject to a reclaim mechanism.
- Reserved instance: Reserved instances are discount coupons provided for use with pay-as-you-go instances. When you purchase a reserved instance, you make an upfront commitment to use instances that have specified configurations (including instance type, region, and zone) over a period of time to receive billing discounts. Reserved instances are applied to offset the bills of computing resources.
- Savings plan: Savings plans are discount plans provided for use with pay-as-you-go instances. When you purchase a savings plan, you make an upfront commitment to use a consistent amount (measured in USD/hour) of resources over a period of time to receive billing discounts. Savings plans are applied to offset the bills of computing resources and system disks.
- Storage capacity unit (SCU): SCUs are storage resource plans provided for use with pay-as-you-go storage resources. When you purchase an SCU, you make an upfront commitment to use storage resources of a specific capacity over a period of time to receive billing discounts. SCUs are applied to offset the bills of various storage resources such as EBS devices, Apsara File Storage NAS file systems, and OSS buckets.
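The trade-off between paying upfront and paying after use can be worked through with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the prices used in the usage note are made-up placeholders rather than actual Alibaba Cloud prices, and the model ignores discounts such as reserved instances, savings plans, and SCUs.

```python
def break_even_hours(subscription_price, hourly_rate):
    """Return the hours of use at which pay-as-you-go cost equals the subscription price.

    Both arguments are hypothetical inputs in the same currency;
    they are not taken from any real price list.
    """
    if subscription_price < 0 or hourly_rate <= 0:
        raise ValueError("subscription_price must be >= 0 and hourly_rate must be > 0")
    return subscription_price / hourly_rate


def cheaper_option(subscription_price, hourly_rate, expected_hours):
    """Return the billing method that costs less for the expected usage.

    A tie is reported as "subscription".
    """
    pay_as_you_go_cost = hourly_rate * expected_hours
    if pay_as_you_go_cost < subscription_price:
        return "pay-as-you-go"
    return "subscription"
```

With a placeholder subscription price of 1000 per month and a placeholder pay-as-you-go rate of 2.5 per hour, `break_even_hours(1000, 2.5)` gives 400 hours. Under these assumed numbers, an instance that runs 200 hours in the month is cheaper on pay-as-you-go, and one that runs 600 hours is cheaper on subscription.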
- AIACC-Training: an AI accelerator developed by Alibaba Cloud to improve training performance. For more information, see Automatically install AIACC-Training.
- AIACC-Inference: an AI accelerator developed by Alibaba Cloud to improve inference performance. For more information, see Automatically install AIACC-Inference.
- cGPU: a technology used to isolate GPU resources so that multiple containers can share a single graphics card. For more information, see What is the cGPU service?
- FastGPU: a tool provided by Alibaba Cloud to build AI computing tasks. It provides interfaces and command lines for you to build AI computing tasks on Alibaba Cloud Infrastructure as a Service (IaaS) resources. For more information, see What is FastGPU?