This topic describes the typical application scenarios of FPGA-based ECS instances.

Live video transcoding

Heterogeneous computing instances with GPUs and FPGAs supported real-time video transcoding for the 2019 Tmall Double 11 Gala live broadcast, providing real-time transcoding at 4K, 2K, and 1080p resolutions with high image quality and low bandwidth usage. Using FPGAs to transcode 720p video streams encoded in the H.265 standard reduced bandwidth consumption by 21.6%. GPU-based ECS instances supported more than 5,000 concurrent real-time video streams per minute, a number that was gradually increased to 6,200 to handle traffic peaks.

GPU-based heterogeneous computing instances are also used for real-time rendering of household images. ebmgn6v, a compute optimized ECS Bare Metal Instance family with GPUs, improved the performance of the Taobao renderers by dozens of times. Real-time rendering was achieved within seconds for the first time, and more than 5,000 large household images were rendered.

FPGA-based heterogeneous computing instances provide the image transcoding service for the image space service of Taobao over an ultra-large cluster of more than 3,000 nodes, delivering a processing capability of up to millions of QPS. This service handles 85% of image processing on Taobao, saving an estimated CNY 300 million in computing costs.

Artificial intelligence

GPUs are currently the preferred choice for AI solutions for the following reasons:
  • GPUs offer a mature ecosystem and highly parallel computing power, which make it easier to implement solutions and deploy them online.
  • The development of artificial intelligence is still at an early stage, and industries are still exploring how to commercialize AI at the algorithm level.

An increasing number of AI applications are expected to become commercially available in the next few years. As this happens, demand for lower power consumption, lower costs, lower processing latency, and a higher degree of customization is expected to increase significantly. f3 instances have unique performance advantages and broad development potential in the large-scale commercial deployment of AI inference applications.

GPUs contain many dedicated parallel computing units and large memory capacities, which make batching data from multiple channels and processing it in fast parallel computations the typical computing mode. However, batching also increases the processing latency of each individual data channel. In online business scenarios that require low latency, such as speech recognition, the processing latency of an f3 instance is only about one-tenth that of a GPU-based instance when the batch size is relatively small.
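The latency cost of batching can be made concrete with a small model: under simple batched processing, a request waits for the batch to fill before computation starts. The arrival interval and compute times below are illustrative assumptions, not measured figures for any specific instance type.

```python
def avg_latency_ms(batch_size, arrival_interval_ms, compute_ms_per_batch):
    """Average per-request latency under simple batched processing.

    A request waits, on average, for half the batch to accumulate,
    then for the whole batch to be computed.
    """
    fill_wait = (batch_size - 1) / 2 * arrival_interval_ms
    return fill_wait + compute_ms_per_batch

# Illustrative numbers: one request arrives every 2 ms.
small_batch = avg_latency_ms(batch_size=1, arrival_interval_ms=2,
                             compute_ms_per_batch=5)
large_batch = avg_latency_ms(batch_size=32, arrival_interval_ms=2,
                             compute_ms_per_batch=20)
```

Even though the large batch amortizes compute across 32 requests, each request's end-to-end latency is several times higher, which is why devices that stay fast at batch size 1 suit online inference.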

One development trend in deep neural network computing is to trade precision of data representation for higher computing throughput and lower computing demand. Data representation has moved from double-precision floating point to single-precision floating point, and then to fixed point, and fixed-point arithmetic is a traditional strength of FPGAs. Compared with GPUs, FPGAs contain a large number of fixed-point processing units, and even the internal logic resources of an FPGA chip can be configured as fixed-point processing units, providing ultra-high fixed-point computing capability.
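The shift from floating point to fixed point can be illustrated with a minimal sketch of symmetric 8-bit quantization. This is a generic example of the technique, not tied to any specific FPGA toolchain; the sample weight values are made up.

```python
def quantize_int8(values):
    """Map floats to symmetric signed 8-bit fixed point.

    Returns the quantized integers and the scale needed to recover
    approximate real values (dequantize with q * scale).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.52, -1.30, 0.07, 0.98]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered value differs from the original by at most scale / 2,
# which is the accuracy given up in exchange for cheap integer math.
errors = [abs(a - w) for a, w in zip(approx, weights)]
```

The multiply-accumulate operations that dominate inference then run on small integers, which is exactly the kind of arithmetic FPGA logic resources implement cheaply.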

DNA sequencing

DNA sequencing is a gene testing technology that analyzes and determines the complete sequence of genes from blood or saliva samples to predict the potential risk of diseases. DNA sequencing can pinpoint virus genes for proactive treatment and is widely applied in non-invasive prenatal gene tests for Down syndrome. With the rapid development of DNA sequencing technology, the amount of genetic data is growing exponentially, and the increasingly wide application of genetic data demands ever greater analysis capabilities.

Traditional computing systems use multiple high-end CPUs to build HPC clusters that reduce analysis time. However, this approach increases costs and limits both the scale of industry applications and the growth of gene enterprises. Gene enterprises currently face a dilemma of huge demand for gene computing but high computing costs, and are in urgent need of cost-efficient computing resources.

Take Whole Genome Sequencing (WGS) as an example. A CPU-based instance with 16 vCPUs and 64 GiB of memory takes nearly 100 hours to complete a single WGS analysis, whereas an f3 instance can complete the same analysis within 30 minutes, roughly a 200-fold reduction in compute time with a corresponding reduction in costs.

IC design prototype verification

An important part of traditional digital IC design is using FPGAs to build a chip prototype verification platform for functional testing, a process that requires a large number of FPGA logic units. However, for traditional digital chip design companies, buying or developing FPGA verification boards and platforms is time- and labor-intensive and falls outside their core business. In addition, FPGA platforms are upgraded faster than chips are designed, so developing large, up-to-date FPGA boards has always been a pain point in the design of large digital chips.

f3 instances use VU9P FPGAs, each providing 2.5 million logic units and supporting 600 Gbit/s interconnection between two chips and 100 Gbit/s interconnection between multiple boards. An f3 instance supports up to 16 VU9P chips, which can meet the large logic capacity requirements of the digital chip prototype verification phase. In addition, f3 instances spare you the effort of maintaining complex FPGA boards and reduce verification platform maintenance costs.

Cloud-based compression for accelerated computing

When users store big data and transmit it over high-speed networks in the cloud, they often have to trade off between efficiency and costs to maintain instance performance. gzip is a compression tool widely used in Internet services, but traditional CPU-based gzip compression is inefficient and time-consuming and cannot sustain heavy traffic. Compute optimized instances with FPGAs can perform gzip compression with 8 to 10 times the performance of common CPU-based instances, fully meeting your requirements for data compression.

Additionally, FPGAs can be used to accelerate a wide range of compression tasks, such as compressing background service logs, static website resource files, batch computing data, and distributed storage data.
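The CPU-side baseline that FPGA offload accelerates can be exercised with Python's standard gzip module; the repetitive log payload below is illustrative.

```python
import gzip

# Compress a repetitive payload, such as service log lines, in memory.
log_lines = b"level=info msg=request served status=200\n" * 10_000
compressed = gzip.compress(log_lines, compresslevel=6)
restored = gzip.decompress(compressed)

# Highly repetitive text compresses well, and the round trip is lossless.
ratio = len(log_lines) / len(compressed)
```

On a CPU, this DEFLATE work is serial and becomes the bottleneck at high traffic; an FPGA implementation of the same format produces output that standard gzip tooling like this can still decompress.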

Database acceleration

A large Internet service provider may need to process petabytes of data, update hundreds of millions of web pages, and handle petabytes of log data every day. Processing data at this scale requires large clusters, and when such large amounts of data are being processed, the performance of the data warehouse directly determines the overall data processing capability.

f3 instances can significantly improve the performance of database products by leveraging the fine-grained data processing capabilities and highly-concurrent parallel computing capabilities of FPGAs.
  • f3 instances use FPGA-based parallel computing to accelerate sorting. For example, f3 instances can improve sorting performance for PostgreSQL databases by more than 10 times compared with using CPUs alone.
  • Time series data is widely used in industrial scenarios such as Internet of Things (IoT) device monitoring systems, enterprise-level energy management systems (EMSs), production safety monitoring systems, and electric power detection systems. The throughput of a single data channel of an f3 instance is more than 30 times higher than that of a single-core CPU.