This topic describes the terms, architecture, and hardware development kit (HDK) of the recommended f3 instances.

Overview

The rapid growth of cloud computing and data center services demands ever more computing power, and in a growing number of compute-intensive scenarios CPUs alone cannot meet the requirements. For certain workloads, FPGAs can deliver tens or even hundreds of times the performance of CPUs and reduce latency by two orders of magnitude. At the Apsara Conference in September 2019, Alibaba Cloud released f3 instances, which are based on Xilinx 16 nm devices. f3 instances pioneered the single-card, dual-chip design and lead in computing density.

Alibaba Cloud f3 instances give developers easy-to-use, cost-effective, agile, and secure tools and environments for developing and using FPGAs in the cloud. With f3 instances, you can easily develop FPGA accelerators and deploy FPGA-based services.

Hardware architecture

f3 instances use a single-card, dual-chip architecture that supports both inter-chip and inter-card interconnection. The following figure shows the hardware architecture of f3 instances.

Figure: Hardware architecture of f3 instances

Specifications

Item                     Description (per VU9P chip)
Dimensions               Full height, full length
FPGA model               XCVU9P
PCIe interface           PCIe Gen3 x16 (Peripheral Component Interconnect Express)
Memory                   4 x DDR4-2133 (Double Data Rate 4), 16 GB each, 64 GB in total
Inter-chip interconnect  3 x 200 Gbit/s
Ethernet interface       2 x 100 Gbit/s
Clock module             Dynamically configurable clocks
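
The per-chip figures above can be combined into aggregate numbers with simple arithmetic. The following Python sketch computes the total DDR4 capacity, a theoretical peak DDR4 bandwidth, and the aggregate inter-chip and Ethernet rates for one VU9P. It assumes a 64-bit data path per DDR4 channel, which the table does not state, and the results are theoretical peaks rather than measured throughput.

```python
# Peak-rate arithmetic for one VU9P chip on an f3 card.
# Counts and rates come from the specification table. The 64-bit DDR4 data
# width per channel is an ASSUMPTION; the table does not state it.

DDR4_CHANNELS = 4
DDR4_MTPS = 2133        # DDR4-2133: mega-transfers per second
DDR4_DATA_BITS = 64     # assumed data width per channel (ECC bits excluded)
DDR4_CAPACITY_GB = 16   # capacity per channel

INTER_CHIP_LINKS = 3
INTER_CHIP_GBPS = 200   # Gbit/s per inter-chip link
ETHERNET_PORTS = 2
ETHERNET_GBPS = 100     # Gbit/s per Ethernet port


def ddr4_peak_gbytes_per_s(channels: int, mtps: int, data_bits: int) -> float:
    """Theoretical peak DDR bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = data_bits // 8
    return channels * mtps * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(f"DDR4 capacity:        {DDR4_CHANNELS * DDR4_CAPACITY_GB} GB")
    peak = ddr4_peak_gbytes_per_s(DDR4_CHANNELS, DDR4_MTPS, DDR4_DATA_BITS)
    print(f"DDR4 peak bandwidth:  {peak:.1f} GB/s")
    print(f"Inter-chip aggregate: {INTER_CHIP_LINKS * INTER_CHIP_GBPS} Gbit/s")
    print(f"Ethernet aggregate:   {ETHERNET_PORTS * ETHERNET_GBPS} Gbit/s")
```

Under these assumptions, the four channels give about 68.3 GB/s of theoretical peak bandwidth (4 x 2133 MT/s x 8 bytes). Sustained bandwidth in a real design is lower.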

Topology

f3 instances use a dual-card interconnection topology for communication between FPGAs. The minimum communication bandwidth between FPGAs is 100 Gbit/s.

Development environment

Item              Description
Development tool  Vivado 2018.2
Chip              XCVU9P
Operating system  Linux CentOS 7.4
Kernel version    3.10.0-693.el7.x86_64
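
Before building, it can be useful to confirm that a host matches the environment in the table. The following Python sketch compares the running kernel, the CentOS release, and optionally a Vivado version banner against the listed versions. Reading /etc/centos-release and parsing a banner such as the first line printed by `vivado -version` (for example "Vivado v2018.2 (64-bit)") are assumptions of this sketch, not steps from the f3 documentation.

```python
"""Compare a build host against the f3 HDK development environment table.

ASSUMPTIONS: the OS release is read from /etc/centos-release, and the Vivado
version is parsed from a banner such as "Vivado v2018.2 (64-bit)".
"""
import platform
import re
from pathlib import Path
from typing import List, Optional, Tuple

EXPECTED_VIVADO = "2018.2"
EXPECTED_CENTOS = (7, 4)
EXPECTED_KERNEL = "3.10.0-693.el7.x86_64"


def centos_version(release_text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from e.g. 'CentOS Linux release 7.4.1708 (Core)'."""
    m = re.search(r"release\s+(\d+)\.(\d+)", release_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def vivado_version(banner: str) -> Optional[str]:
    """Extract '2018.2' from a banner such as 'Vivado v2018.2 (64-bit)'."""
    m = re.search(r"Vivado v(\d{4}\.\d+)", banner)
    return m.group(1) if m else None


def check_host(vivado_banner: str = "") -> List[str]:
    """Return a list of mismatches; an empty list means the host matches."""
    problems = []
    kernel = platform.release()
    if kernel != EXPECTED_KERNEL:
        problems.append(f"kernel is {kernel}, expected {EXPECTED_KERNEL}")
    release_file = Path("/etc/centos-release")
    text = release_file.read_text() if release_file.exists() else ""
    if centos_version(text) != EXPECTED_CENTOS:
        problems.append(f"OS is not CentOS {EXPECTED_CENTOS[0]}.{EXPECTED_CENTOS[1]}")
    if vivado_banner and vivado_version(vivado_banner) != EXPECTED_VIVADO:
        problems.append(f"Vivado is not {EXPECTED_VIVADO}")
    return problems


if __name__ == "__main__":
    for problem in check_host():
        print("mismatch:", problem)
```

The check only reports differences; whether a different kernel or tool version still works is outside what the table guarantees.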

Logical structure

Figure: Logical structure of a VU9P chip
A VU9P chip contains the following parts, as shown in the preceding figure:
  • SHELL

    Static region that contains the PCIe DMA/XDMA, the register access path, the DDR1 controller, and other control logic.

  • ROLE

    Dynamic region, which is the partial reconfiguration (PR) region. It contains three DDR controllers (DDR0, DDR2, and DDR3), the DMA interaction path, and the SerDes links for inter-chip and board-to-board interconnection.

  • Customer Logic

    Part of the ROLE region. You implement your own logic here and connect it to the fixed interfaces that the ROLE provides.
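
The split between the static and dynamic regions determines what changes when you load a new partial bitstream into the PR region. Note in particular that the DDR1 controller sits in the SHELL, while the other three DDR controllers sit in the ROLE. The following Python sketch encodes the list above as two sets; the resource labels are illustrative strings, not identifiers from the f3 HDK.

```python
"""SHELL/ROLE partition of a VU9P chip (a sketch).

The grouping follows the logical-structure list; the resource labels are
illustrative strings, not identifiers from the f3 HDK.
"""
from typing import FrozenSet, List

# Static region: not touched when the PR region is reprogrammed.
SHELL: FrozenSet[str] = frozenset({"PCIe DMA/XDMA", "register path", "DDR1"})

# Dynamic (PR) region, including the customer logic it hosts.
ROLE: FrozenSet[str] = frozenset(
    {"DDR0", "DDR2", "DDR3", "DMA interaction path", "SerDes", "customer logic"}
)


def region_of(resource: str) -> str:
    """Return 'SHELL' or 'ROLE' for a known resource label."""
    if resource in SHELL:
        return "SHELL"
    if resource in ROLE:
        return "ROLE"
    raise KeyError(f"unknown resource: {resource}")


def is_reconfigurable(resource: str) -> bool:
    """True if the resource is reprogrammed together with the PR region."""
    return region_of(resource) == "ROLE"


def ddr_channels(region: str) -> List[str]:
    """DDR channels whose controllers live in the given region."""
    members = {"SHELL": SHELL, "ROLE": ROLE}[region]
    return sorted(r for r in members if r.startswith("DDR"))


if __name__ == "__main__":
    print("SHELL DDR:", ddr_channels("SHELL"))
    print("ROLE DDR:", ddr_channels("ROLE"))
    print("XDMA reconfigurable?", is_reconfigurable("PCIe DMA/XDMA"))
```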

Description of user interfaces

The following list describes the user interfaces by category. For each signal, the direction (I = input) and the bit width are given in parentheses.

Clock and reset signals:

  • sys_alite_aclk (I, 1 bit): Clock of the register clock domain, 50 MHz.
  • sys_alite_aresetn (I, 1 bit): Reset signal of the register clock domain.
  • sys_clk_200m (I, 1 bit): 200 MHz user clock. You can use this clock to drive an MMCM that generates additional clocks.
  • sys_clk_rstn (I, 1 bit): Global reset signal.
  • kernel_clk_300m (I, 1 bit): 300 MHz user clock. The frequency is fixed by default and can be configured. We recommend that you use this clock as the master clock.
  • kernel_clk_rstn (I, 1 bit): Reset signal of kernel_clk_300m.
  • kernel2_clk_500m (I, 1 bit): 500 MHz user clock. The frequency is fixed by default and can be configured.
  • kernel2_clk_rstn (I, 1 bit): Reset signal of kernel2_clk_500m.
  • pcie_axi_aclk (I, 1 bit): PCIe AXI clock. Clock of the PCIe core clock domain, which is also the clock domain of the XDMA, DMA, and int interfaces.
  • pcie_axi_arstn (I, 1 bit): PCIe core reset signal.
  • c0_ddr4_ui_clk (I, 1 bit): Clock of the DDR0 channel clock domain.
  • c0_ddr4_rstn (I, 1 bit): Reset signal of the DDR0 channel clock domain.
  • c1_ddr4_ui_clk (I, 1 bit): Clock of the DDR1 channel clock domain.
  • c1_ddr4_rstn (I, 1 bit): Reset signal of the DDR1 channel clock domain.
  • c2_ddr4_ui_clk (I, 1 bit): Clock of the DDR2 channel clock domain.
  • c2_ddr4_rstn (I, 1 bit): Reset signal of the DDR2 channel clock domain.
  • c3_ddr4_ui_clk (I, 1 bit): Clock of the DDR3 channel clock domain.
  • c3_ddr4_rstn (I, 1 bit): Reset signal of the DDR3 channel clock domain.

AXI-MM interfaces (XDMA, DMA, AXI-Lite, and DDR0/1/2/3):

  These are standard AXI4 memory-mapped (AXI-MM) bus interfaces, so no single direction or bit width is listed.
  • For more information, see the AXI4_specification document.
  • XDMA: see Xilinx PG195 on the official Xilinx website.
  • DMA: see Xilinx PG194 on the official Xilinx website.
  • AXI-Lite: provides 8 MB of register access space to user logic.

Notice: Take note of the clock domain of each interface when you use it.

Interrupt interface:

  • int (16 bits): 16 interrupt lines, each of which can report an interrupt independently. The interface is in the pcie_axi_aclk clock domain.

AXI-Stream interfaces (inter-chip interconnect and card interconnect):

  These lightweight interconnect interfaces use the Xilinx Aurora protocol. For more information, see Xilinx PG074 on the official Xilinx website.
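
Because the interfaces sit in several clock domains, it can help to keep an explicit map of which clock drives each interface and to flag every path that crosses domains, since such paths need synchronizers or asynchronous FIFOs. The following Python sketch encodes the interface list. The AXI-Lite to sys_alite_aclk mapping is an assumption based on the signal name, and treating any two differently named clocks as unrelated is a deliberately conservative choice.

```python
"""Clock-domain lookup for the f3 user interfaces (a sketch).

Frequencies and interface-to-clock mappings come from the interface list,
except AXI-Lite -> sys_alite_aclk, which is an ASSUMPTION based on the signal
name and its "register clock domain" description.
"""
from typing import Dict

# Default user-clock frequencies listed for the interfaces, in MHz.
CLOCK_MHZ: Dict[str, float] = {
    "sys_alite_aclk": 50.0,
    "sys_clk_200m": 200.0,
    "kernel_clk_300m": 300.0,
    "kernel2_clk_500m": 500.0,
}

# Clock domain of each interface.
INTERFACE_DOMAIN: Dict[str, str] = {
    "XDMA": "pcie_axi_aclk",
    "DMA": "pcie_axi_aclk",
    "int": "pcie_axi_aclk",
    "AXI-Lite": "sys_alite_aclk",  # assumption, see module docstring
    "DDR0": "c0_ddr4_ui_clk",
    "DDR1": "c1_ddr4_ui_clk",
    "DDR2": "c2_ddr4_ui_clk",
    "DDR3": "c3_ddr4_ui_clk",
}


def period_ns(clock: str) -> float:
    """Clock period in nanoseconds for a clock with a listed frequency."""
    return 1000.0 / CLOCK_MHZ[clock]


def domain_of(name: str) -> str:
    """Map an interface name to its clock; clock names map to themselves."""
    return INTERFACE_DOMAIN.get(name, name)


def needs_cdc(a: str, b: str) -> bool:
    """Conservatively treat any two differently named clocks as asynchronous."""
    return domain_of(a) != domain_of(b)


if __name__ == "__main__":
    for clock in CLOCK_MHZ:
        print(f"{clock}: {period_ns(clock):.3f} ns")
    # Customer logic on the recommended master clock accessing DDR0:
    print("kernel_clk_300m <-> DDR0 needs CDC:", needs_cdc("kernel_clk_300m", "DDR0"))
```

For example, customer logic clocked by kernel_clk_300m that reads DDR0 crosses into c0_ddr4_ui_clk, so needs_cdc returns True, whereas XDMA and int share pcie_axi_aclk and do not.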

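The 8 MB AXI-Lite register space and the 16-bit int interface both reduce to simple bit arithmetic: 8 MB is 0x800000 bytes, which needs 23 address bits, and 16 independent interrupts fit in a 16-bit vector. The following Python sketch shows a range check for register offsets and packing and unpacking of interrupt numbers. It assumes, without confirmation from the f3 HDK, that offsets are relative to the start of the window, that registers are 32 bits wide and word-aligned, and that bit i of the int vector corresponds to interrupt i.

```python
"""Range checks for the 8 MB AXI-Lite register window and helpers for the
16-bit int interface (a sketch).

ASSUMPTIONS, not taken from the f3 HDK: offsets are relative to the start of
the window, registers are 32 bits wide and word-aligned, and bit i of the
int vector corresponds to interrupt i.
"""
from typing import Iterable, List

AXIL_WINDOW_BYTES = 8 * 1024 * 1024  # 8 MB = 0x800000 bytes, 23 address bits
REG_BYTES = 4                        # assumed 32-bit registers
NUM_IRQS = 16                        # width of the int interface


def check_reg_offset(offset: int) -> int:
    """Return the offset if it is a word-aligned address inside the window."""
    if not 0 <= offset < AXIL_WINDOW_BYTES:
        raise ValueError(f"offset {offset:#x} is outside the 8 MB window")
    if offset % REG_BYTES:
        raise ValueError(f"offset {offset:#x} is not {REG_BYTES}-byte aligned")
    return offset


def irq_vector(lines: Iterable[int]) -> int:
    """Pack interrupt numbers (0-15) into a 16-bit int vector."""
    vector = 0
    for line in lines:
        if not 0 <= line < NUM_IRQS:
            raise ValueError(f"interrupt {line} does not exist")
        vector |= 1 << line
    return vector


def pending_irqs(vector: int) -> List[int]:
    """Unpack a 16-bit int vector into the list of asserted interrupt numbers."""
    return [i for i in range(NUM_IRQS) if (vector >> i) & 1]


if __name__ == "__main__":
    print(hex(check_reg_offset(0x7FFFFC)))  # last 32-bit register in the window
    vec = irq_vector([0, 3, 15])
    print(hex(vec), pending_irqs(vec))
```

Remember that the int interface is in the pcie_axi_aclk clock domain, so logic that raises interrupts from another clock must cross into that domain first.
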
Terms

Term      Description
FaaS      FPGA as a Service
HDK       Hardware development kit
SDK       Software development kit
SHELL     Static logic, including external interfaces such as PCIe and DDR4
Role      Dynamic logic; the partial reconfiguration (PR) region
CL        Customer logic, provided by developers
PR        Partial reconfiguration
MGNTPF    Management physical function
USRPF     User physical function
OpenCL    Open Computing Language
HAL       Hardware abstraction layer