Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems | 2021

When application-specific ISA meets FPGAs: a multi-layer virtualization framework for heterogeneous cloud FPGAs

 
 

Abstract


While field-programmable gate arrays (FPGAs) have been widely deployed in cloud platforms, the high programming complexity and the inability to manage FPGA resources in an elastic and scalable manner largely limit the adoption of FPGA acceleration. Existing FPGA virtualization mechanisms partially address these limitations. An application-specific (AS) ISA provides a convenient abstraction that enables a simple software programming flow, making FPGA acceleration accessible to mainstream software application developers. Nevertheless, existing AS ISA-based approaches can only manage FPGA resources at a per-device granularity, leading to low resource utilization. Alternatively, a hardware-specific (HS) abstraction improves resource utilization by spatially sharing one FPGA among multiple applications, but it cannot reduce the programming complexity because it lacks a high-level programming model. In this paper, we propose a virtualization mechanism for heterogeneous cloud FPGAs that combines AS ISA and HS abstraction to fully address the aforementioned limitations. To efficiently combine these two abstractions, we provide a multi-layer virtualization framework with a new system abstraction as an indirection layer between them. This indirection layer hides the FPGA-specific resource constraints and leverages parallel patterns to effectively reduce the mapping complexity. It simplifies the mapping process into two steps: the first step decomposes an AS ISA-based accelerator under no resource constraints to extract all fine-grained parallel patterns, and the second step leverages the extracted parallel patterns to simplify the process of mapping the decomposed accelerators onto the underlying HS abstraction. While system designers might be able to perform these steps manually for small accelerator designs, we develop a set of custom tools to automate this process and achieve high mapping quality.
By hiding FPGA-specific resource constraints, the proposed system abstraction provides a homogeneous view of the heterogeneous cloud FPGAs, which simplifies runtime resource management. The extracted parallel patterns can also be leveraged by the runtime system to improve the performance of scale-out acceleration by maximally hiding the inter-FPGA communication latency. As a case study, we use an AS ISA similar to the one proposed in the BrainWave project and a recently proposed HS abstraction to demonstrate the effectiveness of the proposed virtualization framework. The performance is evaluated on a custom-built FPGA cluster with heterogeneous FPGA resources. Compared with a baseline system that uses only the AS ISA, the proposed framework effectively combines these two abstractions and improves the aggregated system throughput by 2.54× with a marginal virtualization overhead.
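
The two-step mapping described above can be illustrated with a toy sketch. It is not the paper's toolchain: the `Unit` type, the `decompose` and `map_units` functions, the abstract integer "cost", and the device names are all hypothetical, and the greedy "most free capacity first" placement is a stand-in heuristic chosen only for brevity. Step 1 splits an accelerator into fine-grained data-parallel units without looking at any device limits; step 2 places those units onto devices of different sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One fine-grained piece of a decomposed accelerator (hypothetical)."""
    name: str
    cost: int  # abstract resource units, not real LUT/DSP/BRAM counts


def decompose(total_cost, parallelism):
    """Step 1 (sketch): split into data-parallel units, ignoring device limits."""
    base, extra = divmod(total_cost, parallelism)
    return [Unit(f"u{i}", base + (1 if i < extra else 0))
            for i in range(parallelism)]


def map_units(units, capacities):
    """Step 2 (sketch): place units on heterogeneous devices.

    Assumed heuristic: largest unit first, onto the device that currently
    has the most free capacity. Raises ValueError if a unit does not fit.
    """
    free = dict(capacities)
    placement = {device: [] for device in capacities}
    for unit in sorted(units, key=lambda u: u.cost, reverse=True):
        target = max(free, key=free.get)
        if free[target] < unit.cost:
            raise ValueError(f"no device can hold {unit.name}")
        free[target] -= unit.cost
        placement[target].append(unit.name)
    return placement


# Hypothetical example: one accelerator of cost 10 with 4-way parallelism,
# mapped onto a large and a small device.
units = decompose(total_cost=10, parallelism=4)
capacities = {"big_fpga": 8, "small_fpga": 4}
placement = map_units(units, capacities)
```

Because the decomposition in step 1 never consults `capacities`, the same list of units could be remapped onto a different mix of devices without repeating it, which is the property the indirection layer is meant to provide.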

DOI 10.1145/3445814.3446699
Language English
