IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2021
Cheetah: An Accurate Assessment Mechanism and a High-Throughput Acceleration Architecture Oriented Toward Resource Efficiency
Abstract
Convolutional neural networks (CNNs) are widely used in artificial intelligence for their excellent recognition accuracy. As their scale grows rapidly and their architectures become more complicated, implementing CNNs efficiently on hardware platforms becomes much more difficult. Many FPGA-based CNN accelerators have been proposed in previous work. However, when evaluating resource efficiency, their assessment methods are 1) device related, 2) frequency related, or 3) prone to confusing resource efficiency with resource occupancy. There is a pressing demand for intuitive and fair assessment criteria. Moreover, existing accelerators still have room for improvement in computing resource efficiency when implementing CNNs, especially for layers with large feature sizes and few feature maps. In this work, we propose $R_{\mathrm{score}}$ and $C_{\mathrm{score}}$, which together compose a comprehensive and accurate resource efficiency assessment mechanism, for evaluation and design guidance, respectively. Under this guidance, we introduce Cheetah, an FPGA-based high-throughput acceleration architecture. Its computing part optimizes the use of available resources in both time and space, resulting in higher throughput. An auxiliary storage system and a pipeline stage compression method are designed to reduce storage overhead and shorten inference latency. We implement AlexNet and ResNet18 on the KCU1500 at 230 and 240 MHz, respectively, achieving throughputs of 2411.01 GOP/s and 2435.05 GOP/s with 16-bit quantization. Cheetah achieves an excellent average $R_{\mathrm{score}}$ of (0.9441, 0.9456) on different FPGA devices, while the scores of other accelerators mainly fall between 0.3 and 0.8.
Finally, Cheetah achieves a 6.78× speedup and a 1.87× power-efficiency improvement over the Nvidia Jetson TX2, which is the fastest, most power-efficient embedded AI computing device.
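To illustrate why a frequency-related metric can obscure comparisons, the reported throughputs can be normalized by clock frequency to give operations per cycle. The sketch below is only an illustration built from the figures quoted in the abstract. It is not the definition of $R_{\mathrm{score}}$ or $C_{\mathrm{score}}$, which the abstract does not give. The function name `ops_per_cycle` is hypothetical, and 1 GOP is assumed to mean 10^9 operations.

```python
# Illustrative only: frequency-normalized throughput from the abstract's figures.
# This is NOT R_score or C_score; their definitions are not given in the abstract.

def ops_per_cycle(throughput_gops: float, freq_mhz: float) -> float:
    """Return operations per clock cycle.

    Assumes 1 GOP/s = 1e9 operations per second and 1 MHz = 1e6 cycles per second.
    """
    return (throughput_gops * 1e9) / (freq_mhz * 1e6)


# Figures quoted in the abstract (16-bit quantization, KCU1500).
reported = {
    "AlexNet": (2411.01, 230.0),   # (GOP/s, MHz)
    "ResNet18": (2435.05, 240.0),  # (GOP/s, MHz)
}

for name, (gops, mhz) in reported.items():
    print(f"{name}: {ops_per_cycle(gops, mhz):.2f} operations per cycle")
```

Under these assumptions, ResNet18 has the higher raw throughput only because of its higher clock, while AlexNet completes more operations per cycle. This is the kind of distinction a frequency-independent metric is meant to expose.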