Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays | 2019

Embracing Systolic: Super Systolization of Large-Scale Circulant Matrix-vector Multiplication on FPGA with Subquadratic Space Complexity

 
 

Abstract


Recent advances in artificial intelligence (AI) technology have led to a new round of innovation in systolic structures. Many AI accelerators employ a systolic structure to realize the core large-scale matrix-vector multiplication for high-performance processing. This operation has a complexity of $O(n^2)$ for a matrix of size $n \times n$, which makes it difficult to implement on a field-programmable gate array (FPGA) platform. To overcome this drawback, we propose a super systolization strategy that implements the core circulant matrix-vector multiplication as a systolic structure with subquadratic space complexity. The work is carried out in two interdependent stages: (i) a novel matrix-vector multiplication algorithm based on the Toeplitz matrix-vector product (TMVP) approach is proposed to obtain subquadratic space complexity; (ii) a series of optimization techniques is introduced to map the proposed algorithm onto the desired systolic structure. Finally, a detailed complexity analysis and comparison are presented to demonstrate the efficiency of the proposed strategy. The proposed strategy is highly efficient and can be extended to many neural-network-based hardware implementation platforms.
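To make the subquadratic idea concrete, the following is a minimal sketch of the textbook two-way TMVP split applied to a circulant matrix, which is a special case of a Toeplitz matrix. It is not the algorithm proposed in the paper, whose details are not given in the abstract. The function names and the diagonal-vector representation are assumptions made for illustration, and $n$ is assumed to be a power of two.

```python
def circulant_matvec_direct(c, v):
    """Reference O(n^2) product: C[i][j] = c[(i - j) % n], c is the first column."""
    n = len(c)
    return [sum(c[(i - j) % n] * v[j] for j in range(n)) for i in range(n)]


def circulant_to_toeplitz_diagonals(c):
    """Store the circulant as a Toeplitz matrix via its 2n-1 diagonals.

    Convention (an assumption of this sketch): T[i][j] = d[i - j + n - 1].
    """
    n = len(c)
    return [c[(k - (n - 1)) % n] for k in range(2 * n - 1)]


def tmvp(d, v):
    """Two-way recursive Toeplitz matrix-vector product.

    Splits T = [[T1, T0], [T2, T1]] and v = [v0; v1], then uses three
    half-size products instead of four:
        P0 = (T0 - T1) v1,  P1 = (T2 - T1) v0,  P2 = T1 (v0 + v1)
        T v = [P0 + P2; P1 + P2]
    """
    n = len(v)
    if n == 1:
        return [d[0] * v[0]]
    h = n // 2
    # Diagonal vectors of the three distinct half-size Toeplitz blocks.
    d0, d1, d2 = d[:2 * h - 1], d[h:3 * h - 1], d[2 * h:]
    v0, v1 = v[:h], v[h:]
    p0 = tmvp([a - b for a, b in zip(d0, d1)], v1)
    p1 = tmvp([a - b for a, b in zip(d2, d1)], v0)
    p2 = tmvp(d1, [a + b for a, b in zip(v0, v1)])
    top = [a + b for a, b in zip(p0, p2)]
    bottom = [a + b for a, b in zip(p1, p2)]
    return top + bottom


def circulant_matvec_tmvp(c, v):
    """Circulant product computed through the recursive TMVP split."""
    return tmvp(circulant_to_toeplitz_diagonals(c), v)


if __name__ == "__main__":
    c = [1, 2, 3, 4]
    v = [5, 6, 7, 8]
    print(circulant_matvec_direct(c, v))
    print(circulant_matvec_tmvp(c, v))
```

Each level of this recursion replaces four half-size products with three, so the number of scalar multiplications grows as $n^{\log_2 3} \approx n^{1.585}$ rather than $n^2$. How such a recursion is mapped onto a systolic array is the subject of the paper's second stage and is not modeled here.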

DOI 10.1145/3289602.3293968
Language English
Journal Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
