J. Syst. Archit. | 2021

Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging

Abstract


Embedded devices are common carriers for deploying inference networks, which leverage customized accelerators to achieve the promised performance under strict resource constraints. In Deep Neural Network (DNN) inference, the sparsity present in the activations and weights of every layer produces massive numbers of ineffective memory accesses and computing operations. Data compression is adopted as a data pruning method in accelerator design, eliminating zero-valued data through a specific data packaging method. However, data compression, to varying degrees, breaks the data regularity that the processing arrays of DNN accelerators compute with. The complexity of accessing irregularly organized data must be compensated with extra control and decoding logic. A sparsity-aware accelerator architecture can balance memory accessing and computing by combining sophisticated memory-access scheduling and a parallel on-chip decoder structure with an efficient data packaging method. In this paper, we propose a flexible and highly parallel accelerator architecture that uses a quantitative data packaging method, efficient and stable across different degrees of sparsity, together with parallel optimization, to exploit the sparsity in DNNs and achieve high performance with low energy consumption. The total DRAM accesses, performance, and energy consumption of the proposed sparse architecture are evaluated on different inference networks. Experiments show that the DRAM accesses of the proposed data packaging method are significantly lower than those of other commonly used sparse compression storage formats; after adopting the optimizations proposed in this paper, the sparse accelerator architecture improves performance by up to 1.2x and saves up to 1.6x energy over a comparably provisioned accelerator without sparsity support. In addition, the proposed accelerator architecture achieves energy-efficiency and performance improvements of up to 1.70x and 1.56x, respectively, compared with state-of-the-art architectures.
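As a rough illustration of the zero-eliminating packaging the abstract describes, the sketch below packs an activation vector into a presence bitmap plus its nonzero values, so only effective data is stored and fetched. This bitmap-style format is a generic assumption for illustration only, not the paper's specific quantitative packaging method.

```python
def pack(values):
    """Pack a vector into (bitmap, nonzeros): the bitmap marks which
    positions hold nonzero data; only nonzero values are stored."""
    bitmap = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return bitmap, nonzeros

def unpack(bitmap, nonzeros):
    """Reconstruct the dense vector by walking the bitmap and consuming
    one stored value per set bit (the on-chip decoder's job)."""
    it = iter(nonzeros)
    return [next(it) if b else 0 for b in bitmap]

# Example: a sparse activation vector.
acts = [0, 3, 0, 0, 7, 1, 0, 0]
bitmap, nz = pack(acts)       # bitmap: [0,1,0,0,1,1,0,0], nz: [3, 7, 1]
assert unpack(bitmap, nz) == acts
```

Note the trade-off the abstract points at: the packed form cuts memory traffic for sparse data, but a decoder must re-expand it before the regular processing array can consume it.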

Volume 117
Pages 102094
DOI 10.1016/J.SYSARC.2021.102094
Language English
Journal J. Syst. Archit.
