J. Syst. Archit. | 2021

Memory-efficient deep learning inference with incremental weight loading and data layout reorganization on edge systems

Abstract

Pattern recognition applications such as face recognition and agricultural product detection have attracted rapidly growing interest in Cyber–Physical–Social Systems (CPSS). These CPSS applications rely on deep neural networks (DNNs) to perform image classification. However, traditional cloud-based DNN inference can suffer from fluctuating network delay and privacy leakage. For this reason, real-time CPSS applications are preferably deployed on embedded devices at the edge. Because edge devices are constrained in computing power and memory, more efficient memory management is key to improving the quality of service of model inference. First, this study explores an incremental loading strategy for model weights during inference. Second, runtime memory usage is optimized through data layout reorganization along the spatial dimension. Notably, the proposed schemes are orthogonal to existing models. Experimental results demonstrate that the proposed approach reduces memory consumption by 61.05% without additional inference time overhead.
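The abstract does not describe how the incremental loading strategy is implemented. As a rough illustration of the general idea only, the following sketch assumes a purely sequential network whose per-layer weights are stored in separate files, so that only one layer's weights are resident at a time. The helper names (`save_layers`, `dense_relu`, `incremental_inference`), the pickle file format, and the dense-plus-ReLU layers are all hypothetical choices made for this example and are not taken from the paper.

```python
import os
import pickle


def save_layers(layers, directory):
    """Write each layer's weight matrix to its own file; return the paths.

    Hypothetical storage scheme: one pickle file per layer.
    """
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(weights, f)
        paths.append(path)
    return paths


def dense_relu(x, weights):
    """Compute relu(W x) for a weight matrix stored as a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def incremental_inference(x, paths):
    """Run the layers in order, holding one layer's weights at a time.

    Returns the output vector and the largest number of weights that
    were resident at once (an assumption-level proxy for peak memory).
    """
    peak_resident = 0
    for path in paths:
        with open(path, "rb") as f:
            weights = pickle.load(f)  # load only this layer's weights
        peak_resident = max(peak_resident, sum(len(row) for row in weights))
        x = dense_relu(x, weights)
        del weights  # drop the reference before loading the next layer
    return x, peak_resident
```

In this toy setting, peak weight residency is bounded by the largest single layer rather than by the sum over all layers, at the cost of a file read per layer. Calling `save_layers` on a list of weight matrices inside a temporary directory and passing the returned paths to `incremental_inference` is enough to try it out.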

Volume 118

Pages 102183

DOI 10.1016/J.SYSARC.2021.102183

Language English

Journal J. Syst. Archit.
