Neurocomputing | 2021

EDENet: Elaborate density estimation network for crowd counting

Abstract


For CNN-based density estimation approaches to crowd counting, how to generate a high-quality density map that combines accurate counting performance with a detailed spatial description is still an open question. To address this problem, we propose an end-to-end trainable architecture called Elaborate Density Estimation Network for Crowd Counting (EDENet), which progressively generates high-quality density estimation maps based on distributed supervision. Specifically, EDENet is composed of a Feature Extraction Network (FEN), a Feature Fusion Network (FFN), a Double-Head Network (DHN) and an Adaptive Density Fusion Network (ADFN). The FEN adopts VGG as its backbone and employs Spatial Adaptive Pooling (SAP) to extract coarse-grained features. The FFN fuses contextual and localization information to enhance the spatial description ability of fine-grained features. In the DHN, the Density Attention Module (DAM) provides foreground-background attention masks, guiding the Density Regression Module (DRM) to focus on the pixels around the head annotations when regressing density maps at different resolutions. The ADFN, built on an adaptive weighting mechanism, directly introduces coarse-grained density representations into high-resolution density maps to strengthen the commonality and dependency among density maps. Extensive experiments on four benchmark crowd datasets (ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU-Crowd) indicate that EDENet achieves state-of-the-art performance and high robustness. Moreover, its density maps attain the highest Peak Signal-to-Noise Ratio (PSNR) and can therefore be considered to be of high quality.
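The quantities named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It only shows, on tiny hand-written maps, how a density map yields a count (the sum of its pixels), how a foreground-background mask gates a density map, how a coarse map can be blended into a finer one, and how PSNR is computed. All function names are hypothetical. The fixed scalar weight `w` stands in for the learned adaptive weights of the ADFN, the 2x nearest-neighbour upsampling is an assumption, and taking the PSNR peak as the maximum of the reference map is one possible convention that the paper may not share.

```python
import math


def crowd_count(density):
    """Estimated head count: the sum of all density-map pixels."""
    return sum(sum(row) for row in density)


def apply_mask(density, mask):
    """Gate a density map with a foreground-background mask (element-wise product)."""
    return [[d * m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(density, mask)]


def upsample2x(density):
    """Nearest-neighbour 2x upsampling; values are divided by 4 so the count is preserved."""
    out = []
    for row in density:
        wide = [v / 4.0 for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(fine, coarse_up, w):
    """Convex combination of two equally sized maps.

    Assumption: a fixed scalar w replaces the learned adaptive weighting.
    """
    return [[w * f + (1.0 - w) * c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(fine, coarse_up)]


def psnr(pred, target, peak=None):
    """PSNR in dB between a predicted and a reference map of the same size."""
    if peak is None:
        # Assumption: the peak signal value is the maximum of the reference map.
        peak = max(max(row) for row in target)
    n = sum(len(row) for row in target)
    mse = sum((p - t) ** 2
              for p_row, t_row in zip(pred, target)
              for p, t in zip(p_row, t_row)) / n
    if mse == 0:
        return math.inf  # identical maps
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    coarse = [[4.0]]                    # 1x1 map holding four people
    fine = [[0.0, 2.0], [2.0, 0.0]]     # 2x2 map holding the same four people
    fused = fuse(fine, upsample2x(coarse), w=0.5)
    print("count:", crowd_count(fused))
    print("PSNR (dB):", psnr(fused, fine))
```

Because both inputs to `fuse` carry the same total count and the weights sum to one, the fused map keeps that count. The blend only redistributes density spatially, which is the sense in which a coarse map can inform a high-resolution one without changing the estimate.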

Volume 459
Pages 108-121
DOI 10.1016/j.neucom.2021.06.086
Language English
Journal Neurocomputing

Full Text