
Shuohao Li

National University of Defense Technology


Publication

Featured research published by Shuohao Li.

Scandinavian Conference on Image Analysis | 2017

Crowd Counting Based on MMCNN in Still Images

Tao Wang; Guohui Li; Jun Lei; Shuohao Li; Shukui Xu

Accurately estimating the crowd count from a still image with arbitrary perspective and arbitrary crowd density is one of the difficulties of crowd analysis in surveillance videos. Conventional methods are scene-specific and sensitive to occlusion. In this paper, we propose a Multi-task Multi-column Convolutional Neural Network (MMCNN) architecture for crowd counting and crowd density estimation in still images of surveillance scenes. The MMCNN architecture is an end-to-end system that is robust to images with different perspectives and crowd densities. By extending MCNN with 3×3 filters, MMCNN can exploit local spatial features in each column. Furthermore, the ground-truth density map is generated with Perspective-Adaptive Gaussian kernels, which better represent pedestrians' heads. Finally, we use an iterative switching process in our deep crowd model to alternately optimize the crowd density map estimation task and the crowd counting task. We conduct experiments on the WorldExpo’10 dataset, where our method achieves better results.
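Ground-truth density maps of this kind are usually built by placing one unit-mass Gaussian at every annotated head, so that summing the map gives the crowd count. The sketch below assumes the kernel width at each head is proportional to a per-head perspective value; `perspective` and `beta` are hypothetical inputs, and the paper's exact Perspective-Adaptive kernel is not reproduced here.

```python
import numpy as np

def density_map(shape, heads, perspective, beta=0.3, radius_mult=3):
    """Build a density map whose sum equals the number of annotated heads.

    shape       -- (H, W) of the image
    heads       -- list of (row, col) integer head coordinates
    perspective -- per-head perspective value (hypothetical input, e.g. pixels per metre)
    beta        -- assumed proportionality between perspective and Gaussian sigma
    """
    H, W = shape
    dmap = np.zeros((H, W), dtype=np.float64)
    for (r, c), p in zip(heads, perspective):
        sigma = max(beta * p, 1.0)  # heads closer to the camera get wider kernels
        rad = int(np.ceil(radius_mult * sigma))
        # Kernel window, clipped to the image border
        r0, r1 = max(r - rad, 0), min(r + rad + 1, H)
        c0, c1 = max(c - rad, 0), min(c + rad + 1, W)
        rows = np.arange(r0, r1)[:, None]
        cols = np.arange(c0, c1)[None, :]
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        dmap[r0:r1, c0:c1] += g / g.sum()  # renormalise so each head contributes exactly 1
    return dmap
```

For example, `density_map((120, 160), [(10, 10), (60, 80), (115, 150)], [5.0, 20.0, 40.0]).sum()` is 3 up to floating-point error. Renormalising the clipped window keeps heads near the border from being undercounted.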

International Conference on Image, Vision and Computing | 2016

Continuous action recognition based on hybrid CNN-LDCRF model

Jun Lei; Guohui Li; Shuohao Li; Dan Tu; Qiang Guo

Continuous action recognition in video is more challenging than traditional isolated action recognition. In this paper, we propose a hybrid framework combining a Convolutional Neural Network (CNN) and a Latent-Dynamic Conditional Random Field (LDCRF) to segment and recognize continuous actions simultaneously. Most existing action recognition works construct complex handcrafted features, which are highly problem-dependent. We use a CNN, a type of deep model, to automatically learn high-level action features directly from raw inputs. The LDCRF model is used to model the intrinsic and extrinsic dynamics of actions. The CNN is embedded in the bottom layer of the LDCRF, which converts the structure of the LDCRF from shallow to deep. This framework unifies action feature learning and continuous action recognition. The model is trained end-to-end: the parameters of the CNN and the LDCRF are jointly optimized by gradient descent. We test our method on two public datasets, KTH and HumanEva. Experiments show that our method achieves improved recognition accuracy compared with several other methods. We also demonstrate the superiority of the features learned by the CNN over handcrafted features.
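In an LDCRF, each action label owns a disjoint set of hidden states, and the per-frame label posterior is the sum of the marginals of that label's hidden states; taking the arg-max per frame segments and labels the sequence at once. The sketch below runs forward-backward over the hidden states. The per-frame scores would come from the CNN in a CNN-LDCRF model; here they are plain arrays, and the names `unary`, `trans` and `states_per_label` are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def ldcrf_label_posteriors(unary, trans, states_per_label):
    """Per-frame label posteriors for a linear-chain LDCRF.

    unary            -- (T, S) log-potentials of the S hidden states at each frame
    trans            -- (S, S) log transition potentials between hidden states
    states_per_label -- k; label y owns hidden states y*k .. y*k + k - 1
    """
    T, S = unary.shape
    k = states_per_label
    # Forward: alpha[t, s] = log-sum of scores of all paths ending in state s at frame t
    alpha = np.zeros((T, S))
    alpha[0] = unary[0]
    for t in range(1, T):
        alpha[t] = unary[t] + logsumexp(alpha[t - 1][:, None] + trans, axis=0)
    # Backward: beta[t, s] = log-sum of scores of all continuations from state s at frame t
    beta = np.zeros((T, S))
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(trans + unary[t + 1][None, :] + beta[t + 1][None, :], axis=1)
    log_z = logsumexp(alpha[-1], axis=0)
    state_marg = np.exp(alpha + beta - log_z)  # (T, S) hidden-state marginals
    # Sum the marginals of the hidden states owned by each label
    return state_marg.reshape(T, S // k, k).sum(axis=2)  # (T, num_labels)
```

Calling `ldcrf_label_posteriors(unary, trans, 2).argmax(axis=1)` on a `(T, 6)` score array would yield one of three labels per frame.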

Journal of Applied Remote Sensing | 2015

Enhanced coherent point drift algorithm for remote sensing image registration

Jun Zhang; Lin Lian; Jun Lei; Shuohao Li; Dan Tu

Remote sensing image registration is a key component in many computer vision tasks, since fusing registered multisensor images improves the understanding of the information they contain. After feature detection, image registration becomes a point set registration problem. The coherent point drift (CPD) algorithm is regarded as a powerful approach to point set registration. However, for junction sets, a serious problem arises with this algorithm: the structural information of a junction is not included in the Gaussian mixture model (GMM). To solve this problem, we present an enhanced coherent point drift (ECPD) algorithm. Based on the inherent characteristics of junctions, we define local structural consistency, which measures the similarity between two junctions. Furthermore, we incorporate local structural consistency into the posterior probabilities of the GMM components to achieve more accurate registration. Remote sensing image registration experiments show that the ECPD algorithm is more robust to noise and outliers than CPD and outperforms current state-of-the-art methods.
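For reference, the standard CPD E-step computes, for every GMM centroid and data point, a posterior probability whose denominator includes a constant from a uniform outlier component with weight w. One way to fold in a structural similarity, in the spirit the abstract describes, is to scale each Gaussian term by a per-pair weight. The `consistency` matrix below is a hypothetical hook: the paper's actual local structural consistency measure is not reproduced here, and passing all ones recovers standard CPD.

```python
import numpy as np

def cpd_posteriors(X, Y, sigma2, w=0.1, consistency=None):
    """E-step of coherent point drift (CPD).

    X           -- (N, D) data points
    Y           -- (M, D) GMM centroids (the moving point set)
    sigma2      -- isotropic GMM variance
    w           -- weight of the uniform outlier component, 0 <= w < 1
    consistency -- optional (M, N) non-negative weights; hypothetical structural term
    Returns P with P[m, n] = posterior probability that x_n was generated by y_m.
    """
    N, D = X.shape
    M = Y.shape[0]
    sq_dist = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (M, N)
    gauss = np.exp(-sq_dist / (2.0 * sigma2))
    if consistency is not None:
        gauss = gauss * consistency  # reweight centroid/point pairs
    # Constant contributed by the uniform outlier distribution
    outlier = (2.0 * np.pi * sigma2) ** (D / 2.0) * (w / (1.0 - w)) * (M / N)
    return gauss / (gauss.sum(axis=0, keepdims=True) + outlier)
```

Each column of `P` sums to slightly less than 1 when `w > 0`; the missing mass is the probability that the point is an outlier.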

Chinese Control and Decision Conference | 2014

An incremental structure learning approach for Bayesian Network

Shuohao Li; Jun Zhang; Boliang Sun; Jun Lei

Structure learning of Bayesian Networks (BNs) is an important topic in machine learning and is widely applied in expert systems. Traditional structure learning algorithms are usually designed for batch data, and it is difficult for them to learn the structure quickly from huge amounts of data. In many practical applications, however, the BN structure must be learned from time-series data as they become available. To achieve this goal, we propose an incremental structure learning approach for BNs. First, we propose a framework for incremental structure learning and a new evaluation criterion, ABIC (Adopt Bayesian Information Criterion), based on the BIC. Then, a three-phase algorithm is used to learn the structure. Numerical experiments on two standard networks show that the proposed algorithm greatly improves the accuracy of the learned structure and greatly reduces the total learning time.
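ABIC is built on the standard BIC. As a rough illustration of why BIC-style scores suit incremental learning, the sketch below keeps only sufficient statistics (counts per parent configuration) and updates them batch by batch, so a candidate structure can be rescored without revisiting old data. It computes plain BIC for discrete variables, not ABIC, whose definition is not given here; the class and method names are hypothetical.

```python
import math
from collections import Counter

class IncrementalBIC:
    """Standard BIC of a discrete Bayesian network, computed from incrementally updated counts.

    parents -- dict: node -> tuple of parent nodes (the candidate structure)
    arity   -- dict: node -> number of states of that node
    """

    def __init__(self, parents, arity):
        self.parents = parents
        self.arity = arity
        self.n = 0
        # counts[node][(parent_values, value)] and parent_counts[node][parent_values]
        self.counts = {v: Counter() for v in parents}
        self.parent_counts = {v: Counter() for v in parents}

    def update(self, batch):
        """Absorb a batch of records (dicts node -> state); earlier data is never revisited."""
        for row in batch:
            self.n += 1
            for v, pa in self.parents.items():
                cfg = tuple(row[p] for p in pa)
                self.counts[v][(cfg, row[v])] += 1
                self.parent_counts[v][cfg] += 1

    def score(self):
        """Log-likelihood at the MLE minus (log N / 2) * number of free parameters."""
        loglik, n_params = 0.0, 0
        for v, pa in self.parents.items():
            for (cfg, _), c in self.counts[v].items():
                loglik += c * math.log(c / self.parent_counts[v][cfg])
            q = math.prod(self.arity[p] for p in pa)  # number of parent configurations
            n_params += q * (self.arity[v] - 1)
        return loglik - 0.5 * math.log(self.n) * n_params
```

With 100 records in which `B` always copies `A`, the structure `A -> B` scores about -76.2 against about -143.2 for the empty graph, so a score-based search would keep the edge. The paper's three-phase search itself is not sketched here.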

International Conference on Image, Vision and Computing | 2017

Fine-grained object detection based on self-adaptive anchors

Kaili Ma; Jun Zhang; Fenglei Wang; Dan Tu; Shuohao Li

Fine-grained object detection is an extremely challenging problem because of the subtle variations in appearance. At present, Faster R-CNN is one of the best detection systems. However, it is not wise to apply Faster R-CNN directly to fine-grained object detection. By analyzing the characteristics of fine-grained objects, we found that the anchor mechanism in Faster R-CNN is highly redundant. We therefore use self-adaptive anchors to enhance the structure of the system and combine the detection and classification of fine-grained objects. With self-adaptive anchors, we make new progress on the small-scale fine-grained Stanford Cars dataset, raising the detection mean average precision (mAP) to 88.9%. We also observe that the mechanism does not reduce performance when used for non-fine-grained detection. This mechanism, named self-adaptive anchors, can therefore serve as a general idea in object detection.
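For context, the fixed anchor set that the abstract calls redundant is, in the common Faster R-CNN configuration, 3 scales × 3 aspect ratios at every feature-map position. The sketch below generates that set in a simplified form (no integer rounding, centred at the origin) and tiles it over a feature map; base size 16, scales (8, 16, 32) and ratios (0.5, 1, 2) are the usual defaults, and how the paper makes the anchors self-adaptive is not shown.

```python
import numpy as np

def base_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (len(ratios) * len(scales), 4) anchors (x1, y1, x2, y2) centred at the origin.

    Each anchor has area (base_size * scale) ** 2 and height / width equal to ratio.
    """
    anchors = []
    for ratio in ratios:
        for scale in scales:
            side = base_size * scale
            w = side / np.sqrt(ratio)  # from h / w = ratio and w * h = side ** 2
            h = side * np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def tile_anchors(anchors, feat_h, feat_w, stride=16):
    """Place every base anchor at every feature-map position (stride in image pixels)."""
    ys, xs = np.meshgrid(np.arange(feat_h) * stride, np.arange(feat_w) * stride, indexing="ij")
    shifts = np.stack([xs, ys, xs, ys], axis=-1).reshape(-1, 1, 4)
    return (shifts + anchors[None]).reshape(-1, 4)
```

For a 38×50 feature map this produces 38 × 50 × 9 = 17,100 anchors, which gives a sense of how many candidates a detector must score when only a narrow range of object shapes actually occurs.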

International Conference on Image, Vision and Computing | 2017

Holistic Vertical Regional Proposal Network for scene text detection

Xu Chen; Qiang Guo; Shuohao Li; Jun Zhang

Scene text detection is an important research problem in the computer vision community, with great application value in many fields. Inspired by Faster R-CNN, a popular object detection method, we apply the Regional Proposal Network (RPN) to scene text detection, since text can be regarded as a common object. The core idea of RPN is to detect objects of different sizes with anchors of different sizes. However, when RPN is applied directly, it is difficult to design enough anchors at different scales to cover text boxes of widely varying sizes. We therefore adjust the anchor settings and use vertical anchors to break the restrictions of the receptive field. In addition, we draw on the multi-scale network Holistically-Nested Edge Detection (HED), which produces side outputs at different stages of the neural network. The bottom layers have smaller receptive fields and represent the features of small text areas in the image. The high-level side outputs have larger receptive fields and handle large text areas better. We combine the advantages of RPN and HED and propose a Holistic Vertical Regional Proposal Network (HVRPN) for scene text detection; our model shows good results on ICDAR03 and ICDAR11.
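A vertical anchor fixes the width and varies only the height, so a text line is covered by a sequence of narrow proposals instead of one box with a matching aspect ratio. The sketch below borrows the settings popularised by CTPN (width 16 and ten heights from 11 pixels upward, each 1/0.7 times the previous). These values are an assumption, since the abstract does not state HVRPN's anchor settings.

```python
import numpy as np

def vertical_anchors(feat_w, stride=16, width=16, min_h=11.0, n_heights=10, growth=1 / 0.7, cy=0.0):
    """Fixed-width, variable-height anchors, one group per feature-map column.

    Returns (feat_w * n_heights, 4) boxes (x1, y1, x2, y2), vertically centred at cy.
    """
    heights = min_h * growth ** np.arange(n_heights)  # 11, 15.7, 22.4, ..., about 272.6
    boxes = []
    for col in range(feat_w):
        cx = col * stride + stride / 2  # horizontal centre of this column
        for h in heights:
            boxes.append([cx - width / 2, cy - h / 2, cx + width / 2, cy + h / 2])
    return np.array(boxes)
```

Because every anchor is only 16 pixels wide, long text lines need no special aspect ratios: adjacent positive proposals are merged into a line afterwards, and only the height range has to match the text.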

IET Computer Vision | 2017

Deep neural network with attention model for scene text recognition

Shuohao Li; Min Tang; Qiang Guo; Jun Lei; Jun Zhang

The authors present a deep neural network (DNN) with an attention model for scene text recognition. The proposed model does not require any segmentation of the input text image. The framework is inspired by attention models recently presented for speech recognition and image captioning. In the proposed framework, feature extraction, feature attention and sequence recognition are integrated in a jointly trainable network. Compared with previous approaches, the main contributions are as follows. (i) The attention model is applied in a DNN to recognise scene text, effectively solving the sequence recognition problem posed by variable-length labels. (ii) Rigorous experiments are performed across a number of challenging benchmarks, including the IIIT5K, SVT, ICDAR2003 and ICDAR2013 datasets. The results show that the proposed model is comparable to or better than state-of-the-art methods. (iii) The model contains only 6.5 million parameters, the fewest of any DNN model for scene text recognition so far.
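The attention step in such a decoder can be sketched as additive (Bahdanau-style) attention: at each output step the decoder state scores every encoder feature column, a softmax turns the scores into weights, and the weighted sum becomes the context for predicting the next character. Whether the paper uses exactly this scoring function is an assumption; the parameter names `W_s`, `W_h` and `v` are illustrative.

```python
import numpy as np

def additive_attention(state, feats, W_s, W_h, v):
    """One decoding step of additive attention.

    state -- (d_s,) decoder hidden state
    feats -- (L, d_h) encoder features, one per horizontal position in the text image
    W_s   -- (d_a, d_s), W_h -- (d_a, d_h), v -- (d_a,) learned parameters
    Returns (context of shape (d_h,), weights of shape (L,)).
    """
    # e_i = v . tanh(W_h h_i + W_s s) for every position i
    scores = np.tanh(feats @ W_h.T + W_s @ state) @ v
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over positions
    context = weights @ feats                        # weighted sum of features
    return context, weights
```

In a typical attention decoder this step repeats until an end-of-sequence symbol is emitted, so the output length is not tied to the image width; that is how attention handles variable-length labels without explicit character segmentation.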

IET Computer Vision | 2017

Generating image descriptions with multidirectional 2D long short-term memory

Shuohao Li; Jun Zhang; Qiang Guo; Jun Lei; Dan Tu

Connecting visual imagery with descriptive language is a challenge for computer vision and machine translation. To approach this problem, the authors propose a novel end-to-end model to generate descriptions for images. Some earlier works used a convolutional neural network-long short-term memory (CNN-LSTM) model to describe images, where a CNN encodes the input image into a feature vector and an LSTM decodes the feature vector into a description. Since a two-dimensional LSTM (2DLSTM) is translation invariant and can encode the relationships between regions in an image, the authors not only apply a CNN to extract global features of an image but also use a multidirectional 2DLSTM to encode the feature maps extracted by the CNN into structural local features. The model is trained by maximising the likelihood of the target description sentence from the training dataset. Experiments on two challenging datasets show the accuracy of the model and the fluency of the language it learns. The authors compare the bilingual evaluation understudy (BLEU) score and a retrieval metric of their results with current state-of-the-art scores, and show improvements on Flickr30k and MS COCO.
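A 2DLSTM cell at grid position (i, j) receives the hidden and cell states of its upper and left neighbours, with a separate forget gate for each. A multidirectional version runs four such scans, one from each corner, and concatenates the results so that every position sees context from the whole feature map. The NumPy sketch below uses random, untrained weights and hypothetical names; it only illustrates the data flow and is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm2d(x, W, b, hidden):
    """Scan a (rows, cols, C) feature map starting from the top-left corner.

    W -- (5 * hidden, C + 2 * hidden) weights for the gates
         [input, output, candidate, forget_up, forget_left]; b -- (5 * hidden,) biases.
    """
    rows, cols, _ = x.shape
    h = np.zeros((rows + 1, cols + 1, hidden))  # padded row/column 0 holds zero states
    c = np.zeros((rows + 1, cols + 1, hidden))
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            z = W @ np.concatenate([x[i - 1, j - 1], h[i - 1, j], h[i, j - 1]]) + b
            ig, og, g, fu, fl = np.split(z, 5)
            c[i, j] = (sigmoid(ig) * np.tanh(g)
                       + sigmoid(fu) * c[i - 1, j]   # forget gate for the upper neighbour
                       + sigmoid(fl) * c[i, j - 1])  # forget gate for the left neighbour
            h[i, j] = sigmoid(og) * np.tanh(c[i, j])
    return h[1:, 1:]

def multidirectional_lstm2d(x, params, hidden):
    """Run four scans (one per corner) by flipping the input, then flip each output back."""
    outs = []
    flips = [(False, False), (False, True), (True, False), (True, True)]
    for (W, b), (flip_r, flip_c) in zip(params, flips):
        xi = x[::-1] if flip_r else x
        xi = xi[:, ::-1] if flip_c else xi
        o = lstm2d(xi, W, b, hidden)
        o = o[::-1] if flip_r else o
        o = o[:, ::-1] if flip_c else o
        outs.append(o)
    return np.concatenate(outs, axis=-1)  # (rows, cols, 4 * hidden)
```

Each scan alone only sees context above and to the left of a position (after flipping, the corresponding other quadrant); concatenating the four gives every output cell a view of the full map.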

International Conference on Image, Vision and Computing | 2016

Image semantic segmentation based on FCN-CRF model

Hao Zhou; Jun Zhang; Jun Lei; Shuohao Li; Dan Tu

Image segmentation is key to analyzing and understanding images and occupies an important position in image processing. Recent studies have attempted to tackle pixel-level labeling tasks using deep learning. In this paper, we propose an approach that combines a fully convolutional network (FCN) and a conditional random field (CRF) for image semantic segmentation. We use the FCN to automatically learn features directly from the original image data, and obtain local predictions with global structural consistency by combining fine and coarse layers. The CRF, a probabilistic graphical model, is used to fully exploit contextual information. Our model trains the whole deep network end-to-end with back-propagation and maximum likelihood estimation. The key to joining the FCN and the CRF is the sensitivity of neurons: the sensitivity is calculated by the CRF and then transferred to the FCN. Experiments show that our method achieves improved accuracy compared with several other methods on the PASCAL VOC 2012 dataset.
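To make the FCN-CRF coupling more concrete, here is mean-field inference for a simple grid CRF with a Potts pairwise term over 4-connected neighbours, taking FCN logits as unary scores. This CRF is an assumed, deliberately simplified form, since the abstract does not specify the pairwise potentials. Every operation is differentiable, so the gradient with respect to the unary logits (the kind of quantity the abstract calls sensitivity) could be obtained by back-propagating through these iterations.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_potts(unary_logits, w=1.0, n_iters=5):
    """Approximate CRF marginals Q of shape (H, W, K) from FCN logits.

    Assumed energy: E(y) = -sum_i logit_i(y_i) + w * sum_{i~j} [y_i != y_j]
    over 4-connected neighbour pairs i~j.
    """
    Q = softmax(unary_logits)
    for _ in range(n_iters):
        # Sum of the four neighbours' label distributions (zero padding at borders)
        P = np.pad(Q, ((1, 1), (1, 1), (0, 0)))
        msg = P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:]
        # The Potts penalty w*[y_i != y_j] equals -w*[y_i == y_j] up to a per-pixel
        # constant, so agreeing neighbours raise the score of label k by w * msg_k
        Q = softmax(unary_logits + w * msg)
    return Q
```

As a usage example, if the logits favour label 0 everywhere except one pixel that weakly favours label 1, `mean_field_potts(logits, w=2.0)` relabels that pixel to 0: the pairwise term smooths isolated errors that the FCN alone would keep.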

arXiv: Computer Vision and Pattern Recognition | 2015