Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Qiang Guo is active.

Publication


Featured research published by Qiang Guo.


Neurocomputing | 2016

Convolutional feature learning and Hybrid CNN-HMM for scene number recognition

Qiang Guo; Fenglei Wang; Jun Lei; Dan Tu; Guo-Hui Li

In this work, we investigate the recognition of house numbers captured in street view images. We formulate the problem as sequence recognition and present an integrated model combining a Convolutional Neural Network (CNN) and a Hidden Markov Model (HMM). Our method uses the representation capability of the CNN to model the highly variable appearance of digits, while the HMM handles the dynamics of the image sequence. The two are combined in a hybrid way to form the Hybrid CNN-HMM, with which both training and recognition can be performed at the whole-image level without explicit segmentation, making the CNN applicable to dynamic problems. Experiments show that the Hybrid CNN-HMM dramatically boosts performance over the Gaussian Mixture Model (GMM)-HMM. We also evaluate different local features, e.g. LBP, SIFT and HOG, as observations fed into the HMM, and find that CNN features consistently surpass these hand-engineered features in recognition accuracy. To gain insight into the performance differences, we map the features from the high-dimensional space to a 2-D plane with the t-SNE algorithm and visualize their semantic clustering with respect to the task. The visualization clearly illustrates the effectiveness of the features learned by the CNN.
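
The decoding step of such a hybrid model can be made concrete with a short sketch. Below is a minimal numpy illustration of the standard hybrid NN-HMM trick: the CNN's softmax posteriors are converted to scaled likelihoods by dividing out the state priors, then Viterbi-decoded with the HMM transitions. The toy posteriors, priors and transitions are illustrative stand-ins, not values from the paper.

```python
import numpy as np

def hybrid_viterbi(cnn_posteriors, priors, log_trans, log_init):
    """Viterbi decoding with scaled likelihoods, the standard hybrid
    NN-HMM trick: p(x|s) is proportional to p(s|x) / p(s)."""
    # Scaled log-likelihoods: log p(s|x) - log p(s)
    log_emit = np.log(cnn_posteriors + 1e-12) - np.log(priors + 1e-12)
    T, S = log_emit.shape
    delta = log_init + log_emit[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # Trace back the best state sequence
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 5 frames, 3 digit states, uniform illustrative parameters
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(3), size=5)   # stand-in for CNN softmax outputs
priors = np.full(3, 1 / 3)
log_trans = np.log(np.full((3, 3), 1 / 3))
log_init = np.log(np.full(3, 1 / 3))
print(hybrid_viterbi(post, priors, log_trans, log_init))
```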


IET Computer Vision | 2016

Continuous action segmentation and recognition using a hybrid convolutional neural network-hidden Markov model

Jun Lei; Guo-Hui Li; Jun Zhang; Qiang Guo; Dan Tu

Continuous action recognition in video is more complicated than traditional isolated action recognition: besides the high variability of postures and appearances within each action, the complex temporal dynamics of continuous action make the problem challenging. In this study, the authors propose a hierarchical framework combining a convolutional neural network (CNN) and a hidden Markov model (HMM) that recognises and segments continuous actions simultaneously. The authors utilise the CNN's powerful capacity for learning high-level features directly from raw data and use it to extract effective and robust action features. The HMM models the statistical dependences over adjacent sub-actions and infers the action sequences. To combine the advantages of the two models, the hybrid CNN-HMM architecture is built: the Gaussian mixture model is replaced by the CNN to model the emission distribution of the HMM. The CNN-HMM model is trained with the embedded Viterbi algorithm, and the data used to train the CNN are labelled by forced alignment. The authors test their method on two public action datasets, Weizmann and KTH. Experimental results show that the method achieves improved recognition and segmentation accuracy compared with several other methods; the superior quality of the features learnt by the CNN is also illustrated.
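
The forced-alignment step mentioned above (producing frame labels to train the CNN) can be sketched as a small dynamic program. The numpy sketch below aligns frames to a known left-to-right sub-action sequence; the emission scores are random stand-ins, and it assumes at least as many frames as states.

```python
import numpy as np

def forced_align(log_emit, states):
    """Forced alignment: given per-frame log-emission scores (T x S) and the
    known left-to-right sub-action sequence of a clip, assign each frame a
    state. Labels produced this way can bootstrap CNN training."""
    T, K = log_emit.shape[0], len(states)
    score = np.full((T, K), -np.inf)   # best score with frame t at position k
    back = np.zeros((T, K), dtype=int)
    score[0, 0] = log_emit[0, states[0]]
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1, k]
            move = score[t - 1, k - 1] if k > 0 else -np.inf
            best_prev, back[t, k] = (stay, k) if stay >= move else (move, k - 1)
            score[t, k] = best_prev + log_emit[t, states[k]]
    # Trace back from the last position: the clip must end in the final state
    k, labels = K - 1, [states[K - 1]]
    for t in range(T - 1, 0, -1):
        k = back[t, k]
        labels.append(states[k])
    return labels[::-1]

# Toy example: 10 frames, an action whose sub-actions pass through states 0, 2, 3
rng = np.random.default_rng(1)
log_emit = np.log(rng.dirichlet(np.ones(4), size=10))
print(forced_align(log_emit, states=[0, 2, 3]))
```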


Neural Computing and Applications | 2014

Convolutional restricted Boltzmann machines learning for robust visual tracking

Jun Lei; Guohui Li; Dan Tu; Qiang Guo

Choosing visual features is a critical step in object tracking. Most existing tracking approaches adopt handcrafted features, which depend heavily on prior knowledge and easily become invalid under conditions where the scene structure differs. In contrast, we learn informative and discriminative features from the image data of the tracking scene itself. Local receptive fields and weight sharing make the convolutional restricted Boltzmann machine (CRBM) well suited to natural images. The CRBM is applied to model the distribution of image patches sampled from the first frame, which shares its properties with subsequent frames. Each hidden variable corresponding to a local filter can be viewed as a feature detector, and local connections to hidden variables together with a max-pooling strategy make the extracted features invariant to shifts and distortions. A simple naive Bayes classifier then separates the object from the background in feature space. We demonstrate the effectiveness and robustness of our tracking method on several challenging video sequences. Experimental results show that the features automatically learned by the CRBM are effective for object tracking.
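
As a rough illustration of the classification stage, the sketch below uses scikit-learn's GaussianNB to separate object patches from background patches in feature space and to score tracking candidates. The random feature vectors are placeholders standing in for pooled CRBM responses; nothing here reproduces the paper's actual features.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-ins: rows are pooled feature vectors for patches
# sampled from the first frame (1 = object patch, 0 = background patch)
rng = np.random.default_rng(2)
object_feats = rng.normal(1.0, 0.5, size=(50, 64))
background_feats = rng.normal(0.0, 0.5, size=(200, 64))

X = np.vstack([object_feats, background_feats])
y = np.array([1] * 50 + [0] * 200)

clf = GaussianNB().fit(X, y)

# At tracking time, score candidate patches and pick the most object-like one
candidates = rng.normal(0.5, 0.5, size=(30, 64))
best = int(clf.predict_proba(candidates)[:, 1].argmax())
print("best candidate:", best)
```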


Asian Conference on Computer Vision | 2014

Hybrid CNN-HMM Model for Street View House Number Recognition

Qiang Guo; Dan Tu; Jun Lei; Guohui Li

We present an integrated model for applying deep neural networks to the street view house number recognition problem. Rather than following the traditional pipeline of segmentation followed by recognition of isolated digits, we formulate the task as a sequence recognition problem under a probabilistic treatment. Our model leverages a deep Convolutional Neural Network (CNN) to represent the highly variable appearance of digits in natural images, while a hidden Markov model (HMM) handles the dynamics of the sequence. The two are combined in a hybrid fashion to form the hybrid CNN-HMM architecture, with which both training and recognition can be performed at the word level. No explicit segmentation is involved, which saves the labour of designing sophisticated segmentation algorithms or producing fine-grained character labels. To the best of our knowledge, this is the first time a hybrid CNN-HMM model has been applied directly to whole scene text images. Experiments show that the deep CNN dramatically boosts performance compared with the shallow Gaussian Mixture Model (GMM)-HMM. We obtained competitive results on the Street View House Numbers (SVHN) dataset.
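
To make the emission model concrete, here is a minimal PyTorch sketch of a per-frame CNN that maps fixed-size image frames to log-posteriors over HMM states (ten digits plus an assumed blank/background state). The architecture and the 32x16 frame size are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """Per-frame emission model: maps a fixed-size image frame to
    log-posteriors over HMM states (digit classes plus a blank state)."""
    def __init__(self, n_states=11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 4, n_states)

    def forward(self, x):                  # x: (batch, 1, 32, 16) frames
        h = self.features(x).flatten(1)
        return torch.log_softmax(self.classifier(h), dim=1)

model = FrameCNN()
frames = torch.randn(20, 1, 32, 16)       # 20 frames from one number image
log_post = model(frames)                   # feed into HMM decoding as emissions
print(log_post.shape)                      # torch.Size([20, 11])
```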


Chinese Conference on Image and Graphics Technologies | 2015

Robust Visual Tracking via Discriminative Structural Sparse Feature

Fenglei Wang; Jun Zhang; Qiang Guo; Pan Liu; Dan Tu

In this paper, we propose a robust visual tracking method that exploits both structural and context information. First, we take advantage of sparse coding's robustness to occlusion and illumination changes and extract a structural local sparse feature, upon which we build a discriminative model between the target and its context. We then introduce an adaptive online SVM algorithm to search the feature space and discriminate the target from context patches. Furthermore, the updates of the dictionary and the SVM model consider both the latest observations and the original template, enabling the tracker to handle appearance change and alleviate the drift problem. Comparisons with state-of-the-art algorithms demonstrate that the proposed tracker performs well on challenging videos.
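
A rough sketch of the feature-extraction idea follows, using scikit-learn's SparseCoder to encode local patches against a patch dictionary, with an incrementally trained linear SVM standing in for the adaptive online SVM. The dictionary, patches and labels are random placeholders, not the paper's templates.

```python
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.linear_model import SGDClassifier

# Hypothetical dictionary: rows are vectorised local target patches
# (e.g. cropped from the first frame); values here are random placeholders
rng = np.random.default_rng(3)
dictionary = rng.normal(size=(20, 64))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

coder = SparseCoder(dictionary=dictionary,
                    transform_algorithm="lasso_lars", transform_alpha=0.1)

patches = rng.normal(size=(10, 64))     # local patches from candidate regions
codes = coder.transform(patches)        # structural local sparse features
print(codes.shape)                      # (10, 20): one sparse code per patch

# A linear SVM trained incrementally stands in for the adaptive online SVM
svm = SGDClassifier(loss="hinge")
svm.partial_fit(codes, rng.integers(0, 2, size=10), classes=[0, 1])
```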


International Conference on Image, Vision and Computing | 2016

Continuous action recognition based on hybrid CNN-LDCRF model

Jun Lei; Guohui Li; Shuohao Li; Dan Tu; Qiang Guo

Continuous action recognition in video is more challenging than traditional isolated action recognition. In this paper, we propose a hybrid framework combining a Convolutional Neural Network (CNN) and a Latent-Dynamic Conditional Random Field (LDCRF) to segment and recognize continuous actions simultaneously. Most existing action recognition works construct complex handcrafted features, which are highly problem dependent. We instead use a CNN, a type of deep model, to learn high-level action features automatically from raw inputs. The LDCRF models the intrinsic and extrinsic dynamics of actions. The CNN is embedded in the bottom layer of the LDCRF, converting the LDCRF from a shallow to a deep structure. This framework incorporates action feature learning and continuous action recognition in a unified way, and the model is trained end to end: the parameters of the CNN and the LDCRF are jointly optimized by gradient descent. We test our method on two public datasets, KTH and HumanEva. Experiments show that our method achieves improved recognition accuracy compared with several other methods; we also demonstrate the superiority of the features learned by the CNN over handcrafted features.
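
The joint CNN-CRF training idea can be illustrated with a plain linear-chain CRF loss; note this omits the latent sub-states that distinguish an LDCRF from an ordinary CRF. The PyTorch sketch below computes the sequence negative log-likelihood with the forward algorithm, so gradients flow to both the per-frame (CNN) scores and the transition parameters, as in joint optimization by gradient descent.

```python
import torch

def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of a tag sequence under a linear-chain CRF.
    emissions: (T, L) per-frame scores from the CNN; transitions: (L, L).
    The full LDCRF additionally introduces latent sub-states per label;
    this sketch keeps only the chain structure."""
    T, L = emissions.shape
    # Score of the ground-truth path
    gold = emissions[0, tags[0]]
    for t in range(1, T):
        gold = gold + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # Log partition function via the forward algorithm
    alpha = emissions[0]
    for t in range(1, T):
        alpha = emissions[t] + torch.logsumexp(alpha[:, None] + transitions, dim=0)
    return torch.logsumexp(alpha, dim=0) - gold

emissions = torch.randn(12, 4, requires_grad=True)   # stand-in for CNN outputs
transitions = torch.zeros(4, 4, requires_grad=True)
tags = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3])
loss = crf_nll(emissions, transitions, tags)
loss.backward()      # gradients reach both CNN outputs and transitions
print(float(loss))
```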


International Conference on Security, Pattern Analysis, and Cybernetics (SPAC) | 2014

Reading numbers in natural scene images with convolutional neural networks

Qiang Guo; Jun Lei; Dan Tu; Guo-Hui Li

Reading text from natural images is a hard computer vision task. We present a method for applying deep convolutional neural networks to recognize numbers in natural scene images, proposing a novel approach that eliminates the need for explicit segmentation in multi-digit number recognition. A Convolutional Neural Network (CNN) requires fixed-dimensional input, while number images contain an unknown number of digits. Our method integrates the CNN with a probabilistic graphical model to handle this: a hidden Markov model (HMM) models the image sequence, and the CNN models digit appearance. This combines the advantages of both models and adapts them to the problem, allowing training and recognition to be performed at the word level. No explicit segmentation is involved, which saves the labour of designing sophisticated segmentation algorithms or producing fine-grained character labels. Experiments show that the deep CNN dramatically improves performance compared with using a Gaussian Mixture Model as the digit model. We obtained competitive results on the Street View House Numbers (SVHN) dataset.
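
One simple way to reconcile a fixed-input CNN with variable-width number images, and presumably close to what a frame-based HMM front end needs, is to slice the image into overlapping fixed-width frames. A minimal numpy sketch, with illustrative frame width and step:

```python
import numpy as np

def extract_frames(image, frame_width=16, step=4):
    """Slice a variable-width number image into overlapping fixed-width
    frames, so a fixed-input CNN can score an arbitrarily long image."""
    h, w = image.shape
    starts = range(0, max(w - frame_width, 0) + 1, step)
    return np.stack([image[:, s:s + frame_width] for s in starts])

img = np.zeros((32, 100))          # a 100-pixel-wide house-number crop
frames = extract_frames(img)
print(frames.shape)                # (22, 32, 16): one CNN input per frame
```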


International Conference on Image, Vision and Computing | 2017

Holistic Vertical Regional Proposal Network for scene text detection

Xu Chen; Qiang Guo; Shuohao Li; Jun Zhang

Scene text detection is an important research problem in the computer vision community, with great application value in many fields. Inspired by Faster R-CNN, a popular method for object detection, we apply the Region Proposal Network (RPN) to scene text detection, since text can be regarded as a common object. The core of the RPN is to detect objects of different sizes with anchors of different sizes. However, when the RPN is applied directly, it is difficult to design enough anchor scales to cover the wide range of text box sizes. We therefore adjust the anchor settings and take advantage of vertical anchors to break the restrictions of the receptive field. In addition, we draw on the multi-scale Holistically-Nested Edge Detection (HED) network, which produces side outputs at different stages of the neural network: bottom layers have smaller receptive fields and capture small text regions, while high-level side outputs have larger receptive fields and handle large text regions better. Combining the advantages of the RPN and HED, we propose a Holistic Vertical Regional Proposal Network (HVRPN) for scene text detection; our model shows good results on ICDAR03 and ICDAR11.
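
The vertical-anchor idea can be sketched as follows: anchors with a fixed width and a range of heights are placed at every feature-map position, and text lines are later formed by connecting adjacent positive anchors horizontally. The stride, width and height values below are illustrative (borrowed from the related CTPN design), not necessarily those used in the paper.

```python
import numpy as np

def vertical_anchors(feat_h, feat_w, stride=16, width=16,
                     heights=(11, 16, 23, 33, 48, 68, 97, 139, 198, 283)):
    """Fixed-width, variable-height anchors centred on each feature-map
    cell, returned as (x1, y1, x2, y2) boxes in image coordinates."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for h in heights:
                anchors.append((cx - width / 2, cy - h / 2,
                                cx + width / 2, cy + h / 2))
    return np.array(anchors)

a = vertical_anchors(4, 6)
print(a.shape)     # (240, 4): 4 * 6 positions x 10 heights
```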


IET Computer Vision | 2017

Deep neural network with attention model for scene text recognition

Shuohao Li; Min Tang; Qiang Guo; Jun Lei; Jun Zhang

The authors present a deep neural network (DNN) with an attention model for scene text recognition. The proposed model does not require any segmentation of the input text image. The framework is inspired by the attention models recently presented for speech recognition and image captioning. In the proposed framework, feature extraction, feature attention and sequence recognition are integrated in a jointly trainable network. Compared with previous approaches, the main contributions are as follows. (i) The attention model is applied to a DNN to recognise scene text, effectively solving the sequence recognition problem caused by variable-length labels. (ii) Rigorous experiments are performed across a number of challenging benchmarks, including the IIIT5K, SVT, ICDAR2003 and ICDAR2013 datasets; the results show that the proposed model is comparable to or better than the state-of-the-art methods. (iii) The model contains only 6.5 million parameters, the fewest of any DNN model for scene text recognition to date.
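
For orientation, here is a minimal PyTorch sketch of one step of an additive (Bahdanau-style) attention decoder over CNN feature columns, the general mechanism the paper builds on. The dimensions and the 37-class output (26 letters, 10 digits, one end token) are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AttentionStep(nn.Module):
    """One decoding step of an additive attention decoder over CNN
    feature columns, as used for segmentation-free text recognition."""
    def __init__(self, feat_dim=128, hid_dim=128, n_chars=37):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, hid_dim, bias=False)
        self.w_hid = nn.Linear(hid_dim, hid_dim, bias=False)
        self.v = nn.Linear(hid_dim, 1, bias=False)
        self.rnn = nn.GRUCell(feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, n_chars)

    def forward(self, feats, hidden):          # feats: (T, feat_dim)
        scores = self.v(torch.tanh(self.w_feat(feats) + self.w_hid(hidden)))
        alpha = torch.softmax(scores.squeeze(-1), dim=0)   # attention weights
        context = alpha @ feats                # weighted sum of feature columns
        hidden = self.rnn(context.unsqueeze(0), hidden.unsqueeze(0)).squeeze(0)
        return self.out(hidden), hidden, alpha

step = AttentionStep()
feats = torch.randn(25, 128)                   # 25 feature columns from the CNN
hidden = torch.zeros(128)
logits, hidden, alpha = step(feats, hidden)
print(logits.shape, alpha.shape)               # (37,) and (25,)
```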


IET Computer Vision | 2017

Convolutional recurrent neural networks with hidden Markov model bootstrap for scene text recognition

Fenglei Wang; Qiang Guo; Jun Lei; Jun Zhang

Text recognition in natural scenes remains a challenging problem due to the highly variable appearance of text in unconstrained conditions. The authors develop a system that directly transcribes scene text images to text without character segmentation, formulating the problem as sequence labelling. They build a convolutional recurrent neural network by using deep convolutional neural networks (CNNs) to model text appearance and recurrent neural networks (RNNs) for sequence dynamics; the two models have complementary modelling capabilities and are integrated to form the segmentation-free system. A Gaussian mixture model-hidden Markov model is trained to supervise the training of the CNN model, so the system is data driven and needs no hand-labelled training data. The method has several appealing properties: (i) it can recognise text images of arbitrary length; (ii) the recognition process involves no sophisticated character segmentation; (iii) it is trained on scene text images with only word-level transcriptions; and (iv) it can recognise both lexicon-based and lexicon-free text. The proposed system achieves performance competitive with the state of the art on several public scene text datasets, both lexicon-based and lexicon-free.
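
A minimal PyTorch sketch of the convolutional recurrent architecture described above: a small CNN turns the image into a sequence of column features, a bidirectional LSTM models the sequence, and a linear layer emits per-column class scores. In the paper, frame-level targets for such a network come from GMM-HMM forced alignment of word-level transcriptions; all layer sizes here are illustrative.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """CNN feature extractor followed by a bidirectional RNN over the
    horizontal axis, producing per-column character scores."""
    def __init__(self, n_classes=37):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(64 * 8, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, n_classes)

    def forward(self, x):                      # x: (B, 1, 32, W)
        f = self.cnn(x)                        # (B, 64, 8, W/2)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (B, W/2, 64*8) column features
        h, _ = self.rnn(f)
        return self.fc(h)                      # per-column class scores

model = CRNN()
print(model(torch.randn(2, 1, 32, 100)).shape)   # torch.Size([2, 50, 37])
```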

Collaboration


Dive into Qiang Guo's collaborations.

Top Co-Authors

Jun Lei, National University of Defense Technology
Dan Tu, National University of Defense Technology
Jun Zhang, National University of Defense Technology
Guo-Hui Li, National University of Defense Technology
Shuohao Li, National University of Defense Technology
Fenglei Wang, National University of Defense Technology
Guohui Li, National University of Defense Technology
Dan Lu, National University of Defense Technology
Pan Liu, National University of Defense Technology
Xu Chen, National University of Defense Technology