Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yongzhen Huang is active.

Publication


Featured research published by Yongzhen Huang.


Computer Vision and Pattern Recognition | 2015

Deep semantic ranking based hashing for multi-label image retrieval

Fang Zhao; Yongzhen Huang; Liang Wang; Tieniu Tan

With the rapid growth of web images, hashing has received increasing interest in large-scale image retrieval. Research efforts have been devoted to learning compact binary codes that preserve semantic similarity based on labels. However, most of these hashing methods are designed to handle simple binary similarity; the complex multilevel semantic structure of images associated with multiple labels has not yet been well explored. Here we propose a deep semantic ranking based method for learning hash functions that preserve multilevel semantic similarity between multi-label images. In our approach, a deep convolutional neural network is incorporated into the hash functions to jointly learn feature representations and their mappings to hash codes, which avoids the limited semantic representation power of hand-crafted features. Meanwhile, a ranking list that encodes the multilevel similarity information is employed to guide the learning of such deep hash functions. An effective scheme based on surrogate loss is used to solve the intractable optimization problem posed by the nonsmooth, multivariate ranking measures involved in the learning procedure. Experimental results on multi-label image datasets show the superiority of our proposed approach over several state-of-the-art hashing methods in terms of ranking evaluation metrics.
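
As an illustration of the ranking idea, the sketch below derives similarity levels from shared label counts and penalizes hash codes that violate the induced ordering. This is a minimal PyTorch sketch, not the paper's implementation; the relaxed real-valued codes, the margin scaling, and the brute-force triplet enumeration are all assumptions.

```python
import torch
import torch.nn.functional as F

def multilevel_ranking_loss(codes, labels, margin=1.0):
    """Surrogate ranking loss sketch: for each anchor image, an image
    sharing more labels should end up with a closer hash code than one
    sharing fewer. `codes` are relaxed, real-valued network outputs;
    `labels` is a multi-hot (N, L) tensor."""
    sim = labels.float() @ labels.float().t()   # r(i, j) = shared labels
    dist = torch.cdist(codes, codes)            # surrogate for Hamming
    loss, n = codes.new_zeros(()), 0
    N = codes.size(0)
    for a in range(N):
        for p in range(N):
            for q in range(N):
                if sim[a, p] > sim[a, q]:       # p should rank above q
                    gap = margin * (sim[a, p] - sim[a, q])
                    loss = loss + F.relu(gap + dist[a, p] - dist[a, q])
                    n += 1
    return loss / max(n, 1)
```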


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Feature Coding in Image Classification: A Comprehensive Study

Yongzhen Huang; Zifeng Wu; Liang Wang; Tieniu Tan

Image classification is a hot topic in computer vision and pattern recognition. Feature coding, as a key component of image classification, has been widely studied over the past several years, and a number of coding algorithms have been proposed. However, there is no comprehensive study of the connections between different coding methods, especially how they have evolved. In this paper, we first survey various feature coding methods, including their motivations and mathematical representations, and then explore their relations, based on which a taxonomy is proposed to reveal their evolution. Further, we summarize the main characteristics of current algorithms, each of which is shared by several coding strategies. Finally, we choose representatives from different kinds of coding approaches and empirically evaluate them with respect to codebook size and the number of training samples on several widely used databases (15-Scenes, Caltech-256, PASCAL VOC07, and SUN397). The experimental findings firmly justify our theoretical analysis, which is expected to benefit both practical applications and future research.
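
For readers unfamiliar with feature coding, the sketch below contrasts two representative strategies covered by such surveys: hard assignment (classic vector quantization) and soft assignment. This is generic illustrative code, not taken from the paper; the Gaussian kernel and its `beta` parameter are assumptions.

```python
import numpy as np

def hard_assignment(x, codebook):
    """Classic BoW coding: a one-hot response on the nearest code."""
    d = np.linalg.norm(codebook - x, axis=1)
    code = np.zeros(len(codebook))
    code[np.argmin(d)] = 1.0
    return code

def soft_assignment(x, codebook, beta=1.0):
    """Kernel-codebook coding: responses decay smoothly with distance."""
    d = np.linalg.norm(codebook - x, axis=1)
    w = np.exp(-beta * d ** 2)
    return w / w.sum()
```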


Computer Vision and Pattern Recognition | 2011

Salient coding for image classification

Yongzhen Huang; Kaiqi Huang; Yinan Yu; Tieniu Tan

The codebook-based (bag-of-words) model is widely applied to image classification. We analyze recent coding strategies in this model and find that saliency is the fundamental characteristic of coding: if a visual code is much closer to a descriptor than the other codes, it should obtain a very strong response. The salient representation under the maximum pooling operation leads to state-of-the-art performance on many databases and competitions. However, most current coding schemes do not explicitly account for salient representation and may thus produce large deviations when representing local descriptors. In this paper, we propose "salient coding", which employs the ratio between a descriptor's distance to its nearest code and its distances to the other codes. This approach guarantees salient representation without such deviations. We study salient coding on two image classification databases (15-Scenes and PASCAL VOC2007). The experimental results demonstrate that our approach outperforms all other coding methods in image classification.
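
The ratio described in the abstract can be sketched directly. In this minimal version, only the nearest code responds, with a strength reflecting how much closer it is than its k - 1 nearest competitors; the choice of k and the exact normalization are assumptions, not the paper's definition.

```python
import numpy as np

def salient_coding(x, codebook, k=5):
    """Respond only on the nearest code; the response grows as the
    nearest code becomes much closer than the other close codes."""
    d = np.linalg.norm(codebook - x, axis=1)
    order = np.argsort(d)
    nearest, rivals = order[0], order[1:k]
    saliency = 1.0 - d[nearest] / (d[rivals].mean() + 1e-12)
    code = np.zeros(len(codebook))
    code[nearest] = max(saliency, 0.0)
    return code
```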


International Conference on Computer Vision | 2015

Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks

Chunshui Cao; Xianming Liu; Yi Yang; Yinan Yu; Jiang Wang; Zilei Wang; Yongzhen Huang; Liang Wang; Chang Huang; Wei Xu; Deva Ramanan; Thomas S. Huang

While feedforward deep convolutional neural networks (CNNs) have been a great success in computer vision, it is important to note that the human visual cortex generally contains more feedback than feedforward connections. In this paper, we briefly introduce the background of feedback in the human visual cortex, which motivates us to develop a computational feedback mechanism in deep neural networks. In addition to the feedforward inference of traditional neural networks, a feedback loop is introduced to infer the activation status of hidden-layer neurons according to the goal of the network, e.g., high-level semantic labels. We liken this mechanism to "Look and Think Twice". The feedback networks help better visualize and understand how deep neural networks work, and capture visual attention on expected objects, even in images with cluttered backgrounds and multiple objects. Experiments on the ImageNet dataset demonstrate its effectiveness in solving tasks such as image classification and object localization.
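
A minimal sketch of the feedback idea follows: after the feedforward pass, per-neuron gates on a hidden layer are optimized so that the target class score is maximized, and the surviving activations indicate where the network attends. The `backbone`/`head` split, the sigmoid gate parameterization, and the SGD inner loop are all assumptions for illustration.

```python
import torch

def feedback_pass(backbone, head, x, target, steps=10, lr=0.5):
    """First "look": a standard feedforward pass. Then "think twice":
    infer gates over the hidden activations that maximize the target
    class score, yielding a gated attention map."""
    h = backbone(x).detach()                      # hidden activations
    gate = torch.zeros_like(h, requires_grad=True)
    opt = torch.optim.SGD([gate], lr=lr)
    for _ in range(steps):
        score = head(h * torch.sigmoid(gate))[:, target].sum()
        opt.zero_grad()
        (-score).backward()                       # ascend the target score
        opt.step()
    return h * torch.sigmoid(gate)
```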


Systems, Man and Cybernetics | 2011

Enhanced Biologically Inspired Model for Object Recognition

Yongzhen Huang; Kaiqi Huang; Dacheng Tao; Tieniu Tan; Xuelong Li

The biologically inspired model (BIM) proposed by Serre presents a promising solution to object categorization. It emulates the process of object recognition in the primate visual cortex by constructing a set of scale- and position-tolerant features whose properties are similar to those of the cells along the ventral stream of the visual cortex. However, BIM can be further improved in two respects: the mismatch caused by dense input, and the random feature selection due to its feedforward framework. To solve or alleviate these limitations, we develop an enhanced BIM (EBIM) in two ways: 1) removing uninformative inputs by imposing sparsity constraints, and 2) applying a feedback loop to mid-level feature selection. Each aspect is motivated by relevant psychophysical research findings. To show the effectiveness of the EBIM, we apply it to object categorization and conduct empirical studies on four computer vision datasets. Experimental results demonstrate that the EBIM outperforms the BIM and is comparable to state-of-the-art approaches in terms of accuracy. Moreover, the new system is about 20 times faster than the BIM.
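
The first of the two ideas, discarding uninformative inputs, can be sketched as below. Approximating "informative" by patch variance and the keep ratio are assumptions; the paper's actual sparsity criterion is more principled.

```python
import numpy as np

def prune_uninformative(patches, keep_ratio=0.3):
    """Keep only the highest-energy input patches before feature
    extraction, as a stand-in for EBIM's sparsity constraint."""
    energy = patches.reshape(len(patches), -1).var(axis=1)
    k = max(1, int(keep_ratio * len(patches)))
    keep = np.argsort(energy)[-k:]
    return patches[keep]
```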


Computer Vision and Pattern Recognition | 2008

Enhanced biologically inspired model

Yongzhen Huang; Kaiqi Huang; Liangsheng Wang; Dacheng Tao; Tieniu Tan; Xuelong Li

It has been demonstrated by Serre et al. that the biologically inspired model (BIM) is effective for object recognition, outperforming many state-of-the-art methods on challenging databases. However, BIM has three problems: a very heavy computational cost due to dense input, a disputable pooling operation for modeling relations in the visual cortex, and blind feature selection in a feedforward framework. To solve these problems, we develop an enhanced BIM (EBIM), which removes uninformative input by imposing sparsity constraints, utilizes a novel local weighted pooling operation with stronger physiological motivation, and applies a feedback procedure that selects effective features for combination. Empirical studies on the Caltech-5 and Caltech-101 databases show that EBIM is more effective and efficient than BIM. We also apply EBIM to the MIT-CBCL street scene database, where it achieves performance comparable to the current best. Moreover, the new system can process 128 × 128 images at 50 frames per second, at least 20 times faster than BIM in common applications.
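
To make the pooling change concrete, the sketch below contrasts standard max pooling with a local weighted variant in which nearby units all contribute. The Gaussian spatial weighting is an assumption; the paper's operation is physiologically motivated rather than this particular kernel.

```python
import numpy as np

def max_pooling(responses):
    """Standard BIM-style pooling: the hard maximum over a neighborhood."""
    return responses.max()

def local_weighted_pooling(responses, positions, center, sigma=1.0):
    """Blend all responses in the neighborhood, weighted by distance
    to the pooling center, instead of keeping only the maximum."""
    sq = np.sum((positions - center) ** 2, axis=1)
    w = np.exp(-sq / (2.0 * sigma ** 2))
    return float(responses @ (w / w.sum()))
```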


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

A Comprehensive Study on Cross-View Gait Based Human Identification with Deep CNNs

Zifeng Wu; Yongzhen Huang; Liang Wang; Xiaogang Wang; Tieniu Tan

This paper studies gait-based human identification via similarity learning with deep convolutional neural networks (CNNs). With a fairly small set of labeled multi-view human walking videos, we can train deep networks to recognize the most discriminative changes of gait patterns that indicate a change of human identity. To the best of our knowledge, this is the first work in the literature to use deep CNNs for gait recognition. We provide an extensive empirical evaluation across various scenarios, namely cross-view and cross-walking-condition, with different preprocessing approaches and network architectures. The method is first evaluated on the challenging CASIA-B dataset for cross-view gait recognition. Experimental results show that it outperforms the previous state-of-the-art methods by a significant margin. In particular, our method shows advantages when the cross-view angle is large, i.e., no less than 36 degrees: the average recognition rate reaches 94 percent, much better than the previous best result (less than 65 percent). The method is further evaluated on the OU-ISIR gait dataset to test its generalization to larger data. OU-ISIR is currently the largest dataset available in the literature for gait recognition, with 4,007 subjects. On this dataset, the average accuracy of our method under identical-view conditions is above 98 percent, and that for cross-view scenarios is above 91 percent. Finally, the method also performs best on the USF gait dataset, whose gait sequences are imaged in a real outdoor scene. These results show the great potential of this method for practical applications.
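
As a rough illustration of similarity learning for gait, the sketch below embeds two gait energy images (GEIs) with a shared small CNN and scores their similarity by embedding distance. The siamese layout, the 64 × 64 GEI input, and all layer sizes are assumptions; the paper evaluates several different architectures.

```python
import torch
import torch.nn as nn

class GaitSiamese(nn.Module):
    """Shared embedding for pairs of gait energy images (GEIs);
    a higher output score means "more likely the same person"."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(1, 16, 7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 64, 7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(64 * 11 * 11, 128),  # for 64x64 input
        )

    def forward(self, gei_a, gei_b):
        za, zb = self.embed(gei_a), self.embed(gei_b)
        return -torch.norm(za - zb, dim=1)
```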


Computer Vision and Pattern Recognition | 2011

Exploring relations of visual codes for image classification

Yongzhen Huang; Kaiqi Huang; Chong Wang; Tieniu Tan

The classic Bag-of-Features (BOF) model and its extensions use a single value to represent a visual code. This strategy ignores the relations between visual codes. In this paper, we explore these relations and propose a new algorithm for image classification. It consists of two main parts: 1) constructing a codebook graph in which each visual code is linked with other codes; and 2) describing each local feature by a pair of related codes, corresponding to an edge of the graph. Our approach contains richer information than previous BOF models; moreover, we demonstrate that those models are special cases of ours. Various coding and pooling algorithms can be embedded into our framework to obtain better performance. Experiments on different kinds of image classification databases demonstrate that our approach consistently achieves excellent performance compared with various BOF models.
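
The edge-based description can be sketched as follows: a descriptor is assigned to the edge formed by its two nearest codes, with a response encoding its relative closeness to each endpoint. The dense edge indexing and the specific response function are assumptions; the paper's graph construction is richer.

```python
import numpy as np

def edge_coding(x, codebook):
    """Describe a descriptor by a pair of codes (a graph edge)
    instead of a single code, as in the codebook-graph idea."""
    d = np.linalg.norm(codebook - x, axis=1)
    i, j = np.argsort(d)[:2]                 # the edge (i, j)
    K = len(codebook)
    code = np.zeros(K * K)                   # one bin per directed edge
    code[i * K + j] = d[j] / (d[i] + d[j] + 1e-12)
    return code
```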


IEEE Transactions on Image Processing | 2017

Facial Expression Recognition Based on Deep Evolutional Spatial-Temporal Networks

Kaihao Zhang; Yongzhen Huang; Yong Du; Liang Wang

One key challenge of facial expression recognition is to capture the dynamic variation of facial physical structure from videos. In this paper, we propose a part-based hierarchical bidirectional recurrent neural network (PHRNN) to analyze the facial expression information of temporal sequences. Our PHRNN models facial morphological variations and the dynamical evolution of expressions, and is effective in extracting "temporal features" from the facial landmarks (geometry information) of consecutive frames. Meanwhile, to complement the still appearance information, a multi-signal convolutional neural network (MSCNN) is proposed to extract "spatial features" from still frames. We use both recognition and verification signals as supervision to calculate different loss functions, which help to increase the variation between different expressions and reduce the differences among samples of the same expression. This deep evolutional spatial-temporal network (composed of PHRNN and MSCNN) extracts the partial-whole, geometry-appearance, and dynamic-still information, effectively boosting the performance of facial expression recognition. Experimental results show that this method substantially outperforms the state of the art: on three widely used facial expression databases (CK+, Oulu-CASIA, and MMI), our method reduces the error rates of the previous best results by 45.5%, 25.8%, and 24.4%, respectively.
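
A minimal two-stream sketch in the spirit of PHRNN + MSCNN appears below: a bidirectional GRU over landmark sequences provides the temporal/geometry stream, a small CNN over a still frame provides the spatial/appearance stream, and the two are fused for classification. The part hierarchy, the verification loss, and all layer sizes are omitted or assumed.

```python
import torch
import torch.nn as nn

class TwoStreamExpression(nn.Module):
    """Temporal stream over facial landmarks plus spatial stream over
    a still frame, fused into one expression classifier."""
    def __init__(self, n_landmarks=68, n_classes=7):
        super().__init__()
        self.temporal = nn.GRU(2 * n_landmarks, 64, batch_first=True,
                               bidirectional=True)
        self.spatial = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(16 * 4 * 4, 64),
        )
        self.classify = nn.Linear(2 * 64 + 64, n_classes)

    def forward(self, landmarks, frame):
        # landmarks: (B, T, 2 * n_landmarks); frame: (B, 1, H, W)
        _, h = self.temporal(landmarks)            # h: (2, B, 64)
        t = torch.cat([h[0], h[1]], dim=1)         # (B, 128)
        s = self.spatial(frame)                    # (B, 64)
        return self.classify(torch.cat([t, s], dim=1))
```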


IEEE Transactions on Multimedia | 2015

Learning Representative Deep Features for Image Set Analysis

Zifeng Wu; Yongzhen Huang; Liang Wang

This paper proposes to learn features from sets of labeled raw images. With this method, over-fitting can be effectively suppressed, so that deep CNNs can be trained from scratch with a small amount of training data, i.e., 420 labeled albums with about 30,000 photos. The method can effectively deal with sets of images whether or not the sets have temporal structure. A typical approach to sequential image analysis leverages motion between adjacent frames, while the proposed method focuses on capturing the co-occurrences and frequencies of features. Nevertheless, our method outperforms the previous best performers in album classification, and achieves comparable or even better performance in gait-based human identification. These results demonstrate its effectiveness and good adaptability to different kinds of set data.
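
The order-invariant aggregation the abstract describes can be sketched very simply: per-image features are pooled into a single album descriptor that records which features occur and how strongly, regardless of ordering. The mean/max concatenation is an assumption, not the paper's exact scheme.

```python
import numpy as np

def set_descriptor(image_features):
    """Pool per-image CNN features into one order-invariant
    descriptor for the whole album or sequence."""
    F = np.asarray(image_features)           # (num_images, feature_dim)
    return np.concatenate([F.mean(axis=0), F.max(axis=0)])
```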

Collaboration


Dive into Yongzhen Huang's collaborations.

Top Co-Authors

Liang Wang
Chinese Academy of Sciences

Tieniu Tan
Chinese Academy of Sciences

Kaiqi Huang
Chinese Academy of Sciences

Zifeng Wu
Chinese Academy of Sciences

Chunfeng Song
North China Electric Power University

Jianwei Ding
Chinese People's Public Security University

Yinan Yu
Chinese Academy of Sciences

Feng Liu
Southeast University

Chunshui Cao
University of Science and Technology of China