Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Zhen Zuo is active.

Publication


Featured research published by Zhen Zuo.


Computer Vision and Pattern Recognition | 2016

DAG-Recurrent Neural Networks for Scene Labeling

Bing Shuai; Zhen Zuo; Bing Wang; Gang Wang

In image labeling, local representations for image units are usually generated from their surrounding image patches; thus, long-range contextual information is not effectively encoded. In this paper, we introduce recurrent neural networks (RNNs) to address this issue. Specifically, directed acyclic graph RNNs (DAG-RNNs) are proposed to process DAG-structured images, which enables the network to model long-range semantic dependencies among image units. Our DAG-RNNs are capable of greatly enhancing the discriminative power of local representations, which significantly benefits local classification. Meanwhile, we propose a novel class weighting function that attends to rare classes, which markedly boosts the recognition accuracy for infrequent classes. Integrated with convolution and deconvolution layers, our DAG-RNNs achieve new state-of-the-art results on the challenging SiftFlow, CamVid and Barcelona benchmarks.
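To make the recurrence concrete, the following is a minimal PyTorch-style sketch of a single directional sweep of a DAG-structured recurrence over an H x W grid of local features. The DAGRNNSweep class, its layer names, and the plain tanh update are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DAGRNNSweep(nn.Module):
    """One (southeast-ward) sweep of a DAG-structured recurrence over an H x W feature grid."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.input_proj = nn.Linear(feat_dim, hidden_dim)    # transforms the local feature
        self.recur_proj = nn.Linear(hidden_dim, hidden_dim)  # transforms the summed parent states

    def forward(self, feats):                      # feats: (H, W, feat_dim)
        H, W, _ = feats.shape
        rows = []
        for i in range(H):
            row = []
            for j in range(W):
                parents = feats.new_zeros(self.hidden_dim)
                if i > 0:
                    parents = parents + rows[i - 1][j]   # context from the unit above
                if j > 0:
                    parents = parents + row[j - 1]       # context from the unit to the left
                row.append(torch.tanh(self.input_proj(feats[i, j]) + self.recur_proj(parents)))
            rows.append(row)
        return torch.stack([torch.stack(r) for r in rows])  # (H, W, hidden_dim)
```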


European Conference on Computer Vision | 2014

Learning Discriminative and Shareable Features for Scene Classification

Zhen Zuo; Gang Wang; Bing Shuai; Lifan Zhao; Qingxiong Yang; Xudong Jiang

In this paper, we propose to learn a discriminative and shareable feature transformation filter bank to transform local image patches (represented as raw pixel values) into features for scene image classification. The learned filters are expected to: (1) encode common visual patterns of a flexible number of categories; (2) encode discriminative and class-specific information. For each category, a subset of the filters is activated in a data-adaptive manner, while sharing of filters among different categories is also allowed. The discriminative power of the filter bank is further enhanced by enforcing features from the same category to be close to each other in the feature space and features from different categories to be far away from each other. Experimental results on three challenging scene image classification datasets indicate that our features achieve very promising performance. Furthermore, our features also show a strong complementary effect to state-of-the-art ConvNet features.
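The "close within a category, far across categories" constraint can be pictured as a simple pairwise hinge penalty. The discriminative_term function below is a hedged sketch of that general idea; its margin formulation is an assumption, not the exact objective used in the paper.

```python
import torch

def discriminative_term(features, labels, margin=1.0):
    """Pairwise penalty: pull same-class features together, push different-class
    features at least `margin` apart (a common hinge formulation)."""
    dists = torch.cdist(features, features)                        # pairwise Euclidean distances
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()     # 1 where labels match
    pull = same * dists.pow(2)                                     # same class: small distance
    push = (1 - same) * torch.clamp(margin - dists, min=0).pow(2)  # different class: beyond margin
    return (pull + push).mean()
```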


Computer Vision and Pattern Recognition | 2015

Convolutional recurrent neural networks: Learning spatial dependencies for image representation

Zhen Zuo; Bing Shuai; Gang Wang; Xiao Liu; Xingxing Wang; Bing Wang; Yushi Chen

In existing convolutional neural networks (CNNs), both convolution and pooling are performed locally on separate image regions, so contextual dependencies between different image regions are not taken into consideration. Such dependencies represent useful spatial structure information in images. Recurrent neural networks (RNNs), by contrast, are designed to learn contextual dependencies among sequential data through recurrent (feedback) connections. In this work, we propose the convolutional recurrent neural network (C-RNN), which learns spatial dependencies between image regions to enhance the discriminative power of the image representation. The C-RNN is trained in an end-to-end manner from raw pixel images: CNN layers are first applied to generate mid-level features, and an RNN layer is then learned to encode spatial dependencies. The C-RNN can learn better image representations, especially for images with obvious spatial contextual dependencies. Our method achieves competitive performance on ILSVRC 2012, SUN 397, and MIT Indoor.
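As a rough illustration of the pipeline (CNN layers for mid-level features, followed by an RNN over image regions), here is a hedged PyTorch sketch. The CRNNSketch module, the GRU cell, the raster-scan ordering of regions, and the layer sizes are all illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """CNN backbone -> spatial grid flattened to a sequence -> RNN over regions -> classifier."""
    def __init__(self, num_classes, hidden_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(                       # stand-in for the mid-level CNN layers
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.rnn = nn.GRU(128, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, images):                          # images: (B, 3, H, W)
        fmap = self.cnn(images)                         # (B, C, H', W')
        seq = fmap.flatten(2).transpose(1, 2)           # (B, H'*W', C): regions as a sequence
        states, _ = self.rnn(seq)                       # each region sees preceding regions
        rep = states.mean(dim=1)                        # pool over regions -> image representation
        return self.classifier(rep)
```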


Pattern Recognition | 2015

Exemplar based Deep Discriminative and Shareable Feature Learning for scene image classification

Zhen Zuo; Gang Wang; Bing Shuai; Lifan Zhao; Qingxiong Yang

In order to encode the class correlation and class specific information in image representation, we propose a new local feature learning approach named Deep Discriminative and Shareable Feature Learning (DDSFL). DDSFL aims to hierarchically learn feature transformation filter banks to transform raw pixel image patches to features. The learned filter banks are expected to (1) encode common visual patterns of a flexible number of categories; (2) encode discriminative information; and (3) hierarchically extract patterns at different visual levels. Particularly, in each single layer of DDSFL, shareable filters are jointly learned for classes which share similar patterns. Discriminative power of the filters is achieved by enforcing the features from the same category to be close, while features from different categories to be far away from each other. Furthermore, we also propose two exemplar selection methods to iteratively select training data for more efficient and effective learning. Based on the experimental results, DDSFL can achieve very promising performance, and it also shows great complementary effect to the state-of-the-art Caffe features.

Highlights:
- We propose to encode shareable and discriminative information in feature learning.
- Two exemplar selection methods are proposed to select effective training data.
- We build a hierarchical learning scheme to capture multiple visual level information.
- Our DDSFL outperforms most of the existing features.
- DDSFL features show great complementary effect to Caffe features.


Computer Vision and Pattern Recognition | 2015

Integrating parametric and non-parametric models for scene labeling

Bing Shuai; Gang Wang; Zhen Zuo; Bing Wang; Lifan Zhao

We adopt Convolutional Neural Networks (CNNs) as our parametric model to learn discriminative features and classifiers for local patch classification. As visually similar pixels are indistinguishable from local context alone, we alleviate this ambiguity by introducing a global scene constraint. We estimate the global potential in a non-parametric framework. Furthermore, a large-margin-based CNN metric learning method is proposed for better global potential estimation. The final pixel class prediction is performed by integrating local and global beliefs. Even without any post-processing, we achieve state-of-the-art performance on SiftFlow and competitive results on the Stanford Background benchmark.
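The integration of local and global beliefs can be pictured as combining per-pixel class scores with a per-image scene potential. The combine_beliefs helper below and its weighted-sum fusion are a simplified assumption, not the paper's energy formulation.

```python
import torch

def combine_beliefs(local_logprob, global_logprob, alpha=0.5):
    """Fuse per-pixel local (CNN) beliefs with a per-image global scene potential.

    local_logprob: (H, W, num_classes); global_logprob: (num_classes,);
    `alpha` trades off local and global evidence (illustrative choice).
    """
    combined = local_logprob + alpha * global_logprob   # broadcast the global term over pixels
    return combined.argmax(dim=-1)                      # per-pixel label prediction, shape (H, W)
```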


IEEE Signal Processing Letters | 2015

Quaddirectional 2D-Recurrent Neural Networks For Image Labeling

Bing Shuai; Zhen Zuo; Gang Wang

We adopt Convolutional Neural Networks (CNNs) to learn discriminative features for local patch classification. We further introduce quaddirectional 2D Recurrent Neural Networks to model the long-range dependencies among pixels. Our quaddirectional 2D-RNN is able to embed the global image context into compact local representations, which significantly enhances their discriminative power. Our experiments demonstrate that the integration of the CNN and the quaddirectional 2D-RNN achieves very promising results, comparable to the state of the art on real-world image labeling benchmarks.
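A hedged sketch of the quaddirectional idea: run the same directional recurrence from each of the four image corners (here by flipping the grid) and concatenate the resulting hidden maps, so that every position receives context from the whole image. The quad_directional helper assumes a sweep module such as the DAGRNNSweep sketch shown earlier and is illustrative only.

```python
import torch

def quad_directional(feats, sweep):
    """Run one directional recurrence from each of the four corners and concatenate.

    feats: (H, W, feat_dim); sweep: any module mapping (H, W, feat_dim) -> (H, W, hidden_dim).
    """
    outputs = []
    for flip_h in (False, True):
        for flip_w in (False, True):
            dims = [d for d, f in ((0, flip_h), (1, flip_w)) if f]
            x = torch.flip(feats, dims) if dims else feats      # start the sweep from a different corner
            h = sweep(x)
            outputs.append(torch.flip(h, dims) if dims else h)  # flip back to the original grid order
    return torch.cat(outputs, dim=-1)                           # (H, W, 4 * hidden_dim)
```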


IEEE Transactions on Image Processing | 2016

Learning Contextual Dependence With Convolutional Hierarchical Recurrent Neural Networks

Zhen Zuo; Bing Shuai; Gang Wang; Xiao Liu; Xingxing Wang; Bing Wang; Yushi Chen

Deep convolutional neural networks (CNNs) have shown great success in image classification. CNNs mainly consist of convolutional and pooling layers, both of which operate on local image areas without considering the dependence among different image regions. However, such dependence is very important for generating explicit image representations. In contrast, recurrent neural networks (RNNs) are well known for their ability to encode contextual information in sequential data, and they require only a limited number of network parameters. We therefore propose hierarchical RNNs (HRNNs) to encode the contextual dependence in image representation. In HRNNs, each RNN layer focuses on modeling spatial dependence among image regions from the same scale but different locations, while the cross-scale RNN connections model scale dependencies among regions from the same location but different scales. Specifically, we propose two RNN models: 1) the hierarchical simple recurrent network (HSRN), which is fast and has low computational cost, and 2) the hierarchical long short-term memory recurrent network, which performs better than HSRN at the price of higher computational cost. In this paper, we integrate CNNs with HRNNs and develop end-to-end convolutional hierarchical RNNs (C-HRNNs) for image classification. C-HRNNs utilize not only the discriminative representation power of CNNs, but also the contextual dependence learning ability of our HRNNs. On four of the most challenging object/scene image classification benchmarks, our C-HRNNs achieve state-of-the-art results on Places 205, SUN 397, and MIT Indoor, and competitive results on ILSVRC 2012.
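To illustrate the cross-scale connections, here is a hedged two-scale sketch in which an RNN over a pooled (coarse) grid runs first and its hidden states are upsampled and concatenated with fine-grid features before the fine-scale RNN. The HierarchicalRNNSketch module, the GRU cells, and the pooling and upsampling choices are assumptions, not the HSRN or hierarchical LSTM models of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalRNNSketch(nn.Module):
    """Two-scale sketch: coarse-scale RNN context is fed into the fine-scale RNN."""
    def __init__(self, feat_dim, hidden_dim=128):
        super().__init__()
        self.coarse_rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.fine_rnn = nn.GRU(feat_dim + hidden_dim, hidden_dim, batch_first=True)

    def forward(self, fmap):                            # fmap: (B, C, H, W) CNN feature map
        coarse = F.avg_pool2d(fmap, 2)                  # (B, C, H/2, W/2) coarser scale
        B, C, h, w = coarse.shape
        coarse_seq = coarse.flatten(2).transpose(1, 2)  # (B, h*w, C) coarse regions as a sequence
        coarse_h, _ = self.coarse_rnn(coarse_seq)
        coarse_h = coarse_h.transpose(1, 2).reshape(B, -1, h, w)
        coarse_up = F.interpolate(coarse_h, size=fmap.shape[2:])   # back to the fine grid
        fine_in = torch.cat([fmap, coarse_up], dim=1)   # fine features + coarse context
        fine_seq = fine_in.flatten(2).transpose(1, 2)
        fine_h, _ = self.fine_rnn(fine_seq)
        return fine_h.mean(dim=1)                       # pooled image representation
```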


Computer Vision and Pattern Recognition | 2016

Joint Learning of Convolutional Neural Networks and Temporally Constrained Metrics for Tracklet Association

Bing Wang; Li Wang; Bing Shuai; Zhen Zuo; Ting Liu; Kap Luk Chan; Gang Wang

In this paper, we study the challenging problem of multi-object tracking in a complex scene captured by a single camera. Different from the existing tracklet association-based tracking methods, we propose a novel and efficient way to obtain discriminative appearance-based tracklet affinity models. Our proposed method jointly learns the convolutional neural networks (CNNs) and temporally constrained metrics. In our method, a siamese convolutional neural network (CNN) is first pre-trained on the auxiliary data. Then the siamese CNN and temporally constrained metrics are jointly learned online to construct the appearance-based tracklet affinity models. The proposed method can jointly learn the hierarchical deep features and temporally constrained segment-wise metrics under a unified framework. For reliable association between tracklets, a novel loss function incorporating a temporally constrained multi-task learning mechanism is proposed. By employing the proposed method, tracklet association can be accomplished even in challenging situations. Moreover, a large-scale dataset with 40 fully annotated sequences is created to facilitate the tracking evaluation. Experimental results on five public datasets and the new large-scale dataset show that our method outperforms several state-of-the-art approaches in multi-object tracking.
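As a rough illustration of the appearance side, the sketch below pairs a shared embedding CNN with a contrastive margin loss between detection crops. The SiameseEmbedding architecture and the affinity_loss formulation are illustrative assumptions and do not reproduce the paper's temporally constrained, segment-wise metric learning.

```python
import torch
import torch.nn as nn

class SiameseEmbedding(nn.Module):
    """Shared CNN that embeds detection crops; tracklet affinity comes from embedding distance."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, a, b):                 # a, b: batches of detection crops (B, 3, H, W)
        return self.net(a), self.net(b)      # both branches share the same weights

def affinity_loss(ea, eb, same_target, margin=1.0):
    """Contrastive loss: same-target pairs are pulled together, different targets pushed
    beyond `margin`."""
    d = (ea - eb).pow(2).sum(dim=1).sqrt()   # pairwise embedding distances
    return torch.where(same_target.bool(),
                       d.pow(2),
                       torch.clamp(margin - d, min=0).pow(2)).mean()
```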


IEEE Transactions on Image Processing | 2016

Scene Parsing With Integration of Parametric and Non-Parametric Models

Bing Shuai; Zhen Zuo; Gang Wang; Bing Wang

We adopt convolutional neural networks (CNNs) as our parametric model to learn discriminative features and classifiers for local patch classification. Based on the occurrence frequency distribution of classes, an ensemble of CNNs (CNN-Ensemble) is learned, in which each CNN component focuses on learning different and complementary visual patterns. The local beliefs of pixels are output by the CNN-Ensemble. Considering that visually similar pixels are indistinguishable under local context, we leverage global scene semantics to alleviate the local ambiguity. The global scene constraint is mathematically achieved by adding a global energy term to the labeling energy function, and it is practically estimated in a non-parametric framework. A large-margin-based CNN metric learning method is also proposed for better global belief estimation. In the end, the integration of local and global beliefs gives rise to the class likelihood of pixels, based on which maximum marginal inference is performed to generate the label prediction maps. Even without any post-processing, we achieve state-of-the-art results on the challenging SiftFlow and Barcelona benchmarks.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Scene Segmentation with DAG-Recurrent Neural Networks

Bing Shuai; Zhen Zuo; Bing Wang; Gang Wang

In this paper, we address the challenging task of scene segmentation. In order to capture the rich contextual dependencies over image regions, we propose Directed Acyclic Graph-Recurrent Neural Networks (DAG-RNN) to perform context aggregation over locally connected feature maps. More specifically, the DAG-RNN is placed on top of a pre-trained CNN (feature extractor) to embed context into local features so that their representative capability can be enhanced. In comparison with a plain CNN (as in Fully Convolutional Networks, FCN), the DAG-RNN is empirically found to be significantly more effective at aggregating context and therefore demonstrates noticeable performance superiority over FCNs on scene segmentation. Moreover, the DAG-RNN has dramatically fewer parameters and demands fewer computation operations, which makes it more suitable for deployment on resource-constrained embedded devices. Meanwhile, class occurrence frequencies are extremely imbalanced in scene segmentation, so we propose a novel class-weighted loss to train the segmentation network. The loss assigns reasonably higher attention weights to infrequent classes during network training, which is essential to boost their parsing performance. We evaluate our segmentation network on three challenging public scene segmentation benchmarks: SiftFlow, Pascal Context and COCO Stuff, and achieve very impressive segmentation performance on all of them.
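One common way to realize such a class-weighted loss is inverse-frequency weighting of the cross-entropy, sketched below. The class_weighted_loss helper and its weighting scheme are assumptions; the paper defines its own weighting function.

```python
import torch
import torch.nn.functional as F

def class_weighted_loss(logits, targets, class_freq, eps=1e-6):
    """Cross-entropy whose per-class weights grow as a class becomes rarer.

    logits: (B, num_classes, H, W); targets: (B, H, W) integer labels;
    class_freq: (num_classes,) tensor of class occurrence frequencies.
    """
    weights = 1.0 / (class_freq + eps)                   # rare classes get larger weights
    weights = weights / weights.sum() * len(class_freq)  # normalize so the average weight is ~1
    return F.cross_entropy(logits, targets, weight=weights)
```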

Collaboration


Dive into Zhen Zuo's collaborations.

Top Co-Authors

Bing Shuai, Nanyang Technological University
Gang Wang, Nanyang Technological University
Bing Wang, Nanyang Technological University
Li Wang, Nanyang Technological University
Lifan Zhao, Nanyang Technological University
Ting Liu, Nanyang Technological University
Xiao Liu, Nanyang Technological University
Xingxing Wang, Nanyang Technological University
Yushi Chen, Harbin Institute of Technology
Qingxiong Yang, City University of Hong Kong