Zequn Jie
National University of Singapore
Publication
Featured research published by Zequn Jie.
Computer Vision and Pattern Recognition | 2016
Xiaodan Liang; Yunchao Wei; Xiaohui Shen; Zequn Jie; Jiashi Feng; Liang Lin; Shuicheng Yan
In this work, we propose a novel Reversible Recursive Instance-level Object Segmentation (R2-IOS) framework to address the challenging instance-level object segmentation task. R2-IOS consists of a reversible proposal refinement sub-network that predicts bounding box offsets for refining the object proposal locations, and an instance-level segmentation sub-network that generates the foreground mask of the dominant object instance in each proposal. By being recursive, R2-IOS iteratively optimizes the two sub-networks during joint training, in which the refined object proposals and improved segmentation predictions are alternately fed into each other to progressively increase the network capabilities. By being reversible, the proposal refinement sub-network adaptively determines an optimal number of refinement iterations required for each proposal during both training and testing. Furthermore, to handle multiple overlapped instances within a proposal, an instance-aware denoising autoencoder is introduced into the segmentation sub-network to distinguish the dominant object from other distracting instances. Extensive experiments on the challenging PASCAL VOC 2012 benchmark demonstrate the superiority of R2-IOS over other state-of-the-art methods. In particular, the AP^r over 20 classes at 0.5 IoU reaches 66.7%, which significantly outperforms the results of 58.7% by PFN [17] and 46.3% by [22].
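The recursive/reversible interplay described above can be summarized in a few lines. Below is a minimal sketch under loose assumptions: the two sub-networks are reduced to single linear layers, per-proposal features are treated as fixed (the real system re-extracts features after each refinement), and the reversible gate is approximated by an offset-magnitude threshold. None of this is the authors' code.

```python
import torch
import torch.nn as nn

# Minimal sketch of the recursive refinement loop; sub-network internals
# are hypothetical single-layer stand-ins, not the published R2-IOS model.
class R2IOSSketch(nn.Module):
    def __init__(self, feat_dim=256, max_iters=4, stop_thresh=0.01):
        super().__init__()
        self.refine = nn.Linear(feat_dim, 4)   # bounding-box offset head
        self.segment = nn.Linear(feat_dim, 1)  # placeholder mask head
        self.max_iters = max_iters
        self.stop_thresh = stop_thresh         # approximates the reversible gate

    def forward(self, feats, boxes):
        # feats: (P, feat_dim) per-proposal features; boxes: (P, 4)
        for _ in range(self.max_iters):
            offsets = self.refine(feats)       # refine proposal locations
            boxes = boxes + offsets
            # stop refining once predicted offsets become negligible
            if offsets.abs().mean() < self.stop_thresh:
                break
        mask_logits = self.segment(feats)      # dominant-instance mask scores
        return boxes, mask_logits
```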
Pattern Recognition | 2016
Yunchao Wei; Xiaodan Liang; Yunpeng Chen; Zequn Jie; Yanhui Xiao; Yao Zhao; Shuicheng Yan
Recently, deep convolutional neural networks (DCNNs) have significantly advanced the development of semantic image segmentation. However, previous works on learning the segmentation network often rely on a large number of ground-truth pixel-level annotations, which usually require considerable human effort. In this paper, we explore a more challenging problem: learning to segment under image-level annotations. Specifically, our framework consists of two components. First, reliable hypothesis-based localization maps are generated by incorporating hypothesis-aware classification and cross-image contextual refinement. Second, the segmentation network can be trained in a supervised manner with these generated localization maps. We explore two network training strategies for achieving good segmentation performance. For the first strategy, a novel multi-label cross-entropy loss is proposed to train the network by directly using multiple localization maps for all classes, where each pixel contributes to each class with a different weight. For the second strategy, a rough segmentation mask can be inferred from the localization maps, and the network is then optimized with a single-label cross-entropy loss on the produced masks. We evaluate our methods on the PASCAL VOC 2012 segmentation benchmark. Extensive experimental results demonstrate the effectiveness of the proposed methods compared with state-of-the-art methods.
Highlights:
- Localization map generation is proposed by using hypothesis-based classification.
- A novel multi-label loss is proposed to train the network based on localization maps.
- An effective method is proposed to predict the rough mask of a given training image.
- Our methods achieve new state-of-the-art results on the PASCAL VOC 2012 benchmark.
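One plausible reading of the first training strategy, where each pixel contributes to each present class with a weight taken from the localization maps, is sketched below; the softmax form and the weight normalization are assumptions, not the paper's published loss.

```python
import torch.nn.functional as F

def weighted_multilabel_ce(logits, loc_maps, image_labels):
    """Illustrative weighted multi-label cross-entropy (assumed form).
    logits:       (N, C, H, W) segmentation scores
    loc_maps:     (N, C, H, W) localization maps, assumed in [0, 1]
    image_labels: (N, C) binary image-level labels
    """
    log_probs = F.log_softmax(logits, dim=1)             # per-pixel class log-probs
    weights = loc_maps * image_labels[:, :, None, None]  # zero out absent classes
    weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-6)
    return -(weights * log_probs).sum(dim=1).mean()      # weighted pixel-wise CE
```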
Computer Vision and Pattern Recognition | 2017
Zequn Jie; Yunchao Wei; Xiaojie Jin; Jiashi Feng; Wei Liu
Most existing weakly supervised localization (WSL) approaches learn detectors by finding positive bounding boxes based on features learned with image-level supervision. However, those features do not contain spatial location related information and usually provide poor-quality positive samples for training a detector. To overcome this issue, we propose a deep self-taught learning approach, which makes the detector learn the object-level features reliable for acquiring tight positive samples and afterwards re-train itself based on them. Consequently, the detector progressively improves its detection ability and localizes more informative positive samples. To implement such self-taught learning, we propose a seed sample acquisition method via image-to-object transferring and dense subgraph discovery to find reliable positive samples for initializing the detector. An online supportive sample harvesting scheme is further proposed to dynamically select the most confident tight positive samples and train the detector in a mutually boosting way. To prevent the detector from being trapped in poor local optima due to overfitting, we propose a new relative improvement measure of predicted CNN scores for guiding the self-taught learning process. Extensive experiments on PASCAL VOC 2007 and 2012 show that our approach outperforms state-of-the-art methods, strongly validating its effectiveness.
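The relative-improvement criterion can be illustrated with a tiny helper; the abstract does not give the exact formula, so the ratio form below is an assumption.

```python
import numpy as np

def relative_improvement(prev_scores, curr_scores, eps=1e-6):
    """Rank proposals by relative score gain between training rounds rather
    than by absolute score; assumed form (s_t - s_{t-1}) / s_{t-1}."""
    return (curr_scores - prev_scores) / (prev_scores + eps)

# usage: keep proposals whose detector score improved most since last round
# keep = np.where(relative_improvement(s_prev, s_curr) > 0.2)[0]
```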
IEEE Transactions on Circuits and Systems for Video Technology | 2017
Hao Liu; Zequn Jie; Karlekar Jayashree; Meibin Qi; Jianguo Jiang; Shuicheng Yan; Jiashi Feng
Video-based person re-identification plays a central role in realistic security and video surveillance. In this paper, we propose a novel accumulative motion context (AMOC) network for addressing this important problem, which effectively exploits long-range motion context for robustly identifying the same person under challenging conditions. Given a video sequence of the same or different persons, the proposed AMOC network jointly learns appearance representations and motion context from a collection of adjacent frames using a two-stream convolutional architecture. AMOC then accumulates clues from the motion context by recurrent aggregation, allowing effective information flow among adjacent frames and capturing the dynamic gist of the persons. The architecture of AMOC is end-to-end trainable, and thus the motion context can be adapted to complement appearance clues under unfavorable conditions (e.g., occlusions). Extensive experiments are conducted on three public benchmark data sets, i.e., the iLIDS-VID, PRID-2011, and MARS data sets, to investigate the performance of AMOC. The experimental results demonstrate that the proposed AMOC network significantly outperforms state-of-the-art methods for video-based re-identification and confirm the advantage of exploiting long-range motion context, clearly validating our motivation.
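A compact sketch of the two-stream-plus-recurrent-aggregation structure described above follows; the layer sizes, the motion-stream input (two-channel flow-like maps), and the GRU aggregator are illustrative assumptions, not the published AMOC configuration.

```python
import torch
import torch.nn as nn

# Sketch only: appearance and motion streams feed a recurrent aggregator
# that accumulates context across adjacent frames into one embedding.
class AMOCSketch(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.appearance = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.motion = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.rnn = nn.GRU(2 * feat_dim, feat_dim, batch_first=True)

    def forward(self, frames, flows):
        # frames: (N, T, 3, H, W); flows: (N, T, 2, H, W)
        N, T = frames.shape[:2]
        per_frame = []
        for t in range(T):
            a = self.appearance(frames[:, t])     # appearance features
            m = self.motion(flows[:, t])          # motion-context features
            per_frame.append(torch.cat([a, m], dim=1))
        seq = torch.stack(per_frame, dim=1)       # (N, T, 2*feat_dim)
        _, h = self.rnn(seq)                      # recurrent accumulation
        return h.squeeze(0)                       # sequence-level embedding
```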
IEEE Transactions on Image Processing | 2016
Zequn Jie; Xiaodan Liang; Jiashi Feng; Wen Feng Lu; Eng Hock Tay; Shuicheng Yan
Object proposal is essential for current state-of-the-art object detection pipelines. However, existing proposal methods generally fail to produce results with satisfying localization accuracy. The case is even worse for small objects, which are nevertheless quite common in practice. In this paper, we propose a novel scale-aware pixelwise object proposal network (SPOP-net) to tackle these challenges. The SPOP-net can generate proposals with a high recall rate and average best overlap, even for small objects. In particular, to improve localization accuracy, a fully convolutional network is employed which predicts object proposal locations for each pixel. The produced ensemble of pixelwise object proposals significantly enhances the chance of hitting the object without incurring heavy extra computational cost. To address the challenge of localizing objects at small scale, two localization networks specialized for objects of different scales are introduced, following the divide-and-conquer philosophy. The location outputs of these two networks are then adaptively combined by a large-/small-size weighting network to generate the final proposals. Extensive evaluations on PASCAL VOC 2007 and COCO 2014 show that the SPOP-net is superior to state-of-the-art models. The high-quality proposals from SPOP-net also significantly improve the mean average precision of object detection with the Fast R-CNN (Fast Regions with CNN features) framework. Finally, the SPOP-net (trained on PASCAL VOC) shows strong generalization performance when tested on the ILSVRC 2013 validation set.
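The adaptive large-/small-size combination can be sketched as a per-pixel learned blend of two scale-specialized heads; the module shapes and the sigmoid weighting below are assumptions for illustration, not the SPOP-net definition.

```python
import torch
import torch.nn as nn

# Sketch of scale-aware fusion: two localization heads specialized for
# large and small objects, blended per pixel by a learned weight map.
class ScaleAwareFusion(nn.Module):
    def __init__(self, in_ch=64):
        super().__init__()
        self.large_head = nn.Conv2d(in_ch, 4, 1)   # box regression, large objects
        self.small_head = nn.Conv2d(in_ch, 4, 1)   # box regression, small objects
        self.weight_head = nn.Conv2d(in_ch, 1, 1)  # per-pixel large-vs-small weight

    def forward(self, feats):
        w = torch.sigmoid(self.weight_head(feats)) # (N, 1, H, W) in [0, 1]
        boxes = w * self.large_head(feats) + (1 - w) * self.small_head(feats)
        return boxes                               # per-pixel proposal coordinates
```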
IEEE Transactions on Circuits and Systems for Video Technology | 2018
Zequn Jie; Wen Feng Lu; Siavash Sakhavi; Yunchao Wei; Eng Hock Tay; Shuicheng Yan
Object proposal generation, as a preprocessing technique, has been widely used in current object detection pipelines to guide the search for objects and avoid exhaustive sliding window search across images. Current object proposals are mostly based on low-level image cues, such as edges and saliency. However, objectness is arguably a high-level semantic concept indicating whether a region contains an object. This paper presents a framework utilizing fully convolutional networks (FCNs) to produce object proposal positions, together with bounding box location refinement by a Support Vector Machine (SVM) to further improve proposal localization. Experiments on PASCAL VOC 2007 show that using the high-level semantic object proposals obtained by the FCN improves object recall. An improvement in detection mean average precision is also seen when using our proposals in the Fast R-CNN framework. In addition, we demonstrate that our method shows stronger robustness when subjected to image perturbations, e.g., blurring, JPEG compression, and salt-and-pepper noise. Finally, the generalization capability of our model (trained on PASCAL VOC 2007) is evaluated and validated by testing on the PASCAL VOC 2012 validation set, the ILSVRC 2013 validation set, and the MS COCO 2014 validation set.
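The abstract specifies SVM-based bounding box location refinement on top of the FCN proposals; one plausible instantiation is a linear support vector regressor per box coordinate, as sketched below. The feature extraction and regression targets are stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.svm import LinearSVR

def train_box_refiners(features, box_offsets):
    """features: (M, D) per-proposal features; box_offsets: (M, 4) offset targets.
    Trains one support vector regressor per box coordinate (assumed design)."""
    return [LinearSVR(C=1.0).fit(features, box_offsets[:, k]) for k in range(4)]

def refine(boxes, features, refiners):
    # predict per-coordinate offsets and shift the FCN proposals accordingly
    offsets = np.stack([r.predict(features) for r in refiners], axis=1)
    return boxes + offsets  # refined (x1, y1, x2, y2) proposals
```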
Machine Learning and Data Mining in Pattern Recognition | 2016
Zequn Jie; Wen Feng Lu; Eng Hock Tay
Vision-based on-road vehicle detection is one of the key problems for autonomous vehicles. Conventional vision-based on-road vehicle detection methods mainly rely on hand-crafted features, such as SIFT and HOG. These hand-crafted features normally require expensive human labor and expert knowledge, and they suffer from poor generalization and slow running speed. They are therefore difficult to apply in realistic applications, which demand accurate and fast detection under all kinds of unpredictable, complex environmental conditions. This paper presents a framework utilizing fully convolutional networks (FCNs) to produce bounding boxes with high confidence of containing a vehicle, and bounding box location refinement with an SVM to further improve localization accuracy. Experiments on the PASCAL VOC 2007 and LISA-Q benchmarks show that using the high-level semantic vehicle confidence obtained by the FCN, higher precision and recall are achieved. Additionally, the FCN enables whole-image inference, which makes the proposed method much faster than object-proposal-based or hand-crafted-feature-based detectors.
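Whole-image inference as described above amounts to reading candidate boxes directly off the FCN's vehicle-confidence map; the threshold-and-connected-components scheme below is an illustrative assumption, not the paper's decoding rule.

```python
import numpy as np
from scipy import ndimage

def boxes_from_confidence(conf_map, thresh=0.5):
    """conf_map: (H, W) per-pixel vehicle confidence from the FCN.
    Thresholds the map and emits one box per connected component."""
    labeled, n = ndimage.label(conf_map > thresh)
    boxes = []
    for region in ndimage.find_objects(labeled):
        ys, xs = region
        boxes.append((xs.start, ys.start, xs.stop, ys.stop))  # (x1, y1, x2, y2)
    return boxes
```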
International Conference on Computer Vision | 2017
Hao Liu; Jiashi Feng; Zequn Jie; Karlekar Jayashree; Bo Zhao; Meibin Qi; Jianguo Jiang; Shuicheng Yan
Neural Information Processing Systems | 2016
Zequn Jie; Xiaodan Liang; Jiashi Feng; Xiaojie Jin; Wen Feng Lu; Shuicheng Yan
International Conference on Computer Vision | 2017
Xiaojie Jin; Xin Li; Huaxin Xiao; Xiaohui Shen; Zhe Lin; Jimei Yang; Yunpeng Chen; Jian Dong; Luoqi Liu; Zequn Jie; Jiashi Feng; Shuicheng Yan