Publication


Featured research published by Xingyu Zeng.


Computer Vision and Pattern Recognition | 2015

DeepID-Net: Deformable deep convolutional neural networks for object detection

Wanli Ouyang; Xiaogang Wang; Xingyu Zeng; Shi Qiu; Ping Luo; Yonglong Tian; Hongsheng Li; Shuo Yang; Zhe Wang; Chen Change Loy; Xiaoou Tang

In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraints and penalties. A new pre-training strategy is proposed to learn feature representations that are better suited to the object detection task and generalize well. By varying the network structures and training strategies, and by adding or removing key components of the detection pipeline, we obtain a set of models with large diversity, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean average precision obtained by R-CNN [14], the previous state of the art, from 31% to 50.3% on the ILSVRC2014 detection test set. It also outperforms the winner of ILSVRC2014, GoogLeNet, by 6.1%. Detailed component-wise analysis is also provided through extensive experimental evaluation, giving a global view of the deep learning object detection pipeline.
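
The def-pooling idea is that, within each pooling window, the part response that survives a deformation penalty tied to its displacement from the window centre is the one that gets pooled. Below is a minimal sketch of such a layer, assuming a PyTorch-style module; the class name, the quadratic penalty form, and the default hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of a deformation-constrained (def-)pooling layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefPooling2d(nn.Module):
    """Max-pool a part score map after subtracting a learned quadratic
    deformation penalty around each pooling window's centre."""

    def __init__(self, kernel_size=3, stride=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        # Learnable penalty weights for squared vertical/horizontal displacement.
        self.penalty = nn.Parameter(torch.ones(2) * 0.1)
        # Precompute squared displacements of every position inside a window
        # relative to the window centre (the geometric constraint).
        ys, xs = torch.meshgrid(
            torch.arange(kernel_size), torch.arange(kernel_size), indexing="ij"
        )
        centre = (kernel_size - 1) / 2.0
        self.register_buffer("dy2", (ys - centre).float() ** 2)
        self.register_buffer("dx2", (xs - centre).float() ** 2)

    def forward(self, score_map):
        # score_map: (N, C, H, W) part detection scores.
        patches = F.unfold(score_map, self.kernel_size, stride=self.stride)
        n, _, num_windows = patches.shape
        c = score_map.shape[1]
        patches = patches.view(n, c, self.kernel_size * self.kernel_size, num_windows)
        # Deformation penalty grows with displacement from the window centre.
        penalty = (self.penalty[0] * self.dy2 + self.penalty[1] * self.dx2).view(1, 1, -1, 1)
        # Pick, per window, the position with the best penalised score.
        pooled, _ = (patches - penalty).max(dim=2)
        h_out = (score_map.shape[2] - self.kernel_size) // self.stride + 1
        w_out = (score_map.shape[3] - self.kernel_size) // self.stride + 1
        return pooled.view(n, c, h_out, w_out)
```

With a learned penalty of zero this reduces to ordinary max pooling; larger penalties increasingly favour parts that stay near their expected location.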


International Conference on Computer Vision | 2013

Multi-stage Contextual Deep Learning for Pedestrian Detection

Xingyu Zeng; Wanli Ouyang; Xiaogang Wang

Cascaded classifiers have been widely used in pedestrian detection and have achieved great success. However, these classifiers are trained sequentially, without joint optimization. In this paper, we propose a new deep model that can jointly train multi-stage classifiers through several stages of back-propagation. It keeps the score map output by a classifier within a local region and uses it as contextual information to support the decision at the next stage. Through a specific design of the training strategy, this deep architecture is able to simulate cascaded classifiers by mining hard samples to train the network stage by stage, so that each classifier handles samples at a different difficulty level. Unsupervised pre-training and a specifically designed stage-wise supervised training are used to regularize the optimization problem. Both theoretical analysis and experimental results show that the training strategy helps to avoid overfitting. Experimental results on three datasets (Caltech, ETH and TUD-Brussels) show that our approach outperforms the state-of-the-art approaches.
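
The key mechanism is that each stage sees not only the image features but also the previous stage's local score map as extra context. The sketch below shows that wiring, assuming PyTorch; the module names, channel widths, and number of stages are illustrative assumptions, and the paper's hard-sample mining and stage-wise supervision are only indicated in comments.

```python
# A minimal sketch of multi-stage classification with score-map context.
import torch
import torch.nn as nn

class ContextualStage(nn.Module):
    """One classification stage that also sees the previous stage's score map."""

    def __init__(self, in_channels, context_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels + context_channels, 32, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),  # per-location pedestrian score
        )

    def forward(self, features, prev_scores):
        # Concatenate the previous stage's local score map as a context channel.
        return self.net(torch.cat([features, prev_scores], dim=1))

class MultiStageDetector(nn.Module):
    def __init__(self, in_channels, num_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            ContextualStage(in_channels) for _ in range(num_stages)
        )

    def forward(self, features):
        n, _, h, w = features.shape
        scores = torch.zeros(n, 1, h, w, device=features.device)
        all_scores = []
        for stage in self.stages:
            scores = stage(features, scores)  # refine using previous-stage context
            all_scores.append(scores)
        # During training, stage-wise losses would be applied to each score map,
        # with later stages supervised mainly on samples earlier stages find hard.
        return all_scores
```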


Computer Vision and Pattern Recognition | 2013

Modeling Mutual Visibility Relationship in Pedestrian Detection

Wanli Ouyang; Xingyu Zeng; Xiaogang Wang

Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide a useful mutual relationship for visibility estimation: estimating the visibility of one pedestrian facilitates estimating the visibility of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train, Caltech-Test and ETH datasets. Including mutual visibility leads to improvements of 4% to 8% on multiple benchmark datasets.
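
To make the mutual-visibility idea concrete, the sketch below jointly refines the part-visibility estimates of two overlapping pedestrians through a learned coupling matrix. This is a deliberately simplified illustration assuming PyTorch; the paper's deep model parameterizes the relationship differently, and the class and variable names here are hypothetical.

```python
# A minimal sketch of jointly refining part visibilities of two overlapping
# pedestrians with a learned coupling between their parts.
import torch
import torch.nn as nn

class MutualVisibility(nn.Module):
    def __init__(self, num_parts, num_iters=2):
        super().__init__()
        # w[i, j]: how visibility of part j on one pedestrian influences
        # visibility of part i on the other (e.g. one person's arm occluding
        # the other's torso lowers that torso's visibility).
        self.w = nn.Parameter(torch.zeros(num_parts, num_parts))
        self.bias = nn.Parameter(torch.zeros(num_parts))
        self.num_iters = num_iters

    def forward(self, part_scores_a, part_scores_b):
        # part_scores_x: (N, num_parts) raw part detection scores.
        vis_a = torch.sigmoid(part_scores_a)
        vis_b = torch.sigmoid(part_scores_b)
        for _ in range(self.num_iters):
            # Each pedestrian's visibility is re-estimated from its own part
            # scores plus messages from the other pedestrian's visibility.
            new_a = torch.sigmoid(part_scores_a + vis_b @ self.w.t() + self.bias)
            new_b = torch.sigmoid(part_scores_b + vis_a @ self.w.t() + self.bias)
            vis_a, vis_b = new_a, new_b
        return vis_a, vis_b
```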


European Conference on Computer Vision | 2014

Deep Learning of Scene-Specific Classifier for Pedestrian Detection

Xingyu Zeng; Wanli Ouyang; Meng Wang; Xiaogang Wang

The performance of a detector depends heavily on its training dataset and drops significantly when the detector is applied to a new scene, due to the large variations between the source training dataset and the target scene. In order to bridge this appearance gap, we propose a deep model that automatically learns scene-specific features and visual patterns in static video surveillance without any manual labels from the target scene. It jointly learns a scene-specific classifier and the distribution of the target samples. Both tasks share multi-scale feature representations with both discriminative and representative power. We also propose a cluster layer in the deep model that exploits the scene-specific visual patterns for pedestrian detection. Our specifically designed objective function not only incorporates the confidence scores of target training samples but also automatically weights the importance of source training samples by fitting the marginal distributions of target samples. It improves the detection rate at 1 FPPI by 10% compared with the state-of-the-art domain adaptation methods on the MIT Traffic Dataset and the CUHK Square Dataset.
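
The objective described above mixes confidence-weighted, automatically labeled target samples with source samples whose importance is re-weighted to fit the target distribution. The sketch below shows that kind of re-weighted loss, assuming PyTorch; representing the target distribution by a few feature cluster centres and using a softmax weighting is an illustrative stand-in for the paper's distribution-fitting term, and all function and variable names are hypothetical.

```python
# A minimal sketch of a re-weighted source + pseudo-labeled target objective.
import torch
import torch.nn.functional as F

def scene_adaptive_loss(logits_src, labels_src, feats_src,
                        logits_tgt, pseudo_labels_tgt, confidence_tgt,
                        target_centres):
    # Weight each source sample by how close its feature is to the target
    # feature distribution (represented here by a few cluster centres).
    dists = torch.cdist(feats_src, target_centres)                    # (N_src, K)
    src_weights = torch.softmax(-dists.min(dim=1).values, dim=0) * feats_src.shape[0]
    loss_src = (src_weights *
                F.cross_entropy(logits_src, labels_src, reduction="none")).mean()

    # Target samples carry no manual labels: use pseudo-labels weighted by
    # the detector's own confidence scores.
    loss_tgt = (confidence_tgt *
                F.cross_entropy(logits_tgt, pseudo_labels_tgt, reduction="none")).mean()
    return loss_src + loss_tgt
```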


IEEE Transactions on Circuits and Systems for Video Technology | 2017

T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos

Kai Kang; Hongsheng Li; Junjie Yan; Xingyu Zeng; Bin Yang; Tong Xiao; Cong Zhang; Zhe Wang; Ruohui Wang; Xiaogang Wang; Wanli Ouyang

The state-of-the-art performance for object detection has been significantly improved over the past two years. Besides the introduction of powerful deep neural networks, such as GoogLeNet and VGG, novel object detection frameworks, such as R-CNN and its successors Fast R-CNN and Faster R-CNN, play an essential role in improving the state of the art. Despite their effectiveness on still images, these frameworks are not specifically designed for object detection from videos, and the temporal and contextual information in videos is not fully investigated or utilized. In this paper, we propose a deep learning framework that incorporates temporal and contextual information from tubelets obtained in videos, which dramatically improves the baseline performance of existing still-image detection frameworks when they are applied to videos. It is called T-CNN, i.e., tubelets with convolutional neural networks. The proposed framework won the newly introduced object-detection-from-video task with provided data in the ImageNet Large-Scale Visual Recognition Challenge 2015. Code is publicly available at https://github.com/myfavouritekk/T-CNN.
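
One ingredient of the temporal reasoning such a framework can perform is re-scoring per-frame detections along a tubelet, i.e., a sequence of boxes linked across frames. The sketch below shows a simple temporal smoothing of tubelet confidences; it is an illustrative simplification, not the full T-CNN pipeline, and the function name and window size are assumptions.

```python
# A minimal sketch of tubelet-level confidence smoothing.
import numpy as np

def smooth_tubelet_scores(scores, window=5):
    """scores: per-frame confidences of one tubelet (1-D sequence).
    Returns scores smoothed with a sliding max, so a few weak frames inside
    an otherwise confident tubelet are boosted by their temporal context."""
    scores = np.asarray(scores, dtype=float)
    half = window // 2
    padded = np.pad(scores, half, mode="edge")
    return np.array([padded[i:i + window].max() for i in range(len(scores))])

# Example: a tubelet that is confident except for two blurred frames.
print(smooth_tubelet_scores([0.9, 0.85, 0.2, 0.3, 0.88, 0.92]))
```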


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Crafting GBD-Net for Object Detection

Xingyu Zeng; Wanli Ouyang; Junjie Yan; Hongsheng Li; Tong Xiao; Kun Wang; Yu Liu; Yucong Zhou; Bin Yang; Zhe Wang; Hui Zhou; Xiaogang Wang

The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. Effective integration of local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution between neighboring support regions in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate the existence of each other by learning their nonlinear relationships, and their close interactions are modeled in a more complex way. It is also shown that message passing is not always helpful; whether it helps depends on the individual sample. Gated functions are therefore needed to control message transmission, and their on-or-off states are controlled by extra visual evidence from the input sample. The effectiveness of GBD-Net is shown through experiments on three object detection datasets: ImageNet, PASCAL VOC2007 and Microsoft COCO. Besides GBD-Net, this paper also details our approach to winning the ImageNet object detection challenge of 2016, with source code provided at https://github.com/craftGBD/craftGBD. The winning system includes a modified GBD-Net, a new pretraining scheme, and better region proposal designs. We also show the effectiveness of different network structures and existing techniques for object detection, such as multi-scale testing, left-right flipping, bounding box voting, NMS, and context.
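
The sketch below illustrates gated bi-directional message passing between features pooled from support regions of increasing size, assuming PyTorch and assuming all region features are pooled to the same spatial size. The module and layer names are illustrative and the gate parameterization is simplified; this is not the released GBD-Net code.

```python
# A minimal sketch of gated bi-directional message passing across support regions.
import torch
import torch.nn as nn

class GatedBidirectional(nn.Module):
    def __init__(self, channels, num_regions=4):
        super().__init__()
        self.fwd_msg = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1)
                                     for _ in range(num_regions - 1))
        self.bwd_msg = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1)
                                     for _ in range(num_regions - 1))
        # Gates decide, per sample and location, whether a message is transmitted.
        self.fwd_gate = nn.ModuleList(nn.Conv2d(channels, 1, 3, padding=1)
                                      for _ in range(num_regions - 1))
        self.bwd_gate = nn.ModuleList(nn.Conv2d(channels, 1, 3, padding=1)
                                      for _ in range(num_regions - 1))

    def forward(self, feats):
        # feats: list of (N, C, H, W) features from support regions ordered
        # small to large; all are assumed to share the same spatial size.
        n = len(feats)
        fwd = [feats[0]]
        for i in range(1, n):                      # small -> large direction
            gate = torch.sigmoid(self.fwd_gate[i - 1](fwd[i - 1]))
            fwd.append(torch.relu(feats[i] + gate * self.fwd_msg[i - 1](fwd[i - 1])))
        bwd = [None] * n
        bwd[n - 1] = feats[n - 1]
        for i in range(n - 2, -1, -1):             # large -> small direction
            gate = torch.sigmoid(self.bwd_gate[i](bwd[i + 1]))
            bwd[i] = torch.relu(feats[i] + gate * self.bwd_msg[i](bwd[i + 1]))
        # Fuse both directions; a classification head would consume these features.
        return [f + b for f, b in zip(fwd, bwd)]
```

The sigmoid gates, computed from the sending feature, let the network suppress a message when the extra context would hurt a particular sample, which is the role the gated functions play in the description above.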


European Conference on Computer Vision | 2016

Gated Bi-directional CNN for Object Detection

Xingyu Zeng; Wanli Ouyang; Bin Yang; Junjie Yan; Xiaogang Wang

The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. How to effectively integrate local and contextual visual cues from these regions has become a fundamental problem in object detection. Most existing works simply concatenate features or scores obtained from the support regions. In this paper, we propose a novel gated bi-directional CNN (GBD-Net) to pass messages between features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate the existence of each other by learning their nonlinear relationships, and their close interactions are modeled in a much more complex way. It is also shown that message passing is not always helpful; whether it helps depends on the individual sample. Gated functions are further introduced to control message transmission, and their on-or-off states are controlled by extra visual evidence from the input sample. GBD-Net is implemented under the Fast R-CNN detection framework. Its effectiveness is shown through experiments on three object detection datasets: ImageNet, PASCAL VOC2007 and Microsoft COCO.


International Conference on Computer Vision | 2015

Learning Deep Representation with Large-Scale Attributes

Wanli Ouyang; Hongyang Li; Xingyu Zeng; Xiaogang Wang

Learning strong feature representations from large-scale supervision has achieved remarkable success in computer vision with the emergence of deep learning techniques, driven by big visual data with rich annotations. This paper contributes a large-scale object attribute database that contains rich attribute annotations (over 300 attributes) for ~180k samples and 494 object classes. Based on the ImageNet object detection dataset, it annotates rotation, viewpoint, object part locations, part occlusion, part existence, common attributes, and class-specific attributes. We then use this dataset to train deep representations and extensively evaluate how useful these attributes are for the general object detection task. In order to make better use of the attribute annotations, a deep learning scheme is proposed that models the relationships among attributes and hierarchically clusters them into semantically meaningful mixture types. Experimental results show that the attributes help learn better features and improve object detection accuracy by 2.6% in mAP on the ILSVRC 2014 object detection dataset and by 2.4% in mAP on the PASCAL VOC 2007 object detection dataset. The improvement generalizes well across datasets.
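
One plausible reading of the mixture-type construction is grouping per-sample attribute annotation vectors into clusters that then serve as auxiliary labels during representation learning. The sketch below uses scikit-learn k-means as an illustrative stand-in for the paper's hierarchical clustering; the function name, cluster count, and random data are all hypothetical.

```python
# A minimal sketch of deriving "mixture types" from attribute annotations.
import numpy as np
from sklearn.cluster import KMeans

def attribute_mixture_types(attribute_matrix, num_types=8, seed=0):
    """attribute_matrix: (num_samples, num_attributes) binary/real annotations
    (rotation, viewpoint, part occlusion, ...). Returns one mixture-type id
    per sample, usable as an extra supervision signal during pre-training."""
    km = KMeans(n_clusters=num_types, n_init=10, random_state=seed)
    return km.fit_predict(attribute_matrix)

# Example with random annotations for 1000 samples and 300 attributes.
types = attribute_mixture_types(np.random.rand(1000, 300) > 0.9)
print(np.bincount(types))
```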


arXiv: Computer Vision and Pattern Recognition | 2014

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

Wanli Ouyang; Ping Luo; Xingyu Zeng; Shi Qiu; Yonglong Tian; Hongsheng Li; Shuo Yang; Zhe Wang; Yuanjun Xiong; Chen Qian; Zhenyao Zhu; Ruohui Wang; Chen Change Loy; Xiaogang Wang; Xiaoou Tang


International Journal of Computer Vision | 2016

Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model

Wanli Ouyang; Xingyu Zeng; Xiaogang Wang

Collaboration


Dive into Xingyu Zeng's collaborations.

Top Co-Authors

Xiaogang Wang, The Chinese University of Hong Kong
Hongsheng Li, The Chinese University of Hong Kong
Zhe Wang, The Chinese University of Hong Kong
Junjie Yan, Chinese Academy of Sciences
Chen Change Loy, The Chinese University of Hong Kong
Ping Luo, The Chinese University of Hong Kong
Shi Qiu, The Chinese University of Hong Kong
Shuo Yang, The Chinese University of Hong Kong
Xiaoou Tang, The Chinese University of Hong Kong