Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Wenze Hu is active.

Publication


Featured research published by Wenze Hu.


Computer Vision and Pattern Recognition | 2008

An integrated background model for video surveillance based on primal sketch and 3D scene geometry

Wenze Hu; Haifeng Gong; Song-Chun Zhu; Yongtian Wang

This paper presents a novel integrated background model for video surveillance. Our model uses a primal sketch representation for image appearance and 3D scene geometry to capture the ground plane and major surfaces in the scene. The primal sketch model divides the background image into three types of regions - flat, sketchable and textured. The three types of regions are modeled respectively by mixtures of Gaussians, image primitives and LBP histograms. We calibrate the camera and recover important planes such as the ground, horizontal surfaces, walls and stairs in the 3D scene, and use geometric information to predict the sizes and locations of foreground blobs to further reduce false alarms. Compared with state-of-the-art background modeling methods, our approach is more effective, especially for indoor scenes where shadows, highlights, reflections of moving objects and camera exposure adjustments usually cause problems. Experimental results demonstrate that our approach improves the performance of background/foreground separation at the pixel level, and of the integrated video surveillance system at the object and trajectory levels.
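The per-pixel mixture-of-Gaussians component used for the flat regions can be sketched as follows. This is a minimal sketch of the standard Stauffer–Grimson update; the parameter values (learning rate `alpha`, a 2.5-standard-deviation match threshold, the 0.2 background-weight cutoff) are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

class PixelMoG:
    """Per-pixel mixture of Gaussians for background modeling (flat regions).

    A sketch of the classic Stauffer-Grimson update; the paper combines a
    component like this with image primitives and LBP histograms for the
    sketchable and textured region types.
    """

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5):
        self.alpha = alpha                          # learning rate
        self.match_sigmas = match_sigmas            # match threshold in std devs
        self.means = np.linspace(0.0, 255.0, k)     # component means
        self.vars = np.full(k, 225.0)               # component variances
        self.weights = np.full(k, 1.0 / k)          # mixing weights

    def update(self, value):
        """Update the mixture with a new pixel intensity.

        Returns True if the pixel matches a high-weight (background)
        component, False if it looks like foreground.
        """
        dist = np.abs(value - self.means)
        matched = dist < self.match_sigmas * np.sqrt(self.vars)
        is_background = False
        if matched.any():
            i = int(np.argmax(matched))  # first matching component
            self.means[i] += self.alpha * (value - self.means[i])
            self.vars[i] += self.alpha * ((value - self.means[i]) ** 2 - self.vars[i])
            self.weights *= (1.0 - self.alpha)
            self.weights[i] += self.alpha
            is_background = self.weights[i] > 0.2
        else:
            # no component explains the pixel: replace the least probable one
            i = int(np.argmin(self.weights))
            self.means[i], self.vars[i] = value, 225.0
        self.weights /= self.weights.sum()
        return is_background
```

A pixel that stays near one intensity accumulates weight in one component and is classified as background; a sudden intensity change matches no component and is flagged as foreground.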


Computer Vision and Pattern Recognition | 2010

Learning a probabilistic model mixing 3D and 2D primitives for view invariant object recognition

Wenze Hu; Song-Chun Zhu

This paper presents a method for learning mixed templates for view invariant object recognition. The template is composed of 3D and 2D primitives, which are stick-like elements defined in 3D and 2D spaces respectively. The primitives are allowed to perturb within a local range to account for instance variations of an object category. When projected onto images, the appearance of these primitives is represented by Gabor filters. Both 3D and 2D primitives have parameters describing their visible range in a viewing hemisphere. Our algorithm sequentially selects primitives and builds a probabilistic model using the selected primitives. The order of this sequential selection is decided by the information gains of the primitives, which can be estimated together with the visible range parameter efficiently. In experiments, we evaluate the performance of the learned 3D templates on car recognition and pose estimation. We also show that the algorithm can learn intuitive mixed templates on various object categories, which suggests that our method could be used as a numerical method to inform the debate over viewer-centered and object-centered representations.


International Conference on Computer Vision | 2013

Modeling Occlusion by Discriminative AND-OR Structures

Bo Li; Wenze Hu; Tianfu Wu; Song-Chun Zhu

Occlusion presents a challenge for detecting objects in real-world applications. To address this issue, this paper models object occlusion with an AND-OR structure which (i) represents occlusion at the semantic part level, and (ii) captures the regularities of different occlusion configurations (i.e., the different combinations of object part visibilities). This paper focuses on car detection on streets. Since annotating part occlusion on real images is time-consuming and error-prone, we propose to learn the AND-OR structure automatically using synthetic images of CAD models placed at different relative positions. The model parameters are learned from real images under the latent structural SVM (LSSVM) framework. In inference, an efficient dynamic programming (DP) algorithm is utilized. In experiments, we test our method on both car detection and car view estimation. Experimental results show that (i) our CAD simulation strategy is capable of generating occlusion patterns for real scenarios, (ii) the proposed AND-OR structure model is effective for modeling occlusions, outperforming the deformable part-based model (DPM) in car detection on both our self-collected street-parking dataset and the PASCAL VOC 2007 car dataset, and (iii) the learned model is on par with state-of-the-art methods on car view estimation, tested on two public datasets.
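The inference pattern the abstract describes — AND nodes combine their children, OR nodes choose among alternative occlusion configurations, and a best parse is found by dynamic programming — can be illustrated with a minimal recursive sketch. The node names and scores below are hypothetical, not values from the learned model.

```python
# Minimal sketch of scoring an AND-OR structure: AND nodes sum their
# children's scores, OR nodes pick the best-scoring child (here encoding
# alternative occlusion configurations of a part).

def score(node, leaf_scores):
    """Return (best_score, parse) for an AND-OR node.

    node: ('leaf', name) | ('and', [children]) | ('or', [children])
    leaf_scores: dict mapping leaf name -> appearance score
    """
    kind = node[0]
    if kind == 'leaf':
        name = node[1]
        return leaf_scores[name], name
    children = [score(c, leaf_scores) for c in node[1]]
    if kind == 'and':
        # AND: all children must be explained; scores add up
        return sum(s for s, _ in children), [t for _, t in children]
    # OR: choose the single highest-scoring child configuration
    return max(children, key=lambda st: st[0])

# Hypothetical example: a car whose rear part is either visible or occluded
car = ('and', [
    ('leaf', 'front'),
    ('or', [('leaf', 'rear_visible'), ('leaf', 'rear_occluded')]),
])
scores = {'front': 2, 'rear_visible': 1, 'rear_occluded': 3}
best, parse = score(car, scores)
# best == 5, parse == ['front', 'rear_occluded']
```

Because each node's best score depends only on its children's best scores, memoizing over image positions turns this recursion into the DP pass used at detection time.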


International Journal of Computer Vision | 2015

Learning Sparse FRAME Models for Natural Image Patterns

Jianwen Xie; Wenze Hu; Song-Chun Zhu; Ying Nian Wu

It is well known that natural images admit sparse representations by redundant dictionaries of basis functions such as Gabor-like wavelets. However, it is still an open question as to what the next layer of representational units above the layer of wavelets should be. We address this fundamental question by proposing a sparse FRAME (Filters, Random field, And Maximum Entropy) model for representing natural image patterns. Our sparse FRAME model is an inhomogeneous generalization of the original FRAME model. It is a non-stationary Markov random field model that reproduces the observed statistical properties of filter responses at a subset of selected locations, scales and orientations. Each sparse FRAME model is intended to represent an object pattern and can be considered a deformable template. The sparse FRAME model can be written as a shared sparse coding model, which motivates us to propose a two-stage algorithm for learning the model. The first stage selects the subset of wavelets from the dictionary by a shared matching pursuit algorithm. The second stage then estimates the parameters of the model given the selected wavelets. Our experiments show that the sparse FRAME models are capable of representing a wide variety of object patterns in natural images and that the learned models are useful for object classification.
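The first stage of the two-stage algorithm, wavelet selection by shared matching pursuit, can be sketched as follows. This is a simplified illustration under the assumption of a unit-norm dictionary: at each step it selects the basis with the largest summed squared response across all training examples, and it omits the second-stage estimation of the FRAME model parameters.

```python
import numpy as np

def shared_matching_pursuit(images, dictionary, n_select):
    """Select dictionary elements shared across all training examples.

    images: (n_images, d) array of vectorized image patches
    dictionary: (n_basis, d) array of unit-norm basis functions
    Returns the indices of the selected basis functions.
    """
    residual = np.array(images, dtype=float)
    selected = []
    for _ in range(n_select):
        responses = residual @ dictionary.T           # (n_images, n_basis)
        gain = (responses ** 2).sum(axis=0)           # gain summed over images
        gain[selected] = -np.inf                      # never reselect a basis
        j = int(np.argmax(gain))
        selected.append(j)
        # subtract each image's projection onto the chosen basis
        residual -= np.outer(responses[:, j], dictionary[j])
    return selected

# Toy usage: with an identity dictionary, the shared pursuit picks the
# coordinates where the examples jointly carry the most energy.
D = np.eye(4)
X = np.array([[3., 0., 1., 0.],
              [2., 0., 2., 0.]])
# shared_matching_pursuit(X, D, 2) -> [0, 2]
```

The key difference from ordinary matching pursuit is that the selection criterion is pooled over the whole training set, so all examples end up encoded by the same small subset of wavelets.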


Quarterly of Applied Mathematics | 2013

Unsupervised learning of compositional sparse code for natural image representation

Yi Hong; Zhangzhang Si; Wenze Hu; Song-Chun Zhu; Ying Nian Wu

This article proposes an unsupervised method for learning compositional sparse code for representing natural images. Our method is built upon the original sparse coding framework where there is a dictionary of basis functions often in the form of localized, elongated and oriented wavelets, so that each image can be represented by a linear combination of a small number of basis functions automatically selected from the dictionary. In our compositional sparse code, the representational units are composite: they are compositional patterns formed by the basis functions. These compositional patterns can be viewed as shape templates. We propose an unsupervised learning method for learning a dictionary of frequently occurring templates from training images, so that each training image can be represented by a small number of templates automatically selected from the learned dictionary. The compositional sparse code approximates the raw image of a large number of pixel intensities using a small number of templates, thus facilitating the signal-to-symbol transition and allowing a symbolic description of the image. The current form of our model consists of two layers of representational units (basis functions and shape templates). It is possible to extend it to multiple layers of hierarchy. Experiments show that our method is capable of learning meaningful compositional sparse code, and the learned templates are useful for image classification.

Received October 23, 2012 and, in revised form, February 10, 2013.


Computer Vision and Pattern Recognition | 2012

Learning 3D object templates by hierarchical quantization of geometry and appearance spaces

Wenze Hu

This paper presents a method for learning 3D object templates from view labeled object images. The 3D template is defined in a joint appearance and geometry space composed of deformable planar part templates placed at different 3D positions and orientations. Appearance of each part template is represented by Gabor filters, which are hierarchically grouped into line segments and geometric shapes. AND-OR trees are further used to quantize the possible geometry and appearance of part templates, so that learning can be done on a subsampled discrete space. Using information gain as a criterion, the best 3D template can be searched through the AND-OR trees using one bottom-up pass and one top-down pass. Experiments on a new car dataset with diverse views show that the proposed method can learn meaningful 3D car templates, and give satisfactory detection and view estimation performance. Experiments are also performed on a public car dataset, which show comparable performance with recent methods.


Computer Vision and Pattern Recognition | 2014

Unsupervised Learning of Dictionaries of Hierarchical Compositional Models

Jifeng Dai; Yi Hong; Wenze Hu; Song-Chun Zhu; Ying Nian Wu

This paper proposes an unsupervised method for learning dictionaries of hierarchical compositional models for representing natural images. Each model is in the form of a template that consists of a small group of part templates that are allowed to shift their locations and orientations relative to each other, and each part template is in turn a composition of Gabor wavelets that are also allowed to shift their locations and orientations relative to each other. Given a set of unannotated training images, a dictionary of such hierarchical templates is learned so that each training image can be represented by a small number of templates that are spatially translated, rotated and scaled versions of the templates in the learned dictionary. The learning algorithm iterates between the following two steps: (1) image encoding by a template matching pursuit process that involves a bottom-up template matching sub-process and a top-down template localization sub-process; (2) dictionary re-learning by a shared matching pursuit process. Experimental results show that the proposed approach is capable of learning meaningful templates, and the learned templates are useful for tasks such as domain adaptation and image cosegmentation.


International Conference on Computer Vision | 2011

Image representation by active curves

Wenze Hu; Ying Nian Wu; Song-Chun Zhu

This paper proposes a sparse image representation using deformable templates of simple geometric structures that are commonly observed in images of natural scenes. These deformable templates include active curve templates and active corner templates. An active curve template is a composition of Gabor wavelet elements placed with equal spacing on a straight line segment or a circular arc segment of constant curvature, where each Gabor wavelet element is allowed to locally shift its location and orientation, so that the original line or arc segment of the active curve template can be deformed to fit the observed image. An active corner or angle template is a composition of two active curve templates that share a common end point, and the active curve templates are allowed to vary their overall lengths and curvatures, so that the original corner template can deform to match the observed image. This paper then proposes a hierarchical computational architecture of sum-max maps that pursues a sparse representation of an image by selecting a small number of active curve and corner templates from a dictionary of all such templates. Experiments show that the proposed method is capable of finding sparse representations of natural images. It is also shown that object templates can be learned by selecting and composing active curve and corner templates.
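The sum-max scoring the abstract describes — a local MAX over each element's allowed perturbation, then a SUM over the template's elements — can be sketched in one dimension. The response values, element offsets and perturbation radius below are illustrative, not from the paper.

```python
import numpy as np

def sum_max(responses, offsets, perturb=1):
    """Score a template at a fixed anchor by SUM-MAX pooling.

    responses: 1-D array of filter responses along an image row
    offsets: each element's nominal position relative to the anchor
    perturb: how far each element may shift from its nominal position

    Each element contributes the MAX response within its perturbation
    window; the template score is the SUM of these contributions.
    """
    score = 0.0
    for o in offsets:
        lo = max(0, o - perturb)
        hi = min(len(responses), o + perturb + 1)
        score += responses[lo:hi].max()
    return score

# Toy usage: a two-element template; each element snaps to the strongest
# nearby response, so slightly misaligned structures still score well.
r = np.array([0., 5., 1., 0., 4., 0.])
# sum_max(r, offsets=[0, 4], perturb=1) -> 9.0  (5.0 at index 1, 4.0 at index 4)
```

The MAX step is what makes the curves "active": it absorbs small local deformations before the SUM combines evidence across the whole template.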


Pattern Recognition | 2014

Coupling-and-decoupling: A hierarchical model for occlusion-free object detection

Bo Li; Xi Song; Tianfu Wu; Wenze Hu; Mingtao Pei

Handling occlusion is a very challenging problem in object detection. This paper presents a method of learning a hierarchical model for X-to-X occlusion-free object detection (e.g., car-to-car and person-to-person occlusions in our experiments). The proposed method is motivated by an intuitive coupling-and-decoupling strategy. In the learning stage, the pair of occluding X's (e.g., car pairs or person pairs) is represented directly and jointly by a hierarchical And-Or directed acyclic graph (AOG) which accounts for the statistically significant co-occurrence (i.e., coupling). The structure and the parameters of the AOG are learned using the latent structural SVM (LSSVM) framework. In detection, a dynamic programming (DP) algorithm is utilized to find the best parse trees for all sliding windows with detection scores being greater than the learned threshold. Then, the two single X's are decoupled from the declared detections of X-to-X occluding pairs together with some non-maximum suppression (NMS) post-processing. In experiments, our method is tested on both a roadside-car dataset collected by ourselves (which will be released with this paper) and two public person datasets, the MPII-2Person dataset and the TUD-Crossing dataset. Our method is compared with state-of-the-art deformable part-based methods, and obtains comparable or better detection performance.


International Conference on Computer Vision | 2011

Inferring social roles in long timespan video sequence

Jiangen Zhang; Wenze Hu; Benjamin Z. Yao; Yongtian Wang; Song-Chun Zhu

In this paper, we present a method for inferring the social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents and then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal relations between an agent's feet and hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With the probabilistic output of events, the roles of agents can be predicted under the Bayesian inference framework. Experiments are carried out on a challenging 8.5-hour video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles: “manager”, “researcher”, “developer”, “engineer”, “staff”, “visitor” and “mailman”. Results show that our proposed method can predict the role of each agent with high precision.

Collaboration


Dive into Wenze Hu's collaborations.

Top Co-Authors

Song-Chun Zhu
University of California

Ying Nian Wu
University of California

Tianfu Wu
University of California

Bo Li
Beijing Institute of Technology

Haifeng Gong
University of California

Yi Hong
University of California

Mingtao Pei
Beijing Institute of Technology

Yongtian Wang
Beijing Institute of Technology