Publication


Featured research published by Yongtian Wang.


International Conference on Computer Vision | 2007

An Empirical Study of Object Category Recognition: Sequential Testing with Generalized Samples

Liang Lin; Shaowu Peng; Jake Porway; Song-Chun Zhu; Yongtian Wang

In this paper we present an empirical study of object category recognition using generalized samples and a set of sequential tests. We study 33 categories, each consisting of a small data set of 30 instances. To increase the amount of training data, we use a compositional object model to learn a representation for each category, from which we select 30 additional templates with varied appearance. These samples better span the appearance space and form an augmented training set Ω_T of 1980 (60 × 33) training templates. To perform recognition on a testing image, we use a set of sequential tests that project Ω_T into different representation spaces to narrow the number of candidate matches in Ω_T. We use graphlets (structural elements) as our local features and model Ω_T at each stage using histograms of graphlets over categories, histograms of graphlets over object instances, histograms of pairs of graphlets over objects, and shape context. Each test is increasingly computationally expensive, and by the end of the cascade we have a small candidate set remaining to use with our most powerful test, a top-down graph matching algorithm. We achieve an 81.4% classification rate on 800 testing images in 33 categories, 15.2% more accurate than a method without generalized samples.
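The cascade of sequential tests amounts to repeatedly ranking the remaining templates by a cheap histogram comparison and keeping only the best candidates for the next, more expensive stage. Below is a minimal Python sketch of that pruning loop; the chi-squared distance, the per-stage histograms, and the keep budgets are illustrative assumptions, not the paper's actual features or thresholds.

```python
import numpy as np

def histogram_distance(h1, h2):
    # Chi-squared distance between two normalized histograms.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-12))

def cascade_prune(query_hists, template_hists, budgets):
    """Run a cascade of increasingly selective histogram tests.

    query_hists    : list of per-stage histograms for the test image
    template_hists : dict mapping template id -> list of per-stage histograms
    budgets        : number of candidates to keep after each stage
    """
    candidates = list(template_hists.keys())
    for stage, keep in enumerate(budgets):
        candidates = sorted(
            candidates,
            key=lambda t: histogram_distance(query_hists[stage],
                                             template_hists[t][stage]),
        )[:keep]       # narrow the candidate set before the next, costlier test
    return candidates  # survivors go to the expensive top-down graph matcher
```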


Pattern Recognition | 2012

Object categorization with sketch representation and generalized samples

Liang Lin; Xiaobai Liu; Shaowu Peng; Hongyang Chao; Yongtian Wang; Bo Jiang

In this paper, we present a framework for object categorization via sketch graphs that incorporate shape and structure information. In this framework, we integrate the learnable And-Or graph model, a hierarchical structure that combines the reconfigurability of a stochastic context-free grammar (SCFG) with the constraints of a Markov random field (MRF). For computational efficiency, we generalize instances from the And-Or graph models and perform a set of sequential tests for cascaded object categorization, rather than inferring directly with the And-Or graph models. We study 33 categories, each consisting of a small data set of 30 instances, and 30 additional templates with varied appearance are generalized from the learned And-Or graph model. These samples better span the appearance space and form an augmented training set Ω_T of 1980 (60 × 33) training templates. To perform recognition on a testing image, we use a set of sequential tests that project Ω_T into different representation spaces to narrow the number of candidate matches in Ω_T. We use graphlets (structural elements) as our local features and model Ω_T at each stage using histograms of graphlets over categories, histograms of graphlets over object instances, histograms of pairs of graphlets over objects, and shape context. Each test is increasingly computationally expensive, and by the end of the cascade we have a small candidate set remaining to use with our most powerful test, a top-down graph matching algorithm. We apply the proposed approach to a challenging public dataset of 33 object categories and achieve state-of-the-art performance.
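The step that distinguishes this work is generating varied template instances from a learned And-Or graph, where AND nodes compose all of their children and OR nodes choose one alternative. The toy sampler below illustrates only that compositional sampling idea; the grammar, symbols, and primitives are made up for illustration and are not the learned model from the paper.

```python
import random

# Minimal And-Or tree sampler: AND nodes compose all children, OR nodes pick
# one child, leaves are sketch primitives. Repeated sampling yields varied
# template instances. This grammar is a made-up toy, not a learned model.
grammar = {
    "object": ("AND", ["body", "top"]),
    "body":   ("OR",  ["rect_body", "round_body"]),
    "top":    ("OR",  ["flat_top", "peaked_top"]),
}

def sample(symbol):
    if symbol not in grammar:               # terminal sketch primitive
        return [symbol]
    node_type, children = grammar[symbol]
    if node_type == "AND":
        return [p for c in children for p in sample(c)]
    return sample(random.choice(children))  # OR node: pick one alternative

for _ in range(3):
    print(sample("object"))
```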


Computer Vision and Pattern Recognition | 2008

An integrated background model for video surveillance based on primal sketch and 3D scene geometry

Wenze Hu; Haifeng Gong; Song-Chun Zhu; Yongtian Wang

This paper presents a novel integrated background model for video surveillance. Our model uses a primal sketch representation for image appearance and 3D scene geometry to capture the ground plane and major surfaces in the scene. The primal sketch model divides the background image into three types of regions: flat, sketchable and textured. The three types of regions are modeled respectively by mixtures of Gaussians, image primitives and LBP histograms. We calibrate the camera and recover important planes such as the ground, horizontal surfaces, walls and stairs in the 3D scene, and use this geometric information to predict the sizes and locations of foreground blobs to further reduce false alarms. Compared with state-of-the-art background modeling methods, our approach is more effective, especially for indoor scenes where shadows, highlights and reflections of moving objects and camera exposure adjustment usually cause problems. Experimental results demonstrate that our approach improves the performance of background/foreground separation at the pixel level, and of the integrated video surveillance system at the object and trajectory levels.
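As a rough illustration of the pixel-level and geometric components, the sketch below uses OpenCV's standard per-pixel mixture-of-Gaussians subtractor in place of the paper's full primal-sketch model, and a hypothetical expected_blob_area function standing in for the size prediction that the calibrated 3D geometry would provide.

```python
import cv2
import numpy as np

# Standard per-pixel mixture-of-Gaussians background subtractor (OpenCV),
# standing in for the paper's richer primal-sketch background model.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

def expected_blob_area(row):
    # Hypothetical stand-in: a calibrated scene would predict the image-plane
    # size of an object whose feet sit at this image row.
    return 200 + 5.0 * row

def foreground_blobs(frame):
    mask = subtractor.apply(frame)
    # Drop shadow pixels (marked 127 by MOG2) and clean up speckle noise.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    blobs = []
    for i in range(1, num):                  # label 0 is the background
        x, y, w, h, area = stats[i]
        # Reject blobs far smaller than the geometry-predicted size
        # (shadows, highlights, noise) to reduce false alarms.
        if area > 0.3 * expected_blob_area(y + h):
            blobs.append((x, y, w, h))
    return blobs
```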


Computer Vision and Pattern Recognition | 2007

Layered Graph Match with Graph Editing

Liang Lin; Song-Chun Zhu; Yongtian Wang

Many vision tasks are posed as either graph partitioning (coloring) or graph matching (correspondence) problems. The former include segmentation and grouping, and the latter include wide-baseline stereo, large motion, object tracking and recognition. In this paper, we present an integrated solution for both graph matching and graph partitioning using an effective sampling algorithm in a Bayesian framework. Given two images for matching, we extract two graphs using a primal sketch algorithm [4]. The graph nodes are linelets and primitives (junctions). Both graphs are automatically partitioned into an unknown number of K + 1 layers of subgraphs so that K pairs of subgraphs are matched and the remaining layer contains unmatched background. Each matched pair represents a moving object with a TPS (thin-plate spline) transform to account for its deformations and a set of graph operators to edit the pair of subgraphs to achieve a perfect structural match. The matching energy between two subgraphs includes geometric deformations, appearance dissimilarities, and the cost of the graph editing operators. We demonstrate its application on two tasks: (i) large motion with occlusion, and (ii) automatic detection and recognition of common objects in a pair of images.
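The deformation of each matched subgraph pair is modeled by a thin-plate-spline transform fitted from node correspondences. The sketch below shows a generic 2D TPS fit using SciPy's RBFInterpolator with its thin-plate-spline kernel (SciPy 1.7 or later); it is not the authors' layered matching code, and the toy points are fabricated.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def fit_tps(src_pts, dst_pts, smoothing=0.0):
    """Fit a 2D thin-plate-spline warp from matched graph-node coordinates.

    src_pts, dst_pts : (N, 2) arrays of corresponding points.
    Returns a callable mapping (M, 2) source points to warped coordinates.
    """
    return RBFInterpolator(src_pts, dst_pts,
                           kernel='thin_plate_spline',
                           smoothing=smoothing)

# Toy usage: a known affine deformation is recovered exactly by the TPS.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(20, 2))
dst = src @ np.array([[1.0, 0.1], [-0.1, 1.0]]) + 5.0
tps = fit_tps(src, dst)
print(np.allclose(tps(src), dst, atol=1e-6))   # interpolating fit at the nodes
```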


Pattern Recognition | 2016

Convex hull indexed Gaussian mixture model (CH-GMM) for 3D point set registration

Jingfan Fan; Jian Yang; Danni Ai; Likun Xia; Yitian Zhao; Xing Gao; Yongtian Wang

To solve the problem of rigid/non-rigid 3D point set registration, a novel convex hull indexed Gaussian mixture model (CH-GMM) is proposed in this paper. The model works by computing a weighted Gaussian mixture model (GMM) response over the convex hull of each point set. Three conditions, proximity, area conservation and projection consistency, are incorporated into the model so as to improve its performance. Given that the convex hull is the tightest convex set of a point set, the combination of Gaussian mixture and convex hull can effectively preserve the topological structure of a point set. Furthermore, the computational complexity can be significantly reduced, since only the GMM of the convex hull (instead of the whole point set) needs to be calculated. Rigid registration is achieved by seeking the rigid transformation parameters yielding the most similar CH-GMM responses. Non-rigid deformation is realized by optimizing the coordinates of the control points used by the thin-plate spline model for interpolating the entire point set. Experiments are designed to evaluate the method's robustness to rotational changes between two point sets, positional noise, differences in density and partial overlap. The results demonstrate better robustness and registration accuracy of the CH-GMM-based method over state-of-the-art methods, including iterative closest point, coherent point drift and the GMM method. Moreover, the computation of CH-GMM is efficient.

A novel CH-GMM method is proposed for non-rigid registration of point sets. The CH-GMM can effectively represent the topological information of a point set. CH-GMM matching of point sets is very fast. The registration is very robust for point sets with large differences.
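A minimal sketch of the central idea, fitting a GMM only on the convex-hull vertices and scoring a candidate rigid transform by the target hull's GMM response at the transformed source hull, is given below. It omits the paper's weighting and its proximity, area-conservation and projection-consistency terms; SciPy and scikit-learn are assumed, and the toy data are made up.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.mixture import GaussianMixture

def hull_gmm(points, n_components=5, seed=0):
    # Fit a GMM only on the convex-hull vertices, which is far cheaper than
    # modelling the whole point set.
    verts = points[ConvexHull(points).vertices]
    gmm = GaussianMixture(n_components=min(n_components, len(verts)),
                          random_state=seed).fit(verts)
    return gmm, verts

def rigid_transform_score(src, dst, R, t):
    """Score a candidate rotation R / translation t by evaluating the target
    hull's GMM response at the transformed source-hull vertices."""
    gmm_dst, _ = hull_gmm(dst)
    src_hull = src[ConvexHull(src).vertices]
    moved = src_hull @ R.T + t
    return gmm_dst.score(moved)   # mean log-likelihood; higher is better

# Toy usage: identity transform on two noisy samplings of the same shape.
rng = np.random.default_rng(1)
pts = rng.normal(size=(200, 3))
print(rigid_transform_score(pts, pts + 0.01 * rng.normal(size=(200, 3)),
                            np.eye(3), np.zeros(3)))
```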


International Conference on Computer Vision | 2015

Deformable 3D Fusion: From Partial Dynamic 3D Observations to Complete 4D Models

Weipeng Xu; Mathieu Salzmann; Yongtian Wang; Yue Liu

Capturing the 3D motion of dynamic, non-rigid objects has attracted significant attention in computer vision. Existing methods typically require either complete 3D volumetric observations, or a shape template. In this paper, we introduce a template-less 4D reconstruction method that incrementally fuses highly incomplete 3D observations of a deforming object, and generates a complete, temporally-coherent shape representation of the object. To this end, we design an online algorithm that alternates between registering new observations to the current model estimate and updating the model. We demonstrate the effectiveness of our approach at reconstructing non-rigidly moving objects from highly incomplete measurements on both sequences of partial 3D point clouds and Kinect videos.
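The online algorithm alternates between registering the newest partial observation to the current model and updating the model with it. The sketch below conveys only that alternation: a plain rigid ICP step (nearest neighbours plus a Kabsch solve) stands in for the paper's non-rigid registration, and naive point accumulation stands in for its temporally-coherent model update; all names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def rigid_align(src, dst_tree, dst_pts, iters=10):
    # ICP-style rigid alignment of a partial observation to the current model.
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = src @ R.T + t
        _, idx = dst_tree.query(moved)            # nearest-neighbour matches
        corr = dst_pts[idx]
        # Kabsch solution for the best rigid transform moved -> corr.
        mu_s, mu_c = moved.mean(0), corr.mean(0)
        U, _, Vt = np.linalg.svd((moved - mu_s).T @ (corr - mu_c))
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:             # avoid reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_c - R_step @ mu_s
        R, t = R_step @ R, R_step @ t + t_step    # compose with previous estimate
    return R, t

def fuse(model, observation):
    # One online step: register the new observation, then update the model by
    # appending the registered points (a crude stand-in for the paper's
    # temporally-coherent shape update).
    tree = cKDTree(model)
    R, t = rigid_align(observation, tree, model)
    return np.vstack([model, observation @ R.T + t])
```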


Workshop on Applications of Computer Vision | 2012

Reconfigurable templates for robust vehicle detection and classification

Yang Lv; Benjamin Z. Yao; Yongtian Wang; Song-Chun Zhu

In this paper, we learn a reconfigurable template for detecting vehicles and classifying their types. We adopt a popular design for the part-based model that has one coarse template covering the entire object window and several small high-resolution templates representing parts. The reconfigurable template can learn part configurations that capture the spatial correlation of features for a deformable part-based model. The template features are histograms of oriented gradients (HOG). In order to better describe the actual dimensions and locations of “parts” (i.e. features with strong spatial correlations), we design a dictionary of rectangular primitives of various sizes, aspect ratios and positions. A configuration is defined as a subset of non-overlapping primitives from this dictionary. Learning the optimal configuration with an SVM amounts to finding the subset of parts that minimizes the regularized hinge loss, which leads to a non-convex optimization problem. We solve this problem by replacing the hinge loss with a negative sigmoid loss that can be approximately decomposed into losses (or negative sigmoid scores) of individual parts. In the experiments, we compare our method empirically with group lasso and a state-of-the-art method [7], and demonstrate that models learned with our method outperform the others on two computer vision applications: vehicle localization and vehicle model recognition.
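Because the negative sigmoid loss approximately decomposes into per-part scores, a configuration can be assembled part by part subject to the non-overlap constraint. The sketch below shows one such greedy selection over a dictionary of rectangular primitives; the scores are hypothetical stand-ins for learned SVM responses, and this is not the paper's actual optimization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def overlaps(a, b):
    # Axis-aligned rectangle overlap test; rectangles are (x, y, w, h).
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return not (ax + aw <= bx or bx + bw <= ax or
                ay + ah <= by or by + bh <= ay)

def select_configuration(primitives, part_scores):
    """Greedily pick a subset of non-overlapping rectangular primitives.

    primitives  : list of candidate rectangles from the dictionary
    part_scores : per-primitive responses (hypothetical values here); the
                  sigmoid makes them approximately decomposable, mirroring
                  the negative-sigmoid relaxation described in the paper.
    """
    order = np.argsort(-sigmoid(np.asarray(part_scores)))
    chosen = []
    for i in order:
        if all(not overlaps(primitives[i], primitives[j]) for j in chosen):
            chosen.append(i)
    return [primitives[i] for i in chosen]

# Toy usage: three candidate primitives, two of which overlap.
prims = [(0, 0, 4, 4), (2, 2, 4, 4), (5, 0, 3, 3)]
print(select_configuration(prims, part_scores=[1.2, 0.4, 2.0]))
```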


European Conference on Computer Vision | 2014

Nonrigid Surface Registration and Completion from RGBD Images

Weipeng Xu; Mathieu Salzmann; Yongtian Wang; Yue Liu

Nonrigid surface registration is a challenging problem that suffers from many ambiguities. Existing methods typically assume the availability of full volumetric data, or require a global model of the surface of interest. In this paper, we introduce an approach to nonrigid registration that operates on relatively low-quality RGBD images and does not assume prior knowledge of the global surface shape. To this end, we model the surface as a collection of patches, and infer the patch deformations by performing inference in a graphical model. Our representation lets us fill in the holes in the input depth maps, thus essentially achieving surface completion. Our experimental evaluation demonstrates the effectiveness of our approach on several sequences, as well as its robustness to missing data and occlusions.


International Conference on Computer Vision | 2011

Inferring social roles in long timespan video sequence

Jiangen Zhang; Wenze Hu; Benjamin Z. Yao; Yongtian Wang; Song-Chun Zhu

In this paper, we present a method for inferring the social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents and then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal relations between an agent's feet and hotspots in the scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by the composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With the probabilistic output of events, the roles of agents can be predicted under a Bayesian inference framework. Experiments are carried out on a challenging 8.5-hour video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles, including “manager”, “researcher”, “developer”, “engineer”, “staff”, “visitor” and “mailman”. Results show that our proposed method can predict the role of each agent with high precision.
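Given parsed events, role prediction reduces to Bayesian inference over an agent's event counts. The sketch below shows that final step with a multinomial likelihood; the roles, event types, probabilities and prior are illustrative values, not the distributions learned in the paper.

```python
import numpy as np

# Illustrative per-role event distributions (rows sum to 1); the paper learns
# these from parsed events, the values here are made up.
roles = ["manager", "researcher", "visitor"]
event_probs = np.array([
    [0.50, 0.30, 0.20],   # manager:   enter-office, use-printer, visit-lobby
    [0.20, 0.60, 0.20],   # researcher
    [0.10, 0.10, 0.80],   # visitor
])
prior = np.array([0.3, 0.4, 0.3])

def role_posterior(event_counts):
    # Multinomial likelihood of the observed event counts under each role,
    # combined with the prior via Bayes' rule (log-space for stability).
    log_post = np.log(prior) + event_counts @ np.log(event_probs).T
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

counts = np.array([1, 2, 9])          # observed event counts for one agent
print(dict(zip(roles, role_posterior(counts).round(3))))
```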


International Conference on Image Processing | 2013

Real-time keystone correction for hand-held projectors with an RGBD camera

Weipeng Xu; Yongtian Wang; Yue Liu; Dongdong Weng; Mengwen Tan; Mathieu Salzmann

This paper introduces a novel and simple approach to real-time continuous keystone correction for hand-held projectors. An RGBD camera is attached to the projector to form a projector-RGBD-camera system. The system is first calibrated in an offline stage. At run-time, we then estimate the relative pose between the projector and the screen using the RGBD camera, which lets us correct the keystone distortion by warping the projected image accordingly. Experimental results show that our method outperforms existing techniques in terms of both accuracy and efficiency.
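Once the projector-to-screen pose is known, correction comes down to pre-warping the framebuffer with a homography so that the projected image appears rectangular on the screen. The sketch below shows only that warping step with OpenCV; where the framebuffer corners land on the screen and the desired target rectangle are placeholders that the RGBD pose estimation (omitted here) would supply.

```python
import cv2
import numpy as np

def keystone_prewarp(image, projected_corners, target_rect_corners):
    """Pre-warp the projector framebuffer so the on-screen image is rectangular.

    projected_corners   : (4, 2) screen positions where the framebuffer corners
                          land, predicted from the estimated projector/screen
                          pose (placeholder input here).
    target_rect_corners : (4, 2) desired rectangular region on the screen,
                          chosen inside the projected quadrilateral.
    """
    h, w = image.shape[:2]
    fb = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    # Framebuffer -> screen homography induced by the projector pose.
    H_fb2screen = cv2.getPerspectiveTransform(fb, np.float32(projected_corners))
    # Image -> target screen rectangle.
    H_img2rect = cv2.getPerspectiveTransform(fb, np.float32(target_rect_corners))
    # Pre-warp: place the image at framebuffer locations that project onto the
    # target rectangle, cancelling the keystone distortion.
    M = np.linalg.inv(H_fb2screen) @ H_img2rect
    return cv2.warpPerspective(image, M, (w, h))
```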

Collaboration


Dive into Yongtian Wang's collaborations.

Top Co-Authors

Yue Liu (Beijing Institute of Technology)
Song-Chun Zhu (University of California)
Jiangen Zhang (Beijing Institute of Technology)
Liang Lin (Beijing Institute of Technology)
Mathieu Salzmann (École Polytechnique Fédérale de Lausanne)
Dongdong Weng (Beijing Institute of Technology)
Shaowu Peng (Huazhong University of Science and Technology)
Haifeng Gong (University of California)