James M. Rehg

Georgia Institute of Technology

Publication

Featured research published by James M. Rehg.

computer vision and pattern recognition | 1999

Statistical color models with application to skin detection

Michael Jones; James M. Rehg

The existence of large image datasets such as the set of photos on the World Wide Web makes it possible to build powerful generic models for low-level image attributes like color using simple histogram learning techniques. We describe the construction of color models for skin and non-skin classes from a dataset of nearly 1 billion labelled pixels. These classes exhibit a surprising degree of separability, which we exploit by building a skin pixel detector achieving a detection rate of 80% with 8.5% false positives. We compare the performance of histogram and mixture models in skin detection and find histogram models to be superior in accuracy and computational cost. Using aggregate features computed from the skin pixel detector, we build a surprisingly effective detector for naked people. Our results suggest that color can be a more powerful cue for detecting people in unconstrained imagery than was previously suspected. We believe this work is the most comprehensive and detailed exploration of skin color models to date.
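A minimal sketch of a histogram-based skin pixel detector is given here for illustration. It assumes the detector thresholds the ratio of two class-conditional color histograms; the bin count, the threshold, the function names, and the toy pixels are all hypothetical choices and are not taken from the paper.

```python
from collections import Counter

BINS = 32  # assumed number of bins per color channel (illustrative)


def bin_index(rgb, bins=BINS):
    """Map an 8-bit (r, g, b) pixel to a coarse histogram bin."""
    step = 256 // bins
    return tuple(channel // step for channel in rgb)


def build_histogram(pixels, bins=BINS):
    """Count how many labelled pixels fall into each color bin."""
    return Counter(bin_index(pixel, bins) for pixel in pixels)


def is_skin(rgb, skin_hist, nonskin_hist, threshold=1.0, bins=BINS):
    """Label a pixel as skin when P(rgb | skin) / P(rgb | non-skin) >= threshold."""
    skin_total = sum(skin_hist.values())
    nonskin_total = sum(nonskin_hist.values())
    if skin_total == 0 or nonskin_total == 0:
        raise ValueError("both histograms need at least one labelled pixel")
    key = bin_index(rgb, bins)
    p_skin = skin_hist[key] / skin_total
    p_nonskin = nonskin_hist[key] / nonskin_total
    if p_nonskin == 0.0:
        # Color never seen in the non-skin class: accept only if seen as skin.
        return p_skin > 0.0
    return p_skin / p_nonskin >= threshold


# Toy labelled pixels (made up for this example).
SKIN_PIXELS = [(220, 170, 140), (225, 175, 150), (210, 160, 130)]
NONSKIN_PIXELS = [(30, 90, 200), (20, 200, 40), (220, 170, 140)]

skin_hist = build_histogram(SKIN_PIXELS)
nonskin_hist = build_histogram(NONSKIN_PIXELS)
print(is_skin((226, 174, 148), skin_hist, nonskin_hist))
```

Raising the threshold trades detection rate for fewer false positives, which is the kind of operating point the abstract reports.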

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011

CENTRIST: A Visual Descriptor for Scene Categorization

Jianxin Wu; James M. Rehg

CENsus TRansform hISTogram (CENTRIST), a new visual descriptor for recognizing topological places or scene categories, is introduced in this paper. We show that place and scene recognition, especially for indoor environments, requires a visual descriptor with properties that differ from those needed in other vision domains (e.g., object recognition). CENTRIST satisfies these properties and suits the place and scene recognition task. It is a holistic representation and has strong generalizability for category recognition. CENTRIST mainly encodes the structural properties within an image and suppresses detailed textural information. Our experiments demonstrate that CENTRIST outperforms the current state of the art on several place and scene recognition data sets, compared with other descriptors such as SIFT and Gist. It is also easy to implement and extremely fast to evaluate.
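As a rough illustration of the idea behind the name, the sketch below computes a census transform over a grayscale image and histograms the resulting 8-bit codes. The neighbor ordering, the "center >= neighbor" comparison, and the function names are assumptions made for this example; the published descriptor includes further steps that are not reproduced here.

```python
def census_transform(image):
    """Return the 8-bit census transform code of every interior pixel.

    `image` is a list of equal-length rows of intensities. Each code packs
    eight comparisons of the center pixel against its neighbors, scanned
    left to right and top to bottom (an assumed ordering).
    """
    height, width = len(image), len(image[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            code = 0
            for dy, dx in offsets:
                bit = 1 if image[y][x] >= image[y + dy][x + dx] else 0
                code = (code << 1) | bit
            row.append(code)
        codes.append(row)
    return codes


def census_histogram(image):
    """Histogram of census transform codes: a 256-bin holistic descriptor."""
    histogram = [0] * 256
    for row in census_transform(image):
        for code in row:
            histogram[code] += 1
    return histogram
```

Because each code depends only on the ordering of neighboring intensities, adding a constant to every pixel leaves the histogram unchanged, which is one way such a descriptor keeps local structure while discarding absolute brightness.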

international conference on computer vision | 1995

Model-based tracking of self-occluding articulated objects

James M. Rehg; Takeo Kanade

Computer sensing of hand and limb motion is an important problem for applications in human-computer interaction and computer graphics. We describe a framework for local tracking of self-occluding motion, in which one part of an object obstructs the visibility of another. Our approach uses a kinematic model to predict occlusions and windowed templates to track partially occluded objects. We present offline 3D tracking results for hand motion with significant self-occlusion.

european conference on computer vision | 1994

Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking

James M. Rehg; Takeo Kanade

Passive sensing of human hand and limb motion is important for a wide range of applications from human-computer interaction to athletic performance measurement. High degree of freedom articulated mechanisms like the human hand are difficult to track because of their large state space and complex image appearance. This article describes a model-based hand tracking system, called DigitEyes, that can recover the state of a 27 DOF hand model from ordinary gray scale images at speeds of up to 10 Hz.

computer vision and pattern recognition | 1999

A multiple hypothesis approach to figure tracking

Tat-Jen Cham; James M. Rehg

This paper describes a probabilistic multiple-hypothesis framework for tracking highly articulated objects. In this framework, the probability density of the tracker state is represented as a set of modes with piecewise Gaussians characterizing the neighborhood around these modes. The temporal evolution of the probability density is achieved through sampling from the prior distribution, followed by local optimization of the sample positions to obtain updated modes. Unlike classical multiple-hypothesis tracking, this method of generating hypotheses from state-space search does not require the use of discrete features. The parametric form of the model is suited for high-dimensional state spaces which cannot be efficiently modeled using non-parametric approaches. Results are shown for tracking Fred Astaire in a movie dance sequence.

computer vision and pattern recognition | 2014

The Secrets of Salient Object Segmentation

Yin Li; Xiaodi Hou; Christof Koch; James M. Rehg; Alan L. Yuille

In this paper we provide an extensive evaluation of fixation prediction and salient object segmentation algorithms as well as statistics of major datasets. Our analysis identifies a serious design flaw in existing salient object benchmarks, which we call dataset design bias: an overemphasis on stereotypical concepts of saliency. Dataset design bias not only creates a discomforting disconnect between fixations and salient object segmentation, but also misleads algorithm design. Based on our analysis, we propose a new high-quality dataset that offers both fixation and salient object segmentation ground truth. With fixations and salient objects presented simultaneously, we are able to bridge the gap between fixations and salient objects, and propose a novel method for salient object segmentation. Finally, we report significant benchmark progress on 3 existing datasets for segmenting salient objects.

international conference on computer vision | 2007

A Scalable Approach to Activity Recognition based on Object Use

Jianxin Wu; Adebola Osuntogun; Tanzeem Choudhury; Matthai Philipose; James M. Rehg

We propose an approach to activity recognition based on detecting and analyzing the sequence of objects that are being manipulated by the user. In domains such as cooking, where many activities involve similar actions, object-use information can be a valuable cue. In order for this approach to scale to many activities and objects, however, it is necessary to minimize the amount of human-labeled data that is required for modeling. We describe a method for automatically acquiring object models from video without any explicit human supervision. Our approach leverages sparse and noisy readings from RFID tagged objects, along with common-sense knowledge about which objects are likely to be used during a given activity, to bootstrap the learning process. We present a dynamic Bayesian network model which combines RFID and video data to jointly infer the most likely activity and object labels. We demonstrate that our approach can achieve activity recognition rates of more than 80% on a real-world dataset consisting of 16 household activities involving 33 objects with significant background clutter. We show that the combination of visual object recognition with RFID data is significantly more effective than the RFID sensor alone. Our work demonstrates that it is possible to automatically learn object models from video of household activities and employ these models for activity recognition, without requiring any explicit human labeling.

international conference on computer vision | 2009

Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel

Jianxin Wu; James M. Rehg

Common visual codebook generation methods used in a Bag of Visual words model, e.g. k-means or Gaussian Mixture Model, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks with histogram features. In this paper, we demonstrate that HIK can also be used in an unsupervised manner to significantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as k-means. The HIK codebook has consistently higher recognition accuracy over k-means codebooks by 2–4%. In addition, we propose a one-class SVM formulation to create more effective visual code words which can achieve even higher accuracy. The proposed method has established new state-of-the-art performance numbers for 3 popular benchmark datasets on object and scene recognition. In addition, we show that the standard k-median clustering method can be used for visual codebook generation and can act as a compromise between HIK and k-means approaches.
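For reference, the Histogram Intersection Kernel sums the bin-wise minimum of two histograms. The sketch below shows the kernel together with the squared distance it induces in kernel feature space, K(x, x) + K(y, y) - 2 K(x, y), which is the quantity a kernel clustering method would compare instead of the Euclidean distance. The function names are illustrative, and this is not the paper's kernel k-means algorithm.

```python
def histogram_intersection(h1, h2):
    """Histogram Intersection Kernel: the sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def hik_squared_distance(h1, h2):
    """Squared distance induced by the kernel: K(x,x) + K(y,y) - 2 K(x,y)."""
    return (histogram_intersection(h1, h1)
            + histogram_intersection(h2, h2)
            - 2 * histogram_intersection(h1, h2))


print(histogram_intersection([1, 2, 3], [3, 2, 1]))  # 1 + 2 + 1 = 4
```

For non-negative histograms, K(h, h) is simply the sum of the bins, so the induced distance is zero for identical histograms and grows as their overlap shrinks.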

international conference on computer vision | 1999

A dynamic Bayesian network approach to figure tracking using learned dynamic models

Vladimir Pavlovic; James M. Rehg; Tat-Jen Cham; Kevin P. Murphy

The human figure exhibits complex and rich dynamic behavior that is both nonlinear and time-varying. However, most work on tracking and synthesizing figure motion has employed either simple, generic dynamic models or highly specific hand-tailored ones. Recently, a broad class of learning and inference algorithms for time-series models has been successfully cast in the framework of dynamic Bayesian networks (DBNs). This paper describes a novel DBN-based switching linear dynamic system (SLDS) model and presents its application to figure motion analysis. A key feature of our approach is an approximate Viterbi inference technique for overcoming the intractability of exact inference in mixed-state DBNs. We present experimental results for learning figure dynamics from video data and show promising initial results for tracking, interpolation, synthesis, and classification using learned models.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Fast Asymmetric Learning for Cascade Face Detection

Jianxin Wu; S.C. Brubaker; Matthew D. Mullin; James M. Rehg

A cascade face detector uses a sequence of node classifiers to distinguish faces from nonfaces. This paper presents a new approach to designing node classifiers in the cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we have the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. There are three contributions in this paper. The first is a categorization of asymmetries in the learning goal and why they make face detection hard. The second is the forward feature selection (FFS) algorithm and a fast precomputing strategy for AdaBoost. FFS and the fast AdaBoost can reduce the training time by approximately 100 and 50 times, respectively, in comparison to a naive implementation of the AdaBoost feature selection method. The last contribution is a linear asymmetric classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrate experimentally that LAC results in improved ensemble classifier performance.
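The cascade structure described in the first sentence can be sketched as a chain of node classifiers in which a window counts as a face only if every node accepts it. The node functions and thresholds below are hypothetical stand-ins chosen for this example; they are not the FFS or LAC classifiers from the paper.

```python
def cascade_accepts(window, nodes):
    """Return True only if every node classifier accepts the window.

    `nodes` is a sequence of (score_function, threshold) pairs, evaluated in
    order. The first node that scores below its threshold rejects the window,
    so later nodes never run on it.
    """
    for score, threshold in nodes:
        if score(window) < threshold:
            return False
    return True


# Hypothetical node classifiers over a window given as a list of intensities.
def mean_intensity(window):
    return sum(window) / len(window)


def contrast(window):
    return max(window) - min(window)


TOY_NODES = [(mean_intensity, 50.0), (contrast, 30.0)]

print(cascade_accepts([40, 100, 90], TOY_NODES))
```

Since the vast majority of candidate windows are nonfaces, rejecting them at an early node is what keeps a cascade cheap, and it is also why each node must keep a very high detection rate while tolerating a moderate false positive rate.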
