
Publications


Featured research published by Leonid Sigal.


European Conference on Computer Vision | 2002

Implicit Probabilistic Models of Human Motion for Synthesis and Tracking

Hedvig Sidenbladh; Michael J. Black; Leonid Sigal

This paper addresses the problem of probabilistically modeling 3D human motion for synthesis and tracking. Given the high-dimensional nature of human motion, learning an explicit probabilistic model from available training data is currently impractical. Instead, we exploit methods from texture synthesis that treat images as representing an implicit empirical distribution. These methods replace the problem of representing the probability of a texture pattern with that of searching the training data for similar instances of that pattern. We extend this idea to temporal data representing 3D human motion with a large database of example motions. To make the method useful in practice, we must address the problem of efficient search in a large training set; efficiency is particularly important for tracking. Towards that end, we learn a low-dimensional linear model of human motion that is used to structure the example motion database into a binary tree. An approximate probabilistic tree search method exploits the coefficients of this low-dimensional representation and runs in sub-linear time. This probabilistic tree search returns a sampled human motion with probability approximating the true distribution of human motions in the database. This sampling method is suitable for use with particle filtering techniques and is applied to articulated 3D tracking of humans within a Bayesian framework. Successful tracking results are presented, along with examples of synthesizing human motion using the model.
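
The core mechanism can be sketched in a few lines. The following is an illustrative toy version, not the authors' code: motion snippets are projected onto a low-dimensional linear (PCA) basis, indexed with a tree (a kd-tree stands in for the paper's binary tree), and a near-neighbor is sampled with probability proportional to its similarity, which approximates drawing from the empirical motion distribution in sub-linear time.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Toy database: 10,000 motion snippets, each 5 frames x 30 joint angles.
snippets = rng.standard_normal((10_000, 5 * 30))

# Low-dimensional linear model via PCA (plain SVD here).
mean = snippets.mean(axis=0)
_, _, vt = np.linalg.svd(snippets - mean, full_matrices=False)
basis = vt[:10]                       # keep 10 principal directions
coeffs = (snippets - mean) @ basis.T  # the tree indexes these coefficients

tree = cKDTree(coeffs)                # stand-in for the paper's binary tree

def sample_next_motion(query_snippet, k=20, sigma=1.0):
    """Sample a database snippet near the query, with probability roughly
    proportional to similarity; approximates the empirical distribution."""
    q = (query_snippet - mean) @ basis.T
    dists, idx = tree.query(q, k=k)
    weights = np.exp(-0.5 * (dists / sigma) ** 2)
    weights /= weights.sum()
    return snippets[rng.choice(idx, p=weights)]

# Usage: propose a continuation for one particle of a particle filter.
proposal = sample_next_motion(snippets[42])
print(proposal.shape)  # (150,)
```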


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004

Skin color-based video segmentation under time-varying illumination

Leonid Sigal; Stan Sclaroff; Vassilis Athitsos

A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling, and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using maximum likelihood estimation and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24 percent is observed in 17 out of 21 test sequences. In all but one case, the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.
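
As a rough illustration of the dynamic model (a simplification, not the authors' code), the sketch below predicts how the skin-color histogram translates in color space with a second-order (constant-velocity) model, warps the histogram by resampling, and blends in feedback from the current segmentation. The paper additionally models scaling and rotation of the distribution; only translation is shown here.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

class DynamicSkinModel:
    def __init__(self, init_hist, alpha=0.5):
        self.hist = init_hist / init_hist.sum()  # hue-saturation histogram
        self.prev_offset = np.zeros(2)           # translation at t-1
        self.offset_vel = np.zeros(2)            # offset "velocity"
        self.alpha = alpha                       # feedback blend weight

    def predict(self):
        # Second-order prediction: offset_t = offset_{t-1} + velocity.
        offset = self.prev_offset + self.offset_vel
        # Warp the histogram by the predicted translation (resampling).
        self.hist = nd_shift(self.hist, offset, order=1, mode="constant")
        self.hist /= max(self.hist.sum(), 1e-12)
        return offset

    def update(self, observed_hist, offset):
        # Blend the prediction with the histogram of pixels labelled skin
        # in the current frame (the feedback loop described above).
        observed_hist = observed_hist / max(observed_hist.sum(), 1e-12)
        self.hist = (1 - self.alpha) * self.hist + self.alpha * observed_hist
        self.offset_vel = offset - self.prev_offset
        self.prev_offset = offset

# Usage with toy 32x32 hue-saturation histograms:
rng = np.random.default_rng(1)
model = DynamicSkinModel(rng.random((32, 32)))
model.predict()
model.update(rng.random((32, 32)), offset=np.array([0.3, -0.1]))
```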


Computer Vision and Pattern Recognition | 2006

Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation

Leonid Sigal; Michael J. Black

Part-based tree-structured models have been widely used for 2D articulated human pose estimation. These approaches admit efficient inference algorithms while capturing the important kinematic constraints of the human body as a graphical model. These methods often fail, however, when multiple body parts fit the same image region, resulting in global pose estimates that poorly explain the overall image evidence. Attempts to solve this problem have focused on the use of strong prior models that are limited to learned activities such as walking. We argue that the problem actually lies with the image observations and not with the prior. In particular, image evidence for each body part is estimated independently of other parts, without regard to self-occlusion. To address this, we introduce occlusion-sensitive local likelihoods that approximate the global image likelihood using per-pixel hidden binary variables that encode the occlusion relationships between parts. This occlusion reasoning introduces interactions between non-adjacent body parts, creating loops in the underlying graphical model. We deal with this using an extension of an approximate belief propagation algorithm (PAMPAS). The algorithm recovers the real-valued 2D pose of the body in the presence of occlusions, does not require strong priors over body pose, and does a quantitatively better job of explaining image evidence than previous methods.
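
A minimal sketch of an occlusion-sensitive local likelihood, assuming a much-simplified setting (per-pixel visibility probabilities instead of full belief propagation over the hidden binary variables; all names here are hypothetical):

```python
import numpy as np

def occlusion_sensitive_loglik(part_mask, pixel_evidence, occluder_probs):
    """
    part_mask      : (H, W) bool, pixels hypothesized to belong to the part
    pixel_evidence : (H, W) per-pixel log-likelihood ratio (edge/color/etc.)
    occluder_probs : list of (H, W) arrays, each the probability that some
                     other part in front covers the pixel
    """
    visible = np.ones_like(pixel_evidence)
    for p_occ in occluder_probs:
        visible *= (1.0 - p_occ)  # pixel visible iff no occluder hits it
    # Expected log-likelihood under the per-pixel visibility variables.
    return float(np.sum(part_mask * visible * pixel_evidence))

# Usage with toy data: a lower arm partially covered by the torso.
H, W = 64, 64
rng = np.random.default_rng(2)
arm = np.zeros((H, W), bool); arm[20:40, 10:20] = True
torso_prob = np.zeros((H, W)); torso_prob[20:40, 15:30] = 0.9
evidence = rng.standard_normal((H, W))
print(occlusion_sensitive_loglik(arm, evidence, [torso_prob]))
```

The occluded pixels contribute little to the part's score, so two parts competing for the same image region no longer both claim full credit for it.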


Computer Vision and Pattern Recognition | 2007

Detailed Human Shape and Pose from Images

Alexandru O. Balan; Leonid Sigal; Michael J. Black; James Davis; Horst W. Haussecker

Much of the research on video-based human motion capture assumes the body shape is known a priori and is represented coarsely (e.g. using cylinders or superquadrics to model limbs). These body models stand in sharp contrast to the richly detailed 3D body models used by the graphics community. Here we propose a method for recovering such models directly from images. Specifically, we represent the body using a recently proposed triangulated mesh model called SCAPE which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations that is learned from a database of range scans of human bodies. Previous work showed that the parameters of the SCAPE model could be estimated from marker-based motion capture data. Here we go further to estimate the parameters directly from image data. We define a cost function between image observations and a hypothesized mesh and formulate the problem as optimization over the body shape and pose parameters using stochastic search. Our results show that such rich generative models enable the automatic recovery of detailed human shape and pose from images.
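
The estimation loop can be sketched as follows, with a stub rectangle renderer standing in for the SCAPE mesh; this illustrates the cost-plus-stochastic-search structure only, not the authors' implementation.

```python
import numpy as np

def silhouette_cost(rendered, observed):
    """Symmetric overlap cost: pixels where the two silhouettes disagree."""
    return float(np.logical_xor(rendered, observed).sum())

def render_silhouette(params, shape=(64, 64)):
    """Stub standing in for rendering the body mesh under `params`.
    Here: an axis-aligned rectangle controlled by 4 parameters."""
    y0, x0, h, w = np.abs(params).astype(int) % 32
    sil = np.zeros(shape, bool)
    sil[y0:y0 + h + 1, x0:x0 + w + 1] = True
    return sil

def stochastic_search(observed, init, iters=2000, seed=3):
    rng = np.random.default_rng(seed)
    best = init.copy()
    best_cost = silhouette_cost(render_silhouette(best), observed)
    for t in range(iters):
        temp = 4.0 * (1.0 - t / iters) + 0.5  # annealed step size
        cand = best + rng.normal(0, temp, size=best.shape)
        cost = silhouette_cost(render_silhouette(cand), observed)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

observed = render_silhouette(np.array([10.0, 12.0, 20.0, 8.0]))
params, cost = stochastic_search(observed, init=np.zeros(4))
print(cost)  # shrinks toward 0 as the hypothesis matches the observation
```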


International Conference on Computer Vision | 2001

3D hand pose reconstruction using specialized mappings

Rómer Rosales; Vassilis Athitsos; Leonid Sigal; Stan Sclaroff

A system for recovering 3D hand pose from monocular color sequences is proposed. The system employs a non-linear supervised learning framework, the specialized mappings architecture (SMA), to map image features to likely 3D hand poses. The SMA's fundamental components are a set of specialized forward mapping functions and a single feedback matching function. The forward functions are estimated directly from training data, which in our case are examples of hand joint configurations and their corresponding visual features. The joint angle data in the training set is obtained via a CyberGlove, a glove with 22 sensors that monitor the angular motions of the palm and fingers. In training, the visual features are generated using a computer graphics module that renders the hand from arbitrary viewpoints given the 22 joint angles. The viewpoint is encoded by two real values; therefore, 24 real values represent a hand pose. We test our system both on synthetic sequences and on sequences taken with a color camera. The system automatically detects and tracks both hands of the user, calculates the appropriate features, and estimates the 3D hand joint angles and viewpoint from those features. Results are encouraging given the complexity of the task.
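
A minimal sketch of the specialized mappings idea, with random linear maps standing in for trained forward functions and a linear stub in place of the graphics-based feedback function (all of this is illustrative, not the SMA code): each specialized forward regressor proposes a pose, and the feedback match picks the proposal whose re-generated features best explain the input.

```python
import numpy as np

rng = np.random.default_rng(4)

class SpecializedMappings:
    def __init__(self, n_experts, feat_dim, pose_dim):
        # Forward functions: random linear maps standing in for regressors
        # trained on different regions of pose space.
        self.forward = [rng.standard_normal((pose_dim, feat_dim)) * 0.1
                        for _ in range(n_experts)]
        # Feedback function: maps a pose back to expected features
        # (a stub for the graphics renderer used in the paper).
        self.feedback = rng.standard_normal((feat_dim, pose_dim)) * 0.1

    def estimate(self, features):
        best_pose, best_err = None, np.inf
        for W in self.forward:
            pose = W @ features                 # specialized forward map
            regenerated = self.feedback @ pose  # pose -> expected features
            err = np.linalg.norm(regenerated - features)
            if err < best_err:
                best_pose, best_err = pose, err
        return best_pose, best_err

# Usage: 24-D pose (22 joint angles + 2 viewpoint values), toy features.
sma = SpecializedMappings(n_experts=8, feat_dim=50, pose_dim=24)
pose, err = sma.estimate(rng.standard_normal(50))
print(pose.shape, round(err, 2))
```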


IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS) | 2005

A Quantitative Evaluation of Video-based 3D Person Tracking

Alexandru O. Balan; Leonid Sigal; Michael J. Black

The Bayesian estimation of 3D human motion from video sequences is quantitatively evaluated using synchronized, multi-camera, calibrated video and 3D ground-truth poses acquired with a commercial motion capture system. While many methods for human pose estimation and tracking have been proposed, to date there has been no quantitative comparison. Our goal is to evaluate how different design choices influence tracking performance. Toward that end, we independently implemented two fairly standard Bayesian person trackers using two variants of particle filtering and propose an evaluation measure appropriate for assessing the quality of probabilistic tracking methods. In the Bayesian framework we compare various image likelihood functions and prior models of human motion that have been proposed in the literature. Our results suggest that in constrained laboratory environments, current methods perform quite well. Multiple cameras and background subtraction, however, are required to achieve reliable tracking, suggesting that many current methods may be inappropriate in more natural settings. We discuss the implications of the study and the directions for future research that it entails.
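
An evaluation measure in this spirit can be sketched as the expected distance between corresponding virtual markers under the particle weights. This is a paraphrase of the idea, not the released benchmark code; marker count and units are toy choices.

```python
import numpy as np

def marker_error(est_markers, gt_markers):
    """Mean distance over M corresponding 3D markers, shapes (M, 3)."""
    return float(np.linalg.norm(est_markers - gt_markers, axis=1).mean())

def expected_tracking_error(particles, weights, gt_markers):
    """Expected marker error of a weighted particle set, (N, M, 3)."""
    errs = np.array([marker_error(p, gt_markers) for p in particles])
    weights = np.asarray(weights) / np.sum(weights)
    return float(np.dot(weights, errs))

# Usage with toy data: 100 particles, 15 virtual markers each.
rng = np.random.default_rng(5)
gt = rng.standard_normal((15, 3))
particles = gt + 0.05 * rng.standard_normal((100, 15, 3))
weights = rng.random(100)
print(expected_tracking_error(particles, weights, gt))  # small, in toy units
```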


Computer Vision and Pattern Recognition | 2013

Poselet Key-Framing: A Model for Human Activity Recognition

Michalis Raptis; Leonid Sigal

In this paper, we develop a new model for recognizing human actions. An action is modeled as a very sparse sequence of temporally local discriminative key frames - collections of partial key poses of the actor(s), depicting key states in the action sequence. We cast the learning of key frames in a max-margin discriminative framework, where we treat key frames as latent variables. This allows us to (jointly) learn a set of most discriminative key frames while also learning the local temporal context between them. Key frames are encoded using a spatially-localizable poselet-like representation with HoG and BoW components learned from weak annotations; we rely on a structured SVM formulation to align our components and mine for hard negatives to boost localization performance. This results in a model that supports spatio-temporal localization and is insensitive to dropped frames or partial observations. We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an online streaming setting.
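
Inference under a latent key-frame model of this general shape reduces to finding the best temporally ordered placement of K key frames, which a small dynamic program solves. The sketch below is an illustrative reduction, not the authors' structured-SVM implementation.

```python
import numpy as np

def best_keyframe_score(frame_scores):
    """
    frame_scores : (K, T) array, score of placing key pose k at frame t.
    Returns the max total score over ordered assignments t_1 < ... < t_K.
    """
    K, T = frame_scores.shape
    dp = np.full((K, T), -np.inf)
    dp[0] = frame_scores[0]
    for k in range(1, K):
        # Best score using key poses 0..k-1 that ends strictly before t.
        best_prev = np.maximum.accumulate(dp[k - 1])
        dp[k, 1:] = frame_scores[k, 1:] + best_prev[:-1]
    return float(dp[-1].max())

# Usage: 3 key poses over a 50-frame clip, scored per action class; the
# class with the highest latent-completed score wins.
rng = np.random.default_rng(6)
scores = {c: rng.standard_normal((3, 50)) for c in ("handshake", "push")}
print(max(scores, key=lambda c: best_keyframe_score(scores[c])))
```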


International Conference on Computer Graphics and Interactive Techniques | 2011

Motion capture from body-mounted cameras

Takaaki Shiratori; Hyun Soo Park; Leonid Sigal; Yaser Sheikh; Jessica K. Hodgins

Motion capture technology generally requires that recordings be performed in a laboratory or closed stage setting with controlled lighting. This restriction precludes the capture of motions that require an outdoor setting or the traversal of large areas. In this paper, we present the theory and practice of using body-mounted cameras to reconstruct the motion of a subject. Outward-looking cameras are attached to the limbs of the subject, and the joint angles and root pose are estimated through non-linear optimization. The optimization objective function incorporates terms for image matching error and temporal continuity of motion. Structure-from-motion is used to estimate the skeleton structure and to provide initialization for the non-linear optimization procedure. Global motion is estimated and drift is controlled by matching the captured set of videos to reference imagery. We show results in settings where capture would be difficult or impossible with traditional motion capture systems, including walking outside and swinging on monkey bars. The quality of the motion reconstruction is evaluated by comparing our results against motion capture data produced by a commercially available optical system.
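
The optimization structure described here can be sketched with a stub image-matching term (the real term compares image features across the body-mounted cameras, which is omitted): the unknowns are per-frame joint angles, and the residual stacks a matching error with a temporal-continuity penalty.

```python
import numpy as np
from scipy.optimize import least_squares

T, D = 20, 8                            # frames, joint-angle dim (toy sizes)
rng = np.random.default_rng(7)
observed = rng.standard_normal((T, D))  # stand-in for image-derived targets
lam = 2.0                               # weight of the smoothness term

def residuals(x):
    q = x.reshape(T, D)
    image_term = (q - observed).ravel()  # stub image matching error
    smooth_term = np.sqrt(lam) * np.diff(q, axis=0).ravel()  # continuity
    return np.concatenate([image_term, smooth_term])

sol = least_squares(residuals, x0=np.zeros(T * D))
motion = sol.x.reshape(T, D)  # smoothed joint-angle trajectory
print(round(sol.cost, 2))
```

Structure-from-motion would supply the initialization `x0` in the paper's pipeline; here a zero start suffices for the toy problem.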


Computer Vision and Pattern Recognition | 2012

Social roles in hierarchical models for human activity recognition

Tian Lan; Leonid Sigal; Greg Mori

We present a hierarchical model for human activity recognition in entire multi-person scenes. Our model describes human behaviour at multiple levels of detail, ranging from low-level actions through to high-level events. We also include a model of social roles, the expected behaviours of certain people, or groups of people, in a scene. The hierarchical model includes these varied representations, and various forms of interactions between people present in a scene. The model is trained in a discriminative max-margin framework. Experimental results demonstrate that this model can improve performance at all considered levels of detail, on two challenging datasets.
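
One way to picture the model is as a scoring function that sums potentials across levels. The toy version below uses random weights and hypothetical potential names; in the paper these weights are trained in a max-margin framework.

```python
import numpy as np

def scene_score(action_scores, role_compat, pair_compat, event_prior,
                actions, roles, event):
    """Sum per-person action scores, role-action compatibilities,
    pairwise role interactions, and a scene-level event score."""
    n = len(actions)
    s = sum(action_scores[p][actions[p]] for p in range(n))
    s += sum(role_compat[roles[p], actions[p]] for p in range(n))
    s += sum(pair_compat[roles[p], roles[q]]
             for p in range(n) for q in range(p + 1, n))
    return s + event_prior[event]

# Usage: 3 people, 4 actions, 2 roles, 2 events (toy numbers).
rng = np.random.default_rng(8)
score = scene_score(
    action_scores=rng.standard_normal((3, 4)),
    role_compat=rng.standard_normal((2, 4)),
    pair_compat=rng.standard_normal((2, 2)),
    event_prior=rng.standard_normal(2),
    actions=[0, 2, 1], roles=[0, 1, 1], event=1,
)
print(round(score, 2))
```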


Computer Vision and Pattern Recognition | 2008

Physical simulation for probabilistic motion tracking

Marek Vondrak; Leonid Sigal; Odest Chadwicke Jenkins

Human motion tracking is an important problem in computer vision. Most prior approaches have concentrated on efficient inference algorithms and prior motion models; however, few can explicitly account for physical plausibility of recovered motion. The primary purpose of this work is to enforce physical plausibility in the tracking of a single articulated human subject. Towards this end, we propose a full-body 3D physical simulation-based prior that explicitly incorporates motion control and dynamics into the Bayesian filtering framework. We consider the human's motion to be generated by a "control loop". In this control loop, Newtonian physics approximates the rigid-body motion dynamics of the human and the environment through the application and integration of forces. Collisions generate interaction forces to prevent physically impossible hypotheses. This allows us to properly model human motion dynamics, ground contact and environment interactions. For efficient inference in the resulting high-dimensional state space, we introduce an exemplar-based control strategy to reduce the effective search space. As a result, we are able to recover the physically plausible kinematic and dynamic state of the body from monocular and multi-view imagery. We show, both quantitatively and qualitatively, that our approach performs favorably with respect to standard Bayesian filtering methods.
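
The "physics as a motion prior" idea can be sketched inside a particle filter, with a toy 2D point mass standing in for the full rigid-body simulation and exemplar-based controller (illustrative assumptions throughout): each particle is propagated by simulating controlled dynamics, then re-weighted by an image likelihood.

```python
import numpy as np

rng = np.random.default_rng(9)
GRAVITY = np.array([0.0, -9.8])
DT = 1.0 / 30.0

def simulate_step(state, control_force):
    """Toy Newtonian step for a 2D point mass: state = (pos, vel)."""
    pos, vel = state
    acc = GRAVITY + control_force  # controller force plus gravity
    vel = vel + DT * acc
    pos = pos + DT * vel
    if pos[1] < 0.0:               # crude ground-contact handling
        pos = pos.copy(); vel = vel.copy()
        pos[1], vel[1] = 0.0, 0.0
    return (pos, vel)

def pf_step(particles, weights, observation, noise=0.5):
    new_particles = []
    for pos, vel in particles:
        # Exemplar-style control: a noisy force drawn around "hold pose".
        force = -GRAVITY + rng.normal(0, noise, size=2)
        new_particles.append(simulate_step((pos, vel), force))
    # Re-weight by a simple Gaussian image likelihood on position.
    errs = np.array([np.linalg.norm(p[0] - observation) for p in new_particles])
    weights = weights * np.exp(-0.5 * errs ** 2)
    return new_particles, weights / weights.sum()

particles = [(np.array([0.0, 1.0]), np.zeros(2)) for _ in range(50)]
weights = np.ones(50) / 50
particles, weights = pf_step(particles, weights, observation=np.array([0.0, 1.0]))
print(weights.max())
```

Because every hypothesis is run through the simulator, physically impossible states (e.g., penetrating the ground) are corrected or suppressed before weighting, which is the role the physics prior plays in the full system.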

Collaboration


An overview of Leonid Sigal's top co-authors.

Top Co-Authors

Sung Ju Hwang

Ulsan National Institute of Science and Technology
