Yaser Sheikh | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yaser Sheikh is active.

Explore More

Publication

Featured researches published by Yaser Sheikh.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Bayesian modeling of dynamic scenes for object detection

Yaser Sheikh; Mubarak Shah

Accurate detection of moving objects is an important precursor to stable tracking or recognition. In this paper, we present an object detection scheme that has three innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged and it is asserted that useful correlation exists in intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic backgrounds. By using a nonparametric density estimation method over a joint domain-range representation of image pixels, multimodal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. We propose a model of the background as a single probability density. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches to object detection which detect objects by building adaptive models of the background, the foregrounds modeled to augment the detection of objects (without explicit tracking) since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition of detecting interesting objects and the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the proposed method is performed and presented on a diverse set of dynamic scenes.

computer vision and pattern recognition | 2016

Convolutional Pose Machines

Shih-En Wei; Varun Ramakrishna; Takeo Kanade; Yaser Sheikh

Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.

international conference on computer vision | 2009

Background Subtraction for Freely Moving Cameras

Yaser Sheikh; Omar Javed; Takeo Kanade

Background subtraction algorithms define the background as parts of a scene that are at rest. Traditionally, these algorithms assume a stationary camera, and identify moving objects by detecting areas in a video that change over time. In this paper, we extend the concept of ‘subtracting’ areas at rest to apply to video captured from a freely moving camera. We do not assume that the background is well-approximated by a plane or that the camera center remains stationary during motion. The method operates entirely using 2D image measurements without requiring an explicit 3D reconstruction of the scene. A sparse model of background is built by robustly estimating a compact trajectory basis from trajectories of salient features across the video, and the background is ‘subtracted’ by removing trajectories that lie within the space spanned by the basis. Foreground and background appearance models are then built, and an optimal pixel-wise foreground/background labeling is obtained by efficiently maximizing a posterior function.

computer vision and pattern recognition | 2017

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

Zhe Cao; Tomas Simon; Shih-En Wei; Yaser Sheikh

We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.

international conference on computer vision | 2005

Exploring the space of a human action

Yaser Sheikh; Mumtaz Sheikh; Mubarak Shah

One of the fundamental challenges of recognizing actions is accounting for the variability that arises when arbitrary cameras capture humans performing actions. In this paper, we explicitly identify three important sources of variability: (1) viewpoint, (2) execution rate, and (3) anthropometry of actors, and propose a model of human actions that allows us to investigate all three. Our hypothesis is that the variability associated with the execution of an action can be closely approximated by a linear combination of action bases in joint spatio-temporal space. We demonstrate that such a model bounds the rank of a matrix of image measurements and that this bound can be used to achieve recognition of actions based only on imaged data. A test employing principal angles between subspaces that is robust to statistical fluctuations in measurement data is presented to find the membership of an instance of an action. The algorithm is applied to recognize several actions, and promising results have been obtained.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011

Trajectory Space: A Dual Representation for Nonrigid Structure from Motion

Ijaz Akhter; Yaser Sheikh; Sohaib Khan; Takeo Kanade

Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes. These bases are object dependent and therefore have to be estimated anew for each video sequence. In contrast, we propose a dual approach to describe the evolving 3D structure in trajectory space by a linear combination of basis trajectories. We describe the dual relationship between the two approaches, showing that they both have equal power for representing 3D structure. We further show that the temporal smoothness in 3D trajectories alone can be used for recovering nonrigid structure from a moving camera. The principal advantage of expressing deforming 3D structure in trajectory space is that we can define an object independent basis. This results in a significant reduction in unknowns and corresponding stability in estimation. We propose the use of the Discrete Cosine Transform (DCT) as the object independent basis and empirically demonstrate that it approaches Principal Component Analysis (PCA) for natural motions. We report the performance of the proposed method, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions, including piecewise rigid motion, partially nonrigid motion (such as a facial expressions), and highly nonrigid motion (such as a person walking or dancing).

international conference on computer graphics and interactive techniques | 2011

Motion capture from body-mounted cameras

Takaaki Shiratori; Hyun Soo Park; Leonid Sigal; Yaser Sheikh; Jessica K. Hodgins

Motion capture technology generally requires that recordings be performed in a laboratory or closed stage setting with controlled lighting. This restriction precludes the capture of motions that require an outdoor setting or the traversal of large areas. In this paper, we present the theory and practice of using body-mounted cameras to reconstruct the motion of a subject. Outward-looking cameras are attached to the limbs of the subject, and the joint angles and root pose are estimated through non-linear optimization. The optimization objective function incorporates terms for image matching error and temporal continuity of motion. Structure-from-motion is used to estimate the skeleton structure and to provide initialization for the non-linear optimization procedure. Global motion is estimated and drift is controlled by matching the captured set of videos to reference imagery. We show results in settings where capture would be difficult or impossible with traditional motion capture systems, including walking outside and swinging on monkey bars. The quality of the motion reconstruction is evaluated by comparing our results against motion capture data produced by a commercially available optical system.

international conference on computer vision | 2007

Mode-seeking by Medoidshifts

Yaser Sheikh; Erum Arif Khan; Takeo Kanade

We present a nonparametric mode-seeking algorithm, called medoidshift, based on approximating the local gradient using a weighted estimate of medoids. Like meanshift, medoidshift clustering automatically computes the number of clusters and the data does not have to be linearly separable. Unlike meanshift, the proposed algorithm does not require the definition of a mean. This property allows medoidshift to find modes even when only a distance measure between samples is defined. In this sense, the relationship between the medoidshift algorithm and the meanshift algorithm is similar to the relationship between the k-medoids and the k-means algorithms. We show that medoidshifts can also be used for incremental clustering of growing datasets by recycling previous computations. We present experimental results using medoidshift for image segmentation, incremental clustering for shot segmentation and clustering on nonlinearly separable data.

computer vision and pattern recognition | 2005

Bayesian object detection in dynamic scenes

Yaser Sheikh; Mubarak Shah

Detecting moving objects using stationary cameras is an important precursor to many activity recognition, object recognition and tracking algorithms. In this paper, three innovations are presented over existing approaches. Firstly, the model of the intensities of image pixels as independently distributed random variables is challenged and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of nominal camera motion and dynamic textures. By using a non-parametric density estimation method over a joint domain-range representation of image pixels, multi-modal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. Secondly, temporal persistence is proposed as a detection criteria. Unlike previous approaches to object detection which detect objects by building adaptive models of the only background, the foreground is also modeled to augment the detection of objects (without explicit tracking) since objects detected in a preceding frame contain substantial evidence for detection in a current frame. Third, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition of pixel-wise labeling and the posterior function is maximized efficiently using graph cuts. Experimental validation of the proposed method is presented on a diverse set of dynamic scenes.

european conference on computer vision | 2012

Reconstructing 3d human pose from 2d image landmarks

Varun Ramakrishna; Takeo Kanade; Yaser Sheikh

Reconstructing an arbitrary configuration of 3D points from their projection in an image is an ill-posed problem. When the points hold semantic meaning, such as anatomical landmarks on a body, human observers can often infer a plausible 3D configuration, drawing on extensive visual memory. We present an activity-independent method to recover the 3D configuration of a human figure from 2D locations of anatomical landmarks in a single image, leveraging a large motion capture corpus as a proxy for visual memory. Our method solves for anthropometrically regular body pose and explicitly estimates the camera via a matching pursuit algorithm operating on the image projections. Anthropometric regularity (i.e., that limbs obey known proportions) is a highly informative prior, but directly applying such constraints is intractable. Instead, we enforce a necessary condition on the sum of squared limb-lengths that can be solved for in closed form to discourage implausible configurations in 3D. We evaluate performance on a wide variety of human poses captured from different viewpoints and show generalization to novel 3D configurations and robustness to missing data.

Explore More