Vinay D. Shet
Princeton University
Publications
Featured research published by Vinay D. Shet.
Computer Vision and Pattern Recognition | 2003
Ahmed M. Elgammal; Vinay D. Shet; Yaser Yacoob; Larry S. Davis
This paper addresses the problem of capturing dynamics in exemplar-based recognition systems. The traditional HMM provides a probabilistic tool for capturing system dynamics, and in the exemplar paradigm, HMM states are typically coupled with the exemplars. Instead, we propose a non-parametric HMM approach that uses a discrete HMM with arbitrary states (decoupled from the exemplars) to capture the dynamics over a large exemplar space, where a non-parametric estimation approach is used to model the exemplar distribution. This reduces the need for lengthy and non-optimal training of the HMM observation model. We applied the proposed approach to view-based gesture recognition. The approach represents each gesture as a sequence of learned body poses (exemplars). Gestures are recognized through a probabilistic framework that matches these body poses and imposes temporal constraints between different poses using the proposed non-parametric HMM.
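The decoupling of HMM states from exemplars can be illustrated with a small sketch. It assumes (these are not details from the abstract) that each exemplar contributes an unnormalized Gaussian kernel likelihood with a shared `bandwidth`, and that each discrete state emits through a hypothetical weight matrix `state_exemplar_weights` over all exemplars; the scaled forward algorithm then scores a sequence of frame features under one gesture model.

```python
import numpy as np

def exemplar_likelihoods(x, exemplars, bandwidth):
    """Unnormalized Gaussian-kernel likelihood p(x | e_j) for every exemplar e_j (assumed kernel)."""
    sq_dist = np.sum((exemplars - x) ** 2, axis=1)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2)

def forward_log_likelihood(observations, exemplars, state_exemplar_weights,
                           transition, initial, bandwidth=1.0):
    """Scaled forward algorithm for a discrete HMM whose states are decoupled
    from the exemplars: the emission of state i is sum_j W[i, j] * p(x | e_j).

    observations: (T, D) frame features; exemplars: (M, D) learned poses;
    state_exemplar_weights: (S, M), rows sum to 1 (hypothetical parameterization);
    transition: (S, S); initial: (S,)
    """
    log_lik = 0.0
    alpha = None
    for t, x in enumerate(observations):
        emission = state_exemplar_weights @ exemplar_likelihoods(x, exemplars, bandwidth)
        alpha = initial * emission if t == 0 else (alpha @ transition) * emission
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale  # rescale each step to avoid numerical underflow
    return log_lik
```

To recognize a gesture under these assumptions, one would evaluate `forward_log_likelihood` with each gesture's transition and weight matrices and pick the highest score. Because every model shares the same kernel and bandwidth, the omitted kernel normalization constant does not change the ranking.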
Computer Vision and Pattern Recognition | 2007
Vinay D. Shet; Jan Neumann; Visvanathan Ramesh; Larry S. Davis
The ability to robustly detect humans in video is a critical component of automated visual surveillance systems. This paper describes a bilattice-based logical reasoning approach to human detection that exploits contextual information and knowledge about interactions between humans, and augments it with the output of different low-level detectors. Detections from low-level parts-based detectors are treated as logical facts and used to reason explicitly about the presence or absence of humans in the scene. Positive and negative information from different sources, as well as uncertainties from detections and logical rules, are integrated within the bilattice framework. The approach also generates proofs, or justifications, for each hypothesis it proposes. The system further uses these justifications (or their absence) to explain and validate, or reject, potential hypotheses. This allows the system to reason explicitly about complex interactions between humans and to handle occlusions. The proofs are also available to the end user as an explanation of why the system considers a particular hypothesis to be a human. We employ a boosted cascade detector based on gradient histograms to detect individual body parts. We have applied this framework to analyze the presence of humans in static images from different datasets.
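The way positive and negative evidence combine in a bilattice can be sketched as follows. This assumes the square bilattice over [0, 1] x [0, 1], with the product t-norm for conjunction and the probabilistic sum for pooling evidence; rule weights enter a proof as the value (w, 0), and proofs of "not h" are negated before pooling. These operator choices and the names `proof_value` and `closure` are illustrative assumptions rather than the paper's stated implementation.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Belief:
    pro: float  # evidence for the proposition, in [0, 1]
    con: float  # evidence against the proposition, in [0, 1]

    def meet(self, other):
        """Truth-ordering conjunction: product on 'for', probabilistic sum on 'against'."""
        return Belief(self.pro * other.pro,
                      self.con + other.con - self.con * other.con)

    def pool(self, other):
        """Knowledge-ordering join: accumulate both kinds of evidence."""
        return Belief(self.pro + other.pro - self.pro * other.pro,
                      self.con + other.con - self.con * other.con)

    def negate(self):
        """Negation swaps evidence for and against."""
        return Belief(self.con, self.pro)

UNKNOWN = Belief(0.0, 0.0)  # no evidence either way

def proof_value(rule_weight, facts):
    """Value of one proof: the rule weight (w, 0) conjoined with every fact it uses."""
    return reduce(Belief.meet, facts, Belief(rule_weight, 0.0))

def closure(positive_proofs, negative_proofs):
    """Pool proofs of h together with negated proofs of not-h."""
    pro = [proof_value(w, facts) for w, facts in positive_proofs]
    con = [proof_value(w, facts).negate() for w, facts in negative_proofs]
    return reduce(Belief.pool, pro + con, UNKNOWN)
```

For example, a hypothetical rule `human <- head, torso` with weight 0.5 and part detections of 0.9 and 0.8 yields (0.36, 0); adding a hypothetical occlusion-based proof of "not human" with value (0.2, 0) produces (0.36, 0.2). The difference of the two components is one simple way to rank hypotheses, and the list of contributing proofs doubles as the justification shown to the user.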
Advanced Video and Signal Based Surveillance | 2005
Vinay D. Shet; David Harwood; Larry S. Davis
This paper describes the architecture of a visual surveillance system that combines real-time computer vision algorithms with logic programming to represent and recognize activities involving interactions among people, packages, and the environments through which they move. The low-level computer vision algorithms log primitive events of interest as observed facts, while the higher-level Prolog-based reasoning engine uses these facts in conjunction with predefined rules to recognize various activities in the input video streams. The system is illustrated in action on a multi-camera surveillance scenario that includes both security and safety violations.
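The split between logged facts and declarative rules can be illustrated with a Python stand-in for one Prolog-style rule. The fact schema (`Fact` with `kind`, `person`, `obj`, `time`) and the "abandoned package" rule are hypothetical examples of the kind of activity described, not the system's actual predicates.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Fact:
    """Primitive event logged by the low-level vision modules (hypothetical schema)."""
    kind: str            # e.g. "drop", "pickup", "exit"
    person: str
    obj: Optional[str]
    time: float

def abandoned_packages(facts):
    # Hypothetical rule, written here in Prolog-like notation:
    #   abandoned(P, Pkg) :- drop(P, Pkg, T1), exit(P, T2), T1 < T2,
    #                        not (pickup(P, Pkg, T3), T1 < T3, T3 < T2).
    drops = [f for f in facts if f.kind == "drop"]
    exits = [f for f in facts if f.kind == "exit"]
    pickups = [f for f in facts if f.kind == "pickup"]
    result = set()
    for d in drops:
        for e in exits:
            if e.person != d.person or not d.time < e.time:
                continue
            # Negation as failure: no matching pickup between the drop and the exit.
            picked_up = any(p.person == d.person and p.obj == d.obj
                            and d.time < p.time < e.time for p in pickups)
            if not picked_up:
                result.add((d.person, d.obj))
    return result
```

In an actual Prolog engine the rule would be stated once and the engine would perform this search; the Python version only makes explicit the join over facts and the negation-as-failure check.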
Workshop on Applications of Computer Vision | 2013
Cheng-Hao Kuo; Sameh Khamis; Vinay D. Shet
We address the problem of appearance-based person re-identification, which has been attracting increasing attention in computer vision. The task is very challenging because a person's visual appearance can change dramatically with different backgrounds, camera characteristics, lighting conditions, viewpoints, and human poses. In recent studies on person re-id, color information plays a major role in performance. Traditional color descriptors such as color histograms, however, leave much room for improvement. We propose to describe a person image with semantic color names and to compute probability distributions over these basic color terms as image descriptors. To better combine these descriptors with other features, we define our appearance affinity model as a linear combination of similarity measurements between corresponding local descriptors, and apply the RankBoost algorithm to find the optimal weights for the similarity measurements. We evaluate our proposed system on the highly challenging VIPeR dataset and show improvements over state-of-the-art methods on widely used person re-id evaluation metrics.
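A color-name descriptor and a linear affinity model might look like the sketch below. The RGB prototypes for the eleven basic color terms, the softmax `temperature`, and the Bhattacharyya coefficient as the similarity measure are all assumptions made for illustration; a learned pixel-to-color-name mapping would normally replace the prototype table, and the per-region `weights` stand in for what RankBoost would learn.

```python
import numpy as np

# Hypothetical RGB prototypes for the 11 basic color terms.
COLOR_NAMES = {
    "black": (0, 0, 0), "blue": (0, 0, 255), "brown": (139, 69, 19),
    "grey": (128, 128, 128), "green": (0, 128, 0), "orange": (255, 165, 0),
    "pink": (255, 192, 203), "purple": (128, 0, 128), "red": (255, 0, 0),
    "white": (255, 255, 255), "yellow": (255, 255, 0),
}
PROTOTYPES = np.array(list(COLOR_NAMES.values()), dtype=float)

def color_name_descriptor(pixels, temperature=30.0):
    """Probability distribution over color names for a region of (N, 3) RGB pixels."""
    d = np.linalg.norm(pixels[:, None, :] - PROTOTYPES[None, :, :], axis=2)  # (N, 11)
    logits = -d / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return p.mean(axis=0)                         # average soft assignment

def bhattacharyya(p, q):
    """Similarity between two discrete distributions (1.0 when identical)."""
    return float(np.sum(np.sqrt(p * q)))

def affinity(descs_a, descs_b, weights):
    """Linear combination of similarities between corresponding local descriptors."""
    return sum(w * bhattacharyya(a, b) for w, a, b in zip(weights, descs_a, descs_b))
```

A uniformly red region puts most of its mass on "red", and two people described by identical regions reach an affinity equal to the sum of the weights.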
European Conference on Computer Vision | 2006
Vinay D. Shet; David Harwood; Larry S. Davis
Recognition of complex activities from surveillance video requires detection and temporal ordering of their constituent “atomic” events. It also requires the ability to robustly track individuals and maintain their identities across single as well as multiple camera views. Identity maintenance is a primary source of uncertainty for activity recognition and has traditionally been addressed through various appearance matching approaches. However, these approaches are inadequate by themselves. In this paper, we propose a prioritized, multivalued, default-logic-based framework that allows reasoning about the identities of individuals. This is achieved by augmenting traditional appearance matching with contextual information about the environment and the self-identifying traits of certain actions. The framework also encodes qualitative confidence measures for the identity decisions it makes and, finally, uses this information to reason about the occurrence of certain predefined activities in video.
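The effect of prioritized defaults on identity decisions can be shown with a deliberately simplified sketch: each hypothetical default proposes an identity with a qualitative confidence, and the highest-priority applicable default wins. This is a plain priority-ordering stand-in, not the paper's multivalued default logic, but it exhibits the key nonmonotonic behavior: adding a fact can overturn an earlier conclusion.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Default:
    name: str
    priority: int                              # higher priority overrides lower
    confidence: str                            # qualitative value, e.g. "likely", "definite"
    applies: Callable[[dict], Optional[str]]   # proposed identity, or None if inapplicable

def decide_identity(track_facts, defaults):
    """Return (identity, confidence, justification) from the strongest applicable default."""
    for d in sorted(defaults, key=lambda d: d.priority, reverse=True):
        identity = d.applies(track_facts)
        if identity is not None:
            return identity, d.confidence, d.name
    return None, "unknown", None

# Hypothetical defaults: appearance matching is weak, a self-identifying action is strong.
DEFAULTS = [
    Default("appearance_match", 1, "likely", lambda f: f.get("best_appearance_match")),
    Default("badge_swipe", 3, "definite", lambda f: f.get("badge_owner")),
]
```

With only `{"best_appearance_match": "B"}` the decision is `("B", "likely", "appearance_match")`; once a `"badge_owner": "A"` fact is added, the same query returns `("A", "definite", "badge_swipe")`.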
Computer Vision and Pattern Recognition | 2009
Vinay D. Shet; Maneesh Kumar Singh; Claus Bahlmann; Visvanathan Ramesh
This paper extends the work reported in [Shet et al., 2007] to detect complex objects in aerial images. Such objects, e.g., surface-to-air missile launcher sites, are highly variable in appearance and can only be characterized by their functional design and surrounding context, such as the physical arrangement of access structures. Constraints on acquiring sufficient annotated training data make it challenging for purely data-driven approaches to generalize adequately. In this work, structure arising from functional requirements and surrounding context is encoded using predicate-logic-based grammars, and observation and model uncertainties are integrated within the bilattice framework. The paper also presents a method to automatically optimize the weights associated with logical rules; automated rule weight learning is an important aspect of applying such systems in the computer vision domain. The proposed approach casts the instantiated inference tree as a knowledge-based neural network, interprets rule uncertainties as link weights in the network, and applies a constrained backpropagation (BP) algorithm to converge upon a set of weights for optimal performance. The BP algorithm is modified accordingly to compute local gradients over the bilattice-specific inference operation and to respect constraints specific to vision applications. Both extensions have been evaluated on real and simulated data with favorable results.
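Learning rule weights by backpropagating through the inference tree can be sketched for a single, flat case. The sketch assumes that only the "evidence for" component matters, that each of K proofs contributes `w_k * s_k` (with `s_k` the product of its fact confidences), and that proofs are pooled with the probabilistic sum, giving `pred = 1 - prod_k (1 - w_k * s_k)`. Weights are fit by gradient descent on squared error and clipped to [0, 1] after every step, a simple projected form of the constraint mentioned above; all function names are hypothetical.

```python
import numpy as np

def predict(weights, strengths):
    """Evidence for h: 1 - prod_k (1 - w_k * s_k). strengths has shape (N, K)."""
    return 1.0 - np.prod(1.0 - weights * strengths, axis=1)

def gradient(weights, strengths, targets):
    """Gradient of mean 0.5 * (pred - target)^2 with respect to the rule weights."""
    factors = 1.0 - weights * strengths                  # (N, K)
    err = (1.0 - np.prod(factors, axis=1)) - targets     # (N,)
    grad = np.empty_like(weights)
    for k in range(len(weights)):
        # Local gradient: d pred / d w_k = s_k * prod_{j != k} (1 - w_j * s_j)
        others = np.prod(np.delete(factors, k, axis=1), axis=1)
        grad[k] = np.mean(err * strengths[:, k] * others)
    return grad

def learn_rule_weights(strengths, targets, lr=1.0, epochs=2000, init=0.5):
    """Projected gradient descent: clip after each step so weights stay valid evidence values."""
    weights = np.full(strengths.shape[1], init)
    for _ in range(epochs):
        weights = np.clip(weights - lr * gradient(weights, strengths, targets), 0.0, 1.0)
    return weights
```

In a deeper inference tree, the same chain rule would be applied node by node through each conjunction and pooling operation; the flat two-level case keeps the local gradient explicit.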
Proceedings of the 4th ACM international workshop on Video surveillance and sensor networks | 2006
Vinay D. Shet; David Harwood; Larry S. Davis
Persistent tracking systems require the ability to track individuals by maintaining their identities across visibility gaps caused by occlusion events. In traditional computer vision systems, the flow of information is typically bottom-up: the low-level image processing modules take video input, perform early vision tasks such as background subtraction and object detection, and pass this information to the high-level reasoning module. This paper describes the architecture of a system that uses top-down information flow to perform identity maintenance across occlusion events. The system uses the high-level reasoning module to provide control feedback to the low-level image processing module, which performs forensic analysis of archival video and actively acquires the information required to reach identity decisions. This functionality supplements traditional bottom-up reasoning about identity, which employs contextual cues and appearance matching within the multivalued default logic framework proposed in [18]. In addition to making the system nonmonotonic, this framework allows it to qualitatively encode its confidence in the identity decisions it makes.
Computer Vision and Pattern Recognition | 2012
Toufiq Parag; Claus Bahlmann; Vinay D. Shet; Maneesh Kumar Singh
Modeling objects with formal grammars has recently regained much attention in computer vision. Probabilistic logic programming, such as Bilattice-based Logical Reasoning (BLR), has been shown to produce impressive results in object detection and recognition. Although hierarchical object descriptions are preferred in high-level vision tasks for several reasons, BLR has been applied to non-hierarchical object grammars (compositional descriptions of an object class). To better align logic programs (especially BLR) with compositional object hierarchies, we provide a formal grammar that can guide domain experts in describing objects. That is, we introduce a context-sensitive specification grammar, or meta-grammar, whose language is the set of all possible object grammars. We demonstrate the practicality of the approach with an automatic compiler that translates example object grammars into a BLR logic program, and apply it to detecting Graphical User Interface (GUI) components.
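Compiling a hierarchical object grammar into weighted logic rules can be sketched in a few lines. The `Production` structure, the `part_of` predicate, the `weight :: head <- body.` rule syntax, and the GUI example are hypothetical; the validity check only mimics the role of the meta-grammar by rejecting parts that are neither terminals (low-level detectors) nor heads of other productions.

```python
from dataclasses import dataclass, field

@dataclass
class Production:
    """One object-grammar production: head -> parts, subject to constraints (hypothetical)."""
    head: str
    parts: list[tuple[str, str]]                          # (part predicate, variable for that part)
    constraints: list[str] = field(default_factory=list)  # relations over the part variables
    weight: float = 1.0                                   # rule uncertainty

def compile_grammar(productions, terminals):
    """Translate an object grammar into weighted logic-program rules."""
    heads = {p.head for p in productions}
    rules = []
    for p in productions:
        for pred, _ in p.parts:
            if pred not in terminals and pred not in heads:
                raise ValueError(f"undefined part '{pred}' in production for '{p.head}'")
        body = [f"{pred}({var})" for pred, var in p.parts]
        body += [f"part_of({var}, X)" for _, var in p.parts]
        body += p.constraints
        rules.append(f"{p.weight} :: {p.head}(X) <- {', '.join(body)}.")
    return rules
```

For instance, a `dialog` made of a `title_bar` above a `button_row`, where `button_row` is itself composed of two `button` detections, compiles into two rules, one per level of the hierarchy.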
Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA 2003) | 2003
Ahmed M. Elgammal; Vinay D. Shet; Yaser Yacoob; Larry S. Davis
This paper presents a probabilistic exemplar-based framework for recognizing gestures. The approach represents each gesture as a sequence of learned body poses. Gestures are recognized through a probabilistic framework that matches these body poses and imposes temporal constraints between different poses. Matching individual poses to image data is performed using a probabilistic formulation of edge matching, which yields a likelihood measurement for each individual pose. The paper introduces a correspondence-free weighted matching scheme for edge templates that emphasizes discriminating features in the matching. The weighting does not require establishing correspondences between the different pose models. The probabilistic framework also imposes temporal constraints between different poses through a learned hidden Markov model (HMM) for each gesture.
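A weighted, correspondence-free edge likelihood can be sketched with a chamfer-style distance: each template edge point is scored by its distance to the nearest image edge pixel, so no point-to-point correspondences are needed. The exponential mapping to a likelihood, the `scale` parameter, and the per-point `weights` (given here, rather than derived from discriminating features) are illustrative assumptions; a distance transform would replace the brute-force nearest-point search for full-size images.

```python
import numpy as np

def weighted_chamfer_likelihood(template_pts, weights, edge_pts, scale=2.0):
    """Likelihood of a pose template given detected image edges.

    template_pts: (P, 2) edge points of the pose model
    weights:      (P,) importance of each template point (assumed given)
    edge_pts:     (E, 2) detected image edge pixels
    """
    dists = np.linalg.norm(template_pts[:, None, :] - edge_pts[None, :, :], axis=2)
    nearest = dists.min(axis=1)                     # distance to the closest image edge
    mean_dist = np.sum(weights * nearest) / np.sum(weights)
    return float(np.exp(-mean_dist / scale))        # 1.0 for a perfect match
```

Setting a point's weight to zero removes its influence entirely, which is how down-weighting non-discriminative parts of a template would show up in this formulation. The per-pose likelihoods are what an HMM, like the one in the gesture-recognition abstract, would consume as emission scores.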
Archive | 2008
Vinay D. Shet; Jan Neumann; Vasudev Parameswaran; Visvanathan Ramesh; Imad Zoghlami