Raffay Hamid
Georgia Institute of Technology
Publications
Featured research published by Raffay Hamid.
Artificial Intelligence | 2009
Raffay Hamid; Siddhartha Maddi; Amos Y. Johnson; Aaron F. Bobick; Irfan A. Essa; Charles Lee Isbell
Formalizing computational models for everyday human activities remains an open challenge. Many previous approaches towards this end assume prior knowledge about the structure of activities, using which explicitly defined models are learned in a completely supervised manner. For a majority of everyday environments however, the structure of the in situ activities is generally not known a priori. In this paper we investigate knowledge representations and manipulation techniques that facilitate learning of human activities in a minimally supervised manner. The key contribution of this work is the idea that global structural information of human activities can be encoded using a subset of their local event subsequences, and that this encoding is sufficient for activity-class discovery and classification. In particular, we investigate modeling activity sequences in terms of their constituent subsequences that we call event n-grams. Exploiting this representation, we propose a computational framework to automatically discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding characterizations of these discovered classes from a holistic as well as a by-parts perspective. Using such characterizations, we present a method to classify a new activity to one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our approach in a variety of everyday environments.
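As a rough illustration of the event n-gram representation described above, the sketch below builds n-gram histograms for symbolic activity sequences and links activities whose histograms are strongly similar. The function names, the histogram-intersection measure, and the similarity threshold are illustrative choices, not the paper's exact formulation, which discovers activity-classes as maximally similar cliques in a fully connected activity graph.

```python
# Sketch: n-gram histograms per activity, histogram-intersection similarity,
# and edges between strongly similar activities. Simple thresholding replaces
# the paper's maximal-clique discovery and is purely illustrative.
from collections import Counter
from itertools import combinations

def ngram_histogram(events, n=3):
    """Count all contiguous event n-grams in one symbolic activity sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def similarity(h1, h2):
    """Histogram intersection, normalized by the smaller histogram mass."""
    shared = sum(min(h1[k], h2[k]) for k in h1.keys() & h2.keys())
    return shared / max(1, min(sum(h1.values()), sum(h2.values())))

def similarity_graph(activities, n=3, threshold=0.5):
    """Weighted edges between activity indices whose n-gram profiles agree."""
    hists = [ngram_histogram(a, n) for a in activities]
    edges = {}
    for i, j in combinations(range(len(hists)), 2):
        s = similarity(hists[i], hists[j])
        if s >= threshold:
            edges[(i, j)] = s
    return edges

# Toy usage: three short symbolic activity sequences.
acts = [list("abcabcabd"), list("abcabcabc"), list("xyzxyzxyy")]
print(similarity_graph(acts, n=2))   # {(0, 1): 0.875}
```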
International Conference on Computer Vision | 2007
Raffay Hamid; Siddhartha Maddi; Aaron F. Bobick; Irfan A. Essa
Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity, and ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We empirically compare Suffix Trees with some of the previous approaches in terms of feature cardinality, discriminative prowess, noise sensitivity and activity-class discovery. Finally, exploiting properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose an algorithm to detect them in linear-time. We present comparative results over experimental data, collected from a kitchen environment to demonstrate the competence of our proposed framework.
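The core query behind the Suffix Tree representation can be pictured with a naive suffix trie: index every suffix of an activity's event string so that any event-subsequence, at any temporal scale, can be counted in time proportional to its length. The paper builds compressed Suffix Trees (constructible in linear time, e.g., with Ukkonen's algorithm); the uncompressed trie below is only a sketch of the kind of statistics involved.

```python
# Naive suffix trie over an activity's event string: every suffix is inserted,
# so the number of occurrences of any event-subsequence can be read off by
# walking the pattern from the root. A real system would use a compressed
# Suffix Tree built in linear time.
class SuffixTrie:
    def __init__(self, events):
        self.root = {}
        for start in range(len(events)):          # insert every suffix
            node = self.root
            for symbol in events[start:]:
                node = node.setdefault(symbol, {"#count": 0})
                node["#count"] += 1

    def count(self, pattern):
        """Occurrences of `pattern` as a contiguous event-subsequence."""
        node = self.root
        for symbol in pattern:
            if symbol not in node:
                return 0
            node = node[symbol]
        return node["#count"]

trie = SuffixTrie("abcabcabd")
print(trie.count("abc"), trie.count("abd"), trie.count("bd"))  # 2 1 1
```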
Computer Vision and Pattern Recognition | 2003
Raffay Hamid; Yan Huang; Irfan A. Essa
This paper presents a new framework for tracking and recognizing complex multi-agent activities using probabilistic tracking coupled with graphical models for recognition. We employ a statistical-feature-based particle filter to robustly track multiple objects in cluttered environments. Both color and shape characteristics are used to differentiate and track different objects so that low-level visual information can be reliably extracted for recognition of complex activities. Such extracted spatio-temporal features are then used to build temporal graphical models for characterization of these activities. We demonstrate, through examples in different scenarios, the generalizability and robustness of our framework.
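A hedged sketch of the tracking component: one predict/update/resample cycle of a bootstrap particle filter over 2-D object position, with a simple Gaussian stand-in for the paper's color and shape likelihoods. All parameter values and names are illustrative.

```python
# Bootstrap particle filter over 2-D position: predict with a random-walk
# motion model, weight particles by a placeholder appearance likelihood
# (a real tracker would score color histograms and shape cues), and resample.
import numpy as np

def particle_filter_step(particles, weights, measurement,
                         motion_std=3.0, meas_std=5.0, rng=None):
    """One predict/update/resample cycle for an (N, 2) particle array."""
    if rng is None:
        rng = np.random.default_rng()
    # Predict: diffuse particles under a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: Gaussian likelihood of the observed position for each particle.
    err = np.linalg.norm(particles - measurement, axis=1)
    weights = weights * np.exp(-0.5 * (err / meas_std) ** 2) + 1e-12
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy usage: 500 particles tracking an object observed near (120, 80).
rng = np.random.default_rng(1)
p = rng.uniform(0, 200, (500, 2))
w = np.full(500, 1.0 / 500)
p, w = particle_filter_step(p, w, np.array([120.0, 80.0]), rng=rng)
print(p.mean(axis=0))
```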
Computer Vision and Pattern Recognition | 2008
Cha Zhang; Raffay Hamid; Zhengyou Zhang
Because of the large variation across different environments, a generic classifier trained on extensive data-sets may perform sub-optimally in a particular test environment. In this paper, we present a general framework for classifier adaptation, which improves an existing generic classifier in the new test environment. Viewing classifier learning as a cost minimization problem, we perform classifier adaptation by combining the cost function on the old data-sets with the cost function on the data-set collected from the new environment. The former term is further approximated with its second order Taylor expansion to reduce the amount of information that needs to be saved for adaptation. Unlike traditional approaches that are often designed for a specific application or classifier, our scheme is applicable to various types of classifiers and user labels. We demonstrate this property on two popular classifiers (logistic regression and boosting), while using two types of user labels (direct labels and similarity labels). Extensive experiments conducted for the task of person detection in conference-room environments show that significant performance improvement can be achieved with our proposed method.
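The adaptation idea can be sketched for logistic regression as minimizing the log-loss on the new data plus a quadratic penalty (the second-order Taylor term of the old cost) that keeps the weights close to the generic classifier, weighted by a stored approximation of the old cost's Hessian. The variable names and the plain gradient-descent loop below are illustrative, not the paper's exact procedure.

```python
# Adapting a generic logistic-regression classifier: minimize log-loss on the
# small new data-set plus a quadratic Taylor penalty anchored at the generic
# weights w_old, with stored (approximate) Hessian H_old of the old cost.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adapt_logistic(w_old, H_old, X_new, y_new, lam=1.0, lr=0.1, steps=500):
    """Gradient descent on: log-loss(new data) + lam/2 (w-w_old)' H_old (w-w_old)."""
    w = w_old.copy()
    for _ in range(steps):
        p = sigmoid(X_new @ w)
        grad_new = X_new.T @ (p - y_new) / len(y_new)   # new-data log-loss gradient
        grad_old = lam * H_old @ (w - w_old)            # gradient of the Taylor penalty
        w -= lr * (grad_new + grad_old)
    return w

# Toy usage: a 3-D generic classifier adapted on 20 newly labeled samples.
rng = np.random.default_rng(0)
w_old = np.array([1.0, -1.0, 0.5])
H_old = np.eye(3)                    # stand-in for the stored Hessian of the old cost
X_new = rng.normal(size=(20, 3))
y_new = (X_new @ np.array([2.0, -0.5, 0.0]) > 0).astype(float)
print(adapt_logistic(w_old, H_old, X_new, y_new))
```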
Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks | 2006
Raffay Hamid; Siddhartha Maddi; Aaron F. Bobick; Irfan A. Essa
We present an unsupervised framework to discover characterizations of everyday human activities, and demonstrate how such representations can be used to extract points of interest in event-streams. We begin with the usage of Suffix Trees as an efficient activity-representation to analyze the global structural information of activities, using their local event statistics over the entire continuum of their temporal resolution. Exploiting this representation, we discover characterizing event-subsequences and present their usage in an ensemble-based framework for activity classification. Finally, we propose a method to automatically detect subsequences of events that are locally atypical in a structural sense. Results over extensive data-sets, collected from multiple sensor-rich environments are presented, to show the competence and scalability of the proposed framework.
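One way to picture the notion of locally atypical subsequences is to score each sliding window of an event-stream by how rarely its short constituent subsequences occur in a reference corpus. The window length, n-gram order, and support threshold below are illustrative stand-ins for the Suffix-Tree-based statistics used in the paper.

```python
# Flag sliding windows of an event-stream whose short subsequences are rarely
# (or never) seen in a reference corpus of activities.
from collections import Counter

def corpus_counts(activities, n=3):
    """Corpus-wide counts of contiguous event n-grams."""
    counts = Counter()
    for a in activities:
        counts.update(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    return counts

def atypical_windows(stream, counts, n=3, window=6, max_support=1):
    """Return (start, window) pairs whose n-grams all have low corpus support."""
    flagged = []
    for s in range(len(stream) - window + 1):
        w = stream[s:s + window]
        grams = [tuple(w[i:i + n]) for i in range(len(w) - n + 1)]
        if all(counts[g] <= max_support for g in grams):
            flagged.append((s, "".join(w)))
    return flagged

# Toy usage: the run of "z" events never occurs in the reference corpus.
corpus = ["abcabcabc", "abcabdabc", "abcabcabd"]
print(atypical_windows("abcabczzzabc", corpus_counts(corpus)))
```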
Computer Vision and Pattern Recognition | 2016
Diego Marcos; Raffay Hamid; Devis Tuia
The growing availability of very high resolution (<1 m/pixel) satellite and aerial images has opened up unprecedented opportunities to monitor and analyze the evolution of land-cover and land-use across the world. To do so, images of the same geographical areas acquired at different times and, potentially, with different sensors must be efficiently parsed to update maps and detect land-cover changes. However, a naïve transfer of ground truth labels from one location in the source image to the corresponding location in the target image is generally not feasible, as these images are often only loosely registered (with up to ±50 m of non-uniform errors). Furthermore, land-cover changes in an area over time must be taken into account for an accurate ground truth transfer. To tackle these challenges, we propose a mid-level sensor-invariant representation that encodes image regions in terms of the spatial distribution of their spectral neighbors. We incorporate this representation in a Markov Random Field to simultaneously account for nonlinear mis-registrations and enforce locality priors to find matches between multi-sensor images. We show how our approach can be used to assist in several multimodal land-cover update and change detection problems.
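A possible reading of the mid-level representation, sketched under loose assumptions: for a given region (here a single pixel), find its nearest spectral neighbours across the image and histogram their spatial offsets relative to it. Grid size, neighbour count, and the per-pixel formulation are illustrative; the paper's descriptor and its use inside a Markov Random Field are more involved.

```python
# For one reference pixel, find its k most spectrally similar pixels across the
# image and histogram their spatial offsets relative to it, yielding a compact
# descriptor intended to tolerate loose registration between acquisitions.
import numpy as np

def neighbour_descriptor(image, row, col, k=50, bins=4, radius=32):
    """image: (H, W, B) spectral cube; returns a flattened bins x bins histogram."""
    H, W, B = image.shape
    ref = image[row, col]
    # Spectral distance from the reference pixel to every pixel in the image.
    dist = np.linalg.norm(image.reshape(-1, B) - ref, axis=1)
    nearest = np.argsort(dist)[1:k + 1]                 # skip the pixel itself
    rows, cols = np.unravel_index(nearest, (H, W))
    # Spatial offsets of the spectral neighbours, clipped to a local window.
    dr = np.clip(rows - row, -radius, radius - 1)
    dc = np.clip(cols - col, -radius, radius - 1)
    hist, _, _ = np.histogram2d(dr, dc, bins=bins,
                                range=[[-radius, radius], [-radius, radius]])
    return (hist / hist.sum()).ravel()

# Toy usage: a random 64 x 64 image with 4 spectral bands.
img = np.random.default_rng(0).random((64, 64, 4))
print(neighbour_descriptor(img, 32, 32).shape)   # (16,)
```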
International Conference on Image Processing | 2004
Raffay Hamid; Aaron F. Bobick; Anthony J. Yezzi
Just as a motion field is associated to a moving object, an audio field can be associated to an object that can behave as a sound source. The flow field of such a sound source which moves over time would not only have an optical component, but also an audio component; something we call audio-visual flow. In this paper we present a common structure tensor based variational framework for dense audiovisual flow-field estimation. The proposed scheme improves the rank of the local structure tensor by incorporating an audio information channel which is substantially uncorrelated from the complementing visual information channel. The scheme allows ascribing weights to individual sensor modalities based on the confidence in their corresponding measurements. Results are presented to demonstrate how combining multiple modalities in our proposed framework can provide a possible solution to temporary full visual occlusions.
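A minimal sketch of the multi-modal idea, assuming a Lucas-Kanade-style local solve rather than the paper's dense variational formulation: stack weighted spatio-temporal gradients from a visual channel and an audio-derived channel, and read the flow off the combined normal equations, whose left-hand side is exactly the weighted structure tensor of the joint signal. The modality weights and gradient filters are illustrative.

```python
# Weighted multi-channel local flow: stack spatio-temporal gradients from a
# visual channel and an audio-derived channel, scale each modality by its
# confidence weight, and solve A x = b in the least-squares sense; A'A is the
# weighted structure tensor of the joint audio-visual signal.
import numpy as np

def local_flow(channel_pairs, weights):
    """channel_pairs: list of (2, H, W) consecutive-frame pairs; returns (u, v)."""
    A_rows, b_rows = [], []
    for (f0, f1), w in zip(channel_pairs, weights):
        gy, gx = np.gradient(f0)            # spatial derivatives of the first frame
        gt = f1 - f0                        # temporal derivative
        A_rows.append(np.sqrt(w) * np.stack([gx.ravel(), gy.ravel()], axis=1))
        b_rows.append(-np.sqrt(w) * gt.ravel())
    A, b = np.vstack(A_rows), np.concatenate(b_rows)
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow

# Toy usage: random 16 x 16 visual and audio-field patches over two frames.
rng = np.random.default_rng(0)
visual = rng.random((2, 16, 16))
audio = rng.random((2, 16, 16))             # audio "field" registered to the image
print(local_flow([visual, audio], weights=[1.0, 0.5]))
```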
Human Factors in Computing Systems | 2004
Anind K. Dey; Raffay Hamid; Chris Beckmann; Ian Li; Daniel Hsu
Computer Vision and Pattern Recognition | 2005
Raffay Hamid; Amos Y. Johnson; Samir Batta; Aaron F. Bobick; Charles Lee Isbell; Graham Coleman
Computer Vision and Pattern Recognition | 2010
Raffay Hamid; Ramkrishan K. Kumar; Matthias Grundmann; Kihwan Kim; Irfan A. Essa; Jessica K. Hodgins