
Publication


Featured research published by Asaad Hakeem.


Artificial Intelligence | 2007

Learning, detection and representation of multi-agent events in videos

Asaad Hakeem; Mubarak Shah

In this paper, we model multi-agent events in terms of a temporally varying sequence of sub-events, and propose a novel approach for learning, detecting and representing events in videos. The proposed approach has three main steps. First, in order to learn the event structure from training videos, we automatically encode the sub-event dependency graph, which is the learnt event model that depicts the conditional dependency between sub-events. Second, we pose the problem of event detection in novel videos as clustering the maximally correlated sub-events using normalized cuts. The principal assumption made in this work is that the events are composed of a highly correlated chain of sub-events that have high weights (association) within the cluster and relatively low weights (disassociation) between the clusters. The event detection does not require prior knowledge of the number of agents involved in an event and does not make any assumptions about the length of an event. Third, we recognize the fact that any abstract event model should extend to representations related to human understanding of events. Therefore, we propose an extension of CASE representation of natural languages that allows a plausible means of interface between users and the computer. We show results of learning, detection, and representation of events for videos in the meeting, surveillance, and railroad monitoring domains.
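The clustering step above (grouping maximally correlated sub-events with normalized cuts) can be illustrated with off-the-shelf spectral clustering. The sketch below is not the authors' implementation; the correlation matrix W, its toy values, and the fixed number of events are assumptions made only for illustration.

```python
# Illustrative sketch: cluster sub-events into events by applying a
# normalized-cuts-style spectral clustering to a sub-event correlation matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_sub_events(W, n_events):
    """W: (n_subevents, n_subevents) symmetric correlation/affinity matrix."""
    model = SpectralClustering(n_clusters=n_events,
                               affinity="precomputed",   # W is already an affinity matrix
                               assign_labels="discretize",
                               random_state=0)
    return model.fit_predict(W)   # event label for each sub-event

# Toy example: two weakly coupled blocks of highly correlated sub-events.
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.1, 0.8, 1.0]])
print(cluster_sub_events(W, n_events=2))   # e.g. [0 0 1 1]
```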


International Conference on Pattern Recognition | 2004

Ontology and taxonomy collaborated framework for meeting classification

Asaad Hakeem; Mubarak Shah

A framework for classification of meeting videos is proposed in this paper. The framework consists of a four-level concept hierarchy of movements, events, behavior, and genre, based on a meeting ontology and taxonomy. Ontology is the formal specification of domain concepts and their relationships; taxonomy is the general categorization based on class/subclass relationships. This concept hierarchy is mapped to an implementation of finite state machines (FSMs) and a rule-based system (RBS) to classify the meetings. Events are detected by the FSMs from the movements (head and hand tracks), and the meetings are classified by the RBS from the events and behaviors of the people present. The framework is scalable: new meeting types can be added without re-training. We conducted experiments on various meeting sequences and classified meetings into voting, argument, presentation, and object passing. The framework has applications in automated video surveillance, video segmentation and retrieval (multimedia), human-computer interaction, and augmented reality.
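As a toy illustration of the FSM/RBS pipeline, the sketch below detects a hypothetical "hand-raise" event from hand-track heights with a two-state finite state machine and applies a single example rule to label a meeting as voting. The state names, threshold, and rule are invented for illustration and are not the paper's actual movement features or rules.

```python
# Toy FSM that counts hand-raise events from a per-frame hand-height signal,
# plus one rule of the kind a rule-based system might use for classification.

def detect_hand_raises(hand_heights, rise_thresh=0.3):
    """hand_heights: per-frame hand height relative to the head (arbitrary units)."""
    state, events = "down", 0
    for h in hand_heights:
        if state == "down" and h > rise_thresh:
            state = "up"
            events += 1          # transition down -> up emits one hand-raise event
        elif state == "up" and h <= rise_thresh:
            state = "down"       # reset so the next raise can be detected
    return events

def classify_meeting(hand_raise_events, min_votes=3):
    # Example rule: several independent hand raises suggest a voting meeting.
    return "voting" if hand_raise_events >= min_votes else "unclassified"

heights = [0.0, 0.1, 0.5, 0.6, 0.1, 0.4, 0.0, 0.5, 0.05]
print(classify_meeting(detect_hand_raises(heights)))   # -> voting
```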


ACM Multimedia | 2005

An object-based video coding framework for video sequences obtained from static cameras

Asaad Hakeem; Khurram Shafique; Mubarak Shah

This paper presents a novel object-based video coding framework for videos obtained from a static camera. As opposed to most existing methods, the proposed method does not require explicit 2D or 3D models of objects and hence is general enough to cater for varying types of objects in the scene. The proposed system detects and tracks objects in the scene and learns the appearance model of each object online using incremental principal component analysis (IPCA). Each object is then coded using the coefficients of the most significant principal components of its learned appearance space. Due to smooth transitions between a limited number of poses of an object, a small number of significant principal components usually account for most of the variance in the object's appearance space, and therefore only a few coefficients are required to code the object. The rigid component of the object's motion is coded in terms of its affine parameters. The framework is applied to compressing videos in surveillance and video phone domains. The proposed method is evaluated on videos containing a variety of scenarios such as multiple objects undergoing occlusion, splitting, merging, entering and exiting, as well as a changing background. Results on standard MPEG-7 videos are also presented. For all the videos, the proposed method displays a higher peak signal-to-noise ratio (PSNR) than MPEG-2 and MPEG-4, and provides comparable or better compression.
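A minimal sketch of the coding idea, assuming scikit-learn's IncrementalPCA as a stand-in for the paper's IPCA and random arrays in place of real, normalized object chips; the patch size and number of retained components are illustrative choices.

```python
# Sketch: learn an object's appearance space online and code each observation
# by the coefficients of its leading principal components.
import numpy as np
from sklearn.decomposition import IncrementalPCA

patch_size = (32, 32)                    # assumed size of the normalized object chip
ipca = IncrementalPCA(n_components=8)    # keep the 8 most significant components

rng = np.random.default_rng(0)
chips = rng.random((50, patch_size[0] * patch_size[1]))   # stand-in object chips

ipca.partial_fit(chips)                  # online update of the appearance model

codes = ipca.transform(chips[:1])        # 8 coefficients to transmit per object
decoded = ipca.inverse_transform(codes)  # decoder-side reconstruction of the chip
print(codes.shape, decoded.shape)        # (1, 8) (1, 1024)
```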


Computer Vision and Pattern Recognition | 2008

SAVE: A framework for semantic annotation of visual events

Mun Wai Lee; Asaad Hakeem; Niels Haering; Song-Chun Zhu

In this paper we propose a framework that performs automatic semantic annotation of visual events (SAVE). This is an enabling technology for content-based video annotation, query, and retrieval, with applications in Internet video search and video data mining. The method involves identifying objects in the scene, describing their inter-relations, detecting events of interest, and representing them semantically in a human-readable and queryable format. The SAVE framework is composed of three main components. The first component is an image parsing engine that performs scene content extraction using bottom-up image analysis and a stochastic attribute image grammar: a visual vocabulary spanning pixels, primitives, parts, objects, and scenes is defined, their spatio-temporal and compositional relations are specified, and a combined bottom-up/top-down strategy is used for inference. The second component is an event inference engine, where the video event markup language (VEML) is adopted for semantic representation and a grammar-based approach is used for event analysis and detection. The third component is the text generation engine, which generates a text report using head-driven phrase structure grammar (HPSG). The main contribution of this paper is a framework for an end-to-end system that infers visual events and annotates a large collection of videos. Experiments with maritime and urban scenes indicate the feasibility of the proposed approach.


International Conference on Pattern Recognition | 2006

Estimating Geospatial Trajectory of a Moving Camera

Asaad Hakeem; Roberto Vezzani; Mubarak Shah; Rita Cucchiara

This paper proposes a novel method for estimating the geospatial trajectory of a moving camera. The proposed method uses a set of reference images with known GPS (global positioning system) locations to recover the trajectory of a moving camera using geometric constraints. The proposed method has three main steps. First, scale invariant feature transform (SIFT) features are detected and matched between the reference images and the video frames to calculate a weighted adjacency matrix (WAM) based on the number of SIFT matches. Second, using the estimated WAM, the reference image with the maximum number of matches is selected for the current video frame and used to estimate the relative position (rotation and translation) of the video frame via the fundamental matrix constraint. The relative position is recovered only up to a scale factor, so a triangulation among the video frame and two reference images is performed to resolve the scale ambiguity. Third, an outlier rejection and trajectory smoothing (using B-splines) post-processing step is employed, because the estimated camera locations may be noisy due to bad point correspondences or degenerate estimates of the fundamental matrices. Results of recovering camera trajectories are reported for real sequences.
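A rough OpenCV sketch of the first two steps, under stated assumptions (placeholder image files, Lowe's ratio test for match filtering): it shows how the SIFT match count could populate one entry of the weighted adjacency matrix and how the fundamental matrix could be estimated with RANSAC. It is not the authors' pipeline.

```python
# Sketch: SIFT matching between a GPS-tagged reference image and a video frame,
# followed by RANSAC estimation of the fundamental matrix.
import cv2
import numpy as np

ref = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
frm = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)
kp2, des2 = sift.detectAndCompute(frm, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = [m for m, n in matcher.knnMatch(des1, des2, k=2)
           if m.distance < 0.75 * n.distance]              # Lowe's ratio test
wam_weight = len(matches)                                  # one entry of the WAM

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
print(wam_weight, F)
```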


Workshop on Applications of Computer Vision | 2008

Self Calibrating Visual Sensor Networks

Khurram Shafique; Asaad Hakeem; Omar Javed; Niels Haering

This paper presents an unsupervised, data-driven scheme to automatically estimate the relative topology of overlapping cameras in a large visual sensor network. The proposed method learns the camera topology by employing the statistics of co-occurring observations (of moving targets) in each sensor. Since target observation data is typically very noisy in realistic scenarios, an efficient two-step method is used for robust estimation of the planar homography between camera views. In the first step, modes in the co-occurrence data are learned using mean-shift. In the second step, a RANSAC-based procedure is used to estimate the homography from the weighted co-occurrence modes. Note that the first step not only lessens the effects of noise but also reduces the search space for efficient calculation. Unlike most existing algorithms for overlapping camera calibration, the proposed method uses an update mechanism to adapt online to changes in the network topology. The method does not assume prior knowledge about the scene, target, or network properties. It is also robust to noise, traffic intensity, and the amount of overlap between the fields of view. Experiments and quantitative evaluation using both synthetic and real data are presented to support the above claims.
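A simplified sketch of the two-step estimation, assuming scikit-learn's MeanShift and OpenCV's RANSAC homography as stand-ins; the 4-D representation of co-occurrences, the bandwidth, and the reprojection threshold are assumptions, and at least four distinct modes must be found for the homography fit to succeed.

```python
# Sketch: mean-shift modes of noisy co-occurrence data feed a RANSAC homography fit.
import numpy as np
import cv2
from sklearn.cluster import MeanShift

def estimate_homography(cooccurrences, bandwidth=20.0):
    """cooccurrences: (N, 4) array of simultaneous detections (x1, y1, x2, y2)
    observed in camera 1 and camera 2."""
    modes = MeanShift(bandwidth=bandwidth).fit(cooccurrences).cluster_centers_
    src = modes[:, :2].astype(np.float32)    # camera-1 side of each mode
    dst = modes[:, 2:].astype(np.float32)    # camera-2 side of each mode
    # RANSAC fit over the (far fewer, far cleaner) modes; needs >= 4 of them.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```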


Computer Vision and Pattern Recognition | 2007

On the Direct Estimation of the Fundamental Matrix

Yaser Sheikh; Asaad Hakeem; Mubarak Shah

The fundamental matrix is a central construct in the analysis of images captured from a pair of cameras and many feature-based methods have been proposed for its computation. In this paper, we propose a direct method for estimating the fundamental matrix where the motion between the frames is small (e.g. between successive frames of a video). To achieve this, a warping function is presented for the fundamental matrix by using the brightness constancy constraint in conjunction with geometric constraints. Using this warping function, an iterative hierarchical algorithm is described to recover accurate estimates of the fundamental matrix. We present results of experimentation to evaluate the performance of the proposed approach and demonstrate improved accuracy in the computation of the fundamental matrix.
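For context, the two constraints that such a direct approach combines can be written as follows (the notation is mine, not the paper's): brightness constancy between consecutive frames, and the epipolar constraint that ties corresponding points together through the fundamental matrix F.

```latex
% Brightness constancy for a small displacement (u, v) between frames t and t+1,
% together with the epipolar constraint on the corresponding homogeneous points.
\[
I(x, y, t) = I(x + u,\; y + v,\; t + 1),
\qquad
\mathbf{x}'^{\top} F \,\mathbf{x} = 0,
\]
\[
\text{where } \mathbf{x} = (x, y, 1)^{\top}
\text{ and } \mathbf{x}' = (x + u,\; y + v,\; 1)^{\top}.
\]
```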


ACM Multimedia | 2009

Semantic video search using natural language queries

Asaad Hakeem; Mun Wai Lee; Omar Javed; Niels Haering

Recent advances in computer vision and artificial intelligence algorithms have allowed automatic extraction of metadata from video. This metadata can be represented using an RDF/OWL ontology, which can encode scene objects and their relationships in an unambiguous and well-formed manner, and the encoded data can be queried using SPARQL. However, SPARQL has a steep learning curve and cannot be directly used by a general user for video content search. In this paper, we propose a method to bridge this gap by automatically translating a user-provided natural language query into an ontology-based SPARQL query for semantic video search. The proposed method consists of three major steps. First, a semantically labeled training corpus of natural language query sentences is used to learn a Semantic Stochastic Context-Free Grammar (SSCFG). Second, given a user-provided natural language query sentence, the Earley-Stolcke parsing algorithm is used to determine the maximum likelihood semantic parse of the sentence; this parse infers the semantic meaning of each word in the query, from which the SPARQL query is constructed. Third, the SPARQL query is executed to retrieve relevant video segments from the RDF/OWL video content database. The method is evaluated by running natural language queries on surveillance videos from maritime and land-based domains, though the framework itself is general and extensible to videos from other domains.
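For illustration only, the block below shows the kind of SPARQL query such a translation might produce and how it could be executed over an RDF/OWL metadata store with rdflib; the namespace, the class and property names (ve:Approach, ve:Boat, ve:Pier), and the metadata file are hypothetical, not the paper's actual schema.

```python
# Sketch: run a SPARQL query (as might be produced from a natural language
# request such as "find boats approaching the pier") against RDF video metadata.
from rdflib import Graph

g = Graph()
g.parse("video_metadata.rdf")             # placeholder RDF/OWL annotation file

sparql = """
PREFIX ve: <http://example.org/video-events#>
SELECT ?segment WHERE {
    ?event a ve:Approach ;
           ve:agent   ?agent ;
           ve:patient ?pier ;
           ve:segment ?segment .
    ?agent a ve:Boat .
    ?pier  a ve:Pier .
}
"""
for row in g.query(sparql):
    print(row.segment)                    # identifiers of matching video segments
```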


Video Analytics for Business Intelligence | 2012

Video Analytics for Business Intelligence

Asaad Hakeem; Himaanshu Gupta; Atul Kanaujia; Tae Eun Choe; Kiran Gunda; Andrew W. Scanlon; Li Yu; Zhong Zhang; Peter L. Venetianer; Zeeshan Rasheed; Niels Haering

This chapter focuses on various algorithms and techniques in video analytics that can be applied to the business intelligence domain. The goal is to provide the reader with an overview of state-of-the-art approaches in the field of video analytics and to describe the various applications where these technologies can be applied. We describe existing algorithms for extraction and processing of target and scene information, multi-sensor cross-camera analysis, inference of simple, complex, and abnormal video events, data mining, image search and retrieval, intuitive UIs for an efficient customer experience, and text summarization of visual data. We also present evaluation results for each of these technology components using in-house and other publicly available datasets.


Computer Vision and Pattern Recognition | 2010

Rapidly Deployable Video Analysis Sensor units for wide area surveillance

Zeeshan Rasheed; Geoffrey Taylor; Li Yu; Mun Wai Lee; Tae Eun Choe; Feng Guo; Asaad Hakeem; Krishnan Ramnath; Michael R. Smith; Atul Kanaujia; Dana Eubanks; Niels Haering

This paper presents an overview of self-contained automated video analytics units that are man-portable and constitute nodes of a large-scale distributed sensor network. The paper highlights issues with traditional video surveillance systems in volatile environments such as a battlefield, and addresses them with Rapidly Deployable Video Analysis sensors. We discuss scientific and engineering aspects of the system and present the outcome of a field deployment in an exercise conducted by the Office of Naval Research.

Collaboration


An overview of Asaad Hakeem's collaborations and top co-authors.

Top Co-Authors

Mubarak Shah (University of Central Florida)

Mun Wai Lee (University of Southern California)

Zeeshan Rasheed (University of Central Florida)

Li Yu (University of Calgary)

Khurram Shafique (University of Central Florida)

Yaser Sheikh (Carnegie Mellon University)