Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Saad Ali is active.

Publication


Featured research published by Saad Ali.


ACM Multimedia | 2007

A 3-Dimensional SIFT Descriptor and its Application to Action Recognition

Paul Scovanner; Saad Ali; Mubarak Shah

In this paper we introduce a 3-dimensional (3D) SIFT descriptor for video or 3D imagery such as MRI data. We also show how this new descriptor is able to better represent the 3D nature of video data in the application of action recognition. This paper shows how 3D SIFT outperforms previously used description methods in an elegant and efficient manner. We use a bag-of-words approach to represent videos, and present a method to discover relationships between spatio-temporal words in order to better describe the video data.
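As a rough illustration of the descriptor's core idea, the sketch below (a simplified NumPy version, not the authors' implementation) histograms 3D gradient orientations over a small video cuboid. Keypoint detection, Gaussian weighting, and the sub-region histograms of the full 3D SIFT are omitted, and the two-angle binning is an assumption made for brevity.

```python
import numpy as np

def gradient_orientation_histogram_3d(cuboid, n_theta=8, n_phi=4):
    """Simplified 3D-SIFT-style descriptor: a weighted histogram of 3D
    gradient orientations over a (t, y, x) video cuboid. Omits keypoint
    detection, Gaussian weighting, and sub-region histograms."""
    gt, gy, gx = np.gradient(cuboid.astype(np.float64))
    mag = np.sqrt(gx**2 + gy**2 + gt**2)
    theta = np.arctan2(gy, gx)                    # in-plane angle, [-pi, pi]
    phi = np.arctan2(gt, np.hypot(gx, gy))        # temporal elevation
    hist, _, _ = np.histogram2d(
        theta.ravel(), phi.ravel(),
        bins=[n_theta, n_phi],
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]],
        weights=mag.ravel(),
    )
    desc = hist.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)  # L2-normalize

# Example: descriptor for a random 16x16x16 video patch.
patch = np.random.rand(16, 16, 16)
print(gradient_orientation_histogram_3d(patch).shape)  # (32,)
```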


Computer Vision and Pattern Recognition | 2007

A Lagrangian Particle Dynamics Approach for Crowd Flow Segmentation and Stability Analysis

Saad Ali; Mubarak Shah

This paper proposes a framework in which Lagrangian particle dynamics is used for the segmentation of high density crowd flows and detection of flow instabilities. For this purpose, a flow field generated by a moving crowd is treated as an aperiodic dynamical system. A grid of particles is overlaid on the flow field and advected using a numerical integration scheme. The evolution of particles through the flow is tracked using a flow map, whose spatial gradients are subsequently used to set up a Cauchy-Green deformation tensor that quantifies the amount by which neighboring particles have diverged over the length of the integration. The maximum eigenvalue of the tensor is used to construct a finite time Lyapunov exponent (FTLE) field, which reveals the Lagrangian coherent structures (LCS) present in the underlying flow. The LCS divide the flow into regions of qualitatively different dynamics and are used to locate boundaries of the flow segments in a normalized cuts framework. Any change in the number of flow segments over time is regarded as an instability, which is detected by establishing correspondences between flow segments over time. The experiments are conducted on a challenging set of videos taken from Google Video and a National Geographic documentary.
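The FTLE computation described here can be sketched in a few lines of NumPy. The following is a minimal illustration for a steady 2D flow; the forward-Euler advection, nearest-neighbor velocity lookup, and synthetic shear flow are simplifications for the sketch, not the paper's setup (which advects particles through time-varying optical flow).

```python
import numpy as np

def ftle_field(vx, vy, T=10.0, dt=0.5):
    """Minimal FTLE sketch for a steady 2D flow on a grid.
    vx, vy: velocity components, each of shape (H, W)."""
    H, W = vx.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    px, py = xs.copy(), ys.copy()
    for _ in range(int(T / dt)):          # advect the particle grid
        ix = np.clip(px, 0, W - 1).astype(int)
        iy = np.clip(py, 0, H - 1).astype(int)
        px += dt * vx[iy, ix]
        py += dt * vy[iy, ix]
    # Spatial gradients of the flow map (axis 0 is y, axis 1 is x).
    dpx_dy, dpx_dx = np.gradient(px)
    dpy_dy, dpy_dx = np.gradient(py)
    ftle = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            F = np.array([[dpx_dx[i, j], dpx_dy[i, j]],
                          [dpy_dx[i, j], dpy_dy[i, j]]])
            C = F.T @ F                       # Cauchy-Green deformation tensor
            lmax = np.linalg.eigvalsh(C)[-1]  # largest eigenvalue
            ftle[i, j] = np.log(max(lmax, 1e-12)) / (2 * abs(T))
    return ftle

# Example: a shear flow produces a ridge of high FTLE along the shear line.
H, W = 64, 64
vy = np.zeros((H, W))
vx = np.where(np.arange(H)[:, None] < H // 2, 1.0, -1.0) * np.ones((H, W))
print(ftle_field(vx, vy).max())
```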


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning

Saad Ali; Mubarak Shah

We propose a set of kinematic features that are derived from the optical flow for human action recognition in videos. The set of kinematic features includes divergence, vorticity, symmetric and antisymmetric flow fields, second and third principal invariants of the flow gradient and rate of strain tensor, and the third principal invariant of the rate of rotation tensor. Each kinematic feature, when computed from the optical flow of a sequence of images, gives rise to a spatiotemporal pattern. It is then assumed that the representative dynamics of the optical flow are captured by these spatiotemporal patterns in the form of dominant kinematic trends or kinematic modes. These kinematic modes are computed by performing principal component analysis (PCA) on the spatiotemporal volumes of the kinematic features. For classification, we propose the use of multiple instance learning (MIL), in which each action video is represented by a bag of kinematic modes. Each video is then embedded into a kinematic-mode-based feature space, and the coordinates of the video in that space are used for classification with the nearest neighbor algorithm. Qualitative and quantitative results are reported on benchmark data sets.
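To make two of these features concrete, the sketch below computes divergence and vorticity from a flow field with finite differences, then extracts dominant "kinematic modes" with a plain SVD-based PCA; the function names and synthetic data are illustrative only, not the paper's implementation.

```python
import numpy as np

def kinematic_features(u, v):
    """Two of the kinematic features from one optical-flow field
    (u, v), each of shape (H, W): divergence and vorticity."""
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy
    vorticity = dv_dx - du_dy
    return divergence, vorticity

def kinematic_modes(volumes, n_modes=5):
    """Dominant 'kinematic modes' via PCA: volumes is (N, T, H, W), one
    spatiotemporal feature volume per video; each volume is flattened
    and the top principal directions are returned."""
    X = volumes.reshape(len(volumes), -1)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_modes]                   # (n_modes, T*H*W)

# Example with synthetic flow fields.
rng = np.random.default_rng(0)
u, v = rng.standard_normal((2, 32, 32))
div, vort = kinematic_features(u, v)
vols = rng.standard_normal((8, 10, 32, 32))  # 8 videos, 10 frames each
print(kinematic_modes(vols).shape)           # (5, 10240)
```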


European Conference on Computer Vision | 2008

Floor Fields for Tracking in High Density Crowd Scenes

Saad Ali; Mubarak Shah

This paper presents an algorithm for tracking individual targets in high density crowd scenes containing hundreds of people. Tracking in such a scene is extremely challenging due to the small number of pixels on the target, appearance ambiguity resulting from the dense packing, and severe inter-object occlusions. The tracking algorithm outlined in this paper overcomes these challenges using a scene-structure-based force model. In this force model an individual, when moving in a particular scene, is subjected to global and local forces that are functions of the layout of that scene and the locomotive behavior of other individuals in the scene. The key ingredients of the force model are three floor fields, inspired by research in the field of evacuation dynamics: the Static Floor Field (SFF), the Dynamic Floor Field (DFF), and the Boundary Floor Field (BFF). These fields determine the probability of moving from one location to another by converting long-range forces into local ones. The SFF specifies regions of the scene that are attractive in nature (e.g. an exit location). The DFF specifies the immediate behavior of the crowd in the vicinity of the individual being tracked. The BFF specifies influences exerted by barriers in the scene (e.g. walls, no-go areas). By combining cues from all three fields with the available appearance information, we track individual targets in high density crowds.
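Below is a hypothetical sketch of how the three floor fields might combine into local move probabilities, following the exponential-weighting convention of cellular-automaton evacuation models rather than the paper's exact formulation; the coupling constants and synthetic fields are invented for illustration.

```python
import numpy as np

def move_probabilities(pos, sff, dff, bff, k_s=1.0, k_d=1.0, k_b=1.0):
    """Illustrative combination of the three floor fields into move
    probabilities over a 3x3 neighborhood, in the spirit of
    cellular-automaton evacuation models:
    p(cell) proportional to exp(k_S*SFF + k_D*DFF + k_B*BFF)."""
    y, x = pos
    H, W = sff.shape
    cells, scores = [], []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                cells.append((ny, nx))
                scores.append(k_s * sff[ny, nx] + k_d * dff[ny, nx]
                              + k_b * bff[ny, nx])
    p = np.exp(np.array(scores) - max(scores))  # stabilized softmax
    return cells, p / p.sum()

# Example: the SFF pulls the walker toward an exit on the right edge.
H, W = 20, 20
ys, xs = np.mgrid[0:H, 0:W]
sff = -np.hypot(ys - 10, xs - 19)          # higher near the exit at (10, 19)
dff = np.zeros((H, W))                      # no crowd history yet
bff = np.where(xs == 0, -5.0, 0.0)          # penalize the left wall
cells, probs = move_probabilities((10, 10), sff, dff, bff)
print(cells[int(np.argmax(probs))])         # the cell toward the exit
```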


Computer Vision and Pattern Recognition | 2008

Recognizing human actions using multiple features

Jingen Liu; Saad Ali; Mubarak Shah

In this paper, we propose a framework that fuses multiple features for improved action recognition in videos. The fusion of multiple features is important for recognizing actions, as often a single feature based representation is not enough to capture the imaging variations (view-point, illumination, etc.) and attributes of individuals (size, age, gender, etc.). Hence, we use two types of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (or cuboids), and ii) a quantized vocabulary of spin-images, which aims to capture the shape deformation of the actor by considering actions as 3D objects (x, y, t). To optimally combine these features, we treat different features as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities. The graph is then embedded into a k-dimensional space subject to the criterion that similar nodes have Euclidean coordinates which are closer to each other. This is achieved by converting the constraint into a minimization problem whose solution is the eigenvectors of the graph Laplacian matrix. This procedure is known as Fiedler embedding. The performance of the proposed framework is tested on publicly available data sets. The results demonstrate that fusion of multiple features helps in achieving improved performance, and allows retrieval of meaningful features and videos from the embedding space.
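Fiedler embedding itself is compact enough to sketch: given a symmetric affinity matrix over the graph's nodes, the embedding coordinates come from eigenvectors of the graph Laplacian. The version below is a generic sketch with an unnormalized Laplacian and a toy two-clique graph, not the paper's feature graph.

```python
import numpy as np

def fiedler_embedding(W, k=2):
    """Minimal Fiedler-embedding sketch: embed each node of a graph with
    symmetric affinity matrix W into k dimensions using the Laplacian
    eigenvectors with the smallest nonzero eigenvalues."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                    # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vecs[:, 1:k + 1]               # skip the constant eigenvector

# Example: two loosely connected cliques separate along the Fiedler vector.
A = np.ones((3, 3)) - np.eye(3)
W = np.block([[A, np.zeros((3, 3))], [np.zeros((3, 3)), A]])
W[2, 3] = W[3, 2] = 0.1                   # weak bridge between the cliques
coords = fiedler_embedding(W, k=1)
print(np.sign(coords.ravel()))            # opposite signs for the two groups
```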


International Conference on Computer Vision | 2007

Chaotic Invariants for Human Action Recognition

Saad Ali; Arslan Basharat; Mubarak Shah

The paper introduces an action recognition framework that uses concepts from the theory of chaotic systems to model and analyze the nonlinear dynamics of human actions. Trajectories of reference joints are used as the representation of the non-linear dynamical system that is generating the action. Each trajectory is then used to reconstruct a phase space of appropriate dimension by employing a delay-embedding scheme. The properties of the reconstructed phase space are captured in terms of dynamical and metric invariants that include the Lyapunov exponent, correlation integral, and correlation dimension. Finally, the action is represented by a feature vector which is a combination of these invariants over all the reference trajectories. Our contributions in this paper include: 1) investigation of the appropriateness of the theory of chaotic systems for human action modeling and recognition, 2) a new set of features to characterize the nonlinear dynamics of human actions, and 3) experimental validation of the feasibility and potential merits of carrying out action recognition using methods from the theory of chaotic systems.
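Two of the ingredients, delay embedding and the correlation integral, are easy to sketch. The version below is a generic NumPy illustration on a synthetic "joint trajectory"; the Lyapunov exponent estimation and the paper's parameter choices are omitted.

```python
import numpy as np

def delay_embed(x, dim=3, tau=2):
    """Reconstruct a phase space from a scalar trajectory x(t)
    using Takens-style delay embedding."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def correlation_integral(X, r):
    """Fraction of point pairs in the reconstructed phase space within
    radius r; its log-log slope in r estimates the correlation dimension."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return np.mean(d[iu] < r)

# Example: embed a noisy sine "joint trajectory" and probe two radii.
t = np.linspace(0, 20 * np.pi, 500)
x = np.sin(t) + 0.05 * np.random.default_rng(0).standard_normal(len(t))
X = delay_embed(x, dim=3, tau=5)
print(correlation_integral(X, 0.2), correlation_integral(X, 0.5))
```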


International Conference on Computer Vision | 2009

Tracking in unstructured crowded scenes

Mikel Rodriguez; Saad Ali; Takeo Kanade

This paper presents a target tracking framework for unstructured crowded scenes. Unstructured crowded scenes are defined as those scenes where the motion of a crowd appears to be random, with different participants moving in different directions over time. This means that each spatial location in such scenes supports more than one crowd behavior, i.e., multi-modal behavior. The case of tracking in structured crowded scenes, where the crowd moves coherently in a common direction and the direction of motion does not vary over time, was previously handled in [1]. In this work, we propose to model the various crowd behavior (or motion) modalities at different locations of the scene by employing the Correlated Topic Model (CTM) of [16]. In our construction, words correspond to low level quantized motion features and topics correspond to crowd behaviors. It is then assumed that motion at each location in an unstructured crowd scene is generated by a set of behavior proportions, where behaviors represent distributions over low-level motion features. This way any one location in the scene may support multiple crowd behavior modalities, which can be used as prior information for tracking. Our approach enables us to model a diverse set of unstructured crowd domains, ranging from cluttered time-lapse microscopy videos of cell populations in vitro to footage of crowded sporting events.
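The words-and-topics construction can be sketched as follows. Since the Correlated Topic Model is not available in common Python libraries, scikit-learn's LDA stands in for CTM here (losing the topic correlations that motivate CTM); the direction quantization and the synthetic flow are illustrative assumptions, not the paper's features.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def flow_to_words(u, v, n_dirs=8):
    """Quantize per-pixel flow direction into n_dirs discrete motion words."""
    angles = np.arctan2(v, u)                          # [-pi, pi]
    return ((angles + np.pi) / (2 * np.pi) * n_dirs).astype(int) % n_dirs

rng = np.random.default_rng(0)
u, v = rng.standard_normal((2, 100, 16, 16))  # 100 frames of 16x16 flow
words = flow_to_words(u, v)                    # (100, 16, 16)

# Per-location word counts over time: one "document" per spatial cell.
counts = np.zeros((16 * 16, 8), dtype=int)
for loc in range(16 * 16):
    y, x = divmod(loc, 16)
    counts[loc] = np.bincount(words[:, y, x], minlength=8)

# Topics ~ crowd behaviors; per-location topic mixtures ~ behavior proportions.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
behavior_proportions = lda.fit_transform(counts)
print(behavior_proportions.shape)              # (256, 3)
```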


Computer Vision and Pattern Recognition | 2005

Online detection and classification of moving objects using progressively improving detectors

Omar Javed; Saad Ali; Mubarak Shah

Boosting based detection methods have successfully been used for robust detection of faces and pedestrians. However, a very large number of labeled examples is required for training such a classifier. Moreover, once trained, the boosted classifier cannot adjust to the particular scenario in which it is employed. In this paper, we propose a co-training based approach to continuously label incoming data and use it for online updates of the boosted classifier that was initially trained from a small labeled example set. The main contribution of our approach is that it is an online procedure in which separate views (features) of the data are used for co-training, while the combined view (all features) is used to make classification decisions in a single boosted framework. The features used for classification are derived from principal component analysis of the appearance templates of the training examples. In order to speed up the classification, background modeling is used to prune away stationary regions in an image. Our experiments indicate that, starting from a classifier trained on a small training set, significant performance gains can be made through online updates from the unlabeled data.
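Here is a minimal co-training loop in the spirit of this approach, sketched with two linear classifiers updated via partial_fit rather than the paper's online-boosted detector; the confidence threshold, the two synthetic feature views, and the final evaluation are all illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_data(n):
    """Two synthetic feature views of the same binary labels."""
    y = rng.integers(0, 2, n)
    view1 = y[:, None] + 0.8 * rng.standard_normal((n, 5))
    view2 = y[:, None] + 0.8 * rng.standard_normal((n, 5))
    return view1, view2, y

X1_l, X2_l, y_l = make_data(20)        # small labeled seed set
X1_u, X2_u, y_true = make_data(500)    # "unlabeled" stream (y_true for eval only)

clf1 = SGDClassifier(loss="log_loss", random_state=0)
clf2 = SGDClassifier(loss="log_loss", random_state=0)
clf1.partial_fit(X1_l, y_l, classes=[0, 1])
clf2.partial_fit(X2_l, y_l, classes=[0, 1])

for i in range(len(X1_u)):             # process the unlabeled stream online
    x1, x2 = X1_u[i:i + 1], X2_u[i:i + 1]
    p1 = clf1.predict_proba(x1)[0]
    p2 = clf2.predict_proba(x2)[0]
    if p1.max() > 0.95:                 # view 1 is confident: teach view 2
        clf2.partial_fit(x2, [int(p1.argmax())])
    if p2.max() > 0.95:                 # view 2 is confident: teach view 1
        clf1.partial_fit(x1, [int(p2.argmax())])

print(clf1.score(X1_u, y_true), clf2.score(X2_u, y_true))
```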


Computer Vision and Pattern Recognition | 2012

Evaluation of low-level features and their combinations for complex event detection in open source videos

Amir Tamrakar; Saad Ali; Qian Yu; Jingen Liu; Omar Javed; Ajay Divakaran; Hui Cheng; Harpreet S. Sawhney

Low-level appearance as well as spatio-temporal features, appropriately quantized and aggregated into Bag-of-Words (BoW) descriptors, have been shown to be effective in many detection and recognition tasks. However, their efficacy for complex event recognition in unconstrained videos has not been systematically evaluated. In this paper, we use the NIST TRECVID Multimedia Event Detection (MED11 [1]) open source dataset, containing annotated data for 15 high-level events, as the standardized test bed for evaluating the low-level features. This dataset contains a large number of user-generated video clips. We consider 7 different low-level features, both static and dynamic, using BoW descriptors within an SVM approach for event detection. We present performance results on the 15 MED11 events for each of the features as well as their combinations using a number of early and late fusion strategies, and discuss their strengths and limitations.
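The early vs. late fusion contrast can be sketched directly. In the toy example below, two synthetic "BoW channels" stand in for real feature histograms; early fusion concatenates histograms before a single SVM, while late fusion averages per-channel SVM scores.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
bow_appearance = y[:, None] + rng.standard_normal((n, 50))  # e.g., SIFT BoW
bow_motion = y[:, None] + rng.standard_normal((n, 50))      # e.g., STIP BoW
tr, te = slice(0, 150), slice(150, None)

# Early fusion: concatenate histograms, train one SVM.
X_early = np.hstack([bow_appearance, bow_motion])
early = SVC(probability=True).fit(X_early[tr], y[tr])
early_scores = early.predict_proba(X_early[te])[:, 1]

# Late fusion: one SVM per channel, average the detection scores.
svm_a = SVC(probability=True).fit(bow_appearance[tr], y[tr])
svm_m = SVC(probability=True).fit(bow_motion[tr], y[tr])
late_scores = 0.5 * (svm_a.predict_proba(bow_appearance[te])[:, 1]
                     + svm_m.predict_proba(bow_motion[te])[:, 1])

print(((early_scores > 0.5) == y[te]).mean(),
      ((late_scores > 0.5) == y[te]).mean())
```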


Airborne Intelligence, Surveillance, Reconnaissance (ISR) Systems and Applications | 2006

COCOA: tracking in aerial imagery

Saad Ali; Mubarak Shah

Unmanned Aerial Vehicles (UAVs) are becoming a core intelligence asset for reconnaissance, surveillance and target tracking in urban and battlefield settings. In order to achieve the goal of automated tracking of objects in UAV videos we have developed a system called COCOA. It processes the video stream through a number of stages. In the first stage, platform motion compensation is performed. Next, moving object detection locates regions of interest, from which object contours are extracted by performing a level set based segmentation. Finally, blob based tracking is performed for each detected object, and global tracks are generated for higher level processing. COCOA is customizable to different sensor resolutions and is capable of tracking targets as small as 100 pixels. It works seamlessly for both visible and thermal imaging modes. The system is implemented in Matlab and works in batch mode.
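The first two COCOA stages, platform motion compensation followed by moving object detection, can be approximated with standard OpenCV calls. The original system was implemented in Matlab, so the ORB-plus-homography registration and the fixed difference threshold below are stand-ins for illustration, not the paper's pipeline.

```python
import cv2
import numpy as np

def stabilize_and_detect(prev, curr):
    """Sketch of motion compensation plus moving object detection:
    register consecutive grayscale uint8 frames with a homography,
    then difference the aligned frames to find moving regions."""
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(prev, None)
    k2, d2 = orb.detectAndCompute(curr, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:100]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = curr.shape
    warped = cv2.warpPerspective(prev, H, (w, h))   # align prev to curr
    diff = cv2.absdiff(curr, warped)                # residual (object) motion
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    return mask  # moving-object candidates for contour/blob tracking
```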

Collaboration


Dive into Saad Ali's collaborations.

Top Co-Authors


Mubarak Shah

University of Central Florida
