Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Mubarak Shah is active.

Publication


Featured research published by Mubarak Shah.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1999

Shape-from-shading: a survey

Ruo Zhang; Ping-Sing Tsai; James Edwin Cryer; Mubarak Shah

Since the first shape-from-shading (SFS) technique was developed by Horn in the early 1970s, many different approaches have emerged. In this paper, six well-known SFS algorithms are implemented and compared. The performance of the algorithms was analyzed on synthetic images using mean and standard deviation of depth (Z) error, mean of surface gradient (p, q) error, and CPU timing. Each algorithm works well for certain images, but performs poorly for others. In general, minimization approaches are more robust, while the other approaches are faster.
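
As a concrete companion to the comparison, here is a minimal sketch of the survey's evaluation metrics: the mean and standard deviation of the depth (Z) error and the mean surface-gradient (p, q) error. Function and argument names are illustrative, not from the paper.

    import numpy as np

    def sfs_error_metrics(z_est, z_true, p_est, q_est, p_true, q_true):
        # Depth error statistics: mean absolute error and standard deviation.
        z_err = z_est - z_true
        mean_z_err = np.mean(np.abs(z_err))
        std_z_err = np.std(z_err)
        # Mean per-pixel distance between estimated and true gradients (p, q).
        grad_err = np.hypot(p_est - p_true, q_est - q_true)
        return mean_z_err, std_z_err, np.mean(grad_err)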


ACM Multimedia | 2007

A 3-dimensional sift descriptor and its application to action recognition

Paul Scovanner; Saad Ali; Mubarak Shah

In this paper we introduce a 3-dimensional (3D) SIFT descriptor for video or 3D imagery such as MRI data. We also show how this new descriptor is able to better represent the 3D nature of video data in the application of action recognition. This paper will show how 3D SIFT is able to outperform previously used description methods in an elegant and efficient manner. We use a bag of words approach to represent videos, and present a method to discover relationships between spatio-temporal words in order to better describe the video data.
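
A minimal sketch of the central step, assuming a small spatio-temporal cuboid (y, x, t) around an interest point: 3-D gradient orientations are histogrammed with magnitude weighting. The paper's sub-volume tiling, Gaussian weighting, and orientation normalization are omitted; the function name and bin counts are illustrative.

    import numpy as np

    def sift3d_histogram(cuboid, bins_theta=8, bins_phi=4):
        # Gradients along the y, x and t axes of the cuboid.
        gy, gx, gt = np.gradient(cuboid.astype(float))
        mag = np.sqrt(gx**2 + gy**2 + gt**2)            # 3-D gradient magnitude
        theta = np.arctan2(gy, gx)                      # in-plane orientation
        phi = np.arctan2(gt, np.sqrt(gx**2 + gy**2))    # temporal elevation
        # Magnitude-weighted 2-D histogram over the two orientation angles.
        hist, _ = np.histogramdd(
            np.stack([theta.ravel(), phi.ravel()], axis=1),
            bins=(bins_theta, bins_phi),
            range=((-np.pi, np.pi), (-np.pi / 2, np.pi / 2)),
            weights=mag.ravel(),
        )
        return hist.ravel() / (np.linalg.norm(hist) + 1e-9)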


Computer Vision and Pattern Recognition | 2009

Abnormal crowd behavior detection using social force model

Ramin Mehran; Alexis Oyama; Mubarak Shah

In this paper we introduce a novel method to detect and localize abnormal behaviors in crowd videos using the social force model. For this purpose, a grid of particles is placed over the image and advected with the space-time average of the optical flow. By treating the moving particles as individuals, their interaction forces are estimated using the social force model. The interaction force is then mapped into the image plane to obtain a Force Flow for every pixel in every frame. Randomly selected spatio-temporal volumes of Force Flow are used to model the normal behavior of the crowd. We classify frames as normal or abnormal using a bag-of-words approach. The regions of anomalies in abnormal frames are localized using the interaction forces. The experiments are conducted on a publicly available dataset from the University of Minnesota for escape panic scenarios and on a challenging dataset of crowd videos taken from the web. The experiments show that the proposed method successfully captures the dynamics of crowd behavior. In addition, we show that the social force approach outperforms similar approaches based on pure optical flow.
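
The force estimate at the heart of the method can be sketched as follows, under the simplifying assumptions that particle mass is 1 and that the effective particle velocity is the space-time averaged optical flow; the relaxation parameter tau and the panic weight follow the paper's formulation, but the names and default values here are illustrative.

    import numpy as np

    def interaction_force(flow, flow_avg, prev_flow_avg, tau=0.5, panic=0.0):
        # Desired velocity mixes the averaged flow with the instantaneous flow.
        v_desired = (1 - panic) * flow_avg + panic * flow
        # Acceleration of the advected particles between frames (mass m = 1).
        dv_dt = flow_avg - prev_flow_avg
        # Personal (goal-directed) force with relaxation time tau.
        personal = (v_desired - flow_avg) / tau
        # Interaction force: total force minus the personal force.
        return dv_dt - personal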


Computer Vision and Pattern Recognition | 2008

Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition

Mikel Rodriguez; Javed Ahmed; Mubarak Shah

In this paper we introduce a template-based method for recognizing human actions, called Action MACH. Our approach is based on a maximum average correlation height (MACH) filter. A common limitation of template-based methods is their inability to generate a single template from a collection of examples. MACH captures intra-class variability by synthesizing a single Action MACH filter for a given action class. We generalize the traditional MACH filter to video (a 3D spatio-temporal volume) and to vector-valued data. By analyzing the response of the filter in the frequency domain, we avoid the high computational cost commonly incurred in template-based approaches. Vector-valued data is analyzed using the Clifford Fourier transform, a generalization of the Fourier transform intended for both scalar and vector-valued data. Finally, we perform an extensive set of experiments and compare our method with some of the most recent approaches in the field using publicly available datasets, as well as two new annotated human action datasets that include actions performed in classic feature films and sports broadcast television.
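
A minimal frequency-domain sketch of a scalar-valued MACH filter, ignoring the paper's noise and similarity trade-off terms and the Clifford Fourier transform for vector data: the filter is the class-mean spectrum whitened by the average power spectrum, and correlation with a test volume is evaluated via FFTs. All names are illustrative.

    import numpy as np

    def mach_filter(examples, eps=1e-6):
        # `examples` is a list of equally sized (y, x, t) training volumes.
        specs = [np.fft.fftn(v.astype(float)) for v in examples]
        mean_spec = np.mean(specs, axis=0)                            # class-mean spectrum
        avg_power = np.mean([np.abs(s) ** 2 for s in specs], axis=0)  # average power spectrum
        return mean_spec / (avg_power + eps)                          # whitened filter

    def correlation_response(volume, h):
        # Circular correlation of a test volume with the filter, via the FFT.
        return np.real(np.fft.ifftn(np.fft.fftn(volume.astype(float)) * np.conj(h)))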


Computer Vision and Pattern Recognition | 2009

Recognizing realistic actions from videos “in the wild”

Jingen Liu; Jiebo Luo; Mubarak Shah

In this paper, we present a systematic framework for recognizing realistic actions from videos “in the wild”. Such unconstrained videos are abundant in personal collections as well as on the Web. Recognizing actions from such videos has not been addressed extensively, primarily due to the tremendous variations that result from camera motion, background clutter, and changes in object appearance and scale. The main challenge is how to extract reliable and informative features from unconstrained videos. We extract both motion and static features from the videos. Since the raw features of both types are dense yet noisy, we propose strategies to prune them. We use motion statistics to acquire stable motion features and clean static features. Furthermore, PageRank is used to mine the most informative static features. To further construct compact yet discriminative visual vocabularies, a divisive information-theoretic algorithm is employed to group semantically related features. Finally, AdaBoost is chosen to integrate all the heterogeneous yet complementary features for recognition. We have tested the framework on the KTH dataset and on our own dataset consisting of 11 categories of actions collected from YouTube and personal videos, and have obtained impressive results for action recognition and action localization.
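
The PageRank-based mining step can be sketched as follows: build a similarity graph over static feature descriptors, run power iteration with damping, and keep the top-ranked features. The graph construction here (cosine similarity over all pairs) is an illustrative stand-in for the paper's, and all names are hypothetical.

    import numpy as np

    def pagerank_prune(features, keep=500, d=0.85, iters=50):
        n = len(features)
        # Cosine-similarity affinities between descriptors, self-loops removed.
        f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
        w = np.clip(f @ f.T, 0, None)
        np.fill_diagonal(w, 0.0)
        p = w / (w.sum(axis=1, keepdims=True) + 1e-9)   # row-stochastic transitions
        r = np.full(n, 1.0 / n)
        for _ in range(iters):                          # power iteration with damping
            r = (1 - d) / n + d * (p.T @ r)
        return np.argsort(r)[::-1][:keep]               # indices of top-ranked features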


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Visual Tracking: An Experimental Survey

Arnold W. M. Smeulders; Dung Manh Chu; Rita Cucchiara; Simone Calderara; Afshin Dehghan; Mubarak Shah

A large variety of trackers has been proposed in the literature over the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem, and it therefore remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that in evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.
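
For flavor, a sketch of a per-video F-score computed from per-frame overlaps, in the spirit of (but not reproducing) the survey's protocol: frames where the tracker's box overlaps the ground truth above a threshold count as true positives. The threshold and argument names are assumptions.

    import numpy as np

    def f_score(ious, n_true, thresh=0.5):
        # `ious` holds overlap values for frames where the tracker reported
        # the target; `n_true` counts frames where the target is present.
        ious = np.asarray(ious)
        tp = int((ious >= thresh).sum())     # correct detections
        fp = len(ious) - tp                  # reported, but wrong
        fn = n_true - tp                     # missed target frames
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        return 2 * precision * recall / max(precision + recall, 1e-9)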


ACM Multimedia | 2006

Visual attention detection in video sequences using spatiotemporal cues

Yun Zhai; Mubarak Shah

The human visual system actively seeks interesting regions in images to reduce the search effort in tasks such as object detection and recognition. Similarly, prominent actions in video sequences are more likely to attract our attention than their surrounding neighbors. In this paper, we propose a spatiotemporal video attention detection technique for detecting the attended regions that correspond to both interesting objects and actions in video sequences. Both spatial and temporal saliency maps are constructed and then fused in a dynamic fashion to produce the overall spatiotemporal attention model. In the temporal attention model, motion contrast is computed based on the planar motions (homographies) between images, which are estimated by applying RANSAC to point correspondences in the scene. To compensate for the non-uniform spatial distribution of interest points, the spanning areas of motion segments are incorporated in the motion contrast computation. In the spatial attention model, a fast method for computing pixel-level saliency maps is developed using the color histograms of images. A hierarchical spatial attention representation is established to reveal the interesting points in images as well as the interesting regions. Finally, a dynamic fusion technique is applied to combine the temporal and spatial saliency maps, where temporal attention is dominant over the spatial model when large motion contrast exists, and vice versa. The proposed spatiotemporal attention framework has been applied to over 20 testing video sequences, and attended regions are detected that highlight the interesting objects and motions present in the sequences with a very high user satisfaction rate.
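
The histogram-based spatial saliency can be sketched for a grayscale image: a pixel's saliency is its summed intensity distance to all other pixels, computable in O(levels^2) from the histogram alone. The paper operates on color images; this single-channel version, with illustrative names, keeps the same idea.

    import numpy as np

    def histogram_saliency(gray):
        # Intensity histogram of a uint8 image.
        hist = np.bincount(gray.ravel(), minlength=256).astype(float)
        levels = np.arange(256, dtype=float)
        # table[i] = sum_j hist[j] * |i - j|: saliency per intensity level.
        table = np.abs(levels[:, None] - levels[None, :]) @ hist
        sal = table[gray]                    # look up every pixel's saliency
        return sal / (sal.max() + 1e-9)      # normalize to [0, 1]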


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Bayesian modeling of dynamic scenes for object detection

Yaser Sheikh; Mubarak Shah

Accurate detection of moving objects is an important precursor to stable tracking or recognition. In this paper, we present an object detection scheme that has three innovations over existing approaches. First, the model of the intensities of image pixels as independent random variables is challenged, and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of dynamic backgrounds. By using a nonparametric density estimation method over a joint domain-range representation of image pixels, multimodal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled, and the background is represented as a single probability density. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches, which detect objects by building adaptive models of the background only, the foreground is also modeled to augment the detection of objects (without explicit tracking), since objects detected in the preceding frame contain substantial evidence for detection in the current frame. Finally, the background and foreground models are used competitively in a MAP-MRF decision framework that stresses spatial context as a condition for detecting interesting objects; the posterior function is maximized efficiently by finding the minimum cut of a capacitated graph. Experimental validation of the proposed method is performed and presented on a diverse set of dynamic scenes.
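
The joint domain-range density can be sketched with a Gaussian kernel: every background sample and the query pixel are 5-vectors (x, y, r, g, b). A faithful implementation would pool samples from many frames and choose bandwidths as in the paper; this single-query version with hypothetical names only shows the shape of the computation.

    import numpy as np

    def background_likelihood(pixel, samples, bandwidth):
        # `pixel` is a 5-vector (x, y, r, g, b); `samples` is (N, 5);
        # `bandwidth` is a length-5 array of per-dimension kernel widths.
        d = (samples - pixel) / bandwidth
        k = np.exp(-0.5 * np.sum(d * d, axis=1))        # Gaussian kernel values
        norm = (2 * np.pi) ** 2.5 * np.prod(bandwidth)  # 5-D Gaussian normalizer
        return k.mean() / norm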


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004

Contour-based object tracking with occlusion handling in video acquired using mobile cameras

Alper Yilmaz; Xin Li; Mubarak Shah

We propose a tracking method which tracks complete object regions, adapts to changing visual features, and handles occlusions. Tracking is achieved by evolving the contour from frame to frame by minimizing an energy functional evaluated in the contour's vicinity, defined by a band. Our approach has two major components, related to the visual features and the object shape. Visual features (color, texture) are modeled by semiparametric models and are fused using independent opinion polling. Shape priors consist of shape level sets and are used to recover missing object regions during occlusion. We demonstrate the performance of our method on real sequences with and without object occlusions.
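
The feature-fusion step can be sketched directly: under independent opinion polling, the fused object posterior is the normalized product of the per-feature likelihoods. The tuple layout below is an assumption about how the color and texture opinions would be passed in.

    import numpy as np

    def fuse_opinions(likelihoods):
        # `likelihoods` is a list of (p_object, p_background) array pairs,
        # one pair per feature channel (e.g. color, texture).
        p_obj = np.prod([p for p, _ in likelihoods], axis=0)
        p_bg = np.prod([q for _, q in likelihoods], axis=0)
        return p_obj / (p_obj + p_bg + 1e-12)   # posterior of "object"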


International Journal of Computer Vision | 2002

View-Invariant Representation and Recognition of Actions

Cen Rao; Alper Yilmaz; Mubarak Shah

Analysis of human perception of motion shows that information for representing the motion is obtained from dramatic changes in the speed and direction of the trajectory. In this paper, we present a computational representation of human action that captures these dramatic changes using the spatio-temporal curvature of a 2-D trajectory. This representation is compact, view-invariant, and capable of explaining an action in terms of meaningful action units called dynamic instants and intervals. A dynamic instant is an instantaneous entity that occurs for only one frame and represents an important change in the motion characteristics. An interval represents the time period between two dynamic instants during which the motion characteristics do not change. Starting without a model, we use this representation for recognition and incremental learning of human actions. The proposed method can discover instances of the same action performed by different people from different viewpoints. Experiments on 47 actions performed by 7 individuals in an unconstrained environment show the robustness of the proposed method.
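
The representation can be sketched by treating the trajectory as the space curve r(t) = (x(t), y(t), t), sampled once per frame, and taking its curvature k = ||r' x r''|| / ||r'||^3; candidate dynamic instants are local curvature maxima. The peak test below is deliberately simpler than the paper's.

    import numpy as np

    def dynamic_instants(x, y):
        # First and second derivatives of the image trajectory over time.
        dx, dy = np.gradient(x), np.gradient(y)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        # Cross product of r' = (dx, dy, 1) and r'' = (ddx, ddy, 0).
        num = np.sqrt(ddx**2 + ddy**2 + (dx * ddy - dy * ddx) ** 2)
        k = num / (dx**2 + dy**2 + 1) ** 1.5            # spatio-temporal curvature
        # Simple local-maximum test for candidate dynamic instants.
        peaks = np.where((k[1:-1] > k[:-2]) & (k[1:-1] > k[2:]))[0] + 1
        return k, peaks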

Collaboration


Dive into Mubarak Shah's collaborations.

Top Co-Authors

Yaser Sheikh

Carnegie Mellon University


Yun Zhai

University of Central Florida


Haroon Idrees

University of Central Florida


Zeeshan Rasheed

University of Central Florida


Niels da Vitoria Lobo

University of Central Florida


Afshin Dehghan

University of Central Florida
