Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Zeeshan Rasheed is active.

Publication


Featured research published by Zeeshan Rasheed.


Computer Vision and Image Understanding | 2008

Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views

Omar Javed; Khurram Shafique; Zeeshan Rasheed; Mubarak Shah

Tracking across cameras with non-overlapping views is a challenging problem. Firstly, the observations of an object are often widely separated in time and space when viewed from non-overlapping cameras. Secondly, the appearance of an object in one camera view might be very different from its appearance in another camera view due to the differences in illumination, pose and camera properties. To deal with the first problem, we observe that people or vehicles tend to follow the same paths in most cases, i.e., roads, walkways, corridors etc. The proposed algorithm uses this conformity in the traversed paths to establish correspondence. The algorithm learns this conformity and hence the inter-camera relationships in the form of multivariate probability density of space-time variables (entry and exit locations, velocities, and transition times) using kernel density estimation. To handle the appearance change of an object as it moves from one camera to another, we show that all brightness transfer functions from a given camera to another camera lie in a low dimensional subspace. This subspace is learned by using probabilistic principal component analysis and used for appearance matching. The proposed approach does not require explicit inter-camera calibration, rather the system learns the camera topology and subspace of inter-camera brightness transfer functions during a training phase. Once the training is complete, correspondences are assigned using the maximum likelihood (ML) estimation framework using both location and appearance cues. Experiments with real world videos are reported which validate the proposed approach.
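The space-time model described above can be sketched as a kernel density estimate over entry/exit variables. This is a minimal illustration, not the paper's exact formulation: the Gaussian product kernel, the fixed bandwidth, and the three-dimensional feature (exit x, exit y, transition time) are all simplifying assumptions.

```python
import numpy as np

def kde_likelihood(train, query, bandwidth=1.0):
    """Score a candidate correspondence under a Gaussian kernel density
    estimate of space-time variables (here: exit x, exit y, transition time)."""
    train = np.asarray(train, dtype=float)
    diffs = (train - np.asarray(query, dtype=float)) / bandwidth
    kernel = np.exp(-0.5 * np.sum(diffs ** 2, axis=1))
    d = train.shape[1]
    norm = (2 * np.pi) ** (d / 2) * bandwidth ** d * len(train)
    return float(np.sum(kernel) / norm)

# Toy training set: past transitions cluster near exit (10, 2) with a ~5 s delay
rng = np.random.default_rng(0)
train = rng.normal([10.0, 2.0, 5.0], 0.5, size=(200, 3))
plausible = kde_likelihood(train, [10.0, 2.0, 5.0])
implausible = kde_likelihood(train, [0.0, 0.0, 30.0])
```

A candidate that matches the learned transition pattern scores much higher than one that does not, which is what the maximum-likelihood correspondence step relies on.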


IEEE Transactions on Multimedia | 2005

Detection and representation of scenes in videos

Zeeshan Rasheed; Mubarak Shah

This paper presents a method to perform a high-level segmentation of videos into scenes. A scene can be defined as a subdivision of a play in which either the setting is fixed, or when it presents continuous action in one place. We exploit this fact and propose a novel approach for clustering shots into scenes by transforming this task into a graph partitioning problem. This is achieved by constructing a weighted undirected graph called a shot similarity graph (SSG), where each node represents a shot and the edges between the shots are weighted by their similarity based on color and motion information. The SSG is then split into subgraphs by applying the normalized cuts for graph partitioning. The partitions so obtained represent individual scenes in the video. When clustering the shots, we consider the global similarities of shots rather than the individual shot pairs. We also propose a method to describe the content of each scene by selecting one representative image from the video as a scene key-frame. Recently, DVDs have become available with a chapter selection option where each chapter is represented by one image. Our algorithm automates this objective which is useful for applications such as video-on-demand, digital libraries, and the Internet. Experiments are presented with promising results on several Hollywood movies and one sitcom.
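The graph-partitioning step can be illustrated with the standard spectral relaxation of the normalized cut: split the shot similarity graph by the sign of the Fiedler vector of the normalized Laplacian. The toy similarity matrix below is invented for demonstration; a real SSG would weight edges by color and motion similarity.

```python
import numpy as np

def bipartition_shots(similarity):
    """Split a shot similarity graph into two scenes via the spectral
    relaxation of the normalized cut (sign of the Fiedler vector)."""
    W = np.asarray(similarity, dtype=float)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # second-smallest eigenvector
    return fiedler >= 0                     # boolean scene labels

# Two groups of shots: strongly similar within a scene, weakly across scenes
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
labels = bipartition_shots(W)
```

Because the cut is computed over the whole graph, the split reflects global shot similarities rather than individual shot pairs, as the abstract emphasizes.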


Computer Vision and Pattern Recognition | 2003

Scene detection in Hollywood movies and TV shows

Zeeshan Rasheed; Mubarak Shah

A scene can be defined as one of the subdivisions of a play in which the setting is fixed, or when it presents continuous action in one place. We propose a novel two-pass algorithm for scene boundary detection, which utilizes the motion content, shot length, and color properties of shots as features. In our approach, shots are first clustered by computing Backward Shot Coherence (BSC), a shot color similarity measure that detects Potential Scene Boundaries (PSBs) in videos. In the second pass we compute Scene Dynamics (SD), a function of shot length and the motion content in the potential scenes. In this pass, a scene-merging criterion is developed to remove weak PSBs in order to reduce over-segmentation. We also propose a method to describe the content of each scene by selecting one representative image. The segmentation of video data into a number of scenes facilitates improved browsing of videos in electronic form, such as video-on-demand, digital libraries, and the Internet. The proposed algorithm has been tested on a variety of videos, including five Hollywood movies, one sitcom, and one interview program, and promising results have been obtained.
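The first pass (BSC dips marking PSBs) can be sketched as follows, using histogram intersection as the color similarity. The lookback window and threshold values are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def potential_scene_boundaries(histograms, lookback=4, threshold=0.5):
    """Backward Shot Coherence sketch: each shot's BSC is its best
    histogram-intersection match among the preceding `lookback` shots;
    a dip below `threshold` marks a Potential Scene Boundary (PSB)."""
    psbs = []
    for i in range(1, len(histograms)):
        prev = histograms[max(0, i - lookback):i]
        bsc = max(np.minimum(histograms[i], h).sum() for h in prev)
        if bsc < threshold:
            psbs.append(i)
    return psbs

# Two color-distinct shot groups (normalized 2-bin histograms): boundary at shot 2
histograms = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
psbs = potential_scene_boundaries(histograms)
```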


International Conference on Computer Vision | 2001

Human tracking in multiple cameras

Sohaib Khan; Omar Javed; Zeeshan Rasheed; Mubarak Shah

Multiple cameras are needed to cover large environments for monitoring activity. To track people successfully in multiple perspective imagery, one needs to establish correspondence between objects captured in multiple cameras. We present a system for tracking people in multiple uncalibrated cameras. The system is able to discover spatial relationships between the camera fields of view and use this information to correspond between different perspective views of the same person. We employ the novel approach of finding the limits of field of view (FOV) of a camera as visible in the other cameras. Using this information, when a person is seen in one camera, we are able to predict all the other cameras in which this person will be visible. Moreover, we apply the FOV constraint to disambiguate between possible candidates of correspondence. We present results on sequences of up to three cameras with multiple people. The proposed approach is very fast compared to camera calibration based approaches.
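The FOV-line constraint can be sketched as a simple point-against-line test. The line coefficients and camera names below are hypothetical; in the actual system, each line would be the automatically recovered projection of another camera's field-of-view boundary into the current view.

```python
def side_of_line(line, point):
    """Signed side of point (x, y) relative to the line ax + by + c = 0."""
    a, b, c = line
    x, y = point
    return a * x + b * y + c

def visible_cameras(fov_lines, point):
    """Predict which other cameras can see a person observed at `point`.
    fov_lines maps a camera id to that camera's FOV boundary line as seen
    in the current view; a person on the positive side of camera j's line
    is predicted visible in camera j."""
    return {cam for cam, line in fov_lines.items()
            if side_of_line(line, point) > 0}

# Invented example: camera "B" sees everything to the right of x = 5
lines = {"B": (1.0, 0.0, -5.0)}
seen = visible_cameras(lines, (8.0, 2.0))
```

The same test disambiguates correspondence candidates: a track that should be visible in camera B but has no match there can be ruled out.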


International Conference on Multimedia and Expo | 2003

KNIGHT™: a real-time surveillance system for multiple and non-overlapping cameras

Omar Javed; Zeeshan Rasheed; Orkun Alatas; Mubarak Shah

In this paper, we present a wide area surveillance system that detects, tracks and classifies moving objects across multiple cameras. At the single camera level, tracking is performed using a voting based approach that utilizes color and shape cues to establish correspondence. The system uses the single camera tracking results along with the relationship between camera field of view (FOV) boundaries to establish correspondence between views of the same object in multiple cameras. To this end, a novel approach is described to find the relationships between the FOV lines of cameras. The proposed approach combines tracking in cameras with overlapping and/or non-overlapping FOVs in a unified framework, without requiring explicit calibration. The proposed algorithm has been implemented in a real time system. The system uses a client-server architecture and runs at 10 Hz with three cameras.


International Conference on Pattern Recognition | 2002

Movie genre classification by exploiting audio-visual features of previews

Zeeshan Rasheed; Mubarak Shah

We present a method to classify movies on the basis of audio-visual cues present in previews. A preview summarizes the main idea of a movie, providing a suitable amount of information to perform genre classification. In our approach, movies are initially classified into action and non-action by computing the visual disturbance feature and average shot length of every movie. Visual disturbance is defined as a measure of motion content in a clip. Next we use color, audio, and cinematic principles for further classification into comedy, horror, drama/other, and movies containing explosions and gunfire. This work is a step towards automatically building and updating a video database, thus resulting in minimum human intervention. Other potential applications include browsing and retrieval of videos on the Internet (video-on-demand), video libraries, and rating of movies.
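The first classification stage (action vs. non-action from average shot length and visual disturbance) can be sketched as a threshold rule. The threshold values here are illustrative, not the paper's; the second stage using color and audio cues is omitted.

```python
def classify_genre(avg_shot_len_s, visual_disturbance):
    """Toy first-stage rule in the spirit of the paper: short shots combined
    with high motion content (visual disturbance in [0, 1]) suggest action.
    Thresholds are invented for illustration."""
    if avg_shot_len_s < 4.0 and visual_disturbance > 0.5:
        return "action"
    return "non-action"

fast_cut_trailer = classify_genre(2.5, 0.8)   # rapid cuts, high motion
slow_drama_trailer = classify_genre(9.0, 0.2)  # long takes, little motion
```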


International Conference on Computer Vision | 2001

A framework for segmentation of talk and game shows

Omar Javed; Zeeshan Rasheed; Mubarak Shah

In this paper, we present a method to remove commercials from talk and game show videos and to segment these videos into host and guest shots. In our approach, we mainly rely on information contained in shot transitions, rather than analyzing the scene content of individual frames. We utilize the inherent differences in scene structure of commercials and talk shows to differentiate between them. Similarly, we make use of the well-defined structure of talk shows, which can be exploited to classify shots as host or guest shots. The entire show is first segmented into camera shots based on color histogram. Then, we construct a data structure (shot connectivity graph) which links similar shots over time. Analysis of the shot connectivity graph helps us to automatically separate commercials from program segments. This is done by first detecting stories, and then assigning a weight to each story based on its likelihood of being a commercial. Further analysis on stories is done to distinguish shots of the hosts from shots of the guests. We have tested our approach on several full-length shows (including commercials) and have achieved video segmentation with high accuracy. The whole scheme is fast and works even on low quality video (160×120 pixel images at 5 Hz).
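The shot connectivity graph idea can be sketched with histogram intersection and union-find over a temporal window: link each shot to similar shots nearby in time, then take connected components as stories. The window size and similarity threshold are illustrative assumptions.

```python
import numpy as np

def shot_connectivity_stories(histograms, window=3, threshold=0.8):
    """Link shots whose (normalized) color histograms intersect strongly
    within a temporal window; connected components become 'stories'."""
    n = len(histograms)
    parent = list(range(n))

    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(n):
        for j in range(i + 1, min(i + 1 + window, n)):
            if np.minimum(histograms[i], histograms[j]).sum() >= threshold:
                union(i, j)
    return [find(i) for i in range(n)]

# Two visually distinct segments: shots 0-1 form one story, 2-3 another
histograms = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
stories = shot_connectivity_stories(histograms)
```

Commercials, which rarely link back to program shots, would fall into their own components and could then be weighted and filtered.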


Conference on Image and Video Retrieval | 2004

A framework for semantic classification of scenes using Finite State Machines

Yun Zhai; Zeeshan Rasheed; Mubarak Shah

We address the problem of classifying scenes from feature films into semantic categories and propose a robust framework for this problem. We propose that Finite State Machines (FSMs) are suitable for detecting and classifying scenes and demonstrate their usage for three types of movie scenes: conversation, suspense, and action. Our framework utilizes the structural information of the scenes together with low- and mid-level features. Low-level features of video, including motion and audio energy, and a mid-level feature, face detection, are used in our approach. The transitions of the FSMs are determined by the features of each shot in the scene. Our FSMs have been evaluated on over 60 clips, and convincing results have been achieved.
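A conversation-scene FSM of the kind described can be sketched as follows. The per-shot labels "A"/"B" stand in for features derived from face detection (two recurring face views alternating shot to shot), and the small state set is a simplification of the paper's machines.

```python
def is_conversation(shot_labels):
    """Accept a scene as 'conversation' when shots alternate between two
    recurring views: A, B, A, B, ...  Any other pattern is rejected."""
    state = "start"
    for label in shot_labels:
        if state == "start":
            state = "saw_A" if label == "A" else "reject"
        elif state == "saw_A":
            state = "saw_B" if label == "B" else "reject"
        elif state == "saw_B":
            state = "saw_A" if label == "A" else "reject"
        if state == "reject":
            return False
    return state in ("saw_A", "saw_B")

dialogue = is_conversation(["A", "B", "A", "B"])   # alternating two-shot pattern
not_dialogue = is_conversation(["A", "A", "B"])    # repeated view breaks the pattern
```

In the full framework the transitions would be driven by motion and audio energy as well, letting distinct FSMs compete to label a scene as conversation, suspense, or action.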


International Journal of Pattern Recognition and Artificial Intelligence | 2010

Automatic geo-registration for port surveillance

Xiaochun Cao; Lin Wu; Zeeshan Rasheed; Haiying Liu; Tae Eun Choe; Feng Guo; Niels Haering

This paper proposes a new solution to geo-register nearly featureless maritime video feeds. We detect the horizon using sizable or uniformly moving vessels, and estimate the vertical apex using water reflections of street lamps. The computed horizon and apex provide a metric rectification that removes the affine distortions and reduces the search space for geo-registration. Geo-registration is obtained by searching for the best orientation at which the estimated water masks in satellite images and camera views match. The proposed solution has the following contributions: first, water and coastlines are used as features for registration between horizontally looking maritime views and satellite images. Second, water reflections are proposed to estimate the vertical vanishing point. Third, we give algorithms for the detection of water areas in both satellite images and camera views. Experimental results and applications to cross-camera tracking are demonstrated. We also discuss several observations, as well as limitations of the proposed approach.


Archive | 2003

Video categorization using semantics and semiotics

Zeeshan Rasheed; Mubarak Shah

There is a great need to automatically segment, categorize, and annotate video data, and to develop efficient tools for browsing and searching. We believe that the categorization of videos can be achieved by exploring the concepts and meanings of the videos. This task requires bridging the gap between low-level content and high-level concepts (or semantics). Once a relationship is established between the low-level computable features of the video and its semantics, the user would be able to navigate videos through the use of concepts and ideas (for example, a user could extract only those scenes in an action film that actually contain fights) rather than sequentially browsing the whole video. However, this relationship must follow the norms of human perception and abide by the rules that are most often followed by the creators (directors) of these videos. These rules are called film grammar in video production literature. Like any natural language, this grammar has several dialects, but it has been acknowledged to be universal. Therefore, the knowledge of film grammar can be exploited effectively for the understanding of films. To interpret an idea using the grammar, we need first to understand the symbols, as in natural languages, and second, to understand the rules of combination of these symbols to represent concepts. In order to develop algorithms that exploit this film grammar, it is necessary to relate the symbols of the grammar to computable video features. In this dissertation, we have identified a set of computable features of videos and have developed methods to estimate them. (Abstract shortened by UMI.)

Collaboration


Dive into Zeeshan Rasheed's collaborations.

Top Co-Authors

Mubarak Shah
University of Central Florida

Li Yu
University of Calgary

Yun Zhai
University of Central Florida

Asaad Hakeem
University of Central Florida

Khurram Shafique
University of Central Florida