Ajay Divakaran
Mitsubishi
Publication
Featured research published by Ajay Divakaran.
international conference on multimedia and expo | 2003
Fatih Porikli; Ajay Divakaran
We present an automatic object tracking and video summarization method for multi-camera systems with a large number of cameras that have non-overlapping fields of view. In this framework, video sequences are stored for each object as opposed to storing a sequence for each camera. Object-based representation enables annotation of video segments and extraction of content semantics for further analysis. We also present a novel solution to the inter-camera color calibration problem. The transitive model function enables effective compensation for lighting changes and radiometric distortions in large-scale systems. After initial calibration, objects are tracked at each camera by background subtraction and mean-shift analysis. The correspondence of objects between different cameras is established by using a Bayesian belief network. This framework lets the user get a concise response to queries such as "Which locations did an object visit on Monday, and what did it do there?"
international conference on image processing | 2003
Zixiang Xiong; Regunathan Radhakrishnan; Ajay Divakaran
In our past work we have used temporal patterns of motion activity to extract sports highlights. We have also used audio classification based approaches to develop a common audio-based platform for feature extraction that works across three different sports. In this paper, we combine these two complementary approaches to achieve higher accuracy. We propose a framework for mining the semantic audio-visual labels in order to detect interesting events. Our results show that the proposed techniques work well across our three sports of interest: soccer, golf and baseball.
international conference on multimedia and expo | 2004
Kadir A. Peker; Ajay Divakaran
We present a novel compressed-domain measure of the spatio-temporal activity, or visual complexity, of a video segment. The visual complexity measure indicates how fast a video segment can be played within human perceptual limits. We present an adaptive smart fast-forward based video skimming method in which the playback speed is varied based on the visual complexity. Alternatively, spatio-temporal smoothing is used to reduce visual complexity for acceptable playback at a given playback speed. The complexity measure and the skimming method are based on early vision principles, so they are applicable across a wide range of content types and applications. The method is best suited for instant skims with low temporal compression. It preserves temporal continuity and eliminates the risk of missing an important event. It can be extended to include semantic inputs such as face or event detection, or can serve as a presentation end to semantic summarization.
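The speed adaptation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that playback speed is inversely proportional to the visual complexity of a segment, so that complexity times speed stays near a fixed perceptual budget. The function name `playback_speeds` and the parameters `complexity_budget`, `min_speed` and `max_speed` are hypothetical.

```python
def playback_speeds(complexities, complexity_budget, min_speed=1.0, max_speed=8.0):
    """Map per-segment visual complexity to a playback speed factor.

    Assumption (not from the abstract): speed = complexity_budget / complexity,
    clamped to [min_speed, max_speed]. Low-complexity segments play faster.
    """
    speeds = []
    for c in complexities:
        if c <= 0:
            # No measurable activity: play as fast as allowed.
            speed = max_speed
        else:
            speed = complexity_budget / c
        speeds.append(max(min_speed, min(max_speed, speed)))
    return speeds


if __name__ == "__main__":
    # Three segments of increasing complexity, then a static one.
    print(playback_speeds([1.0, 2.0, 4.0, 0.0], complexity_budget=4.0, max_speed=3.0))
```

Because every segment is still played, only faster or slower, a scheme of this kind keeps temporal continuity, which matches the property claimed in the abstract.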
international conference on image processing | 2002
Ajay Divakaran; Regunathan Radhakrishnan; Kadir A. Peker
We describe a key-frame extraction technique based on the intuition that the higher the motion, the more key-frames are required for summarization. We verify experimentally that the intensity of motion activity directly indicates the summarizability of the video segment, by using the MPEG-7 motion activity descriptor (see Jeannin, S. and Divakaran, A., IEEE Trans. Circuits and Systems for Video Tech., vol.11, no.6, p.720-4, 2001) and the fidelity measure described by H.S. Chang et al. (see IEEE Trans. Circuits and Systems for Video Tech., vol.9, no.8, p.1269-79, 1999). We obtain the key-frames by dividing the shot into parts of equal cumulative motion activity, and then selecting the frame located at the half-way point of each sub-segment. Furthermore, we establish an empirical relationship between the motion activity of a segment and the required number of key-frames. We thus provide a unique and rapid way to find the required number of key-frames and compute them. Our scheme is much faster than conventional color-based key-frame extraction schemes since it relies on simple computation and compressed-domain extraction. It is close to the theoretical optimum in accuracy.
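The key-frame selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the per-frame motion activity values are taken as given and non-negative, the "half-way point" of each sub-segment is interpreted as half-way in cumulative motion activity (the abstract does not say whether it is measured in activity or in time), and the name `select_key_frames` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def select_key_frames(motion_activity, num_key_frames):
    """Return frame indices, one per sub-segment of equal cumulative motion activity.

    motion_activity: non-negative per-frame activity values for one shot.
    num_key_frames: number of key-frames wanted (assumed to be known already).
    """
    n = len(motion_activity)
    if n == 0 or num_key_frames <= 0:
        return []
    cumulative = list(accumulate(float(a) for a in motion_activity))
    total = cumulative[-1]
    if total <= 0.0:
        # No motion at all: fall back to evenly spaced frames (an assumption).
        return [min(n - 1, int((i + 0.5) * n / num_key_frames))
                for i in range(num_key_frames)]
    key_frames = []
    for i in range(num_key_frames):
        # Half-way point of the i-th sub-segment, measured in cumulative activity.
        target = (i + 0.5) * total / num_key_frames
        # First frame whose cumulative activity reaches the target.
        key_frames.append(min(n - 1, bisect_left(cumulative, target)))
    return key_frames


if __name__ == "__main__":
    # A shot that is static for four frames and then active for four frames.
    print(select_key_frames([0, 0, 0, 0, 10, 10, 10, 10], 2))  # [4, 6]
```

In the example both key-frames fall in the active half of the shot, which reflects the intuition that higher motion calls for more key-frames.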
IEEE Transactions on Consumer Electronics | 2000
Ajay Divakaran; Anthony Vetro; Kohtaro Asai; Hirofumi Nishikawa
We present a video browsing system that dynamically extracts features and other content-description (meta-data) from compressed video. We devise a description scheme for a video browsing system that combines color and motion descriptors extracted in the compressed domain. The extraction is fast and enables our proposed dynamic feature extraction based video browsing system. We examine the tradeoff between fast feature extraction and accuracy of matching to arrive at a combination of color and motion descriptors that is easy to extract and provides accurate matching.
international conference on multimedia and expo | 2007
Naveen Goela; Kevin W. Wilson; Feng Niu; Ajay Divakaran; Isao Otsuka
We present a novel genre-independent SVM framework for detecting scene changes in broadcast video. Our framework works on content from a diverse range of genres by allowing sets of features, extracted from both audio and video streams, to be combined and compared automatically without the use of explicit thresholds. For ground truth, we use hand-labeled video scene boundaries from a wide variety of broadcast genres to generate positive and negative samples for the SVM. Our experiments include high- and low-level audio features such as semantic histograms and distances between Gaussian models, as well as video features such as shot cut positions. We evaluate the importance of these measures in a structured framework, with performance comparisons obtained via ROC curves. We achieve over 70% detection rate for 10% false positive rate on our corpus of over 7.5 hours of data collected from news, talk shows, sitcoms, dramas, music videos, and how-to shows.
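A figure such as "detection rate at 10% false positive rate" is one operating point read off a ROC curve. The sketch below shows one way to compute it from classifier scores and ground-truth labels. It is an illustration only: the function name `detection_rate_at_fpr` and the sample data are hypothetical, and the scores stand in for the output of any classifier, not the SVM of the paper.

```python
def detection_rate_at_fpr(scores, labels, max_fpr=0.10):
    """Best true positive rate achievable with false positive rate <= max_fpr.

    scores: classifier outputs, higher means "more likely a scene change".
    labels: 1 for a true scene boundary, 0 otherwise.
    A sample is predicted positive when its score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    best = 0.0
    # Lower the threshold step by step; FPR and TPR can only grow.
    for threshold in sorted(set(scores), reverse=True):
        fpr = sum(s >= threshold for s in negatives) / len(negatives)
        if fpr > max_fpr:
            break
        tpr = sum(s >= threshold for s in positives) / len(positives)
        best = max(best, tpr)
    return best


if __name__ == "__main__":
    # Made-up scores: 4 true boundaries and 10 non-boundaries.
    pos = [0.9, 0.8, 0.6, 0.3]
    neg = [0.7, 0.5, 0.4, 0.35, 0.2, 0.15, 0.1, 0.05, 0.02, 0.01]
    print(detection_rate_at_fpr(pos + neg, [1] * len(pos) + [0] * len(neg)))  # 0.75
```

With the made-up data, one false positive out of ten negatives is tolerated, and three of the four true boundaries are detected at that threshold.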
pacific rim conference on multimedia | 2003
Regunathan Radhakrishnan; Ziyou Xiong; Ajay Divakaran; Y. Ishikawa
In our past work we used supervised audio classification to develop a common audio-based platform for highlight extraction that works across three different sports. We then used a heuristic to post-process the classification results to identify interesting events and to adjust the summary length. In this paper, we propose a combination of unsupervised and supervised learning approaches to replace the heuristic. The proposed unsupervised framework mines the semantic audio-visual labels so as to detect interesting events. We then use a hidden Markov model based approach to control the length of the summary. Our experimental results show that the proposed techniques are promising.
international conference on consumer electronics | 2005
Ajay Divakaran; Clifton Forlines; Tom Lanning; Sam Shipman; Kent Wittenburg
This work describes a set of interfaces for augmenting fast-forward and rewind on consumer digital video recorders. Our method overlays a series of images sampled from the video on top of the traditional full-screen accelerated playback. This sequence creates a trail that provides contextual information and highlights upcoming scene changes in the video stream. With this augmentation, consumers are more accurate at traversing to a desired location in a recorded video. The method relies on compressed-domain processing and adds little computational and storage overhead.
international conference on consumer electronics | 2003
Kazuhiko Nakane; Isao Otsuka; K. Esumi; Ajay Divakaran; T. Murakami
Personal video recorders, such as recordable-DVD recorders and hard disk recorders, have become popular as large-volume storage devices for video and audio content. A browsing function that quickly takes the user to a desired scene is therefore an essential part of such a large-capacity system. We propose an intra-program content browsing system that uses not only a combination of motion-based video summarization and topic-related metadata in the incoming video stream, but also an audio-assisted video browsing feature that enables completely automatic topic-based browsing.
international conference on image analysis and processing | 2007
Ajay Divakaran; Isao Otsuka
We present the world's first highlights-playback-capable hard disk drive (HDD)-enhanced DVD recorder (personal video recorder, PVR). It automatically detects highlights in sports video by detecting portions with a mixture of the commentator's excited speech and cheering, using Gaussian mixture models (GMMs) trained using the MDL criterion. Our computation is carried out directly on the MDCT coefficients of the AC-3 audio, thus giving us a tremendous speed advantage. Our accuracy of detection of sports highlights is high across a variety of sports. Our user study shows that viewers like the new functionality even if it makes mistakes. Finally, we propose genre-independent temporal segmentation of non-sports content using computationally inexpensive audio-visual features. Such segmentation enables smart skipping from one semantic unit to another.
