Partha Pratim Mohanta
Indian Statistical Institute
Publications
Featured research published by Partha Pratim Mohanta.
IEEE Transactions on Multimedia | 2012
Partha Pratim Mohanta; Sanjoy Kumar Saha; Bhabatosh Chanda
We present a unified model for detecting different types of video shot transitions. Based on the proposed model, we formulate a frame estimation scheme using the previous and the next frames. Unlike other shot boundary detection algorithms, which rely on properties of frames, the method uses frame transition parameters and frame estimation errors based on global and local features for boundary detection and classification. Local features include the scatter matrix of edge strength and the motion matrix. Finally, the frames are classified as no change (within-shot frame), abrupt change, or gradual change frames using a multilayer perceptron network. The proposed method depends less on user-defined thresholds and does not require the sliding window size used by various schemes in the literature. Moreover, handling both abrupt and gradual transitions along with non-transition frames under a single framework using model-guided visual features is another unique aspect of the work.
Pattern Recognition and Machine Intelligence | 2007
Partha Pratim Mohanta; Sanjoy Kumar Saha; Bhabatosh Chanda
We present a unified model for various types of video shot transitions. Based on that model, we adopt a frame estimation scheme using the previous and next frames. The frame parameters, together with a scatter measure of edge strength and the average intensity, constitute the feature vector of a frame. Finally, the frames are classified as no change (within-shot frame), abrupt change, or gradual change frames using a multilayer perceptron network. The scheme avoids the problem of selecting thresholds and/or window sizes that various other schemes face. Moreover, handling both abrupt and gradual transitions along with non-transition frames under a single, uniform framework is the unique feature of the work.
Indian Conference on Computer Vision, Graphics and Image Processing | 2010
Partha Pratim Mohanta; Sanjoy Kumar Saha; Bhabatosh Chanda
In this paper, we present a novel scheme for segmenting video data into scenes. Based on visual similarity, the shots are first grouped into clusters using a modified k-means algorithm. The optimal number of clusters is decided using cluster validity analysis based on the Davies-Bouldin index. Each shot is assigned a tag denoting the cluster it belongs to. Thus, the video data is represented by a sequence of cluster tags. The sequence is then analyzed by introducing the concept of stable and quasi-stable states. The elements of the sequence are merged into states, and isolated elements are linked with the states to generate the scenes. The scheme does not depend on critical parameters and is capable of handling different types of scenes.
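The cluster-validity step mentioned above can be illustrated with a minimal sketch of the standard Davies-Bouldin index. This is not the authors' code: the function name `davies_bouldin`, the use of Euclidean distance, and the toy two-dimensional feature vectors are assumptions made for illustration. A lower index indicates more compact, better-separated clusters, so one would typically run the clustering for several candidate cluster counts and keep the count with the smallest index.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled set of feature vectors (lower is better).

    Assumes Euclidean distance and at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centroids, scatters = {}, {}
    for label, members in clusters.items():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        centroids[label] = centroid
        # Scatter: mean distance of the cluster's members to its centroid.
        scatters[label] = sum(math.dist(m, centroid) for m in members) / len(members)

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical shot features: two tight groups far apart.
shots = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(shots, [0, 0, 1, 1])  # groups match the geometry
bad = davies_bouldin(shots, [0, 1, 0, 1])   # groups cut across the geometry
```

In this toy example the labelling that matches the geometry yields the smaller index, which is the behaviour a cluster-count search relies on.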
International Conference on Pattern Recognition | 2008
Partha Pratim Mohanta; Sanjoy Kumar Saha; Bhabatosh Chanda
For efficient indexing, browsing and retrieval of video data, and also for video summarization, extraction of representative frames is essential. Once a video stream is segmented into shots, the representative frames or key-frames for each shot are selected. Automatic selection of suitable representatives for a wide variety of shots is still a challenge, as the number of such frames in a shot may vary depending on the variation in the content. In this work, we propose a novel scheme that relies on hypothesis testing based on the Wald-Wolfowitz runs test to detect the subshots within a shot; then, for each subshot, the frame rendering the highest fidelity is extracted as the key-frame. Experimental results show that the scheme works satisfactorily for a wide variety of shots.
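As a rough illustration of the statistical tool named above, the following sketch computes the runs-test statistic for a two-symbol sequence using the usual normal approximation. The abstract does not say how frames are converted into such a sequence, so the function name `runs_test` and the 0/1 sequences used here are assumptions and should not be read as the authors' procedure.

```python
import math


def runs_test(labels):
    """Return (number_of_runs, z_score) for a sequence over two symbols.

    Uses the normal approximation of the Wald-Wolfowitz runs test.
    """
    symbols = sorted(set(labels))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sum(1 for x in labels if x == symbols[0])
    n2 = len(labels) - n1
    n = n1 + n2
    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for the normal approximation")
    return runs, (runs - mean) / math.sqrt(var)


# Hypothetical sequences: one with a single change point, one strictly alternating.
blocked_runs, blocked_z = runs_test([0] * 10 + [1] * 10)
mixed_runs, mixed_z = runs_test([0, 1] * 10)
```

A strongly negative z-score (far fewer runs than expected under random mixing) suggests that the two symbols come from differently behaving segments, which is the kind of evidence a subshot boundary test could use.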
International Conference on Pattern Recognition | 2002
Partha Pratim Mohanta; Dipti Prasad Mukherjee; Scott T. Acton
The paper presents an agglomerative clustering technique for image segmentation. To initiate agglomeration, a set of homogeneous segments is found in the image using level set analysis. A relational matrix is then defined establishing relations between neighbouring segments present in the image. These relations are derived based on the intensity and boundary features of the segments. The agglomeration is performed on this relationship based on asymmetric agglomerative clustering criteria. The performance is demonstrated through results on a number of natural images and through cluster validity criteria.
International Conference on Advances in Pattern Recognition | 2009
Partha Pratim Mohanta; Sanjoy Kumar Saha
Semantic grouping of the shots in a video can be thought of as the first step towards scene detection. It also facilitates the easy identification of visually similar scenes. Such grouping also helps in the creation of a semantic content table and in efficient content browsing. In this work, we present an effective scheme to form such groupings. We address the important issue of representing a shot with the help of keyframes and other sampled frames from the shot. The content of the shot is then denoted by the low-level feature vectors corresponding to the representative frames. Similar shots are grouped using a modified k-means clustering algorithm. Modifications have been incorporated to accommodate the differing roles played by the keyframes and the sampled frames. We have carried out experiments with different types of video data, and the results obtained are satisfactory.
Indian Conference on Computer Vision, Graphics and Image Processing | 2008
Partha Pratim Mohanta; Sanjoy Kumar Saha; Bhabatosh Chanda
Detection of representative frames, also called key-frames, is essential for efficient indexing, browsing and retrieval of video data, and also for video summarization. Once a video stream is segmented into shots, the representative frames or key-frames for each shot are selected. The number of such frames in a shot may vary depending on the variation in the content. Thus, automatic selection of a suitable number of representative frames for a wide variety of shots remains a challenge. In this work, we propose a novel scheme for key-frame detection that divides a shot into subshots using hypothesis testing and majority voting. Each subshot is expected to be uniform in terms of visual content. Then, for each subshot, the frame rendering the highest fidelity is extracted as the key-frame. Experimental results show that the scheme works satisfactorily for a wide variety of shots.
International Journal of Image and Graphics | 2013
Partha Pratim Mohanta; Sanjoy Kumar Saha; Bhabatosh Chanda
A storyboard consisting of key-frames is a popular format for video summarization, as it helps in efficient indexing, browsing and partial or complete retrieval of video. In this paper, we present a size-constrained storyboard generation scheme. Given the shots, i.e. the output of the video segmentation process, the method has two major steps: extraction of appropriate key-frame(s) from each shot and, finally, selection of a specified number of key-frames from the set thus obtained. The set of selected key-frames should retain the variation in visual content originally possessed by the video. The number of key-frames or representative frames in a shot may vary depending on the variation in its visual content. Thus, automatic selection of a suitable number of representative frames from a shot remains a challenge. In this work, we propose a novel scheme for detecting the sub-shots of a shot, each having consistent visual content, using the Wald-Wolfowitz runs test. Then, from each sub-shot, the frame rendering the highest fidelity is extracted as the key-frame. Finally, a novel spanning-tree-based method is proposed to select a subset of key-frames of a specified cardinality. Chronological arrangement of these frames generates the size-constrained storyboard. Experimental results and a comparative study show that the scheme works satisfactorily for a wide variety of shots. Moreover, the proposed technique rectifies mis-detection errors, if any, incurred in the video segmentation process. Similarly, though this is not implemented, the proposed hypothesis test can rectify false alarms in shot detection if it is applied to pairs of adjacent shots.
International Conference on Internet Multimedia Computing and Service | 2009
Partha Pratim Mohanta; Sanjoy Kumar Saha; Bhabatosh Chanda
A video storyboard is a common way to render a summarized view of video data. The video is first segmented, and representative frames, one from each segmented unit, are ordered chronologically to form the storyboard. The representative frames can also be organized to form a Table of Contents (ToC). However, the size of such a storyboard/ToC may be prohibitive in applications where the views are to be transmitted over limited bandwidth. In this work, we propose a novel spanning-tree-based method to select a subset of representative frames of a specified size. It is also shown that the proposed method can generate an almost identical storyboard from a set of equi-spaced frame samples from the video stream, without explicit segmentation or shot detection.
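The abstract does not describe how the spanning tree is used, so the sketch below shows only one generic possibility and should not be taken as the authors' method: build a minimum spanning tree over frame feature vectors, cut the k-1 heaviest edges to obtain k groups (classic MST-based clustering), and keep the earliest frame of each group. The function names `mst_edges` and `select_frames`, the Euclidean distance, and the one-dimensional toy features are all assumptions.

```python
import math


def mst_edges(features):
    """Prim's algorithm on the complete graph; returns (weight, u, v) edges."""
    n = len(features)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(features[u], features[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def select_frames(features, k):
    """Return k frame indices in chronological order.

    Illustrative rule (an assumption): cut the k-1 heaviest MST edges and
    keep the earliest frame of every resulting component.
    """
    n = len(features)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    kept = sorted(mst_edges(features))[: n - k]  # the n-k lightest MST edges
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, u, v in kept:
        ru, rv = find(u), find(v)
        root[max(ru, rv)] = min(ru, rv)  # the smallest index stays the root
    return sorted({find(i) for i in range(n)})


# Hypothetical one-dimensional frame features in chronological order.
frames = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (9.0,)]
storyboard = select_frames(frames, 3)
```

Because the size k is an explicit argument, the same routine can be rerun with a smaller k when the storyboard has to fit a tighter bandwidth budget.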
Pattern Recognition and Machine Intelligence | 2013
Partha Pratim Mohanta; Sudipta Chowdhury; Arnab Roy; Sanjoy Kumar Saha; Bhabatosh Chanda
The common practice for providing a static summarized view of a video is to create a storyboard. A storyboard is the chronological arrangement of the representative frames. A shot-level storyboard suffers from redundancy because the constituent shots of a scene normally repeat. The size of such a storyboard is also a constraint for many applications. In this work, we consider the scene as the more meaningful unit. We propose a state-based scene segmentation algorithm and a novel method based on a minimal spanning tree to select the representative frames for the scenes. A storyboard consisting of scene-level representative frames is much more compact than a shot-level storyboard. Moreover, since the scene is the semantic unit, the flow of semantic content of the video data is well preserved. Experimental results confirm the claim.