Junaid Baber
Asian Institute of Technology
Publication
Featured research published by Junaid Baber.
International Conference on Digital Signal Processing | 2011
Junaid Baber; Nitin Afzulpurkar; Matthew N. Dailey; Maheen Bakhtyar
Video shot segmentation is an important step in key frame selection, video copy detection, video summarization, and video indexing for retrieval. Although some types of video data, e.g., live sports coverage, have abrupt shot boundaries that are easy to identify using simple heuristics, it is much more difficult to identify shot boundaries in other types such as cinematic movies. We propose an algorithm for shot boundary detection that accurately identifies not only abrupt shot boundaries, but also the fade-in and fade-out boundaries typical of cinematic movies. The algorithm is based on analysis of changes in the entropy of the grayscale intensity over consecutive frames and analysis of correspondences between SURF features over consecutive frames. In an experimental evaluation on the TRECVID-2007 shot boundary test set, the algorithm achieves substantial improvements over state-of-the-art methods, with a precision of 97.8% and a recall of 99.3%.
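The entropy cue described in this abstract can be illustrated with a minimal sketch. It assumes each frame is already available as a flat sequence of gray levels, and the function names and the `threshold` value are hypothetical; the abstract does not give the decision rule actually used, and the SURF correspondence step is omitted.

```python
import math
from collections import Counter


def frame_entropy(gray_pixels):
    """Shannon entropy (in bits) of the gray-level distribution of one frame."""
    counts = Counter(gray_pixels)
    total = len(gray_pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_changes(frames):
    """Absolute entropy difference between each pair of consecutive frames."""
    entropies = [frame_entropy(f) for f in frames]
    return [abs(b - a) for a, b in zip(entropies, entropies[1:])]


def candidate_boundaries(frames, threshold=1.0):
    """Indices of frames whose entropy differs from the previous frame by more
    than `threshold` (a hypothetical value chosen only for illustration)."""
    return [i + 1 for i, d in enumerate(entropy_changes(frames)) if d > threshold]
```

A uniformly black frame has zero entropy, so a fade to or from black shows up as a steady fall or rise in this value, which is one plausible reason entropy suits fade detection.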
International Journal of Pattern Recognition and Artificial Intelligence | 2013
Junaid Baber; Nitin Afzulpurkar; Shin'ichi Satoh
The rapid growth of video databases has created a need for efficient and effective frameworks for video retrieval and indexing. Video segmentation into scenes is widely used for video summarization, partitioning, indexing, and retrieval. In this paper, we propose a framework for scene detection based mainly on entropy and Speeded Up Robust Features (SURF). First, we detect fade and abrupt boundaries using frame entropy analysis and SURF feature matching. Fade boundaries are a strong indication of a scene beginning or ending in many videos and dramas, and are detected by frame entropy analysis. Before abrupt boundary detection, frames that are obviously not abrupt boundaries, such as blank screens, images affected by high intensity, and sliding credits, are removed. Candidate boundaries are then detected so that SURF matching can be applied efficiently, and SURF features between candidate boundaries and their adjacent frames are used to detect the abrupt boundaries. Second, key frames are extracted from the abrupt shots. We compare our key frame extraction with other well-known algorithms and show the effectiveness of the key frames. Finally, scene boundaries are detected using a sliding window of size K over the key frames in temporal order. In an experimental evaluation on the TRECVID-2007 shot boundary test set, the shot boundary detection algorithm achieves substantial improvements over state-of-the-art methods, with a precision of 99% and a recall of 97.8%. Experimental results for video segmentation into scenes are also promising compared to well-known state-of-the-art techniques.
Image and Vision Computing | 2014
Junaid Baber; Matthew N. Dailey; Shin'ichi Satoh; Nitin Afzulpurkar; Maheen Bakhtyar
Extracting local keypoints and keypoint descriptions from images is a primary step for many computer vision and image retrieval applications. In the literature, many researchers have proposed methods for representing local texture around keypoints with varying levels of robustness to photometric and geometric transformations. Gradient-based descriptors such as the Scale Invariant Feature Transform (SIFT) are among the most consistent and robust descriptors. The SIFT descriptor, a 128-element vector consisting of multiple gradient histograms computed from local image patches around a keypoint, is widely considered the gold standard keypoint descriptor. However, SIFT descriptors require at least 128 bytes of storage per descriptor. Since images are typically described by thousands of keypoints, it may require more space to store the SIFT descriptors for an image than the original image itself. This may be prohibitive in extremely large-scale applications and applications on memory-constrained devices such as tablets and smartphones. In this paper, with the goal of reducing the memory requirements of keypoint descriptors such as SIFT without affecting their performance, we propose BIG-OH, a simple yet extremely effective method for binary quantization of any descriptor based on gradient orientation histograms. BIG-OH's memory requirements are very small: when it uses SIFT's default parameters for the construction of the gradient orientation histograms, it requires only 16 bytes per descriptor. BIG-OH quantizes gradient orientation histograms by computing a bit vector representing the relative magnitudes of local gradients associated with neighboring orientation bins. In a series of experiments on keypoint matching with different types of keypoint detectors under various photometric and geometric transformations, we find that the quantized descriptor has performance comparable to or better than other descriptors, including BRISK, CARD, BRIEF, D-BRIEF, SQ, and PCA-SIFT.
Our experiments also show that BIG-OH is extremely effective for image retrieval, with modestly better performance than SIFT. BIG-OH's drastic reduction in memory requirements, obtained while preserving or improving the image matching and image retrieval performance of SIFT, makes it an excellent descriptor for large image databases and applications running on memory-constrained devices. BIG-OH, binary quantization of gradient orientation based descriptors, is proposed. Quantized SIFT descriptors reduce memory by 88% compared to classical SIFT. BIG-OH has performance comparable to SIFT and GLOH. BIG-OH has better performance than BRISK, CARD, BRIEF, and other descriptors. BIG-OH is effective for large scale applications such as copy detection.
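The idea of quantizing orientation histograms into a bit vector can be sketched as follows. This is an illustration under stated assumptions, not the published BIG-OH procedure: it assumes a 128-element descriptor laid out as 16 histograms of 8 orientation bins, and it assumes each bit records whether a bin is larger than the next bin in the same histogram (wrapping around). The function names are hypothetical.

```python
def quantize_histograms(descriptor, bins_per_hist=8):
    """One bit per bin: 1 if the bin exceeds its next neighbor in the same
    histogram (circular), else 0. The comparison rule is an assumption."""
    if len(descriptor) % bins_per_hist != 0:
        raise ValueError("descriptor length must be a multiple of bins_per_hist")
    bits = []
    for start in range(0, len(descriptor), bins_per_hist):
        hist = descriptor[start:start + bins_per_hist]
        for i in range(bins_per_hist):
            bits.append(1 if hist[i] > hist[(i + 1) % bins_per_hist] else 0)
    return bits


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Under these assumptions a 128-element descriptor yields 128 bits, i.e. 16 bytes, and going from 128 bytes to 16 bytes is a reduction of 87.5%, in line with the roughly 88% figure quoted above. Binary descriptors of this kind are typically compared with the Hamming distance, as in `hamming_distance`.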
International Conference on Emerging Technologies | 2011
Junaid Baber; Nitin Afzulpurkar; Maheen Bakhtyar
In this paper, we present a framework for video segmentation into scenes. Segmenting videos into scenes is the basic step for video analysis, efficient video indexing, and content-based video retrieval. In our framework, we use frame entropy and SURF descriptors to find shot boundaries in the videos. We extract key frames from each shot and segment the video into semantic scenes by key frame matching. The proposed algorithm gives promising results when applied to different genres of videos and dramas.
International Conference on Internet Multimedia Computing and Service | 2013
Junaid Baber; Shin'ichi Satoh; Nitin Afzulpurkar; Chadaporn Keatmanee
With advances in multimedia technologies, video databases are growing exponentially in size, which creates many challenges for efficient indexing and retrieval of videos. Interactive and efficient search engines allow users to query parts of videos or search for particular scenes in video databases. Segmentation and retrieval of video scenes are becoming popular because they make video indexing more flexible and efficient. In this paper, we automatically segment videos into scenes (groups of visually related video shots). We first segment the videos into shots and then merge the shots that are visually similar. We represent shots with the bag of visual words (BoVW) model and compute the similarity of shots with each other within a sliding window of length L. The sliding window makes the similarity computation efficient, since the similarity of each shot is computed with its L neighbors only instead of the whole pool of shots. Experiments on cinematic videos and dramas show the effectiveness of the proposed technique.
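The windowed comparison described in this abstract can be sketched as below. It assumes each shot is already represented by a BoVW histogram given as a list of counts, and it uses cosine similarity as the measure; the abstract does not name the similarity function or the merging rule, so both the measure and the function names here are assumptions.

```python
import math


def cosine_similarity(h1, h2):
    """Cosine similarity between two equal-length histograms."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


def windowed_similarities(shot_histograms, window):
    """Compare each shot only with the next `window` shots in temporal order,
    returning {(i, j): similarity} for i < j <= i + window."""
    n = len(shot_histograms)
    pairs = {}
    for i in range(n):
        for j in range(i + 1, min(i + window, n - 1) + 1):
            pairs[(i, j)] = cosine_similarity(shot_histograms[i], shot_histograms[j])
    return pairs
```

With N shots and window length L this computes on the order of N·L similarities rather than N(N-1)/2, which is the efficiency argument the abstract makes.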
Advances in Multimedia | 2012
Junaid Baber; Shin'ichi Satoh; Nitin Afzulpurkar; Maheen Bakhtyar
Center Symmetric-Local Binary Pattern (CSLBP) is a texture-based operator that is mostly used as a keypoint descriptor; it is a 256-length descriptor that represents a single keypoint or affine patch. This operator is an extension of the Local Binary Pattern (LBP) operator. The CSLBP descriptor is computationally simple, effective, and robust to various image transformations such as illumination change and image blurring. However, the space and time utilization of CSLBP can be improved by simple compression, which can make CSLBP a smart choice for large databases and smartphones. In this paper, we propose a simple compression of CSLBP without loss of its discriminative power. We reduce the descriptor length (dimensions) by up to 50% without applying any dimensionality reduction techniques such as PCA or LDA. We evaluate our framework on state-of-the-art matching protocols and compare the effectiveness of the proposed compressed descriptor (Q-CSLBP) with CSLBP, SIFT, and PCA-SIFT.
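For readers unfamiliar with the base operator, the following minimal sketch shows how a center-symmetric code is commonly computed on a 3x3 neighborhood: the four pairs of diametrically opposite neighbors are compared, giving a 4-bit code in the range 0 to 15. It does not implement the compression proposed in the paper (Q-CSLBP), which the abstract does not describe; the function names, the neighbor ordering, and the default `threshold` are choices made for this illustration.

```python
# Eight neighbors listed clockwise from top-left; entry i and entry i + 4
# are diametrically opposite each other around the center pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def cs_lbp_code(patch, row, col, threshold=0.0):
    """4-bit center-symmetric code for the pixel at (row, col)."""
    code = 0
    for i in range(4):
        r1, c1 = OFFSETS[i]
        r2, c2 = OFFSETS[i + 4]
        diff = patch[row + r1][col + c1] - patch[row + r2][col + c2]
        if diff > threshold:
            code |= 1 << i
    return code


def cs_lbp_histogram(patch, threshold=0.0):
    """16-bin histogram of codes over all interior pixels of a 2D patch."""
    hist = [0] * 16
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            hist[cs_lbp_code(patch, r, c, threshold)] += 1
    return hist
```

A 256-length descriptor like the one mentioned above would be consistent with concatenating 16-bin histograms from a 4x4 grid of cells, though that layout is an assumption here rather than something the abstract states.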
International Conference on Telecommunications | 2013
Junaid Baber; Shin'ichi Satoh; Chadaporn Keatmanee; Nitin Afzulpurkar
Image copy detection is widely used in many applications such as content-based image retrieval, image piracy detection, object retrieval, and near-duplicate detection. Local features are mostly used to represent the images in the database corpus. Initially, keypoints are detected and represented by distinctive and robust descriptors. The descriptors are computed from the affine local patches around the keypoints, and these patches play a vital role in descriptor performance. The main limitations of state-of-the-art descriptors are a lack of robustness and distinctiveness under severe image distortions, and the inability of local patches around keypoints to capture distinctive spatial information and structural context. We propose an effective technique for descriptor computation for image copy detection that adds more spatial information from the vicinity of the keypoints. We show experimentally that the performance for retrieving image copies is improved under severe image distortions and attacks. On average, the robustness of SIFT is increased by up to 22% and its distinctiveness by up to 30% for the image copy detection task.
Information Processing and Management | 2013
Sher Muhammad Doudpota; Sumanta Guha; Junaid Baber
Musical sequences with actors dancing and lip-synching to songs sung by playback singers are integral parts of movies, particularly South Asian movies. Fans seek out movies for their songs, and they often seek songs of a particular genre. In fact, the song and dance sequences of South Asian movies are an industry of their own. Given the huge number of movies produced in South Asia over the past decades, most of which are in digital archives, it is an important problem to automatically extract and categorise their musical sequences. This paper proposes a system for extracting musical sequences from movies. Our method invokes an SVM-based classifier and also makes a novel application of a probabilistic timed automaton to distinguish musical sequences from non-musical ones. Our system analyses both audio and video signals to give a classifier that not only extracts musical sequences from movies but also identifies their genre. We achieved a recall of 93.24% with a precision of 87.34% in song extraction when applied to 10 popular Bollywood movies. An accuracy of 89.5% has been achieved on Bollywood song genre identification.
International Journal of Advanced Computer Science and Applications | 2016
Junaid Baber; Maheen Bakhtyar; Waheed Noor; Abdul Basit; Ihsan Ullah
Images have become a main source of information, learning, and entertainment, but due to advances in multimedia technologies, millions of images are shared on the Internet daily and can be easily duplicated and redistributed. Distribution of these duplicated and transformed images causes many problems and challenges, such as piracy, redundancy, and difficulties in content-based image indexing and retrieval. To address these problems, copy detection systems based on local features are widely used. Initially, keypoints are detected and represented by robust descriptors. The descriptors are computed over the affine patches around the keypoints, and these patches should be repeatable under photometric and geometric transformations. However, there are two main challenges with patch-based descriptors: (1) the affine patch over the keypoint can produce similar descriptors under an entirely different scene or context, which causes “ambiguity”, and (2) the descriptors are not sufficiently “distinctive” under image noise. Due to these limitations, copy detection systems suffer in performance. We present a framework that makes descriptors more distinguishable and robust by incorporating the texture and gradients in their vicinity. The experimental evaluation on keypoint matching and image copy detection under severe transformations shows the effectiveness of the proposed framework.
Digital Image Computing: Techniques and Applications | 2015
Erum Fida; Junaid Baber; Maheen Bakhtyar; Muhammad Javid Iqbal
Image segmentation is one of the most significant tasks in computer vision. Since fully automatic segmentation is hard, a number of interactive techniques are used for image segmentation. The results of these techniques depend largely on user feedback, and it is difficult to obtain good interactions for large databases. For this reason, automatic image segmentation is becoming a significant objective in computer vision and image analysis. We propose an automatic framework to detect the foreground. We apply the Maximal Similarity Based Region Merging (MSRM) technique for region merging and use the image boundary to identify foreground regions. The results confirm the effectiveness of the proposed framework, especially for extracting multiple objects from the background.