Rajvi Shah
International Institute of Information Technology, Hyderabad
Publications
Featured research published by Rajvi Shah.
IEEE Transactions on Circuits and Systems for Video Technology | 2013
Rajvi Shah; P. J. Narayanan
Traditional video editing interfaces model and represent videos as a collection of frames against a timeline, which makes the object-centric manipulation of videos a laborious task. We enable a simple and meaningful interaction for object-centric navigation and manipulation of long shot videos by introducing operators on three high-level video semantics: background mosaics, object motions, and camera motions. We estimate the scene background and represent the object motion using 3-D space-time trajectories. We use the 3-D object trajectories as basic interaction elements, and define several object and camera operations as simple and intuitive curve manipulations. These allow users to perform various temporal manipulations of video objects by interactively editing their trajectories. The camera operations model the camera as a movable and scalable aperture and allow users to simulate pan, tilt, and zoom effects by creating new camera trajectories. With several example compositions, we demonstrate that our representation and operations allow users to simply and interactively perform numerous seemingly complex, high-level video manipulation tasks.
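The camera operations above treat the virtual camera as a movable, scalable window over the frame. A minimal sketch of that idea, assuming a camera trajectory given as hypothetical keyframes `(t, cx, cy, scale)` that are linearly interpolated, and a crop window that is shifted (not shrunk) to stay inside the frame. This is not the authors' implementation; all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def interpolate_keyframes(keys, t):
    """Linearly interpolate (cx, cy, scale) at time t from keyframes sorted by strictly increasing t."""
    if t <= keys[0][0]:
        return keys[0][1:]
    if t >= keys[-1][0]:
        return keys[-1][1:]
    for (t0, *a), (t1, *b) in zip(keys, keys[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)
            return tuple(p + u * (q - p) for p, q in zip(a, b))


def aperture_rect(frame_w, frame_h, cx, cy, scale):
    """Crop window of a virtual camera centered at (cx, cy); scale > 1 zooms in."""
    if scale < 1:
        raise ValueError("scale must be >= 1 (the aperture cannot exceed the frame)")
    w, h = frame_w / scale, frame_h / scale
    # Shift the window back inside the frame if it would cross an edge.
    x = min(max(cx - w / 2, 0.0), frame_w - w)
    y = min(max(cy - h / 2, 0.0), frame_h - h)
    return Rect(x, y, w, h)


# Hypothetical pan-and-zoom: full 1280x720 frame at t=0, 2x zoom toward the right at t=50.
KEYS = [(0, 640, 360, 1.0), (50, 900, 360, 2.0)]
for t in (0, 25, 50):
    print(t, aperture_rect(1280, 720, *interpolate_keyframes(KEYS, t)))
```

Rendering the output video would then amount to cropping each background-composited frame to its rectangle and resizing it to the output resolution.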
conference on visual media production | 2012
Rajvi Shah; Vivek Kwatra
We propose a framework for automatic enhancement of group photographs by facial expression analysis. We are motivated by the observation that group photographs are seldom perfect. Subjects may have inadvertently closed their eyes, may be looking away, or may not be smiling at that moment. Given a set of photographs of the same group of people, our algorithm uses facial analysis to determine a goodness score for each face instance in those photos. This scoring function is based on classifiers for facial expressions such as smiles and eye closure, trained over a large set of annotated photos. Given these scores, the best composite for the set is synthesized by (a) selecting the photo with the best overall score and (b) replacing any low-scoring faces in that photo with high-scoring faces of the same person from other photos, using alignment and seamless composition.
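The selection-and-replacement step (a)/(b) can be sketched as follows, assuming per-face goodness scores in [0, 1] have already been produced by the expression classifiers. Treating the "best overall score" as the sum of a photo's face scores and using a fixed replacement threshold are assumptions for illustration; `plan_composite` is a hypothetical name, and face alignment and seamless compositing are out of scope.

```python
def plan_composite(scores, threshold=0.5):
    """Choose a base photo and the faces to swap into it.

    scores: {photo_id: {person_id: goodness}} with goodness in [0, 1];
            a person may be missing from some photos.
    Returns (base_photo, {person_id: donor_photo_id}).
    """
    # (a) Base photo: highest total goodness over the faces it contains.
    base = max(scores, key=lambda p: sum(scores[p].values()))
    replacements = {}
    for person, s in scores[base].items():
        if s >= threshold:
            continue
        # (b) Best-scoring instance of the same person in any other photo.
        donors = [(scores[p][person], p) for p in scores if p != base and person in scores[p]]
        if donors:
            best_score, best_photo = max(donors)
            if best_score > s:
                replacements[person] = best_photo
    return base, replacements


scores = {
    "img1": {"ana": 0.9, "ben": 0.2, "cai": 0.8},
    "img2": {"ana": 0.4, "ben": 0.9, "cai": 0.7},
    "img3": {"ana": 0.6, "ben": 0.5, "cai": 0.3},
}
print(plan_composite(scores))  # ('img2', {'ana': 'img1'})
```

Here `img2` wins on total score, and only Ana's low-scoring face is swapped in from `img1`.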
workshop on applications of computer vision | 2015
Rajvi Shah; Vanshika Srivastava; P. J. Narayanan
We present a two-stage, geometry-aware approach for matching SIFT-like features quickly and reliably. Our approach first uses a small sample of features to estimate the epipolar geometry between the images and then leverages it for guided matching of the remaining features. This simple, general two-stage approach produces denser feature correspondences and lets us formulate an accelerated search strategy that gives a significant speedup over traditional matching. Traditional matching punitively rejects many true feature matches because of a global ratio test, an effect that is particularly visible when matching image pairs with repetitive structures. The geometry-aware approach prevents such pre-emptive rejection by using a selective ratio test, and works effectively even on scenes with repetitive structures. We also show that the proposed algorithm is easy to parallelize, and we implement it on the GPU. We experimentally validate our algorithm on publicly available datasets and compare the results with state-of-the-art methods.
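A minimal NumPy sketch of geometry-guided matching, assuming the fundamental matrix `F` has already been estimated from the initial sample of matches. Candidates for each feature are restricted to a band around its epipolar line, and the ratio test is applied only among those candidates. This is one plausible reading of the "selective ratio test", not the paper's exact rule; `max_epi_dist`, `ratio`, and the function names are illustrative.

```python
import numpy as np


def epipolar_distances(F, x1, pts2):
    """Distance (pixels) of each point in pts2 from the epipolar line of x1 in image 2."""
    a, b, c = F @ np.array([x1[0], x1[1], 1.0])
    return np.abs(a * pts2[:, 0] + b * pts2[:, 1] + c) / np.hypot(a, b)


def guided_match(F, pts1, desc1, pts2, desc2, max_epi_dist=2.0, ratio=0.8):
    """Match features of image 1 to image 2 using only geometry-consistent candidates."""
    matches = []
    for i in range(len(pts1)):
        # Candidates: features of image 2 inside a band around the epipolar line.
        cand = np.flatnonzero(epipolar_distances(F, pts1[i], pts2) < max_epi_dist)
        if cand.size == 0:
            continue
        d = np.linalg.norm(desc2[cand] - desc1[i], axis=1)
        order = np.argsort(d)
        # Ratio test restricted to the band; a lone candidate is accepted directly.
        if cand.size == 1 or d[order[0]] < ratio * d[order[1]]:
            matches.append((i, int(cand[order[0]])))
    return matches


# Toy rectified pair: pure horizontal translation, so epipolar lines are image rows.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
pts1 = np.array([[10.0, 10.0], [10.0, 50.0]])
desc1 = np.array([[1.0, 0.005], [0.0, 1.0]])
pts2 = np.array([[30.0, 10.0], [30.0, 50.0], [60.0, 50.0]])
# Feature 2 of image 2 repeats feature 0's appearance but lies on a different row.
desc2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.01]])
print(guided_match(F, pts1, desc1, pts2, desc2))
```

In this toy example, a global ratio test would reject feature 0, because the repeated descriptor on another row is exactly as close as the true match; the epipolar band removes that distractor, and both features are matched.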
international conference on 3d vision | 2014
Rajvi Shah; Aditya Deshpande; P. J. Narayanan
In this paper, we present a new multistage approach for SfM reconstruction of a single component. Our method begins by building a coarse 3D reconstruction using high-scale features of the given images. This step uses only a fraction of the features and is fast. We then enrich the model in stages by localizing the remaining images to it and by matching and triangulating the remaining features. Unlike in traditional incremental SfM, the localization and triangulation steps in our approach are made efficient and embarrassingly parallel by using the geometry of the coarse model. The coarse model allows us to register the remaining images with direct localization techniques based on 3D-2D correspondences. We further use the geometry of the coarse model to reduce the pairwise image-matching effort and to perform fast guided feature matching for the majority of features. Our method produces models of similar quality to those of incremental SfM methods while being notably fast and parallel. Our algorithm can reconstruct a 1000-image dataset in 15 hours on a single core, in about 2 hours on 8 cores, and in a few minutes by utilizing the full parallelism of about 200 cores.
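Registering a new image against the coarse model from 3D-2D correspondences is a camera-resectioning problem. As a sketch of the underlying linear algebra only, the classic Direct Linear Transform (DLT) below recovers a 3x4 projection matrix from six or more correspondences. The paper's direct localization pipeline is not reproduced here; practical systems add outlier rejection (e.g. RANSAC around a minimal solver) and coordinate normalization, both omitted. The function names are hypothetical.

```python
import numpy as np


def dlt_resection(X, x):
    """Estimate a 3x4 camera matrix P from n >= 6 correspondences.

    X: (n, 3) world points; x: (n, 2) pixel coordinates.
    Each correspondence gives two linear equations in the 12 entries of P.
    """
    rows = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        rows.append([0.0] * 4 + [-c for c in Xh] + [v * c for c in Xh])
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
    # P, flattened row-wise, spans the (approximate) null space of A:
    # take the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)


def project(P, X):
    """Project (n, 3) world points with P and dehomogenize to (n, 2) pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]


# Synthetic check: a camera 5 units away from points scattered in a unit cube.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_true = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
X = rng.uniform(-1.0, 1.0, size=(20, 3))
x = project(P_true, X)
P_est = dlt_resection(X, x)
print("max reprojection error:", np.abs(project(P_est, X) - x).max())
```

Because P is recovered only up to scale and sign, the check compares reprojected pixels rather than matrix entries.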
indian conference on computer vision, graphics and image processing | 2016
Aditya Singh; Saurabh Saini; Rajvi Shah; P. J. Narayanan
User-given tags or labels are valuable resources for semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hash-tags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. Instead, we attempt to learn a direct, low-cost mapping from video to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models trained as neural networks, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) from a corpus of ∼10 million hash-tags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hash-tags. A query video without any tag information can then be mapped directly into the tag vector space using the learned embedding, and relevant tags can be found by a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study.
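Once the Tag2Vec vectors are available, tag suggestion reduces to a regression into tag space followed by nearest-neighbor search. The sketch below assumes precomputed tag vectors and uses a linear ridge-regression map as a stand-in for the learned embedding function, whose exact form is not specified here; cosine similarity is likewise an assumed metric, and the function names are hypothetical.

```python
import numpy as np


def fit_embedding(V, T, lam=1e-2):
    """Ridge regression: W = argmin ||V W - T||^2 + lam ||W||^2.

    V: (n, d) video features; T: (n, k) target vectors in tag space
    (for example, the mean Tag2Vec vector of each training clip's hash-tags).
    """
    d = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(d), V.T @ T)


def suggest_tags(v, W, tag_vecs, tag_names, k=3):
    """Embed one query video and return the k tags with the highest cosine similarity."""
    q = v @ W
    sims = (tag_vecs @ q) / (np.linalg.norm(tag_vecs, axis=1) * np.linalg.norm(q) + 1e-12)
    return [tag_names[i] for i in np.argsort(-sims)[:k]]
```

Usage: fit `W` once on the training clips, then call `suggest_tags(video_feature, W, tag_vecs, tag_names)` for any untagged query clip.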
german conference on pattern recognition | 2016
Aditya Singh; Saurabh Saini; Rajvi Shah; P. J. Narayanan
Short internet video clips such as vines follow a much wilder distribution than traditional video datasets. In this paper, we focus on the problem of unsupervised action classification in wild vines using traditional labeled datasets. To this end, we use a simple domain adaptation strategy based on data augmentation. We use the semantic word2vec space as a common subspace in which to embed video features from both the labeled source domain and the unlabeled target domain. Our method incrementally augments the labeled source with target samples and iteratively modifies the embedding function to bring the source and target distributions together. Additionally, we use a multi-modal representation that incorporates the noisy semantic information available in the form of hash-tags. We show the effectiveness of this simple adaptation technique on a test set of vines and achieve notable improvements in performance.
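The incremental augmentation loop can be illustrated with a simple self-training sketch: fit a map from video features into the semantic space of class-name vectors, pseudo-label the most confident target clips, add them to the labeled pool, and refit. The ridge-regression embedding, the cosine-confidence selection rule, the `per_round` schedule, and all function names are assumptions; the hash-tag modality is omitted.

```python
import numpy as np


def ridge(V, T, lam=1e-2):
    """Linear map W minimizing ||V W - T||^2 + lam ||W||^2."""
    return np.linalg.solve(V.T @ V + lam * np.eye(V.shape[1]), V.T @ T)


def cosine(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-12)
    return A @ B.T


def self_train(Vs, ys, Vt, C, rounds=5, per_round=10, lam=1e-2):
    """Vs, ys: labeled source features and integer labels; Vt: unlabeled target features;
    C: one semantic (e.g. word2vec) vector per class. Returns W and target predictions."""
    X, y = Vs.copy(), ys.copy()
    remaining = np.arange(len(Vt))
    for _ in range(rounds):
        W = ridge(X, C[y], lam)
        if remaining.size == 0:
            break
        sims = cosine(Vt[remaining] @ W, C)
        # Pseudo-label the target samples whose nearest class vector is closest.
        pick = np.argsort(-sims.max(axis=1))[:per_round]
        X = np.vstack([X, Vt[remaining[pick]]])
        y = np.concatenate([y, sims[pick].argmax(axis=1)])
        remaining = np.delete(remaining, pick)
    W = ridge(X, C[y], lam)
    return W, cosine(Vt @ W, C).argmax(axis=1)
```

Each round the embedding is refit on source plus pseudo-labeled target data, so later rounds see progressively more of the target distribution.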
conference on multimedia modeling | 2018
Saumya Rawat; Siddhartha Gairola; Rajvi Shah; P. J. Narayanan
Replacing overexposed or dull skies in outdoor photographs is a desirable photo manipulation. It is often necessary to color correct the foreground after replacement to make it consistent with the new sky. Methods have been proposed to automate the process of sky replacement and color correction. However, color correction is often unwanted by the artist or may produce unrealistic results. We propose a data-driven approach to sky replacement that avoids color correction by finding a diverse set of skies that are consistent in color and natural illumination with the query image foreground. Our database consists of ∼1200 natural images spanning many outdoor categories. Given a query image, we retrieve the most consistent images from the database according to L2 distance in feature space and produce candidate composites. The candidates are re-ranked based on realism and diversity. We used pre-trained CNN features and a rich set of hand-crafted features that encode color statistics, structural layout, and natural illumination statistics, but observed color statistics to be the most effective for this task. We share our findings on feature selection and present qualitative results and a user-study-based evaluation that show the effectiveness of the proposed method.
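The retrieve-then-re-rank step can be sketched as an L2 nearest-neighbor query followed by a greedy re-ranking in the spirit of maximal marginal relevance (MMR), which trades closeness to the query against distance from already-selected skies. Only the diversity half is shown; the realism score, the feature extractor, the weighting `alpha`, and the function names are assumptions.

```python
import numpy as np


def retrieve(query_feat, db_feats, k):
    """Return the indices of the k nearest database entries (L2) and all distances."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return np.argsort(dists)[:k], dists


def mmr_score(i, chosen, dists, db_feats, alpha):
    """Higher is better: close to the query, far from what is already selected."""
    spread = min((np.linalg.norm(db_feats[i] - db_feats[j]) for j in chosen), default=0.0)
    return -alpha * dists[i] + (1.0 - alpha) * spread


def diverse_rerank(candidates, dists, db_feats, n, alpha=0.5):
    """Greedily pick n candidates, trading relevance against redundancy."""
    chosen, pool = [], [int(i) for i in candidates]
    while pool and len(chosen) < n:
        best = max(pool, key=lambda i: mmr_score(i, chosen, dists, db_feats, alpha))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

With features `[[0, 0], [0.1, 0], [0.11, 0], [0, 1]]` and a query at the origin, plain nearest neighbors would return entries 0 and 1, while `diverse_rerank(..., n=2, alpha=0.3)` returns 0 and 3, skipping the near-duplicate.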
international conference on multimedia and expo | 2017
Ishit Mehta; Parikshit Sakurikar; Rajvi Shah; P. J. Narayanan
Smartphones have become the de facto capture devices for everyday photography. Unlike traditional digital cameras, smartphones are versatile devices with auxiliary sensors, processing power, and networking capabilities. In this work, we harness the communication capabilities of smartphones and present a synchronous, coordinated multi-camera capture system. Synchronous capture is important for many image/video fusion and 3D reconstruction applications. The proposed system provides an inexpensive and effective means to capture multi-camera media for such applications. Our coordinated capture system is based on a wireless protocol that uses NTP-based synchronization and device-specific lag compensation. It achieves sub-frame synchronization across all participating smartphones, even those of heterogeneous make and model. We propose a new method based on fiducial markers displayed on an LCD screen to temporally calibrate smartphone cameras. We demonstrate the utility and versatility of this system to enhance traditional videography and to create novel visual representations such as panoramic videos, HDR videos, multi-view 3D reconstruction, multi-flash imaging, and multi-camera social media.
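For the synchronization step, the standard NTP clock-offset estimate from one request/response exchange is shown below, together with a simple correction for a device's measured capture lag. The system's actual wireless protocol is not described in detail here, so the message exchange, the min-delay sample selection, and the helper names are illustrative assumptions; all times are in milliseconds.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Clock offset and round-trip delay from one NTP-style exchange.

    t0: request sent (client clock)    t1: request received (server clock)
    t2: reply sent (server clock)      t3: reply received (client clock)
    offset is server_clock - client_clock, assuming symmetric network paths.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def best_offset(exchanges):
    """Keep the exchange with the smallest round-trip delay, which is least
    affected by queuing and path asymmetry."""
    return min((ntp_offset_delay(*e) for e in exchanges), key=lambda od: od[1])[0]


def local_trigger_time(master_capture_time, offset, capture_lag):
    """Local time at which to fire so the frame is captured at master_capture_time
    on the master clock, given this device's measured trigger-to-capture lag."""
    return master_capture_time - offset - capture_lag
```

For example, with the client clock 100 ms behind the master and 10 ms of network latency each way, the exchange `(1000, 1110, 1112, 1022)` yields an offset of 100 ms and a round-trip delay of 20 ms.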
Archive | 2017
Rajvi Shah; Aditya Deshpande; Anoop M. Namboodiri; P. J. Narayanan
Several methods have been proposed for large-scale 3D reconstruction from large, unorganized image collections. A large reconstruction problem is typically divided into multiple components, which are reconstructed independently using structure from motion (SFM) and later merged. Incremental SFM methods are the most popular choice for basic structure recovery of a single component. They are robust and effective but strictly sequential in nature. We present a multistage approach for SFM reconstruction of a single component that breaks the sequential nature of incremental SFM methods. Our approach begins by quickly building a coarse 3D model using only a fraction of the features from the given images. The coarse model is then enriched in subsequent stages by localizing the remaining images and by matching and triangulating the remaining features. The geometric information available in the form of the coarse model allows us to make these stages effective, efficient, and highly parallel. We show that our method produces models of similar quality to those of standard SFM methods while being notably fast and parallel.
Archive | 2012
Vivek Kwatra; Rajvi Shah