

Publications


Featured research published by Vijay Badrinarayanan.


Computer Vision and Pattern Recognition | 2010

Label propagation in video sequences

Vijay Badrinarayanan; Fabio Galasso; Roberto Cipolla

This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem. Given a limited amount of hand-labelled pixels, typically in the start and end frames of a chunk of video, an EM-based algorithm propagates labels through the remaining frames of the video sequence. As a result, the user obtains pixel-wise labelled video sequences along with the class probabilities at each pixel. Our novel algorithm provides an essential tool for reducing tedious hand labelling of video sequences, thus producing copious amounts of usable ground truth data. A novel application of this algorithm is in semi-supervised learning of discriminative classifiers for video segmentation and scene parsing. The label propagation scheme can be based on pixel-wise correspondences obtained from motion estimation, on image patch based similarities as seen in epitomic models, or even on the more recent semantically consistent hierarchical regions. We compare the abilities of each of these variants via both quantitative and qualitative studies against ground truth data. We then report studies on a state-of-the-art Random Forest classifier based video segmentation scheme, trained using full ground truth data and with data obtained from label propagation. The results of this study strongly support and encourage the use of the proposed label propagation algorithm.
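The propagation step common to all the variants the abstract mentions can be sketched in a few lines: given per-frame pixel correspondences (standing in for the motion-based variant), each pixel inherits a smoothed copy of its ancestor's label distribution. This is an illustrative toy, not the paper's EM algorithm; all names are hypothetical.

```python
import numpy as np

def propagate_labels(corr, first_labels, n_frames, n_pixels, n_classes):
    """Propagate hard labels from a labelled first frame through a video
    using per-frame pixel correspondences. beliefs[t, p] is a class
    distribution for pixel p at frame t."""
    beliefs = np.zeros((n_frames, n_pixels, n_classes))
    beliefs[0, np.arange(n_pixels), first_labels] = 1.0
    for t in range(1, n_frames):
        for p in range(n_pixels):
            src = corr[t][p]  # corresponding pixel in frame t-1
            # inherit the ancestor's belief, softened towards uniform
            beliefs[t, p] = 0.9 * beliefs[t - 1, src] + 0.1 / n_classes
    return beliefs.argmax(axis=2)
```

With an identity correspondence and three pixels, labels from the first frame survive to the last frame unchanged; with real motion-estimated correspondences, labels would follow the tracked pixels instead.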


Computer Vision and Pattern Recognition | 2011

Semi-supervised video segmentation using tree structured graphical models

Ignas Budvytis; Vijay Badrinarayanan; Roberto Cipolla

We present a novel, implementation-friendly and occlusion-aware semi-supervised video segmentation algorithm using tree structured graphical models, which delivers pixel labels along with their uncertainty estimates. Our motivation for employing supervision is to tackle a task-specific segmentation problem in which the semantic objects are pre-defined by the user. The video model we propose for this problem is based on a tree structured approximation of a patch-based undirected mixture model, which includes a novel time-series model and a soft label Random Forest classifier participating in a feedback mechanism. We demonstrate the efficacy of our model in cutting out foreground objects and in multi-class segmentation problems on lengthy and complex road scene sequences. Our results have wide applicability, including harvesting labelled video data for training discriminative models, shape/pose/articulation learning, and large scale statistical analysis to develop priors for video segmentation.


Computer Vision and Pattern Recognition | 2016

Understanding Real World Indoor Scenes with Synthetic Data

Ankur Handa; Viorica Patraucean; Vijay Badrinarayanan; Simon Stent; Roberto Cipolla

Scene understanding is a prerequisite for many high-level tasks for any automated intelligent machine operating in real world environments. Recent attempts with supervised learning have shown promise in this direction, but they have also highlighted the need for enormous quantities of supervised data: performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive when considering the manual labour needed to collect such data. In this work, we focus our attention on depth-based semantic per-pixel labelling as a scene understanding problem and show the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes. By carefully synthesizing training data with appropriate noise models, we show comparable performance to state-of-the-art RGBD systems on the NYUv2 dataset despite using only depth data as input, and set a benchmark for depth-based segmentation on the SUN RGB-D dataset.
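The "appropriate noise models" step can be illustrated with a minimal depth-noise sketch: perturb clean synthetic depth with depth-dependent Gaussian noise followed by coarse quantisation, roughly in the spirit of structured-light sensor models. The coefficients and function name below are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def add_depth_noise(depth_m, seed=None):
    """Perturb a clean synthetic depth map (in metres) with a simple
    sensor model: axial Gaussian noise whose standard deviation grows
    with depth, followed by coarse quantisation."""
    rng = np.random.default_rng(seed)
    # axial noise grows roughly quadratically with distance
    sigma = 0.0012 + 0.0019 * (depth_m - 0.4) ** 2
    noisy = depth_m + rng.normal(0.0, sigma)
    step = 0.005  # 5 mm quantisation step, mimicking a disparity sensor
    return np.round(noisy / step) * step
```

Training a segmentation network on depth maps corrupted this way, rather than on perfectly clean renders, is what narrows the gap to real sensor data.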


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Semi-Supervised Video Segmentation Using Tree Structured Graphical Models

Vijay Badrinarayanan; Ignas Budvytis; Roberto Cipolla

We present a novel patch-based probabilistic graphical model for semi-supervised video segmentation. At the heart of our model is a temporal tree structure that links patches in adjacent frames through the video sequence. This permits exact inference of pixel labels without resorting to traditional short time window-based video processing or instantaneous decision making. The input to our algorithm is labeled key frame(s) of a video sequence and the output is pixel-wise labels along with their confidences. We propose an efficient inference scheme that performs exact inference over the temporal tree, and optionally a per frame label smoothing step using loopy BP, to estimate pixel-wise labels and their posteriors. These posteriors are used to learn pixel unaries by training a Random Decision Forest in a semi-supervised manner. These unaries are used in a second iteration of label inference to improve the segmentation quality. We demonstrate the efficacy of our proposed algorithm using several qualitative and quantitative tests on both foreground/background and multiclass video segmentation problems using publicly available and our own datasets.
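Exact inference over a temporal tree can be illustrated on its simplest special case, a chain, with sum-product message passing. This toy operates on per-frame label scores rather than the paper's patch model; all names are hypothetical.

```python
import numpy as np

def chain_marginals(unaries, pairwise):
    """Exact sum-product on a chain (the simplest temporal tree).
    unaries[t] holds per-frame label scores; pairwise[k, l] couples
    labels in adjacent frames. Returns per-frame posterior marginals."""
    T, K = unaries.shape
    fwd = np.zeros((T, K))
    bwd = np.zeros((T, K))
    fwd[0] = unaries[0]
    for t in range(1, T):                      # forward messages
        fwd[t] = unaries[t] * (fwd[t - 1] @ pairwise)
        fwd[t] /= fwd[t].sum()                 # normalise for stability
    bwd[-1] = 1.0
    for t in range(T - 2, -1, -1):             # backward messages
        bwd[t] = pairwise @ (unaries[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)
```

On a chain with a confident first frame and "sticky" pairwise terms, the posterior carries that label through the uncertain later frames, which is the behaviour the temporal tree exploits at video scale.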


British Machine Vision Conference | 2010

Label propagation in complex video sequences using semi-supervised learning

Ignas Budvytis; Vijay Badrinarayanan; Roberto Cipolla

We propose a novel directed graphical model for label propagation in lengthy and complex video sequences. Given hand-labelled start and end frames of a video sequence, a variational EM based inference strategy propagates one of several class labels, or assigns an unknown class (void) label, to each pixel in the video. These labels are used to train a multi-class classifier. The pixel labels estimated by this classifier are injected back into the Bayesian network for another iteration of label inference. The novel aspect of this iterative scheme, as compared to a recent approach [1], is its ability to handle occlusions. This is attributed to a hybrid of generative propagation and discriminative classification in a pseudo time-symmetric video model. The end result is a conservative labelling of the video: large parts of the static scene are labelled into known classes, and a void label is assigned to moving objects and the remaining parts of the static scene. These labels can be used as ground truth data for learning the static parts of a scene from videos of it, or more generally for semantic video segmentation. We demonstrate the efficacy of the proposed approach using extensive qualitative and quantitative tests over six challenging sequences. We bring out the advantages and drawbacks of our approach, both to encourage its repeatability and to motivate future research directions.


European Conference on Computer Vision | 2014

Robust Instance Recognition in Presence of Occlusion and Clutter

Ujwal Bonde; Vijay Badrinarayanan; Roberto Cipolla

We present a robust learning-based instance recognition framework for single-view point clouds. Our framework is able to handle real-world instance recognition challenges, i.e., clutter, similar-looking distractors and occlusion. Recent algorithms have separately tried to address the problems of clutter [9] and occlusion [16], but fail when these challenges are combined. In comparison, we handle all of these challenges within a single framework. Our framework uses a soft label Random Forest [5] to learn discriminative shape features of an object and uses them to classify both its location and its pose. We propose a novel iterative training scheme for forests which maximizes the margin between classes to improve recognition accuracy, as compared to a conventional training procedure. The learnt forest outperforms template matching and DPM [7] in the presence of similar-looking distractors. Using occlusion information computed from the depth data, the forest learns to emphasize the shape features from the visible regions, thus making it robust to occlusion. We benchmark our system against state-of-the-art recognition systems [9,7] on challenging scenes drawn from the largest publicly available dataset. To complement the lack of occlusion tests in this dataset, we introduce our Desk3D dataset and demonstrate that our algorithm outperforms other methods in all settings.
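The visibility weighting idea can be reduced to a one-liner: score a match only over positions marked visible by the depth-derived occlusion mask. This is a deliberately minimal stand-in for what the soft label forest does internally; the names are hypothetical.

```python
import numpy as np

def occlusion_aware_score(observed, template, visible):
    """Fraction of *visible* positions where observed shape features
    agree with the template; occluded positions contribute nothing."""
    w = visible.astype(float)
    if w.sum() == 0:
        return 0.0  # fully occluded: no evidence either way
    return float((w * (observed == template)).sum() / w.sum())
```

A naive matcher that scores all positions would penalise a correct hypothesis for disagreements inside the occluded region; masking those positions out is what makes the score robust to occlusion.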


International Journal of Computer Vision | 2014

Mixture of Trees Probabilistic Graphical Model for Video Segmentation

Vijay Badrinarayanan; Ignas Budvytis; Roberto Cipolla

We present a novel mixture of trees probabilistic graphical model for semi-supervised video segmentation. Each component in this mixture represents a tree structured temporal linkage between super-pixels from the first to the last frame of a video sequence. We provide a variational inference scheme for this model to estimate super-pixel labels, their corresponding confidences, and the confidences in the temporal linkages. Our algorithm performs inference over the full video volume, which helps avoid the erroneous label propagation caused by short time-window processing. In addition, our proposed inference scheme is efficient in both computational speed and memory use, and so can be applied in real-time video segmentation scenarios. We bring out the pros and cons of our approach using extensive quantitative comparisons on challenging binary and multi-class video segmentation datasets.


International Conference on Scale Space and Variational Methods in Computer Vision | 2013

Multi Scale Shape Index for 3D Object Recognition

Ujwal Bonde; Vijay Badrinarayanan; Roberto Cipolla

We present the Multi Scale Shape Index (MSSI), a novel feature for 3D object recognition. Inspired by scale space filtering theory and the Shape Index measure proposed by Koenderink & Van Doorn [6], this feature associates different forms of shape, such as umbilics, saddle regions and parabolic regions, with a real-valued index. This association is useful for representing an object based on its constituent shape forms. We derive closed-form scale space equations that compute a characteristic scale at each 3D point in a point cloud without an explicit mesh structure. This characteristic scale is then used to estimate the Shape Index. We quantitatively evaluate the robustness and repeatability of the MSSI feature under varying object scales and changing point cloud density. We also quantify the performance of MSSI for object category recognition on a publicly available dataset.
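The Shape Index itself has a compact closed form. In one common convention, with principal curvatures k1 >= k2, the index is (2/pi) * arctan((k1 + k2) / (k1 - k2)), mapping spherical caps to +1, saddles to 0, and spherical cups to -1. The sketch below computes only the index; MSSI's contribution, selecting a per-point characteristic scale from scale space, is not reproduced here.

```python
import numpy as np

def shape_index(k1, k2):
    """Koenderink-style shape index from principal curvatures
    (order-insensitive). +1: spherical cap, 0: saddle, -1: cup."""
    hi, lo = np.maximum(k1, k2), np.minimum(k1, k2)
    # arctan2 handles the umbilic case hi == lo without dividing by zero
    return (2.0 / np.pi) * np.arctan2(hi + lo, hi - lo)
```

Because the index depends only on the ratio of curvature sums and differences, it is invariant to uniform rescaling of the surface, which is why a separate characteristic scale estimate is needed to capture size.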


IEEE ACM Transactions on Networking | 2016

SCORE: Exploiting Global Broadcasts to Create Offline Personal Channels for On-Demand Access

Gianfranco Nencioni; Nishanth Sastry; Gareth Tyson; Vijay Badrinarayanan; Dmytro Karamshuk; Jigna Chandaria; Jon Crowcroft

The last five years have seen a dramatic shift in media distribution. For decades, TV and radio were provisioned solely using push-based broadcast technologies, forcing people to adhere to fixed schedules. The introduction of catch-up services, however, has now augmented such delivery with online pull-based alternatives. Typically, these allow users to fetch content for a limited period after the initial broadcast, giving them flexibility in accessing content. Whereas previous work has investigated both of these technologies, this paper explores and contrasts them, focusing on the network consequences of moving towards this multifaceted delivery model. Using traces from nearly 6 million users of BBC iPlayer, one of the largest catch-up TV services, we study this shift from push- to pull-based access. We propose a novel technique for unifying both push- and pull-based delivery: the Speculative Content Offloading and Recording Engine (SCORE). SCORE operates as a set-top box which interacts with both broadcast push and online pull services. Whenever users wish to access media, it automatically switches between these distribution mechanisms in an attempt to optimize energy efficiency and network resource utilization. SCORE can also predict user viewing patterns, automatically recording certain shows from the broadcast interface. Evaluations using our BBC iPlayer traces show that, depending on parameter settings, an oracle with complete knowledge of user consumption can save nearly 77% of the energy, and over 90% of the peak bandwidth, of pure IP streaming. Optimizing for energy consumption, SCORE can recover nearly half of both the traffic and energy savings.
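The core trade-off SCORE navigates can be caricatured as a single expected-cost comparison: speculatively record a show from the broadcast stream now if the expected cost of streaming it later over IP is higher. The cost model and names below are illustrative assumptions, not the paper's optimisation.

```python
def should_record(p_watch, record_cost, stream_cost):
    """Decide whether to speculatively record a broadcast show.
    p_watch: predicted probability the user will later watch the show;
    record_cost: fixed cost (energy, storage) of recording it now;
    stream_cost: cost (energy plus network) of streaming it on demand.
    All quantities are placeholders in arbitrary consistent units."""
    return p_watch * stream_cost > record_cost
```

Recording is nearly free when the tuner is already receiving the broadcast, while streaming consumes peak-time network capacity, so even modest watch probabilities can tip the decision towards recording.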


arXiv: Computer Vision and Pattern Recognition | 2015

SynthCam3D: Semantic Understanding With Synthetic Indoor Scenes.

Ankur Handa; Viorica Patraucean; Vijay Badrinarayanan; Simon Stent; Roberto Cipolla

We are interested in automatic scene understanding from geometric cues. To this end, we aim to bring semantic segmentation in the loop of real-time reconstruction. Our semantic segmentation is built on a deep autoencoder stack trained exclusively on synthetic depth data generated from our novel 3D scene library, SynthCam3D. Importantly, our network is able to segment real world scenes without any noise modelling. We present encouraging preliminary results.

Collaboration


Dive into Vijay Badrinarayanan's collaborations.

Top Co-Authors

Ankur Handa (Imperial College London)
Tomasz Malisiewicz (Massachusetts Institute of Technology)
Simon Stent (University of Cambridge)
Ujwal Bonde (University of Cambridge)
Alex Kendall (University of Cambridge)