Publication


Featured research published by Ankur Handa.


International Conference on Robotics and Automation | 2014

A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM

Ankur Handa; Thomas Whelan; John McDonald; Andrew J. Davison

We introduce the Imperial College London and National University of Ireland Maynooth (ICL-NUIM) dataset for the evaluation of visual odometry, 3D reconstruction and SLAM algorithms that typically use RGB-D data. We present a collection of handheld RGB-D camera sequences within synthetically generated environments. RGB-D sequences with perfect ground truth poses are provided as well as a ground truth surface model that enables a method of quantitatively evaluating the final map or surface reconstruction accuracy. Care has been taken to simulate typically observed real-world artefacts in the synthetic imagery by modelling sensor noise in both RGB and depth data. While this dataset is useful for the evaluation of visual odometry and SLAM trajectory estimation, our main focus is on providing a method to benchmark the surface reconstruction accuracy which to date has been missing in the RGB-D community despite the plethora of ground truth RGB-D datasets available.
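
The benchmark's central contribution is measuring surface reconstruction accuracy against a ground truth model. As a rough illustration of that idea (not the ICL-NUIM tooling itself), the sketch below scores a reconstructed point cloud by nearest-neighbour distance to ground-truth surface samples; the file-free setup and the simple point-to-point metric are assumptions for the example.

```python
# Illustrative sketch only: a point-to-point reconstruction error in the spirit of
# evaluating a reconstructed surface against ground truth. The real benchmark ships
# its own ground-truth surface model and evaluation tools.
import numpy as np
from scipy.spatial import cKDTree

def reconstruction_error(reconstructed_pts: np.ndarray, ground_truth_pts: np.ndarray):
    """Mean and median distance from each reconstructed point to its
    nearest ground-truth point (both arrays are Nx3, in metres)."""
    tree = cKDTree(ground_truth_pts)
    dists, _ = tree.query(reconstructed_pts, k=1)
    return float(np.mean(dists)), float(np.median(dists))

if __name__ == "__main__":
    gt = np.random.rand(10000, 3)                      # stand-in for ground-truth surface samples
    rec = gt + np.random.normal(0, 0.005, gt.shape)    # stand-in for a noisy reconstruction
    mean_err, med_err = reconstruction_error(rec, gt)
    print(f"mean error: {mean_err:.4f} m, median error: {med_err:.4f} m")
```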


European Conference on Computer Vision | 2012

Real-Time camera tracking: when is high frame-rate best?

Ankur Handa; Richard A. Newcombe; Adrien Angeli; Andrew J. Davison

Higher frame-rates promise better tracking of rapid motion, but advanced real-time vision systems rarely exceed the standard 10–60Hz range, arguing that the computation required would be too great. In fact, the extra computation implied by a higher frame-rate is mitigated by the reduced computational cost per frame in trackers which take advantage of prediction. Additionally, when we consider the physics of image formation, a high frame-rate implies that the upper bound on shutter time is reduced, leading to less motion blur but more noise. So, putting these factors together, how are the application-dependent performance requirements of accuracy, robustness and computational cost optimised as frame-rate varies? Using 3D camera tracking as our test problem, and analysing a fundamental dense whole-image alignment approach, we open up a route to a systematic investigation via the careful synthesis of photorealistic video using ray-tracing of a detailed 3D scene, experimentally obtained photometric response and noise models, and rapid camera motions. Our multi-frame-rate, multi-resolution, multi-light-level dataset is based on tens of thousands of hours of CPU rendering time. Our experiments lead to quantitative conclusions about frame-rate selection and highlight the crucial role of full consideration of physical image formation in pushing tracking performance.
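
The exposure trade-off the abstract refers to (shorter shutter means less motion blur but fewer photons, hence relatively more noise) can be mocked up with a toy image-formation model. This is a hedged sketch only; the blur constant, read-noise level and the single-pixel model are assumptions, not the paper's ray-traced pipeline.

```python
# Toy illustration of the shutter-time trade-off: a shorter shutter reduces motion
# blur but collects fewer photons, so shot noise grows relative to the signal.
import numpy as np

def simulate_pixel(irradiance, shutter_s, blur_per_s=0.5, read_noise=2.0, rng=None):
    """Return (ideal signal, noisy measurement) for one pixel at a given shutter time.
    'blur_per_s' is a made-up constant standing in for motion blur."""
    rng = rng or np.random.default_rng(0)
    photons = irradiance * shutter_s                   # signal scales with exposure
    blur_loss = 1.0 / (1.0 + blur_per_s * shutter_s)   # crude blur attenuation
    signal = photons * blur_loss
    noisy = rng.poisson(max(signal, 0)) + rng.normal(0, read_noise)
    return signal, noisy

for shutter in (0.001, 0.01, 0.1):
    signal, _ = simulate_pixel(irradiance=1000.0, shutter_s=shutter)
    snr = signal / np.sqrt(signal + 4.0)   # shot-noise variance ~ signal, read-noise variance = 4
    print(f"shutter {shutter*1000:5.1f} ms  signal {signal:8.1f}  approx SNR {snr:6.1f}")
```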


British Machine Vision Conference | 2014

Simultaneous mosaicing and tracking with an event camera

Hanme Kim; Ankur Handa; Ryad Benosman; Sio-Hoi Ieng; Andrew J. Davison

An event camera is a silicon retina which outputs not a sequence of video frames like a standard camera, but a stream of asynchronous spikes, each with pixel location, sign and precise timing, indicating when individual pixels record a threshold log intensity change. By encoding only image change, it offers the potential to transmit the information in a standard video but at vastly reduced bitrate, and with the huge added advantages of very high dynamic range and temporal resolution. However, event data calls for new algorithms, and in particular we believe that algorithms which incrementally estimate global scene models are best placed to take full advantage of its properties. Here, we show for the first time that an event stream, with no additional sensing, can be used to track accurate camera rotation while building a persistent and high quality mosaic of a scene which is super-resolution accurate and has high dynamic range. Our method involves parallel camera rotation tracking and template reconstruction from estimated gradients, both operating on an event-by-event basis and based on probabilistic filtering.
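
The event-by-event processing model described above can be illustrated with a minimal data structure and update loop. This sketch only accumulates per-pixel log-intensity change from an assumed event tuple (t, x, y, polarity); the paper's actual probabilistic filters for rotation tracking and gradient-based mosaic reconstruction are far more involved.

```python
# Minimal sketch of incremental, per-event processing for an event camera stream.
from dataclasses import dataclass
import numpy as np

@dataclass
class Event:
    t: float       # timestamp in seconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 or -1, the sign of the log-intensity change

CONTRAST_THRESHOLD = 0.1  # assumed per-event log-intensity step

def integrate_events(events, height, width):
    """Accumulate events into an approximate log-intensity change map."""
    log_change = np.zeros((height, width), dtype=np.float64)
    for ev in events:                        # incremental, one event at a time
        log_change[ev.y, ev.x] += ev.polarity * CONTRAST_THRESHOLD
    return log_change

events = [Event(0.001, 10, 5, +1), Event(0.002, 10, 5, +1), Event(0.003, 11, 5, -1)]
print(integrate_events(events, height=16, width=16)[5, 9:12])
```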


Computer Vision and Pattern Recognition | 2016

Understanding Real World Indoor Scenes with Synthetic Data

Ankur Handa; Viorica Patraucean; Vijay Badrinarayanan; Simon Stent; Roberto Cipolla

Scene understanding is a prerequisite to many high-level tasks for any automated intelligent machine operating in real-world environments. Recent attempts with supervised learning have shown promise in this direction but have also highlighted the need for enormous quantities of supervised data: performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive when considering the manual labour needed to collect such data. In this work, we focus our attention on depth-based semantic per-pixel labelling as a scene understanding problem and show the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes. By carefully synthesizing training data with appropriate noise models, we show comparable performance to state-of-the-art RGB-D systems on the NYUv2 dataset despite using only depth data as input, and set a benchmark for depth-based segmentation on the SUN RGB-D dataset.
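
The "appropriate noise models" step amounts to degrading clean synthetic depth so it resembles output from a real sensor before training. The sketch below is one plausible way to do that; the quadratic depth-noise model, its coefficients and the dropout rate are illustrative assumptions, not the values used in the paper.

```python
# Hedged sketch of adding sensor-like noise to clean synthetic depth maps before training.
import numpy as np

def corrupt_depth(depth_m: np.ndarray, rng=None) -> np.ndarray:
    """Add depth-dependent Gaussian noise and random dropout to a depth map (metres)."""
    rng = rng or np.random.default_rng(0)
    sigma = 0.0012 + 0.0019 * (depth_m - 0.4) ** 2     # noise grows with depth (assumed model)
    noisy = depth_m + rng.normal(0.0, 1.0, depth_m.shape) * sigma
    dropout = rng.random(depth_m.shape) < 0.02          # 2% missing measurements (assumed)
    noisy[dropout] = 0.0                                # 0 encodes "no reading"
    return noisy

clean = np.full((480, 640), 2.5)   # a flat wall 2.5 m away
print(corrupt_depth(clean)[:1, :5])
```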


International Conference on Robotics and Automation | 2017

SemanticFusion: Dense 3D semantic mapping with convolutional neural networks

John McCormac; Ankur Handa; Andrew J. Davison; Stefan Leutenegger

Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need to extend beyond geometry and appearance: they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map, but we also show on the NYUv2 dataset that fusing multiple predictions improves even the 2D semantic labelling over baseline single-frame predictions. We also show that for a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single-frame segmentation increases. Our system is efficient enough to allow real-time interactive use at frame-rates of ≈25Hz.
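
The probabilistic fusion step is essentially a recursive Bayesian update of per-element class probabilities with each new CNN prediction. A minimal sketch, assuming the predictions can be treated as independent and ignoring the ElasticFusion data association that decides which map element each pixel corresponds to:

```python
# Minimal sketch of per-element Bayesian label fusion: multiply the stored class
# probabilities by each new CNN prediction and renormalise.
import numpy as np

def fuse(prior: np.ndarray, cnn_prediction: np.ndarray) -> np.ndarray:
    """prior, cnn_prediction: length-C class probability vectors for one map element."""
    posterior = prior * cnn_prediction
    return posterior / posterior.sum()

num_classes = 4
belief = np.full(num_classes, 1.0 / num_classes)        # uniform initial belief
for pred in ([0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1], [0.1, 0.7, 0.1, 0.1]):
    belief = fuse(belief, np.asarray(pred))
print(belief, "-> predicted class", int(np.argmax(belief)))
```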


European Conference on Computer Vision | 2016

gvnn: neural network library for geometric computer vision

Ankur Handa; Michael Bloesch; Viorica Pătrăucean; Simon Stent; John McCormac; Andrew J. Davison

We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spirit of the original spatial transformers and allow backpropagation to enable end-to-end learning of a network involving any domain knowledge in geometric computer vision. This opens up applications in learning invariance to 3D geometric transformation for place recognition, end-to-end visual odometry, depth estimation and unsupervised learning through warping with a parametric transformation for image reconstruction error.
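
gvnn itself is a library for (Lua) Torch, so the sketch below is not its API; it is a hedged PyTorch-flavoured illustration of the underlying idea only: a parametric geometric transformation used as a differentiable layer, so that an image-reconstruction error can be backpropagated through the transform parameters. A 2D affine warp is used here as the simplest case; gvnn's layers cover richer 3D transformations.

```python
# Hedged sketch: differentiable parametric warping driving a photometric
# reconstruction loss, in the spirit of spatial-transformer-style layers.
import torch
import torch.nn.functional as F

def warp_affine(image: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """image: (N, C, H, W); theta: (N, 2, 3) affine parameters. Fully differentiable."""
    grid = F.affine_grid(theta, image.shape, align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)

target = torch.rand(1, 1, 32, 32)
source = torch.roll(target, shifts=2, dims=3)             # target shifted by 2 pixels
theta = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]], requires_grad=True)
optim = torch.optim.Adam([theta], lr=1e-2)
for _ in range(200):                                       # approximately recover the shift by
    optim.zero_grad()                                      # minimising the reconstruction error
    loss = F.mse_loss(warp_affine(source, theta), target)
    loss.backward()
    optim.step()
print(theta.detach())
```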


Computer Vision and Pattern Recognition | 2010

Scalable active matching

Ankur Handa; Margarita Chli; Hauke Strasdat; Andrew J. Davison

In matching tasks in computer vision, and particularly in real-time tracking from video, there are generally strong priors available on absolute and relative correspondence locations thanks to motion and scene models. While these priors are often partially used post-hoc to resolve matching consensus in algorithms like RANSAC, it was recently shown that fully integrating them in an ‘Active Matching’ (AM) approach permits efficient guided image processing with rigorous decisions driven by Information Theory. AM's weakness was that the overhead induced by the intermediate Bayesian updates it requires meant poor scaling to cases where many correspondences were sought. In this paper we show that relaxing the rigid probabilistic model of AM, where every feature measurement directly affects the prediction of every other, permits dramatically more scalable operation without affecting accuracy. We take a general graph-theoretic view of the structure of prior information in matching to sparsify and approximate the interconnections. We demonstrate the performance of two variations, CLAM and SubAM, in the context of sequential camera tracking. These algorithms are highly competitive with other techniques at matching hundreds of features per frame while retaining great intuitive appeal and the full probabilistic capability to digest prior information.
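
A toy sketch of the scalability idea: measure features one at a time, picking whichever is predicted to be most informative, and propagate the benefit of each measurement only along a sparse neighbour graph rather than to every other feature. The scalar per-feature "uncertainty" and the fixed reduction factor below are illustrative assumptions, not the paper's Gaussian-mixture machinery.

```python
# Hedged toy sketch of guided matching over a sparsified prior graph.
def active_matching_order(uncertainty: dict, neighbours: dict, reduction: float = 0.5):
    """uncertainty: feature -> prior search-region size; neighbours: sparse prior graph."""
    remaining = dict(uncertainty)
    order = []
    while remaining:
        f = max(remaining, key=remaining.get)   # most uncertain = most informative to measure
        order.append(f)
        del remaining[f]
        for n in neighbours.get(f, []):          # shrink only linked features' search regions
            if n in remaining:
                remaining[n] *= reduction
    return order

uncertainty = {"a": 9.0, "b": 8.0, "c": 3.0, "d": 7.0}
neighbours = {"a": ["b", "c"], "b": ["a", "d"], "d": ["b"]}
print(active_matching_order(uncertainty, neighbours))
```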


International Conference on Robotics and Automation | 2016

SceneNet: An annotated model generator for indoor scene understanding

Ankur Handa; Viorica Patraucean; Simon Stent; Roberto Cipolla

We introduce SceneNet, a framework for generating high-quality annotated 3D scenes to aid indoor scene understanding. SceneNet leverages manually-annotated datasets of real world scenes such as NYUv2 to learn statistics about object co-occurrences and their spatial relationships. Using a hierarchical simulated annealing optimisation, these statistics are exploited to generate a potentially unlimited number of new annotated scenes, by sampling objects from various existing databases of 3D objects such as ModelNet, and textures such as OpenSurfaces and ArchiveTextures. Depending on the task, SceneNet can be used directly in the form of annotated 3D models for supervised training and 3D reconstruction benchmarking, or in the form of rendered annotated sequences of RGB-D frames or videos.
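
The simulated annealing step can be pictured as repeatedly perturbing object placements and accepting or rejecting moves according to a temperature schedule. The sketch below is a toy version under stated assumptions: a hand-made pairwise affinity table stands in for the learned co-occurrence and spatial statistics, and there is no hierarchy over object groups.

```python
# Hedged toy sketch of simulated annealing over object placements.
import math, random

random.seed(0)
AFFINITY = {("table", "chair"): 1.0, ("chair", "table"): 1.0}   # assumed statistics

def energy(placements):
    e = 0.0
    items = list(placements.items())
    for i, (a, pa) in enumerate(items):
        for b, pb in items[i + 1:]:
            d = math.dist(pa, pb)
            e += 10.0 if d < 0.5 else 0.0                  # overlap penalty
            e -= AFFINITY.get((a, b), 0.0) / (1.0 + d)     # co-occurring objects prefer to be near
    return e

placements = {"table": (0.0, 0.0), "chair": (3.0, 3.0)}
temperature = 1.0
for step in range(2000):
    name = random.choice(list(placements))
    old = placements[name]
    placements[name] = (old[0] + random.gauss(0, 0.2), old[1] + random.gauss(0, 0.2))
    delta = energy(placements) - energy({**placements, name: old})
    if delta > 0 and random.random() > math.exp(-delta / temperature):
        placements[name] = old                             # reject the worsening move
    temperature *= 0.999                                   # cooling schedule
print(placements)
```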


International Conference on Information Processing in Computer-Assisted Interventions | 2014

Robust Real-Time Visual Odometry for Stereo Endoscopy Using Dense Quadrifocal Tracking

Ping-Lin Chang; Ankur Handa; Andrew J. Davison; Danail Stoyanov; Philip “Eddie” Edwards

Visual tracking in endoscopic scenes is known to be a difficult task due to the lack of texture, tissue deformation and specular reflection. In this paper, we devise a real-time visual odometry framework to robustly track the 6-DoF stereo laparoscope pose using the quadrifocal relationship. The instantaneous motion of a stereo camera creates four views which can be constrained by the quadrifocal geometry. Using the previous stereo pair as a reference frame, the current pair can be warped back by minimising a photometric error function with respect to a camera pose constrained by the quadrifocal geometry. Using a robust estimator further removes the outliers caused by occlusion, deformation and specular highlights during the optimisation. Since the optimisation uses all pixel data in the images, it results in a very robust pose estimation even for a textureless scene. The quadrifocal geometry is initialised using a real-time stereo reconstruction algorithm which can be efficiently parallelised and run on the GPU together with the proposed tracking framework. Our system is evaluated using a ground truth synthetic sequence with a known model, and we also demonstrate the accuracy and robustness of the approach using phantom and real examples of endoscopic augmented reality.
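
The robust photometric objective can be sketched as comparing the warped current image against the reference while down-weighting large residuals (specularities, deformation) with a Huber-style estimator. In this hedged sketch the warp is treated as given; the actual system warps via the quadrifocal constraint and optimises the 6-DoF pose on the GPU.

```python
# Hedged sketch of a robust (Huber) photometric cost between a reference image
# and a warped current image.
import numpy as np

def robust_photometric_cost(reference: np.ndarray, warped: np.ndarray, delta: float = 0.1):
    """Huber cost over per-pixel intensity residuals (images normalised to [0, 1])."""
    r = warped - reference
    absr = np.abs(r)
    quadratic = 0.5 * r**2
    linear = delta * (absr - 0.5 * delta)
    return float(np.where(absr <= delta, quadratic, linear).sum())

ref = np.random.rand(48, 64)
cur = ref + np.random.normal(0, 0.02, ref.shape)
cur[10:14, 20:24] = 1.0                  # a specular highlight acting as an outlier block
print(robust_photometric_cost(ref, cur))
```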


arXiv: Computer Vision and Pattern Recognition | 2015

SynthCam3D: Semantic Understanding With Synthetic Indoor Scenes.

Ankur Handa; Viorica Patraucean; Vijay Badrinarayanan; Simon Stent; Roberto Cipolla

We are interested in automatic scene understanding from geometric cues. To this end, we aim to bring semantic segmentation in the loop of real-time reconstruction. Our semantic segmentation is built on a deep autoencoder stack trained exclusively on synthetic depth data generated from our novel 3D scene library, SynthCam3D. Importantly, our network is able to segment real world scenes without any noise modelling. We present encouraging preliminary results.
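
For illustration, a small encoder-decoder ("autoencoder-style") network for per-pixel labelling of depth maps, the kind of model the abstract refers to, could look like the hedged sketch below. The layer sizes, class count and single-channel depth input are assumptions; the paper's actual architecture and training setup are not reproduced here.

```python
# Hedged sketch of a tiny encoder-decoder network for depth-based per-pixel labelling.
import torch
import torch.nn as nn

class DepthSegNet(nn.Module):
    def __init__(self, num_classes: int = 13):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, depth):                        # depth: (N, 1, H, W)
        return self.decoder(self.encoder(depth))     # per-pixel class logits

logits = DepthSegNet()(torch.rand(1, 1, 64, 64))
print(logits.shape)                                  # torch.Size([1, 13, 64, 64])
```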

Collaboration


Dive into Ankur Handa's collaborations.

Top Co-Authors

Simon Stent
University of Cambridge

Danail Stoyanov
University College London

Ping-Lin Chang
University College London