Julien P. C. Valentin
University of Oxford
Publications
Featured research published by Julien P. C. Valentin.
International Conference on Computer Graphics and Interactive Techniques | 2016
Jonathan Taylor; Lucas Bordeaux; Thomas J. Cashman; Bob Corish; Cem Keskin; Toby Sharp; Eduardo Soto; David Sweeney; Julien P. C. Valentin; Benjamin Luff; Arran Haig Topalian; Erroll Wood; Sameh Khamis; Pushmeet Kohli; Shahram Izadi; Richard Banks; Andrew W. Fitzgibbon; Jamie Shotton
Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems have prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery, followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
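As a rough illustration of the joint model-fitting idea above (optimizing pose and data-to-surface correspondences together, with gradients supplied by a smooth surface), here is a minimal sketch. The toy `surface` function, its parameterisation, and the data are placeholders, not the paper's hand model or energy.

```python
# Minimal sketch of joint pose/correspondence fitting: the pose vector and the
# per-point surface coordinates enter a single non-linear least-squares
# problem. The smooth "surface" below is a toy stand-in, not the hand model.
import numpy as np
from scipy.optimize import least_squares

def surface(theta, uv):
    """Hypothetical smooth surface: pose parameters theta and per-point
    surface coordinates uv (N x 2) map to 3D points (N x 3)."""
    centre, radius = theta[:3], 1.0 + 0.1 * np.tanh(theta[3])
    phi, psi = uv[:, 0], uv[:, 1]
    return centre + radius * np.stack([np.cos(phi) * np.cos(psi),
                                       np.sin(phi) * np.cos(psi),
                                       np.sin(psi)], axis=1)

def residuals(x, data, n_pose):
    theta, uv = x[:n_pose], x[n_pose:].reshape(-1, 2)
    return (surface(theta, uv) - data).ravel()

data = np.random.randn(200, 3)              # observed depth points (toy data)
x0 = np.concatenate([np.zeros(4),           # pose initialised by a detector
                     np.zeros(200 * 2)])    # initial surface correspondences
# Pose AND correspondences are optimised jointly; the smooth surface provides
# the gradients needed by a standard non-linear least-squares solver.
fit = least_squares(residuals, x0, args=(data, 4), max_nfev=30)
```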
Computer Vision and Pattern Recognition | 2013
Julien P. C. Valentin; Sunando Sengupta; Jonathan Warrell; Ali Shahrokni; Philip H. S. Torr
Semantic reconstruction of a scene is important for a variety of applications such as 3D modelling, object recognition and autonomous robotic navigation. However, most object labelling methods work in the image domain and fail to capture the information present in 3D space. In this work, we propose a principled way to generate object labellings in 3D. Our method builds a triangulated mesh representation of the scene from multiple depth estimates. We then define a CRF over this mesh, which is able to capture the consistency of geometric properties of the objects present in the scene. In this framework, we are able to generate object hypotheses by combining information from multiple sources: geometric properties (from the 3D mesh) and appearance properties (from images). We demonstrate the robustness of our framework in both indoor and outdoor scenes. For indoor scenes, we created an augmented version of the NYU indoor scene dataset (RGBD images) with object-labelled meshes for training and evaluation. For outdoor scenes, we created ground-truth object labellings for the KITTI odometry dataset (stereo image sequence). We observe a significant speed-up in the inference stage by performing labelling on the mesh, and additionally achieve higher accuracies.
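To make the mesh-CRF formulation concrete, a toy sketch follows: per-face unary costs (from appearance and geometry) plus a Potts smoothness term over adjacent faces, with simple ICM inference standing in for the move-making or graph-cut inference a real system would use. All features and weights are illustrative placeholders.

```python
# Toy CRF over mesh faces: unary costs combine appearance and geometry,
# pairwise Potts terms encourage neighbouring faces with similar normals to
# share a label. Values and weights are illustrative only.
import numpy as np

def mesh_crf_energy(labels, unary, adjacency, normals, smooth_w=1.0):
    """labels: (F,) ints, unary: (F, L) costs, adjacency: list of (i, j) pairs,
    normals: (F, 3) unit face normals."""
    e = unary[np.arange(len(labels)), labels].sum()
    for i, j in adjacency:
        if labels[i] != labels[j]:
            # Potts penalty, weaker across sharp creases (dissimilar normals).
            e += smooth_w * max(0.0, float(normals[i] @ normals[j]))
    return e

def icm(unary, adjacency, normals, iters=5):
    """Very simple iterated conditional modes inference, a stand-in for the
    stronger inference a real system would use."""
    labels = unary.argmin(axis=1)
    for _ in range(iters):
        for f in range(len(labels)):
            trial = labels.copy()
            costs = []
            for l in range(unary.shape[1]):
                trial[f] = l
                costs.append(mesh_crf_energy(trial, unary, adjacency, normals))
            labels[f] = int(np.argmin(costs))
    return labels
```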
ACM Transactions on Graphics | 2015
Julien P. C. Valentin; Vibhav Vineet; Ming-Ming Cheng; David Kim; Jamie Shotton; Pushmeet Kohli; Matthias Nießner; Antonio Criminisi; Shahram Izadi; Philip H. S. Torr
We present a new interactive and online approach to 3D scene understanding. Our system, SemanticPaint, allows users to simultaneously scan their environment whilst interactively segmenting the scene simply by reaching out and touching any desired object or surface. Our system continuously learns from these segmentations, and labels new unseen parts of the environment. Unlike offline systems where capture, labeling, and batch learning often take hours or even days to perform, our approach is fully online. This provides users with continuous live feedback of the recognition during capture, allowing them to immediately correct errors in the segmentation and/or learning, a feature that has so far been unavailable to batch and offline methods. This leads to models that are tailored or personalized specifically to the user's environment and object classes of interest, opening up the potential for new applications in augmented reality, interior design, and human/robot navigation. It also provides the ability to capture substantial labeled 3D datasets for training large-scale visual recognition systems.
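The interactive loop can be sketched as follows; the real system uses streaming decision forests and a dense CRF over the volumetric map, whereas this sketch uses a simple incremental linear classifier (scikit-learn's `SGDClassifier`) purely to illustrate the touch-to-label, learn-online, relabel-continuously flow. The feature function and class set are assumptions.

```python
# Sketch of the online loop: touch interactions provide labels, an incremental
# classifier is updated, and newly scanned geometry is relabelled every frame.
import numpy as np
from sklearn.linear_model import SGDClassifier

CLASSES = np.arange(5)                       # hypothetical object classes
clf = SGDClassifier()
have_labels = False

def voxel_features(voxels):
    """Placeholder per-voxel features (colour, normal, height above floor...)."""
    return np.asarray(voxels, dtype=float)

def on_touch(voxels, label):
    """User touches a surface: its voxels become labelled training data."""
    global have_labels
    clf.partial_fit(voxel_features(voxels),
                    np.full(len(voxels), label), classes=CLASSES)
    have_labels = True

def label_frame(new_voxels):
    """Label newly scanned geometry with the current model (live feedback)."""
    if not have_labels:
        return np.zeros(len(new_voxels), dtype=int)   # nothing learnt yet
    return clf.predict(voxel_features(new_voxels))
```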
Computer Vision and Pattern Recognition | 2015
Julien P. C. Valentin; Matthias Nießner; Jamie Shotton; Andrew W. Fitzgibbon; Shahram Izadi; Philip H. S. Torr
Recent advances in camera relocalization use predictions from a regression forest to guide the camera pose optimization procedure. In these methods, each tree associates one pixel with a point in the scene's 3D world coordinate frame. In previous work, these predictions were point estimates and the subsequent camera pose optimization implicitly assumed an isotropic distribution of these estimates. In this paper, we train a regression forest to predict mixtures of anisotropic 3D Gaussians and show how the predicted uncertainties can be taken into account for continuous pose optimization. Experiments show that our proposed method is able to relocalize up to 40% more frames than the state of the art.
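A small sketch of how predicted anisotropic uncertainties can enter continuous pose optimization: each correspondence contributes a Mahalanobis (covariance-whitened) residual rather than an isotropic one. The regression forest itself is not shown, and all inputs below are assumed placeholders.

```python
# Uncertainty-aware pose optimisation: each forest prediction is a mean and a
# 3x3 covariance; residuals are whitened by the inverse covariance so that
# confident predictions count more and their anisotropy is respected.
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.optimize import least_squares

def residuals(pose, cam_pts, means, chol_factors):
    """pose = [rotvec(3), t(3)]; cam_pts: camera-space 3D points from depth;
    means: predicted scene-space means; chol_factors: L with L L^T = Sigma^-1."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    t = pose[3:]
    world = cam_pts @ R.T + t
    # Whitened residual L^T (x_world - mu) gives the Mahalanobis distance.
    return np.concatenate([L.T @ (w - m)
                           for w, m, L in zip(world, means, chol_factors)])

def relocalise(cam_pts, means, covs, pose0=None):
    pose0 = np.zeros(6) if pose0 is None else pose0
    chol_factors = [np.linalg.cholesky(np.linalg.inv(S)) for S in covs]
    return least_squares(residuals, pose0,
                         args=(cam_pts, means, chol_factors)).x
```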
International Conference on Robotics and Automation | 2016
Olaf Kähler; Victor Adrian Prisacariu; Julien P. C. Valentin; David W. Murray
Many modern 3D reconstruction methods accumulate information volumetrically using truncated signed distance functions. While this usually imposes a regular grid with fixed voxel size, not all parts of a scene necessarily need to be represented at the same level of detail. For example, a flat table needs less detail than a highly structured keyboard on it. We introduce a novel representation for the volumetric 3D data that uses hash functions rather than trees for accessing individual blocks of the scene, but which still provides different resolution levels. We show that our data structure provides efficient access and manipulation functions that can be very well parallelised, and also describe an automatic way of choosing appropriate resolutions for different parts of the scene. We embed the novel representation in a system for simultaneous localization and mapping from RGB-D imagery and also investigate the implications of the irregular grid on interpolation routines. Finally, we evaluate our system in experiments, demonstrating state-of-the-art representation accuracy at typical frame-rates around 100 Hz, along with 40% memory savings.
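A simplified sketch of the hashed, multi-resolution voxel-block idea: fixed-size blocks live in a hash table keyed by (resolution level, block coordinates), so fine resolution is only allocated where the scene needs it. Block size, voxel size, and the level-selection policy here are illustrative, not the paper's exact scheme.

```python
# Hash-based multi-resolution TSDF blocks: a dictionary (hash table) maps
# (level, block coordinates) to a fixed 8^3 block of voxels, so coarse and
# fine regions coexist without a global regular grid or a tree.
import numpy as np

BLOCK = 8                                    # voxels per block side
BASE_VOXEL = 0.005                           # finest voxel size in metres

class MultiResTSDF:
    def __init__(self, levels=3):
        self.levels = levels
        self.blocks = {}                     # hash table: key -> voxel data

    def _key(self, p, level):
        voxel = BASE_VOXEL * (2 ** level)
        block_size = voxel * BLOCK
        return (level, tuple(np.floor(np.asarray(p) / block_size).astype(int)))

    def allocate(self, p, level):
        """Allocate a block at a chosen resolution around point p, e.g. coarse
        for flat regions, fine for highly structured ones."""
        self.blocks.setdefault(self._key(p, level),
                               np.zeros((BLOCK, BLOCK, BLOCK), dtype=np.float32))

    def lookup(self, p):
        """Return the finest-resolution block covering p, if any exists."""
        for level in range(self.levels):
            block = self.blocks.get(self._key(p, level))
            if block is not None:
                return level, block
        return None
```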
Computer Vision and Pattern Recognition | 2017
Tommaso Cavallari; Stuart Golodetz; Nicholas A. Lord; Julien P. C. Valentin; Luigi Di Stefano; Philip H. S. Torr
Camera relocalisation is an important problem in computer vision, with applications in simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques either match the current image against keyframes with known poses coming from a tracker, or establish 2D-to-3D correspondences between keypoints in the current image and points in the scene in order to estimate the camera pose. Recently, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but must be trained offline on the target scene, preventing relocalisation in new environments. In this paper, we show how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. Our adapted forests achieve relocalisation performance that is on par with that of offline forests, and our approach runs in under 150ms, making it desirable for real-time systems that require online relocalisation.
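The on-the-fly adaptation can be pictured as follows: the split structure learnt offline is frozen, the leaves are emptied, and leaf predictions are refilled from pixel-to-scene-point examples gathered in the new scene. The tree internals below are simplified placeholders, not the actual forest implementation.

```python
# Sketch of adapting a pre-trained tree to a new scene online: keep the frozen
# split functions, clear the leaves, and refill them with (feature -> 3D scene
# point) examples from the new environment as they arrive.
import numpy as np

class AdaptedTree:
    def __init__(self, pretrained_split_fn):
        self.split = pretrained_split_fn     # frozen: learnt offline elsewhere
        self.leaves = {}                     # leaf id -> list of 3D points

    def leaf_id(self, feature):
        """Route a pixel's feature vector to a leaf via the frozen splits
        (hypothetical callable returning an integer leaf id)."""
        return self.split(feature)

    def adapt(self, feature, scene_point):
        """Online: add a new-scene example to the reached leaf."""
        self.leaves.setdefault(self.leaf_id(feature), []).append(scene_point)

    def predict(self, feature):
        """Predict a candidate 3D scene point for a pixel (leaf mean here;
        a real forest would keep several modes per leaf)."""
        pts = self.leaves.get(self.leaf_id(feature), [])
        return np.mean(pts, axis=0) if pts else None
```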
Computer Vision and Pattern Recognition | 2017
Sean Ryan Fanello; Julien P. C. Valentin; Christoph Rhemann; Adarsh Kowdle; Vladimir Tankovich; Philip L. Davidson; Shahram Izadi
Efficient estimation of depth from pairs of stereo images is one of the core problems in computer vision. We efficiently solve the specialized problem of stereo matching under active illumination using a new learning-based algorithm. This type of active stereo, i.e. stereo matching where scene texture is augmented by an active light projector, is proving compelling for designing depth cameras, largely due to improved robustness when compared to time-of-flight or traditional structured-light techniques. Our algorithm uses an unsupervised greedy optimization scheme that learns features that are discriminative for estimating correspondences in infrared images. The proposed method optimizes a series of sparse hyperplanes that are used at test time to remap all the image patches into a compact binary representation in O(1). The proposed algorithm is cast in a PatchMatch Stereo-like framework, producing depth maps at 500 Hz. In contrast to standard structured-light methods, our approach generalizes to different scenes, does not require tedious per-camera calibration procedures and is not adversely affected by interference from overlapping sensors. Extensive evaluations show we surpass the quality and overcome the limitations of current depth sensing technologies.
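A toy sketch of the binary-descriptor idea: each patch is projected onto a few sparse hyperplanes and thresholded into a k-bit code, and matching compares codes by Hamming distance along the scanline. The hyperplanes here are random stand-ins for the learnt, discriminative ones, and the exhaustive disparity search stands in for the PatchMatch-style inference.

```python
# Patches -> compact binary codes via sparse hyperplanes, then Hamming-distance
# matching along the epipolar line. Hyperplanes are random placeholders for
# the ones a learnt model would provide.
import numpy as np

PATCH, K = 11, 32                            # patch side length, code bits
rng = np.random.default_rng(0)
# "Sparse" hyperplanes: only a few non-zero weights per hyperplane.
W = np.zeros((K, PATCH * PATCH))
for k in range(K):
    idx = rng.choice(PATCH * PATCH, size=8, replace=False)
    W[k, idx] = rng.standard_normal(8)

def binary_code(patch):
    """Map a patch to a K-bit code; cost depends only on K and the sparsity,
    i.e. effectively constant per pixel."""
    return (W @ patch.ravel() > 0).astype(np.uint8)

def disparity(code_left, codes_right, x, max_disp=64):
    """Pick the disparity whose right-image code is closest in Hamming distance."""
    best_d, best_cost = 0, K + 1
    for d in range(min(max_disp, x) + 1):
        cost = int(np.count_nonzero(code_left ^ codes_right[x - d]))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```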
British Machine Vision Conference | 2015
Anurag Arnab; Michael Sapienza; Stuart Golodetz; Julien P. C. Valentin; Ondrej Miksik; Shahram Izadi; Philip H. S. Torr
It is not always possible to recognise objects and infer material properties for a scene from visual cues alone, since objects can look visually similar whilst being made of very different materials. In this paper, we therefore present an approach that augments the available dense visual cues with sparse auditory cues in order to estimate dense object and material labels. Since estimates of object class and material properties are mutually informative, we optimise our multi-output labelling jointly using a random-field framework. We evaluate our system on a new dataset with paired visual and auditory data that we make publicly available. We demonstrate that this joint estimation of object and material labels significantly outperforms the estimation of either category in isolation.
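The joint labelling can be summarised by an energy of the following shape: dense visual unaries over object labels, sparse auditory unaries over material labels at the tapped locations, and a compatibility term coupling the two label fields. The sketch below is illustrative only; the actual random-field potentials and inference are not reproduced here.

```python
# Toy joint object/material energy: dense visual unaries, sparse auditory
# unaries (only where the scene was tapped), and an object-material
# compatibility term that makes the two outputs mutually informative.
import numpy as np

def joint_energy(obj_labels, mat_labels, obj_unary, audio_unary, compat):
    """obj_unary: (P, O) visual costs per pixel; audio_unary: dict mapping a
    tapped pixel index to (M,) material costs; compat: (O, M) coupling costs;
    obj_labels, mat_labels: (P,) current label assignments."""
    e = obj_unary[np.arange(len(obj_labels)), obj_labels].sum()
    for p, costs in audio_unary.items():          # sparse auditory evidence
        e += costs[mat_labels[p]]
    e += compat[obj_labels, mat_labels].sum()      # object-material coupling
    return e
```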
International Conference on Image Processing | 2016
Shrenik Lad; Bernardino Romera Paredes; Julien P. C. Valentin; Philip H. S. Torr; Devi Parikh
Learning attribute models for applications like Zero-Shot Learning (ZSL) and image search is challenging because they require attribute classifiers to generalize to test data that may be very different from the training data. A typical scenario is when the notion of an attribute differs from one user to another, e.g. one user may find a shoe formal whereas another user may not. In this case, the distribution of labels at test time is different from that at training time. We argue that, due to the uncertainty in what the test distribution might be, committing to one attribute model during training is not advisable. We propose a novel framework for attribute learning which involves training an ensemble of diverse models for attributes and identifying experts from them at test time, given a small amount of personalized annotations from a user. Our approach for attribute personalization is not specific to any classification model, and we show results using Random Forest and SVM ensembles. We experiment with two datasets, SUN Attributes and Shoes, and show significant improvements over baselines.
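A sketch of the personalisation recipe: train a diverse ensemble offline, then at test time score each member on the user's few annotations and keep the top-scoring experts. The diversity mechanism (bootstrap resampling) and the majority-vote combination below are assumptions used only to illustrate the flow, not the paper's exact procedure.

```python
# Ensemble of attribute classifiers + expert selection from a handful of
# user-specific annotations. Random forests stand in for any base classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_ensemble(X, y, n_models=10, seed=0):
    """Diversity via bootstrap resampling (a stand-in for the paper's scheme)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=len(X), replace=True)
        models.append(RandomForestClassifier(n_estimators=50).fit(X[idx], y[idx]))
    return models

def pick_experts(models, X_user, y_user, top_k=3):
    """Score each model on the user's few annotations, keep the best ones."""
    scores = [m.score(X_user, y_user) for m in models]
    return [models[i] for i in np.argsort(scores)[::-1][:top_k]]

def personalised_predict(experts, X):
    """Majority vote of the selected experts (binary attribute labels 0/1)."""
    votes = np.stack([m.predict(X) for m in experts])
    return (votes.mean(axis=0) > 0.5).astype(int)
```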
International Conference on 3D Vision | 2016
Julien P. C. Valentin; Angela Dai; Matthias Niessner; Pushmeet Kohli; Philip H. S. Torr; Shahram Izadi; Cem Keskin