Victor Adrian Prisacariu
University of Oxford
Publications
Featured research published by Victor Adrian Prisacariu.
International Journal of Computer Vision | 2012
Victor Adrian Prisacariu; Ian D. Reid
We formulate a probabilistic framework for simultaneous region-based 2D segmentation and 2D to 3D pose tracking, using a known 3D model. Given such a model, we aim to maximise the discrimination between statistical foreground and background appearance models, via direct optimisation of the 3D pose parameters. The foreground region is delineated by the zero-level-set of a signed distance embedding function, and we define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities (as opposed to likelihoods). We derive the differentials of this energy with respect to the pose parameters of the 3D object, meaning we can conduct a search for the correct pose using standard gradient-based non-linear minimisation techniques. We propose novel enhancements at the pixel level based on temporal consistency and improved online appearance model adaptation. Furthermore, straightforward extensions of our method lead to multi-camera and multi-object tracking as part of the same framework. The parallel nature of much of the processing in our algorithm means it is amenable to GPU acceleration, and we give details of our real-time implementation, which we use to generate experimental results on both real and artificial video sequences, with a number of 3D models. These experiments demonstrate the benefit of using pixel-wise posteriors rather than likelihoods, and showcase the qualities, such as robustness to occlusions and motion blur (and also some failure modes), of our tracker.
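The contrast between pixel-wise posteriors and likelihoods can be illustrated with a minimal sketch. It assumes a generic Bayes-rule posterior with a single scalar foreground prior, an arctan-smoothed Heaviside step, and the convention that positive embedding values are foreground; the function names and these choices are illustrative assumptions, not the formulation or code from the paper.

```python
import math


def pixel_posteriors(lik_fg, lik_bg, prior_fg):
    """Bayes' rule for one pixel: posterior membership in foreground and background."""
    prior_bg = 1.0 - prior_fg
    evidence = lik_fg * prior_fg + lik_bg * prior_bg
    return lik_fg * prior_fg / evidence, lik_bg * prior_bg / evidence


def smoothed_heaviside(phi, width=1.0):
    """Smooth step in (0, 1); phi > 0 counts as foreground in this sketch (assumed convention)."""
    return 0.5 + math.atan(phi / width) / math.pi


def region_energy(phis, liks_fg, liks_bg, prior_fg=0.5):
    """Negative log of the per-pixel mixture of posteriors, summed over all pixels."""
    energy = 0.0
    for phi, lik_fg, lik_bg in zip(phis, liks_fg, liks_bg):
        post_fg, post_bg = pixel_posteriors(lik_fg, lik_bg, prior_fg)
        h = smoothed_heaviside(phi)
        energy -= math.log(h * post_fg + (1.0 - h) * post_bg)
    return energy
```

In this sketch the energy is lower when the sign of the embedding function agrees with the appearance models, which is the quantity a gradient-based pose search would try to minimise.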
Computer Vision and Pattern Recognition | 2013
Amaury Dame; Victor Adrian Prisacariu; Carl Yuheng Ren; Ian D. Reid
We propose a formulation of monocular SLAM which combines live dense reconstruction with shape-prior-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object-specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for full scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.
IEEE Transactions on Visualization and Computer Graphics | 2015
Olaf Kähler; Victor Adrian Prisacariu; Carl Yuheng Ren; Xin Sun; Philip H. S. Torr; David W. Murray
Volumetric methods provide efficient, flexible and simple ways of integrating multiple depth images into a full 3D model. They provide dense and photorealistic 3D reconstructions, and parallelised implementations on GPUs achieve real-time performance on modern graphics hardware. Running such methods on mobile devices, providing users with freedom of movement and instantaneous reconstruction feedback, remains challenging, however. In this paper we present a range of modifications to existing volumetric integration methods based on voxel block hashing, considerably improving their performance and making them applicable to tablet computer applications. We present (i) optimisations for the basic data structure, and its allocation and integration; (ii) a highly optimised raycasting pipeline; and (iii) extensions to the camera tracker to incorporate IMU data. In total, our system thus achieves frame rates of up to 47 Hz on an Nvidia Shield Tablet and 910 Hz on an Nvidia GTX Titan X GPU, or even beyond 1.1 kHz without visualisation.
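The idea behind voxel block hashing can be sketched as follows. This is a minimal CPU illustration under stated assumptions: a block edge of 8 voxels, a widely used three-prime spatial hash, chained buckets, and a running weighted average for the signed distance values. The class and function names are hypothetical, and none of this is taken from the paper's optimised implementation.

```python
BLOCK_SIZE = 8  # voxels per block edge (an assumed value for this sketch)


def block_hash(bx, by, bz, num_buckets):
    """Spatial hash of integer block coordinates using three large primes."""
    return ((bx * 73856093) ^ (by * 19349669) ^ (bz * 83492791)) % num_buckets


class VoxelBlockMap:
    """Sparse signed-distance volume: blocks are allocated only where data arrives."""

    def __init__(self, num_buckets=1024):
        self.buckets = [[] for _ in range(num_buckets)]

    def _find_or_allocate(self, block):
        bucket = self.buckets[block_hash(*block, len(self.buckets))]
        for key, data in bucket:
            if key == block:
                return data
        n = BLOCK_SIZE ** 3
        data = {"sdf": [0.0] * n, "weight": [0.0] * n}
        bucket.append((block, data))  # collisions are resolved by chaining
        return data

    def integrate(self, x, y, z, sdf, weight=1.0):
        """Fuse one signed-distance observation into the voxel at integer (x, y, z)."""
        block = (x // BLOCK_SIZE, y // BLOCK_SIZE, z // BLOCK_SIZE)
        data = self._find_or_allocate(block)
        i = (x % BLOCK_SIZE) + BLOCK_SIZE * ((y % BLOCK_SIZE) + BLOCK_SIZE * (z % BLOCK_SIZE))
        w_old = data["weight"][i]
        data["sdf"][i] = (w_old * data["sdf"][i] + weight * sdf) / (w_old + weight)
        data["weight"][i] = w_old + weight
        return data["sdf"][i]

    def num_blocks(self):
        return sum(len(bucket) for bucket in self.buckets)
```

Because only blocks that receive observations are allocated, memory scales with the observed surface rather than with the bounding volume, which is what makes the approach attractive on memory-constrained devices.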
Computer Vision and Pattern Recognition | 2011
Victor Adrian Prisacariu; Ian D. Reid
We propose a novel nonlinear, probabilistic and variational method for adding shape information to level set-based segmentation and tracking. Unlike previous work, we represent shapes with elliptic Fourier descriptors and learn their lower-dimensional latent space using Gaussian Process Latent Variable Models. Segmentation is done by a nonlinear minimisation of an image-driven energy function in the learned latent space. We combine it with a 2D pose recovery stage, yielding a single, one-shot optimisation of both shape and pose. We demonstrate the performance of our method, both qualitatively and quantitatively, with multiple images, video sequences and latent spaces, capturing both shape kinematics and object class variance.
Image and Vision Computing | 2012
Victor Adrian Prisacariu; Ian D. Reid
We propose a real-time model-based 3D hand tracker that combines image regions and the signal from an off-the-shelf 3-axis accelerometer placed on the user's hand. The visual regions allow the tracker to cope with occlusions, motion blur and background clutter, while the accelerometer helps resolve the inherent silhouette-pose ambiguities. The accelerometer and tracker are synchronised by casting the calibration problem as one of principal component analysis. Based on the assumption that, often, the number of possible hand configurations is limited by the activity the hand is engaging in, we use a multiclass pose classifier to distinguish between a number of activity-dependent articulated hand configurations. We demonstrate the benefits of our method, both qualitatively and quantitatively, on a variety of video sequences and hand configurations and show a proof-of-concept human-computer interface based on our system.
Computer Graphics Forum | 2015
Ming-Ming Cheng; Victor Adrian Prisacariu; Shuai Zheng; Philip H. S. Torr; Carsten Rother
Figure-ground segmentation from bounding box input, provided either automatically or manually, has been extremely popular in the last decade and has influenced various applications. A lot of research has focused on high-quality segmentation, using complex formulations which often lead to slow techniques and hamper practical usage. In this paper we demonstrate a very fast segmentation technique which still achieves very high quality results. We propose to replace the time-consuming iterative refinement of global colour models in the traditional GrabCut formulation by a densely connected CRF. To motivate this decision, we show that a dense CRF implicitly models unnormalized global colour models for foreground and background. This relationship provides insight that bridges the dense CRF and the GrabCut functional. We extensively evaluate our algorithm using two well-known benchmarks. Our experimental results demonstrate that the proposed algorithm achieves an order of magnitude (10×) speed-up with respect to the closest competitor, and at the same time achieves considerably higher accuracy.
Asian Conference on Computer Vision | 2012
Victor Adrian Prisacariu; Aleksandr V. Segal; Ian D. Reid
We propose a novel framework for joint 2D segmentation and 3D pose and 3D shape recovery, for images coming from a single monocular source. In the past, integration of all three has proven difficult, largely because of the high degree of ambiguity in the 2D-3D mapping. Our solution is to learn nonlinear and probabilistic low-dimensional latent spaces, using the Gaussian Process Latent Variable Model dimensionality reduction technique. These act as class or activity constraints to a simultaneous and variational segmentation-recovery-reconstruction process. We define an image and level-set-based energy function, which we minimise with respect to 3D pose and shape, with the 2D segmentation resulting automatically as the projection of the recovered shape under the recovered pose. We represent 3D shapes as zero levels of 3D level set embedding functions, which we project down directly to probabilistic 2D occupancy maps, without the requirement of an intermediary explicit contour stage. Finally, we detail a fast, open-source, GPU-based implementation of our algorithm, which we use to produce results on both real and artificial video sequences.
International Conference on Computer Vision | 2013
Carl Yuheng Ren; Victor Adrian Prisacariu; David W. Murray; Ian D. Reid
We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction approaches, which either rely on a calibrated/fixed camera setup or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior tracking and reconstruction performance of our method.
International Conference on Computer Vision | 2011
Victor Adrian Prisacariu; Ian D. Reid
We propose a method for simultaneous shape-constrained segmentation and parameter recovery. The parameters can describe anything from 3D shape to 3D pose and we place no restriction on the topology of the shapes, i.e. they can have holes or be made of multiple parts. We use Shared Gaussian Process Latent Variable Models to learn multimodal shape-parameter spaces. These allow non-linear embeddings of the high-dimensional shape and parameter spaces in low-dimensional spaces in a fully probabilistic manner. We propose a method for exploring the multimodality in the joint space in an efficient manner, by learning a mapping from the latent space to a space that encodes the similarity between shapes. We further extend the SGP-LVM to a model that makes use of a hierarchy of embeddings and show that this yields faster convergence and greater accuracy than the standard non-hierarchical embedding. Shapes are represented implicitly using level sets, and inference is made tractable by compressing the level set embedding functions with discrete cosine transforms. We show state-of-the-art results in various fields, ranging from pose recovery to gaze tracking and to monocular 3D reconstruction.
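The compression of a level set embedding function with a discrete cosine transform can be sketched in one dimension. The example below assumes an orthonormal DCT-II and simple truncation to the lowest-frequency coefficients; the function names are hypothetical and the paper's own transform size and coefficient selection are not given in the abstract.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(), i.e. the orthonormal DCT-III."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(total)
    return signal


def compress(signal, keep):
    """Keep the first `keep` low-frequency coefficients, zero the rest, reconstruct."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)


def l2_error(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

A 1D signed distance profile such as `[abs(i - 16) - 6.0 for i in range(32)]` is smooth away from its kink, so most of its energy sits in the low-frequency coefficients; because the transform is orthonormal, keeping more coefficients can never increase the L2 reconstruction error.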
International Symposium on Mixed and Augmented Reality | 2013
Victor Adrian Prisacariu; Olaf Kähler; David W. Murray; Ian D. Reid
A novel framework for joint monocular 3D tracking and reconstruction is described that can handle untextured objects, occlusions, motion blur, changing background and imperfect lighting, and that can run at frame rate on a mobile phone. The method runs in parallel (i) level-set-based pose estimation and (ii) continuous max-flow-based shape optimisation. By avoiding the global computation of distance transforms typically used in level set methods, tracking rates here exceed 100 Hz and 20 Hz on a desktop and mobile phone, respectively, without needing a GPU. Tracking ambiguities are reduced by augmenting the tracker with orientation information from the phone's inertial sensor. Reconstruction involves probabilistic integration of the 2D image statistics from keyframes into a 3D volume. Per-voxel posteriors are used instead of the standard likelihoods, giving increased accuracy and robustness. Shape coherency and compactness are then imposed using a total variational approach solved using globally optimal continuous max flow.