Tommaso Cavallari
University of Bologna
Publications
Featured research published by Tommaso Cavallari.
international conference on computer vision | 2015
Nicholas Brunetto; Samuele Salti; Nicola Fioraio; Tommaso Cavallari; Luigi Di Stefano
Simultaneous Localization and Mapping (SLAM) algorithms have recently been deployed on mobile devices, where they can enable a broad range of novel applications. Nevertheless, pure visual SLAM is inherently weak in environments with few visual features. Indeed, even many recent proposals based on RGB-D sensors cannot properly handle such scenarios, as several steps of the algorithms rely on matching visual features. In this work we propose a framework suitable for mobile platforms that fuses pose estimations obtained from visual and inertial measurements, with the aim of extending the range of scenarios addressable by mobile visual SLAM. The framework deploys an array of Kalman filters in which the careful selection of the state variables and the preprocessing of the inertial sensor measurements result in a simple and effective data fusion process. We present qualitative and quantitative experiments to show the improved SLAM performance delivered by the proposed approach.
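The fusion step described in this abstract can be illustrated with a toy scalar Kalman filter that blends an inertially predicted displacement with a visual pose measurement. This is a minimal sketch for intuition only, not the paper's implementation; all function names and noise values are hypothetical.

```python
def kalman_fuse(x, P, z_visual, R_visual, u_inertial, Q):
    """One predict/update cycle of a scalar Kalman filter.

    x, P        : previous state estimate and its variance
    u_inertial  : displacement predicted by integrating inertial data
    z_visual    : pose measurement coming from the visual SLAM tracker
    Q, R_visual : process and measurement noise variances
    """
    # Predict: propagate the state with the inertial motion estimate.
    x_pred = x + u_inertial
    P_pred = P + Q
    # Update: correct the prediction with the visual measurement.
    K = P_pred / (P_pred + R_visual)        # Kalman gain
    x_new = x_pred + K * (z_visual - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

# Toy example: camera moves ~1 unit per frame along one axis.
x, P = 0.0, 1.0
for z in [1.05, 2.02, 2.98]:
    x, P = kalman_fuse(x, P, z_visual=z, R_visual=0.1, u_inertial=1.0, Q=0.05)
```

In this sketch the inertial data drives the prediction and the visual tracker supplies the correction, so the estimate stays usable even when one source is momentarily unreliable.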
computer vision and pattern recognition | 2017
Tommaso Cavallari; Stuart Golodetz; Nicholas A. Lord; Julien P. C. Valentin; Luigi Di Stefano; Philip H. S. Torr
Camera relocalisation is an important problem in computer vision, with applications in simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques either match the current image against keyframes with known poses coming from a tracker, or establish 2D-to-3D correspondences between keypoints in the current image and points in the scene in order to estimate the camera pose. Recently, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but must be trained offline on the target scene, preventing relocalisation in new environments. In this paper, we show how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. Our adapted forests achieve relocalisation performance that is on par with that of offline forests, and our approach runs in under 150ms, making it desirable for real-time systems that require online relocalisation.
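With an RGB-D sensor, each keypoint also has a camera-space 3D position, so the correspondences between image points and forest-predicted scene points become 3D-to-3D pairs whose relative pose can be solved in closed form. The sketch below shows such a rigid alignment via the Kabsch/SVD method; it is an illustration of one pose-solving building block (real systems wrap a solver like this in a robust RANSAC loop), and all variable names are hypothetical.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) with dst ≈ R @ src + t (Kabsch)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # fix reflection
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

# Toy check: recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
pts_cam = rng.normal(size=(20, 3))                      # camera-space keypoints
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
pts_world = pts_cam @ R_true.T + t_true                 # predicted scene points
R_est, t_est = rigid_align(pts_cam, pts_world)
```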
intelligent robots and systems | 2014
Federico Tombari; Nicola Fioraio; Tommaso Cavallari; Samuele Salti; Alioscia Petrelli; Luigi Di Stefano
This work aims at the automatic detection of man-made pole-like structures in scans of urban environments acquired by a 3D sensor mounted on top of a moving vehicle. Pole-like structures, such as road signs and streetlights, are widespread in these environments, and their reliable detection is relevant to applications dealing with autonomous navigation, facility damage detection, and city planning and maintenance. Yet, due to their characteristic thin shape, man-made pole-like structures are significantly prone to noise as well as to occlusions and clutter, the latter two being pervasive nuisances when scanning urban environments. Our approach is based on a “local” stage, whereby local features are classified and clustered together, followed by a “global” stage aimed at further classification of the candidate entities. The proposed pipeline proves effective in experiments on a standard publicly available dataset as well as on a challenging dataset acquired during the project for validation purposes.
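The local-then-global structure of the pipeline can be sketched very roughly as follows: cluster candidate points on an x-y grid (a stand-in for the local classification stage), then keep only clusters whose extent is tall and thin, as expected of poles (the global stage). Cell size and thresholds are purely illustrative assumptions, not values from the paper.

```python
import numpy as np
from collections import defaultdict

def detect_poles(points, cell=0.5, min_height=2.0, max_footprint=0.8):
    """Toy pole detector: grid-based clustering followed by a shape check."""
    clusters = defaultdict(list)
    for p in points:
        # "Local" stage stand-in: group points by x-y grid cell.
        clusters[(int(p[0] // cell), int(p[1] // cell))].append(p)
    poles = []
    for pts in clusters.values():
        pts = np.asarray(pts)
        extent = pts.max(axis=0) - pts.min(axis=0)      # (dx, dy, dz)
        # "Global" stage stand-in: a pole is tall and has a small footprint.
        if extent[2] >= min_height and max(extent[0], extent[1]) <= max_footprint:
            poles.append(pts.mean(axis=0))              # keep the centroid
    return poles

# Toy scene: one vertical pole plus a strip of flat ground points.
pole = np.column_stack([np.full(30, 3.1), np.full(30, 3.2), np.linspace(0, 3, 30)])
ground = np.column_stack([np.linspace(5, 9, 40), np.linspace(5, 9, 40), np.zeros(40)])
found = detect_poles(np.vstack([pole, ground]))
```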
pacific rim symposium on image and video technology | 2015
Tommaso Cavallari; Luigi Di Stefano
Research on the two topics of Semantic Segmentation and Simultaneous Localization and Mapping (SLAM) has so far followed separate tracks. Here, we link them quite tightly by delineating a category label fusion technique that allows for embedding semantic information into the dense map created by a volume-based SLAM algorithm such as KinectFusion. Accordingly, our approach is the first to provide a semantically labeled dense reconstruction of the environment from a stream of RGB-D images. We validate our proposal using a publicly available semantically annotated RGB-D dataset by (a) employing ground-truth labels, (b) corrupting such annotations with synthetic noise, and (c) deploying a state-of-the-art semantic segmentation algorithm based on Convolutional Neural Networks.
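The label fusion idea can be sketched as a per-voxel histogram of observed category labels, updated from each frame's segmentation, with the voxel's label taken as the most frequent one. This is a minimal illustrative sketch (class names and the counting scheme are assumptions), not the paper's exact fusion rule.

```python
from collections import defaultdict

class LabeledVolume:
    """Per-voxel label histograms fused across frames by majority voting."""
    def __init__(self, num_labels):
        self.num_labels = num_labels
        self.hist = defaultdict(lambda: [0] * num_labels)   # voxel -> counts

    def fuse(self, voxel, label, weight=1):
        """Accumulate one labeled observation of a voxel."""
        self.hist[voxel][label] += weight

    def label_of(self, voxel):
        """Most frequently observed category for this voxel."""
        counts = self.hist[voxel]
        return max(range(self.num_labels), key=counts.__getitem__)

# Three noisy observations of the same voxel: the majority label wins.
vol = LabeledVolume(num_labels=4)
for lbl in [2, 2, 1]:
    vol.fuse(voxel=(10, 4, 7), label=lbl)
```

Accumulating counts rather than overwriting the label is what makes the fusion robust to the synthetic noise and segmentation errors the abstract mentions.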
european conference on computer vision | 2016
Tommaso Cavallari; Luigi Di Stefano
Recent research on 3D reconstruction has delivered reliable and fast pipelines to obtain accurate volumetric maps of large environments. In parallel, we have witnessed dramatic improvements in the field of semantic segmentation of images due to the deployment of deep learning architectures. In this paper, we pursue bridging the semantic gap of purely geometric representations by leveraging a SLAM pipeline and a deep neural network so as to endow surface patches with category labels. In particular, we present the first system that, based on the input stream provided by a commodity RGB-D sensor, can interactively and automatically deliver a map of a large-scale environment featuring both geometric and semantic information. We also show how the significant computational cost inherent to the deployment of a state-of-the-art deep network for semantic labeling does not hinder interactivity, thanks to suitable scheduling of the workload on an off-the-shelf PC platform equipped with two GPUs.
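The scheduling idea behind the interactivity claim can be sketched as a producer-consumer pattern: the reconstruction loop hands frames off to an asynchronous labeling worker (a stand-in for the CNN on the second GPU) and never blocks on it. Thread roles, the subsampling rate, and the placeholder "inference" are all hypothetical.

```python
import queue
import threading

frames = queue.Queue(maxsize=4)      # frames handed off to the labeling worker
labels = {}                          # frame id -> semantic labels (placeholder)

def labeling_worker():
    """Stand-in for the semantic CNN running on a second GPU: consumes
    frames asynchronously so the reconstruction loop never waits on it."""
    while True:
        frame_id = frames.get()
        if frame_id is None:         # sentinel: shut down the worker
            break
        labels[frame_id] = f"labels-for-{frame_id}"   # placeholder inference
        frames.task_done()

worker = threading.Thread(target=labeling_worker)
worker.start()

# Reconstruction loop: integrate every frame, submit only some for labeling.
for frame_id in range(10):
    if frame_id % 3 == 0:            # subsample: the CNN is slower than the tracker
        frames.put(frame_id)

frames.put(None)                     # tell the worker to stop
worker.join()
```

Subsampling the frames sent to the slower network, while the tracker processes every frame, is one simple way to keep the whole pipeline interactive.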
european conference on computer vision | 2016
Tommaso Cavallari; Luigi Di Stefano
Kick-started by the deployment of the well-known KinectFusion, recent research on RGB-D-based dense volume reconstruction has focused on improving different shortcomings of the original algorithm. In this paper we tackle two of them: drift in the camera trajectory caused by the accumulation of small per-frame tracking errors, and the lack of semantic information within the output of the algorithm. Accordingly, we present an extended KinectFusion pipeline that takes into account per-pixel semantic labels gathered from the input frames. Using such cues, we extend the memory structure holding the reconstructed environment so as to store per-voxel information on the kinds of objects likely to appear in each spatial location. We then take this information into account during the camera localization step to increase the accuracy of the estimated camera trajectory. Thus, we realize a SemanticFusion loop whereby per-frame labels help track the camera better and successful tracking makes it possible to consolidate instantaneous semantic observations into a coherent volumetric map.
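One way the stored per-voxel label information could enter the localization step is by weighting each tracking residual according to how well the pixel's observed label agrees with the voxel's label history. The weighting function below is a hypothetical illustration of that idea, not the paper's formulation; the constants are arbitrary.

```python
def residual_weight(observed_label, voxel_histogram):
    """Weight a tracking residual by how consistent the pixel's semantic
    label is with what the map stores for the corresponding voxel."""
    total = sum(voxel_histogram)
    if total == 0:
        return 1.0                   # unlabeled voxel: no semantic evidence
    agreement = voxel_histogram[observed_label] / total
    return 0.5 + 0.5 * agreement     # never fully discard a residual

# A voxel mostly seen as label 2: matching observations count more.
hist = [0, 1, 8, 1]
w_match = residual_weight(2, hist)   # high semantic agreement
w_clash = residual_weight(0, hist)   # likely mismatch, down-weighted
```

Keeping a floor on the weight (here 0.5) means semantics bias the alignment without letting a wrong label veto geometric evidence outright.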
arXiv: Computer Vision and Pattern Recognition | 2017
Victor Adrian Prisacariu; Olaf Kähler; Stuart Golodetz; Michael Sapienza; Tommaso Cavallari; Philip H. S. Torr; David W. Murray
arXiv: Computer Vision and Pattern Recognition | 2018
Tommaso Cavallari; Stuart Golodetz; Nicholas A. Lord; Julien P. C. Valentin; Victor Adrian Prisacariu; Luigi Di Stefano; Philip H. S. Torr
IEEE Transactions on Visualization and Computer Graphics | 2018
Stuart Golodetz; Tommaso Cavallari; Nicholas A. Lord; Victor Adrian Prisacariu; David W. Murray; Philip H. S. Torr
international conference on computer vision | 2017
Daniele De Gregorio; Tommaso Cavallari; Luigi Di Stefano