Thomas A. Ciarfuglia
University of Perugia
Publications
Featured research published by Thomas A. Ciarfuglia.
international conference on robotics and automation | 2016
Gabriele Costante; Michele Mancini; Paolo Valigi; Thomas A. Ciarfuglia
Visual ego-motion estimation, or briefly visual odometry (VO), is one of the key building blocks of modern SLAM systems. In the last decade, impressive results have been demonstrated in the context of visual navigation, reaching very high localization performance. However, all ego-motion estimation systems require careful parameter tuning procedures for the specific environment they have to work in. Furthermore, even in ideal scenarios, most state-of-the-art approaches fail to handle image anomalies and imperfections, which results in less robust estimates. VO systems that rely on geometrical approaches extract sparse or dense features and match them to perform frame-to-frame (F2F) motion estimation. However, images contain much more information that can be used to further improve the F2F estimation. To learn new feature representations, a very successful approach is to use deep convolutional neural networks. Inspired by recent advances in deep networks and by previous work on learning methods applied to VO, we explore the use of convolutional neural networks to learn both the best visual features and the best estimator for the task of visual ego-motion estimation. With experiments on publicly available datasets, we show that our approach is robust with respect to blur, luminance, and contrast anomalies and outperforms most state-of-the-art approaches even in nominal conditions.
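To make the general idea concrete, a minimal sketch of a CNN that regresses frame-to-frame motion from a pair of consecutive frames is shown below; the layer sizes and the 6-DoF output parametrization are illustrative assumptions, not the architecture used in the paper.

    # Minimal sketch (not the paper's network): a CNN that maps a pair of
    # consecutive grayscale frames, stacked on the channel axis, to a
    # 6-DoF frame-to-frame motion estimate (translation + rotation).
    import torch
    import torch.nn as nn

    class EgoMotionCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(2, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.regressor = nn.Linear(128, 6)   # [tx, ty, tz, roll, pitch, yaw]

        def forward(self, frame_pair):           # frame_pair: (B, 2, H, W)
            x = self.features(frame_pair).flatten(1)
            return self.regressor(x)

    # Training would minimize e.g. an L2 loss against ground-truth odometry:
    # loss = nn.functional.mse_loss(model(pair), gt_motion)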
intelligent robots and systems | 2013
Gabriele Costante; Thomas A. Ciarfuglia; Paolo Valigi; Elisa Ricci
As researchers strive to develop robotic systems able to move in the wild, interest in novel learning paradigms for domain adaptation has increased. In the specific application of semantic place recognition from cameras, supervised learning algorithms are typically adopted. However, once learning has been performed, if the robot is moved to another location the acquired knowledge may not be useful, as the novel scenario can be very different from the old one. The obvious solution would be to retrain the model, updating the robot's internal representation of the environment. Unfortunately, this procedure involves a very time-consuming data-labeling effort on the human side. To avoid these issues, in this paper we propose a novel transfer learning approach for place categorization from visual cues. With our method the robot is able to decide automatically if and how much of its internal knowledge is useful in the novel scenario. Differently from previous approaches, we consider the situation where the old and the novel scenario may differ significantly (not only does the visual room appearance change, but different room categories are also present). Importantly, our approach does not require labeling from a human operator. We also propose a strategy for improving the performance of the proposed method by fusing two complementary visual cues. Our extensive experimental evaluation demonstrates the advantages of our approach on several sequences from publicly available datasets.
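As a toy illustration of two ingredients mentioned in the abstract, reuse of previously learned classifiers only when they are judged useful and fusion of two complementary cues, one might sketch the decision step as follows; the averaging rule and the confidence threshold are assumptions, not the paper's actual transfer criterion.

    # Toy sketch (not the paper's algorithm): fuse two cue-specific classifiers
    # trained in the old environment and keep a prediction for the new
    # environment only when the fused confidence is high enough.
    import numpy as np

    def fused_label(p_cue_a, p_cue_b, threshold=0.7):
        """p_cue_a, p_cue_b: per-class probability vectors from two visual cues."""
        fused = 0.5 * (np.asarray(p_cue_a) + np.asarray(p_cue_b))
        best = int(np.argmax(fused))
        if fused[best] >= threshold:
            return best          # old knowledge judged useful: keep the label
        return None              # too uncertain: leave unlabeled / adapt instead

    # Example: the two cues agree on class 2 with high confidence.
    print(fused_label([0.1, 0.2, 0.7], [0.05, 0.15, 0.8]))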
workshop on environmental energy and structural monitoring systems | 2012
Lorenzo Porzi; Elisa Ricci; Thomas A. Ciarfuglia; Michele Zanin
Augmented Reality (AR) aims to enhance a person's vision of the real world with useful information about the surrounding environment. Amongst all the possible applications, AR systems can be very useful as visualization tools for structural and environmental monitoring. While the large majority of AR systems run on a laptop or on a head-mounted device, the advent of smartphones has created new opportunities. One of the most important functionalities of an AR system is the ability of the device to self-localize. This can be achieved through visual odometry, a very challenging task for a smartphone. Indeed, in most of the available smartphone AR applications, self-localization is achieved through GPS and/or inertial sensors. Hence, developing an AR system on a mobile phone also poses new challenges due to the limited amount of computational resources. In this paper we describe the development of an ego-motion estimation algorithm for an Android smartphone. We also present an approach based on an Extended Kalman Filter for improving localization accuracy by integrating the information from inertial sensors. The implemented solution achieves a localization accuracy comparable to the PC implementation while running on an Android device.
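A minimal sketch of the kind of Extended Kalman Filter fusion described above, with a constant-velocity prediction corrected by visual-odometry position measurements, could look like this; the state layout and the noise values are assumptions for illustration only.

    # Minimal EKF sketch (illustrative, not the paper's filter): constant-velocity
    # prediction driven at the IMU rate, corrected by 2D visual-odometry positions.
    import numpy as np

    class SimpleEKF:
        def __init__(self, dt):
            self.x = np.zeros(4)                      # [px, py, vx, vy]
            self.P = np.eye(4)
            self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
            self.Q = 0.01 * np.eye(4)                 # process noise (assumed)
            self.H = np.array([[1., 0., 0., 0.],
                               [0., 1., 0., 0.]])     # VO measures position only
            self.R = 0.05 * np.eye(2)                 # measurement noise (assumed)

        def predict(self):
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q

        def update(self, z_vo):
            y = np.asarray(z_vo) - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P

    ekf = SimpleEKF(dt=0.02)
    ekf.predict()                 # propagate at the inertial rate
    ekf.update([0.1, 0.05])       # correct with a VO position fix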
intelligent robots and systems | 2016
Michele Mancini; Gabriele Costante; Paolo Valigi; Thomas A. Ciarfuglia
Obstacle detection is a central problem for any robotic system, and critical for autonomous systems that travel at high speeds in unpredictable environments. This is often achieved through scene depth estimation, by various means. When fast motion is considered, the detection range must be long enough to allow for safe avoidance and path planning. Current solutions often make assumptions about the motion of the vehicle that limit their applicability, or work at very limited ranges due to intrinsic constraints. We propose a novel appearance-based obstacle detection system that is able to detect obstacles at very long range and at a very high speed (~300 Hz), without making assumptions about the type of motion. We achieve these results using a Deep Neural Network approach trained on real and synthetic images, trading some depth accuracy for fast, robust and consistent operation. We show how photo-realistic synthetic images are able to solve the problems of training-set size and variety that are typical of machine learning approaches, and how our system is robust to massive blurring of test images.
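As a rough illustration of the appearance-based strategy (not the paper's actual network), a small fully convolutional model that regresses a coarse depth map from a single RGB image might be sketched as follows; all layer sizes are assumed.

    # Sketch of a small fully-convolutional depth regressor of the kind the
    # abstract describes (layer sizes are assumptions, not the paper's network).
    import torch
    import torch.nn as nn

    class TinyDepthNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # coarse depth map
            )

        def forward(self, rgb):                  # rgb: (B, 3, H, W)
            return self.decoder(self.encoder(rgb))

    # Training on a mix of synthetic images (cheap ground truth) and real images,
    # e.g. with an L1 loss on depth, is one way to realize the strategy above.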
Robotics and Autonomous Systems | 2014
Thomas A. Ciarfuglia; Gabriele Costante; Paolo Valigi; Elisa Ricci
Visual Odometry (VO) is one of the fundamental building blocks of modern autonomous robot navigation and mapping. While most state-of-the-art techniques use geometrical methods for camera ego-motion estimation from optical flow vectors, in the last few years learning approaches have been proposed to solve this problem. These approaches are emerging and there is still much to explore. This work follows this track, applying Kernel Machines to monocular visual ego-motion estimation. Unlike geometrical methods, learning-based approaches to monocular visual odometry allow issues like scale estimation and camera calibration to be overcome, assuming the availability of training data. While some previous works have proposed learning paradigms for VO, to our knowledge no extensive evaluation of applying kernel-based methods to Visual Odometry has been conducted. To fill this gap, in this work we consider publicly available datasets and perform several experiments in order to set a comparison baseline with traditional techniques. Experimental results show good performance of the learning algorithms and establish them as a solid alternative to geometrical techniques, which are computationally intensive and complex to implement. We stress the advantages of non-geometric (learned) VO as an alternative or an addition to standard geometric methods. Ego-motion is computed with state-of-the-art regression techniques, namely Support Vector Machines (SVMs) and Gaussian Processes (GPs). To our knowledge, this is the first time SVMs have been applied to the VO problem. We conduct an extensive evaluation on three publicly available datasets, spanning both indoor and outdoor environments. The experiments show that non-geometric VO is a good alternative, or addition, to standard VO systems.
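A minimal sketch of the learning-based VO idea with off-the-shelf kernel regressors (scikit-learn SVR and Gaussian Process regression) is shown below; the optical-flow feature representation is a random placeholder, and nothing here reproduces the paper's exact pipeline.

    # Illustrative sketch: regress one ego-motion component from optical-flow
    # feature vectors with an SVR and a GP (scikit-learn). Features and targets
    # are random placeholders standing in for flow descriptors and odometry.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 50))      # placeholder flow-derived features
    y_train = rng.normal(size=200)            # e.g. frame-to-frame yaw rate

    svr = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)
    gp = GaussianProcessRegressor().fit(X_train, y_train)

    X_test = rng.normal(size=(5, 50))
    print(svr.predict(X_test))
    mean, std = gp.predict(X_test, return_std=True)   # the GP also gives uncertainty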
intelligent robots and systems | 2012
Thomas A. Ciarfuglia; Gabriele Costante; Paolo Valigi; Elisa Ricci
The place recognition module is a fundamental component in SLAM systems, as incorrect loop closures may result in severe errors in trajectory estimation. In the case of appearance-based methods, the bag-of-words approach is typically employed for recognizing locations. This paper introduces a novel algorithm for improving loop closure detection performance by adopting a set of visual word weights, learned offline according to a discriminative criterion. The proposed weight learning approach, based on the large margin paradigm, can be used for generic similarity functions and relies on an efficient online learning algorithm in the training phase. As the computed weights are usually very sparse, a gain in terms of computational cost at recognition time is also obtained. Our experiments, conducted on publicly available datasets, demonstrate that the discriminative weights lead to loop closure detection results that are more accurate than the traditional bag-of-words method and that our place recognition approach is competitive with state-of-the-art methods.
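The core scoring step can be sketched as a weighted bag-of-words similarity, where the sparse per-word weights would come from the offline large-margin learning stage; the weights and histograms below are made-up values for illustration only.

    # Sketch of the core idea: score a candidate loop closure with a weighted
    # bag-of-words similarity, where the (sparse) per-word weights would come
    # from an offline large-margin learning stage.
    import numpy as np

    def weighted_bow_similarity(hist_a, hist_b, word_weights):
        """Weighted intersection of two normalized visual-word histograms."""
        return float(np.sum(word_weights * np.minimum(hist_a, hist_b)))

    vocab_size = 8
    w = np.zeros(vocab_size); w[[1, 4]] = [2.0, 1.5]   # sparse learned weights (made up)
    query = np.array([0.1, 0.3, 0.0, 0.1, 0.4, 0.0, 0.1, 0.0])
    cand  = np.array([0.0, 0.25, 0.1, 0.0, 0.5, 0.05, 0.1, 0.0])
    print(weighted_bow_similarity(query, cand, w))     # higher = more likely loop closure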
international conference on robotics and automation | 2017
Michele Mancini; Gabriele Costante; Paolo Valigi; Thomas A. Ciarfuglia; Jeffrey A. Delmerico; Davide Scaramuzza
Modern autonomous mobile robots require a strong understanding of their surroundings in order to safely operate in cluttered and dynamic environments. Monocular depth estimation offers a geometry-independent paradigm to detect free, navigable space with minimal space and power consumption. These are highly desirable features, especially for micro aerial vehicles. In order to guarantee robust operation in real-world scenarios, the estimator is required to generalize well in diverse environments. Most existing depth estimators do not consider generalization, and only benchmark their performance on publicly available datasets after specific fine-tuning. Generalization can be achieved by training on several heterogeneous datasets, but their collection and labeling are costly. In this letter, we propose a deep neural network for scene depth estimation that is trained on synthetic datasets, which allow inexpensive generation of ground truth data. We show how this approach is able to generalize well across different scenarios. In addition, we show how the addition of long short-term memory layers in the network helps to alleviate, in sequential image streams, some of the intrinsic limitations of monocular vision, such as global scale estimation, with low computational overhead. We demonstrate that the network is able to generalize well with respect to different real-world environments without any fine-tuning, achieving performance comparable to state-of-the-art methods on the KITTI dataset.
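A minimal sketch of combining per-frame CNN features with an LSTM, so that estimation can exploit the temporal context of an image sequence, follows; the shapes and the single-value output head are assumptions, not the network described in the letter.

    # Sketch (assumed shapes, not the paper's architecture): per-frame CNN
    # features fed to an LSTM so that depth/scale estimation can use the
    # temporal context of a sequential image stream.
    import torch
    import torch.nn as nn

    class SeqDepthBackbone(nn.Module):
        def __init__(self, feat_dim=128):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
            self.head = nn.Linear(feat_dim, 1)          # e.g. a global scale / mean depth

        def forward(self, frames):                      # frames: (B, T, 3, H, W)
            b, t = frames.shape[:2]
            f = self.cnn(frames.flatten(0, 1)).flatten(1).view(b, t, -1)
            h, _ = self.lstm(f)                         # carries context across time
            return self.head(h)                         # one estimate per time step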
ieee international smart cities conference | 2016
Enrico Bellocchio; Gabriele Costante; Silvia Cascianelli; Paolo Valigi; Thomas A. Ciarfuglia
With this paper we present the SmartSEAL interconnection system developed for the nationally funded SEAL project. SEAL is a research project aimed at developing Home Automation (HA) solutions for building energy management, user customization and improved safety of its inhabitants. One of the main problems of HA systems is the wide range of communication standards that commercial devices use. Usually this forces the designer to choose devices from a few brands, limiting the scope of the system and its capabilities. In this context, SmartSEAL is a framework that aims to integrate heterogeneous devices, such as sensors and actuators from different vendors, providing networking features, protocols and interfaces that are easy to implement and dynamically configurable. The core of our system is a robotics middleware called Robot Operating System (ROS). We adapted the ROS features to the HA problem, designing the network and protocol architectures for these particular needs. This software infrastructure allows for complex HA functions that can be realized only by leveraging the services provided by different devices. The system has been tested in our laboratory and installed in two real environments, Palazzo Fogazzaro in Schio and the “Le Case” childhood school in Malo. Since one of the aims of the SEAL project is the personalization of the building environment according to user needs, and the learning of their patterns of behaviour, in the final part of this work we also describe the ongoing design and experiments to provide a Machine Learning based re-identification module implemented with Convolutional Neural Networks (CNNs). The description of the adaptation module complements the description of the SmartSEAL system and helps in understanding how to develop complex HA services with it.
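As an illustration of how a ROS-based device adapter of this kind can expose a sensor reading and accept actuator commands over topics, a minimal rospy node is sketched below; the topic names, message types and rate are assumptions, not the actual SmartSEAL interfaces.

    #!/usr/bin/env python
    # Minimal rospy sketch of the kind of device adapter a ROS-based HA system
    # builds on: publish a sensor reading and listen for actuator commands.
    # Topic names and message types are illustrative assumptions.
    import rospy
    from std_msgs.msg import Float32, Bool

    def on_command(msg):
        rospy.loginfo("actuator command received: %s", msg.data)

    def main():
        rospy.init_node("ha_device_adapter")
        temp_pub = rospy.Publisher("/home/livingroom/temperature", Float32, queue_size=10)
        rospy.Subscriber("/home/livingroom/heater_on", Bool, on_command)
        rate = rospy.Rate(1)                      # 1 Hz
        while not rospy.is_shutdown():
            temp_pub.publish(Float32(data=21.5))  # a real adapter would read the hardware
            rate.sleep()

    if __name__ == "__main__":
        main()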
Robotics and Autonomous Systems | 2017
Silvia Cascianelli; Gabriele Costante; Enrico Bellocchio; Paolo Valigi; Mario Luca Fravolini; Thomas A. Ciarfuglia
Visual self-localization in unknown environments is a crucial capability for an autonomous robot. Real-life scenarios often present critical challenges for autonomous vision-based localization, such as robustness to viewpoint and appearance changes. To address these issues, this paper proposes a novel strategy that models the visual scene by preserving its geometric and semantic structure and, at the same time, improves appearance invariance through a robust visual representation. Our method relies on high-level visual landmarks consisting of appearance-invariant descriptors that are extracted by a pre-trained Convolutional Neural Network (CNN) on the basis of image patches. In addition, during the exploration, the landmarks are organized by building an incremental covisibility graph that, at query time, is exploited to retrieve candidate matching locations, improving robustness in terms of viewpoint invariance. In this respect, through the covisibility graph, the algorithm finds location similarities more effectively by exploiting the structure of the scene, which in turn allows the construction of virtual locations, i.e., artificially augmented views of a real location that are useful to enhance the loop closure ability of the robot. The proposed approach has been thoroughly analysed and tested in different challenging scenarios taken from public datasets. The approach has also been compared with a state-of-the-art visual navigation algorithm.
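A toy sketch of the covisibility-graph retrieval idea, with nodes holding CNN descriptors of locations and edges linking covisible locations, might look like the following; the descriptors, similarity measure and graph layout are illustrative assumptions.

    # Sketch of the covisibility-graph idea (data structures are assumptions):
    # nodes hold CNN descriptors of locations, edges link locations that share
    # landmarks; at query time, graph neighbours of the best match are also
    # considered as candidate matching locations.
    import numpy as np
    import networkx as nx

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    graph = nx.Graph()
    rng = np.random.default_rng(0)
    for node_id in range(5):
        graph.add_node(node_id, desc=rng.normal(size=64))   # placeholder CNN descriptors
    graph.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 4)])  # covisibility links

    def candidate_locations(query_desc, graph):
        best = max(graph.nodes, key=lambda n: cosine(query_desc, graph.nodes[n]["desc"]))
        return {best, *graph.neighbors(best)}

    print(candidate_locations(rng.normal(size=64), graph))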
ieee international smart cities conference | 2016
Silvia Cascianelli; Gabriele Costante; Enrico Bellocchio; Paolo Valigi; Mario Luca Fravolini; Thomas A. Ciarfuglia
This paper provides a new contribution to the problem of vision-based place recognition, introducing a novel appearance- and viewpoint-invariant approach that guarantees robustness with respect to perceptual aliasing and kidnapping. Most state-of-the-art strategies rely on low-level visual features and ignore the semantic structure of the scene. Thus, even small changes in the appearance of the scene (e.g., illumination conditions) cause a significant performance drop. In contrast to previous work, we propose a new strategy to model the scene by preserving its geometric and semantic structure and, at the same time, achieving improved appearance invariance through a robust visual representation. In particular, to manage the perceptual aliasing problem, we introduce a covisibility graph that connects semantic entities of the scene while preserving their geometric relations. The method relies on high-level patches consisting of dense and robust descriptors that are extracted by a Convolutional Neural Network (CNN). Through the graph structure, we are able to efficiently retrieve candidate locations and to synthesize virtual locations (i.e., artificial intermediate views between two keyframes) to improve viewpoint invariance. The proposed approach has been compared with state-of-the-art approaches in different challenging scenarios taken from public datasets.
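A toy sketch of the virtual-location idea, synthesizing an artificial intermediate descriptor between two connected keyframes, is given below; linear interpolation of descriptors is a deliberate simplification used only for illustration.

    # Toy sketch of the "virtual location" idea: build an artificial intermediate
    # view between two connected keyframes by interpolating their descriptors.
    # Linear interpolation is a simplification, not the paper's synthesis method.
    import numpy as np

    def virtual_location(desc_a, desc_b, alpha=0.5):
        """Synthetic descriptor for an intermediate viewpoint between two keyframes."""
        v = (1.0 - alpha) * np.asarray(desc_a) + alpha * np.asarray(desc_b)
        return v / (np.linalg.norm(v) + 1e-12)

    rng = np.random.default_rng(1)
    kf1, kf2 = rng.normal(size=128), rng.normal(size=128)
    # Matching a query against kf1, kf2 and the virtual view in between can
    # improve recall under viewpoint change.
    mid = virtual_location(kf1, kf2)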