Matteo Poggi
University of Bologna
Publication
Featured research published by Matteo Poggi.
international conference on 3d vision | 2016
Matteo Poggi; Stefano Mattoccia
Inferring dense depth from stereo is crucial for several computer vision applications, and Semi Global Matching (SGM) is often the preferred choice due to its good trade-off between accuracy and computational requirements. Nevertheless, it suffers from two major issues: streaking artifacts caused by the Scanline Optimization (SO) approach at the core of the algorithm, which may lead to inaccurate results, and a high memory footprint that may become prohibitive with high-resolution images or devices with constrained resources. In this paper, we propose a smart scanline aggregation approach for SGM aimed at dealing with both issues. In particular, the contribution of this paper is threefold: i) leveraging machine learning, it proposes a novel general-purpose confidence measure suited for any stereo algorithm, based on O(1) features, that outperforms the state of the art; ii) taking advantage of this confidence measure, it proposes a smart aggregation strategy for SGM enabling significant improvements with a very small overhead; iii) the overall strategy drastically reduces the memory footprint of SGM and, at the same time, improves its effectiveness and execution time. We provide extensive experimental results, including a cross-validation with multiple datasets (KITTI 2012, KITTI 2015 and Middlebury 2014).
international conference on distributed smart cameras | 2015
Stefano Mattoccia; Matteo Poggi
In this paper we describe the strategy adopted to design, from scratch, an embedded RGBD sensor for accurate and dense depth perception on a low-cost FPGA. This device infers, at more than 30 Hz, dense depth maps according to a state-of-the-art stereo vision processing pipeline entirely mapped into the FPGA without buffering partial results on external memories. The strategy outlined in this paper enables accurate depth computation with low latency and a simple hardware design. On the other hand, it poses major constraints on the computing structure of the algorithms that fit this simplified architecture and thus, in this paper, we discuss the solutions devised to overcome these issues. We report experimental results concerning practical application scenarios in which the proposed RGBD sensor provides accurate and real-time depth sensing suited for the embedded vision domain.
international conference on 3d vision | 2016
Matteo Poggi; Stefano Mattoccia
Stereo matching is a popular technique to infer depth from two or more images, and a wealth of methods have been proposed to deal with this problem. Despite these efforts, finding accurate stereo correspondences is still an open problem. The strengths and weaknesses of existing methods are often complementary and in this paper, motivated by recent trends in this field, we exploit this fact by proposing Deep Stereo Fusion, a Convolutional Neural Network capable of combining the output of multiple stereo algorithms in order to obtain more accurate results with respect to each input disparity map. Deep Stereo Fusion processes a 3D feature vector, encoding both spatial and cross-algorithm information, in order to select the best disparity hypothesis among those proposed by the single stereo matchers. To the best of our knowledge, our proposal is the first i) to leverage deep learning and ii) able to predict the optimal disparity assignments by taking only the disparity maps as input cue. This second feature makes our method suitable for deployment even when other cues (e.g., confidence) are not available, such as when dealing with disparity maps provided by off-the-shelf 3D sensors. We thoroughly evaluate our proposal on the KITTI stereo benchmark with respect to the state of the art in this field.
british machine vision conference | 2016
Matteo Poggi; Stefano Mattoccia
Stereo vision is a popular technique to infer depth from two or more images. In this field, confidence measures, typically obtained from the analysis of the cost volume, aim at detecting uncertain disparity assignments. As recently proved, multiple confidence measures combined with hand-crafted features extracted from the cost volume can also be used for other purposes, and in particular to improve the overall disparity accuracy by leveraging machine learning techniques. In this paper, starting from the observation that recurrent local patterns occurring in the disparity maps can tell a correct assignment from a wrong one, we follow a completely different methodology to infer a novel confidence measure from scratch. Specifically, leveraging Convolutional Neural Networks, we pose the confidence formulation as a regression problem by analyzing the disparity map provided by a stereo vision system. Once trained on a subset of the KITTI 2012 dataset with the disparity maps provided by the simple block-matching algorithm, our confidence measure outperforms the state of the art with two datasets (KITTI 2015 and Middlebury 2014) as well as with two stereo algorithms. The experimental evaluation reported clearly highlights that our approach generalizes better to different circumstances than the state of the art. Finally, not being based on cost volume analysis, our proposal is also potentially suited for out-of-the-box depth generation devices, which usually do not expose the cues required by top-performing approaches.
international conference on image analysis and processing | 2015
Matteo Poggi; Luca Nanni; Stefano Mattoccia
In smart cities, computer vision has the potential to dramatically improve the quality of life of people suffering from visual impairments. In this field, we have been working on a wearable mobility aid aimed at detecting, in real time, obstacles in front of a visually impaired person. Our approach relies on a custom RGBD camera, with FPGA on-board processing, worn as traditional eyeglasses, and on effective point-cloud processing implemented on a compact and lightweight embedded computer. This latter device also provides feedback to the user by means of a haptic interface as well as audio messages. In this paper we address crosswalk recognition, which, as pointed out by several visually impaired users involved in the evaluation of our system, is a crucial requirement in the design of an effective mobility aid. Specifically, we propose a reliable methodology to detect and categorize crosswalks by leveraging point-cloud processing and deep-learning techniques. The experimental results reported, on 10000+ frames, confirm that the proposed approach is invariant to head/camera pose and extremely effective even when dealing with the large occlusions typically found in urban environments.
international symposium on computers and communications | 2016
Matteo Poggi; Stefano Mattoccia
In this paper we propose an effective and wearable mobility aid for people suffering from visual impairments, purely based on 3D computer vision and machine learning techniques. By wearing our device, users can perceive, guided by audio messages and tactile feedback, crucial information concerning the surrounding environment and hence avoid obstacles along the path. Our proposal can work in synergy with the white cane and allows for very effective and real-time obstacle detection on an embedded computer, by processing the point cloud provided by a custom RGBD sensor based on passive stereo vision. Moreover, our system, leveraging deep-learning techniques, semantically categorizes the detected obstacles in order to increase the awareness of the explored environment. It can optionally work in synergy with a smartphone, wirelessly connected to the proposed mobility aid, exploiting its audio capability and standard GPS-based navigation tools such as Google Maps. The overall system can operate in real time for hours using a small battery, making it suitable for everyday life. Experimental results confirmed that our proposal has excellent obstacle detection performance and a promising semantic categorization capability.
computer vision and pattern recognition | 2017
Matteo Poggi; Stefano Mattoccia
Confidence measures estimate unreliable disparity assignments performed by a stereo matching algorithm and, as recently proved, can be used for several purposes. This paper aims at increasing, by means of a deep network, the effectiveness of state-of-the-art confidence measures by exploiting the local consistency assumption. We exhaustively evaluated our proposal on 23 confidence measures, including 5 top-performing ones based on random forests and CNNs, training our networks with two popular stereo algorithms and a small subset (25 out of 194 frames) of the KITTI 2012 dataset. Experimental results show that our approach dramatically increases the effectiveness of all 23 confidence measures on the remaining frames. Moreover, without re-training, we report a further cross-evaluation on KITTI 2015 and Middlebury 2014, confirming that our proposal provides remarkable improvements for each confidence measure even when dealing with significantly different input data. To the best of our knowledge, this is the first method to move beyond conventional pixel-wise confidence estimation.
international conference on image analysis and processing | 2017
Matteo Poggi; Fabio Tosi; Stefano Mattoccia
The advent of embedded stereo cameras based on low-power and compact devices such as FPGAs (Field Programmable Gate Arrays) has made it possible to effectively address several computer vision problems. However, since the depth data generated by stereo algorithms are affected by errors, reliable strategies to detect wrong disparity assignments by means of confidence measures are desirable. Recent works proved that confidence measures are also a powerful cue to improve the overall accuracy of stereo. Most approaches aimed at predicting match reliability rely on cost volume analysis, information seldom available as output of most embedded depth sensors. Therefore, in this paper we analyze and evaluate strategies compatible with the constraints of embedded stereo cameras. In particular, we focus our attention on methods to infer match reliability inside depth sensors based on highly constrained computing architectures such as FPGAs. We quantitatively assess, on the Middlebury 2014 and KITTI 2015 datasets, the impact of different design strategies for 16 confidence measures from the literature that are suited for implementation on such embedded systems. Our evaluation shows that, compared to the confidence measures typically deployed in this context and based on storing intermediate results, other approaches yield much more accurate predictions with negligible computing requirements and memory footprint. This enables their implementation even on highly constrained architectures.
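To make the idea of a confidence measure that does not need the cost volume more concrete, here is a minimal sketch of the classic left-right consistency check, which only compares the two disparity maps. It is offered as a generic example and is not claimed to be one of the 16 measures evaluated in the paper; the function name `left_right_consistency` and the default `threshold` are assumptions.

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, threshold=1.0):
    """Return a boolean map: True where the left disparity is consistent.

    disp_left, disp_right: arrays of shape (height, width) holding the
    disparity maps computed with the left and right image as reference.
    threshold: maximum allowed disagreement in pixels (assumed value).
    """
    disp_left = np.asarray(disp_left, dtype=np.float64)
    disp_right = np.asarray(disp_right, dtype=np.float64)
    height, width = disp_left.shape
    rows, cols = np.indices((height, width))
    # Pixel (y, x) in the left image corresponds to (y, x - d) in the right.
    target = np.rint(cols - disp_left).astype(int)
    valid = (target >= 0) & (target < width)
    target_clipped = np.clip(target, 0, width - 1)
    diff = np.abs(disp_left - disp_right[rows, target_clipped])
    # Matches falling outside the right image are marked as unreliable.
    return valid & (diff <= threshold)

# Tiny example: one image row, four pixels.
left = [[0, 1, 1, 1]]
right = [[1, 1, 1, 0]]
reliable = left_right_consistency(left, right, threshold=0.5)
```

The check needs only a per-pixel lookup and a comparison, which hints at why measures of this kind can fit constrained architectures, although it does require computing a second disparity map.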
computer vision and pattern recognition | 2017
Matteo Poggi; Fabio Tosi; Stefano Mattoccia
Confidence measures aim at discriminating unreliable disparities inferred by a stereo vision system from reliable ones. A common and effective strategy adopted by most top-performing approaches consists in combining multiple confidence measures by means of an appropriately trained random-forest classifier. In this paper, we propose a novel approach by training an n-channel convolutional neural network on a set of feature maps, each one encoding the outcome of a single confidence measure. This strategy moves the confidence prediction problem from the conventional 1D feature maps domain, adopted by approaches based on random forests, to a more distinctive 3D domain, going beyond single-pixel analysis. This fact, coupled with a deep network appropriately trained on a small subset of images, makes it possible to outperform top-performing approaches based on random forests.
international conference on 3d imaging | 2016
Matteo Poggi; Stefano Mattoccia
Inferring dense depth from stereo is crucial for several computer vision applications, and stereo cameras based on embedded systems and/or reconfigurable devices such as FPGAs have become quite popular in recent years. In this field Semi Global Matching (SGM) is, in most cases, the preferred algorithm due to its good trade-off between accuracy and computational requirements. Nevertheless, a careful design of the processing pipeline enables significant improvements in terms of disparity map accuracy, hardware resources and frame rate. In particular, factors such as the amount of matching costs and parameters such as the number and selection of scanlines have a great impact on the overall resource requirements. In this paper we evaluate different variants of the SGM algorithm suited for implementation on embedded or reconfigurable devices, looking for the best compromise in terms of resource requirements, accuracy of the disparity estimation and running time. To quantitatively assess the effectiveness of the considered variants, we adopt the KITTI 2015 training dataset, a challenging and standard benchmark with ground truth containing several realistic scenes.