Publication


Featured research published by Matteo Tomasi.


IEEE Transactions on Computers | 2012

A Comparison of FPGA and GPU for Real-Time Phase-Based Optical Flow, Stereo, and Local Image Features

Karl Pauwels; Matteo Tomasi; Javier Díaz Alonso; Eduardo Ros; M.M. Van Hulle

Low-level computer vision algorithms have extreme computational requirements. In this work, we compare two real-time architectures developed using FPGA and GPU devices for the computation of phase-based optical flow, stereo, and local image features (energy, orientation, and phase). The presented approach requires a massive degree of parallelism to achieve real-time performance and allows us to compare FPGA and GPU design strategies and trade-offs in a much more complex scenario than previous contributions. Based on this analysis, we provide suggestions to real-time system designers for selecting the most suitable technology and for optimizing system development on the chosen platform, for a range of diverse applications.
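
For readers unfamiliar with these descriptors, the following NumPy/SciPy sketch shows how local energy, orientation, and phase can be obtained from a bank of quadrature Gabor-like filters. It is illustrative only: it is not the authors' FPGA or GPU implementation, and the filter size, peak frequency, and number of orientations are arbitrary choices.

# Minimal sketch: local energy, orientation, and phase from a bank of
# quadrature (Gabor-like) filters. Illustrative only; filter parameters
# are arbitrary choices, not the paper's configuration.
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size=15, freq=0.25, theta=0.0, sigma=3.0):
    """Return the even (cosine) and odd (sine) Gabor kernels at one orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # coordinate along the filter axis
    env = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))   # Gaussian envelope
    return env * np.cos(2 * np.pi * freq * xr), env * np.sin(2 * np.pi * freq * xr)

def local_features(img, n_orient=4):
    """Energy, dominant orientation, and phase at the dominant orientation."""
    img = img.astype(np.float64)
    energies, phases, thetas = [], [], []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        even_k, odd_k = gabor_pair(theta=theta)
        e = fftconvolve(img, even_k, mode="same")     # even (cosine) response
        o = fftconvolve(img, odd_k, mode="same")      # odd (sine) response
        energies.append(np.hypot(e, o))               # local energy
        phases.append(np.arctan2(o, e))               # local phase
        thetas.append(theta)
    energies = np.stack(energies)
    best = np.argmax(energies, axis=0)                # index of the dominant orientation
    energy = np.take_along_axis(energies, best[None], axis=0)[0]
    phase = np.take_along_axis(np.stack(phases), best[None], axis=0)[0]
    orientation = np.asarray(thetas)[best]
    return energy, orientation, phase

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E, O, P = local_features(rng.random((64, 64)))
    print(E.shape, O.shape, P.shape)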


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2012

Parallel Architecture for Hierarchical Optical Flow Estimation Based on FPGA

Francisco Barranco; Matteo Tomasi; Javier Díaz; Mauricio Vanegas; Eduardo Ros

The proposed work presents a highly parallel architecture for motion estimation. Our system implements the well-known Lucas and Kanade algorithm with a multi-scale extension for the estimation of large motions on a dedicated device [a field-programmable gate array (FPGA)]. The system achieves 270 frames per second at 640 × 480 resolution in the best case for the mono-scale implementation and 32 frames per second for the multi-scale one, fulfilling the requirements of a real-time system. We describe the system architecture, evaluate its accuracy on well-known benchmark sequences (including a comparative study), and report the main hardware resources used.
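
For context, the sketch below is a minimal single-scale Lucas and Kanade estimator in NumPy/SciPy. It only illustrates the underlying algorithm; the paper's contribution is the multi-scale extension and its highly parallel FPGA implementation, and the window size and determinant threshold below are arbitrary.

# Minimal single-scale Lucas-Kanade sketch. Illustrative only; the paper's
# FPGA design adds a multi-scale (coarse-to-fine) extension and a deeply
# pipelined datapath. Window size and threshold are arbitrary choices.
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def lucas_kanade(frame0, frame1, win=7, det_thresh=1e-6):
    """Dense per-pixel flow (u, v) from two grayscale frames."""
    f0 = frame0.astype(np.float64)
    f1 = frame1.astype(np.float64)
    Iy, Ix = np.gradient(f0)            # spatial derivatives
    It = f1 - f0                        # temporal derivative
    # Windowed means of the structure-tensor terms (the common 1/N factor
    # cancels when solving the 2x2 system).
    Sxx = uniform_filter(Ix * Ix, win)
    Sxy = uniform_filter(Ix * Iy, win)
    Syy = uniform_filter(Iy * Iy, win)
    Sxt = uniform_filter(Ix * It, win)
    Syt = uniform_filter(Iy * It, win)
    det = Sxx * Syy - Sxy * Sxy
    reliable = det > det_thresh         # crude guard against near-singular windows
    safe_det = np.where(reliable, det, 1.0)
    u = np.where(reliable, (-Syy * Sxt + Sxy * Syt) / safe_det, 0.0)
    v = np.where(reliable, ( Sxy * Sxt - Sxx * Syt) / safe_det, 0.0)
    return u, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = gaussian_filter(rng.random((64, 64)), 2.0) * 255.0   # smooth random texture
    b = np.roll(a, shift=1, axis=1)                          # shifted one pixel to the right
    u, v = lucas_kanade(a, b)
    print("median flow:", float(np.median(u)), float(np.median(v)))  # close to (1, 0)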


IEEE Transactions on Circuits and Systems for Video Technology | 2010

High-Performance Optical-Flow Architecture Based on a Multi-Scale, Multi-Orientation Phase-Based Model

Matteo Tomasi; Mauricio Vanegas; Francisco Barranco; Javier Díaz; Eduardo Ros

The accurate estimation of optical flow is a widely studied problem in computer vision, and researchers in this field are devoting their efforts to formulating reliable and robust algorithms for real-life applications. These approaches need to be evaluated, especially in controlled scenarios. Because of their stability, phase-based methods have generally been adopted in the various techniques developed to date, although their viability in real-time systems remains uncertain due to their high computational load. We describe here the implementation of a phase-based optical flow algorithm on a field-programmable gate array (FPGA) device. The system benefits from the stability of the phase information as well as sub-pixel accuracy without requiring additional computation, and at the same time achieves high performance by taking full advantage of the parallel processing resources of FPGA devices. Furthermore, the architecture extends the design to a multi-resolution and multi-orientation implementation, which enhances its accuracy and covers a wide range of detectable velocities. A deeply pipelined datapath with superscalar computing units at different stages allows real-time processing beyond VGA image resolution. The final circuit is of significant complexity and is useful for a wide range of fields requiring portable optical-flow processing engines.
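
The core idea behind phase-based optical flow is that local phase is approximately conserved along the motion, so the component velocity is the ratio of the temporal and spatial phase derivatives, v = -(dphi/dt)/(dphi/dx). The 1-D sketch below illustrates only that relation using the analytic signal; it is a toy example (single scale, single orientation, synthetic signal), not the multi-scale, multi-orientation FPGA datapath described above.

# 1-D sketch of phase constancy: v = -(dphi/dt) / (dphi/dx). The test
# signal and the shift are made-up values for illustration.
import numpy as np
from scipy.signal import hilbert

def wrap(p):
    """Wrap phase values into (-pi, pi]."""
    return np.angle(np.exp(1j * p))

x = np.linspace(0, 8 * np.pi, 512)
shift = 3                                    # true displacement in samples
f0 = np.sin(x) + 0.3 * np.sin(2.3 * x)       # frame at time t
f1 = np.roll(f0, shift)                      # frame at time t+1, moved right by 'shift'

phi0 = np.angle(hilbert(f0))                 # local phase via the analytic signal
phi1 = np.angle(hilbert(f1))

dphi_dt = wrap(phi1 - phi0)                  # temporal phase change (one frame apart)
dphi_dx = np.gradient(np.unwrap(phi0))       # spatial phase derivative (per sample)

valid = np.abs(dphi_dx) > 1e-2               # skip phase singularities
v = -dphi_dt[valid] / dphi_dx[valid]         # velocity in samples per frame
print("median estimated shift:", float(np.median(v)), "(true value:", shift, ")")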


IEEE Transactions on Circuits and Systems for Video Technology | 2012

Massive Parallel-Hardware Architecture for Multiscale Stereo, Optical Flow and Image-Structure Computation

Matteo Tomasi; Mauricio Vanegas; Francisco Barranco; Javier Díaz; Eduardo Ros

Low-level vision tasks pose an outstanding challenge in terms of computational effort: pixel-wise operations require high-performance architectures to achieve real-time processing. Nowadays, diverse technologies permit a high level of parallelism, allowing researchers to address increasingly complex on-chip low-level vision-feature extraction. In the state of the art, different architectures have been described that process single vision modalities in real time, but multiple computer vision modalities are seldom computed jointly on a single device to produce a general-purpose on-chip low-level vision system that could serve as the basis for mid-level or high-level vision tasks. We present here a novel architecture for multiple-vision-feature extraction that includes multiscale optical flow, disparity, energy, orientation, and phase. A high degree of robustness in real-life situations is obtained by adopting phase-based models (at the cost of relatively high computing-resource requirements). The high flexibility of the reconfigurable devices used allows the exploration of different hardware configurations to match the final target and user requirements. Using this novel architecture together with hardware-sharing techniques, we describe a co-processing board implementation as a case study. It reaches an outstanding computing power of 92.3 GigaOPS with high power efficiency (approximately 12.9 GigaOPS/W).
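
As a quick consistency check on the figures quoted in the abstract, the stated throughput and efficiency imply a board power budget of roughly 7 W:

# 92.3 GigaOPS at roughly 12.9 GigaOPS/W implies a power budget of about 7 W.
gigaops = 92.3
gigaops_per_watt = 12.9
print(f"implied board power: {gigaops / gigaops_per_watt:.1f} W")   # ~7.2 W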


Journal of Systems Architecture | 2010

Multi-port abstraction layer for FPGA intensive memory exploitation applications

Mauricio Vanegas; Matteo Tomasi; Javier Díaz; Eduardo Ros

We describe an efficient, high-abstraction-level, multi-port memory control unit (MCU) capable of providing data at maximum throughput. This MCU has been developed to take full advantage of FPGA parallelism. Multiple parallel processing entities are possible in modern FPGA devices, but this parallelism is lost when they try to access external memories. To address the problem of multiple entities accessing shared data, we propose an architecture with multiple abstract access ports (AAPs) to a single external memory. Since hardware designs on FPGA technology generally run slower than memory chips, it is feasible to build a memory access scheduler around a fast memory controller with a suitable arbitration scheme, while the AAPs run at lower frequencies. In this way, multiple processing units connected through the AAPs can issue memory transactions at their own slower clock frequencies, and the memory access scheduler can serve all of these transactions concurrently, taking full advantage of the memory bandwidth.
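
The behavioral sketch below (plain Python, not RTL) illustrates the idea: several slow abstract access ports queue transactions, and a scheduler serves them from a single fast memory. The round-robin policy, the port count, and the burst length are illustrative assumptions, not details taken from the paper.

# Behavioral model of a multi-port memory scheduler. The round-robin policy,
# port count, and burst sizes are assumptions made for illustration.
from collections import deque

class AbstractAccessPort:
    def __init__(self, name):
        self.name = name
        self.queue = deque()             # pending (op, address, burst_len) requests

    def request(self, op, address, burst_len=8):
        self.queue.append((op, address, burst_len))

class MemoryScheduler:
    def __init__(self, ports):
        self.ports = ports
        self.turn = 0                    # round-robin pointer

    def cycle(self):
        """Serve at most one burst per fast-memory cycle, in round-robin order."""
        for i in range(len(self.ports)):
            port = self.ports[(self.turn + i) % len(self.ports)]
            if port.queue:
                op, addr, burst = port.queue.popleft()
                self.turn = (self.turn + i + 1) % len(self.ports)
                return f"{port.name}: {op} {burst} words @ 0x{addr:06x}"
        return None                      # memory idle this cycle

if __name__ == "__main__":
    ports = [AbstractAccessPort(f"AAP{i}") for i in range(3)]
    ports[0].request("read", 0x1000)
    ports[0].request("read", 0x1040)
    ports[1].request("write", 0x8000)
    ports[2].request("read", 0x2000)
    scheduler = MemoryScheduler(ports)
    for _ in range(5):
        print(scheduler.cycle())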


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2012

Real-Time Architecture for a Robust Multi-Scale Stereo Engine on FPGA

Matteo Tomasi; Mauricio Vanegas; Francisco Barranco; Javier Díaz; Eduardo Ros

In this work, we present a real-time implementation of a stereo algorithm on a field-programmable gate array (FPGA). The approach is a phase-based model that allows computation with sub-pixel accuracy. The algorithm uses a robust multi-scale and multi-orientation method that optimizes the estimation with respect to the local image-structure support. Compared with previous approaches in the state of the art, our work increases the on-chip computing power in order to obtain accurate results over a large disparity range. In addition, our approach is especially suited for applications in unconstrained environments thanks to the robustness of the phase information, which can cope with severe illumination changes and small affine deformations between the image pair. This work also includes image-rectification circuitry in order to exploit the epipolar constraint on chip. The dedicated circuit can rectify and process images of VGA resolution at a frame rate of 57 fps. The implementation uses a finely pipelined datapath (also with superscalar units) and multiple user-defined parameters, which lead to a high working frequency and good adaptability to different scenarios. In the paper, we present various results and compare them with state-of-the-art approaches.
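
In phase-based stereo the disparity estimate comes from the phase difference between the left and right views divided by the local spatial frequency. The scan-line sketch below illustrates only that relation using the analytic signal; it is a toy single-scale example with a synthetic texture and a constant disparity, not the multi-scale, multi-orientation engine with rectification described above.

# Scan-line sketch of phase-based disparity: d ~ (phi_R - phi_L) / local
# spatial frequency. The texture and the disparity value are made up.
import numpy as np
from scipy.signal import hilbert

def scanline_disparity(left_row, right_row, eps=1e-2):
    al, ar = hilbert(left_row), hilbert(right_row)       # analytic signals
    phase_diff = np.angle(ar * np.conj(al))              # wrapped phi_R - phi_L
    local_freq = np.gradient(np.unwrap(np.angle(al)))    # rad per pixel
    valid = np.abs(local_freq) > eps                     # skip phase singularities
    d = np.full(left_row.shape, np.nan)
    d[valid] = phase_diff[valid] / local_freq[valid]     # disparity in pixels
    return d

if __name__ == "__main__":
    x = np.linspace(0, 6 * np.pi, 400)
    texture = np.sin(x) + 0.3 * np.sin(3.1 * x)
    true_disp = 5                                        # constant disparity, in pixels
    left = texture
    right = np.roll(texture, -true_disp)                 # right view: left view shifted by d
    d = scanline_disparity(left, right)
    print("median disparity:", float(np.nanmedian(d)), "(true value:", true_disp, ")")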


Journal of Systems Architecture | 2010

Fine grain pipeline architecture for high performance phase-based optical flow computation

Matteo Tomasi; Francisco Barranco; Mauricio Vanegas; Javier Díaz; Eduardo Ros

Accurate motion analysis of real-life sequences is a very active research field due to its many potential applications. Current sensor technologies are fast and accurate, providing a huge quantity of data per second. Processing these data streams is very expensive (in terms of computing power) for general-purpose processors and is therefore beyond the processing capabilities of most current embedded devices. In this work, we present a specific hardware architecture that implements a robust optical flow algorithm able to process input video sequences at high frame rates and high resolutions, up to 160 fps for VGA images. We describe a superpipelined datapath of more than 85 stages (some of them configured with superscalar units able to process several data elements in parallel), resulting in an intensive parallel processing engine. The high system speed (frames per second) yields fine optical flow estimates, since it constrains the actual motion range between consecutive frames, while the phase-based method confers robustness to image noise and illumination changes. We analyze the behavior of the architecture at different frame rates and input-image noise levels, compare the results with other approaches in the state of the art, and validate our implementation on several hardware platforms.
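
As a back-of-the-envelope check on the quoted figures, and assuming the fine-grain pipeline accepts one pixel per clock cycle (an assumption typical of such pipelines, not a number taken from the paper), the required pixel clock is about 50 MHz:

# Required pixel rate for VGA at 160 fps, assuming one pixel per clock cycle.
width, height, fps = 640, 480, 160
pixels_per_second = width * height * fps
print(f"required pixel rate : {pixels_per_second / 1e6:.1f} Mpixel/s")        # ~49.2
print(f"minimum pixel clock : {pixels_per_second / 1e6:.1f} MHz at 1 px/cycle")
# Blanking intervals, memory stalls, and multi-scale passes raise the real figure.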


Computer Vision and Pattern Recognition | 2013

Collision Detection for Visually Impaired from a Body-Mounted Camera

Shrinivas Pundlik; Matteo Tomasi; Gang Luo

A real-time collision detection system using a body-mounted camera is developed for visually impaired and blind people. The system computes sparse optical flow in the acquired videos, compensates for camera self-rotation using an external gyro sensor, and estimates collision risk in local image regions based on the motion estimates. Experimental results for a variety of scenarios involving static and dynamic obstacles are shown in terms of time-to-collision and obstacle localization in test videos. The proposed approach successfully estimates collision risk for head-on obstacles as well as obstacles that are close to the walking path of the user. An end-to-end collision warning system based on inputs from a video camera and a gyro sensor has been implemented on a generic laptop and on an embedded OMAP-3 compatible platform. The proposed embedded system represents a valuable contribution toward the development of a portable vision aid for visually impaired and blind patients.
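
The collision-risk cue can be related to the classic expansion-rate formulation of time-to-collision, TTC ~ s / (ds/dt) for an approaching object of apparent size s. The sketch below illustrates only that relation; the paper's system works from sparse optical flow with gyro-based rotation compensation, and the sizes and frame interval used here are made-up numbers.

# Time-to-collision from the apparent-size expansion rate. Illustrative
# values only; not the paper's optical-flow-based pipeline.
def time_to_collision(size_prev, size_curr, dt):
    """Estimate TTC in seconds from the apparent object size in two frames."""
    expansion_rate = (size_curr - size_prev) / dt    # ds/dt in pixels per second
    if expansion_rate <= 0:
        return float("inf")                          # not approaching
    return size_curr / expansion_rate

if __name__ == "__main__":
    # A bounding box that grows from 40 px to 44 px over one frame at 30 fps.
    print(f"TTC ~ {time_to_collision(40.0, 44.0, 1.0 / 30.0):.2f} s")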


International Symposium on Industrial Electronics | 2010

A novel architecture for a massively parallel low level vision processing engine on chip

Matteo Tomasi; Mauricio Vanegas; Francisco Barranco; Javier Díaz; Eduardo Ros

Specific architectures for different low-level vision modalities have been developed and described using reconfigurable hardware, each trying to solve a single low-level vision problem: optical flow, disparity, segmentation, tracking, etc. We introduce a novel architecture that combines multiple processing engines into a massively parallel low-level vision processing engine of very high complexity and performance. Our design is able to process input images and extract several visual features at the same time: multi-scale stereo, optical flow, and local contrast descriptors such as local orientation, energy, and phase. The latest hardware-design techniques have been employed to build a system with more than 2000 basic processing elements running in parallel. We base our system on a harmonic image-decomposition model built from Gabor-like filters; it has been validated in multiple scenarios in previous works and allows hardware resources to be shared among different vision modalities on the same chip. In this paper we present an FPGA-based implementation of this intensive processing engine as well as the design techniques employed. The circuit processes input frames of 512×512 pixels at 28 frames per second.


Digital Signal Processing | 2013

Pipelined architecture for real-time cost-optimized extraction of visual primitives based on FPGAs

Francisco Barranco; Matteo Tomasi; Javier Díaz; Mauricio Vanegas; Eduardo Ros

This paper presents an architecture for the on-chip extraction of visual primitives: energy, orientation, disparity, and optical flow. This cost-optimized architecture processes high-resolution images in real time for real-life applications. We present a versatile architecture that may be customized for different performance requirements depending on the target application; in this setting, dedicated hardware and its on-chip implementation on FPGA devices become an efficient solution. We have developed a multi-scale approach for the computation of the gradient-based primitives. Gradient-based methods are very popular in the literature because they provide a very competitive trade-off between accuracy and efficiency. The hardware implementation of the system uses superscalar fine-grain pipelines to exploit the maximum degree of parallelism provided by the FPGA. The system reaches 350 and 270 VGA frames per second (fps) for the disparity and optical flow computations, respectively, in their mono-scale versions, and up to 32 fps for the multi-scale scheme extracting all the described features in parallel. We also analyze the accuracy and hardware-resource usage of the proposed implementation.
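
The multi-scale (coarse-to-fine) scheme referred to above can be sketched as follows: estimate on a downsampled image pair, upscale and rescale the flow, warp the second frame, and refine at the next finer level. This is a generic software scaffold under assumed parameters (a factor-2 pyramid, an arbitrary depth), not the paper's hardware scheme; a single-scale estimator such as the Lucas and Kanade sketch shown earlier can be passed in as the estimate_flow callable.

# Generic coarse-to-fine scaffold for a single-scale flow estimator.
# Pyramid depth and the factor-2 downsampling are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom, map_coordinates

def warp(img, u, v):
    """Resample img at (x + u, y + v) with bilinear interpolation."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(img, [yy + v, xx + u], order=1, mode="nearest")

def coarse_to_fine(f0, f1, estimate_flow, levels=3):
    """Run estimate_flow(f0, f1) -> (u, v) over a Gaussian pyramid."""
    pyr0, pyr1 = [f0], [f1]
    for _ in range(levels - 1):                      # build pyramids, coarsest last
        pyr0.append(zoom(gaussian_filter(pyr0[-1], 1.0), 0.5, order=1))
        pyr1.append(zoom(gaussian_filter(pyr1[-1], 1.0), 0.5, order=1))
    u = np.zeros_like(pyr0[-1])
    v = np.zeros_like(pyr0[-1])
    for lvl in range(levels - 1, -1, -1):            # coarsest to finest
        a, b = pyr0[lvl], pyr1[lvl]
        if u.shape != a.shape:
            # Upsample the flow from the coarser level and double its magnitude.
            u = zoom(u, np.array(a.shape) / np.array(u.shape), order=1) * 2.0
            v = zoom(v, np.array(a.shape) / np.array(v.shape), order=1) * 2.0
        du, dv = estimate_flow(a, warp(b, u, v))     # refine on the warped pair
        u, v = u + du, v + dv
    return u, v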

Collaboration


Dive into Matteo Tomasi's collaborations.

Top Co-Authors

Kevin E. Houston

Massachusetts Eye and Ear Infirmary

Eli Peli

Massachusetts Eye and Ear Infirmary
