Roberto Airoldi | Researchain

Publication

Featured research published by Roberto Airoldi.

norchip | 2009

A reconfigurable SoC tailored to Software Defined Radio applications

Fabio Garzia; Waqar Hussain; Roberto Airoldi; Jari Nurmi

This paper presents the mapping of SDR applications onto a reconfigurable SoC based on a run-time reconfigurable coarse-grain accelerator called CREMA. CREMA is characterized by mapping adaptiveness, meaning that its architecture is specified according to the needs of the application mapped onto it. CREMA is used to accelerate two kernels used in SDR applications: correlations for synchronization purposes and FFT for OFDM modulation/demodulation. In both cases we show that the implementation on CREMA is 4X faster than a similar implementation on a general-purpose coarse-grain accelerator, and that its resource occupation is reduced by 4.5X.
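As an illustration of the first kernel, the following is a minimal sketch of correlation-based synchronization: a known preamble is slid over the received samples and the lag with the largest correlation magnitude is taken as the frame start. The function names and the toy signal are hypothetical and are not taken from the paper; the sketch says nothing about how the kernel is actually mapped onto CREMA.

```python
def cross_correlate(received, preamble):
    """Correlation magnitude of `preamble` against `received` at every valid lag."""
    m = len(preamble)
    magnitudes = []
    for lag in range(len(received) - m + 1):
        acc = 0j
        for k in range(m):
            # Multiply by the conjugate of the reference sample (complex-safe).
            acc += received[lag + k] * complex(preamble[k]).conjugate()
        magnitudes.append(abs(acc))
    return magnitudes


def find_sync_offset(received, preamble):
    """Index of the lag where the correlation magnitude peaks."""
    corr = cross_correlate(received, preamble)
    return max(range(len(corr)), key=corr.__getitem__)


# Toy input (hypothetical): the preamble starts at sample index 3.
preamble = [1, -1, 1, 1, -1]
received = [0, 0, 0] + preamble + [0, 0]
offset = find_sync_offset(received, preamble)  # 3 for this toy input
```

The doubly nested multiply-accumulate loop is the part that a coarse-grain accelerator would typically parallelize.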

international symposium on system-on-chip | 2010

Homogeneous MPSoC as baseband signal processing engine for OFDM systems

Roberto Airoldi; Fabio Garzia; Omer Anjum; Jari Nurmi

This paper presents a homogeneous Multi-Processor System-on-Chip (MPSoC) as a baseband signal processing engine for software defined radio applications. The implementation and parallelisation of a generic OFDM system is presented, taking the physical layer of the IEEE 802.11a standard as a case study. The MPSoC is composed of nine computational nodes connected in a mesh topology through a hierarchical network-on-chip. Each node hosts a COFFEE RISC processor as its processing element. The architecture was prototyped on an Altera Stratix IV FPGA working at a maximum frequency of 180 MHz.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2011

Improving Reconfigurable Hardware Energy Efficiency and Robustness via DVFS-Scaled Homogeneous MP-SoC

Roberto Airoldi; Fabio Garzia; Jari Nurmi

This paper presents a study of the Dynamic Voltage and Frequency Scaling (DVFS) technique applied to an existing multi-core architecture composed of 9 computational nodes interconnected by a hierarchical Network-on-Chip. The architecture was synthesized and characterized in area and power using a 65 nm standard-cell technology. To analyze the achievable energy and power savings, a representative algorithm from wireless communications was used as a test case. The energy and power reductions achieved with DVFS were then compared to those obtainable via clock gating. The results show that DVFS guarantees higher energy savings than clock gating. Moreover, in terms of power consumption, DVFS improves the system performance by a factor of 3 compared to clock gating, improving hardware robustness to soft errors related to power integrity phenomena.
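The comparison between DVFS and clock gating can be sketched with the first-order CMOS dynamic power model, P = C_eff * V^2 * f. The sketch below assumes that leakage is ignored, that a clock-gated core draws no dynamic power while idle, and that the task takes a fixed number of cycles. All numeric values are hypothetical and are not taken from the paper, so the resulting ratios are illustrative only.

```python
def dynamic_power(c_eff, vdd, freq_hz):
    """First-order dynamic power in watts: P = C_eff * V^2 * f."""
    return c_eff * vdd ** 2 * freq_hz


def active_energy(c_eff, vdd, cycles):
    """Energy in joules to run `cycles` cycles: P * (cycles / f) = C_eff * V^2 * cycles."""
    return c_eff * vdd ** 2 * cycles


# Hypothetical operating points (assumptions, not measured values).
C_EFF = 1e-9              # effective switched capacitance, farads
CYCLES = 1_000_000        # cycles needed by the task
V_NOM, F_MAX = 1.2, 200e6 # clock gating: run fast at nominal voltage, then gate
V_LOW, F_LOW = 0.9, 100e6 # DVFS: run slower at reduced voltage, still on time

e_gated = active_energy(C_EFF, V_NOM, CYCLES)
e_dvfs = active_energy(C_EFF, V_LOW, CYCLES)
p_gated = dynamic_power(C_EFF, V_NOM, F_MAX)
p_dvfs = dynamic_power(C_EFF, V_LOW, F_LOW)

energy_ratio = e_dvfs / e_gated  # equals (V_LOW / V_NOM) ** 2 in this model
power_ratio = p_gated / p_dvfs   # power drawn while active, gated vs DVFS
```

In this model, clock gating leaves the energy per cycle unchanged because the supply voltage stays at its nominal value, whereas DVFS lowers it quadratically with the voltage. The lower active power under DVFS also means smaller supply current, which is the usual link to power integrity.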

international symposium on microarchitecture | 2010

Energy-Efficient Fast Fourier Transforms for Cognitive Radio Systems

Roberto Airoldi; Omer Anjum; Fabio Garzia; Jari Nurmi; Alexander M. Wyglinski

An energy-efficient fast Fourier transform (FFT) algorithm for cognitive radio communication systems uses a homogeneous multiprocessor system on chip. The algorithm allows for pruning of inputs such that algorithm complexity can be reduced whenever several of the FFT inputs are zero. Results show that the pruning algorithm significantly reduces energy consumption compared to a nonpruned version.
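The idea of input pruning can be illustrated with a generic sketch, which is not the algorithm from the paper: a recursive radix-2 decimation-in-time FFT that skips the twiddle multiplications of any sub-block whose inputs are all zero. The function name and the multiplication counter are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def pruned_fft(x, counter):
    """Radix-2 DIT FFT that skips all-zero sub-blocks.

    `counter` is a one-element list that accumulates the number of
    twiddle-factor multiplications actually performed.
    """
    n = len(x)
    if all(v == 0 for v in x):
        return [0j] * n            # whole block is zero: nothing to compute
    if n == 1:
        return [complex(x[0])]
    even = pruned_fft(x[0::2], counter)
    odd = pruned_fft(x[1::2], counter)
    odd_is_zero = all(v == 0 for v in x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        if odd_is_zero:
            t = 0j                 # pruned butterfly: no multiplication needed
        else:
            t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
            counter[0] += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Toy input (hypothetical): 8-point FFT with the upper half zero-padded.
counter = [0]
spectrum = pruned_fft([1, 2, 3, 4, 0, 0, 0, 0], counter)
pruned_mults = counter[0]
```

For this input the sketch performs 8 twiddle multiplications instead of the 12 needed when no input is zero, since the four 2-point butterflies at the deepest level are skipped. Fewer operations is the mechanism by which pruning can lower energy consumption.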

international conference on embedded computer systems architectures modeling and simulation | 2013

A scalable FFT processor architecture for OFDM based communication systems

Deepak Revanna; Omer Anjum; Manuele Cucchi; Roberto Airoldi; Jari Nurmi

Modern wireless standards are predominantly based on OFDM communication systems. Many recent mobile devices support multiple wireless standards and demand efficient transceivers. Hence, the baseband hardware in a communication transceiver needs to be scalable and efficient across multiple standards. In an OFDM-based transceiver, FFT computation is one of the most computationally intensive and power-hungry modules. Designing FFT hardware is a challenging task that requires balancing design parameters such as speed, power, area, flexibility and scalability. This paper proposes a novel, scalable radix-2 N-point FFT processor architecture. The architecture is designed to balance these parameters in order to meet the requirements of SDR platforms supporting multiple wireless standards. The FFT processor was designed in VHDL and prototyped on an Altera Stratix V FPGA device (5SGSMD5K2F40C2). The processor operates at a maximum frequency of 200 MHz, uses less than 1% of the FPGA device resources and meets the performance requirements of multiple wireless standards such as IEEE 802.11a/g, IEEE 802.16e, 3GPP-LTE, DAB and DVB. The proposed architecture outperforms existing fixed- and variable-length FFT processors in terms of speed, flexibility and scalability.

international conference on parallel processing | 2010

FFT Algorithms Evaluation on a Homogeneous Multi-processor System-on-Chip

Roberto Airoldi; Fabio Garzia; Jari Nurmi

This paper presents the evaluation of radix-2, radix-4 and radix-8 algorithms for N-point FFTs on a homogeneous Multi-Processor System-on-Chip, prototyped on an FPGA device. The algorithms were evaluated by analysing their profiling results in comparison to a single-processor architecture. Performance was evaluated in terms of required clock cycles, achieved speed-up and parallelization efficiency. The analysis showed, for each algorithm, how the parallelization efficiency grows when moving from small to larger FFTs. Moreover, the comparison between the different implementations showed the parallelization properties of each algorithm. The radix-2 algorithm shows the best speed-up and parallelization efficiency, while radix-4 gives the best performance in terms of required clock cycles.
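The two derived metrics can be sketched as follows, assuming the standard definitions (the abstract does not spell them out): speed-up is the ratio of single-processor cycles to multi-processor cycles, and parallelization efficiency is the speed-up divided by the number of cores. The cycle counts below are hypothetical placeholders, not results from the paper.

```python
def speed_up(cycles_single, cycles_parallel):
    """Speed-up of the parallel run relative to the single-processor run."""
    return cycles_single / cycles_parallel


def parallel_efficiency(cycles_single, cycles_parallel, n_cores):
    """Fraction of the ideal (linear) speed-up that is actually achieved."""
    return speed_up(cycles_single, cycles_parallel) / n_cores


# Hypothetical cycle counts for one FFT size on an 8-core system.
s = speed_up(8000, 2000)                # 4.0
e = parallel_efficiency(8000, 2000, 8)  # 0.5
```

Under these definitions an algorithm can have the highest efficiency without having the lowest cycle count, which is consistent with the radix-2 and radix-4 results reported above.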

international symposium on system-on-chip | 2009

Mapping of the FFT on a reconfigurable architecture targeted to SDR applications

Fabio Garzia; Roberto Airoldi; Jari Nurmi; Carmelo Giliberto; Claudio Brunelli

This paper describes the implementation of an FFT on a system based on a GP core and a reconfigurable coarse-grain accelerator. The entire system has been prototyped on an Altera Stratix II device. On the prototype, a 1024-point FFT gives a 40X speed-up in comparison with the software implementation. The 1024-point FFT is executed in 400 μs. Considering an ASIC synthesis of the coarse-grain array, the 1024-point FFT is executed in 42 μs, against the 104 μs of a DSP implementation.

international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2009

Implementation of W-CDMA Cell Search on a FPGA Based Multi-Processor System-on-Chip with Power Management

Roberto Airoldi; Fabio Garzia; Tapani Ahonen; Dragomir Milojevic; Jari Nurmi

In this paper we describe a general-purpose, homogeneous Multi-Processor System-on-Chip (MPSoC) based on 9 processing clusters using COFFEE RISC processors and a hierarchical Network-on-Chip, implemented on an FPGA device. The MPSoC platform integrates a cluster clock gating technique, enabling independent core and memory sleep modes. The low cluster turn-on delay allows frequent use of this technique, resulting in power savings. In order to quantify the performance of the proposed platform and the reduction in power consumption, we implement the Target Cell Search part of WCDMA, a well-known SDR application. We show that the proposed MPSoC platform achieves a substantial speed-up (7.3X) compared to a comparable single-processor platform. We also show that a significant reduction in dynamic power consumption can be achieved (50% for the complete application) using the proposed cluster clock-gating technique.

conference on ph.d. research in microelectronics and electronics | 2009

Implementation of a 64-point FFT on a Multi-Processor System-on-Chip

Roberto Airoldi; Fabio Garzia; Jari Nurmi

This paper describes the implementation of a 64-point FFT on a Multi-Processor System-on-Chip (MPSoC) composed of 9 homogeneous clusters. Each cluster is built around a RISC processor. The implementation technique adopted for the mapping of the FFT produces a speed-up of 6x, which is close to the theoretical limit. This is due to the reduced overhead of intra-cluster communication.

signal processing systems | 2016

HARP2: An X-Scale Reconfigurable Accelerator-Rich Platform for Massively-Parallel Signal Processing Algorithms

Waqar Hussain; Roberto Airoldi; Henry Hoffmann; Tapani Ahonen; Jari Nurmi

This paper presents the design, development and evaluation of an eXtra-large Scale, Homogeneous and Heterogeneous Accelerator-Rich Platform (HARP2) for massively parallel signal processing algorithms. HARP is an integrated platform of multiple Coarse-Grained Reconfigurable Arrays (CGRAs) over a Network-on-Chip (NoC), where each CGRA is scaled and tailored for a specific application. The NoC consists of nine nodes in a topology of 3 rows × 3 columns and acts as the communication backbone between the different CGRAs. In this experimental work, the HARP template is used to instantiate a homogeneous (HARP-hom) and a heterogeneous (HARP-het) platform. HARP-het is generated as a proof-of-concept test to verify the design and functionality of HARP. It also provides insight into many features of the design and its evaluation in terms of different performance metrics. The other version (HARP-hom) is instantiated for a relatively realistic design problem, i.e., satisfying the execution-time constraints imposed on Fast Fourier Transform processing in IEEE 802.11n demodulators. Both versions of HARP are compared against some existing state-of-the-art platforms using different performance metrics. The HARP versions are designed to illustrate large-scale homogeneous and heterogeneous multicore architectures while presenting the advantages of maximizing the number of reconfigurable processing resources on a single chip.
