Claudio Brunelli | Researchain

Publication

Featured research published by Claudio Brunelli.

EURASIP Journal on Wireless Communications and Networking | 2011

State of the art baseband DSP platforms for Software Defined Radio: A survey

Omer Anjum; Tapani Ahonen; Fabio Garzia; Jari Nurmi; Claudio Brunelli; Heikki Berg

Software Defined Radio (SDR) is an innovative approach that is becoming an increasingly promising technology for future mobile handsets. Universities and industry have introduced several embedded-system proposals to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, and potential future trends.

International Symposium on System-on-Chip | 2008

Analyzing models of computation for software defined radio applications

Heikki Berg; Claudio Brunelli; Ulf Lücking

Applying design principles and methodologies established in the software domain, adapted to the complete execution environment, provides new perspectives for future multi-radio computers. To share the underlying hardware resources efficiently, the overall system architecture and the related programming model have to support dynamic behavior and extensive configuration changes at run-time. The requirements for such a multi-radio computer are demanding, as various radio access stacks with heterogeneous characteristics will execute in parallel. This implies that, besides the different protocol stacks, a configuration and control framework is needed that is aware of the managed system in every state and can dynamically schedule the different dataflow graphs corresponding to the applications running on the underlying system. This paper presents the main concepts behind such a reactive system, focusing in particular on the proposed model of computation, and gives an overview of the software architecture and the related problems to be solved.

Signal Processing Systems | 2009

Approximating sine functions using variable-precision Taylor polynomials

Claudio Brunelli; Heikki Berg; David Guevorkian

Sine is one of the fundamental mathematical functions and is widely used in many application fields. In particular, signal processing and telecommunications need to calculate the sine and cosine of numerical values for several different purposes. One of the challenges affecting sine calculation in Digital Signal Processing (DSP) has been finding a method to compute it by means of rational functions, which makes it implementable in a digital computer system. One possibility is to use Taylor polynomials, although their main drawback is the relatively high degree (and thus computational load) needed even for relatively low-precision approximations. This paper proposes a variable-precision method that approximates sine and cosine functions with Taylor polynomials while significantly reducing the computational load required. Our analysis shows that the method achieves the same accuracy as other approximation methods at a lower computational cost.
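The abstract does not describe the proposed method itself, so the following is only a minimal sketch of the general idea of a Taylor polynomial whose degree depends on the requested precision. The function name `taylor_sin`, the `tol` parameter, and the range reduction to [-π, π] are assumptions made for illustration, not details taken from the paper.

```python
import math


def taylor_sin(x, tol=1e-6):
    """Approximate sin(x) with a Taylor polynomial whose degree grows with precision.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    Returns the approximation and the degree of the last term used.
    """
    # Assumed range reduction: fold x into [-pi, pi] so the series converges quickly.
    x = math.remainder(x, 2.0 * math.pi)
    term = x      # current term x^n / n!, with alternating sign
    total = x
    degree = 1
    # Add terms until the last one added is no larger than the tolerance.
    while abs(term) > tol:
        term *= -x * x / ((degree + 1) * (degree + 2))
        total += term
        degree += 2
    return total, degree


approx, degree = taylor_sin(1.0, tol=1e-6)
print(approx, degree, math.sin(1.0))
```

A looser tolerance stops the loop earlier, which is the trade-off the abstract refers to: fewer multiplications in exchange for lower precision.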

International Symposium on System-on-Chip | 2009

Mapping of the FFT on a reconfigurable architecture targeted to SDR applications

Fabio Garzia; Roberto Airoldi; Jari Nurmi; Carmelo Giliberto; Claudio Brunelli

This paper describes the implementation of an FFT on a system based on a GP core and a reconfigurable coarse-grain accelerator. The entire system has been prototyped on an Altera Stratix II device. On the prototype, a 1024-point FFT achieves a 40X speed-up over the software implementation and executes in 400 μs. Considering an ASIC synthesis of the coarse-grain array, the 1024-point FFT executes in 42 μs, against the 104 μs of a DSP implementation.

International Symposium on System-on-Chip | 2010

Implementation and benchmarking of FFT algorithms on multicore platforms

Claudio Brunelli; Roberto Airoldi; Jari Nurmi

This paper analyzes the execution performance of a few commonly used versions of the Fast Fourier Transform (FFT) algorithm. We started from C implementations of these FFT algorithms, then profiled their execution on a series of multicore platforms, both embedded and not. This work has several aims. First, we tried to find out how well different FFT algorithms map to different multicore processors. Second, we wanted to understand how well performance scales with the number of cores, and how well current compilers exploit the available hardware compared to handcrafted programs. Results show that the radix-4 Cooley-Tukey FFT is on average the best among the algorithms considered.
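For readers unfamiliar with the radix-4 Cooley-Tukey scheme named in the abstract, here is a small recursive decimation-in-time sketch. It is a teaching example only: the benchmarked programs were written in C and are not reproduced here, and the function name `fft_radix4` and the radix-2 fallback for length-2 inputs are assumptions of this sketch.

```python
import cmath


def fft_radix4(x):
    """Recursive decimation-in-time FFT using radix-4 butterflies.

    Illustrative sketch (hypothetical helper). The input length must be a
    power of two; a single radix-2 butterfly handles length-2 sub-problems.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    if n % 4 != 0:
        raise ValueError("length must be a power of two")
    # Transform the four interleaved sub-sequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    quarter = n // 4
    out = [0j] * n
    for q in range(quarter):
        # Apply twiddle factors W_n^(r*q) to the four partial results.
        t = [cmath.exp(-2j * cmath.pi * r * q / n) * subs[r][q] for r in range(4)]
        # Radix-4 butterfly: combine with powers of -i (since W_n^(n/4) = -i).
        for j in range(4):
            out[q + j * quarter] = sum((-1j) ** (r * j) * t[r] for r in range(4))
    return out


print(fft_radix4([1, 2, 3, 4, 0, 0, 0, 0]))
```

Each radix-4 stage replaces two radix-2 stages, which reduces the number of twiddle-factor multiplications; that is one commonly cited reason radix-4 variants perform well.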

IEEE EUROCON | 2009

Implementation of W-CDMA cell search on a runtime reconfigurable coarse-grain array

Fabio Garzia; Claudio Brunelli; Carmelo Giliberto; Jari Nurmi

This paper describes the implementation of W-CDMA cell search on a reconfigurable architecture. The architecture is composed of a general-purpose processor core and a reconfigurable coarse-grain array accelerator. In this work we used a computational kernel mapped on the reconfigurable array to execute the W-CDMA target cell search. The acceleration produces a 26X total speed-up against a 4X overhead in area.

International Symposium on System-on-Chip | 2011

OpenCL implementation of Cholesky matrix decomposition

Claudio Brunelli; Eero Aho; Heikki Berg

This paper presents several OpenCL implementations of Cholesky decomposition, a very popular algorithm used in linear algebra and signal processing applications. The Cholesky algorithm is a very interesting candidate for OpenCL implementation since it contains sequential parts besides parallel ones. Furthermore, one step involves just a small amount of calculation. These characteristics pose challenges that call for suitable techniques to overcome the limitations of the language. We propose several versions of the implementation of the Cholesky algorithm, then analyze the trade-off between complexity and performance offered by each of them. We also analyze the differences between executing the program on a GPU and on a multicore CPU.
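The mix of sequential and parallel parts mentioned in the abstract can be seen in a plain column-by-column Cholesky factorization. The sketch below is ordinary Python rather than OpenCL, and it is not one of the paper's implementations; the function name `cholesky` and the list-of-lists matrix layout are assumptions chosen for readability.

```python
import math


def cholesky(a):
    """Return the lower-triangular L with a = L * L^T (a symmetric positive definite).

    Illustrative sketch (hypothetical helper, not the paper's OpenCL code).
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    # Sequential part: column j depends on all earlier columns.
    for j in range(n):
        # Small step: a single diagonal element, i.e. very little work.
        s = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = math.sqrt(s)
        # Parallel part: the entries below the diagonal in column j are
        # independent of each other and could be computed concurrently.
        for i in range(j + 1, n):
            dot = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (a[i][j] - dot) / low[j][j]
    return low


matrix = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
print(cholesky(matrix))
```

The outer loop over columns cannot be reordered, while the inner loop over rows maps naturally onto parallel work-items; the diagonal step between them is the small computation that makes efficient scheduling harder.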

International Symposium on System-on-Chip | 2010

Efficient floating-point texture decompression

Tomi Aarnio; Claudio Brunelli; Timo Viitanen

We propose a novel hardware design for decoding compressed floating-point textures in a graphics processing unit (GPU). Our decoder is based on the NXR texture format, which provides lossy, fixed-rate 6:1 compression for floating-point textures. Our design exploits the constraints of the compressed pixel blocks to produce the correct output using only fixed-point arithmetic. This results in significantly lower silicon area occupation compared to pre-existing floating-point texture decoders.

Archive | 2011