Mario Garrido | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mario Garrido is active.

Explore More

Publication

Featured researches published by Mario Garrido.

IEEE Transactions on Very Large Scale Integration Systems | 2013

Pipelined Radix-

Mario Garrido; Jesus Grajal; Miguel A. Sanchez; Oscar Gustafsson

The appearance of radix-22 was a milestone in the design of pipelined FFT hardware architectures. Later, radix-22 was extended to radix-2k . However, radix-2k was only proposed for single-path delay feedback (SDF) architectures, but not for feedforward ones, also called multi-path delay commutator (MDC). This paper presents the radix-2k feedforward (MDC) FFT architectures. In feedforward architectures radix-2k can be used for any number of parallel samples which is a power of two. Furthermore, both decimation in frequency (DIF) and decimation in time (DIT) decompositions can be used. In addition to this, the designs can achieve very high throughputs, which makes them suitable for the most demanding applications. Indeed, the proposed radix-2k feedforward architectures require fewer hardware resources than parallel feedback ones, also called multi-path delay feedback (MDF), when several samples in parallel must be processed. As a result, the proposed radix-2k feedforward architectures not only offer an attractive solution for current applications, but also open up a new research line on feedforward structures.

IEEE Transactions on Circuits and Systems | 2009

2^{k}

Mario Garrido; Keshab K. Parhi; Jesus Grajal

This paper presents a new pipelined hardware architecture for the computation of the real-valued fast Fourier transform (RFFT). The proposed architecture takes advantage of the reduced number of operations of the RFFT with respect to the complex fast Fourier transform (CFFT), and requires less area while achieving higher throughput and lower latency. The architecture is based on a novel algorithm for the computation of the RFFT, which, contrary to previous approaches, presents a regular geometry suitable for the implementation of hardware structures. Moreover, the algorithm can be used for both the decimation in time (DIT) and decimation in frequency (DIF) decompositions of the RFFT and requires the lowest number of operations reported for radix 2. Finally, as in previous works, when calculating the RFFT the output samples are obtained in a scrambled order. The problem of reordering these samples is solved in this paper and a pipelined circuit that performs this reordering is proposed.

IEEE Transactions on Aerospace and Electronic Systems | 2008

Feedforward FFT Architectures

Miguel A. Sanchez; Mario Garrido; Marisa López-Vallejo; Jesus Grajal

This paper presents an in-depth study of the implementation and characterization of fast Fourier transform (FFT) pipelined architectures suitable for broadband digital channelized receivers. When implementing the FFT algorithm on field-programmable gate array (FPGA) platforms, the primary goal is to maximize throughput and minimize area. Feedback and feedforward architectures have been analyzed regarding key design parameters: radix, bitwidth, number of points and stage scaling. Moreover, a simplification of the FFT algorithm, the monobit FFT, has been implemented in order to achieve faster real time performance in broadband digital receivers. The influence of the hardware implementation on the performance of digital channelized receivers has been analyzed in depth, revealing interesting implementation trade-offs which should be taken into account when designing this kind of signal processing systems on FPGA platforms.

IEEE Transactions on Circuits and Systems Ii-express Briefs | 2011

A Pipelined FFT Architecture for Real-Valued Signals

Mario Garrido; Jesus Grajal; Oscar Gustafsson

This brief presents novel circuits for calculating bit reversal on a series of data. The circuits are simple and consist of buffers and multiplexers connected in series. The circuits are optimum in two senses: they use the minimum number of registers that are necessary for calculating the bit reversal and have minimum latency. This makes them very suitable for calculating the bit reversal of the output frequencies in hardware fast Fourier transform (FFT) architectures. This brief also proposes optimum solutions for reordering the output frequencies of the FFT when different common radices are used, including radix-2, radix-2k , radix-4, and radix-8.

international conference on acoustics, speech, and signal processing | 2007

Implementing FFT-based digital channelized receivers on FPGA platforms

Mario Garrido; Jesus Grajal

A new memoryless CORDIC algorithm for the FFT computation is proposed in this paper. This approach calculates the direction of the micro-rotations from the control counter of the FFT, so the area of the rotator hardly depends on the number of rotations, which is particularly suitable for the computation of FFTs of a high number of points. Moreover, the new CORDIC presents other advantages such as the simplification of the basic CORDIC processor used to calculate the micro-rotations, or an easy way to compensate the intrinsic gain of the CORDIC algorithm. Additionally, the VLSI implementation of the algorithm is a pipeline architecture with high performance in terms of speed, throughput and latency.

asilomar conference on signals, systems and computers | 2011

Optimum Circuits for Bit Reversal

Tanvir Ahmed; Mario Garrido; Oscar Gustafsson

This paper presents a 512-point feedforward FFT architecture for wireless personal area network (WPAN). The architecture processes a continuous flow of 8 samples in parallel, leading to a throughput of 2.64 GSamples/s. The FFT is computed in three stages that use radix-8 butterflies. This radix reduces significantly the number of rotators with respect to previous approaches based on radix-2. Besides, the proposed architecture uses the minimum memory that is required for a 512-point 8-parallel FFT. Experimental results show that besides its high throughput, the design is efficient in area and power consumption, improving the results of previous approaches. Specifically, for a wordlength of 16 bits, the proposed design consumes 61.5 mW and its area is 1.43 mm2.

IEEE Transactions on Very Large Scale Integration Systems | 2017

Efficient Memoryless Cordic for FFT Computation

Mario Garrido; Miguel A. Sanchez; Maria Luisa Lopez-Vallejo; Jesus Grajal

This brief presents a novel 4096-point radix-4 memory-based fast Fourier transform (FFT). The proposed architecture follows a conflict-free strategy that only requires a total memory of size N and a few additional multiplexers. The control is also simple, as it is generated directly from the bits of a counter. Apart from the low complexity, the FFT has been implemented on a Virtex-5 field programmable gate array (FPGA) using DSP slices. The goal has been to reduce the use of distributed logic, which is scarce in the target FPGA. With this purpose, most of the hardware has been implemented in DSP48E. As a result, the proposed FPGA is efficient in terms of hardware resources, as is shown by the experimental results.

IEEE Transactions on Circuits and Systems | 2014

A 512-point 8-parallel pipelined feedforward FFT for WPAN

Sau-Gee Chen; Shen-Jui Huang; Mario Garrido; Shyh-Jye Jou

This paper presents a bit reversal circuit for continuous-flow parallel pipelined FFT processors. In addition to two flexible commutators, the circuit consists of two memory groups, where each group has P memory banks. For the consideration of achieving both low delay time and area complexity, a novel write/read scheduling mechanism is devised, so that FFT outputs can be stored in those memory banks in an optimized way. The proposed scheduling mechanism can write the current successively generated FFT output data samples to the locations without any delay right after they are successively released by the previous symbol. Therefore, total memory space of only N data samples is enough for continuous-flow FFT operations. Since read operation is not overlapped with write operation during the entire period, only single-port memory is required, which leads to great area reduction. The proposed bit-reversal circuit architecture can generate natural-order FFT output and support variable power-of-2 FFT lengths.

ieee international radar conference | 2005

A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices

Miguel A. Sanchez; Mario Garrido; Marisa López-Vallejo; Jesus Grajal; Carlos A. López-Barrio

This paper presents several implementations of digital channelised receivers on field-programmable gate array (FPGA) platforms for electronic warfare (EW) applications. All implementations are based on the fast Fourier transform (FFT) but they are intended for different applications. We have studied in detail and implemented different parallel architectures for the FFT algorithm in order to maximise speed processing and throughput, and to optimise area. On the other hand, monobit implementations of the FFT have been carried out in order to get real time in broadband digital receivers. Finally, in order to improve the detection of non-stationary signals, time-frequency analysis based on the short time Fourier transform (STFT) has also been implemented.

international symposium on circuits and systems | 2013

Continuous-flow Parallel Bit-Reversal Circuit for MDF and MDC FFT Architectures

Padma Prasad Boopal; Mario Garrido; Oscar Gustafsson

This paper presents a reconfigurable FFT architecture for variable-length and multi-streaming WiMax wireless standard. The architecture processes 1 stream of 2048-point FFT, up to 2 streams of 1024-point FFT or up to 4 streams of 512-point FFT. The architecture consists of a modified radix-2 single delay feedback (SDF) FFT. The sampling frequency of the system is varied in accordance with the FFT length. The latch-free clock gating technique is used to reduce power consumption. The proposed architecture has been synthesized for the Virtex-6 XCVLX760 FPGA. Experimental results show that the architecture achieves the throughput that is required by the WiMax standard and the design has additional features compared to the previous approaches. The design uses 1% of the total available FPGA resources and maximum clock frequency of 313.67 MHz is achieved. Furthermore, this architecture can be expanded to suit other wireless standards.

Explore More