Angel M. Buron
University of Cantabria
Publications
Featured research published by Angel M. Buron.
IEEE Transactions on Signal Processing | 1999
J. Garcia; Juan A. Michell; Angel M. Buron
This paper presents a full-custom one-bit-slice delay commutator for a pipeline split-radix FFT (SRFFT) architecture, implemented with the true single-phase-clock (TSPC) circuit technique in a 1.0-μm CMOS technology. The circuit can be configured for all intermediate SRFFT computation levels, for transforms of length up to N = 2048, where N is a power of two. It has been tested at up to 200 MHz, with a power consumption of 1.1 W at a 5 V supply.
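For reference, the split-radix decomposition that the delay commutator serves can be sketched in software. The recursive function below is the textbook formulation: an N/2-point DFT of the even samples plus two N/4-point DFTs of the samples at indices 4m+1 and 4m+3. It is not the authors' pipelined hardware, and the name `srfft` is hypothetical.

```python
import cmath

def srfft(x):
    """Recursive split-radix FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    u = srfft(x[0::2])    # N/2-point DFT of even-indexed samples
    z = srfft(x[1::4])    # N/4-point DFT of samples 4m + 1
    zp = srfft(x[3::4])   # N/4-point DFT of samples 4m + 3
    out = [0j] * n
    q = n // 4
    for k in range(q):
        w1 = cmath.exp(-2j * cmath.pi * k / n)   # twiddle W^k
        w3 = cmath.exp(-6j * cmath.pi * k / n)   # twiddle W^(3k)
        s = w1 * z[k] + w3 * zp[k]
        d = w1 * z[k] - w3 * zp[k]
        # The L-shaped split-radix butterfly produces four outputs per k.
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] - 1j * d
        out[k + 3 * q] = u[k + q] + 1j * d
    return out

print(srfft([1, 2, 3, 4]))
```

Only the radix-2 half uses the full-length twiddles once per output pair. The two quarter-length branches account for the reduced multiplication count that makes split radix attractive for hardware.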
Proceedings of SPIE | 2007
Jesús García; Juan A. Michell; Gustavo A. Ruiz; Angel M. Buron
Applications based on the Fast Fourier Transform (FFT), such as signal and image processing, require high computational power as well as the freedom to choose the algorithm and the architecture that implements it. This paper describes the realization of a split-radix FFT (SRFFT) processor based on a pipeline architecture previously reported by the same authors. The basic building blocks of this architecture are a Complex Butterfly and a Delay Commutator. Its main advantages are:
* It combines the higher parallelism of radix-4 FFTs with the ability to process sequences whose length is any power of two.
* The multipliers and adder-subtractors implicit in the SRFFT operate simultaneously, which gives faster operation for the same degree of pipelining.
The processor has been implemented on a Field Programmable Gate Array (FPGA) to obtain high performance at low cost and with a short development time. The Delay Commutator has been designed so that it can be customized for even and odd SRFFT computation levels, and it can be used with segmented arithmetic at any pipelining depth to raise the operating frequency. The processor has been simulated at up to 350 MHz with an Altera Stratix II EP2S15F672C3 as the target device, for a transform length of 256 complex points.
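The abstract does not describe the Delay Commutator's internals. The model below is therefore only an assumed, behavioral (not cycle-accurate) sketch of the reordering role such a unit plays in a pipeline FFT. Samples arrive serially in blocks of 2·d. The first d samples of each block are held in a FIFO delay line and then released alongside the next d samples, so that each butterfly receives the pair (x[n], x[n + d]). The function name and the parameter `d` are hypothetical.

```python
from collections import deque

def delay_commutator_pairs(stream, d):
    """Behavioral model: pair each sample with the one d positions later.

    Assumes len(stream) is a multiple of 2 * d. The first half of every
    2*d-sample block is stored in a FIFO of depth d. The second half is
    matched against it, which emulates feeding a radix-2 butterfly.
    """
    delay_line = deque()
    pairs = []
    for n, x in enumerate(stream):
        if (n // d) % 2 == 0:
            delay_line.append(x)                      # first half of block: delay
        else:
            pairs.append((delay_line.popleft(), x))   # second half: commute and pair
    return pairs

print(delay_commutator_pairs(list(range(8)), 2))
```

Changing `d` from one stage to the next plays the part of configuring the unit for different computation levels, which is how the pipeline can be reused across transform lengths.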
Computers & Electrical Engineering | 2007
F. Javier Diaz; Angel M. Buron; José M. Solana
A hardware-oriented image coding scheme based on the Haar wavelet transform is presented. The procedure computes a variant of the Haar wavelet transform that uses only addition and subtraction operations. An optimized method then selects and codes the coefficients; it is tailored to this transform, with the main aim of achieving the lowest-complexity hardware implementation. The selection strategy does not require the coefficients to be sorted beforehand. A non-conventional coding method, which applies an optimized combination of techniques adapted to the different groups of coefficients, has been devised to code the selected coefficients. This yields a compressed representation of the image and reduces the coding problems inherent in threshold selection. For 512×512-pixel images with 256 grey levels, the compression ratio is just over 22:1 (0.4 bits/pixel), with a normalized root mean square error (NRMSE) of 2-3% and a subjective quality that can be classified as good. The whole compression circuitry has been described and simulated at the HDL level for up to four consecutive images, with consistent results. The complete processor (excluding memory) for 256×256-pixel images has been implemented on a single general-purpose low-cost FPGA, demonstrating the reliability and relative simplicity of the design.
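The abstract does not give the exact add/subtract-only Haar variant. The sketch below assumes the common unnormalized form, in which each pair (a, b) maps to the sum a + b and the difference a − b. That form needs no multiplier or divider in the forward path, and the inverse needs only a one-bit right shift. All function names are hypothetical.

```python
def haar_1d(v):
    """One level of an add/subtract-only Haar step: pairwise sums, then differences."""
    sums = [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    diffs = [v[i] - v[i + 1] for i in range(0, len(v), 2)]
    return sums + diffs

def inverse_haar_1d(c):
    """Exact integer inverse: s + d = 2a and s - d = 2b, so // 2 is a plain shift."""
    half = len(c) // 2
    out = []
    for s, d in zip(c[:half], c[half:]):
        out.extend([(s + d) // 2, (s - d) // 2])
    return out

def haar_2d(img):
    """One 2-D level: transform rows, then columns (image sides must be even)."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

print(haar_2d([[1, 2], [3, 4]]))
```

The top-left result is the sum of the 2×2 block, and the other three are detail coefficients. A selection stage would then keep only the significant detail values.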
IEEE Transactions on Signal Processing | 2005
Gustavo A. Ruiz; Juan A. Michell; Angel M. Buron
The Integer Cosine Transform (ICT) performs close to the Discrete Cosine Transform (DCT) at a reduced computational complexity. The ICT kernel is integer-based, so computation requires only addition and shift operations. This work presents a parallel-pipelined architecture of an 8×8 forward two-dimensional (2-D) ICT(10,9,6,2,3,1) processor for image encoding. It uses a fully pipelined row-column decomposition based on two one-dimensional (1-D) ICTs and a transpose buffer built from D-type flip-flops. The main characteristics of the 1-D ICT architecture are high throughput, parallel processing, reduced internal storage, and 100% efficiency of the computational elements. The arithmetic units are distributed and consist of adders/subtractors operating at half the input data rate. In this transform, truncation and rounding errors are introduced only at the final normalization stage. The word length of the normalization coefficient, 18 bits (13 bits effective), was set using the requirements of IEEE Std 1180-1990 as a reference. The processor has been implemented using a standard-cell design methodology in 0.35-μm CMOS technology; it measures 9.3 mm² and contains 12.4k gates. The maximum frequency is 300 MHz, with a latency of 214 cycles (260 cycles with normalization).
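To illustrate the claim that the ICT needs only additions and shifts, the sketch below writes out an 8-point ICT(10,9,6,2,3,1) kernel. It assumes the usual layout that follows the DCT sign pattern: odd rows use a = 10, b = 9, c = 6, d = 2, and rows 2 and 6 use e = 3, f = 1. The abstract gives only the parameter list, so the layout is an assumption. The constants satisfy ab = ac + bd + cd (90 = 60 + 18 + 12), which makes the rows mutually orthogonal. Normalization by the row norms is left out, matching the idea of deferring rounding to a final stage. `const_mul` and `ict_1d` are hypothetical names.

```python
# Assumed ICT(10, 9, 6, 2, 3, 1) kernel in DCT sign-pattern layout.
T = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [10, 9, 6, 2, -2, -6, -9, -10],
    [3, 1, -1, -3, -3, -1, 1, 3],
    [9, -2, -10, -6, 6, 10, 2, -9],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [6, -10, 2, 9, -9, -2, 10, -6],
    [1, -3, 3, -1, -1, 3, -3, 1],
    [2, -6, 9, -10, 10, -9, 6, -2],
]

# Shift amounts whose powers of two sum to each constant, e.g. 10 = 8 + 2.
SHIFTS = {1: (0,), 2: (1,), 3: (1, 0), 6: (2, 1), 9: (3, 0), 10: (3, 1)}

def const_mul(x, k):
    """Multiply integer x by kernel constant k using only shifts, adds and negation."""
    mag = sum(x << s for s in SHIFTS[abs(k)])
    return -mag if k < 0 else mag

def ict_1d(x):
    """Unnormalized 8-point forward ICT, y = T x, on integer inputs."""
    return [sum(const_mul(xi, k) for xi, k in zip(x, row)) for row in T]

print(ict_1d([3, -1, 4, 1, -5, 9, -2, 6]))
```

A 2-D transform would apply `ict_1d` to the rows, transpose, and apply it again, which is the row-column decomposition described above.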
International Conference on Image Processing | 2003
Juan A. Michell; Gustavo A. Ruiz; Angel M. Buron
The integer cosine transform (ICT) has been shown to be an alternative to the DCT for image processing. This paper presents a parallel-pipelined architecture of an 8×8 ICT(10, 9, 6, 2, 3, 1) processor for image compression. The main characteristics of this architecture are high throughput, low latency, reduced internal storage, and 100% efficiency in all computational elements. The processor has been designed in 0.35-μm CMOS technology with an operating frequency of 300 MHz.
Proceedings of SPIE | 2003
Gustavo A. Ruiz; Juan A. Michell; Angel M. Buron; José M. Solana; Miguel A. Manzano; J. Diaz
The Discrete Cosine Transform (DCT) is the most widely used transform for image compression. The Integer Cosine Transform ICT(10, 9, 6, 2, 3, 1) has been shown to be a promising alternative to the DCT owing to its simpler implementation, similar performance, and compatibility with the DCT. This paper describes the design and implementation of an 8×8 2-D ICT processor for image compression that meets the numerical requirements of IEEE Std 1180-1990. To achieve high throughput and a continuous data flow, the processor combines a low-latency data flow that minimizes internal memory with a parallel-pipelined architecture based on a numerical strength-reduction ICT(10, 9, 6, 2, 3, 1) algorithm. A prototype of the 8×8 ICT processor has been implemented using a standard-cell design methodology in a 0.35-μm CMOS CSD 3M/2P 3.3 V process on a 10 mm² die. Pipeline circuit techniques were used to reach the maximum operating frequency allowed by the technology, giving a critical path of 1.8 ns. Adding 20% to allow for line delays places the estimated operating frequency at 500 MHz. The circuit contains 12,446 cells, 6,757 of which are flip-flops. Two clock signals are distributed: an external one (fs) and an internal one (fs/2). Because of the large number of flip-flops, clock skew was minimized by combining large buffers at the periphery with wide metal lines (clock trunks) to distribute the clock signals.
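Several of these abstracts cite IEEE Std 1180-1990 as the accuracy target. The sketch below computes the error statistics that standard is usually summarized by. The thresholds are the commonly quoted limits: peak error ≤ 1, worst per-pixel mean square error ≤ 0.06, overall mean square error ≤ 0.02, worst per-pixel mean error ≤ 0.015, and overall mean error ≤ 0.0015. It does not reproduce the standard's full procedure, which includes random input generation over specified ranges and a floating-point reference transform. The function names and the flat 64-element block format are assumptions.

```python
# Commonly quoted IEEE Std 1180-1990 accuracy limits (assumed summary).
LIMITS = {
    "peak_error": 1,
    "pixel_mse_max": 0.06,
    "overall_mse": 0.02,
    "pixel_mean_error_max": 0.015,
    "overall_mean_error": 0.0015,
}

def ieee1180_metrics(ref_blocks, test_blocks):
    """Error statistics of test outputs vs. a reference; blocks are flat lists of 64 ints."""
    nblocks = len(ref_blocks)
    err_sum = [0] * 64   # per-pixel sum of signed errors
    sq_sum = [0] * 64    # per-pixel sum of squared errors
    peak = 0
    for ref, out in zip(ref_blocks, test_blocks):
        for i in range(64):
            e = out[i] - ref[i]
            peak = max(peak, abs(e))
            err_sum[i] += e
            sq_sum[i] += e * e
    return {
        "peak_error": peak,
        "pixel_mse_max": max(s / nblocks for s in sq_sum),
        "overall_mse": sum(sq_sum) / (64 * nblocks),
        "pixel_mean_error_max": max(abs(s) / nblocks for s in err_sum),
        "overall_mean_error": abs(sum(err_sum)) / (64 * nblocks),
    }

def meets_ieee1180(metrics):
    """True if every statistic is within its limit."""
    return all(metrics[k] <= LIMITS[k] for k in LIMITS)
```

In a hardware flow, `test_blocks` would come from a bit-accurate model of the processor and `ref_blocks` from a rounded double-precision transform of the same inputs. That is the kind of comparison used to size word lengths such as the normalization coefficient.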
International Conference on Image Processing | 2005
Gustavo A. Ruiz; Juan A. Michell; Angel M. Buron
This paper describes the architecture of an 8×8 2-D DCT/IDCT processor with high throughput, reduced hardware, and a parallel-pipelined scheme. This architecture allows the processing elements and arithmetic units to work in parallel at half the input data rate. A fully pipelined row-column decomposition based on two 1-D DCTs and a transpose buffer built from D-type flip-flops is used. The processor has been implemented in a 0.35-μm CMOS process with a core area of 3 mm² and 11.7k gates, and it meets the requirements of IEEE Std 1180-1990. The input data rate is 300 MHz, with a latency of 172 cycles for the 2-D DCT and 178 cycles for the 2-D IDCT. The proposed design is compact and suitable for HDTV applications.
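The row-column decomposition works because the 2-D DCT is separable. The floating-point sketch below applies a 1-D orthonormal DCT-II to the rows, transposes the result (the job of the hardware transpose buffer), and applies the 1-D DCT again. It then checks the result against the direct 2-D definition. This is a software illustration only, not the paper's fixed-point datapath, and all names are hypothetical.

```python
import math

N = 8

def c(k):
    """Orthonormal DCT-II scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct_1d(v):
    return [c(k) * sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct_2d_row_column(block):
    rows = [dct_1d(r) for r in block]                       # first 1-D stage (rows)
    return transpose([dct_1d(col) for col in transpose(rows)])  # transpose buffer + second 1-D stage

def dct_2d_direct(block):
    """Reference: the 2-D DCT-II evaluated straight from its double-sum definition."""
    return [[c(u) * c(v) * sum(block[x][y]
                               * math.cos(math.pi * (2 * x + 1) * u / (2 * N))
                               * math.cos(math.pi * (2 * y + 1) * v / (2 * N))
                               for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

block = [[(3 * x + 5 * y) % 17 - 8 for y in range(N)] for x in range(N)]
max_diff = max(abs(a - b) for ra, rb in zip(dct_2d_row_column(block), dct_2d_direct(block))
               for a, b in zip(ra, rb))
print(f"max |row-column - direct| = {max_diff:.2e}")
```

The row-column form needs 2·N one-dimensional transforms of N points each instead of N² output sums of N² terms. That saving is what makes two 1-D units plus a transpose buffer the standard hardware structure.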
Proceedings of SPIE | 2005
Gustavo A. Ruiz; Juan A. Michell; Angel M. Buron
This paper describes a cost-effective, high-throughput architecture for an 8×8 2-D DCT/IDCT processor. The 2-D DCT/IDCT is computed using the separability property, so the architecture consists of two 1-D processors with a transpose buffer (TB) as intermediate memory. The transpose buffer has a regular structure based on D-type flip-flops, with a double serial input/output data flow that is well suited to pipelined architectures. The processor uses a parallel-pipelined architecture to achieve high throughput, reduced hardware, and maximum efficiency in all arithmetic elements. This architecture allows the processing elements and arithmetic units to work in parallel at half the input data rate. The exception is the normalization of the transform, which is done in a multiplier operating at the maximum frequency. A precision analysis has also verified that the proposed processor meets the requirements of IEEE Std 1180-1990, used in the ITU-T H.261 and ITU-T H.263 video codecs. The processor was designed using a standard-cell methodology and manufactured in a 0.35-μm CMOS CSD 3M/2P 3.3 V process. It has an area of 6.25 mm² (the core is 3 mm²) and contains a total of 11.7k gates, of which 5.8k are flip-flops. The input data rate has been set at 300 MHz, with a latency of 172 cycles for the 2-D DCT and 178 cycles for the 2-D IDCT. The computing time for a block is close to 580 ns. Its computing speed and hardware complexity make the proposed design suitable for HDTV applications.
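The timing figures can be cross-checked with simple arithmetic. At 300 MHz one cycle lasts about 3.33 ns, so the 172-cycle 2-D DCT latency is roughly 573 ns, in line with the stated block computing time of close to 580 ns. To put the HDTV claim in context, the sketch compares the input rate with one assumed format: 1920×1080 at 30 frames/s with 4:2:0 chroma, that is, 1.5 samples per pixel. That format is our assumption, not the paper's, and the helper names are hypothetical.

```python
def latency_ns(cycles, f_hz):
    """Convert a pipeline latency in clock cycles to nanoseconds at clock rate f_hz."""
    return cycles / f_hz * 1e9

def hdtv_sample_rate(width=1920, height=1080, fps=30, samples_per_pixel=1.5):
    """Samples per second for a video format; 1.5 assumes 4:2:0 chroma subsampling."""
    return width * height * fps * samples_per_pixel

f_in = 300e6  # input data rate reported in the abstract
print(f"2-D DCT latency:  {latency_ns(172, f_in):.1f} ns")
print(f"2-D IDCT latency: {latency_ns(178, f_in):.1f} ns")
print(f"Headroom over 1080p30 4:2:0: {f_in / hdtv_sample_rate():.2f}x")
```

Under these assumptions the processor accepts samples more than three times faster than the video format produces them, so one unit could keep up with that stream.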
IEE Proceedings E: Computers and Digital Techniques | 1992
Gustavo A. Ruiz; Juan A. Michell; Angel M. Buron
Signal Processing Systems | 2006
Gustavo A. Ruiz; Juan A. Michell; Angel M. Buron