Alvaro Vazquez

University of Santiago de Compostela

Publication

Featured research published by Alvaro Vazquez.

symposium on computer arithmetic | 2007

A New Family of High-Performance Parallel Decimal Multipliers

Alvaro Vazquez; Elisardo Antelo; Paolo Montuschi

This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.
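The central idea of this abstract, carry-save multioperand addition of decimal digits held in BCD-4221, can be illustrated in a few lines. In BCD-4221 the bit weights are 4, 2, 2 and 1, so all 16 bit patterns are valid digits from 0 to 9. A row of ordinary binary full adders, applied bit by bit to three 4221 operands, therefore yields a 4221 sum word S and a 4221 carry word H with A + B + C = S + 2H, and no decimal correction is needed. This is a minimal sketch: the doubling of H (recode each digit to BCD-5211, then shift left by one bit) is one known way to form 2H and is assumed here rather than taken from the paper's recoders, and the names `csa_3to2`, `times2` and `to_4221` are hypothetical.

```python
# BCD-4221 carry-save addition sketch (digit lists are least significant first).
W4221 = (4, 2, 2, 1)   # bit weights, most significant bit first
W5211 = (5, 2, 1, 1)

def digit_value(bits, weights):
    """Decimal value of one 4-bit digit under the given bit weights."""
    return sum(b * w for b, w in zip(bits, weights))

# One encoding per digit value for each code. All 16 patterns are valid 4221
# (and 5211) digits; where several patterns share a value, the first one found
# is used (an arbitrary choice for this sketch).
ENC4221, ENC5211 = {}, {}
for pattern in range(16):
    bits = tuple((pattern >> (3 - i)) & 1 for i in range(4))
    ENC4221.setdefault(digit_value(bits, W4221), bits)
    ENC5211.setdefault(digit_value(bits, W5211), bits)

def to_4221(n, ndigits):
    """Encode a non-negative integer as a list of 4221 digits."""
    return [ENC4221[(n // 10**i) % 10] for i in range(ndigits)]

def value(digits, weights=W4221):
    """Integer value of a list of coded digits."""
    return sum(digit_value(d, weights) * 10**i for i, d in enumerate(digits))

def csa_3to2(a, b, c):
    """Bitwise binary full adders over three 4221 operands: A + B + C = S + 2H."""
    s, h = [], []
    for da, db, dc in zip(a, b, c):
        s.append(tuple(x ^ y ^ z for x, y, z in zip(da, db, dc)))
        h.append(tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(da, db, dc)))
    return s, h

def times2(h):
    """Double a 4221 operand: recode each digit to 5211, then shift left 1 bit.

    A shifted 5211 digit has weights (10, 4, 2, 2): the weight-10 bit moves to
    the next digit as its weight-1 bit, and (4, 2, 2, incoming bit) is 4221.
    """
    out, carry = [], 0
    for d in h:
        b5, b2, b1, b1b = ENC5211[digit_value(d, W4221)]
        out.append((b2, b1, b1b, carry))
        carry = b5
    out.append((0, 0, 0, carry))   # extra top digit for the last shifted-out bit
    return out

a, b, c = 987, 654, 321
s, h = csa_3to2(to_4221(a, 3), to_4221(b, 3), to_4221(c, 3))
assert value(s) + value(times2(h)) == a + b + c
```

Because every 4-bit pattern is a legal 4221 digit, the full-adder outputs never need the +6 style corrections that plain BCD-8421 addition requires; only the ×2 of the carry word needs a digit recoder.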

IEEE Transactions on Computers | 2010

Improved Design of High-Performance Parallel Decimal Multipliers

Alvaro Vazquez; Elisardo Antelo; Paolo Montuschi

The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of $2^n$-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
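The signed-digit radix-10 recoding of the multiplier mentioned in this abstract can be sketched as follows, assuming the common carry-free scheme: each BCD digit d_i >= 5 is rewritten as d_i - 10 and a +1 is passed to the next digit, so every recoded digit lies in [-5, 5] and depends only on d_i and d_(i-1). Each recoded digit then selects one of the positive multiples 0X to 5X, negated when the digit is negative. Precomputing 0X through 5X by plain integer multiplication is a simplification assumed here; the paper's "simplified set of multiplicand multiples" is not reproduced, and the function names are hypothetical.

```python
def to_digits(n, ndigits):
    """Split a non-negative integer into decimal digits, least significant first."""
    return [(n // 10**i) % 10 for i in range(ndigits)]

def sd_radix10_recode(digits):
    """Recode BCD digits (LSD first) into signed digits in [-5, 5].

    Digit i becomes d_i - 10*[d_i >= 5] + [d_(i-1) >= 5]. Each output depends
    only on two adjacent input digits, so all digits can be recoded in parallel
    (no carry propagation). One extra top digit absorbs the final +1.
    """
    recoded = []
    prev_big = 0
    for d in digits + [0]:
        big = 1 if d >= 5 else 0
        recoded.append(d - 10 * big + prev_big)
        prev_big = big
    return recoded

def partial_products(x, multiplier_digits):
    """One partial product per recoded digit, built from the multiples 0X..5X."""
    multiples = [k * x for k in range(6)]        # assumed precomputed set
    products = []
    for i, sd in enumerate(sd_radix10_recode(multiplier_digits)):
        pp = multiples[abs(sd)]
        if sd < 0:
            pp = -pp                             # negative digit: negate the multiple
        products.append(pp * 10**i)              # align to digit position i
    return products

print(sd_radix10_recode(to_digits(7385, 4)))     # [-5, -1, 4, -3, 1]
x, y = 4096, 7385
assert sum(partial_products(x, to_digits(y, 4))) == x * y
```

An n-digit multiplier yields n + 1 partial products; the gain is that only multiples up to 5X are needed, and negative ones come from simple negation.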

international conference on computer design | 2007

A radix-10 SRT divider based on alternative BCD codings

Alvaro Vazquez; Elisardo Antelo; Paolo Montuschi

In this paper we present the algorithm and architecture of a radix-10 floating-point divider based on an SRT non-restoring digit-by-digit algorithm. The algorithm uses conventional techniques developed to speed up radix-$2^k$ division, such as a signed-digit (SD) redundant quotient and digit selection by constant comparison using a carry-save estimate of the partial remainder. To optimize area and latency for decimal, we include novel features such as the use of alternative BCD codings to represent decimal operands, estimates by truncation at any binary position inside a decimal digit, a single customized fast carry-propagate decimal adder for partial remainder computation, initial odd multiple generation and final normalization with rounding, and register placement to exploit advanced high fan-in mux-latch circuits. The rough area-delay estimations performed show that the proposed divider has a similar latency but less hardware complexity (1.3 area ratio) than a recently published high-performance digit-by-digit implementation.
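The digit recurrence behind a radix-10 SRT divider is w[j+1] = 10*w[j] - q[j+1]*d, with a redundant signed quotient digit q. The sketch below assumes the digit set [-5, 5] (redundancy factor rho = 5/9), a divisor normalized to [0.1, 1) and a dividend with |x| < d; these parameters are assumptions, not the paper's. In place of the abstract's constant comparison on a carry-save estimate, the hypothetical `select_digit` rounds a truncated estimate (three fractional digits) of 10*w[j] divided by d. The point it illustrates is that the redundancy absorbs the estimate's error, so the bound |w[j]| <= rho*d is preserved.

```python
from fractions import Fraction
import math

A = 5                   # quotient digit set [-A, A] (assumed for this sketch)
RHO = Fraction(A, 9)    # redundancy factor rho = A / (radix - 1)

def select_digit(v, d):
    """Choose q in [-A, A] from a truncated estimate of v = 10*w[j].

    Hypothetical selection rule: truncate v to 3 fractional digits, divide by d
    and round. The truncation error (< 0.001, i.e. < 0.01*d for d >= 0.1) is
    covered by the redundancy, so |10*w[j] - q*d| <= RHO*d still holds.
    """
    estimate = Fraction(math.floor(v * 1000), 1000)
    q = round(estimate / d)
    return max(-A, min(A, q))

def srt_divide(x, d, n):
    """n iterations of w[j+1] = 10*w[j] - q[j+1]*d, starting from w[0] = x/10."""
    if not (Fraction(1, 10) <= d < 1 and abs(x) < d):
        raise ValueError("sketch assumes 0.1 <= d < 1 and |x| < d")
    w = x / 10                      # |w[0]| < d/10 <= RHO*d
    digits = []
    for _ in range(n):
        q = select_digit(10 * w, d)
        w = 10 * w - q * d          # new partial remainder
        assert abs(w) <= RHO * d    # bound required for convergence
        digits.append(q)
    # x/10 = d * sum(q_j * 10^-j) + w[n] * 10^-n, so x/d is close to 10 * sum(...)
    quotient = 10 * sum(Fraction(q, 10 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, w

x, d = Fraction(1, 3), Fraction(7, 10)
digits, q, w = srt_divide(x, d, 10)
assert abs(x / d - q) <= RHO * Fraction(1, 10**9)
```

After n steps the error |x/d - quotient| is at most rho * 10^(1-n); the signed quotient digits would still need conversion to conventional BCD and rounding, which this sketch leaves out.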

IEEE Transactions on Computers | 2014

Fast Radix-10 Multiplication Using Redundant BCD Codes

Alvaro Vazquez; Elisardo Antelo; Javier D. Bruguera

We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3) and the overloaded BCD representation (ODDS). In addition, new techniques are developed to significantly reduce the latency and area of previous representative high-performance implementations. Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5, 5], and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3. This encoding has several advantages. First, it is a self-complementing code, so a negative multiplicand multiple can be obtained by just inverting the bits of the corresponding positive one. Also, the available redundancy allows a fast and simple generation of multiplicand multiples in a carry-free way. Finally, the partial products can be recoded to the ODDS representation by just adding a constant factor into the partial product reduction tree. Since the ODDS uses a 4-bit binary encoding similar to that of non-redundant BCD, conventional binary VLSI circuit techniques, such as binary carry-save adders and compressor trees, can be adapted efficiently to perform decimal operations. To show the advantages of our architecture, we have synthesized an RTL model for $16\times 16$-digit and $34\times 34$-digit multiplications and performed a comparative survey of the previous most representative designs. We show that the proposed decimal multiplier has an area improvement roughly in the range 20-35 percent for similar target delays with respect to the fastest implementation.

symposium on computer arithmetic | 2009

Computation of Decimal Transcendental Functions Using the CORDIC Algorithm

Alvaro Vazquez; Julio Villalba; Elisardo Antelo

In this work we propose new decimal floating-point CORDIC algorithms for transcendental function evaluation. We show how these algorithms are mapped to a state-of-the-art Decimal Floating-Point Unit (DFPU), considering both a carry-propagate adder and a carry-save redundant adder. We compare them with previous decimal CORDIC proposals and with table-driven algorithms, and we conclude that our approach has significant potential advantages for transcendental function evaluation in state-of-the-art DFPUs with minor modifications of the hardware.

signal processing systems | 2002

Implementation of the Exponential Function in a Floating-Point Unit

Alvaro Vazquez; Elisardo Antelo

In this work we present an implementation of the exponential function in double precision, in a unit that supports IEEE floating-point arithmetic. As in existing proposals, the implementation is based on the use of a floating-point multiplier and additional hardware. We decompose the computation into three subexponentials. The first and third subexponentials are computed in a conventional way (table look-up and polynomial approximation). The second subexponential is computed by transforming the slow radix-2 digit-recurrence algorithm into a fast computation that uses the multiplier and additional hardware. We present a design process that permits the selection of the most convenient trade-off between hardware complexity and latency. We discuss the algorithm and the implementation, and perform a rough comparison with three proposed designs. Our estimations indicate that the implementation proposed in this work presents a better trade-off between hardware complexity and latency than the compared designs.

IEEE Transactions on Computers | 2013

Iterative Algorithm and Architecture for Exponential, Logarithm, Powering, and Root Extraction

Alvaro Vazquez; Javier D. Bruguera

IEEE Transactions on Computers | 2012

Redundant Floating-Point Decimal CORDIC Algorithm

Alvaro Vazquez; Julio Villalba-Moreno; Elisardo Antelo; Emilio L. Zapata

application specific systems architectures and processors | 2008

New insights on Ling adders

Alvaro Vazquez; Elisardo Antelo

asilomar conference on signals, systems and computers | 2010

Multi-operand decimal addition by efficient reuse of a binary carry-save adder tree

Alvaro Vazquez; Elisardo Antelo
