José-Alejandro Piñeiro
Intel
Publication
Featured research published by José-Alejandro Piñeiro.
IEEE Transactions on Computers | 2005
José-Alejandro Piñeiro; Stuart F. Oberman; Jean-Michel Muller; Javier D. Bruguera
A table-based method for high-speed function approximation in single-precision floating-point format is presented in this paper. Our focus is the approximation of reciprocal, square root, square root reciprocal, exponentials, logarithms, trigonometric functions, powering (with a fixed exponent p), or special functions. The algorithm presented here combines table look-up, an enhanced minimax quadratic approximation, and an efficient evaluation of the second-degree polynomial (using a specialized squaring unit, redundant arithmetic, and multioperand addition). The execution times and area costs of an architecture implementing our method are estimated, showing the achievement of the fast execution times of linear approximation methods and the reduced area requirements of other second-degree interpolation algorithms. Moreover, the use of an enhanced minimax approximation which, through an iterative process, takes into account the effect of rounding the polynomial coefficients to a finite size allows for a further reduction in the size of the look-up tables to be used, making our method very suitable for the implementation of an elementary function generator in state-of-the-art DSPs or graphics processing units (GPUs).
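The table look-up plus second-degree polynomial scheme described in this abstract can be sketched in software. This is a minimal illustration, not the paper's design: it uses plain Taylor coefficients around each subinterval midpoint instead of the enhanced minimax coefficients, double-precision floats instead of redundant arithmetic, and a hypothetical index width `M = 7`. The names `build_table` and `reciprocal` are invented for the example.

```python
# Piecewise-quadratic approximation of 1/x on [1, 2).
# Assumptions (not from the paper): Taylor coefficients, M = 7 index bits.
M = 7  # hypothetical number of leading fraction bits used as the table index


def build_table(m=M):
    """Return (c0, c1, c2) for each of the 2**m subintervals of [1, 2)."""
    h = 2.0 ** -m
    table = []
    for i in range(2 ** m):
        mid = 1.0 + (i + 0.5) * h  # midpoint of subinterval i
        # Taylor expansion of 1/x around mid: 1/mid - d/mid^2 + d^2/mid^3
        table.append((1.0 / mid, -1.0 / mid ** 2, 1.0 / mid ** 3))
    return table


TABLE = build_table()


def reciprocal(x, m=M, table=TABLE):
    """Approximate 1/x for x in [1, 2) with one look-up and one quadratic."""
    h = 2.0 ** -m
    i = int((x - 1.0) * 2 ** m)        # upper bits of x select the table entry
    d = x - (1.0 + (i + 0.5) * h)      # lower bits: offset from the midpoint
    c0, c1, c2 = table[i]
    d_squared = d * d                  # a dedicated squaring unit in hardware
    return c0 + c1 * d + c2 * d_squared  # multioperand addition in hardware
```

With these Taylor coefficients the truncation error is bounded by (2^-8)^3 = 2^-24 on every subinterval, which is roughly single-precision accuracy. A minimax fit, as the abstract describes, would reach a similar accuracy with smaller tables.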
symposium on computer arithmetic | 2001
José-Alejandro Piñeiro; Javier D. Bruguera; Jean-Michel Muller
A method for the calculation of faithfully rounded single-precision floating-point powering (X^p) is proposed in this paper. This method employs table look-up and a second-degree minimax approximation, which allows the use of reduced-size tables to store the coefficients of the polynomial approximation. A specialized squaring unit and a fused accumulation tree carry out the computation of the quadratic polynomial. Both unfolded and pipelined architectures are presented, and the results of a pre-layout synthesis performed using CMOS 0.35 μm technology are shown, achieving a 50% area reduction compared with linear approximation methods, and improved speed over other algorithms based on second-degree approximation. The pipelined architecture has a latency of three cycles and a throughput of one result per cycle.
IEEE Transactions on Computers | 2004
José-Alejandro Piñeiro; Milos D. Ercegovac; Javier D. Bruguera
An architecture for the computation of logarithm, exponential, and powering operations is presented in this paper, based on a high-radix composite algorithm for the computation of the powering function (X^Y). The algorithm consists of a sequence of overlapped operations: 1) digit-recurrence logarithm, 2) left-to-right carry-free (LRCF) multiplication, and 3) online exponential. A redundant number system is used and the selection in 1) and 3) is done by rounding except in the first iteration, when selection by table look-up is necessary to guarantee the convergence of the recurrences. A sequential implementation of the algorithm, with a control unit which allows the independent computation of logarithm and exponential, is proposed and the execution times and hardware requirements are estimated for single and double-precision floating-point computations. These estimates are obtained for radices from r=8 to r=1,024, according to an approximate model for the delay and area of the main logic blocks, and help determine the radix values which lead to the most efficient implementations: r=32 and r=128.
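The three-stage composition in this abstract rests on the identity X^Y = exp(Y · ln X). The sketch below shows only that data flow, using the standard library's `math.log` and `math.exp` as stand-ins for the digit-recurrence logarithm and online exponential. The function name `powering` is invented for the example, and the stages run one after another here rather than overlapped.

```python
import math


def powering(x, y):
    """Compute x**y for x > 0 as exp(y * ln(x)), in three explicit stages."""
    log_x = math.log(x)        # stage 1: logarithm (digit recurrence in the paper)
    product = y * log_x        # stage 2: multiplication (LRCF in the paper)
    return math.exp(product)   # stage 3: exponential (online in the paper)
```

For example, `powering(2.0, 10.0)` agrees with 1024 to within floating-point rounding. Any error in the logarithm is scaled by Y before the exponential, which is why such schemes need extra precision in the first stage.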
signal processing systems | 2005
José-Alejandro Piñeiro; Milos D. Ercegovac; Javier D. Bruguera
A high-radix digit-recurrence algorithm for the computation of the logarithm, and an analysis of the tradeoffs between area and speed for its implementation, are presented in this paper. Selection by rounding is used in iterations j ≥ 2, and by table look-up in the first iteration. A sequential architecture is proposed, and estimates of the execution time and hardware requirements are obtained for n = 16, 24, 32, 53 and 64 bits of precision and for radix values from r = 8 to r = 1024. These estimates are obtained according to an approximate model for the delay and area of the main logic blocks. We show that the most efficient implementations are obtained for radices ranging from r = 32 to r = 256, reducing the execution time by half with respect to a radix-4 implementation with redundant arithmetic.
application-specific systems, architectures, and processors | 2002
José-Alejandro Piñeiro; Milos D. Ercegovac; Javier D. Bruguera
A high-radix digit-recurrence algorithm for the computation of the logarithm is presented in this paper. Selection by rounding is used in iterations j ≥ 2, and selection by table in the first iteration is combined with a restricted digit-set for the second one, in order to guarantee the convergence of the algorithm. A sequential architecture is proposed, and the execution time and hardware requirements of this architecture are estimated for a target precision of n=32 bits and a radix r=256. These estimates are obtained according to a rough model for the delay and area cost of the main logic blocks employed, and show a speed-up of more than 4 times over a conventional radix-2 implementation with redundant arithmetic.
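A floating-point sketch may help illustrate the general idea of a high-radix logarithm recurrence with selection by rounding. It is an illustration under several assumptions and not the paper's algorithm: the radix `r = 16` and iteration count are arbitrary, ordinary doubles replace redundant arithmetic, and the first two digits are chosen with an exact division as a stand-in for the table look-up and restricted digit set. The name `log_digit_recurrence` is invented for the example.

```python
import math


def log_digit_recurrence(x, r=16, iterations=10):
    """Approximate ln(x) for x in [1, 2) by multiplicative normalization.

    Digits e_j are chosen so that E = x * prod(1 + e_j * r**-j) tends to 1,
    which gives ln(x) = -sum(ln(1 + e_j * r**-j)).
    """
    E = x      # running product, driven towards 1
    L = 0.0    # running value of the logarithm
    for j in range(1, iterations + 1):
        scale = float(r) ** j
        if j <= 2:
            # Assumption: exact nearest digit, standing in for the table
            # look-up and restricted digit set of the early iterations.
            e = round(scale * (1.0 / E - 1.0))
        else:
            # Selection by rounding the scaled residual W = r**j * (1 - E).
            e = round(scale * (1.0 - E))
        factor = e / scale
        E *= 1.0 + factor
        L -= math.log1p(factor)  # hardware would take these from tables
    return L
```

With these settings the residual |1 - E| shrinks by roughly a factor of r per iteration, so ten radix-16 iterations leave an error near 2^-40. Using plain rounding in the second iteration would not keep the residual bounded, which is consistent with the special handling of the early iterations that the abstract mentions.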
digital systems design | 2002
D. Piso; José-Alejandro Piñeiro; Javier D. Bruguera
An analysis of the impact of different methods for the double-precision computation of division and square root on the performance of a superscalar processor is presented in this paper. This analysis is carried out by combining the SimpleScalar toolset, estimates of the latency and throughput of the compared methods, and a set of benchmarks with typical features of compute-intensive applications. Simulation results show the importance of having an efficient unit for the computation of these operations, since changes of less than 1% in the density of division and square root operations lead to changes in performance of around 20%.
digital systems design | 2001
José-Alejandro Piñeiro; Javier D. Bruguera; Jean-Michel Muller
An FPGA implementation of a method for the calculation of faithfully rounded single-precision floating-point powering (X^p) is presented in this paper. A second-degree minimax polynomial approximation is used, together with table look-up, a specialized squaring unit and a fused accumulation tree. The FPGA implementation of an architecture with a latency of 3 cycles and a throughput of one result per cycle has been performed using a Xilinx XC4036XL device. The implemented unit has an operating frequency over 33 MHz.
symposium on computer arithmetic | 2003
José-Alejandro Piñeiro; Milos D. Ercegovac; Javier D. Bruguera
A high-radix composite algorithm for the computation of the powering function (X^Y) is presented. The algorithm consists of a sequence of overlapped operations: (i) digit-recurrence logarithm, (ii) left-to-right carry-free (LRCF) multiplications, and (iii) online exponential. A redundant number system is used, and the selection in (i) and (iii) is done by rounding except in the first iteration, when selection by table look-up is necessary to guarantee the convergence of the recurrences. A sequential implementation of the algorithm is proposed, and the execution times and hardware requirements are estimated for single and double-precision floating-point computations, for radix r=128, showing that powering can be computed with performance similar to that of high-radix CORDIC algorithms.
international symposium on circuits and systems | 2003
José-Alejandro Piñeiro; Javier D. Bruguera; Milos D. Ercegovac
An on-line high-radix algorithm for computing the exponential function (e^x) with arbitrary precision n is presented. Selection by rounding and a redundant digit-set for the digits e_j are used, with selection by table in the first iteration to guarantee the convergence of the algorithm, and the on-line delay is δ = 2 cycles. A sequential architecture implementing the algorithm is proposed, and the execution times and hardware requirements are estimated for 32-bit and 64-bit computations for several radix values. An analysis of the tradeoff between area and speed shows that the most efficient implementations are obtained for radix values from r = 32 to 256, depending on the precision.
international conference on computer design | 2002
José-Alejandro Piñeiro; Milos D. Ercegovac; Javier D. Bruguera
An analysis of the tradeoffs between area and speed for a sequential implementation of a high-radix recurrence for logarithm computation is presented in this paper. The high-radix algorithm is outlined and a sequential architecture is proposed, with the use of selection by rounding of the digits and redundant representation. Estimates of the execution time and total area are obtained for n = 16, 32 and 64 bits of precision and for radix values from r = 8 to r = 1024. An analysis of the tradeoffs between area and speed is presented, showing that the most efficient implementations are obtained for radix r = 256 for 16-bit and 32-bit computations and r = 128 for 64-bit computations.