Javier Hormigo
University of Málaga
Publication
Featured research published by Javier Hormigo.
IEEE Transactions on Circuits and Systems | 2010
Francisco Jaime; Miguel Sánchez; Javier Hormigo; Julio Villalba; Emilio L. Zapata
The COordinate Rotation DIgital Computer (CORDIC) rotator is a well-known algorithm, widely used in computing to evaluate trigonometric functions, among others. The scale factor compensation inherent to the CORDIC algorithm becomes an important drawback when trying to improve its performance, although some authors have proposed a scaling-free version, which has been successfully implemented in wireless applications. However, this scaling-free CORDIC can still be significantly improved by modifying some of its parts. This paper therefore presents an enhanced version of the scaling-free CORDIC. The enhancements have been implemented and tested, yielding new architectures that can reach 35% lower latency and a 36% reduction in area and power consumption compared to the original scaling-free architecture.
field-programmable logic and applications | 2004
Joaquín Olivares; Javier Hormigo; Julio Villalba; Ignacio Benavides
Most block-based motion estimation algorithms compute the sum of absolute differences (SAD) between candidate and reference blocks. In this paper an FPGA design for fast computation of the minimum SAD is proposed. Thanks to the use of on-line arithmetic (OLA), two goals are achieved: a full 16 × 16 macroblock SAD can be implemented in a single FPGA device, and the computation is sped up by early truncation of the SAD calculation. Reconfigurable devices allow us to switch between 8 × 8 and 16 × 16 pixels-per-block models. Comparisons with other related works are provided.
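As an illustration of the early-truncation idea, here is a minimal software sketch. It is not the on-line arithmetic FPGA design from the paper: it simply abandons a candidate block as soon as its partial SAD can no longer beat the best value found so far. The function names `sad_early_exit` and `min_sad` are hypothetical.

```python
def sad_early_exit(candidate, reference, best_so_far):
    """Return the SAD of two equally sized blocks, or None if the partial
    sum reaches best_so_far before the block is fully processed."""
    total = 0
    for cand_row, ref_row in zip(candidate, reference):
        for c, r in zip(cand_row, ref_row):
            total += abs(c - r)
            if total >= best_so_far:
                return None  # this candidate cannot be the minimum
    return total


def min_sad(candidates, reference):
    """Return (index, sad) of the candidate block with the smallest SAD."""
    best_index, best = None, float("inf")
    for i, cand in enumerate(candidates):
        s = sad_early_exit(cand, reference, best)
        if s is not None:
            best_index, best = i, s
    return best_index, best
```

In this sketch the saving comes from skipping the remaining pixels of a losing candidate; the paper obtains its early truncation from most-significant-digit-first arithmetic instead.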
Pattern Recognition Letters | 1999
Gabriel Cristóbal; Javier Hormigo
In this paper we propose a new method for texture segmentation based on the use of texture feature detectors derived from a decorrelation procedure of a modified version of a Pseudo-Wigner distribution (PWD). The decorrelation procedure is accomplished by a cascade recursive least squared (CRLS) principal component (PC) neural network. The goal is to obtain a more efficient analysis of images by combining the advantages of using a high-resolution joint representation given by the PWD with an effective adaptive principal component analysis (PCA) through the use of feedforward neural networks.
application specific systems architectures and processors | 2009
Javier Hormigo; Manuel Ortiz; Francisco J. Quiles; Francisco Jaime; Julio Villalba; Emilio L. Zapata
Most Field Programmable Gate Array (FPGA) devices have special fast carry propagation logic intended to optimize addition operations. Redundant adders do not easily fit into this specialized carry logic and, consequently, they require twice the hardware resources of carry propagate adders, while showing a similar delay for small operands. Therefore, carry-save adders are not usually implemented on FPGA devices, although they are very useful in ASIC implementations. In this paper we study efficient implementations of carry-save adders on FPGA devices, taking advantage of the specialized carry logic. We show that it is possible to implement redundant adders with a hardware cost close to that of a carry propagate adder. Specifically, for word lengths of 16 bits and above, redundant adders are clearly faster and have an area requirement similar to carry propagate adders. Among all the redundant adders studied, the 4:2 compressor is the fastest, makes the best use of the logic resources within FPGA slices, and offers the easiest way to adapt classical algorithms to fit FPGA resources efficiently.
IEEE Transactions on Computers | 2013
Javier Hormigo; Julio Villalba; Emilio L. Zapata
Although redundant addition is widely used to design parallel multioperand adders for ASIC implementations, the use of redundant adders on Field Programmable Gate Arrays (FPGAs) has generally been avoided. The main reasons are the efficient implementation of carry propagate adders (CPAs) on these devices (due to their specialized carry-chain resources) as well as the area overhead of the redundant adders when they are implemented on FPGAs. This paper presents different approaches to the efficient implementation of generic carry-save compressor trees on FPGAs. They present a fast critical path, independent of bit width, with practically no area overhead compared to CPA trees. Along with the classic carry-save compressor tree, we present a novel linear array structure, which efficiently uses the fast carry-chain resources. This approach is defined in parameterizable HDL code based on CPAs, which makes it compatible with any FPGA family or vendor. A detailed study is provided for a wide range of bit widths and a large number of operands. Compared to binary and ternary CPA trees, speedups of up to 2.29 and 2.14 are achieved for 16-bit width and up to 3.81 and 3.11 for 64-bit width.
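For readers unfamiliar with carry-save addition, the following is a minimal bit-level software model, assuming non-negative integer operands. It shows a generic 3:2 compressor and a simple reduction that ends in one ordinary carry-propagate addition; it does not reproduce the paper's HDL or its FPGA carry-chain mapping. The names `carry_save_3to2` and `multioperand_add` are hypothetical.

```python
def carry_save_3to2(a, b, c):
    """Compress three operands into a (sum, carry) pair with the same total.
    Each bit position is handled independently, so no carry propagates."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def multioperand_add(operands):
    """Add non-negative integers, keeping the running result in carry-save
    form and performing a single carry-propagate addition at the end."""
    values = list(operands)
    while len(values) > 2:
        a, b, c = values.pop(), values.pop(), values.pop()
        values.extend(carry_save_3to2(a, b, c))
    return sum(values)  # final carry-propagate addition
```

Because each compressor step has a delay of one full adder regardless of operand width, the critical path of the reduction does not grow with bit width; only the final addition does.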
IEEE Transactions on Circuits and Systems II: Express Briefs | 2015
Sergio D. Muñoz; Javier Hormigo
This brief presents a hardware design to achieve high-throughput QR decomposition, using the Givens rotation method. It utilizes a new 2-D systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm. CORDIC computes vector rotations through shifts and additions. This approach allows a continuous computation of QR factorizations with simple hardware. A fixed-point field-programmable gate array (FPGA) architecture for 4 × 4 matrices has been optimized by balancing the number of CORDIC iterations with the final error. As a result, compared with other previous proposals for FPGA, our design achieves at least 50% more throughput, as well as much less resource utilization.
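The shift-and-add rotation can be sketched with the classic rotation-mode CORDIC, shown below in fixed-point integer arithmetic. This is a textbook version, not the pipelined systolic design described above; the function name `cordic_rotate` and the defaults of 16 iterations and 16 fractional bits are assumptions chosen for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=16, frac_bits=16):
    """Rotate (x, y) by angle (radians, roughly |angle| <= 1.74) using
    classic rotation-mode CORDIC on fixed-point integers."""
    scale = 1 << frac_bits
    xi, yi = int(round(x * scale)), int(round(y * scale))
    zi = int(round(angle * scale))
    # Elementary rotation angles atan(2^-i), precomputed as fixed-point values.
    atan_table = [int(round(math.atan(2.0 ** -i) * scale))
                  for i in range(iterations)]
    for i in range(iterations):
        # Each micro-rotation needs only shifts and additions.
        if zi >= 0:
            xi, yi, zi = xi - (yi >> i), yi + (xi >> i), zi - atan_table[i]
        else:
            xi, yi, zi = xi + (yi >> i), yi - (xi >> i), zi + atan_table[i]
    # The micro-rotations stretch the vector by a known constant gain,
    # which is compensated here in floating point for simplicity.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return xi / scale / gain, yi / scale / gain
```

For example, `cordic_rotate(1.0, 0.0, math.pi / 4)` returns a pair close to (0.7071, 0.7071). Fewer iterations reduce latency and hardware at the cost of a larger final error, which is the trade-off the abstract refers to.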
signal processing systems | 2004
Javier Hormigo; Julio Villalba; Emilio L. Zapata
In this paper we present a specific CORDIC processor for variable-precision coordinates. This system allows us to specify the precision to perform the CORDIC operation, and control the accuracy of the result, in such a way that re-computation of inaccurate results can be carried out with higher precision. It permits a reliable and accurate evaluation of a wide range of elementary functions. The specific architecture designed greatly improves the computational time of previous solutions based on classic polynomial approximation. For controlling error in numerical computation (where intervals are normally narrow) the proposed design performs an interval operation in a time close to that of a point operation.
application-specific systems, architectures, and processors | 2000
Javier Hormigo; Julio Villalba; Michael J. Schulte
This paper presents an efficient hardware algorithm for variable-precision logarithm. The algorithm uses an iterative technique that employs table lookups and polynomial approximations. Compared to similar algorithms, it reduces the number of fixed-precision operations by avoiding full precision computations and dynamically varying the precision of intermediate results. It also uses significantly smaller tables than related algorithms. For a specified hardware implementation, the algorithm requires fewer than 2L^2 fixed-precision multiplications to evaluate the logarithm to L words of precision. An error analysis for the algorithm is also presented.
application-specific systems, architectures, and processors | 2002
Julio Villalba; Gerardo Bandera; Mario A. González; Javier Hormigo; Emilio L. Zapata
In this paper we deal with polynomial evaluation based on new processor architectures for multimedia applications. We introduce some algorithms to take advantage of the new attributes of multimedia processors, such as VLIW (very long instruction word) and SIMD (single instruction, multiple data) architectures. Algorithms that support polynomial evaluation based only on addition/shift operations, and other algorithms that use MAC (multiply-and-add) instructions, are analyzed and tailored to the subword parallelism units of the new processors. Both potential instruction-level and machine-level parallelism are fully exploited through concurrent use of all functional units.
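As a point of reference for MAC-based polynomial evaluation, Horner's rule is the standard scheme in which every step is a single multiply-and-add. The sketch below is a generic scalar version, not one of the paper's algorithms, and it does not model subword parallelism; the function name `horner` is hypothetical.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x with one multiply-and-add per coefficient.
    coeffs lists the coefficients from the highest degree to the constant."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c  # one MAC operation
    return acc
```

For instance, `horner([2, -3, 1], 4)` evaluates 2x^2 - 3x + 1 at x = 4.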
asilomar conference on signals, systems and computers | 2014
Javier Hormigo; Julio Villalba
This paper presents a new family of arithmetic operators to optimize the implementation of circuits for digital signal processing. They are based on a new fixed-point format which allows rounding to nearest to be performed at the same cost as truncation. Thanks to the use of rounding, word-length optimization may improve significantly with respect to conventional units using truncation. That reduction means a simultaneous improvement in area, delay, and, consequently, power consumption. As an example, several FIR filters have been tested, and an area reduction of up to 50% along with a speed improvement of up to 42% has been obtained.