Julio Villalba
University of Málaga
Publication
Featured research published by Julio Villalba.
IEEE Transactions on Image Processing | 1995
Nicolás Guil; Julio Villalba; Emilio L. Zapata
The authors describe a new algorithm for the fast Hough transform (FHT) that satisfactorily solves the problems of other fast algorithms proposed in the literature (erroneous solutions, point redundancy, scaling, and detection of straight lines of different sizes) and needs less storage space. By using the information generated by the algorithm for the detection of straight lines, they manage to detect the segments of the image without appreciable computational overhead. They also discuss the performance and the parallelization of the algorithm and show its efficiency with some examples.
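For orientation, the voting step that any Hough-based line detector builds on can be sketched as below. This is the standard (rho, theta) Hough transform, not the fast algorithm described in the abstract; the function name `hough_votes` and the one-degree angular resolution are assumptions made for illustration.

```python
import math
from collections import Counter

def hough_votes(points, n_theta=180):
    """Accumulate votes in (rho, theta-index) space for a list of (x, y) points.

    Standard Hough transform, shown only as background: each point votes for
    every line rho = x*cos(theta) + y*sin(theta) that passes through it.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta          # theta in [0, pi)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    return votes

# Ten collinear points on the vertical line x = 3 (hypothetical input).
points = [(3, y) for y in range(10)]
votes = hough_votes(points)
```

Cells whose vote count equals the number of collinear points correspond to candidate lines; the cost of visiting every angle for every point is what fast variants aim to reduce.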
IEEE Transactions on Computers | 1997
Elisardo Antelo; Julio Villalba; Javier D. Bruguera; Emilio L. Zapata
Traditionally, CORDIC algorithms have employed radix-2 in the first n/2 microrotations (n is the precision in bits) in order to preserve a constant scale factor. The authors present a full radix-4 CORDIC algorithm in rotation mode and circular coordinates and its corresponding selection function, and propose an efficient technique for the compensation of the nonconstant scale factor. Three radix-4 CORDIC architectures are implemented: 1) a word serial architecture based on the zero skipping technique, 2) a pipelined architecture, and 3) an application specific architecture (the angles are known beforehand). The first two are general purpose implementations where redundant (carry-save) or nonredundant arithmetic can be used, whereas the last one is a simplification of the first two. The proposed architectures present a good trade-off between latency and hardware complexity when compared with existing CORDIC architectures.
IEEE Transactions on Circuits and Systems | 2010
Francisco Jaime; Miguel Sánchez; Javier Hormigo; Julio Villalba; Emilio L. Zapata
The Coordinate Rotation DIgital Computer (CORDIC) rotator is a well-known and widely used algorithm for computing trigonometric functions, among other operations. The scale factor compensation inherent to the CORDIC algorithm becomes an important drawback when trying to improve its performance, although some authors have proposed a scaling-free version, which has been successfully implemented in wireless applications. However, this scaling-free CORDIC can still be significantly improved by modifying some of its parts. This paper therefore presents an enhanced version of the scaling-free CORDIC. The enhancements have been implemented and tested, yielding new architectures that reach a 35% lower latency and a 36% reduction in area and power consumption compared to the original scaling-free architecture.
International Conference on Application Specific Array Processors | 1995
Julio Villalba; J. A. Hidalgo; E.L. Zapata; Elisardo Antelo; Javier D. Bruguera
The compensation of the scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we propose two algorithms and architectures that perform the compensation of the scale factor in parallel with the computation of the CORDIC iterations. This way it is not necessary to carry out a final multiplication or add scaling iterations in order to achieve the compensation. With the proposed architectures, the dependence of the scale factor compensation on the precision n disappears, and this considerably reduces the latency of the system. The architectures developed are optimized solutions for the different operating modes of the CORDIC, in both conventional and redundant arithmetic.
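To make the overhead concrete, the following minimal sketch runs a conventional radix-2 CORDIC in rotation mode and circular coordinates, then applies the scale factor as a separate final division. It is a floating-point illustration of the baseline that the abstract improves on, not the parallel compensation scheme itself; the function names and the choice of 32 microrotations are assumptions.

```python
import math

def cordic_microrotations(x, y, angle, n=32):
    """Apply n radix-2 CORDIC microrotations; the result is NOT yet scaled.

    Converges for |angle| up to roughly 1.74 rad (the sum of the atan(2^-i)).
    """
    z = angle
    for i in range(n):
        d = 1.0 if z >= 0.0 else -1.0      # rotation direction
        shift = 2.0 ** -i                  # a right shift by i in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)          # residual angle
    return x, y

def scale_factor(n=32):
    """Constant gain K = prod(sqrt(1 + 2^(-2i))) accumulated by n microrotations."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))

# Baseline compensation: an extra step after all microrotations have finished.
xu, yu = cordic_microrotations(1.0, 0.0, math.pi / 6)
k = scale_factor()
x, y = xu / k, yu / k                      # approximately (cos 30°, sin 30°)
```

In this baseline the division by `k` can only start once the last microrotation is done, which is the serial dependency that compensation in parallel with the iterations removes.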
Signal Processing Systems | 1996
Javier D. Bruguera; Nicolás Guil; Tomás Lang; Julio Villalba; Emilio L. Zapata
We present the design of parallel architectures for the computation of the Hough transform based on application-specific CORDIC processors. The design of the circular CORDIC in rotation mode is simplified by the a priori knowledge of the angles participating in the transform, and a high throughput is obtained through a pipelined design combined with the use of redundant arithmetic (carry-save adders in this paper). Saving area is essential to the design of a pipelined CORDIC and can be achieved through the reduction in the number of microrotations and/or the size of the coefficient ROM. To reduce the number of microrotations we incorporate radix 4, when it is possible, or mixed radix (radix 2 and radix 4) in the design of the processor, achieving reductions of 50% and 25% in the number of microrotations, respectively, with respect to a totally radix-2 implementation. Furthermore, if we allocate two circular CORDIC rotators to one processor, then the size of the shared coefficient ROM is only 50% of the ROM of a design based on two separate rotators. Finally, we have also incorporated additional microrotations in order to reduce the scale factor to one. The result is a pipelined architecture which can be easily integrated in VLSI technology due to its regularity and modularity.
Signal Processing Systems | 1998
Julio Villalba; Tomás Lang; Emilio L. Zapata
The compensation of scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we present two algorithms and the corresponding architectures (one for both rotation and vectoring modes and the other only for rotation mode) to perform the scaling factor compensation in parallel with the classical CORDIC iterations. With these methods, the scale factor compensation overhead is reduced to a couple of iterations for any word length. The architectures presented have been optimized for conventional and redundant arithmetic.
Field-Programmable Logic and Applications | 2004
Joaquín Olivares; Javier Hormigo; Julio Villalba; Ignacio Benavides
Most block-based motion estimation algorithms compute the sum of absolute differences (SAD) between candidate and reference blocks. In this paper an FPGA design for fast computation of the minimum SAD is proposed. Thanks to the use of on-line arithmetic (OLA), two goals are achieved: a full 16 × 16 macroblock SAD can be implemented in a single FPGA device, and the computation is sped up by early truncation of the SAD calculation. Reconfigurable devices allow us to switch between 8 × 8 and 16 × 16 pixels-per-block models. Comparisons with other related works are provided.
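The idea of cutting a SAD computation short once it can no longer be the minimum can be illustrated in software as follows. This is only an analogue under stated assumptions: the design in the abstract truncates digit-serial on-line arithmetic in hardware, whereas this sketch abandons a candidate after any row whose running total already reaches the best SAD found so far. The function names are hypothetical.

```python
def sad_with_early_exit(reference, candidate, best_so_far):
    """Return the SAD of two equally sized blocks, or None if it cannot beat best_so_far."""
    total = 0
    for ref_row, cand_row in zip(reference, candidate):
        for r, c in zip(ref_row, cand_row):
            total += abs(r - c)
        if total >= best_so_far:       # partial sums only grow, so stop early
            return None
    return total

def best_match(reference, candidates):
    """Return (index, SAD) of the candidate block with the minimum SAD."""
    best_index, best_sad = -1, float("inf")
    for index, candidate in enumerate(candidates):
        sad = sad_with_early_exit(reference, candidate, best_sad)
        if sad is not None:
            best_index, best_sad = index, sad
    return best_index, best_sad

# Hypothetical 2 x 2 blocks; real macroblocks would be 8 x 8 or 16 x 16.
reference = [[1, 2], [3, 4]]
candidates = [[[5, 5], [5, 5]], [[1, 2], [3, 5]], [[9, 9], [9, 9]]]
index, sad = best_match(reference, candidates)
```

The early exit is safe because absolute differences are non-negative, so a partial sum that already reaches the current minimum can never end up below it.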
Application Specific Systems Architectures and Processors | 2009
Javier Hormigo; Manuel Ortiz; Francisco J. Quiles; Francisco Jaime; Julio Villalba; Emilio L. Zapata
Most Field Programmable Gate Array (FPGA) devices have special fast carry propagation logic intended to optimize addition operations. Redundant adders do not easily fit into this specialized carry logic and, consequently, require twice the hardware resources of carry-propagate adders, while showing a similar delay for small operands. Therefore, carry-save adders are not usually implemented on FPGA devices, although they are very useful in ASIC implementations. In this paper we study efficient implementations of carry-save adders on FPGA devices, taking advantage of the specialized carry logic. We show that it is possible to implement redundant adders with a hardware cost close to that of a carry-propagate adder. Specifically, for word lengths of 16 bits and above, redundant adders are clearly faster and have an area requirement similar to carry-propagate adders. Among all the redundant adders studied, the 4:2 compressor is the fastest, makes the best use of the logic resources within FPGA slices, and offers the easiest way to adapt classical algorithms to fit FPGA resources efficiently.
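As background on what a carry-save (redundant) adder computes, the sketch below models a 3:2 compressor on non-negative Python integers: three operands are reduced to a sum word and a carry word with no carry propagation, and a single carry-propagate addition is deferred to the very end. It is a bit-level model for illustration only, not the FPGA mapping studied in the paper, and the function names are assumptions.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three non-negative integers to (sum_word, carry_word).

    Every bit position is handled independently, so no carry ripples through the word.
    """
    sum_word = a ^ b ^ c                                # per-bit sum
    carry_word = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, weighted x2
    return sum_word, carry_word

def add_many(values):
    """Accumulate values in carry-save form; propagate carries only once at the end."""
    sum_word, carry_word = 0, 0
    for value in values:
        sum_word, carry_word = carry_save_add(sum_word, carry_word, value)
    return sum_word + carry_word                        # the single carry-propagate addition

total = add_many([13, 7, 22, 5])
```

A 4:2 compressor can be viewed as two such 3:2 stages in cascade, reducing four operands to a sum word and a carry word.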
IEEE Transactions on Computers | 2008
Elisardo Antelo; Julio Villalba; Emilio L. Zapata
The unfolded and pipelined CORDIC is a high-performance hardware element that produces a wide variety of one- and two-argument functions with high throughput. Reductions in delay, power, and area (cost) are of significant interest for this module due to its high demand for resources. The linear approximation to rotation has been proposed to achieve such reductions. However, the schemes for rotation (multiplication) and vectoring (division) complicate the implementation in a single unit. In this work, we improve the linear approximation scheme, leading to a unified implementation for rotation and vectoring, where fully parallel tree multipliers are used instead of the second half of CORDIC iterations. We also combine the linear approximation to rotation with the scale factor compensation so that the compensation is performed concurrently with the rotation process. We then extend the method to 3D CORDIC. Such an extension is not straightforward due to the lack of existing analytical expressions for the convergence of the algorithm. A comparison, using a rough area-time model and synthesis results, shows that our proposals may achieve significant reductions in delay, with no increase in area, in actual implementations.
Application Specific Systems Architectures and Processors | 1996
Julio Villalba; J. C. Arrabal; Emilio L. Zapata; Elisardo Antelo; Javier D. Bruguera
In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a recurrence similar to that of the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for nonredundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. When compared with conventional radix-2 (redundant and nonredundant) architectures, the radix-4 algorithms present a significant speedup for angle calculation. For the computation of the magnitude the speedup is very slight, due to the nonconstant scale factor in the radix-4 algorithm.