Julio Villalba | Researchain

Publication

Featured research published by Julio Villalba.

IEEE Transactions on Image Processing | 1995

A fast Hough transform for segment detection

Nicolás Guil; Julio Villalba; Emilio L. Zapata

The authors describe a new algorithm for the fast Hough transform (FHT) that satisfactorily solves the problems of other fast algorithms proposed in the literature (erroneous solutions, point redundancy, scaling, and detection of straight lines of different sizes) and needs less storage space. By using the information generated by the algorithm for the detection of straight lines, they manage to detect the segments of the image without appreciable computational overhead. They also discuss the performance and the parallelization of the algorithm and show its efficiency with some examples.
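For orientation, the voting step that any Hough-based line detector builds on can be sketched as follows. This is the standard (rho, theta) Hough transform, not the fast algorithm of the paper; the function names, the dictionary accumulator, and the default resolution are illustrative assumptions.

```python
import math

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Vote in (theta, rho) space using rho = x*cos(theta) + y*sin(theta).

    Standard Hough transform for illustration only (assumed names and
    resolution); it is not the fast Hough transform described above.
    """
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (t, round(rho / rho_step))  # quantized accumulator cell
            votes[key] = votes.get(key, 0) + 1
    return votes

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, votes) for the accumulator cell with the most votes."""
    votes = hough_lines(points, n_theta, rho_step)
    (t, r), count = max(votes.items(), key=lambda kv: kv[1])
    return math.pi * t / n_theta, r * rho_step, count
```

Collinear points vote for the same cell, so the strongest cell identifies the line; recovering segment endpoints, as the paper does, requires additional bookkeeping that this sketch omits.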

IEEE Transactions on Computers | 1997

High performance rotation architectures based on the radix-4 CORDIC algorithm

Elisardo Antelo; Julio Villalba; Javier D. Bruguera; Emilio L. Zapata

Traditionally, CORDIC algorithms have employed radix-2 in the first n/2 microrotations (n is the precision in bits) in order to preserve a constant scale factor. The authors present a full radix-4 CORDIC algorithm in rotation mode and circular coordinates and its corresponding selection function, and propose an efficient technique for the compensation of the nonconstant scale factor. Three radix-4 CORDIC architectures are implemented: 1) a word serial architecture based on the zero skipping technique, 2) a pipelined architecture, and 3) an application specific architecture (the angles are known beforehand). The first two are general purpose implementations where redundant (carry-save) or nonredundant arithmetic can be used, whereas the last one is a simplification of the first two. The proposed architectures present a good trade-off between latency and hardware complexity when compared with existing CORDIC architectures.
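As background, a conventional radix-2 CORDIC in rotation mode and circular coordinates can be sketched as below. It is a floating-point illustration with assumed names, not the radix-4 algorithm of the paper: here every microrotation uses a direction of +1 or -1, so the scale factor is a constant that can be divided out at the end, whereas the abstract notes that the radix-4 scale factor is not constant and needs a dedicated compensation technique.

```python
import math

def cordic_rotate(x, y, angle, n=32):
    """Rotate (x, y) by `angle` radians using n radix-2 CORDIC microrotations.

    Illustrative sketch (assumed name and defaults). Converges for
    |angle| up to roughly 1.74 rad.
    """
    z = angle      # residual angle still to be rotated
    scale = 1.0    # accumulated scale factor of the microrotations
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0          # direction of this microrotation
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atan(2.0**-i)
        scale *= math.sqrt(1.0 + 2.0**(-2 * i))
    return x / scale, y / scale              # compensate the scale factor
```

In hardware the multiplications by powers of two are shifts and the arctangent values come from a small table; the final division stands in for the scale factor compensation step.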

IEEE Transactions on Circuits and Systems | 2010

Enhanced Scaling-Free CORDIC

Francisco Jaime; Miguel Sánchez; Javier Hormigo; Julio Villalba; Emilio L. Zapata

The Coordinate Rotation DIgital Computer (CORDIC) rotator is a well-known and widely used algorithm for computing trigonometric functions, among other calculations. The scale factor compensation inherent to the CORDIC algorithm becomes an important drawback when trying to improve its performance. Some authors have proposed a scaling-free version, which has been successfully implemented in wireless applications. However, this scaling-free CORDIC can still be significantly improved by modifying some of its parts, and this paper presents an enhanced version. The enhancements have been implemented and tested, yielding new architectures that reach a 35% lower latency and a 36% reduction in area and power consumption compared to the original scaling-free architecture.

International Conference on Application Specific Array Processors | 1995

CORDIC architectures with parallel compensation of the scale factor

Julio Villalba; J. A. Hidalgo; E.L. Zapata; Elisardo Antelo; Javier D. Bruguera

The compensation of the scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we propose two algorithms and architectures that perform the compensation of the scale factor in parallel with the computation of the CORDIC iterations. This way it is not necessary to carry out a final multiplication or add scaling iterations in order to achieve the compensation. With the proposed architectures, the dependence of the scale factor compensation on n disappears, which considerably reduces the latency of the system. The architectures developed are optimized solutions for the different operating modes of the CORDIC in both conventional and redundant arithmetic.

Signal Processing Systems | 1996

CORDIC based parallel/pipelined architecture for the Hough transform

Javier D. Bruguera; Nicolás Guil; Tomás Lang; Julio Villalba; Emilio L. Zapata

We present the design of parallel architectures for the computation of the Hough transform based on application-specific CORDIC processors. The design of the circular CORDIC in rotation mode is simplified by the a priori knowledge of the angles participating in the transform, and a high throughput is obtained through a pipelined design combined with the use of redundant arithmetic (carry-save adders in this paper). Saving area is essential to the design of a pipelined CORDIC and can be achieved through a reduction in the number of microrotations and/or the size of the coefficient ROM. To reduce the number of microrotations we incorporate radix 4, when possible, or mixed radix (radix 2 and radix 4) in the design of the processor, reducing the number of microrotations by 50% and 25%, respectively, with respect to a fully radix-2 implementation. Furthermore, if we place two circular CORDIC rotators in one processor, the size of the shared coefficient ROM is only 50% of the ROM of a design based on two separate rotators. Finally, we have also incorporated additional microrotations in order to reduce the scale factor to one. The result is a pipelined architecture which can be easily integrated in VLSI technology due to its regularity and modularity.

Signal Processing Systems | 1998

Parallel Compensation of Scale Factor for the CORDIC Algorithm

Julio Villalba; Tomás Lang; Emilio L. Zapata

The compensation of scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we present two algorithms and the corresponding architectures (one for both rotation and vectoring modes and the other only for rotation mode) to perform the scaling factor compensation in parallel with the classical CORDIC iterations. With these methods, the scale factor compensation overhead is reduced to a couple of iterations for any word length. The architectures presented have been optimized for conventional and redundant arithmetic.

Field-Programmable Logic and Applications | 2004

Minimum Sum of Absolute Differences Implementation in a Single FPGA Device

Joaquín Olivares; Javier Hormigo; Julio Villalba; Ignacio Benavides

Most block-based motion estimation algorithms rely on computing the sum of absolute differences (SAD) between candidate and reference blocks. This paper proposes an FPGA design for fast computation of the minimum SAD. Thanks to the use of on-line arithmetic (OLA), two goals are achieved: a full 16 × 16 macroblock SAD can be implemented in a single FPGA device, and the computation can be sped up by early truncation of the SAD calculation. Reconfigurable devices allow switching between 8 × 8 and 16 × 16 pixel block models. A comparison with related work is provided.
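The minimum-SAD search with early termination can be illustrated in software as below. This is only an analogue under stated assumptions: the paper truncates the calculation using on-line (most-significant-digit-first) arithmetic in hardware, while this sketch abandons a candidate after any row once its running sum can no longer beat the current best. The function names and the list-of-rows block layout are hypothetical.

```python
def sad(block_a, block_b, limit=None):
    """Sum of absolute differences between two equally sized blocks.

    Returns None as soon as the running sum reaches `limit`, because
    the candidate can then no longer beat the current minimum.
    """
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
        if limit is not None and total >= limit:
            return None  # early termination
    return total

def best_match(reference, candidates):
    """Return (index, sad) of the candidate block with the minimum SAD."""
    best_index, best_sad = None, None
    for index, candidate in enumerate(candidates):
        value = sad(reference, candidate, best_sad)
        if value is not None:
            best_index, best_sad = index, value
    return best_index, best_sad
```

For example, with a 2 × 2 reference block `[[1, 2], [3, 4]]`, a candidate that differs in a single pixel by one has a SAD of 1 and wins over candidates that differ everywhere.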

Application Specific Systems Architectures and Processors | 2009

Efficient Implementation of Carry-Save Adders in FPGAs

Javier Hormigo; Manuel Ortiz; Francisco J. Quiles; Francisco Jaime; Julio Villalba; Emilio L. Zapata

Most Field Programmable Gate Array (FPGA) devices have special fast carry propagation logic intended to optimize addition operations. Redundant adders do not easily fit into this specialized carry logic and, consequently, they require twice the hardware resources of carry-propagate adders, while showing a similar delay for small operands. Therefore, carry-save adders are not usually implemented on FPGA devices, although they are very useful in ASIC implementations. In this paper we study efficient implementations of carry-save adders on FPGA devices, taking advantage of the specialized carry logic. We show that it is possible to implement redundant adders with a hardware cost close to that of a carry-propagate adder. Specifically, for word lengths of 16 bits and above, redundant adders are clearly faster and have an area requirement similar to carry-propagate adders. Among all the redundant adders studied, the 4:2 compressor is the fastest, makes the best use of the logic resources within FPGA slices, and offers the easiest way to adapt classical algorithms to fit FPGA resources efficiently.
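The idea behind carry-save addition can be shown with a bit-level software model. The sketch below is a plain 3:2 carry-save adder on Python integers, with assumed function names; it is not the FPGA mapping or the 4:2 compressor studied in the paper. It shows why no carry has to propagate until the final step: each addition produces a sum word and a carry word independently for every bit position.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three non-negative integers into (sum, carry) words.

    Each bit position is handled independently, as a row of full adders
    would do, so there is no carry propagation across the word.
    """
    partial_sum = a ^ b ^ c                       # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carry, shifted up
    return partial_sum, carry

def add_many(values):
    """Accumulate non-negative integers in carry-save form, then resolve
    the result with a single carry-propagate addition."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c  # the only carry-propagate addition
```

Keeping the running total as a (sum, carry) pair is what makes the representation redundant: many pairs encode the same value, and only the final `s + c` needs a conventional adder.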

IEEE Transactions on Computers | 2008

Low-Latency Pipelined 2D and 3D CORDIC Processors

Elisardo Antelo; Julio Villalba; Emilio L. Zapata

The unfolded and pipelined CORDIC is a high-performance hardware element that produces a wide variety of one- and two-argument functions with high throughput. Reductions in delay, power, and area (cost) are of significant interest for this module due to its high demand for resources. The linear approximation to rotation has been proposed to achieve such reductions. However, the schemes for rotation (multiplication) and vectoring (division) complicate the implementation in a single unit. In this work, we improve the linear approximation scheme, leading to a unified implementation for rotation and vectoring, where fully parallel tree multipliers are used instead of the second half of the CORDIC iterations. We also combine the linear approximation to rotation with the scale factor compensation so that the compensation is performed concurrently with the rotation process. We then extend the method to 3D CORDIC. Such an extension is not straightforward due to the lack of existing analytical expressions for the convergence of the algorithm. A comparison, using a rough area-time model and synthesis results, shows that our proposals may achieve significant reductions in delay, with no increase in area, in actual implementations.

Application Specific Systems Architectures and Processors | 1996

Radix-4 vectoring CORDIC algorithm and architectures

Julio Villalba; J. C. Arrabal; Emilio L. Zapata; Elisardo Antelo; Javier D. Bruguera

In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a recurrence similar to that of the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for nonredundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. When compared with conventional radix-2 (redundant and nonredundant) architectures, the radix-4 algorithms present a significant speedup for angle calculation. For the computation of the magnitude the speedup is very slight, due to the nonconstant scale factor in the radix-4 algorithm.
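For comparison, the conventional radix-2 vectoring mode that serves as the baseline in the abstract above can be sketched as follows. It is a floating-point illustration with assumed names, not the radix-4 algorithm: the direction of each microrotation is chosen from the sign of y so that the vector is driven onto the x axis, which yields the angle directly and the magnitude after dividing out the constant radix-2 scale factor.

```python
import math

def cordic_vector(x, y, n=32):
    """Return (magnitude, angle) of the vector (x, y), assuming x > 0.

    Illustrative radix-2 vectoring sketch (assumed name and defaults).
    """
    z = 0.0        # accumulated angle
    scale = 1.0    # accumulated scale factor of the microrotations
    for i in range(n):
        d = -1.0 if y >= 0 else 1.0          # rotate toward y = 0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atan(2.0**-i)
        scale *= math.sqrt(1.0 + 2.0**(-2 * i))
    return x / scale, z                      # magnitude needs compensation
```

The angle z does not depend on the scale factor, while the magnitude does, which is consistent with the abstract's remark that the nonconstant radix-4 scale factor limits the speedup for magnitude computation.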
