Emilio L. Zapata | Researchain

Publication

Featured research published by Emilio L. Zapata.

Pattern Recognition | 1997

Lower order circle and ellipse Hough transform

Nicolás Guil; Emilio L. Zapata

In this work we present two new algorithms for the detection of circles and ellipses which use the FHT algorithm as a basis: Fast Circle Hough Transform (FCHT) and Fast Ellipse Hough Transform (FEHT). The first stage of these two algorithms, devoted to obtaining the centers of the figures, is computationally the most costly. With the objective of improving the execution times of this stage it has been implemented using a new focusing algorithm instead of the typical polling process in a parameter space. This new algorithm uses a new strategy that manages to reduce the execution times, especially in the case where multiple figures appear in the image, or when they are of very different sizes. We also perform a labeling of the image points that permits discriminating which of these belong to each figure, saving computations in subsequent stages.
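
For orientation, the conventional accumulator voting that the abstract contrasts with its focusing algorithm can be sketched as follows. This is a minimal textbook baseline in Python, not the FCHT or FEHT; it assumes the radius is known in advance, and the helper names `sample_circle` and `vote_for_centers` are hypothetical.

```python
import math
from collections import Counter


def sample_circle(cx, cy, radius, n_points=72):
    """Return integer pixel coordinates on a circle (synthetic edge points)."""
    pixels = set()
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        pixels.add((round(cx + radius * math.cos(t)),
                    round(cy + radius * math.sin(t))))
    return sorted(pixels)


def vote_for_centers(points, radius, n_angles=360):
    """Classic center voting: every edge point votes for all candidate centers
    lying at distance `radius` from it (radius assumed known)."""
    votes = Counter()
    for x, y in points:
        cells = set()  # each edge point votes at most once per accumulator cell
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            cells.add((round(x - radius * math.cos(theta)),
                       round(y - radius * math.sin(theta))))
        votes.update(cells)
    return votes


edge_points = sample_circle(20, 15, 10)
best_center, best_votes = vote_for_centers(edge_points, 10).most_common(1)[0]
print(best_center, best_votes, len(edge_points))
```

Every edge point scans the full set of candidate angles, which is the cost of the center-finding stage that the abstract sets out to reduce.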

IEEE Transactions on Image Processing | 1995

A fast Hough transform for segment detection

Nicolás Guil; Julio Villalba; Emilio L. Zapata

The authors describe a new algorithm for the fast Hough transform (FHT) that satisfactorily solves the problems present in other fast algorithms proposed in the literature (erroneous solutions, point redundancy, scaling, and detection of straight lines of different sizes) and needs less storage space. By using the information generated by the algorithm for the detection of straight lines, they manage to detect the segments of the image without appreciable computational overhead. They also discuss the performance and the parallelization of the algorithm and show its efficiency with some examples.
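
The idea of reusing line-detection information to recover segments can be illustrated with the standard (rho, theta) Hough transform. The sketch below is not the FHT described in the abstract. It is a plain accumulator in which each cell also remembers its contributing points, so the extremes along the detected line give the segment endpoints. The function name `detect_segment` is hypothetical, and gaps along the line are ignored.

```python
import math
from collections import defaultdict


def detect_segment(points, n_theta=180, rho_step=1.0):
    """Find the strongest line and the extreme points that support it."""
    cells = defaultdict(list)
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            cells[(t, round(rho / rho_step))].append((x, y))

    # The cell with the most contributing points is the detected line.
    (t, r), members = max(cells.items(), key=lambda item: len(item[1]))

    # Order the supporting points along the line direction (-sin, cos).
    theta = math.pi * t / n_theta
    dx, dy = -math.sin(theta), math.cos(theta)
    along = sorted(members, key=lambda p: p[0] * dx + p[1] * dy)
    endpoints = tuple(sorted((along[0], along[-1])))
    return (t, r), endpoints


# Synthetic horizontal segment y = 5 for x = 0..99.
line_cell, segment = detect_segment([(x, 5) for x in range(100)])
print(line_cell, segment)
```

Storing the member points per cell costs memory that the published algorithm is designed to avoid, so this only shows the principle.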

IEEE Transactions on Computers | 1997

High performance rotation architectures based on the radix-4 CORDIC algorithm

Elisardo Antelo; Julio Villalba; Javier D. Bruguera; Emilio L. Zapata

Traditionally, CORDIC algorithms have employed radix-2 in the first n/2 microrotations (n is the precision in bits) in order to preserve a constant scale factor. The authors present a full radix-4 CORDIC algorithm in rotation mode and circular coordinates and its corresponding selection function, and propose an efficient technique for the compensation of the nonconstant scale factor. Three radix-4 CORDIC architectures are implemented: 1) a word serial architecture based on the zero skipping technique, 2) a pipelined architecture, and 3) an application specific architecture (the angles are known beforehand). The first two are general purpose implementations where redundant (carry-save) or nonredundant arithmetic can be used, whereas the last one is a simplification of the first two. The proposed architectures present a good trade-off between latency and hardware complexity when compared with existing CORDIC architectures.
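
As background for the scale factor issue, here is a minimal sketch of the conventional radix-2 CORDIC in rotation mode and circular coordinates, written in floating-point Python for readability. It is not the radix-4 algorithm of the paper. In radix 2 every microrotation is performed with direction +1 or -1, so the accumulated gain is the same for every angle. A radix-4 digit set includes zero, which is why the gain then depends on the angle and needs compensation.

```python
import math


def cordic_gain(n_iter=32):
    """Constant gain accumulated by n_iter radix-2 microrotations."""
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_rotate(x, y, angle, n_iter=32):
    """Rotate (x, y) by `angle` radians (|angle| below about 1.74)."""
    z = angle
    for i in range(n_iter):
        d = 1.0 if z >= 0.0 else -1.0        # direction of this microrotation
        shift = 2.0 ** -i                    # a right shift in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)            # elementary angle, from a ROM in hardware
    gain = cordic_gain(n_iter)
    return x / gain, y / gain                # scale factor compensation


cos_30, sin_30 = cordic_rotate(1.0, 0.0, math.pi / 6)
print(cos_30, sin_30, cordic_gain())
```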

International Conference on Parallel Architectures and Compilation Techniques | 1999

Automatic analytical modeling for the estimation of cache misses

Basilio B. Fraguela; Ramón Doallo; Emilio L. Zapata

Caches play a very important role in the performance of modern computer systems due to the gap between the memory and the processor speed. Among the methods for studying their behaviour, the most widely used has been trace-driven simulation. Nevertheless, analytical modeling gives more information and requires shorter computation times, which allows it to be used in the compilation step to drive automatic optimizations on the code. The traditional drawback of analytical modeling has been its limited precision and the lack of techniques to apply it systematically without user intervention. In this work we present a methodology to build analytical models for codes with regular access patterns. These models can be applied to caches with an arbitrary size, line size and associativity. Their validation through simulations using typical scientific code fragments has shown a good degree of accuracy.
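
The contrast between trace-driven simulation and analytical modeling can be made concrete with a toy case. The sketch below simulates a set-associative LRU cache on an address trace and compares it with a closed-form miss count for the simplest regular pattern, one sequential sweep over an aligned array with a cold cache. It is an illustrative assumption, not the modeling methodology of the paper. The function names and the cache parameters are hypothetical.

```python
from collections import OrderedDict


def simulate_misses(addresses, cache_bytes, line_bytes, ways):
    """Trace-driven simulation of a set-associative cache with LRU replacement."""
    n_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(n_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        lines_in_set = sets[line % n_sets]
        if line in lines_in_set:
            lines_in_set.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            lines_in_set[line] = True
            if len(lines_in_set) > ways:
                lines_in_set.popitem(last=False)  # evict least recently used
    return misses


def sequential_sweep_misses(n_elems, elem_bytes, line_bytes):
    """Closed-form estimate: one miss per cache line touched by the sweep."""
    return (n_elems * elem_bytes + line_bytes - 1) // line_bytes


# 1000 eight-byte elements, 4 KiB two-way cache with 64-byte lines.
trace = [8 * i for i in range(1000)]
simulated = simulate_misses(trace, cache_bytes=4096, line_bytes=64, ways=2)
predicted = sequential_sweep_misses(1000, 8, 64)
print(simulated, predicted)
```

The simulation walks the whole trace, while the formula is evaluated in constant time, which is the property that makes analytical models attractive inside a compiler.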

Parallel Computing | 1995

Data distributions for sparse matrix vector multiplication

Luis F. Romero; Emilio L. Zapata

Sparse matrix vector multiplication (SpMxV) is one of the core components of many scientific applications. Many authors have proposed methods for its data distribution in distributed memory multiprocessors. We can classify these into four groups: Scatter, D-Way Strip, Recursive and Miscellaneous. In this work we propose a new method, Multiple Recursive Decomposition (MRD), which partitions the data using the prime factors of the dimensions of a multiprocessor network with mesh topology. Furthermore, we introduce a new storage scheme, storage-by-row-of-blocks, that significantly increases the efficiency of the Scatter method. We call this new variant the Block Row Scatter (BRS) method. The MRD and BRS methods achieve better results than the other methods analyzed and are easier to implement. In fact, the data distributions resulting from the MRD and BRS methods are a generalization of the Block and Cyclic distributions used in dense matrices.
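
To show why the choice of data distribution matters, the sketch below multiplies a small sparse matrix in CSR format and compares the nonzero load per processor under the Block and Cyclic row distributions mentioned at the end of the abstract. It does not implement MRD or BRS. The matrix, the two-processor setup and all function names are illustrative assumptions.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Sequential sparse matrix vector product y = A x with A in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def block_owner(row, n_rows, n_procs):
    """Block distribution: contiguous chunks of rows per processor."""
    chunk = (n_rows + n_procs - 1) // n_procs
    return row // chunk


def cyclic_owner(row, n_procs):
    """Cyclic distribution: rows dealt out round-robin."""
    return row % n_procs


def nonzeros_per_proc(row_ptr, owners, n_procs):
    """Work per processor, measured as the number of nonzeros it owns."""
    load = [0] * n_procs
    for row, proc in enumerate(owners):
        load[proc] += row_ptr[row + 1] - row_ptr[row]
    return load


# 4 x 4 example with dense rows at the top and sparse rows at the bottom.
row_ptr = [0, 3, 6, 7, 8]
col_idx = [0, 1, 2, 0, 1, 3, 2, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

y = spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0, 1.0])
block_load = nonzeros_per_proc(row_ptr, [block_owner(i, 4, 2) for i in range(4)], 2)
cyclic_load = nonzeros_per_proc(row_ptr, [cyclic_owner(i, 2) for i in range(4)], 2)
print(y, block_load, cyclic_load)
```

In this example the Block distribution leaves one processor with three times the work of the other, while the Cyclic distribution balances it.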

IEEE Transactions on Communications | 1997

High-performance VLSI architecture for the Viterbi algorithm

Montserrat Bóo; Francisco Argüello; Javier D. Bruguera; Ramón Doallo; Emilio L. Zapata

The Viterbi (1967) algorithm (VA) is known to be an efficient method for the realization of maximum-likelihood (ML) decoding of convolutional codes. The VA is characterized by a graph, called a trellis, which defines the transitions between states. Defining an area-efficient architecture for the VA is equivalent to obtaining an efficient mapping of the trellis. We present a methodology that permits the efficient hardware mapping of the VA onto a processor network of arbitrary size. This formal model is employed for the partitioning of the computations among an arbitrary number of processors in such a way that the data are recirculated, optimizing the use of the processing elements (PEs) and the communications. Therefore, the algorithm is mapped onto a column of PEs and an optimal design solution is obtained for a particular set of area and/or speed constraints. Furthermore, the management of the surviving path memory for its mapping and distribution among the processors is studied. As a result, we obtain a regular and modular design appropriate for VLSI implementation in which the only necessary communications between processors are the data recirculations between stages.
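
For readers unfamiliar with the trellis, a small software sketch of hard-decision Viterbi decoding follows. It assumes a rate 1/2 code with constraint length 3 and generators 7 and 5 (octal), a common textbook choice that the abstract does not specify. It also assumes the message ends with two zero tail bits so that the trellis terminates in state 0. Survivor paths are stored whole for simplicity, which says nothing about the hardware mapping proposed in the paper.

```python
GENERATORS = (0b111, 0b101)   # assumed (7, 5) code, not taken from the abstract
K = 3                         # constraint length
N_STATES = 1 << (K - 1)


def parity(value):
    return bin(value).count("1") & 1


def step(state, bit):
    """One trellis transition: returns (next_state, output symbols)."""
    register = (bit << (K - 1)) | state
    return register >> 1, tuple(parity(register & g) for g in GENERATORS)


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, symbols = step(state, b)
        out.extend(symbols)
    return out


def viterbi_decode(received):
    """Hard-decision decoding; assumes the encoder starts and ends in state 0."""
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)
    paths = [[] for _ in range(N_STATES)]
    n_out = len(GENERATORS)
    for t in range(0, len(received), n_out):
        symbols = received[t:t + n_out]
        new_metric = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metric[state] == inf:
                continue                      # state not reachable yet
            for bit in (0, 1):
                nxt, out = step(state, bit)
                cost = metric[state] + sum(o != r for o, r in zip(out, symbols))
                if cost < new_metric[nxt]:    # add, compare, select
                    new_metric[nxt] = cost
                    new_paths[nxt] = paths[state] + [bit]
        metric, paths = new_metric, new_paths
    return paths[0]


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]     # last two bits are the zero tail
codeword = encode(message)
noisy = list(codeword)
noisy[5] ^= 1                                 # inject a single bit error
decoded = viterbi_decode(noisy)
print(decoded == message)
```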

International Conference on Supercomputing | 2000

A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Eladio Gutiérrez; Oscar G. Plata; Emilio L. Zapata

This paper presents a new parallelization method for reductions of arrays with subscripted subscripts on scalable shared memory multiprocessors. The mapping of computations is based on grouping reduction loop iterations into sets that are then assigned to the cooperating threads of computation. Iterations belonging to the same set are chosen in such a way that they update different entries in the reduction array. That is, the loop distribution implies a conflict-free write distribution of the reduction array. The iteration sets are set up by building a loop-index prefetching data structure that allows the loop iterations to be properly reordered. The proposed method is general, scalable, and easy to implement in a compiler. In addition, it deals in a uniform way with one and multiple subscript arrays. In the case of multiple indirection arrays, writes on the reduction array affecting different sets are solved by defining conflict-free supersets. A performance evaluation is presented. From the experimental results and performance analysis, the proposed method appears as a clear alternative to the array expansion and privatized buffer techniques used in state-of-the-art parallelizing compilers, like Polaris or SUIF. The scalability problem that those techniques exhibit is absent from our method, as its memory overhead does not depend on the number of processors.
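
The general idea of a conflict-free write distribution can be sketched for a loop of the form `result[index[i]] += contrib[i]` with a single subscript array. In the simplified version below, which is an assumption made for illustration and not the data structure of the paper, the reduction array is split into contiguous blocks and each iteration is placed in the set of the block it writes to. The sets are processed one after another here, but since they write to disjoint blocks they could run on separate threads without locks. All names are hypothetical.

```python
def build_iteration_sets(index, array_len, n_threads):
    """Group loop iterations by the block of the reduction array they update."""
    block = (array_len + n_threads - 1) // n_threads
    sets = [[] for _ in range(n_threads)]
    for i, target in enumerate(index):
        sets[target // block].append(i)
    return sets


def reduce_by_sets(index, contrib, array_len, n_threads):
    """Run the reduction one iteration set at a time."""
    result = [0.0] * array_len
    for iterations in build_iteration_sets(index, array_len, n_threads):
        # All writes of this set fall inside a single block of `result`.
        for i in iterations:
            result[index[i]] += contrib[i]
    return result


index = [0, 5, 2, 5, 7, 0, 3, 6]              # subscripted subscript
contrib = [1.0] * len(index)
iteration_sets = build_iteration_sets(index, array_len=8, n_threads=2)
totals = reduce_by_sets(index, contrib, array_len=8, n_threads=2)
print(iteration_sets, totals)
```

The extra storage is one entry per loop iteration regardless of the number of threads, whereas array expansion would need one private copy of the reduction array per thread.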

IEEE Transactions on Circuits and Systems | 2010

Enhanced Scaling-Free CORDIC

Francisco Jaime; Miguel Sánchez; Javier Hormigo; Julio Villalba; Emilio L. Zapata

The Coordinate Rotation DIgital Computer (CORDIC) rotator is a well-known and widely used algorithm in computing because of the way it carries out calculations such as trigonometric functions. The scale factor compensation inherent to the CORDIC algorithm becomes an important drawback when trying to improve its performance, although some authors have proposed a scaling-free version, which has been successfully implemented in wireless applications. However, this scaling-free CORDIC can still be significantly improved by modifying some of its parts, and this paper presents an enhanced version of it. The enhancements have been implemented and tested, yielding new architectures that reach a 35% lower latency and a 36% reduction in area and power consumption compared to the original scaling-free architecture.
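
The abstract gives no equations, so the following is only a sketch under an assumption: that the scaling-free microrotation uses the second-order approximations sin(a) ≈ 2^-i and cos(a) ≈ 1 - 2^-(2i+1), as scaling-free CORDIC is commonly described. It compares the vector length gain of one such step with that of a conventional CORDIC step. The function names are hypothetical and the enhancements proposed in the paper are not modeled.

```python
import math


def conventional_step(x, y, i):
    """Conventional microrotation: lengthens the vector by sqrt(1 + 2^-2i)."""
    s = 2.0 ** -i
    return x - y * s, y + x * s


def scaling_free_step(x, y, i):
    """Assumed scaling-free microrotation by an angle of about 2^-i radians."""
    s = 2.0 ** -i                        # approximates sin(angle)
    c = 1.0 - 2.0 ** -(2 * i + 1)        # approximates cos(angle)
    return c * x - s * y, s * x + c * y


def step_gain(step, i):
    """Length of the unit vector (1, 0) after one microrotation."""
    return math.hypot(*step(1.0, 0.0, i))


conventional_gain = step_gain(conventional_step, 4)
scaling_free_gain = step_gain(scaling_free_step, 4)
print(conventional_gain, scaling_free_gain)
```

With this approximation the squared gain of a step is 1 + 2^-4i / 4, so for larger shift indices the result can be used without a separate compensation stage.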

Signal Processing Systems | 1996

CORDIC based parallel/pipelined architecture for the Hough transform

Javier D. Bruguera; Nicolás Guil; Tomás Lang; Julio Villalba; Emilio L. Zapata

We present the design of parallel architectures for the computation of the Hough transform based on application-specific CORDIC processors. The design of the circular CORDIC in rotation mode is simplified by the a priori knowledge of the angles participating in the transform, and a high throughput is obtained through a pipelined design combined with the use of redundant arithmetic (carry-save adders in this paper). Saving area is essential to the design of a pipelined CORDIC and can be achieved through a reduction in the number of microrotations and/or the size of the coefficient ROM. To reduce the number of microrotations we incorporate radix 4, when possible, or mixed radix (radix 2 and radix 4) in the design of the processor, achieving reductions of 50% and 25% in the number of microrotations, respectively, with respect to a fully radix 2 implementation. Furthermore, if we allocate two circular CORDIC rotators to one processor, the size of the shared coefficient ROM is only 50% of the ROM of a design based on two separate rotators. Finally, we have also incorporated additional microrotations in order to reduce the scale factor to one. The result is a pipelined architecture that can be easily integrated in VLSI technology due to its regularity and modularity.

IEEE Transactions on Parallel and Distributed Systems | 1992

A VLSI constant geometry architecture for the fast Hartley and Fourier transforms

Emilio L. Zapata; Francisco Argüello

An application-specific architecture for the parallel calculation of the decimation-in-time, radix 2 fast Hartley (FHT) and Fourier (FFT) transforms is presented. A real sequence with $N = 2^n$ data items is considered as input. The system calculates the FHT and the FFT in $n$ and $n+1$ stages, respectively. The modular and regular parallel architecture is based on a constant geometry algorithm using butterflies of four data items and the perfect unshuffle permutation. With this permutation, the mapping of the algorithm in VLSI technology is simplified and the communications among processors are minimized. Organization of the processor memory based on first-in, first-out (FIFO) queues facilitates a systolic data flow and permits the direct implementation of the complex data movements and address sequences of the transforms. This is accomplished by means of simple multiplexing operations, using hardwired control. The total calculation time is $(N \log_2 N)/4Q$ cycles for the FHT and $N(1 + \log_2 N)/4Q$ cycles for the FFT, where $Q$ is the number of processors ($Q = 2^q$, Q >
