Juan Torres López
University of Málaga
Publication
Featured research published by Juan Torres López.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1999
J.A. Hidalgo; Juan Torres López; Francisco Argüello; E.L. Zapata
We present an area-efficient parallel architecture that implements the constant-geometry, in-place fast Fourier transform. It consists of a special-purpose processor array interconnected by a perfect unshuffle network. For a radix-$r$ transform of $N = r^n$ data of size $D$ and a column of $P = r^p$ processors, each processor has only one local memory of $N/(rP)$ words of size $rD$, with only one read port and one write port that nevertheless make it possible to read the $r$ inputs of a butterfly and write $r$ intermediate results in each memory cycle. The address-generating circuit that permits the in-place implementation is simple and identical for all the local memories. The data flow has been designed to exploit the pipelining of the processing section efficiently, with no cycle loss. This architecture reduces the area by almost 50% compared with other designs of similar performance.
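The constant-geometry idea can be illustrated in software. The following is a minimal sketch, assuming a radix-2, decimation-in-frequency formulation in the style of Pease; it is not the architecture described in the abstract. In particular it ping-pongs between two buffers rather than working in place, and it writes results in shuffled order rather than using an unshuffle network. The function name `constant_geometry_fft` is hypothetical.

```python
import cmath


def constant_geometry_fft(x):
    """Radix-2 DIF FFT in which every stage uses the same wiring.

    Each stage reads the pair (i, i + N/2) and writes to (2i, 2i + 1),
    so the read/write pattern never changes from stage to stage.
    """
    n_points = len(x)
    stages = n_points.bit_length() - 1
    if n_points == 0 or (1 << stages) != n_points:
        raise ValueError("length must be a power of two")

    half = n_points // 2
    data = [complex(v) for v in x]
    for s in range(stages):
        out = [0j] * n_points  # second buffer: this sketch is NOT in place
        for i in range(half):
            a, b = data[i], data[i + half]
            # Twiddle exponent: clear the s low bits of i.
            k = (i >> s) << s
            w = cmath.exp(-2j * cmath.pi * k / n_points)
            out[2 * i] = a + b
            out[2 * i + 1] = (a - b) * w
        data = out

    # The stages leave the spectrum in bit-reversed order; undo that here.
    result = [0j] * n_points
    for p in range(n_points):
        r = int(format(p, "0{}b".format(stages))[::-1], 2) if stages else 0
        result[r] = data[p]
    return result
```

Because only the twiddle factor depends on the stage number, the same interconnection can serve every stage, which is the property a fixed shuffle or unshuffle network exploits in hardware.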
International Conference on Multimedia and Expo | 2002
Sonia González; Angeles G. Navarro; Juan Torres López; Emilio L. Zapata
In recent years, there has been an increasing interest in video on demand (VoD) systems. We study a distributed VoD system in which the videos are replicated according to their popularity. We present an algorithm to share the load in such a system efficiently and an analytical model that captures the performance of this algorithm, which we validate through simulations. This research shows that popularity is an essential parameter which can save storage without reducing performance by just replicating a few popular videos in all the servers.
International Conference on Acoustics, Speech and Signal Processing | 1998
María A. Trenas; Juan Torres López; Emilio L. Zapata; Francisco Argüello
Real-time image processing uses SIMD engines to accelerate the computation of algorithms such as the DCT, FFT or DWT, so a good skewing scheme becomes essential for avoiding memory bank conflicts. A memory system is introduced for the efficient in-place computation of such transforms. It consists of $M = 2^m$ memory modules, providing parallel access to $M$ image points whose patterns are a row or a column, the interval in both cases being $2^l$, $l \geq 0$. The efficiency of our design is demonstrated through the computation of the 2D DWT.
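To see why the choice of skewing scheme matters, the sketch below checks a textbook linear skew, `bank = (row + col) mod M`, for bank conflicts. This is an assumption chosen for illustration and is not the scheme proposed in the paper. The helper names are hypothetical. The simple skew handles rows and columns at interval 1 but already fails at interval 2, which is the kind of power-of-two interval the abstract targets.

```python
def bank(row, col, num_banks):
    """Textbook linear skew (illustrative assumption, not the paper's scheme)."""
    return (row + col) % num_banks


def row_pattern(row, col0, interval, count):
    """Cells of one row, starting at col0, spaced `interval` apart."""
    return [(row, col0 + k * interval) for k in range(count)]


def column_pattern(row0, col, interval, count):
    """Cells of one column, starting at row0, spaced `interval` apart."""
    return [(row0 + k * interval, col) for k in range(count)]


def is_conflict_free(cells, num_banks):
    """True when every cell in the access falls in a different bank."""
    banks = [bank(r, c, num_banks) for r, c in cells]
    return len(set(banks)) == len(banks)
```

With `num_banks = 4`, `is_conflict_free(row_pattern(0, 0, 1, 4), 4)` holds, whereas `is_conflict_free(row_pattern(0, 0, 2, 4), 4)` does not, because the four cells map to only two distinct banks.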
The Computer Journal | 2001
Margarita Amor; Francisco Argüello; Juan Torres López; Oscar G. Plata; Emilio L. Zapata
This paper presents a general data-parallel formulation for a class of problems based on the divide and conquer strategy. A combination of three techniques (mapping vectors, index-digit permutations and space-filling curves) is used to reorganize the algorithmic dataflow, providing great flexibility to exploit data locality efficiently and to reduce and optimize communications. In addition, these techniques allow the easy translation of the reorganized dataflows into HPF (High Performance Fortran) constructs. Finally, experimental results on the Cray T3E validate our method.
IEEE Transactions on Computers | 1994
Juan Torres López; Emilio L. Zapata
The solution of tridiagonal systems is a topic of great interest in many areas of numerical analysis. Several algorithms based on the divide and conquer (DC) strategy have recently been proposed for solving tridiagonal systems. In this work we propose a unified parallel architecture for DC algorithms that exhibit the data flows of the Successive Doubling, Recursive Doubling and Parallel Cyclic Reduction methods. The architecture is based on the perfect unshuffle permutation, which transforms these data flows into a constant-geometry one. The partition of the data arises in a natural manner, giving rise to a systolic data flow with a wired control section. We conclude that the constant-geometry Cyclic Reduction architecture is the most appropriate one for solving tridiagonal systems and, from the point of view of integration in VLSI technology, is the one that uses the least area and the fewest pins.
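The perfect unshuffle permutation mentioned above can be stated in a few lines. This is a minimal sketch of the standard radix-$r$ definition (elements whose index is congruent to $d$ modulo $r$ are gathered into block $d$); the function names are hypothetical and nothing here models the hardware network itself.

```python
def perfect_unshuffle(seq, radix=2):
    """Gather elements by index modulo `radix` (a rotation of the index digits)."""
    if len(seq) % radix:
        raise ValueError("length must be a multiple of the radix")
    return [v for d in range(radix) for v in seq[d::radix]]


def perfect_shuffle(seq, radix=2):
    """Inverse permutation: interleave `radix` equal-sized blocks."""
    if len(seq) % radix:
        raise ValueError("length must be a multiple of the radix")
    block = len(seq) // radix
    return [seq[d * block + k] for k in range(block) for d in range(radix)]
```

For example, `perfect_unshuffle(list(range(8)))` gives `[0, 2, 4, 6, 1, 3, 5, 7]`. For a sequence of length $r^n$, applying the unshuffle $n$ times restores the original order, since each application rotates the $n$ base-$r$ index digits by one position.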
IEEE Transactions on Multimedia | 2006
Sonia González; Angeles G. Navarro; Juan Torres López; Emilio L. Zapata
In our research, we consider a distributed video-on-demand (VoD) system in which only the most popular videos are replicated in all the servers, whereas the rest are distributed through the system following some allocation scheme. In this paper, we present an algorithm to share the load efficiently in such a system and an analytical model that captures the performance of this algorithm, which we validate through simulations. One novelty in our work is that our analytical model lets us relate popularity to the partial replication of some of the videos and predict the user waiting time. We exploit such relationships to help the system designer select the size of the servers and network and the optimal number of servers to maintain a short waiting time, and to predict when the network becomes a bottleneck.
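The trade-off between storage and popularity can be sketched with a toy placement. Everything specific here is an assumption made for illustration: the abstract does not state a popularity law or an allocation scheme, so the sketch uses a Zipf-like distribution and round-robin placement for the non-replicated videos. All function names are hypothetical, and the sketch does not model the load-sharing algorithm or the waiting-time analysis.

```python
def zipf_popularity(num_videos, skew=1.0):
    """Request probability per video, most popular first (Zipf-like assumption)."""
    weights = [1.0 / rank ** skew for rank in range(1, num_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def place_videos(num_videos, num_servers, num_replicated):
    """Map each video (indexed by popularity rank) to the servers holding it.

    The top `num_replicated` videos go to every server; the rest are placed
    round-robin, which is an assumed stand-in for "some allocation scheme".
    """
    placement = {}
    for video in range(num_videos):
        if video < num_replicated:
            placement[video] = list(range(num_servers))
        else:
            placement[video] = [(video - num_replicated) % num_servers]
    return placement


def stored_copies(placement):
    """Total number of video copies stored across all servers."""
    return sum(len(servers) for servers in placement.values())


def replicated_request_share(popularity, num_replicated):
    """Fraction of requests that any server can serve locally."""
    return sum(popularity[:num_replicated])
```

Under these assumptions, with 10 videos and 4 servers, replicating only the top 2 videos stores 16 copies instead of the 40 needed for full replication, while about 51% of requests target a video available on every server.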
Symposium on Computer Architecture and High Performance Computing | 2012
Alberto Sanz; Rafael Asenjo; Juan Torres López; Rafael Larrosa; Angeles G. Navarro; Vassily Litvinov; Sung-Eun Choi; Bradford L. Chamberlain
Chapel is a parallel programming language designed to improve the productivity and ease of use of conventional and parallel computers. The language currently delivers suboptimal performance when executing codes that perform global data re-allocation operations on distributed-memory architectures. This is mainly because data communication is done without aggregation (one message for each remote array element). In this work, we analyze Chapel's standard Block and Cyclic distribution modules and optimize the communication routines for array assignments by performing aggregation. Thanks to the expressive power of Chapel, the compiler and runtime have enough information to do communication aggregation without user intervention. The runtime relies on the low-level GASNet networking layer, whose versions of one-sided bulk put/get routines that support strides are particularly useful for us. Experimental results obtained on HECToR (a Cray XE6) and Jaguar (a Cray XK6) reveal that the implemented techniques can lead to significant reductions in communication time.
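The effect of aggregation on message counts can be illustrated with a small model. This is a sketch in Python rather than Chapel, and it rests on stated assumptions: a simplified block mapping `index * num_locales // size`, and an idealized aggregation that needs one strided bulk transfer per remote locale. It does not reproduce Chapel's runtime or the GASNet API, and the function names are hypothetical.

```python
def block_owner(index, size, num_locales):
    """Locale owning `index` under a simplified block distribution (assumption)."""
    return index * num_locales // size


def message_counts(indices, size, num_locales, local):
    """Return (per-element messages, aggregated messages) for a remote read.

    Without aggregation, every remote element costs one message. With the
    idealized aggregation assumed here, each remote locale is contacted once.
    """
    remote_owners = [
        block_owner(i, size, num_locales)
        for i in indices
        if block_owner(i, size, num_locales) != local
    ]
    return len(remote_owners), len(set(remote_owners))
```

For instance, reading every second element of a 16-element array spread over 4 locales from locale 0 gives `message_counts(range(0, 16, 2), 16, 4, 0) == (6, 3)`: six single-element messages versus three bulk transfers.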
Application-Specific Systems, Architectures, and Processors | 2000
María A. Trenas; Juan Torres López; Manuel Sánchez; Francisco Argüello; E.L. Zapata
The Wavelet Packet Transform (WPT) provides good spectral and temporal resolutions in arbitrary regions of the time-frequency plane. Given an additive cost function, a best-tree searching algorithm allows the selection of the best basis for a given signal according to this function. This adaptive choice of the time-frequency tiling benefits most of the applications where the standard Wavelet Transform (WT) has already been shown to be useful. Although many specific architectures have been proposed in the literature for the WT, this is not the case for the WPT. In this work we present a specific architecture for the WPT which implements the best-tree searching algorithm.
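A best-tree search with an additive cost function can be sketched as a bottom-up comparison between each node and its two children. The sketch below assumes Haar filters and an $\ell_1$ cost purely for illustration; the abstract specifies neither, and the function names are hypothetical. It describes the search in software only, not the proposed architecture.

```python
import math


def haar_split(signal):
    """One Haar analysis step: (low-pass, high-pass) halves of an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(len(signal) // 2)
    low = [(signal[2 * k] + signal[2 * k + 1]) * s for k in pairs]
    high = [(signal[2 * k] - signal[2 * k + 1]) * s for k in pairs]
    return low, high


def l1_cost(coeffs):
    """Additive cost: the cost of a union of nodes is the sum of their costs."""
    return sum(abs(c) for c in coeffs)


def best_tree(signal, max_depth, cost=l1_cost):
    """Return (total cost, basis), where basis is a list of (path, coefficients).

    A node is split only when its two children together cost strictly less.
    Paths use "L" for the low-pass branch and "H" for the high-pass branch.
    """
    def search(node, path, depth):
        own = cost(node)
        if depth == max_depth or len(node) < 2 or len(node) % 2:
            return own, [(path, node)]
        low, high = haar_split(node)
        cost_low, basis_low = search(low, path + "L", depth + 1)
        cost_high, basis_high = search(high, path + "H", depth + 1)
        if cost_low + cost_high < own:
            return cost_low + cost_high, basis_low + basis_high
        return own, [(path, node)]

    return search([float(v) for v in signal], "", 0)
```

For the constant signal `[1, 1, 1, 1]` with `max_depth=2`, the search keeps the nodes `LL`, `LH` and `H`, whereas for the impulse `[1, 0, 0, 0]` it keeps the untransformed root, because splitting would increase the $\ell_1$ cost.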
Proceedings of the 26th Euromicro Conference. EUROMICRO 2000. Informatics: Inventing the Future | 2000
María A. Trenas; Juan Torres López; Emilio L. Zapata
The wavelet packet transform (WPT) provides good spectral and temporal resolutions in arbitrary regions of the time-frequency plane. This flexible choice of the time-frequency tiling benefits most of the applications where the standard wavelet transform (WT) has already been shown to be useful. Examples of these application areas are signal and image compression, non-linear filtering or denoising, speech coding, medical and biomedical signal and image processing, and communication. However, although many specific architectures have been proposed in the literature for the WT, this is not the case for the WPT. We propose a folded word-serial pipelined architecture capable of computing a complete WPT binary tree in an on-line fashion, but easily configurable to compute any required WPT subtree. This architecture has been tested by means of a functional simulation and the implementation of its control circuitry on an FPGA device.
Parallel Computing | 1997
Juan Torres López; Oscar G. Plata; Francisco Argüello; Emilio L. Zapata
In this paper we describe a method for the regularization and parallelization of tridiagonal algorithms based on the divide and conquer strategy. The method is based on perfect shuffle and unshuffle permutations which transform the flow of these algorithms into a flow with the same pattern of communications in all the stages (constant geometry). We use a unified parallel architecture defined by a column of $P = r^p$ processors ($1 \leq P \leq N/r$, for systems with $N = r^n$ equations) interconnected by means of a shuffle and a ring network as a framework to compare the most important parallel solvers for tridiagonal systems.
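As a concrete instance of a divide and conquer tridiagonal solver, the sketch below implements the textbook form of parallel cyclic reduction. It is a sequential simulation under stated assumptions: all pivots stay non-zero (for example, the matrix is diagonally dominant), and no shuffle or unshuffle reordering is applied, so it does not reflect the constant-geometry data flow of the paper. The function name is hypothetical.

```python
def parallel_cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    a[0] and c[-1] must be zero. At each step every equation i is combined
    with equations i - stride and i + stride; the inner loop is independent
    across i, which is what makes the method parallel.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        new_a, new_b, new_c, new_d = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):
            lo, hi = i - stride, i + stride
            # Multipliers that eliminate the couplings to x[lo] and x[hi].
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0
            gamma = -c[i] / b[hi] if hi < n else 0.0
            new_a[i] = alpha * a[lo] if lo >= 0 else 0.0
            new_c[i] = gamma * c[hi] if hi < n else 0.0
            new_b[i] = (b[i]
                        + (alpha * c[lo] if lo >= 0 else 0.0)
                        + (gamma * a[hi] if hi < n else 0.0))
            new_d[i] = (d[i]
                        + (alpha * d[lo] if lo >= 0 else 0.0)
                        + (gamma * d[hi] if hi < n else 0.0))
        a, b, c, d = new_a, new_b, new_c, new_d
        stride *= 2
    # Every equation is now decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

After each step the couplings reach twice as far, so about $\log_2 N$ steps suffice. For the system with sub- and super-diagonal entries $-1$, diagonal $4$ and right-hand side `[2, 4, 6, 8, 16]`, the solution is `[1, 2, 3, 4, 5]`.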