Alfredo Remón
Max Planck Society
Publications
Featured research published by Alfredo Remón.
Parallel Computing | 2011
Peter Benner; Pablo Ezzatti; Daniel Kressner; Enrique S. Quintana-Ortí; Alfredo Remón
We describe a hybrid Lyapunov solver based on the matrix sign function, where the compute-intensive parts of the computation are accelerated on a graphics processor (GPU) while the remaining operations execute on a general-purpose multi-core processor (CPU). The initial stage of the iteration operates in single-precision arithmetic, returning a low-rank factor of an approximate solution. As the main computation in this stage consists of explicit matrix inversions, we propose a hybrid implementation of Gauss-Jordan elimination using look-ahead to overlap computations on the GPU and CPU. To improve the approximate solution, we introduce an iterative refinement procedure that recovers full double-precision accuracy at low cost. In contrast to earlier approaches to iterative refinement for Lyapunov equations, this approach retains the low-rank factorization structure of the approximate solution. The combination of the two stages yields a mixed-precision algorithm that exploits the capabilities of both general-purpose CPUs and many-core GPUs and overlaps critical computations. Numerical experiments using real-world data and a platform equipped with two Intel Xeon Quad-Core processors and an NVIDIA Tesla C1060 show a significant efficiency gain of the hybrid method over a classical CPU implementation.
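A minimal NumPy sketch of the sign-function Newton iteration described above, run in single precision to mimic the cheap first stage; the GPU off-loading, the look-ahead scheme, and the low-rank iterative-refinement stage of the paper are not reproduced here, and the function name sign_newton and the test matrix are illustrative assumptions.

import numpy as np

def sign_newton(A, dtype=np.float64, tol=1e-4, max_iter=50):
    # Newton iteration for the matrix sign function,
    #   X_{k+1} = (X_k + X_k^{-1}) / 2,
    # whose dominant cost per step is one explicit matrix inversion
    # (the operation the paper off-loads to the GPU).
    X = np.asarray(A, dtype=dtype)
    for _ in range(max_iter):
        X_next = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_next - X, 1) <= tol * np.linalg.norm(X, 1):
            return X_next
        X = X_next
    return X

# Toy usage: for a symmetric positive definite A, sign(A) is the identity,
# so the accuracy of the single-precision stage is easy to check.
rng = np.random.default_rng(0)
B = rng.standard_normal((300, 300))
A = B @ B.T / 300 + np.eye(300)
S32 = sign_newton(A, dtype=np.float32)
print(np.linalg.norm(S32.astype(np.float64) - np.eye(300), 1))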
International Conference on Parallel Processing | 2009
Peter Benner; Pablo Ezzatti; Enrique S. Quintana-Ortí; Alfredo Remón
We investigate the numerical computation of the matrix sign function of large-scale dense matrices, a common task in various application areas. The main computational work in Newton's iteration for the matrix sign function consists of matrix inversion. Therefore, we investigate the performance of two approaches for matrix inversion, based on Gaussian elimination (LU factorization) and Gauss-Jordan elimination. The target architecture is a current general-purpose multi-core processor connected to a graphics processor. Parallelism is extracted on both processors by linking sequential versions of the codes with multithreaded implementations of BLAS. Our results on a system with two Intel Quad-Core processors and an NVIDIA Tesla C1060 illustrate the performance and scalability attained by the codes on this system.
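As a sequential point of reference for the comparison above, a small SciPy sketch of the Gaussian-elimination (LU-based) route to an explicit inverse; it is not the hybrid CPU-GPU code evaluated in the paper, and the helper name inv_via_lu is an illustrative assumption (a Gauss-Jordan sketch follows a later entry).

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inv_via_lu(A):
    # Explicit inverse via Gaussian elimination: factor A = P L U once,
    # then solve A X = I for all columns of the identity.
    lu_piv = lu_factor(A)
    return lu_solve(lu_piv, np.eye(A.shape[0]))

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 1000))        # assumed nonsingular
X = inv_via_lu(A)
print(np.linalg.norm(A @ X - np.eye(1000)))  # residual of A @ inv(A) - I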
Concurrency and Computation: Practice and Experience | 2013
Peter Benner; Pablo Ezzatti; Enrique S. Quintana-Ortí; Alfredo Remón
In this paper, we tackle the inversion of large-scale dense matrices via conventional matrix factorizations (LU, Cholesky, and LDL^T) and the Gauss-Jordan method on hybrid platforms consisting of a multicore CPU and a many-core graphics processor (GPU). Specifically, we introduce the different matrix inversion algorithms using a unified framework based on the notation from the FLAME project; we develop hybrid implementations of the matrix operations underlying the algorithms, as alternatives to those in existing libraries for single-GPU systems; and we perform an extensive experimental study on a platform equipped with state-of-the-art general-purpose architectures from Intel (Santa Clara, CA, USA) and a 'Fermi' GPU from NVIDIA (Santa Clara, CA, USA) that exposes the efficiency of the different inversion approaches. Our study and experimental results show the simplicity and performance advantage of the Gauss-Jordan elimination-based inversion methods and the difficulties associated with the symmetric indefinite case.
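For the symmetric positive definite case among the factorizations named above, a minimal SciPy sketch of Cholesky-based inversion on the CPU; the LDL^T route for the symmetric indefinite case, which the study identifies as the difficult one, and the hybrid CPU-GPU kernels are not reproduced, and the helper name inv_spd is an illustrative assumption.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def inv_spd(A):
    # Explicit inverse of a symmetric positive definite matrix via its
    # Cholesky factorization, the symmetric counterpart of the LU route.
    c_and_lower = cho_factor(A)
    return cho_solve(c_and_lower, np.eye(A.shape[0]))

rng = np.random.default_rng(2)
B = rng.standard_normal((800, 800))
A = B @ B.T + 800 * np.eye(800)              # SPD test matrix
X = inv_spd(A)
print(np.linalg.norm(A @ X - np.eye(800)))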
The Journal of Supercomputing | 2011
Pablo Ezzatti; Enrique S. Quintana-Ortí; Alfredo Remón
We study the use of massively parallel architectures for computing a matrix inverse. Two different algorithms are reviewed, the traditional approach based on Gaussian elimination and the Gauss-Jordan elimination alternative, and several high-performance implementations are presented and evaluated. The target architecture is a current general-purpose multicore processor (CPU) connected to a graphics processor (GPU). Numerical experiments show the efficiency attained by the proposed implementations and how the computation of large-scale inverses, which only a few years ago would have required a distributed-memory cluster, takes only a few minutes on a hybrid architecture formed by a multicore CPU and a GPU.
Parallel, Distributed and Network-Based Processing | 2011
Pablo Ezzatti; Enrique S. Quintana-Ortí; Alfredo Remón
Inversion of large-scale matrices appears in a few scientific applications, such as model reduction or optimal control. Matrix inversion requires a considerable computational effort and therefore calls for high-performance computing techniques and architectures when the matrix dimension is in the order of thousands. Following the recent rise of graphics processors (GPUs), we present and evaluate high-performance codes for matrix inversion, based on Gauss-Jordan elimination with partial pivoting, which off-load the main computational kernels to one or more GPUs while performing fine-grain operations on the general-purpose processor. The target architecture consists of a multi-core processor connected to several GPUs. Parallelism is extracted from parallel implementations of BLAS and from the concurrent execution of operations in the available computational units. Numerical experiments on a system with two Intel Quad-Core processors and four NVIDIA C1060 GPUs illustrate the efficiency and scalability of the different implementations, which deliver over 1.2×10^12 floating-point operations per second.
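An unblocked, single-threaded sketch of Gauss-Jordan elimination with partial pivoting, applied to the augmented matrix [A | I]; the paper instead uses a blocked, in-place formulation whose large updates are off-loaded to one or more GPUs, which this illustration (with the assumed helper name gauss_jordan_inv) does not attempt to reproduce.

import numpy as np

def gauss_jordan_inv(A):
    # Invert A by Gauss-Jordan elimination with partial (row) pivoting,
    # working on the augmented matrix [A | I].
    n = A.shape[0]
    M = np.hstack([np.asarray(A, dtype=np.float64), np.eye(n)])
    for k in range(n):
        # Partial pivoting: move the largest entry of column k to the diagonal.
        p = k + int(np.argmax(np.abs(M[k:, k])))
        if p != k:
            M[[k, p], :] = M[[p, k], :]
        M[k, :] /= M[k, k]                   # scale the pivot row
        # Eliminate column k in every other row (above and below the pivot).
        col = M[:, k].copy()
        col[k] = 0.0
        M -= np.outer(col, M[k, :])
    return M[:, n:]

rng = np.random.default_rng(3)
A = rng.standard_normal((400, 400))
X = gauss_jordan_inv(A)
print(np.linalg.norm(A @ X - np.eye(400)))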
IEEE Geoscience and Remote Sensing Letters | 2011
Alfredo Remón; S. F. Sánchez; Abel Paz; Enrique S. Quintana-Ortí; Antonio Plaza
In this letter, we discuss the use of multicore processors to accelerate endmember extraction algorithms for hyperspectral image unmixing. Specifically, we develop computationally efficient versions of two popular fully automatic endmember extraction algorithms: orthogonal subspace projection and N-FINDR. Our experimental results, based on the analysis of hyperspectral data collected by the National Aeronautics and Space Administration Jet Propulsion Laboratory's Airborne Visible InfraRed Imaging Spectrometer, indicate that endmember extraction algorithms can significantly benefit from these inexpensive high-performance computing platforms, which can offer real-time response with some programming effort.
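A simplified, greedy OSP-style endmember selection in NumPy, meant only to illustrate the kind of computation being accelerated; it is not the authors' implementation, N-FINDR is not shown, and the function name osp_endmembers and the synthetic scene are illustrative assumptions.

import numpy as np

def osp_endmembers(Y, p):
    # Greedy orthogonal-subspace-projection-style extraction.
    # Y : (bands, pixels) hyperspectral data matrix; p : number of endmembers.
    # Each step picks the pixel with the largest energy in the orthogonal
    # complement of the span of the endmembers selected so far.
    bands, _ = Y.shape
    idx = [int(np.argmax(np.sum(Y * Y, axis=0)))]      # start with the brightest pixel
    for _ in range(1, p):
        E = Y[:, idx]
        P = np.eye(bands) - E @ np.linalg.pinv(E)      # projector onto the complement of span(E)
        resid = P @ Y
        idx.append(int(np.argmax(np.sum(resid * resid, axis=0))))
    return idx

# Toy usage on a synthetic scene: 50 bands, 10,000 pixels, 4 endmembers.
rng = np.random.default_rng(4)
E_true = rng.random((50, 4))
A_true = rng.dirichlet(np.ones(4), size=10_000).T      # abundances sum to one
Y = E_true @ A_true + 0.001 * rng.standard_normal((50, 10_000))
print(osp_endmembers(Y, 4))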
EURASIP Journal on Advances in Signal Processing | 2013
Alfredo Remón; S. F. Sánchez; Sergio Bernabé; Enrique S. Quintana-Ortí; Antonio Plaza
Hyperspectral imaging is a growing area in remote sensing in which an imaging spectrometer collects hundreds of images (at different wavelength channels) for the same area on the surface of the Earth. Hyperspectral images are extremely high-dimensional and require on-board processing algorithms able to satisfy near real-time constraints in applications such as wildland fire monitoring, mapping of oil spills and chemical contamination, etc. One of the most widely used techniques for analyzing hyperspectral images is spectral unmixing, which allows for sub-pixel data characterization. This is particularly important since the available spatial resolution in hyperspectral images is typically several meters, so it is reasonable to assume that several spectrally pure substances (called endmembers in hyperspectral imaging terminology) can be found within each imaged pixel. There have been several efforts towards the efficient implementation of hyperspectral unmixing algorithms on architectures suitable for being mounted onboard imaging instruments, including field-programmable gate arrays (FPGAs) and graphics processing units (GPUs). While FPGAs are generally difficult to program, GPUs are difficult to adapt to onboard processing requirements in spaceborne missions due to their extremely high power consumption. In turn, with the increase in the number of cores, multi-core platforms have recently emerged as an alternative that is easier to program than FPGAs and has more tolerable radiation and power consumption requirements. However, a detailed assessment of the performance versus energy consumption of these architectures has not yet been conducted in the field of hyperspectral imaging, in which it is particularly important to achieve processing results in real time. In this article, we provide a thoughtful perspective on this relevant issue and further analyze the performance versus energy consumption ratio of different processing chains for spectral unmixing when implemented on multi-core platforms.
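A minimal sketch of the unmixing step under the linear mixing model the abstract refers to, using unconstrained least squares; the nonnegativity and sum-to-one constraints, and the full processing chains whose energy profile the article measures, are omitted, and the helper name unmix_lsq is an illustrative assumption.

import numpy as np

def unmix_lsq(Y, E):
    # Unconstrained least-squares abundance estimation under the linear
    # mixing model Y ≈ E @ A, with endmember signatures in the columns of E.
    A, *_ = np.linalg.lstsq(E, Y, rcond=None)
    return A

# Toy usage: 100 bands, 3 endmembers, 5,000 mixed pixels.
rng = np.random.default_rng(5)
E = rng.random((100, 3))
A_true = rng.dirichlet(np.ones(3), size=5_000).T
Y = E @ A_true + 0.001 * rng.standard_normal((100, 5_000))
A_est = unmix_lsq(Y, E)
print(np.abs(A_est - A_true).max())          # small error on nearly noise-free data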
International Conference on Algorithms and Architectures for Parallel Processing | 2013
Peter Benner; Pablo Ezzatti; Enrique S. Quintana-Ortí; Alfredo Remón
We investigate the effect that common optimization techniques for general-purpose multicore processors (either manual, compiler-driven, in the form of highly tuned libraries, or orchestrated by a runtime) exert on the performance-power-energy trade-off of dense linear algebra routines. The algorithm employed for this analysis is matrix inversion via Gauss-Jordan elimination, but the results from the evaluation carry beyond this particular operation and are representative of a variety of dense linear algebra computations, especially dense matrix factorizations.
Parallel Computing | 2010
Peter Benner; Pablo Ezzatti; Daniel Kressner; Enrique S. Quintana-Ortí; Alfredo Remón
Model order reduction of a dynamical linear time-invariant system appears in many applications from science and engineering. Numerically reliable SVD-based methods for this task require, in general, O(n^3) arithmetic operations.
Algorithms | 2013
Peter Benner; Pablo Ezzatti; Hermann Mena; Enrique S. Quintana-Ortí; Alfredo Remón