
Publications

Featured research published by Ester M. Garzón.


Computer and Information Technology | 2010

Improving the Performance of the Sparse Matrix Vector Product with GPUs

Francisco Vázquez; Gloria Ortega; José-Jesús Fernández; Ester M. Garzón

Sparse matrices are involved in linear systems, eigensystems and partial differential equations from a wide spectrum of scientific and engineering disciplines. Hence, the sparse matrix vector product (SpMV) is considered a key operation in engineering and scientific computing, and its optimization is very relevant for these applications. However, the irregular computation involved in SpMV prevents the optimum exploitation of computational architectures when the sparse matrices are very large. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors, and SpMV implementations for GPUs have already appeared on the scene. This work proposes and evaluates new implementations of SpMV for GPUs, called ELLR-T. They are based on the ELLPACK-R format, which allows storage of the sparse matrix in a regular manner. A comparative evaluation against a variety of previously proposed storage formats has been carried out based on a representative set of test matrices. The results show that: (1) SpMV is highly accelerated with GPUs; (2) the performance strongly depends on the specific pattern of the matrix; and (3) the ELLR-T implementations achieve the highest overall performance. Consequently, the new ELLR-T implementations of SpMV described in this paper can help to exploit GPUs, because they achieve high performance and can be easily integrated into engineering and scientific computing codes.
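The ELLPACK-R layout the abstract describes can be sketched in a few lines of plain Python (a toy host-side model for illustration, not the CUDA kernel): nonzeros are left-justified into fixed-width value and column-index arrays, plus an extra per-row length array so each GPU thread can stop at its row's real nonzero count instead of reading padding.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of lists) to ELLPACK-R arrays:
    padded 'values' and 'cols' of uniform width, plus 'rl' with the
    actual number of nonzeros per row."""
    rl = [sum(1 for x in row if x != 0) for row in dense]
    width = max(rl) if rl else 0
    values = [[0.0] * width for _ in dense]
    cols = [[0] * width for _ in dense]
    for i, row in enumerate(dense):
        k = 0
        for j, x in enumerate(row):
            if x != 0:
                values[i][k] = x
                cols[i][k] = j
                k += 1
    return values, cols, rl

def spmv_ellpack_r(values, cols, rl, x):
    """y = A*x, iterating only over the rl[i] real nonzeros of each
    row (on a GPU, each row would be handled by one or more threads)."""
    y = []
    for i, n in enumerate(rl):
        y.append(sum(values[i][k] * x[cols[i][k]] for k in range(n)))
    return y

A = [[4.0, 0.0, 1.0],
     [0.0, 2.0, 0.0],
     [3.0, 0.0, 5.0]]
v, c, rl = to_ellpack_r(A)
print(spmv_ellpack_r(v, c, rl, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 8.0]
```

The regular, column-major padded layout is what gives coalesced memory accesses on the GPU; the `rl` array is the "R" that lets threads skip the padding.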


Concurrency and Computation: Practice and Experience | 2011

A new approach for sparse matrix vector product on NVIDIA GPUs

Francisco Vázquez; José-Jesús Fernández; Ester M. Garzón

The sparse matrix vector product (SpMV) is a key operation in engineering and scientific computing and, hence, it has been subjected to intense research for a long time. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim of maximizing the performance. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors, and SpMV implementations for NVIDIA GPUs have already appeared on the scene. This work proposes and evaluates a new implementation of SpMV for NVIDIA GPUs based on a new format, ELLPACK-R, that allows storage of the sparse matrix in a regular manner. A comparative evaluation against a variety of previously proposed storage formats has been carried out based on a representative set of test matrices. The results show that, although the performance strongly depends on the specific pattern of the matrix, the implementation based on ELLPACK-R achieves higher overall performance. Moreover, a comparison with standard state-of-the-art superscalar processors reveals that significant speedup factors are achieved with GPUs.


Journal of Structural Biology | 2010

A matrix approach to tomographic reconstruction and its implementation on GPUs

Francisco Vázquez; Ester M. Garzón; José-Jesús Fernández

Electron tomography allows elucidation of the molecular architecture of complex biological specimens. Weighted backprojection (WBP) is the standard reconstruction method in the field. In this work, three-dimensional reconstruction with WBP is addressed from a matrix perspective by formulating the problem as a set of sparse matrix-vector products, with the matrix being constant and shared by all the products. This matrix approach allows efficient implementations of reconstruction algorithms. Although WBP is computationally simple, the resolution requirements may turn the tomographic reconstruction into a computationally intensive problem. Parallel systems have traditionally been used to cope with such demands. Recently, graphics processor units (GPUs) have emerged as powerful platforms for scientific computing and they are attracting increasing interest. In combination with GPU computing, the matrix approach for WBP exhibits a significant acceleration factor compared to the standard implementation.
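The matrix formulation can be sketched very compactly: a sparse projection matrix (here a toy one, stored as coordinate triplets purely for illustration) is built once and then shared by every slice, and backprojection reduces to a product with its transpose.

```python
def spmv_t(triplets, nvoxels, p):
    """Backprojection as a transposed sparse matrix-vector product,
    x = A^T p, with A given as (ray, voxel, weight) triplets.
    A maps volume voxels (columns) to projection rays (rows)."""
    x = [0.0] * nvoxels
    for ray, voxel, w in triplets:
        x[voxel] += w * p[ray]
    return x

# Toy system: 2 rays crossing 3 voxels. The same constant matrix A is
# reused for every slice of the volume, as in the matrix approach.
A = [(0, 0, 1.0), (0, 1, 1.0),   # ray 0 crosses voxels 0 and 1
     (1, 1, 1.0), (1, 2, 1.0)]   # ray 1 crosses voxels 1 and 2
projections = [[2.0, 4.0], [1.0, 3.0]]           # one vector per slice
slices = [spmv_t(A, 3, p) for p in projections]  # A built only once
print(slices)  # [[2.0, 6.0, 4.0], [1.0, 4.0, 3.0]]
```

Because the matrix is constant across all slices, the cost of computing its sparse structure is amortized, and each slice reconstruction becomes exactly the SpMV operation that GPUs accelerate well.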


IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2013

Analysis and Optimizations of Global and Local Versions of the RX Algorithm for Anomaly Detection in Hyperspectral Data

Jose M. Molero; Ester M. Garzón; Inmaculada García; Antonio Plaza

Anomaly detection is an important task for hyperspectral data exploitation. A standard approach for anomaly detection in the literature is the method developed by Reed and Xiaoli, also called the RX algorithm. A variation of this algorithm consists of applying the same concept to a local sliding window centered around each image pixel. The computational cost of the RX algorithm is very high, and it strongly increases for its local versions. However, current advances in high performance computing help to reduce the run-time of these algorithms. Thus, for the standard RX, it is possible to achieve a processing time similar to the data acquisition time and to increase the practical interest of its local versions. In this paper, we discuss several optimizations which exploit different forms of acceleration for these algorithms. First, we explain how the calculation of the correlation matrix and its inverse can be accelerated through optimization techniques based on the properties of these particular matrices and the efficient use of linear algebra libraries. Second, we describe parallel implementations of the RX algorithm, optimized for multicore platforms, which are well-known, inexpensive and widely available high performance computing platforms. The ability of the global and local versions of RX to detect anomalies is explored using a wide set of experiments, on both synthetic and real data, which are used to compare the optimized versions of the global and local RX algorithms in terms of anomaly detection accuracy and computational efficiency. The synthetic images have been generated under different noise conditions and anomalous features. The two real scenes used in the experiments are a hyperspectral data set collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) system over the World Trade Center (WTC) in New York, five days after the terrorist attacks, and another data set collected by the HYperspectral Digital Image Collection Experiment (HYDICE). Experimental results indicate that the proposed optimizations can significantly improve the performance of the considered algorithms without reducing their anomaly detection accuracy.
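The core of the global RX detector is the Mahalanobis distance of each pixel spectrum to the background statistics. A minimal sketch (a toy with 2 spectral bands so the covariance can be inverted in closed form; real hyperspectral data has hundreds of bands and uses a linear algebra library, as the paper discusses):

```python
def rx_scores(pixels):
    """Global RX on 2-band pixels: Mahalanobis distance of each pixel
    to the scene mean, using the inverse of the 2x2 covariance."""
    n = len(pixels)
    mu = [sum(p[k] for p in pixels) / n for k in (0, 1)]
    d = [(p[0] - mu[0], p[1] - mu[1]) for p in pixels]
    # 2x2 covariance matrix entries and its closed-form inverse
    a = sum(x * x for x, _ in d) / n
    b = sum(x * y for x, y in d) / n
    c = sum(y * y for _, y in d) / n
    det = a * c - b * b
    inv = ((c / det, -b / det), (-b / det, a / det))
    # quadratic form (x - mu)^T C^{-1} (x - mu) per pixel
    return [x * (inv[0][0] * x + inv[0][1] * y) +
            y * (inv[1][0] * x + inv[1][1] * y) for x, y in d]

background = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 1.0)]
scene = background + [(5.0, 5.0)]          # last pixel is the anomaly
scores = rx_scores(scene)
print(scores.index(max(scores)))  # 4: the anomaly scores highest
```

The local versions recompute these statistics inside a sliding window around every pixel, which is why their cost grows so sharply and why the paper's matrix-property and library optimizations matter.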


Parallel Computing | 2012

Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach

Francisco Vázquez; José Jesús Fernández; Ester M. Garzón

A wide range of applications in engineering and scientific computing require the acceleration of the sparse matrix vector product (SpMV). Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors, and SpMV implementations for GPUs have already appeared on the scene. This work focuses on the ELLR-T algorithm to compute SpMV on GPU architectures, whose performance strongly depends on the optimum selection of two parameters. Therefore, taking into account that memory operations dominate the performance of ELLR-T, an analytical model is proposed to auto-tune ELLR-T for particular combinations of sparse matrix and GPU architecture. The evaluation results with a representative set of test matrices show that the average performance achieved by ELLR-T auto-tuned by means of the proposed model is close to the optimum. A comparative analysis of ELLR-T against a variety of previous proposals shows that ELLR-T with the estimated configuration reaches the best performance on GPU architectures for the representative set of test matrices.
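The auto-tuning idea can be illustrated with a deliberately simplified cost model (an assumption for illustration only, not the analytical model from the paper): given the row-length distribution of the matrix, predict the cost of each candidate value of the threads-per-row parameter T and pick the cheapest.

```python
import math

def predict_cost(row_lengths, T):
    """Toy model: with T threads cooperating per row, each row costs
    ceil(rl / T) memory iterations plus a log2(T) reduction overhead.
    (Hypothetical stand-in for the paper's memory-dominated model.)"""
    return sum(math.ceil(rl / T) + math.log2(T) for rl in row_lengths)

def autotune_T(row_lengths, candidates=(1, 2, 4, 8, 16)):
    """Pick the T minimizing the predicted cost for this matrix."""
    return min(candidates, key=lambda T: predict_cost(row_lengths, T))

short_rows = [3] * 1000    # few nonzeros per row: cooperation is waste
long_rows = [200] * 1000   # long rows: cooperating threads pay off
print(autotune_T(short_rows), autotune_T(long_rows))  # 1 16
```

Even this crude model captures the trade-off the abstract refers to: the best configuration depends jointly on the sparsity pattern and on how many threads share a row, so it must be chosen per matrix rather than fixed.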


Ultramicroscopy | 2012

Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction.

J.I. Agulleiro; Francisco Vázquez; Ester M. Garzón; José-Jesús Fernández

Modern computers are equipped with powerful computing engines like multicore processors and GPUs. The 3DEM community has rapidly adapted to this scenario and many software packages now make use of high performance computing techniques to exploit these devices. However, the implementations thus far are purely focused on either GPUs or CPUs. This work presents a hybrid approach that collaboratively combines the GPUs and CPUs available in a computer and applies it to the problem of tomographic reconstruction. Proper orchestration of workload in such a heterogeneous system is an issue. Here we use an on-demand strategy whereby the computing devices request a new piece of work to do when idle. Our hybrid approach thus takes advantage of the whole computing power available in modern computers and further reduces the processing time. This CPU+GPU co-processing can be readily extended to other image processing tasks in 3DEM.
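The on-demand strategy described above can be sketched with a shared work queue: each device (here mimicked by plain threads, a simplification for illustration) pulls the next slab as soon as it goes idle, so faster devices naturally take more work.

```python
import queue
import threading

def device_worker(name, jobs, results, lock):
    """One computing device: pull a new slab from the shared queue
    whenever idle; the dictionary update is a stand-in for actually
    reconstructing that slab of the volume."""
    while True:
        try:
            slab = jobs.get_nowait()
        except queue.Empty:
            return  # no work left: this device is done
        with lock:
            results[slab] = name  # record which device handled the slab

jobs = queue.Queue()
for slab in range(20):
    jobs.put(slab)

results = {}
lock = threading.Lock()
# Two stand-in devices: a "gpu" worker and a "cpu" worker.
threads = [threading.Thread(target=device_worker,
                            args=(n, jobs, results, lock))
           for n in ("gpu", "cpu")]
for t in threads: t.start()
for t in threads: t.join()
print(len(results))  # 20: every slab processed exactly once
```

No static partitioning of slabs is needed: load balance emerges from the pull model, which is what lets the heterogeneous CPU+GPU system stay fully busy.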


Journal of Structural Biology | 2010

Vectorization with SIMD extensions speeds up reconstruction in electron tomography

J.I. Agulleiro; Ester M. Garzón; José-Jesús Fernández

Electron tomography allows structural studies of cellular structures at molecular detail. Large 3D reconstructions are needed to meet the resolution requirements. The processing time to compute these large volumes may be considerable and so, traditionally, high performance computing techniques have been used. This work presents a vector approach to tomographic reconstruction that relies on the exploitation of the SIMD extensions available in modern processors in combination with other single-processor optimization techniques. This approach succeeds in producing full resolution tomograms with an important reduction in processing time, as evaluated with the most common reconstruction algorithms, namely WBP and SIRT. The main advantage stems from the fact that this approach runs on standard computers without the need for specialized hardware, which facilitates the development, use and management of programs. Future trends in processor design open excellent opportunities for vector processing with the SIMD extensions of processors in the field of 3D electron microscopy.
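The vectorization pattern can be mimicked in plain code: restructure the inner loop into fixed-width strips so one "instruction" updates several lanes at once (in real implementations this is done with SIMD intrinsics or compiler auto-vectorization in C, not in Python; this is only a shape-of-the-loop sketch).

```python
def scale_scalar(vol, w):
    """Reference scalar loop: one voxel per iteration."""
    return [v * w for v in vol]

def scale_4wide(vol, w):
    """Same computation restructured into 4-wide strips, mirroring
    how SIMD extensions apply one instruction to four packed floats,
    plus a scalar remainder loop for the leftover voxels."""
    out = [0.0] * len(vol)
    n4 = len(vol) - len(vol) % 4
    for i in range(0, n4, 4):       # one "instruction" per 4 lanes
        out[i], out[i+1], out[i+2], out[i+3] = (
            vol[i] * w, vol[i+1] * w, vol[i+2] * w, vol[i+3] * w)
    for i in range(n4, len(vol)):   # scalar remainder
        out[i] = vol[i] * w
    return out

vol = [float(i) for i in range(10)]
print(scale_4wide(vol, 2.0) == scale_scalar(vol, 2.0))  # True
```

In the reconstruction loops of WBP and SIRT, the lanes would be contiguous voxels along a row of the tomogram, which is what makes the data layout SIMD-friendly.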


Journal of Applied Remote Sensing | 2012

Anomaly detection based on a parallel kernel RX algorithm for multicore platforms

Jose M. Molero; Ester M. Garzón; Inmaculada García; Antonio Plaza

Anomaly detection is an important task for hyperspectral data exploitation. A standard approach for anomaly detection in the literature is the method developed by Reed and Yu, also called the RX algorithm. It implements the Mahalanobis distance, which has been widely used in hyperspectral imaging applications. A variation of this algorithm, known as kernel RX (KRX), consists of applying the same concept to a sliding window centered around each image pixel. KRX is computationally very expensive because, for every image pixel, a covariance matrix and its inverse have to be calculated. We develop an efficient implementation of the kernel RX algorithm. Our proposed approach makes use of linear algebra libraries and further develops a parallel implementation optimized for multi-core platforms, which are a well-known, inexpensive and widely available high performance computing technology. Experimental results for two hyperspectral data sets are provided. The first one was collected by NASA's airborne visible infra-red imaging spectrometer (AVIRIS) system over the World Trade Center (WTC) in New York, five days after the terrorist attacks, and the second one was collected by the hyperspectral digital image collection experiment (HYDICE). Our anomaly detection accuracy, evaluated using receiver operating characteristics (ROC) curves, indicates that KRX can significantly outperform the classic RX while achieving close to linear speedup in state-of-the-art multi-core platforms.


IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2014

Efficient Implementation of Hyperspectral Anomaly Detection Techniques on GPUs and Multicore Processors

Jose M. Molero; Ester M. Garzón; Inmaculada García; Enrique S. Quintana-Ortí; Antonio Plaza

Anomaly detection is an important task for hyperspectral data exploitation. Although many algorithms have been developed for this purpose in recent years, due to the large dimensionality of hyperspectral image data, fast anomaly detection remains a challenging task. In this work, we exploit the computational power of commodity graphics processing units (GPUs) and multicore processors to obtain implementations of a well-known anomaly detection algorithm developed by Reed and Xiaoli (RX algorithm), and a local (LRX) variant, which basically consists of applying the same concept to a local sliding window centered around each image pixel. LRX has been shown to be more accurate in detecting small anomalies but computationally more expensive than RX. Our interest is focused on improving the computational aspects, not only through efficient parallel implementations, but also by analyzing the mathematical issues of the method and adopting computationally inexpensive solvers. Furthermore, we also assess the energy consumption of the newly developed parallel implementations, which is very important in practice. Our optimizations (based on software and hardware techniques) result in a significant reduction of execution time and energy consumption, which is key to increasing the practical interest of the considered algorithms. Indeed, for RX, the runtime obtained is less than the data acquisition time when real hyperspectral images are used. Our experimental results also indicate that the proposed optimizations and the parallelization techniques can significantly improve the general performance of the RX and LRX algorithms while retaining their anomaly detection accuracy.


The Computer Journal | 2011

Matrix Implementation of Simultaneous Iterative Reconstruction Technique (SIRT) on GPUs

Francisco Vázquez; Ester M. Garzón; José Jesús Fernández

Electron tomography (ET) is an important technique in biosciences that is providing new insights into the cellular ultrastructure. Iterative reconstruction methods have been shown to be robust against the noise and limited-tilt range conditions present in ET. Nevertheless, these methods are not extensively used due to their computational demands. Instead, the simpler method, weighted backprojection (WBP), remains prevalent. Recently, we have demonstrated that a matrix approach to WBP allows a significant reduction in processing time both on central processing units and on graphics processing units (GPUs). In this work, we extend that matrix approach to one of the most common iterative methods in ET, the simultaneous iterative reconstruction technique (SIRT). We show that it is possible to implement this method directly on GPUs using sparse algebra. We also analyse this approach on different GPU platforms and confirm that these implementations exhibit high performance. This may thus help promote the widespread use of SIRT.
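The sparse-algebra view of SIRT can be sketched on a toy system: each iteration is one forward product and one transposed product with the same constant sparse matrix, which is exactly the SpMV pattern that maps well onto GPUs. The relaxation with inverse row/column sums below is the standard textbook SIRT update, shown here only as an illustration of the structure.

```python
def spmv(trip, nrays, x):
    """Forward projection y = A x, A as (ray, voxel, weight) triplets."""
    y = [0.0] * nrays
    for i, j, w in trip:
        y[i] += w * x[j]
    return y

def spmv_t(trip, nvox, y):
    """Backprojection x = A^T y, reusing the same triplets."""
    x = [0.0] * nvox
    for i, j, w in trip:
        x[j] += w * y[i]
    return x

def sirt(trip, nrays, nvox, p, iters=200):
    """Toy SIRT: x <- x + C A^T R (p - A x), with R and C the inverse
    row and column sums of A. Both products per iteration reuse the
    same constant sparse matrix."""
    rsum = [0.0] * nrays
    csum = [0.0] * nvox
    for i, j, w in trip:
        rsum[i] += w
        csum[j] += w
    x = [0.0] * nvox
    for _ in range(iters):
        r = spmv(trip, nrays, x)
        resid = [(p[i] - r[i]) / rsum[i] for i in range(nrays)]
        back = spmv_t(trip, nvox, resid)
        x = [x[j] + back[j] / csum[j] for j in range(nvox)]
    return x

# Toy system with known solution [1, 2]: rays measure x0+x1 and x1.
A = [(0, 0, 1.0), (0, 1, 1.0), (1, 1, 1.0)]
x = sirt(A, 2, 2, p=[3.0, 2.0])
print([round(v, 3) for v in x])  # converges to [1.0, 2.0]
```

Because both the forward and transposed products visit the same triplets, a GPU implementation only needs the two SpMV kernels plus element-wise scalings, which is the point of the matrix formulation.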

Collaboration

Explore Ester M. Garzón's collaborations.

Top co-authors:

José-Jesús Fernández (Spanish National Research Council)
Antonio Plaza (University of Extremadura)