Publication


Featured research published by David M. Fernández.


IEEE Transactions on Magnetics | 2010

Finite-Element Sparse Matrix Vector Multiplication on Graphic Processing Units

Maryam Mehri Dehnavi; David M. Fernández; Dennis D. Giannacopoulos

A wide class of finite-element (FE) electromagnetic applications requires computing very large sparse matrix-vector multiplications (SMVM). Due to the sparsity pattern and size of the matrices, solvers can run relatively slowly. The rapid evolution of graphic processing units (GPUs) in performance, architecture, and programmability makes them very attractive platforms for accelerating computationally intensive kernels such as SMVM. This work presents a new algorithm to accelerate the performance of the SMVM kernel on graphic processing units.
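The paper's GPU kernel itself is not reproduced here, but the baseline operation it accelerates — SMVM over a matrix in compressed sparse row (CSR) form — can be sketched in a few lines of Python (a minimal reference implementation, not the GPU algorithm):

```python
def csr_spmv(vals, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A stored in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # nonzeros of row i live in vals[row_ptr[i]:row_ptr[i+1]]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y

# A = [[4, 1, 0],
#      [1, 3, 0],
#      [0, 0, 2]]
vals    = [4.0, 1.0, 1.0, 3.0, 2.0]
col_idx = [0, 1, 0, 1, 2]
row_ptr = [0, 2, 4, 5]
print(csr_spmv(vals, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [6.0, 7.0, 6.0]
```

On a GPU the outer loop is what gets parallelized, typically one or more threads per row, which is where the matrix's sparsity pattern starts to matter.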


IEEE Conference on Electromagnetic Field Computation | 2010

Enhancing the performance of conjugate gradient solvers on graphic processing units

Maryam Mehri Dehnavi; David M. Fernández; Dennis D. Giannacopoulos

A study of the fundamental obstacles to accelerating the preconditioned conjugate gradient (PCG) method on modern graphic processing units (GPUs) is presented, and several techniques are proposed to enhance its performance over previous work, independent of the GPU generation and the matrix sparsity pattern. The proposed enhancements increase the performance of PCG up to 23 times compared to vector-optimized PCG results on modern CPUs and up to 3.4 times compared to previous GPU results.
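For reference, the PCG iteration being accelerated can be sketched in pure Python; the sketch below uses a simple Jacobi (diagonal) preconditioner as an illustrative choice, not necessarily the preconditioner studied in the paper:

```python
def pcg(A, b, M_inv, tol=1e-10, max_iter=100):
    """Preconditioned CG for a dense SPD matrix A (list of lists).
    M_inv holds the inverse diagonal (Jacobi) preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # residual, since x = 0
    z = [M_inv[i] * r[i] for i in range(n)]    # preconditioned residual
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [M_inv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
x = pcg(A, [1.0, 2.0], M_inv=[0.25, 1.0 / 3.0])  # exact solution: [1/11, 7/11]
```

The dominant cost per iteration is the matrix-vector product `Ap`, which is why SMVM performance drives PCG performance on any platform.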


Computer Physics Communications | 2008

FPGA architecture and implementation of sparse matrix–vector multiplication for the finite element method

Yousef El-Kurdi; David M. Fernández; Evgueni Souleimanov; Dennis D. Giannacopoulos; Warren J. Gross

The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool with diverse applications ranging from structural engineering to electromagnetic simulation. The trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs); hence, interest has grown in the scientific community to exploit this technology. We present an architecture and implementation of an FPGA-based sparse matrix–vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits this structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGA's computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme that enables it to process arbitrarily large matrices without changing the number of PEs. Therefore, the architecture is limited only by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz, obtaining a peak performance of 1.76 GFLOPS. For the 8 GB/s of memory bandwidth typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved. Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding the initialization time due to data loading and setup inside the FPGA's internal memory.
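The quoted peak figure follows directly from the pipeline parameters: each PE performs one multiply and one add per cycle, so 8 PEs at 110 MHz yield 1.76 GFLOPS. A quick sanity check:

```python
pes = 8                  # processing elements in the pipeline
clock_hz = 110e6         # 110 MHz clock
flops_per_cycle = 2      # each PE does one multiply and one add per cycle

peak_gflops = pes * clock_hz * flops_per_cycle / 1e9
print(peak_gflops)  # 1.76
```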


IEEE Transactions on Parallel and Distributed Systems | 2013

Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units

Maryam Mehri Dehnavi; David M. Fernández; Jean-Luc Gaudiot; Dennis D. Giannacopoulos

Accelerating numerical algorithms for solving sparse linear systems on parallel architectures has attracted the attention of many researchers due to their applicability to many engineering and scientific problems. The solution of sparse systems often dominates the overall execution time of such problems and is mainly solved by iterative methods. Preconditioners are used to accelerate the convergence rate of these solvers and reduce the total execution time. Sparse approximate inverse (SAI) preconditioners are a popular class of preconditioners designed to improve the condition number of large sparse matrices. We propose a GPU accelerated SAI preconditioning technique called GSAI, which parallelizes the computation of this preconditioner on NVIDIA graphic cards. The preconditioner is then used to enhance the convergence rate of the BiConjugate Gradient Stabilized (BiCGStab) iterative solver on the GPU. The SAI preconditioner is generated on average 28 and 23 times faster on the NVIDIA GTX480 and TESLA M2070 graphic cards, respectively, compared to ParaSails (a popular implementation of SAI preconditioners on CPU) single processor/core results. The proposed GSAI technique computes the SAI preconditioner in approximately the same time as ParaSails generates the same preconditioner on 16 AMD Opteron 252 processors.
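GSAI itself handles general sparsity patterns on the GPU; the least-squares principle behind SAI preconditioners — for each column m_j of M, minimize ||A m_j - e_j||_2 over a prescribed pattern — can be illustrated with the simplest possible pattern (diagonal only), for which the minimization has a closed form. This is a didactic sketch, not the GSAI algorithm:

```python
def spai_diagonal(A):
    """Sparse approximate inverse with a diagonal-only pattern: for each
    column j, minimize ||A m_j - e_j||_2 over m_j = m * e_j.  The optimal
    scalar is m = A[j][j] / sum_i A[i][j]**2 (a 1-by-1 least-squares solve)."""
    n = len(A)
    return [A[j][j] / sum(A[i][j] ** 2 for i in range(n)) for j in range(n)]

m = spai_diagonal([[4.0, 1.0], [1.0, 3.0]])  # [4/17, 3/10]
```

Richer patterns (e.g. the pattern of A or of A²) lead to one small independent least-squares problem per column, and that per-column independence is exactly what makes SAI construction attractive to parallelize on a GPU.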


IEEE Transactions on Magnetics | 2012

Alternate Parallel Processing Approach for FEM

David M. Fernández; Maryam Mehri Dehnavi; Warren J. Gross; Dennis D. Giannacopoulos

In this work we present a new alternate way to formulate the finite element method (FEM) for parallel processing, based on the solution of single mesh elements, called FEM-SES. The key idea is to decouple the solution of a single element from that of the whole mesh, thus exposing parallelism at the element level. Individual element solutions are then superimposed node-wise using a weighted sum over concurrent nodes. A classic 2-D electrostatic problem is used to validate the proposed method, obtaining accurate results. Results show that the number of iterations of the proposed FEM-SES method scales sublinearly with the number of unknowns. Two generations of CUDA-enabled NVIDIA GPUs were used to implement the FEM-SES method, and the execution times were compared to the classic FEM, showing important performance benefits.
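The element-level solves in FEM-SES are problem-specific, but the node-wise weighted superposition step described above can be sketched generically (the uniform weights here are placeholders; the actual weights come from the FE formulation):

```python
def superimpose(element_solutions, weights):
    """Node-wise weighted sum of independently computed element solutions.
    element_solutions: {elem: {node: value}}; weights: {elem: {node: weight}}.
    Nodes shared by several elements get a weighted average of their values."""
    num, den = {}, {}
    for elem, sol in element_solutions.items():
        for node, value in sol.items():
            w = weights[elem][node]
            num[node] = num.get(node, 0.0) + w * value
            den[node] = den.get(node, 0.0) + w
    return {node: num[node] / den[node] for node in num}

# two elements sharing node 1; uniform (placeholder) weights
combined = superimpose(
    {"e0": {0: 1.0, 1: 2.0}, "e1": {1: 4.0, 2: 3.0}},
    {"e0": {0: 1.0, 1: 1.0}, "e1": {1: 1.0, 2: 1.0}},
)
```

Because each element solution is computed independently, the per-element stage maps one element per GPU thread (or thread block); only this reduction step touches shared nodes.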


International Workshop on Computer Architecture for Machine Perception | 2007

Evaluation of a High-Level-Language Methodology for High-Performance Reconfigurable Computers

Jahyun J. Koo; David M. Fernández; Ashraf Haddad; Warren J. Gross

High-performance reconfigurable computers (HPRCs) consisting of CPUs with application-specific FPGA accelerators traditionally use a low-level hardware-description language such as VHDL or Verilog to program the FPGAs. The complexity of hardware design methodologies for FPGAs requires specialist engineering knowledge and presents a significant barrier to entry for scientific users with only a software background. Recently, a number of High-Level Languages (HLLs) for programming FPGAs have emerged that aim to lower this barrier and abstract away hardware-dependent details. This paper presents the results of a study on implementing hardware accelerators using the Mitrion-C HLL. The implementations of two floating-point scientific kernels are described: dense matrix-vector multiplication (DMVM) and the computation of spherical boundary conditions in molecular dynamics (SB). We describe optimizations that are essential for taking advantage of both the features of the HLL and the underlying HPRC hardware and libraries. Scaling of the algorithms to multiple FPGAs is also investigated. With four FPGAs, an 80-times speedup over an Itanium 2 CPU was achieved for DMVM, while a 26-times speedup was achieved for SB.
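The DMVM kernel ported to the FPGA is conceptually just a dense matrix-vector product; as a point of reference (written in Python here rather than Mitrion-C):

```python
def dmvm(A, x):
    """Dense matrix-vector product y = A x, the DMVM kernel in plain form."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

y = dmvm([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])  # [3.0, 7.0]
```

On an HPRC, each row's dot product becomes a deeply pipelined multiply-accumulate stream, which is the structure an HLL like Mitrion-C is meant to express without hand-written HDL.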


IEEE Transactions on Magnetics | 2010

Multicore Acceleration of CG Algorithms Using Blocked-Pipeline-Matching Techniques

David M. Fernández; Dennis D. Giannacopoulos; Warren J. Gross

To realize the acceleration potential of multicore computing environments, computational electromagnetics researchers must address parallel programming paradigms early in application development. We present a new blocked-pipeline-matched sparse representation and show speedup results for the conjugate gradient method by parallelizing the sparse matrix-vector multiplication kernel on multicore systems for a set of finite element matrices, demonstrating the potential of this approach. Performance of up to 8.2 GFLOPS was obtained for the proposed vectorized format using four Intel cores, 17× more than the nonvectorized version.
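The exact blocked-pipeline-matched layout is not detailed in the abstract; a block-CSR (BCSR) SpMV with 2×2 blocks illustrates the general blocking idea that makes vectorization possible (a hypothetical stand-in for the paper's format):

```python
def bcsr_spmv(blocks, col_idx, row_ptr, x, bs=2):
    """y = A @ x with A stored as dense bs-by-bs blocks (row-major) in
    block-CSR order; col_idx/row_ptr index block columns and block rows."""
    nbr = len(row_ptr) - 1          # number of block rows
    y = [0.0] * (nbr * bs)
    for bi in range(nbr):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            blk = blocks[k]
            for r in range(bs):      # this dense inner loop is what vectorizes
                acc = 0.0
                for c in range(bs):
                    acc += blk[r * bs + c] * x[bj * bs + c]
                y[bi * bs + r] += acc
    return y

# A = [[4, 1, 0, 0], [1, 3, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
blocks = [[4.0, 1.0, 1.0, 3.0], [2.0, 0.0, 0.0, 2.0]]
y = bcsr_spmv(blocks, col_idx=[0, 1], row_ptr=[0, 1, 2], x=[1.0, 2.0, 3.0, 4.0])
```

Storing small dense blocks trades a few stored zeros for regular, SIMD-friendly inner loops and one column index per block instead of per nonzero — a good match for FE matrices, whose nonzeros naturally cluster.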


IEEE Conference on Electromagnetic Field Computation | 2009

Efficient Multicore Sparse Matrix-Vector Multiplication for FE Electromagnetics

David M. Fernández; Dennis D. Giannacopoulos; Warren J. Gross

Multicore systems are rapidly becoming a dominant industry trend for accelerating electromagnetics computations, driving researchers to address parallel programming paradigms early in application development. We present a new sparse representation and a two level partitioning scheme for efficient sparse matrix-vector multiplication on multicore systems, and show results for a set of finite element matrices that demonstrate its potential.
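The first level of such a partitioning — splitting contiguous row blocks across cores — can be sketched as follows (an illustrative sketch using Python threads, not the paper's implementation; CPython's GIL means this shows the structure rather than real speedup):

```python
from concurrent.futures import ThreadPoolExecutor

def spmv_rows(vals, col_idx, row_ptr, x, rows):
    """CSR SpMV restricted to a subset of rows."""
    return [sum(vals[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in rows]

def parallel_spmv(vals, col_idx, row_ptr, x, nthreads=2):
    """Split the rows into contiguous chunks, one chunk per worker;
    rows never overlap, so each worker writes a disjoint slice of y."""
    n = len(row_ptr) - 1
    size = (n + nthreads - 1) // nthreads
    chunks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=nthreads) as ex:
        parts = ex.map(lambda rows: spmv_rows(vals, col_idx, row_ptr, x, rows),
                       chunks)
    return [y for part in parts for y in part]

# A = [[4, 1, 0], [1, 3, 0], [0, 0, 2]]
vals, col_idx, row_ptr = [4.0, 1.0, 1.0, 3.0, 2.0], [0, 1, 0, 1, 2], [0, 2, 4, 5]
y = parallel_spmv(vals, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

A second level of partitioning would further split each chunk's nonzeros for cache and vector-unit efficiency, which is where a specialized sparse representation earns its keep.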


IEEE Transactions on Magnetics | 2016

Acceleration of the Finite-Element Gaussian Belief Propagation Solver Using Minimum Residual Techniques

Yousef El-Kurdi; David M. Fernández; Warren J. Gross; Dennis D. Giannacopoulos

The finite-element Gaussian belief propagation (FGaBP) method, introduced recently, provides a powerful alternative to the conventional finite-element method solvers to efficiently utilize high-performance computing platforms. In this paper, we accelerate the FGaBP convergence by combining it with two methods based on residual minimization techniques, namely, the flexible generalized minimum residual and the iterant recombination method. The numerical results show considerable reductions in the total number of operations compared with the stand-alone FGaBP method, while maintaining the scalability features of FGaBP.
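FGaBP operates on an element-level factor graph; the underlying Gaussian belief propagation update can be illustrated on a small assembled symmetric, diagonally dominant system (a plain GaBP sketch, not the FGaBP formulation):

```python
def gabp(A, b, sweeps=20):
    """Gaussian belief propagation for A x = b.
    Assumes A symmetric and diagonally dominant (sufficient for convergence).
    Each directed edge (i, j) carries a precision message P and a
    precision*mean product Pm."""
    n = len(b)
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and A[i][j] != 0.0]
    P = {e: 0.0 for e in edges}
    Pm = {e: 0.0 for e in edges}
    for _ in range(sweeps):
        for i, j in edges:
            # node i's aggregated belief, excluding what j told it
            p_excl = A[i][i] + sum(P[(k, t)] for k, t in edges
                                   if t == i and k != j)
            b_excl = b[i] + sum(Pm[(k, t)] for k, t in edges
                                if t == i and k != j)
            P[(i, j)] = -A[i][j] ** 2 / p_excl
            Pm[(i, j)] = -A[i][j] * b_excl / p_excl
    # marginal means are the solution components
    return [(b[i] + sum(Pm[(k, t)] for k, t in edges if t == i)) /
            (A[i][i] + sum(P[(k, t)] for k, t in edges if t == i))
            for i in range(n)]

x = gabp([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])  # converges to [1/11, 7/11]
```

Because each message depends only on a node's local neighborhood, the sweeps parallelize naturally; the residual-minimization wrappers in the paper accelerate how fast these fixed-point iterations converge.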


IEEE Transactions on Magnetics | 2017

Solving Finite-Element Time-Domain Problems With GaBP

David M. Fernández; Ali Akbarzadeh-Sharbaf; Dennis D. Giannacopoulos

In this paper, a new finite element Gaussian belief propagation (FGaBP) method is presented for time-domain applications. The unconditionally stable Newmark time-stepping scheme is combined with FGaBP for this purpose. As shown empirically, the method converges for increasing time step sizes without losing stability. The combined FGaBP time stepping retains the parallel scalability of FGaBP demonstrated in previous work. In addition, this paper also shows that lossy material properties can be easily supported by the method with minimal changes to its formulation.
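The Newmark scheme referenced above can be sketched for a single degree of freedom m·u″ + k·u = f(t); with β = 1/4 and γ = 1/2 it is the unconditionally stable average-acceleration variant (an illustrative scalar sketch, not the paper's FGaBP-coupled formulation):

```python
def newmark(m, k, f, u0, v0, dt, steps, beta=0.25, gamma=0.5):
    """Newmark time stepping for the single-DOF equation m*u'' + k*u = f(t).
    beta = 1/4, gamma = 1/2 is the unconditionally stable
    average-acceleration rule."""
    u, v = u0, v0
    a = (f(0.0) - k * u) / m                   # consistent initial acceleration
    history = [u]
    for n in range(1, steps + 1):
        u_pred = u + dt * v + dt * dt * (0.5 - beta) * a
        v_pred = v + dt * (1.0 - gamma) * a
        # solve the (scalar) effective system for the new acceleration
        a_new = (f(n * dt) - k * u_pred) / (m + beta * dt * dt * k)
        u = u_pred + beta * dt * dt * a_new
        v = v_pred + gamma * dt * a_new
        a = a_new
        history.append(u)
    return history

# free oscillation of m = k = 1 from u(0) = 1: u(t) should track cos(t)
u = newmark(1.0, 1.0, lambda t: 0.0, 1.0, 0.0, dt=0.01, steps=628)
```

In the FE time-domain setting the scalar effective stiffness `m + beta*dt*dt*k` becomes a linear system solved at every step, and that repeated solve is what the FGaBP solver replaces.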
