
Publications


Featured research published by Matthew G. Knepley.


Computer Physics Communications | 2011

Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns

Rio Yokota; Jaydeep P. Bardhan; Matthew G. Knepley; Lorena A. Barba; Tsuyoshi Hamada

We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to BEM. The hardware acceleration is achieved through graphics processing units (GPUs). We demonstrate the power of our algorithms and software for the calculation of the electrostatic interactions between biological molecules in solution. The applications demonstrated include the electrostatics of protein–drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of lysozyme molecules. The parallel scalability of the software was studied on a cluster at the Nagasaki Advanced Computing Center, using 128 nodes, each with 4 GPUs. Careful tuning has resulted in strong scaling with parallel efficiency of 0.8 for 256 GPUs and 0.5 for 512 GPUs. The largest application run, with over 20 million atoms and one billion unknowns, required only one minute on 512 GPUs. We are currently adapting our BEM software to solve the linearized Poisson–Boltzmann equation for dilute ionic solutions, and it is also designed to be flexible enough to be extended to a variety of integral-equation problems, ranging from Poisson problems to Helmholtz problems in electromagnetics and acoustics to high Reynolds number flow.
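
For readers unfamiliar with the structure of the problem, the sketch below is a minimal NumPy illustration (not the authors' code) of the dense pairwise potential evaluation that the FMM replaces: the direct sum costs O(N^2), while the hierarchical FMM evaluation brings it to O(N). The point count, charge values, and the bare 1/r kernel are illustrative assumptions.

```python
# Minimal NumPy sketch (not the paper's code): the dense O(N^2) potential
# evaluation that a fast multipole method replaces with an O(N) hierarchical
# algorithm. Charge positions, values, and the plain 1/r kernel are all
# illustrative assumptions.
import numpy as np

def direct_potential(points, charges):
    """Coulomb-like potential at every source point, brute force O(N^2)."""
    diff = points[:, None, :] - points[None, :, :]   # (N, N, 3) pairwise offsets
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)                      # drop self-interaction
    return (charges[None, :] / r).sum(axis=1)

rng = np.random.default_rng(0)
pts = rng.uniform(size=(1000, 3))
q = rng.uniform(-1.0, 1.0, size=1000)
phi = direct_potential(pts, q)                       # the FMM approximates this sum
```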


SIAM Journal on Scientific Computing | 2005

Optimizing the Evaluation of Finite Element Matrices

Robert C. Kirby; Matthew G. Knepley; Anders Logg; L. Ridgway Scott

Assembling stiffness matrices represents a significant cost in many finite element computations. We address the question of optimizing the evaluation of these matrices. By finding redundant computations, we are able to significantly reduce the cost of building local stiffness matrices for the Laplace operator and for the trilinear form for Navier–Stokes operators. For the Laplace operator in two space dimensions, we have developed a heuristic graph algorithm that searches for such redundancies and generates code for computing the local stiffness matrices. Up to cubics, we are able to build the stiffness matrix on any triangle in less than one multiply-add pair per entry. Up to sixth degree, we can do it in less than about two pairs. Preliminary low-degree results for Poisson and Navier–Stokes operators in three dimensions are also promising.
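
As a point of reference for what is being optimized, the following NumPy sketch (not the paper's generated code) evaluates the local P1 stiffness matrix for the Laplacian on a single triangle in the standard way, from the constant gradients of the linear basis functions; the paper's graph algorithm searches for redundancies in exactly this kind of computation for higher-degree elements.

```python
# Minimal NumPy sketch (not the generated code from the paper): the standard
# evaluation of the local P1 stiffness matrix for the Laplacian on a triangle,
# i.e. the computation whose flop count the paper's optimizer reduces.
import numpy as np

def p1_laplace_local_stiffness(v0, v1, v2):
    """3x3 local stiffness matrix for linear elements on the triangle (v0, v1, v2)."""
    J = np.column_stack((v1 - v0, v2 - v0))     # affine map from the reference triangle
    area = 0.5 * abs(np.linalg.det(J))
    # Gradients of the reference basis functions 1-x-y, x, y.
    grad_ref = np.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
    grad_phys = grad_ref @ np.linalg.inv(J)      # constant physical gradients
    return area * grad_phys @ grad_phys.T

K = p1_laplace_local_stiffness(np.array([0.0, 0.0]),
                               np.array([1.0, 0.0]),
                               np.array([0.0, 1.0]))
```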


International Journal for Numerical Methods in Engineering | 2011

PetFMM—A dynamically load‐balancing parallel fast multipole library

Felipe A. Cruz; Matthew G. Knepley; Lorena A. Barba

Fast algorithms for the computation of N-body problems can be broadly classified into mesh-based interpolation methods, and hierarchical or multiresolution methods. To this latter class belongs the well-known fast multipole method (FMM), which offers O(N) complexity. The FMM is a complex algorithm, and the programming difficulty associated with it has arguably diminished its impact, being a barrier to adoption. This paper presents an extensible parallel library for N-body interactions utilizing the FMM algorithm. A prominent feature of this library is that it is designed to be extensible, with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based N-body algorithms in parallel, including both work estimates and communication estimates. With this model, we are able to implement a method to provide automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communication. Using a client application that performs the calculation of velocity induced by N vortex particles in two dimensions, ample verification and testing of the library was performed. Strong scaling results are presented with 10 million particles on up to 256 processors, including both speedup and parallel efficiency. The largest problem size that has been run with the PetFMM library at this point was 64 million particles on 64 processors. The library is currently able to achieve over 85% parallel efficiency for 64 processes. The performance study, computational model, and application demonstrations presented in this paper are limited to 2D. However, the software architecture was designed to make an extension of this work to 3D straightforward, as the framework is templated over the dimension. The software library is open source under the PETSc license, which is even less restrictive than the BSD license; this guarantees the maximum impact to the scientific community and encourages peer-based collaboration for extensions and applications.
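
The a priori load-balancing idea can be illustrated with a toy cost model: assign each leaf cell an estimated work (near-field pairs plus a fixed translation cost) and distribute cells so that the estimates are as even as possible. The sketch below is a hypothetical greedy heuristic in Python, not PetFMM's actual partitioning scheme; the cost constants and cell data are made up.

```python
# Minimal sketch (hypothetical cost model, not PetFMM's): a priori load balancing
# of tree cells by estimated work. Each leaf cell is assigned a cost proportional
# to its near-field pair count plus a fixed per-cell far-field translation cost,
# and cells are dealt greedily to the least-loaded process.
import heapq

def balance_cells(cell_particle_counts, num_procs, translation_cost=100.0):
    """Return a list of cell-index lists, one per process, roughly equal in estimated work."""
    costs = [(n * n + translation_cost, i) for i, n in enumerate(cell_particle_counts)]
    heap = [(0.0, p, []) for p in range(num_procs)]   # (accumulated work, rank, cells)
    heapq.heapify(heap)
    for cost, cell in sorted(costs, reverse=True):    # largest cells first
        work, rank, cells = heapq.heappop(heap)
        cells.append(cell)
        heapq.heappush(heap, (work + cost, rank, cells))
    return [cells for _, _, cells in sorted(heap, key=lambda t: t[1])]

partition = balance_cells([120, 80, 75, 60, 40, 40, 10, 5], num_procs=3)
```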


Archive | 2013

Preliminary Implementation of PETSc Using GPUs

Victor Minden; Barry F. Smith; Matthew G. Knepley

PETSc is a scalable solver library for the solution of algebraic equations arising from the discretization of partial differential equations and related problems. PETSc is organized as a class library with classes for vectors, matrices, Krylov methods, preconditioners, nonlinear solvers, and differential equation integrators. A new subclass of the vector class has been introduced that performs its operations on NVIDIA GPU processors. In addition, a new sparse matrix subclass that performs matrix-vector products on the GPU was introduced. The Krylov methods, nonlinear solvers, and integrators in PETSc run unchanged in parallel using these new subclasses. These can be used transparently from existing PETSc application codes in C, C++, Fortran, or Python. The implementation is done with the Thrust and Cusp C++ packages from NVIDIA.
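
A minimal petsc4py sketch of the usage pattern described above, assuming petsc4py is installed: the application builds the matrix, vectors, and Krylov solver exactly as on the CPU, and the GPU vector and matrix implementations are selected at runtime through the options database (the exact option values differ between the Cusp-era back ends used in this work and current PETSc releases).

```python
# Minimal petsc4py sketch, assuming petsc4py is available. The point of the paper
# is that the same solver code runs on CPU or GPU back ends: the Vec/Mat
# implementation is chosen at runtime from the options database, not in the
# application code.
from petsc4py import PETSc

n = 100
A = PETSc.Mat().create()
A.setSizes([n, n])
A.setFromOptions()                 # back end (CPU/GPU type) picked up from options
A.setUp()
for i in range(n):                 # simple 1-D Laplacian stencil as a stand-in problem
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecRight(); b.set(1.0)
x = A.createVecLeft()
ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setFromOptions()               # Krylov method and preconditioner also set from options
ksp.solve(b, x)
```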


SIAM Review | 2015

Composing Scalable Nonlinear Algebraic Solvers

Peter R. Brune; Matthew G. Knepley; Barry F. Smith; Xuemin Tu

Most efficient linear solvers use composable algorithmic components, with the most common model being the combination of a Krylov accelerator and one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton-Krylov methods.
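
The sketch below is an illustrative NumPy example, not the paper's framework: a multiplicative composition of two nonlinear solvers on a toy 2x2 system, in which each outer iteration applies a cheap damped Richardson update and hands the result to a full Newton step. The system, damping factor, and iteration count are arbitrary.

```python
# Minimal NumPy sketch (illustrative, not the paper's framework): multiplicative
# composition of two nonlinear solvers on a toy 2x2 system F(x) = 0, where each
# outer iteration applies a damped nonlinear Richardson step followed by an
# exact Newton step.
import numpy as np

def F(x):
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

def richardson_step(x, damping=0.1):
    return x - damping * F(x)                 # cheap smoothing step

def newton_step(x):
    return x - np.linalg.solve(J(x), F(x))    # full Newton correction

x = np.array([3.0, 1.0])
for _ in range(10):                           # composition: Richardson, then Newton
    x = newton_step(richardson_step(x))
print(x, np.linalg.norm(F(x)))
```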


International Symposium on Parallel and Distributed Computing | 2012

Composable Linear Solvers for Multiphysics

Jed Brown; Matthew G. Knepley; David May; Lois Curfman McInnes; Barry F. Smith

The Portable, Extensible Toolkit for Scientific computing (PETSc), which focuses on the scalable solution of problems based on partial differential equations, now incorporates new components that allow full composability of solvers for multiphysics and multilevel methods. Through strong encapsulation, we achieve arbitrary, dynamic composition of hierarchical methods for coupled problems and allow customization of all components in composite solvers. For example, we support block decompositions with nested multigrid as well as multigrid on the fully coupled system with block-decomposed smoothers. This paper provides an overview of PETSc's new multiphysics capabilities, which have been used in parallel applications including lithosphere dynamics, subduction and mantle convection, ice sheet dynamics, subsurface reactive flow, fusion, mesoscale materials modeling, and power networks.
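
The block-decomposition idea can be seen in a few lines of NumPy (illustrative only, not PETSc code): a two-field system is preconditioned by solving each diagonal block separately, the simplest additive field split, inside a stationary Richardson iteration. The blocks and coupling strength below are toy data.

```python
# Minimal NumPy sketch (illustrative, not PETSc code): the block-decomposition idea
# behind composable multiphysics solvers. A 2x2 block ("two-field") system is
# preconditioned by solving each diagonal block separately inside a stationary
# Richardson iteration. All matrices are toy data.
import numpy as np

rng = np.random.default_rng(1)
n = 20
A00 = np.diag(rng.uniform(1.0, 2.0, n))          # "field 0" block
A11 = np.diag(rng.uniform(2.0, 3.0, n))          # "field 1" block
C = 0.05 * rng.standard_normal((n, n))           # weak coupling between the fields
A = np.block([[A00, C], [C.T, A11]])
b = rng.standard_normal(2 * n)

def apply_block_jacobi(r):
    """Additive field split: solve each diagonal block against its part of r."""
    return np.concatenate([np.linalg.solve(A00, r[:n]),
                           np.linalg.solve(A11, r[n:])])

x = np.zeros(2 * n)
for _ in range(25):                              # preconditioned Richardson iteration
    x += apply_block_jacobi(b - A @ x)
print(np.linalg.norm(b - A @ x))
```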


Scientific Programming | 2009

Mesh algorithms for PDE with Sieve I: Mesh distribution

Matthew G. Knepley; Dmitry Karpeev

We have developed a new programming framework, called Sieve, to support parallel numerical partial differential equation (PDE) algorithms operating over distributed meshes. We have also developed a reference implementation of Sieve in C++ as a library of generic algorithms operating on distributed containers conforming to the Sieve interface. Sieve makes instances of the incidence relation, or arrows, the conceptual first-class objects represented in the containers. Further, generic algorithms acting on this arrow container are systematically used to provide natural geometric operations on the topology and also, through duality, on the data. Finally, coverings and duality are used to encode not only individual meshes, but all types of hierarchies underlying PDE data structures, including multigrid and mesh partitions. In order to demonstrate the usefulness of the framework, we show how the mesh partition data can be represented and manipulated using the same fundamental mechanisms used to represent meshes. We present the complete description of an algorithm to encode a mesh partition and then distribute a mesh, which is independent of the mesh dimension, element shape, or embedding. Moreover, data associated with the mesh can be similarly distributed with exactly the same algorithm. The use of a high level of abstraction within the Sieve leads to several benefits in terms of code reuse, simplicity, and extensibility. We discuss these benefits and compare our approach to other existing mesh libraries.
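
A minimal Python sketch of the covering-relation idea, not the Sieve library itself: arrows from each mesh entity to the entities it covers define the cone, the dual support is obtained by reversing the arrows, and the transitive closure recovers all entities beneath a cell. The two-triangle mesh and entity names are made up.

```python
# Minimal Python sketch (illustrative, not the Sieve library): a covering relation
# stored as arrows from each mesh entity to the entities it covers ("cone"), with
# the dual "support" obtained by reversing the arrows.
cone = {
    "cell0": ["edge0", "edge1", "edge2"],
    "cell1": ["edge2", "edge3", "edge4"],
    "edge0": ["v0", "v1"], "edge1": ["v1", "v2"], "edge2": ["v0", "v2"],
    "edge3": ["v2", "v3"], "edge4": ["v0", "v3"],
}

def support(point):
    """Dual of cone: all entities whose cone contains the given point."""
    return [p for p, covered in cone.items() if point in covered]

def closure(point):
    """Transitive closure of cone: the point plus everything it (recursively) covers."""
    out = [point]
    for q in cone.get(point, []):
        out.extend(closure(q))
    return out

print(support("edge2"))   # ['cell0', 'cell1'] -- the shared edge
print(closure("cell0"))   # cell, its edges, and their vertices (with repeats)
```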


ACM Transactions on Mathematical Software | 2013

Finite Element Integration on GPUs

Matthew G. Knepley; Andy R. Terrel

We present a novel finite element integration method for low-order elements on GPUs. We achieve more than 100 GF/s for element integration on first-order discretizations of both the Laplacian and elasticity operators on an NVIDIA GTX285, which has a nominal single-precision peak flop rate of 1 TF/s and a bandwidth of 159 GB/s, corresponding to a bandwidth-limited peak of 40 GF/s.
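
The bandwidth-limited figure follows from a roofline-style bound, achievable rate = min(peak, arithmetic intensity x bandwidth); the sketch below reproduces the quoted 40 GF/s under an assumed arithmetic intensity of 0.25 flops per byte, which is chosen only to make the arithmetic explicit.

```python
# Roofline-style arithmetic for the figures quoted above. The arithmetic intensity
# value is an assumption chosen so that the numbers reproduce the quoted
# bandwidth-limited peak; the paper's kernel analysis determines the real value.
peak_flops = 1000.0          # GF/s, nominal single-precision peak of the GTX285
bandwidth = 159.0            # GB/s
intensity = 0.25             # flops per byte moved (assumed for illustration)

achievable = min(peak_flops, intensity * bandwidth)
print(f"bandwidth-limited peak ~ {achievable:.0f} GF/s")   # ~ 40 GF/s
```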


Physical Review D | 2000

Search for disoriented chiral condensate at the Fermilab Tevatron

Travis C. Brooks; M. E. Convery; W. L. Davis; K. Del Signore; Thomas L. Jenkins; Erik Kangas; Matthew G. Knepley; K. L. Kowalski; C. Taylor; S. H. Oh; W.D. Walker; Patrick L. Colestock; Barbara E. Hanna; M. Martens; J. Streets; Robin Ball; H.R. Gustafson; L. W. Jones; Michael J. Longo; James D. Bjorken; A. Abashian; Nelson Morgan; Claude A. Pruneau

We present results from MiniMax (Fermilab T-864), a small test/experiment at the Fermilab Tevatron designed to search for the production of a disoriented chiral condensate (DCC) in p-p̄ collisions at √s = 1.8 TeV in the forward direction, approximately 3.4 < η < 4.2. Data, consisting of 1.3×10⁶ events, are analyzed using the robust observables developed in an earlier paper. The results are consistent with generic, binomial-distribution partition of pions into charged and neutral species. Limits on DCC production in various models are presented.
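
For context, the sketch below is an illustrative Monte Carlo comparison (not the experiment's analysis) of the two hypotheses: a generic binomial partition, in which each pion is independently neutral with probability 1/3, versus the idealized DCC prediction that the neutral fraction f follows P(f) = 1/(2√f). Event counts and multiplicities are arbitrary.

```python
# Minimal NumPy sketch (illustrative, not the experiment's analysis): the two
# hypotheses compared above for the neutral-pion fraction f in an event. Under the
# generic hypothesis each pion is independently neutral with probability 1/3
# (binomial partition); under the idealized DCC hypothesis f follows the
# P(f) = 1/(2*sqrt(f)) distribution, sampled here as f = u**2 for uniform u.
import numpy as np

rng = np.random.default_rng(2)
events, pions_per_event = 100_000, 30

binomial_f = rng.binomial(pions_per_event, 1.0 / 3.0, size=events) / pions_per_event
dcc_f = rng.uniform(size=events) ** 2     # inverse-CDF sampling of P(f) = 1/(2*sqrt(f))

print("generic: mean f = %.3f, var = %.4f" % (binomial_f.mean(), binomial_f.var()))
print("DCC:     mean f = %.3f, var = %.4f" % (dcc_f.mean(), dcc_f.var()))
```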


Journal of Chemical Physics | 2011

Mathematical analysis of the boundary-integral based electrostatics estimation approximation for molecular solvation: exact results for spherical inclusions.

Jaydeep P. Bardhan; Matthew G. Knepley

We analyze the mathematically rigorous BIBEE (boundary-integral based electrostatics estimation) approximation of the mixed-dielectric continuum model of molecular electrostatics, using the analytically solvable case of a spherical solute containing an arbitrary charge distribution. Our analysis, which builds on Kirkwood's solution using spherical harmonics, clarifies important aspects of the approximation and its relationship to generalized Born models. First, our results suggest a new perspective for analyzing fast electrostatic models: the separation of variables between material properties (the dielectric constants) and geometry (the solute dielectric boundary and charge distribution). Second, we find that the eigenfunctions of the reaction-potential operator are exactly preserved in the BIBEE model for the sphere, which supports the use of this approximation for analyzing charge-charge interactions in molecular binding. Third, a comparison of BIBEE to the recent GBε theory suggests a modified BIBEE model capable of predicting electrostatic solvation free energies to within 4% of a full numerical Poisson calculation. This modified model leads to a projection-framework understanding of BIBEE and suggests opportunities for future improvements.
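
As a concrete anchor for the spherical case, the sketch below evaluates the l = 0 term of Kirkwood's series, i.e. the Born expression for the solvation free energy of a single centered charge; it is an illustration of the analytically solvable setting, not the paper's derivation, and the radius, charge, and dielectric constants are example values.

```python
# Minimal sketch (illustrative, not the paper's derivation): the l = 0 term of
# Kirkwood's series, i.e. the Born expression for the electrostatic solvation
# free energy of a single charge at the center of a spherical solute. The
# radius, charge, and dielectric constants below are example values.
COULOMB = 332.0636   # kcal*Angstrom/(mol*e^2), Coulomb constant in these units

def born_solvation_energy(q, radius, eps_in=4.0, eps_out=80.0):
    """Reaction-field (solvation) energy of a centered charge q in a sphere."""
    return 0.5 * COULOMB * q * q * (1.0 / eps_out - 1.0 / eps_in) / radius

print(born_solvation_energy(q=1.0, radius=3.0))   # negative: solvation is favorable
```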

Collaboration


Matthew G. Knepley's most frequent co-authors and their affiliations.

Top Co-Authors

Barry F. Smith (Argonne National Laboratory)
Andy R. Terrel (University of Texas at Austin)
Jed Brown (University of Colorado Boulder)
M. E. Convery (Case Western Reserve University)
K. L. Kowalski (Case Western Reserve University)
L. W. Jones (University of Michigan)