Barry F. Smith | Researchain

Publication

Featured researches published by Barry F. Smith.

Modern software tools for scientific computing | 1997

Efficient management of parallelism in object-oriented numerical software libraries

Satish Balay; William Gropp; Lois Curfman McInnes; Barry F. Smith

Parallel numerical software based on the message passing model is enormously complicated. This paper introduces a set of techniques to manage the complexity, while maintaining high efficiency and ease of use. The PETSc 2.0 package uses object-oriented programming to conceal the details of the message passing, without concealing the parallelism, in a high-quality set of numerical software libraries. In fact, the programming model used by PETSc is also the most appropriate for NUMA shared-memory machines, since they require the same careful attention to memory hierarchies as do distributed-memory machines. Thus, the concepts discussed are appropriate for all scalable computing systems. The PETSc libraries provide many of the data structures and numerical kernels required for the scalable solution of PDEs, offering performance portability.

parallel computing | 2001

High-performance parallel implicit CFD

William Gropp; Dinesh K. Kaushik; David E. Keyes; Barry F. Smith

Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.

ieee international conference on high performance computing data and analytics | 2013

Multiphysics simulations: Challenges and opportunities

David E. Keyes; Lois Curfman McInnes; Carol S. Woodward; William Gropp; Eric Myra; Michael Pernice; John B. Bell; Jed Brown; Alain Clo; Jeffrey M. Connors; Emil M. Constantinescu; Donald Estep; Kate Evans; Charbel Farhat; Ammar Hakim; Glenn E. Hammond; Glen A. Hansen; Judith C. Hill; Tobin Isaac; Kirk E. Jordan; Dinesh K. Kaushik; Efthimios Kaxiras; Alice Koniges; Kihwan Lee; Aaron Lott; Qiming Lu; John Harold Magerlein; Reed M. Maxwell; Michael McCourt; Miriam Mehl

We consider multiphysics applications from algorithmic and architectural perspectives, where “algorithmic” includes both mathematical analysis and computational complexity, and “architectural” includes both software and hardware environments. Many diverse multiphysics applications can be reduced, en route to their computational simulation, to a common algebraic coupling paradigm. Mathematical analysis of multiphysics coupling in this form is not always practical for realistic applications, but model problems representative of applications discussed herein can provide insight. A variety of software frameworks for multiphysics applications have been constructed and refined within disciplinary communities and executed on leading-edge computer systems. We examine several of these, expose some commonalities among them, and attempt to extrapolate best practices to future systems. From our study, we summarize challenges and forecast opportunities.

SIAM Journal on Scientific Computing | 1999

An Energy-minimizing Interpolation for Robust Multigrid Methods

Winglok Wan; Tony F. Chan; Barry F. Smith

We propose a robust interpolation for multigrid based on the concepts of energy minimization and approximation. It can handle PDE coefficients of various types on structured or unstructured grids under one framework. The formulation is general; it can be applied to any dimension. We demonstrate numerically the effectiveness of the multigrid method in two dimensions by applying it to a discontinuous coefficient problem, an oscillatory coefficient problem, and an anisotropic problem. Empirically, the convergence rate is independent of the coefficients of the underlying PDE, in addition to being independent of the mesh size. The proposed method is primarily designed for second-order elliptic PDEs, with possible extensions to other classes of problems such as integral equations.

Parallel Computational Fluid Dynamics 1999: Towards Teraflops, Optimization and Novel Formulations | 2000

Towards Realistic Performance Bounds for Implicit CFD Codes

William Gropp; Dinesh K. Kaushik; David E. Keyes; Barry F. Smith

Traditionally, numerical analysts have evaluated the performance of algorithms by counting the number of floating-point operations. On the algorithmic side, tremendous strides have been made; many algorithms now require only a few floating-point operations per mesh point. However, on the hardware side, memory system performance is improving at a rate that is much slower than that of processor performance. The result is a mismatch in capabilities: algorithm design has minimized the work per data item, but hardware design depends on executing an increasingly large number of operations per data item. The importance of memory bandwidth to the overall performance is suggested by the available results. These show that the STREAM results are a much better indicator of performance than the peak numbers. The chapter illustrates the performance limitations caused by insufficient available memory bandwidth with a discussion of sparse matrix-vector multiply, a critical operation in many iterative methods used in implicit CFD codes. It also focuses on the per-processor performance of compute nodes used in parallel computers. Experiments have shown that PETSc-FUN3D has good scalability. In fact, since good per-processor performance reduces the fraction of time spent computing as opposed to communication, achieving the best per-processor performance is a critical prerequisite to demonstrating uninflated parallel performance.
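The memory-bandwidth argument in the abstract above can be illustrated with a minimal sketch. This is not the chapter's own model: the function names, the CSR storage layout, the 8-byte values and 4-byte indices, the square-matrix assumption, and the traffic model (every array streamed through memory exactly once) are all assumptions made here for illustration.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in compressed sparse row form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Stored nonzeros of this row occupy positions indptr[row]..indptr[row+1]-1.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def spmv_bandwidth_bound(n_rows, nnz, bandwidth_bytes_per_s,
                         value_bytes=8, index_bytes=4):
    """Flop rate ceiling for SpMV if memory traffic were the only limit.

    Assumed traffic model (illustrative only): each matrix value and column
    index is read once, the row pointer array is read once, and for a square
    matrix x is read once and y is written once.
    """
    flops = 2 * nnz  # one multiply and one add per stored nonzero
    bytes_moved = (nnz * (value_bytes + index_bytes)
                   + (n_rows + 1) * index_bytes
                   + 2 * n_rows * value_bytes)
    return flops * bandwidth_bytes_per_s / bytes_moved


# Small example: A = [[2, 0], [1, 3]], x = [1, 2].
y = csr_spmv([0, 1, 3], [0, 0, 1], [2.0, 1.0, 3.0], [1.0, 2.0])
```

Under these assumptions, a matrix with many nonzeros per row moves about 12 bytes for every 2 flops, so the achievable rate is tied to sustained memory bandwidth (the quantity a STREAM-style measurement reports) and not to the processor's peak flop rate.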

conference on high performance computing (supercomputing) | 2000

Performance Modeling and Tuning of an Unstructured Mesh CFD Application

William Gropp; Dinesh K. Kaushik; David E. Keyes; Barry F. Smith

This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and ported to several large-scale machines, including the ASCI Red and Blue Pacific machines, the SGI Origin, the Cray T3E, and Beowulf clusters. The code achieves a respectable level of performance for sparse problems, typical of scientific and engineering codes based on partial differential equations, and scales well up to thousands of processors. Since the gap between CPU speed and memory access rate is widening, the code is analyzed from a memory-centric perspective (in contrast to traditional flop-orientation) to understand its sequential and parallel performance. Performance tuning is approached on three fronts: data layouts to enhance locality of reference, algorithmic parameters, and parallel programming model. This effort was guided partly by some simple performance models developed for the sparse matrix-vector product operation.

Archive | 2013

Preliminary Implementation of PETSc Using GPUs

Victor Minden; Barry F. Smith; Matthew G. Knepley

PETSc is a scalable solver library for the solution of algebraic equations arising from the discretization of partial differential equations and related problems. PETSc is organized as a class library with classes for vectors, matrices, Krylov methods, preconditioners, nonlinear solvers, and differential equation integrators. A new subclass of the vector class has been introduced that performs its operations on NVIDIA GPU processors. In addition, a new sparse matrix subclass that performs matrix-vector products on the GPU was introduced. The Krylov methods, nonlinear solvers, and integrators in PETSc run unchanged in parallel using these new subclasses. These can be used transparently from existing PETSc application codes in C, C++, Fortran, or Python. The implementation is done with the Thrust and Cusp C++ packages from NVIDIA.

Numerische Mathematik | 1991

A domain decomposition algorithm for elliptic problems in three dimensions

Barry F. Smith

Summary. Most domain decomposition algorithms have been developed for problems in two dimensions. One reason for this is the difficulty in devising a satisfactory, easy-to-implement, robust method of providing global communication of information for problems in three dimensions. Several methods that work well in two dimensions do not perform satisfactorily in three dimensions. A new iterative substructuring algorithm for three dimensions is proposed. It is shown that the condition number of the resulting preconditioned problem is bounded independently of the number of subdomains and that the growth is quadratic in the logarithm of the number of degrees of freedom associated with a subdomain. The condition number is also bounded independently of the jumps in the coefficients of the differential equation between subdomains. The new algorithm also has more potential parallelism than the iterative substructuring methods previously proposed for problems in three dimensions.

parallel computing | 2002

Parallel components for PDEs and optimization: some issues and experiences

Boyana Norris; Satish Balay; Steven J. Benson; Lori A. Freitag; Paul D. Hovland; Lois Curfman McInnes; Barry F. Smith

High-performance simulations in computational science often involve the combined software contributions of multidisciplinary teams of scientists, engineers, mathematicians, and computer scientists. One goal of component-based software engineering in large-scale scientific simulations is to help manage such complexity by enabling better interoperability among codes developed by different groups. This paper discusses recent work on building component interfaces and implementations in parallel numerical toolkits for mesh manipulations, discretization, linear algebra, and optimization. We consider several motivating applications involving partial differential equations and unconstrained minimization to demonstrate this approach and evaluate performance.

conference on automated deduction | 1984

The Linked Inference Principle, II: The User's Viewpoint

Larry Wos; Robert Veroff; Barry F. Smith; William McCune

In the field of automated reasoning, the search continues for useful representations of information, for powerful inference rules, for effective canonicalization procedures, and for intelligent strategies. The practical objective of this search is, of course, to produce ever more powerful automated reasoning programs. In this paper, we show how the power of such programs can be sharply increased by employing inference rules called linked inference rules. In particular, we focus on linked UR-resolution, a generalization of standard UR-resolution [2], and discuss ongoing experiments that permit comparison of the two inference rules. The intention is to present the results of those experiments at the Seventh Conference on Automated Deduction. Much of the treatment of linked inference rules given in this paper is from the user’s viewpoint, with certain abstract considerations reserved for Section 3.
