
Publication


Featured research published by Brian T. N. Gunney.


Journal of Parallel and Distributed Computing | 2006

Parallel clustering algorithms for structured AMR

Brian T. N. Gunney; Andrew M. Wissink; David Hysom

We compare several different parallel implementation approaches for the clustering operations performed during adaptive meshing operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos, which is commonly used in many SAMR applications. The baseline for comparison is a single program, multiple data extension of the original algorithm that works well for up to O(10^2) processors. Our goal is a clustering algorithm for machines of up to O(10^5) processors, such as the 64K-processor IBM BlueGene/L (BG/L) system. We first present an algorithm that avoids unneeded communications of the baseline approach, improving the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation to further reduce communication wait time, adding another order of magnitude of improvement. The new algorithms exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large-scale parallel computer systems, including a 16K-processor BG/L system.
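
The signature-and-split step at the heart of the Berger-Rigoutsos algorithm can be sketched roughly as follows. This is a simplified, serial 2D illustration under assumed conventions (a fixed tag array, a split at a signature hole or else at the strongest jump in the second difference), not the SAMRAI or BG/L implementation described in the paper.

```cpp
// Minimal serial sketch of one Berger-Rigoutsos clustering step (assumed
// simplified 2D form): compute the tag signature along an axis and choose a
// cut at an interior zero of the signature, or, failing that, at the largest
// jump in its discrete Laplacian (an inflection point).
#include <vector>
#include <cstdlib>
#include <iostream>

struct Grid2D {
    int nx, ny;
    std::vector<int> tags;            // 1 = cell flagged for refinement
    int tag(int i, int j) const { return tags[j * nx + i]; }
};

// Signature along x: number of tagged cells in each column i.
std::vector<int> signatureX(const Grid2D& g) {
    std::vector<int> sig(g.nx, 0);
    for (int j = 0; j < g.ny; ++j)
        for (int i = 0; i < g.nx; ++i)
            sig[i] += g.tag(i, j);
    return sig;
}

// Choose a cut index: prefer an interior zero; otherwise split where the
// discrete Laplacian of the signature changes most sharply.
int chooseCut(const std::vector<int>& sig) {
    int n = static_cast<int>(sig.size());
    for (int i = 1; i < n - 1; ++i)
        if (sig[i] == 0) return i;                      // hole in the tags: split here
    int best = n / 2, bestJump = -1;
    for (int i = 1; i < n - 2; ++i) {
        int lapL = sig[i - 1] - 2 * sig[i] + sig[i + 1];
        int lapR = sig[i] - 2 * sig[i + 1] + sig[i + 2];
        int jump = std::abs(lapL - lapR);
        if (jump > bestJump) { bestJump = jump; best = i + 1; }
    }
    return best;
}

int main() {
    // Two tagged columns (i = 1 and i = 6) separated by an untagged gap.
    Grid2D g{8, 4, std::vector<int>(32, 0)};
    for (int j = 0; j < 4; ++j) { g.tags[j * 8 + 1] = 1; g.tags[j * 8 + 6] = 1; }
    std::cout << "cut at i = " << chooseCut(signatureX(g)) << "\n";   // splits the gap
}
```

In the full algorithm this split is applied recursively until each candidate box is sufficiently filled with tagged cells; the paper's contribution is in how that recursion is distributed and overlapped across processors.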


Journal of Computer and System Sciences | 2008

Performance evaluation of supercomputers using HPCC and IMB Benchmarks

Subhash Saini; Robert Ciotti; Brian T. N. Gunney; Thomas E. Spelce; Alice Koniges; Don Dossa; Panagiotis Adamidis; Rolf Rabenseifner; Sunil R. Tiyyagura; Matthias S. Mueller

The HPC Challenge (HPCC) Benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon Cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC Benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks results to study the performance of 11 MPI communication functions on these systems.
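
For context, the point-to-point tests in suites like IMB reduce to a timed ping-pong exchange over a range of message sizes. A minimal MPI sketch of that pattern (an illustration, not the IMB code itself) might look like:

```cpp
// Toy MPI ping-pong latency/bandwidth microbenchmark in the spirit of IMB's
// point-to-point tests. Run with at least two ranks, e.g. mpirun -np 2.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) std::fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }
    const int reps = 1000;
    for (int bytes = 1; bytes <= (1 << 20); bytes <<= 2) {
        std::vector<char> buf(bytes);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; ++r) {
            if (rank == 0) {
                MPI_Send(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf.data(), bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf.data(), bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
            std::printf("%8d bytes  %10.2f us/round-trip\n", bytes, 1e6 * dt / reps);
    }
    MPI_Finalize();
}
```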


International Parallel and Distributed Processing Symposium | 2006

Performance evaluation of supercomputers using HPCC and IMB benchmarks

Subhash Saini; Robert Ciotti; Brian T. N. Gunney; Thomas E. Spelce; Alice Koniges; Don Dossa; Panagiotis Adamidis; Rolf Rabenseifner; Sunil R. Tiyyagura; Matthias S. Mueller; Rod Fatoohi

The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon Cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Novel views of performance data to analyze large-scale adaptive applications

Abhinav Bhatele; Todd Gamblin; Katherine E. Isaacs; Brian T. N. Gunney; Martin Schulz; Peer-Timo Bremer; Bernd Hamann

Performance analysis of parallel scientific codes is becoming increasingly difficult due to the rapidly growing complexity of applications and architectures. Existing tools fall short in providing intuitive views that facilitate the process of performance debugging and tuning. In this paper, we extend recent ideas of projecting and visualizing performance data for faster, more intuitive analysis of applications. We collect detailed per-level and per-phase measurements for a dynamically load-balanced, structured AMR library and project per-core data collected in the hardware domain onto the application's communication topology. We show how our projections and visualizations lead to a rapid diagnosis of, and mitigation strategy for, a previously elusive scaling bottleneck in the library that is hard to detect using conventional tools. Our new insights have resulted in a 22% performance improvement for a 65,536-core run of the AMR library on an IBM Blue Gene/P system.
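
The core idea of projecting hardware-domain data onto the communication topology can be pictured with a toy example: each rank reports one metric, and rank 0 emits it keyed by the rank's logical position in an assumed 2D process grid, ready to plot as a heat map over the topology. This is only a conceptual sketch; the paper's tooling, measurements, and projections are far richer.

```cpp
// Conceptual sketch: gather one stand-in per-core metric per rank and print it
// with the rank's (x, y) position in an assumed near-square 2D process grid.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Stand-in for a per-core hardware counter (e.g. cache misses or wait time).
    double metric = static_cast<double>(rank % 7);

    std::vector<double> all(rank == 0 ? size : 0);
    MPI_Gather(&metric, 1, MPI_DOUBLE, all.data(), 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        int px = 1;                                   // assumed grid width
        while ((px + 1) * (px + 1) <= size) ++px;
        for (int r = 0; r < size; ++r)
            std::printf("%4d %4d %g\n", r % px, r / px, all[r]);   // x  y  metric
    }
    MPI_Finalize();
}
```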


Journal of Physics: Conference Series | 2010

ALE-AMR: A New 3D Multi-Physics Code for Modeling Laser/Target Effects

Alice Koniges; Aaron Fisher; R W Anderson; David C. Eder; D. S. Bailey; Brian T. N. Gunney; P. Wang; B. Brown; K. Fisher; F. Hansen; B. R. Maddox; David J. Benson; Marc A. Meyers; A. Geille

We have developed a new 3D multi-physics multi-material code, ALE-AMR, for modeling laser/target effects including debris/shrapnel generation. The code combines Arbitrary Lagrangian Eulerian (ALE) hydrodynamics with Adaptive Mesh Refinement (AMR) to connect the continuum to microstructural regimes. The code is unique in its ability to model hot radiating plasmas and cold fragmenting solids. New numerical techniques were developed for many of the physics packages to work efficiently on a dynamically moving and adapting mesh. A flexible strength/failure framework allows for pluggable material models. Material history arrays are used to store persistent data required by the material models, for instance, the level of accumulated damage or the evolving yield stress in J2 plasticity models. We model ductile metals as well as brittle materials such as Si, Be, and B4C. We use interface reconstruction based on volume fractions of the material components within mixed zones and reconstruct interfaces as needed. This interface reconstruction model is also used for void coalescence and fragmentation. The AMR framework allows for hierarchical material modeling (HMM) with different material models at different levels of refinement. Laser rays are propagated through a virtual composite mesh consisting of the finest resolution representation of the modeled space. A new second-order accurate diffusion solver has been implemented for the thermal conduction and radiation transport packages. The code is validated using laser and x-ray driven spall experiments in the US and France. We present an overview of the code and simulation results.
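
As one illustration of the material history arrays mentioned above, a per-zone, per-material record of accumulated damage and evolving yield stress might be structured as below. The names and the toy update rule are hypothetical placeholders, not ALE-AMR's data structures.

```cpp
// Illustrative "material history array": per-zone, per-material persistent
// state (accumulated damage, current yield stress) that a strength/failure
// model updates each cycle. Hypothetical names; not the ALE-AMR layout.
#include <vector>
#include <algorithm>

struct MaterialHistory {
    double damage = 0.0;        // accumulated scalar damage, 0 (intact) .. 1 (failed)
    double yieldStress = 1e8;   // evolving yield stress for a J2-type plasticity model
};

struct Zone {
    std::vector<double> volumeFraction;     // one entry per material in the zone
    std::vector<MaterialHistory> history;   // parallel to volumeFraction
};

// One step of a toy hardening/damage update; a real model would also use
// strain rate, temperature, and a calibrated flow-stress form.
void updateHistory(Zone& z, double plasticStrainInc, double hardening, double damageRate) {
    for (MaterialHistory& h : z.history) {
        h.yieldStress += hardening * plasticStrainInc;
        h.damage = std::min(1.0, h.damage + damageRate * plasticStrainInc);
    }
}
```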


Presented at: IFSA Conference, Kobe, Japan, Sep 09 - Sep 14, 2007 | 2008

Interface reconstruction in two- and three-dimensional arbitrary Lagrangian-Eulerian adaptive mesh refinement simulations

Nathan D. Masters; R W Anderson; N S Elliott; Aaron Fisher; Brian T. N. Gunney; Alice Koniges

Modeling of high power laser and ignition facilities requires new techniques because of the higher energies and higher operational costs. We report on the development and application of a new interface reconstruction algorithm for a chamber modeling code that combines ALE (Arbitrary Lagrangian Eulerian) techniques with AMR (Adaptive Mesh Refinement). The code is used for the simulation of complex target elements in the National Ignition Facility (NIF) and other similar facilities. The interface reconstruction scheme is required to adequately describe the debris/shrapnel (including fragments or droplets) resulting from energized materials that could affect optics or diagnostic sensors. Traditional ICF modeling codes that choose to implement ALE + AMR techniques will also benefit from this new scheme. The ALE formulation requires material interfaces (including those of generated particles or droplets) to be tracked. We present the interface reconstruction scheme developed for NIF's ALE-AMR and discuss how it is affected by adaptive mesh refinement and the ALE mesh. Results of the code are shown for NIF and OMEGA target configurations.
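
A common first step in volume-fraction-based interface reconstruction is estimating the interface normal in a mixed cell from the local volume-fraction gradient (a Youngs-type estimate). The 2D sketch below shows that step only; it illustrates the general technique rather than the specific scheme developed for ALE-AMR.

```cpp
// Gradient-based estimate of the interface normal in a mixed cell, using a
// 3x3 stencil of material volume fractions. Generic volume-of-fluid step;
// not the ALE-AMR reconstruction algorithm.
#include <array>
#include <cmath>

// vof[i][j] is the volume fraction of one material on a 3x3 stencil centered
// on the mixed cell (i is the x index, j the y index); dx, dy are cell sizes.
std::array<double, 2> interfaceNormal(const double vof[3][3], double dx, double dy) {
    // Central differences of the volume fraction across the stencil.
    double gx = (vof[2][1] - vof[0][1]) / (2.0 * dx);
    double gy = (vof[1][2] - vof[1][0]) / (2.0 * dy);
    double mag = std::sqrt(gx * gx + gy * gy);
    if (mag == 0.0) return {0.0, 0.0};          // no interface detected in this stencil
    // Normal points out of the material (down the volume-fraction gradient).
    return {-gx / mag, -gy / mag};
}
```

Given the normal, the interface plane is then positioned within the cell so that it cuts off exactly the cell's volume fraction; that positioning, and its interaction with AMR level boundaries and the moving ALE mesh, is the subject of the paper.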


Journal of Parallel and Distributed Computing | 2016

Advances in patch-based adaptive mesh refinement scalability

Brian T. N. Gunney; R W Anderson

Patch-based structured adaptive mesh refinement (SAMR) is widely used for high-resolution simulations. Combined with modern supercomputers, it could provide simulations of unprecedented size and resolution. A persistent challenge for this combination has been managing dynamically adaptive meshes on more and more MPI tasks. The distributed mesh management scheme in SAMRAI has made some progress in SAMR scalability, but early algorithms still had trouble scaling past the regime of 10^5 MPI tasks. This work provides two critical SAMR regridding algorithms, which are integrated into that scheme to ensure efficiency of the whole. The clustering algorithm is an extension of the tile-clustering approach, making it more flexible and efficient in both clustering and parallelism. The partitioner is a new algorithm designed to prevent the network congestion experienced by its predecessor. We evaluated performance using weak- and strong-scaling benchmarks designed to be difficult for dynamic adaptivity. Results show good scaling on up to 1.5M cores and 2M MPI tasks. Detailed timing diagnostics suggest scaling would continue well past that.

Highlights: We developed two key SAMR regridding components that scaled individually and integrated scalably. The cascade partitioner took 10% of the regrid time and yielded loads within 10% of ideal. The tile clustering step took about 2% of regrid time and reduced cluster counts by a factor of 38. Our benchmarks, set up to be challenging for dynamic adaptivity, scaled to 2M MPI tasks. Smooth, well-behaved timer trends indicate higher scaling is possible.
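
The tile-clustering idea can be pictured with a toy version: bin tagged cells into fixed-size tiles and emit one candidate patch box per occupied tile. The sketch below is a simplified serial 2D form under assumed non-negative cell indices; the SAMRAI algorithm described in the paper adds flexibility in tile boundaries, coalescing, and fully distributed execution.

```cpp
// Toy tile clustering (2D, fixed tile size, non-negative cell indices):
// every tile that contains at least one tagged cell becomes a candidate box.
#include <vector>
#include <set>
#include <utility>

struct Box { int ilo, jlo, ihi, jhi; };

std::vector<Box> tileCluster(const std::vector<std::pair<int,int>>& taggedCells,
                             int tileSize) {
    std::set<std::pair<int,int>> tiles;                 // unique (tile_i, tile_j) indices
    for (const auto& c : taggedCells)
        tiles.insert({c.first / tileSize, c.second / tileSize});

    std::vector<Box> boxes;
    for (const auto& t : tiles)
        boxes.push_back({t.first * tileSize, t.second * tileSize,
                         (t.first + 1) * tileSize - 1, (t.second + 1) * tileSize - 1});
    return boxes;                                       // one box per occupied tile
}
```

Because each tag maps to a tile independently, this kind of clustering parallelizes with little communication, which is what makes it attractive at the MPI task counts discussed above.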


Presented at: Inertial Fusion Sciences and Applications, Kobe, Japan, Sep 09 - Sep 14, 2007 | 2008

Hierarchical material models for fragmentation modeling in NIF-ALE-AMR

Aaron Fisher; Nathan D. Masters; P Dixit; David J. Benson; Alice Koniges; R W Anderson; Brian T. N. Gunney; P. Wang; R. Becker

Fragmentation is a fundamental process that naturally spans micro to macroscopic scales. Recent advances in algorithms, computer simulations, and hardware enable us to connect the continuum to microstructural regimes in a real simulation through a heterogeneous multiscale mathematical model. We apply this model to the problem of predicting how targets in the NIF chamber dismantle, so that optics and diagnostics can be protected from damage. The mechanics of the initial material fracture depend on the microscopic grain structure. In order to effectively simulate the fragmentation, this process must be modeled at the subgrain level with computationally expensive crystal plasticity models. However, there are not enough computational resources to model the entire NIF target at this microscopic scale. In order to accomplish these calculations, a hierarchical material model (HMM) is being developed. The HMM will allow fine-scale modeling of the initial fragmentation using computationally expensive crystal plasticity, while the elements at the mesoscale can use polycrystal models, and the macroscopic elements use analytical flow stress models. The HMM framework is built upon an adaptive mesh refinement (AMR) capability. We present progress in implementing the HMM in the NIF-ALE-AMR code. Additionally, we present test simulations relevant to NIF targets.
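
One way to picture the hierarchical material model is a dispatch on AMR refinement level: the finest patches get the expensive crystal-plasticity model, intermediate levels a polycrystal model, and coarse levels an analytic flow-stress model. The class names and constants below are hypothetical placeholders for that idea, not the NIF-ALE-AMR interfaces.

```cpp
// Sketch of level-based material model selection for a hierarchical material
// model (HMM). Model classes and constants are illustrative placeholders.
#include <memory>

struct MaterialModel {
    virtual double flowStress(double strain, double strainRate) const = 0;
    virtual ~MaterialModel() = default;
};

struct AnalyticFlowStress : MaterialModel {            // cheapest: macroscale elements
    double flowStress(double e, double) const override { return 1.0e8 * (1.0 + 0.5 * e); }
};
struct Polycrystal : MaterialModel {                    // mesoscale stand-in
    double flowStress(double e, double r) const override { return 1.1e8 * (1.0 + 0.6 * e) + 1.0e3 * r; }
};
struct CrystalPlasticity : MaterialModel {              // most expensive: subgrain scale
    double flowStress(double e, double r) const override { return 1.2e8 * (1.0 + 0.7 * e) + 2.0e3 * r; }
};

// The finest AMR level runs crystal plasticity, the next level a polycrystal
// model, and everything coarser an analytic flow-stress model.
std::unique_ptr<MaterialModel> modelForLevel(int level, int finestLevel) {
    if (level == finestLevel)     return std::make_unique<CrystalPlasticity>();
    if (level >= finestLevel - 1) return std::make_unique<Polycrystal>();
    return std::make_unique<AnalyticFlowStress>();
}
```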


Archive | 2015

Characterization of Proxy Application Performance on Advanced Architectures: UMT2013, MCB, AMG2013

Louis H. Howell; Brian T. N. Gunney; Abhinav Bhatele

Three codes were tested at LLNL as part of a Tri-Lab effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. Teams from Sandia and Los Alamos tested proxy apps of their own. The focus in this report is on the LLNL codes UMT2013, MCB, and AMG2013. We present weak and strong MPI scaling results and studies of OpenMP efficiency on a large BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. The hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Results from three more speculative tests are also included: one that exploits NVRAM as extended memory, one that studies performance under a power bound, and one that illustrates the effects of changing the torus network mapping on BG/Q.
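
An OpenMP efficiency study of the kind described reduces to timing a fixed kernel at increasing thread counts and reporting speedup relative to one thread. The sketch below illustrates only that measurement pattern; the actual proxy apps (UMT2013, MCB, AMG2013) have their own kernels and harnesses.

```cpp
// Toy OpenMP strong-scaling/efficiency measurement for a fixed compute kernel.
#include <omp.h>
#include <vector>
#include <cstdio>
#include <cmath>

double kernel(const std::vector<double>& x) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < static_cast<long>(x.size()); ++i)
        sum += std::sqrt(x[i]) * std::sin(x[i]);
    return sum;
}

int main() {
    std::vector<double> x(1 << 24, 1.234);
    const int maxThreads = omp_get_max_threads();
    double t1 = 0.0;                                    // single-thread reference time
    for (int threads = 1; threads <= maxThreads; threads *= 2) {
        omp_set_num_threads(threads);
        double t0 = omp_get_wtime();
        volatile double s = kernel(x);                  // volatile keeps the call from being elided
        double dt = omp_get_wtime() - t0;
        if (threads == 1) t1 = dt;
        std::printf("threads=%2d  time=%.3fs  efficiency=%.0f%%  (checksum %g)\n",
                    threads, dt, 100.0 * t1 / (threads * dt), static_cast<double>(s));
    }
}
```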


Archive | 2014

Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

David A. Beckingsale; Wayne Gaudin; Richard D. Hornung; Brian T. N. Gunney; Todd Gamblin; J. A. Herdman; Stephen A. Jarvis

Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
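
A representative data-parallel coarsening operator averages each 2x2 block of fine cells into one coarse cell. The sketch below writes that loop in plain C++ over flat arrays; in a GPU library the same per-coarse-cell body would typically become a kernel launched over the coarse index space. This is an assumed, generic operator for illustration, not the paper's implementation.

```cpp
// Cell-averaging coarsening (restriction) of a 2D field with refinement ratio 2.
#include <vector>
#include <cstddef>

// fine is (2*ncx) x (2*ncy), row-major; returns the ncx x ncy coarse field.
std::vector<double> coarsen(const std::vector<double>& fine, int ncx, int ncy) {
    const int nfx = 2 * ncx;
    std::vector<double> coarse(static_cast<std::size_t>(ncx) * ncy);
    for (int j = 0; j < ncy; ++j) {
        for (int i = 0; i < ncx; ++i) {
            // Average the 2x2 block of fine cells covered by coarse cell (i, j).
            double sum = fine[(2 * j)     * nfx + 2 * i] + fine[(2 * j)     * nfx + 2 * i + 1]
                       + fine[(2 * j + 1) * nfx + 2 * i] + fine[(2 * j + 1) * nfx + 2 * i + 1];
            coarse[j * ncx + i] = 0.25 * sum;
        }
    }
    return coarse;
}
```

The corresponding refinement operator (e.g. piecewise-constant or linear interpolation from coarse to fine) follows the same per-cell, data-parallel structure, which is why these operators map well onto GPUs.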

Collaboration


Dive into Brian T. N. Gunney's collaboration.

Top Co-Authors

Alice Koniges
Lawrence Berkeley National Laboratory

R W Anderson
Lawrence Livermore National Laboratory

Aaron Fisher
Lawrence Livermore National Laboratory

Nathan D. Masters
Lawrence Livermore National Laboratory

D. S. Bailey
Lawrence Livermore National Laboratory

David C. Eder
Lawrence Livermore National Laboratory

Don Dossa
Lawrence Livermore National Laboratory

Thomas E. Spelce
Lawrence Livermore National Laboratory