Publications


Featured research published by Christopher S. Daley.


International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2013

Analysis of Cray XC30 Performance Using Trinity-NERSC-8 Benchmarks and Comparison with Cray XE6 and IBM BG/Q

Matthew J. Cordery; Brian Austin; H. J. Wassermann; Christopher S. Daley; Nicholas J. Wright; Simon D. Hammond; Douglas Doerfler

In this paper, we examine the performance of a suite of applications on three different architectures: Edison, a Cray XC30 with Intel Ivy Bridge processors; Hopper and Cielo, both Cray XE6 systems with AMD Magny–Cours processors; and Mira, an IBM BlueGene/Q with PowerPC A2 processors. The applications chosen are a subset of the applications used in a joint procurement effort between Lawrence Berkeley National Laboratory, Los Alamos National Laboratory and Sandia National Laboratories. Strong scaling results are presented, using both MPI-only and MPI+OpenMP execution models.
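
To make the two execution models concrete, below is a minimal hybrid MPI+OpenMP sketch: a fixed-size (strong-scaling) workload partitioned across MPI ranks, with optional OpenMP threading inside each rank. It is a generic illustration, not one of the procurement benchmark codes.

```c
/* Minimal strong-scaling sketch: the global problem size N is fixed, so adding
 * ranks or threads shrinks the work per processing element.  Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 100000000L   /* fixed global problem size (strong scaling) */

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each rank owns a contiguous slice of the fixed global workload. */
    long lo = N * rank / nranks;
    long hi = N * (rank + 1) / nranks;

    double t0 = MPI_Wtime();
    double local = 0.0;
    /* MPI-only model: run with OMP_NUM_THREADS=1.
     * MPI+OpenMP model: run with several threads per rank. */
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; i++)
        local += 1.0 / (double)(i + 1);

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("ranks=%d threads=%d sum=%.6f time=%.3fs\n",
               nranks, omp_get_max_threads(), global, t1 - t0);
    MPI_Finalize();
    return 0;
}
```

Running with OMP_NUM_THREADS=1 gives the MPI-only model; running fewer ranks with several threads each on the same node count gives the MPI+OpenMP model.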


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Evolution of FLASH, a multi-physics scientific simulation code for high-performance computing

Anshu Dubey; Katie Antypas; Alan Clark Calder; Christopher S. Daley; Bruce Fryxell; Brad Gallagher; Donald Q. Lamb; Dongwook Lee; Kevin Olson; Lynn B. Reid; Paul Rich; Paul M. Ricker; Katherine Riley; R. Rosner; Andrew R. Siegel; Noel T. Taylor; Klaus Weide; Francis Xavier Timmes; Natasha Vladimirova; John A. ZuHone

The FLASH code has evolved into a modular and extensible scientific simulation software system over the decade of its existence. During this time it has been cumulatively used by over a thousand researchers to investigate problems in astrophysics, cosmology, and in some areas of basic physics, such as turbulence. Recently, many new capabilities have been added to the code to enable it to simulate problems in high-energy density physics. Enhancements to these capabilities continue, along with enhancements enabling simulations of problems in fluid-structure interactions. The code started its life as an amalgamation of already existing software packages and sections of codes developed independently by various participating members of the team for other purposes. The code has evolved through a mixture of incremental and deep infrastructural changes. In the process, it has undergone four major revisions, three of which involved a significant architectural advancement. Along the way, a software process evolved that addresses the issues of code verification, maintainability, and support for the expanding user base. The software process also resolves the conflicts arising out of being in development and production simultaneously with multiple research projects, and between performance and portability. This paper describes the process of code evolution with emphasis on the design decisions and software management policies that have been instrumental in the success of the code. The paper also makes the case for a symbiotic relationship between scientific research and good software engineering of the simulation software.


Concurrency and Computation: Practice and Experience | 2012

Optimization of multigrid based elliptic solver for large scale simulations in the FLASH code

Christopher S. Daley; Marcos Vanella; Anshu Dubey; Klaus Weide; Elias Balaras

FLASH is a multiphysics, multiscale adaptive mesh refinement (AMR) code originally designed for the simulation of reactive flows often found in astrophysics. With its wide user base and flexible application configuration capability, FLASH has the dual task of maintaining scalability and portability in all its solvers. The scalability of the fully explicit solvers in the code is tied very closely to that of the underlying mesh. Others, such as the Poisson solver based on a multigrid method, have more complex scaling behavior. Multigrid methods suffer from processor starvation and dominating communication costs at coarser grids as the number of processors increases. In this paper, we propose a combination of a uniform grid mesh with the AMR mesh, and the merger of two different sets of solvers, to overcome the scalability limitation of the Poisson solver in FLASH. The principal challenge in the proposed merger is the efficiency of the communication algorithm to map the mesh back and forth between the uniform grid and the AMR mesh. We present two different parallel mapping algorithms and also discuss results from performance studies of the two implementations.
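
As a rough illustration of the index bookkeeping behind such a mapping, the sketch below computes the footprint of an AMR block in the index space of a uniform grid defined at the finest refinement level (1-D, with a hypothetical block size NXB). It only shows the kind of translation involved; it is not FLASH's mapping algorithm, whose difficulty lies in the parallel communication.

```c
/* Illustrative index arithmetic (not FLASH's code): which cells of a uniform
 * grid at the finest level are covered by one AMR block at level `lev`?
 * 1-D for clarity; factor-of-two refinement per level is assumed. */
#include <stdio.h>

#define NXB 8            /* cells per block along this dimension (hypothetical) */

typedef struct {
    int lev;             /* refinement level of this block, 0 = coarsest */
    int block_idx;       /* block index along this dimension at level `lev` */
} AmrBlock;

/* Range of finest-level (uniform-grid) cells covered by the block. */
static void block_to_uniform(const AmrBlock *b, int max_lev, int *first, int *last)
{
    int refine = 1 << (max_lev - b->lev);   /* finest cells per cell at `lev` */
    *first = b->block_idx * NXB * refine;
    *last  = *first + NXB * refine - 1;
}

int main(void)
{
    AmrBlock b = { .lev = 1, .block_idx = 3 };
    int first, last;
    block_to_uniform(&b, 3, &first, &last); /* uniform grid at level 3 */
    printf("block covers uniform cells [%d, %d]\n", first, last);
    return 0;
}
```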


Software: Practice and Experience | 2015

Ongoing verification of a multiphysics community code: FLASH

Anshu Dubey; Klaus Weide; Dongwook Lee; John Bachan; Christopher S. Daley; Samuel Olofin; Noel T. Taylor; Paul Rich; Lynn B. Reid

When developing a complex, multi-authored code, daily testing on multiple platforms and under a variety of conditions is essential. It is therefore necessary to have a regression test suite that is easily administered and configured, as well as a way to easily view and interpret the test suite results. We describe the methodology for verification of FLASH, a highly capable multiphysics scientific application code with a wide user base. The methodology uses a combination of unit and regression tests and in-house testing software that is optimized for operation under limited resources. Although our practical implementations do not always comply with theoretical regression-testing research, our methodology provides a comprehensive verification of a large scientific code under resource constraints.
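
A minimal sketch of the verification idea behind such a test suite is shown below: run a small kernel, compare its result against a baseline recorded from a trusted run, and report pass or fail. The kernel, baseline value, and tolerance are hypothetical; the in-house harness described in the paper is considerably more elaborate.

```c
/* Minimal regression-test sketch: compare a computed result against a stored
 * baseline within a tolerance and return a pass/fail exit code. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Stand-in for a physics kernel under test (hypothetical). */
static double kernel_under_test(int n)
{
    double sum = 0.0;
    for (int i = 1; i <= n; i++)
        sum += 1.0 / ((double)i * i);        /* converges toward pi^2/6 */
    return sum;
}

int main(void)
{
    const double baseline = 1.6449240669;    /* value recorded from a trusted run */
    const double tol      = 1e-6;            /* acceptable drift between runs */

    double result = kernel_under_test(100000);
    double diff   = fabs(result - baseline);

    if (diff <= tol) {
        printf("PASS: result %.10f within %.1e of baseline\n", result, tol);
        return EXIT_SUCCESS;
    }
    printf("FAIL: result %.10f differs from baseline by %.3e\n", result, diff);
    return EXIT_FAILURE;
}
```

A nightly harness would run many such tests across platforms and configurations and aggregate the exit codes into a report.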


IEEE International Conference on High Performance Computing, Data and Analytics | 2013

Pragmatic optimizations for better scientific utilization of large supercomputers

Anshu Dubey; Alan Clark Calder; Christopher S. Daley; Robert Fisher; C. Graziani; George C. Jordan; Donald Q. Lamb; Lynn B. Reid; Dean M. Townsley; Klaus Weide

Advances in modeling and algorithms, combined with growth in computing resources, have enabled simulations of multiphysics–multiscale phenomena that can greatly enhance our scientific understanding. However, on currently available high-performance computing (HPC) resources, maximizing the scientific outcome of simulations requires many trade-offs. In this paper we describe our experiences in running simulations of the explosion phase of Type Ia supernovae on the largest available platforms. The simulations use FLASH, a modular, adaptive mesh, parallel simulation code with a wide user base. The simulations use multiple physics components: hydrodynamics, gravity, a sub-grid flame model, a three-stage burning model, and a degenerate equation of state. They also use Lagrangian tracer particles, which are then post-processed to determine the nucleosynthetic yields. We describe the simulation planning process, and the algorithmic optimizations and trade-offs that were found to be necessary. Several of the optimizations and trade-offs were made during the course of the simulations as our understanding of the challenges evolved, or when simulations went into previously unexplored physical regimes. We also briefly outline the anticipated challenges of, and our preparations for, the next-generation computing platforms.


IEEE International Conference on High Performance Computing, Data and Analytics | 2016

Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors

Jongsoo Park; Mikhail Smelyanskiy; Karthikeyan Vaidyanathan; Alexander Heinecke; Dhiraj D. Kalamkar; Md. Mostofa Ali Patwary; Vadim O. Pirogov; Pradeep Dubey; Xing Liu; Carlos Rosales; Cyril Mazauric; Christopher S. Daley

This paper presents optimizations in a high-performance conjugate gradient benchmark (HPCG) for multi-core Intel® Xeon® processors and many-core Xeon Phi™ coprocessors. Without careful optimization, the HPCG benchmark under-utilizes the compute resources available in modern processors due to its low arithmetic intensity and challenges in parallelizing the Gauss–Seidel smoother (GS). Our optimized implementation fuses GS with sparse matrix vector multiplication (SpMV) to address the low arithmetic intensity, overcoming the performance otherwise bound by memory bandwidth. This fusion optimization is progressively more effective in newer generation Xeon processors, demonstrating the usefulness of their larger caches for sparse matrix operations: Sandy Bridge, Ivy Bridge, and Haswell processors achieve 93%, 99%, and 103%, respectively, of the ideal performance with a constraint that matrices are streamed from memory. Our implementation also parallelizes GS using fine-grain level-scheduling, a method that has been believed not to scale with many cores. Our GS implementation scales with 60 cores in Xeon Phi coprocessors, for the finest level of the multi-grid pre-conditioner. At the coarser levels, we address the limited parallelism using block multi-color re-ordering, achieving 21 GFLOPS with one Xeon Phi coprocessor. These optimizations distinguish our HPCG implementation from the others that stream most of the data from main memory and rely on multi-color re-ordering for parallelism. Our optimized implementation has been evaluated in clusters with various configurations, and we find that low-diameter high-radix network topologies such as Dragonfly realize high parallelization efficiencies because of fast all-reduce collectives. In addition, we demonstrate that our optimizations not only benefit the HPCG dataset, which is based on a structured 3D grid, but also a wide range of unstructured matrices.
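
The sketch below shows, in simplified form, two of the ideas described above for a forward Gauss–Seidel sweep on a CSR matrix: the sweep splits into an SpMV-like pass over the upper triangle (the part that can be fused with a neighbouring SpMV) and a lower-triangular solve parallelized by level scheduling, where rows within a level are mutually independent. It is a generic example, not the optimized HPCG implementation.

```c
/* Illustrative sketch (not the paper's implementation) of a forward
 * Gauss-Seidel sweep, (L + D) x_new = b - U x_old, on a CSR matrix. */
#include <stdio.h>

typedef struct {          /* CSR storage */
    int n;
    const int *rowptr;    /* size n+1 */
    const int *colidx;    /* size nnz */
    const double *val;    /* size nnz */
} Csr;

/* level[i] = 1 + max(level[j]) over lower-triangular dependencies j < i. */
static int compute_levels(const Csr *A, int *level)
{
    int nlevels = 0;
    for (int i = 0; i < A->n; i++) {
        int lvl = 0;
        for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            if (A->colidx[k] < i && level[A->colidx[k]] + 1 > lvl)
                lvl = level[A->colidx[k]] + 1;
        level[i] = lvl;
        if (lvl + 1 > nlevels) nlevels = lvl + 1;
    }
    return nlevels;
}

static void gs_forward(const Csr *A, const int *level, int nlevels,
                       const double *b, double *x, double *rhs)
{
    /* Phase 1: rhs = b - U x_old.  This pass reads the matrix like an SpMV,
     * which is the fusion opportunity described in the abstract. */
    for (int i = 0; i < A->n; i++) {
        rhs[i] = b[i];
        for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            if (A->colidx[k] > i)
                rhs[i] -= A->val[k] * x[A->colidx[k]];
    }
    /* Phase 2: level-scheduled lower-triangular solve.  Rows within a level
     * are independent, so the row loop could be threaded per level. */
    for (int l = 0; l < nlevels; l++)
        for (int i = 0; i < A->n; i++) {
            if (level[i] != l) continue;
            double s = rhs[i], diag = 1.0;
            for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++) {
                if (A->colidx[k] == i)     diag = A->val[k];
                else if (A->colidx[k] < i) s -= A->val[k] * x[A->colidx[k]];
            }
            x[i] = s / diag;
        }
}

int main(void)
{
    /* Small tridiagonal system as a smoke test. */
    int rowptr[] = {0, 2, 5, 7};
    int colidx[] = {0, 1, 0, 1, 2, 1, 2};
    double val[] = {2, -1, -1, 2, -1, -1, 2};
    double b[] = {1, 0, 1}, x[] = {0, 0, 0}, rhs[3];
    Csr A = {3, rowptr, colidx, val};

    int level[3], nlevels = compute_levels(&A, level);
    gs_forward(&A, level, nlevels, b, x, rhs);
    printf("levels=%d  x = %.3f %.3f %.3f\n", nlevels, x[0], x[1], x[2]);
    return 0;
}
```

Block multi-color re-ordering, used at the coarser multigrid levels, trades this exact dependency ordering for more parallelism per level.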


Computing in Science and Engineering | 2015

Lessons Learned from Optimizing Science Kernels for Intel's "Knights Corner" Architecture

Jack Deslippe; Brian Austin; Christopher S. Daley; Woo-Sun Yang

Optimizing the codes and kernels representing the National Energy Research Scientific Computing Center's (NERSC's) workload on the Knights Corner architecture helped pave the way for NERSC's newest machine. Cori will use the next generation of Intel Xeon Phi processors: Knights Landing.


International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2017

Performance and Energy Usage of Workloads on KNL and Haswell Architectures

Tyler Allen; Christopher S. Daley; Douglas W. Doerfler; Brian Austin; Nicholas J. Wright

Manycore architectures are an energy-efficient step towards exascale computing within a constrained power budget. The Intel Knights Landing (KNL) manycore chip is a specific example of this and has seen early adoption by a number of HPC facilities. It is therefore important to understand the performance and energy usage characteristics of KNL. In this paper, we evaluate the performance and energy efficiency of KNL in contrast to the Xeon (Haswell) architecture for applications representative of the workload of users at NERSC. We consider the optimal MPI/OpenMP configuration of each application and use the results to characterize KNL in contrast to Haswell. In addition to traditional DDR memory, KNL contains MCDRAM, and we also evaluate its efficacy. Our results show that, averaged over our benchmarks, KNL is 1.84× more energy efficient than Haswell and delivers 1.27× greater performance.
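
Taking the two reported averages at face value, and assuming that "1.84× more energy efficient" means 1/1.84 of the energy for the same work while "1.27× greater performance" means 1/1.27 of the runtime, the implied average power draw follows from power = energy / time. The small check below works this out; the interpretation of the ratios is an assumption, not the paper's analysis.

```c
/* Back-of-envelope check using the two averages quoted above. */
#include <stdio.h>

int main(void)
{
    double energy_ratio = 1.0 / 1.84;   /* KNL energy  / Haswell energy  */
    double time_ratio   = 1.0 / 1.27;   /* KNL runtime / Haswell runtime */

    /* power = energy / time, so the implied average power ratio is: */
    double power_ratio = energy_ratio / time_ratio;
    printf("implied KNL power is about %.0f%% of Haswell power on this mix\n",
           100.0 * power_ratio);
    return 0;
}
```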


Symposium on Computer Architecture and High Performance Computing | 2013

Parallel Algorithms for Using Lagrangian Markers in Immersed Boundary Method with Adaptive Mesh Refinement in FLASH

Prateeti Mohapatra; Anshu Dubey; Christopher S. Daley; Marcos Vanella; Elias Balaras

Computational fluid dynamics (CFD) is at the forefront of computational mechanics in requiring the large-scale computational resources associated with high-performance computing (HPC). Many flows of practical interest also include moving and deforming boundaries. High-fidelity computations of fluid-structure interactions (FSI) are amongst the most challenging problems in computational mechanics. Additionally, many FSI applications have different resolution requirements in different parts of the domain and therefore require adaptive mesh refinement (AMR) for computational efficiency. FLASH is a well-established AMR code with an existing Lagrangian framework which could be augmented and exploited to implement an immersed boundary method for simulating fluid-structure interactions atop an existing infrastructure. This paper describes the augmentations to the Lagrangian framework, and the new parallel algorithms added to the FLASH infrastructure, that enabled the implementation of the immersed boundary method in FLASH. The paper also presents the scaling behavior and performance analysis of the implementations.
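
One standard building block of an immersed boundary method with Lagrangian markers is interpolating the Eulerian grid velocity onto the marker positions. The 2-D bilinear sketch below illustrates only that step; it is not FLASH's implementation and it ignores AMR block boundaries and parallel data exchange, which are the focus of the paper.

```c
/* Illustrative sketch: bilinear interpolation of a grid field onto one
 * Lagrangian marker position (2-D, single block, no boundary handling). */
#include <stdio.h>

#define NX 8
#define NY 8
#define DX 1.0
#define DY 1.0

/* Grid point (i, j) is located at (i * DX, j * DY). */
static double interp_to_marker(double u[NX][NY], double x, double y)
{
    int i = (int)(x / DX);
    int j = (int)(y / DY);
    double fx = x / DX - i;                /* fractional offsets inside the cell */
    double fy = y / DY - j;

    return (1 - fx) * (1 - fy) * u[i][j]
         +      fx  * (1 - fy) * u[i + 1][j]
         + (1 - fx) *      fy  * u[i][j + 1]
         +      fx  *      fy  * u[i + 1][j + 1];
}

int main(void)
{
    double u[NX][NY];
    for (int i = 0; i < NX; i++)           /* simple linear field u = x + 2y */
        for (int j = 0; j < NY; j++)
            u[i][j] = i * DX + 2.0 * j * DY;

    double xm = 2.25, ym = 3.5;            /* one Lagrangian marker position */
    printf("u at marker (%.2f, %.2f) = %.3f (exact %.3f)\n",
           xm, ym, interp_to_marker(u, xm, ym), xm + 2.0 * ym);
    return 0;
}
```

The reverse operation, spreading marker forces back to the grid, uses the same weights; in an AMR setting both steps additionally require locating the block that owns each marker, which is where the parallel algorithms described above come in.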


Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems | 2017

Performance analysis of emerging data analytics and HPC workloads

Christopher S. Daley; Prabhat; Sudip S. Dosanjh; Nicholas J. Wright

Supercomputers are increasingly being used to run a data analytics workload in addition to a traditional simulation science workload. This mixed workload must be rigorously characterized to ensure that appropriately balanced machines are deployed. In this paper we analyze a suite of applications representing the simulation science and data workload at the NERSC supercomputing center. We show how time is spent in application compute, library compute, communication and I/O, and present application performance on both the Intel Xeon and Intel Xeon-Phi partitions of the Cori supercomputer. We find commonality in the libraries used, I/O motifs and methods of parallelism, and obtain similar node-to-node performance for the base application configurations. We demonstrate that features of the Intel Xeon-Phi node architecture and a Burst Buffer can improve application performance, providing evidence that an exascale-era energy-efficient platform can support a mixed workload.
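
As a minimal sketch of the kind of breakdown described above, the code below accumulates wall-clock time separately around compute, communication, and I/O regions of an MPI program. The regions and output file are placeholders; the paper's measurements cover real applications and libraries.

```c
/* Illustrative phase-timing sketch: accumulate time per category around the
 * relevant regions and report the breakdown from rank 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double t_compute = 0.0, t_comm = 0.0, t_io = 0.0, t0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Compute region: a stand-in kernel. */
    t0 = MPI_Wtime();
    double x = (double)rank + 1.0;
    for (long i = 0; i < 10000000L; i++)
        x = x * 0.5 + 1.0;
    t_compute += MPI_Wtime() - t0;

    /* Communication region. */
    t0 = MPI_Wtime();
    double sum = 0.0;
    MPI_Allreduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    t_comm += MPI_Wtime() - t0;

    /* I/O region. */
    t0 = MPI_Wtime();
    if (rank == 0) {
        FILE *f = fopen("result.txt", "w");
        if (f) { fprintf(f, "%f\n", sum); fclose(f); }
    }
    t_io += MPI_Wtime() - t0;

    if (rank == 0)
        printf("compute %.3f s   communication %.3f s   I/O %.3f s\n",
               t_compute, t_comm, t_io);
    MPI_Finalize();
    return 0;
}
```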

Collaboration


Dive into Christopher S. Daley's collaborations.

Top Co-Authors

Anshu Dubey, Lawrence Berkeley National Laboratory
Nicholas J. Wright, Lawrence Berkeley National Laboratory
Brian Austin, Lawrence Berkeley National Laboratory
Marcos Vanella, George Washington University
Lynn B. Reid, University of Western Australia