Publication


Featured research published by Douglas W. Doerfler.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor

Douglas W. Doerfler; Jack Deslippe; Samuel Williams; Leonid Oliker; Brandon Cook; Thorsten Kurth; Mathieu Lobet; Tareq M. Malas; Jean-Luc Vay; Henri Vincenti

The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we construct the Roofline for the Intel Knights Landing (KNL) processor, determining the sustained peak memory bandwidth and floating-point performance for all levels of the memory hierarchy, in all the different KNL cluster modes. We then determine arithmetic intensity and performance for a suite of application kernels being targeted for the KNL-based supercomputer Cori, and make comparisons to current Intel Xeon processors. Cori is the National Energy Research Scientific Computing Center's (NERSC) next-generation supercomputer. Scheduled for deployment mid-2016, it will be one of the earliest and largest KNL deployments in the world.
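
For readers unfamiliar with the model, the Roofline bound can be written as a single equation; the symbols below and the KNL numbers in the illustration are approximate, ballpark figures for orientation, not the measurements reported in the paper:

\( P_{\mathrm{attainable}} = \min\left( P_{\mathrm{peak}},\ \mathrm{AI} \times B_{\mathrm{mem}} \right) \)

where \(P_{\mathrm{peak}}\) is the peak floating-point rate, AI is the kernel's arithmetic intensity (FLOPs per byte moved), and \(B_{\mathrm{mem}}\) is the sustained bandwidth of the relevant memory level. As a rough illustration, with approximately 3 TFLOP/s double-precision peak and approximately 450 GB/s of sustained MCDRAM bandwidth, a kernel with AI = 1 FLOP/byte would be bandwidth-bound near 450 GFLOP/s, while kernels with AI above roughly 6.7 FLOP/byte could in principle reach the compute ceiling.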


IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Evaluating and optimizing the NERSC workload on Knights Landing

Taylor Barnes; Brandon Cook; Jack Deslippe; Douglas W. Doerfler; Brian Friesen; Yun He; Thorsten Kurth; Tuomas Koskela; Mathieu Lobet; Tareq M. Malas; Leonid Oliker; Andrey Ovsyannikov; Abhinav Sarje; Jean-Luc Vay; Henri Vincenti; Samuel Williams; Pierre Carrier; Nathan Wichmann; Marcus Wagner; Paul R. C. Kent; Christopher Kerr; John M. Dennis

NERSC has partnered with 20 representative application teams to evaluate performance on the Xeon-Phi Knights Landing architecture and develop an application-optimization strategy for the greater NERSC workload on the recently installed Cori system. In this article, we present early case studies and summarized results from a subset of the 20 applications highlighting the impact of important architecture differences between the Xeon-Phi and traditional Xeon processors. We summarize the status of the applications and describe the greater optimization strategy that has formed.
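
As a concrete illustration of the kind of node-level changes such an optimization effort typically involves (threading across the many KNL cores, vectorization, and explicit MCDRAM allocation), here is a minimal C sketch. It assumes the memkind library's hbwmalloc interface is available and is purely illustrative; it is not taken from any of the NERSC application codes discussed above.

/* Illustrative KNL node-level optimization sketch (assumed memkind toolchain).
   Compile, for example, with: cc -fopenmp -O2 knl_sketch.c -lmemkind */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>   /* memkind high-bandwidth-memory allocator */

#define N 10000000

int main(void)
{
    /* Place the large, bandwidth-critical arrays in MCDRAM when it is
       configured in flat mode; hbw_malloc falls back per memkind policy. */
    double *a = hbw_malloc(N * sizeof(double));
    double *b = hbw_malloc(N * sizeof(double));
    if (!a || !b) return 1;

    /* Thread across the cores and let the compiler vectorize the loop body
       with the wide SIMD units. */
    #pragma omp parallel for simd
    for (long i = 0; i < N; i++) {
        a[i] = (double)i;
        b[i] = 2.5 * a[i] + 1.0;
    }

    printf("b[N-1] = %f\n", b[N - 1]);
    hbw_free(a);
    hbw_free(b);
    return 0;
}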


arXiv: High Energy Physics - Lattice | 2016

MILC staggered conjugate gradient performance on Intel KNL

Ruizi Li; Carleton DeTar; Douglas W. Doerfler; Steven Gottlieb; Ashish Jha; Dhiraj D. Kalamkar; D. Toussaint

We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second-generation Intel Xeon Phi processor. It offers massive thread parallelism, data parallelism, and high on-board memory bandwidth, and is being adopted in supercomputing centers for scientific research. The CG solver consumes the majority of time in production running, so we have spent most of our effort on it. We compare the performance of an MPI+OpenMP baseline version of the MILC code with a version incorporating the QPhiX staggered CG solver, for both single-node and multi-node runs.
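
For reference, the basic structure of the conjugate gradient iteration that dominates this workload is sketched below in C. This is a generic, unpreconditioned textbook CG for a symmetric positive-definite operator, not the MILC/QPhiX staggered-fermion implementation; the OpenMP pragma only hints at the MPI+OpenMP structure discussed in the paper.

#include <math.h>
#include <stddef.h>

/* Caller supplies the operator y = A*x (in the MILC solver this role is
   played by the staggered fermion operator). */
typedef void (*matvec_fn)(const double *x, double *y, size_t n);

static double dot(const double *u, const double *v, size_t n)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (size_t i = 0; i < n; i++) s += u[i] * v[i];
    return s;
}

/* Solve A x = b; x holds the initial guess on entry and the solution on exit.
   r, p, Ap are caller-provided work vectors of length n.
   Returns the iteration count on convergence, -1 otherwise. */
int cg_solve(matvec_fn A, const double *b, double *x,
             double *r, double *p, double *Ap,
             size_t n, double tol, int max_iter)
{
    A(x, Ap, n);
    for (size_t i = 0; i < n; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
    double rs_old = dot(r, r, n);

    for (int it = 0; it < max_iter; it++) {
        A(p, Ap, n);
        double alpha = rs_old / dot(p, Ap, n);
        for (size_t i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rs_new = dot(r, r, n);
        if (sqrt(rs_new) < tol) return it + 1;          /* converged */
        double beta = rs_new / rs_old;
        for (size_t i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }
    return -1;
}

/* Example usage with a trivial diagonal operator A = 2*I. */
static void diag2(const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++) y[i] = 2.0 * x[i];
}

int main(void)
{
    enum { N = 4 };
    double b[N] = {2, 4, 6, 8}, x[N] = {0}, r[N], p[N], Ap[N];
    int iters = cg_solve(diag2, b, x, r, p, Ap, N, 1e-12, 100);
    /* x is now {1, 2, 3, 4}; iters reports the iteration count. */
    return iters > 0 ? 0 : 1;
}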


Archive | 2007

Supercomputer and Cluster Performance Modeling and Analysis Efforts: 2004-2006

Judith E. Sturtevant; Anand Ganti; Harold Edward Meyer; Joel O. Stevenson; Robert E. Benner; Susan Phelps Goudy; Douglas W. Doerfler; Stefan P. Domino; Mark A. Taylor; Robert Joseph Malins; Ryan T. Scott; Daniel Wayne Barnette; Mahesh Rajan; James Alfred Ang; Amalia Rebecca Black; Thomas William Laub; Brian Claude Franke

This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.


International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2017

Performance and Energy Usage of Workloads on KNL and Haswell Architectures

Tyler Allen; Christopher S. Daley; Douglas W. Doerfler; Brian Austin; Nicholas J. Wright

Manycore architectures are an energy-efficient step towards exascale computing within a constrained power budget. The Intel Knights Landing (KNL) manycore chip is a specific example of this and has seen early adoption by a number of HPC facilities. It is therefore important to understand the performance and energy usage characteristics of KNL. In this paper, we evaluate the performance and energy efficiency of KNL in contrast to the Xeon (Haswell) architecture for applications representative of the workload of users at NERSC. We consider the optimal MPI/OpenMP configuration of each application and use the results to characterize KNL in contrast to Haswell. As well as traditional DDR memory, KNL contains MCDRAM and we also evaluate its efficacy. Our results show that, averaged over our benchmarks, KNL is 1.84× more energy efficient than Haswell and has 1.27× greater performance.
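
A quick consequence of those averaged figures, assuming energy efficiency is defined in the usual way as performance per watt (equivalently, work per joule): a 1.27× performance advantage combined with a 1.84× energy-efficiency advantage implies an average power ratio of roughly 1.27 / 1.84 ≈ 0.69, i.e., the KNL nodes drew about 69% of the Haswell nodes' power on this benchmark suite. This is an inference from the stated averages, not a number reported in the abstract.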


Concurrency and Computation: Practice and Experience | 2018

Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

Douglas W. Doerfler; Brian Austin; Brandon Cook; Jack Deslippe; Krishna Kandalla; Peter Mendygral

There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of that of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.
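
To make the MPI-versus-OpenMP trade-off concrete, here is a minimal hybrid C sketch: each MPI rank owns a slice of the data, OpenMP threads occupy the remaining cores for computation, and a collective stands in for the network traffic whose cost the paper studies. The rank and thread counts are launch-time choices (ranks per node times threads per rank filling the available cores) and the problem size is a placeholder; none of this reflects the configurations evaluated in the paper.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_LOCAL 1000000   /* elements owned by each MPI rank (placeholder size) */

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    /* FUNNELED: only the master thread makes MPI calls, a common hybrid choice. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    static double a[N_LOCAL];
    double local_sum = 0.0, global_sum = 0.0;

    /* Compute phase: OpenMP threads use the cores not assigned to MPI ranks. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < N_LOCAL; i++) {
        a[i] = (rank + 1) * 1.0e-6 * i;
        local_sum += a[i];
    }

    /* Communication phase: a collective over the high-speed network. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d global_sum=%e\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}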


IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

Analyzing Performance of Selected NESAP Applications on the Cori HPC System

Thorsten Kurth; William Arndt; Taylor A. Barnes; Brandon Cook; Jack Deslippe; Douglas W. Doerfler; Brian Friesen; Yun He; Tuomas Koskela; Mathieu Lobet; Tareq M. Malas; Leonid Oliker; Andrey Ovsyannikov; Samuel Williams; Woo-Sun Yang; Zhengji Zhao

NERSC has partnered with over 20 representative application developer teams to evaluate and optimize their workloads on the Intel® Xeon Phi™ Knights Landing processor. In this paper, we present a summary of this two-year effort and the lessons learned in the process. We analyze the overall performance improvements of these codes, quantifying the impact of both Xeon Phi™ architectural features and code optimization on application performance. We show that the architectural advantage, i.e., the average speedup of optimized code on KNL vs. optimized code on Haswell, is about 1.1×. The average speedup obtained through application optimization, i.e., comparing optimized vs. original codes on KNL, is about 5×.
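
To unpack those two averages with a purely hypothetical number: if an optimized code ran in 100 s on Haswell, the stated 1.1× architectural advantage would put the optimized KNL run at roughly 91 s, and the stated 5× optimization speedup would put the original, unoptimized code at roughly 455 s on KNL. The 100 s baseline is illustrative only; the paper reports averaged ratios, not these absolute times.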


Archive | 2012

Summary of Work for ASC L2 Milestone 4465: Characterize the Role of the Mini-Application in Predicting Key Performance Characteristics of Real Applications

Richard Frederick Barrett; Paul Stewart Crozier; Douglas W. Doerfler; Simon D. Hammond; Michael A. Heroux; Paul Lin; Heidi K. Thornquist; Timothy Guy Trucano


Archive | 2011

Application-Driven Acceptance of Cielo, an XE6 Petascale Capability Platform.

Douglas W. Doerfler; Mahesh Rajan; Cindy Nuss; Cornell Wright; Tom Spelce


Archive | 2010

The Alliance for Computing at the Extreme Scale.

James A. Ang; Douglas W. Doerfler; Sudip S. Dosanjh; Karl Scott Hemmert; Ken Koch; John Morrison; Manuel Vigil

Collaboration


Dive into Douglas W. Doerfler's collaboration.

Top Co-Authors

Mahesh Rajan, Sandia National Laboratories
Simon D. Hammond, Sandia National Laboratories
Michael A. Heroux, Sandia National Laboratories
Paul Lin, Sandia National Laboratories
Brandon Cook, Lawrence Berkeley National Laboratory
Jack Deslippe, Lawrence Berkeley National Laboratory
James A. Ang, Sandia National Laboratories
James H. Laros, Sandia National Laboratories
John Morrison, Los Alamos National Laboratory