Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jeffery A. Kuehn is active.

Publication


Featured research published by Jeffery A. Kuehn.


IEEE International Symposium on Workload Characterization | 2006

Characterization of Scientific Workloads on Systems with Multi-Core Processors

Sadaf R. Alam; Richard Frederick Barrett; Jeffery A. Kuehn; Philip C. Roth; Jeffrey S. Vetter

Multi-core processors are planned for virtually all next-generation HPC systems. In a preliminary evaluation of AMD Opteron Dual-Core processor systems, we investigated the scaling behavior of a set of micro-benchmarks, kernels, and applications. In addition, we evaluated a number of processor affinity techniques for managing memory placement on these multi-core systems. We discovered that an appropriate selection of MPI task and memory placement schemes can result in over 25% performance improvement for key scientific calculations. We collected detailed performance data for several large-scale scientific applications. Analyses of the application performance results confirmed our micro-benchmark and scaling results.
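
The abstract does not detail the affinity techniques evaluated. As an illustrative sketch only, not the paper's method, the snippet below shows the basic Linux mechanism behind this kind of placement control: pinning a process to a core with sched_setaffinity so that first-touch allocation lands pages on the local memory controller.

```c
/* Illustrative sketch (not the paper's exact method): pin the calling
 * process to a single core so that first-touch allocation places its
 * pages on the memory local to that core on a dual-core Opteron node. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);               /* bind to core 0 (hypothetical choice) */

    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    /* Under Linux's first-touch policy, the pages of this buffer are
     * now allocated from memory attached to core 0's controller. */
    size_t n = 1 << 20;
    double *buf = malloc(n * sizeof(double));
    for (size_t i = 0; i < n; i++)
        buf[i] = (double)i;          /* first touch places the pages */

    printf("process bound to core 0; %zu doubles touched\n", n);
    free(buf);
    return 0;
}
```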


Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model | 2010

Introducing OpenSHMEM: SHMEM for the PGAS community

Barbara M. Chapman; Tony Curtis; Swaroop Pophale; Stephen W. Poole; Jeffery A. Kuehn; Chuck Koelbel; Lauren Smith

The OpenSHMEM community would like to announce a new effort to standardize SHMEM, a communications library that uses one-sided communication and utilizes a partitioned global address space. OpenSHMEM is an effort to bring together a variety of SHMEM and SHMEM-like implementations into an open standard using a community-driven model. By creating an open-source specification and reference implementation of OpenSHMEM, there will be a wider availability of a PGAS library model on current and future architectures. In addition, the availability of an OpenSHMEM model will enable the development of performance and validation tools. We propose an OpenSHMEM specification to help tie together a number of divergent implementations of SHMEM that are currently available. To support an existing and growing user community, we will develop the OpenSHMEM web presence, including a community wiki and training material, and face-to-face interaction, including workshops and conference participation.
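
For readers unfamiliar with the SHMEM model, the sketch below shows the one-sided style the abstract describes, written against the standardized OpenSHMEM C API; note that shmem_init and shmem_malloc are the modern standardized names, while early SHMEM implementations used start_pes and shmalloc.

```c
/* Minimal OpenSHMEM sketch: PE 0 writes into PE 1's partition of the
 * global address space with a one-sided put; the target PE issues no
 * matching receive call. */
#include <shmem.h>
#include <stdio.h>

int main(void)
{
    shmem_init();
    int me = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric allocation: the same remotely accessible object exists
     * on every PE. */
    long *dest = shmem_malloc(sizeof(long));
    *dest = -1;
    shmem_barrier_all();

    if (me == 0 && npes > 1) {
        long value = 42;
        shmem_long_put(dest, &value, 1, 1);   /* one-sided write to PE 1 */
    }
    shmem_barrier_all();

    if (me == 1)
        printf("PE 1 received %ld via one-sided put\n", *dest);

    shmem_free(dest);
    shmem_finalize();
    return 0;
}
```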


IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

Early evaluation of IBM BlueGene/P

Sadaf R. Alam; Richard Frederick Barrett; Michael H Bast; Mark R. Fahey; Jeffery A. Kuehn; Collin McCurdy; James H. Rogers; Philip C. Roth; Ramanan Sankaran; Jeffrey S. Vetter; Patrick H. Worley; Weikuan Yu

BlueGene/P (BG/P) is the second generation BlueGene architecture from IBM, succeeding BlueGene/L (BG/L). BG/P is a system-on-a-chip (SoC) design that uses four PowerPC 450 cores operating at 850 MHz with a double precision, dual pipe floating point unit per core. These chips are connected with multiple interconnection networks including a 3-D torus, a global collective network, and a global barrier network. The design is intended to provide a highly scalable, physically dense system with relatively low power requirements per flop. In this paper, we report on our examination of BG/P, presented in the context of a set of important scientific applications, and as compared to other major large scale supercomputers in use today. Our investigation confirms that BG/P has good scalability with an expected lower performance per processor when compared to the Cray XT4's Opteron. We also find that BG/P uses very low power per floating point operation for certain kernels, yet it has less of a power advantage when considering science-driven metrics for mission applications.
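
A quick back-of-the-envelope check of the figures above: with each dual-pipe FPU retiring two double-precision fused multiply-adds (four flops) per core per cycle, the peak per node works out as

```latex
4~\text{cores} \times 850~\text{MHz} \times 4~\tfrac{\text{flops}}{\text{cycle}} = 13.6~\text{GFLOP/s per node}
```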


Conference on High Performance Computing (Supercomputing) | 2007

Cray XT4: an early evaluation for petascale scientific simulation

Sadaf R. Alam; Jeffery A. Kuehn; Richard Frederick Barrett; Jeffrey M. Larkin; Mark R. Fahey; Ramanan Sankaran; Patrick H. Worley

The scientific simulation capabilities of next generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.
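
The micro-benchmark methodology described above can be illustrated with a memory-bandwidth kernel in the spirit of the STREAM triad. The sketch below is generic and is not one of the benchmarks used in the paper.

```c
/* Generic STREAM-triad-style bandwidth micro-benchmark (illustrative,
 * not one of the paper's actual codes). Measures sustained memory
 * bandwidth for a[i] = b[i] + s*c[i]. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* ~16M doubles per array, far larger than cache */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double s = 3.0;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double bytes = 3.0 * N * sizeof(double);   /* two reads + one write */
    printf("triad bandwidth: %.2f GB/s\n", bytes / sec / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```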


IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

An Evaluation of the Oak Ridge National Laboratory Cray XT3

Sadaf R. Alam; Richard Frederick Barrett; Mark R. Fahey; Jeffery A. Kuehn; O. E. Bronson Messer; Richard Tran Mills; Philip C. Roth; Jeffrey S. Vetter; Patrick H. Worley

In 2005, Oak Ridge National Laboratory (ORNL) received delivery of a 5294 processor Cray XT3. The XT3 is Cray's third-generation massively parallel processing system. The ORNL system uses a single-processor node built around the AMD Opteron and uses a custom chip, called SeaStar, for interprocessor communication. The system uses a lightweight operating system called Catamount on its compute nodes. This paper provides a performance evaluation of the Cray XT3, including measurements for micro-benchmark, kernel, and application benchmarks. In particular, we provide performance results for strategic Department of Energy application areas including climate, biology, astrophysics, combustion, and fusion. Our results, on up to 4096 processors, demonstrate that the Cray XT3 provides competitive processor performance, high interconnect bandwidth, and high parallel efficiency on a diverse application workload typical of the DOE Office of Science.
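
Interconnect bandwidth of the kind reported here is commonly measured with a point-to-point ping-pong. The sketch below is a generic example of such a micro-benchmark, not the paper's actual code.

```c
/* Generic MPI ping-pong sketch (illustrative, not the paper's code):
 * measures point-to-point bandwidth between ranks 0 and 1, the kind of
 * micro-benchmark used to characterize an interconnect like SeaStar. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int bytes = 1 << 20;           /* 1 MiB message */
    const int reps = 100;
    char *buf = malloc(bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)   /* each rep moves the message both ways */
        printf("bandwidth: %.2f MB/s\n",
               2.0 * bytes * reps / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}
```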


IEEE Conference on Mass Storage Systems and Technologies | 2011

A technique for moving large data sets over high-performance long distance networks

Bradley W. Settlemyer; Jonathan D. Dobson; Stephen W. Hodson; Jeffery A. Kuehn; Stephen W. Poole; Thomas M. Ruwart

In this paper we look at the performance characteristics of three tools used to move large data sets over dedicated long distance networking infrastructure. Although performance studies of wide area networks have been a frequent topic of interest, performance analyses have tended to focus on network latency characteristics and peak throughput using network traffic generators. In this study we instead perform an end-to-end long distance networking analysis that includes reading large data sets from a source file system and committing the data to a remote destination file system. An evaluation of end-to-end data movement is also an evaluation of the system configurations employed and the tools used to move the data. For this paper, we have built several storage platforms and connected them with a high performance long distance network configuration. We use these systems to analyze the capabilities of three data movement tools: BBcp, GridFTP, and XDD. Our studies demonstrate that existing data movement tools do not achieve efficient transfer rates or exercise the storage devices in their highest-performance modes.
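
As a rough illustration of what "end-to-end" means here, the sketch below times the whole path from reading a source file to writing its bytes out a descriptor, rather than timing the network alone. It is generic code, not taken from BBcp, GridFTP, or XDD; the stream_file helper and the /dev/null stand-in for the socket are hypothetical.

```c
/* Illustrative end-to-end measurement sketch: time the full path from
 * reading a source file to pushing the bytes into an output descriptor
 * (in practice, a connected socket to the remote site). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Stream fd_in (source file) to fd_out and return the rate in MB/s. */
double stream_file(int fd_in, int fd_out)
{
    enum { BUFSZ = 8 << 20 };          /* 8 MiB buffer: large sequential I/O */
    char *buf = malloc(BUFSZ);
    long long total = 0;
    ssize_t n;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = read(fd_in, buf, BUFSZ)) > 0) {
        ssize_t off = 0;
        while (off < n) {              /* handle short writes */
            ssize_t w = write(fd_out, buf + off, n - off);
            if (w < 0) { perror("write"); exit(EXIT_FAILURE); }
            off += w;
        }
        total += n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return total / sec / 1e6;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    int in = open(argv[1], O_RDONLY);
    if (in < 0) { perror("open"); return 1; }
    int out = open("/dev/null", O_WRONLY);   /* stand-in for the socket */
    printf("end-to-end rate: %.2f MB/s\n", stream_file(in, out));
    return 0;
}
```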


Combustion Theory and Modelling | 2008

Detonation initiation on the microsecond time scale: DDTs

D. R. Kassoy; Jeffery A. Kuehn; Matthew Nabity; John F. Clarke

Spatially resolved, thermal power deposition of limited duration into a finite volume of reactive gas is the initiator for a deflagration-to-detonation transition (DDT) on the microsecond time scale. The reactive Euler equations with one-step Arrhenius kinetics are used to derive a novel formula for the gas velocity supporting the lead shock in a detonation. Numerical solutions of the reactive Euler equations are used to describe the detailed sequence of reactive gasdynamic transients leading to a planar detonation, characterised by unusually large power output, far from the power deposition location. Results are presented for deposition into a region isolated from the planar boundary of the reactive gas as well as for that adjacent to the boundary. The quantitative dependences of DDT evolution on the location and magnitude of thermal power deposition and activation energy are identified.
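
For reference, one standard form of the model named above, with reaction progress variable Y, heat release q, and a deposition source Q(x, t) representing the thermal power initiator; the paper's exact nondimensionalization may differ:

```latex
\begin{aligned}
&\rho_t + (\rho u)_x = 0, \\
&(\rho u)_t + (\rho u^2 + p)_x = 0, \\
&(\rho E)_t + \big[u(\rho E + p)\big]_x = \rho q\,\dot\omega + Q(x,t), \\
&(\rho Y)_t + (\rho u Y)_x = -\rho\,\dot\omega, \\
&\dot\omega = A\,Y\,e^{-E_a/(RT)}, \qquad
 p = \rho R T, \qquad
 E = \frac{p}{(\gamma-1)\rho} + \tfrac{1}{2}u^2.
\end{aligned}
```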


International Conference on Performance Engineering | 2012

Towards efficient supercomputing: searching for the right efficiency metric

Chung-Hsing Hsu; Jeffery A. Kuehn; Stephen W. Poole

Efficiency in supercomputing has traditionally focused on execution time. In the early 2000s, the concept of total cost of ownership was reintroduced, broadening the efficiency measure to include aspects such as energy and space. Yet the supercomputing community has never agreed upon a metric that covers these aspects completely and also provides a fair basis for comparison. This paper examines the metrics that have been proposed in the past decade, and proposes a vector-valued metric for efficient supercomputing. Using this metric, the paper presents a study of where the supercomputing industry has been and where it stands today with respect to efficient supercomputing.
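
The abstract does not define the metric's components, so the sketch below is purely hypothetical: it bundles three plausible dimensions into a vector and compares systems by component-wise dominance, which, unlike a scalar such as flops per watt, can leave two systems incomparable.

```c
/* Hypothetical sketch of a vector-valued efficiency metric; the
 * components and comparison rule are illustrative assumptions, not
 * the paper's actual definition. */
#include <stdbool.h>
#include <stdio.h>

struct eff_vec {
    double gflops;      /* sustained performance */
    double kilowatts;   /* power draw */
    double sq_meters;   /* floor space */
};

/* Component-wise dominance: a is at least as good as b in every
 * dimension. Two systems may be mutually non-dominating, i.e.
 * incomparable, which a scalar metric cannot express. */
bool dominates(struct eff_vec a, struct eff_vec b)
{
    return a.gflops >= b.gflops &&
           a.kilowatts <= b.kilowatts &&
           a.sq_meters <= b.sq_meters;
}

int main(void)
{
    struct eff_vec x = { 500.0, 120.0, 40.0 };
    struct eff_vec y = { 450.0,  90.0, 35.0 };
    printf("x dominates y: %s\n", dominates(x, y) ? "yes" : "no");
    printf("y dominates x: %s\n", dominates(y, x) ? "yes" : "no");
    return 0;
}
```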


International Conference on Cluster Computing | 2010

Confidence: Analyzing performance with empirical probabilities

Bradley W. Settlemyer; Stephen W. Hodson; Jeffery A. Kuehn; Stephen W. Poole

Variability in the performance of shared system components is a major obstacle in analyzing the effective throughput of leadership class computers. Shared file systems and networks are serious impediments to achieving repeatable application performance on HPC systems. In particular, performance analysts are likely to be interested in quantifying differences between average-case behavior, worst-case behavior, and standard deviation for shared system components. Typical descriptions of these statistics assume a normal distribution; however, in one-sided and multi-modal performance distributions, summary statistics are often misleading. In this paper we describe Confidence, a tool for analyzing the full spectrum of performance for a benchmarking code. By including all of the experimental outcomes in the analysis without discarding any measurements, Confidence enables a novel analysis of benchmark performance.
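
Confidence's implementation is not described here; the minimal sketch below illustrates only the underlying idea of keeping every measurement and reading quantiles off the empirical distribution instead of assuming normality.

```c
/* Minimal sketch of empirical-probability analysis (not Confidence's
 * actual implementation): retain every measurement, sort, and read
 * quantiles off the empirical CDF. For one-sided or multi-modal
 * distributions these are far more informative than a mean and a
 * standard deviation. */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Empirical quantile: value below which fraction q of samples fall. */
static double quantile(const double *sorted, int n, double q)
{
    int i = (int)(q * (n - 1));
    return sorted[i];
}

int main(void)
{
    /* Fabricated bimodal I/O timings (ms), just to exercise the code. */
    double t[] = { 10.2, 10.4, 10.1, 51.8, 10.3, 52.5, 10.2, 10.5 };
    int n = sizeof t / sizeof t[0];

    qsort(t, n, sizeof(double), cmp);
    printf("median: %.1f ms  p90: %.1f ms  max: %.1f ms\n",
           quantile(t, n, 0.5), quantile(t, n, 0.9), t[n - 1]);
    return 0;
}
```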


Archive | 2012

SystemBurn: Principles of Design and Operation, Release 2.0

Jeffery A. Kuehn; Stephen W. Poole; Stephen W. Hodson; Josh Lothian; Jonathan D. Dobson; David B Reister; Nicholas R Lewkow; Steven R Glandon; Jacob T Peek

As high performance computing technology advances toward the extreme scales required to address critical computational problems of both national and global interest, power and cooling for these extreme scale systems are a growing concern. A standardized methodology for testing system requirements under maximal system load and validating system environmental capability to meet those requirements is critical to maintaining system stability and minimizing power and cooling risks for high end data centers. Moreover, accurate testing permits the high end data center to avoid under- or over-provisioning power and cooling capacity, saving resources and mitigating hazards. Previous approaches to such testing have employed an ad hoc collection of tools, which have been anecdotally perceived to produce a heavy system load. In this report, we present SystemBurn, a software tool engineered to allow a system user to methodically create a maximal system load on large scale systems for the purposes of testing and validation.
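
As a toy analogue only, not SystemBurn's design, the sketch below shows the basic shape of such a load generator: one thread per core spinning on dense floating-point work for a fixed duration.

```c
/* Toy analogue of a maximal-load generator (generic sketch, not
 * SystemBurn's design): one thread per online core spins on a
 * multiply-add chain for a fixed duration, driving the floating-point
 * units and power draw toward their peak. Compile with -O2 -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define SECONDS 10

static void *burn(void *arg)
{
    (void)arg;
    volatile double x = 1.000001;        /* volatile defeats elision */
    time_t stop = time(NULL) + SECONDS;
    while (time(NULL) < stop)
        for (int i = 0; i < 1000000; i++)
            x = x * 1.000001 + 0.000001; /* multiply-add chain */
    return NULL;
}

int main(void)
{
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    pthread_t tid[256];
    if (ncores > 256) ncores = 256;

    printf("burning %ld cores for %d seconds...\n", ncores, SECONDS);
    for (long i = 0; i < ncores; i++)
        pthread_create(&tid[i], NULL, burn, NULL);
    for (long i = 0; i < ncores; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```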

Collaboration


Dive into Jeffery A. Kuehn's collaborations.

Top Co-Authors

Stephen W. Poole | Oak Ridge National Laboratory
Sadaf R. Alam | Oak Ridge National Laboratory
Mark R. Fahey | Oak Ridge National Laboratory
Patrick H. Worley | Oak Ridge National Laboratory
Stephen W. Hodson | Oak Ridge National Laboratory
Bradley W. Settlemyer | Oak Ridge National Laboratory
Ramanan Sankaran | Oak Ridge National Laboratory
D. R. Kassoy | University of Colorado Boulder