John May
Lawrence Livermore National Laboratory
Publications
Featured research published by John May.
International Parallel and Distributed Processing Symposium | 2001
John May
Hardware performance counters are CPU registers that count data loads and stores, cache misses, and other events. Counter data can help programmers understand software performance. Although CPUs typically have multiple counters, each can monitor only one type of event at a time, and some counters can monitor only certain events. Therefore, some CPUs cannot concurrently monitor interesting combinations of events. Software multiplexing partly overcomes this limitation by using time sharing to monitor multiple events on one counter. However, counter multiplexing is harder to implement for multithreaded programs than for single-threaded ones because of difficulties in managing the length of the time slices. This paper describes a software library called MPX that overcomes these difficulties. MPX allows applications to gather hardware counter data concurrently for any combination of countable events. MPX data are typically within a few percent of counts recorded without multiplexing.
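The scaling idea at the heart of counter multiplexing can be sketched in a few lines of C. The counter hooks below are placeholder stubs, not the MPX interface: each event is monitored for only a fraction of the time slices, and its raw count is scaled up by the inverse of that fraction to estimate a full-run total.

/* Sketch of time-shared counter multiplexing (the idea behind MPX),
 * not the MPX API itself. The counter hooks below are placeholder
 * stubs standing in for platform-specific hardware-counter access. */
#include <stdio.h>

#define NUM_EVENTS 4
#define SLICES     100                    /* time slices in the run */

static void      start_counting(int e) { (void)e; /* program real counter */ }
static void      stop_counting(int e)  { (void)e; /* halt real counter    */ }
static long long read_counter(int e)   { return 1000 + e; /* placeholder  */ }

int main(void)
{
    long long raw[NUM_EVENTS]    = {0};   /* counts actually observed     */
    int       slices[NUM_EVENTS] = {0};   /* slices each event was active */

    /* Round-robin: monitor one event per time slice. */
    for (int t = 0; t < SLICES; t++) {
        int e = t % NUM_EVENTS;
        start_counting(e);
        /* ... one slice of the application runs here ... */
        stop_counting(e);
        raw[e] += read_counter(e);
        slices[e]++;
    }

    /* Scale each count by the inverse of the fraction of time monitored. */
    for (int e = 0; e < NUM_EVENTS; e++) {
        double estimate = (double)raw[e] * SLICES / slices[e];
        printf("event %d: estimated full-run count %.0f\n", e, estimate);
    }
    return 0;
}

The accuracy of such an estimate depends on how evenly the application's behavior is spread across the slices, which is why managing slice lengths in multithreaded programs is the hard part.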
Conference on High Performance Computing (Supercomputing) | 2005
Karen L. Karavanic; John May; Kathryn Mohror; Brian Miller; Kevin A. Huck; Rashawn L. Knapp; Brian Pugh
PerfTrack is a data store and interface for managing performance data from large-scale parallel applications. Data collected in different locations and formats can be compared and viewed in a single performance analysis session. The underlying data store used in PerfTrack is implemented with a database management system (DBMS). PerfTrack includes interfaces to the data store and scripts for automatically collecting data describing each experiment, such as build and platform details. We have implemented a prototype of PerfTrack that can use Oracle or PostgreSQL for the data store. We demonstrate the prototype's functionality with three case studies: the first is a comparative study of an ASC Purple benchmark on high-end Linux and AIX platforms; the second is a parameter study conducted at Lawrence Livermore National Laboratory (LLNL) on two high-end platforms, a 128-node cluster of IBM POWER4 processors and BlueGene/L; the third demonstrates incorporating performance data from the Paradyn Parallel Performance Tool into an existing PerfTrack data store.
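As a rough illustration of the kind of record such a data store might hold, the following sketch inserts one experiment row into PostgreSQL using libpq; the table name, columns, and values are hypothetical and are not PerfTrack's actual schema.

/* Minimal sketch of storing one experiment record in PostgreSQL via
 * libpq. The table and columns (experiment: platform, build, metric,
 * value) are hypothetical, not PerfTrack's actual schema. */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=perftest");   /* hypothetical database */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    /* Record one measured metric along with build/platform context. */
    PGresult *res = PQexec(conn,
        "INSERT INTO experiment (platform, build, metric, value) "
        "VALUES ('linux-cluster', 'gcc -O2', 'wall_time_s', 123.4)");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}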
Petascale Data Storage Workshop | 2008
John May
Script-based I/O benchmarks record the I/O behavior of applications by using an instrumentation library to trace I/O events and their timing. A replay engine can then reproduce these events from the script in the absence of the original application. This type of benchmark reproduces real-world I/O workloads without the need to distribute, build, or run complex applications. However, faithfully recreating the I/O behavior of the original application requires careful design in both the instrumentation library and the replay engine. This paper presents the Pianola script-based benchmarking system, which includes an accurate and unobtrusive instrumentation system and a simple-to-use replay engine, along with some additional utility programs to manage the creation and replay of scripts. We show that for some sample applications, Pianola reproduces the qualitative features of the I/O behavior. Moreover, the overall replay time and the cumulative read and write times are usually within 10% of the values measured for the original applications.
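A toy replay loop conveys the idea: read timed records from a script and reissue the corresponding I/O calls. The script format used here (delay in microseconds, operation, byte count) is invented for illustration and is not Pianola's actual trace format.

/* Toy replay loop in the spirit of a script-based I/O benchmark:
 * read records of the form "<delay_us> <op> <bytes>" and reissue the
 * I/O with the recorded inter-event timing. The script format is
 * invented for illustration, not Pianola's actual format. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s script.txt datafile\n", argv[0]);
        return EXIT_FAILURE;
    }
    FILE *script = fopen(argv[1], "r");
    FILE *data   = fopen(argv[2], "r+b");
    if (!script || !data) { perror("fopen"); return EXIT_FAILURE; }

    long delay_us, nbytes;
    char op[8];
    char *buf = malloc(1 << 20);            /* 1 MiB scratch buffer */
    if (!buf) { perror("malloc"); return EXIT_FAILURE; }

    while (fscanf(script, "%ld %7s %ld", &delay_us, op, &nbytes) == 3) {
        if (nbytes > (1L << 20)) nbytes = 1L << 20;  /* clamp to buffer size */
        usleep(delay_us);                   /* reproduce the recorded gap */
        if (strcmp(op, "read") == 0)
            fread(buf, 1, (size_t)nbytes, data);
        else if (strcmp(op, "write") == 0)
            fwrite(buf, 1, (size_t)nbytes, data);
    }

    free(buf);
    fclose(script);
    fclose(data);
    return 0;
}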
Concurrency and Computation: Practice and Experience | 2005
Jeffrey S. Vetter; Bronis R. de Supinski; Lynn Kissel; John May; Sheila Vaidya
Comparisons of high-performance computers based on their peak floating-point performance are common but seldom useful when comparing performance on real workloads. Factors that influence sustained performance extend beyond a system's floating-point units, and real applications exercise machines in complex and diverse ways. Even when it is possible to compare systems based on their performance, other considerations affect which machine is best for a given organization. These include the cost, the facilities requirements (power, floor space, etc.), the programming model, the existing code base, and so on. This paper describes some of the important measures for evaluating high-performance computers. We present data for many of these metrics based on our experience at Lawrence Livermore National Laboratory (LLNL), and we compare them with published information on the Earth Simulator. We argue that evaluating systems involves far more than comparing benchmarks and acquisition costs. We show that evaluating systems often involves complex choices among a variety of factors that influence the value of a supercomputer to an organization, and that the high-end computing community should view cost/performance comparisons of different architectures with skepticism. Published in 2005 by John Wiley & Sons, Ltd.
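A back-of-the-envelope calculation shows why peak numbers alone can mislead; every figure below is made up for illustration and does not describe any system mentioned in the paper.

/* Illustrative arithmetic only: sustained performance and cost per
 * sustained teraflop. All numbers are invented for the example. */
#include <stdio.h>

int main(void)
{
    double peak_tflops      = 40.0;   /* vendor peak (hypothetical)            */
    double sustained_frac   = 0.07;   /* fraction achieved on a real workload  */
    double system_cost_musd = 100.0;  /* acquisition cost in $M (hypothetical) */

    double sustained_tflops = peak_tflops * sustained_frac;
    printf("sustained: %.1f TF, cost per sustained TF: $%.1fM\n",
           sustained_tflops, system_cost_musd / sustained_tflops);
    return 0;
}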
Report | 2000
John May; B. de Supinski; B. Pudliner; S. Taylor; S. Baden
Most large parallel computers now built use a hybrid architecture called a shared memory cluster. In this design, a computer consists of several nodes connected by an interconnection network. Each node contains a pool of memory and multiple processors that share direct access to it. Because shared memory clusters combine architectural features of shared memory computers and distributed memory computers, they support several different styles of parallel programming or programming models. (Further information on the design of these systems and their programming models appears in Section 2.) The purpose of this project was to investigate the programming models available on these systems and to answer three questions: (1) How easy to use are the different programming models in real applications? (2) How do the hardware and system software on different computers affect the performance of these programming models? (3) What are the performance characteristics of different programming models for typical LLNL applications on various shared memory clusters?
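As a generic illustration of one such programming model (message passing between nodes combined with shared-memory threading within a node), a minimal hybrid MPI+OpenMP program might look like the following; this is not code from the report.

/* Minimal hybrid MPI + OpenMP sketch: MPI ranks communicate between
 * nodes while OpenMP threads share memory within a node. This is a
 * generic illustration of the programming model, not code from the
 * report. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0;

    /* Shared-memory parallelism inside one node. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += 1.0 / (i + 1 + rank);

    /* Message passing between nodes. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum from %d ranks: %f\n", nranks, global_sum);

    MPI_Finalize();
    return 0;
}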
Archive | 2000
John May
IEEE Conference on Mass Storage Systems and Technologies | 1996
Terry Jones; R. Mark; J. Martin; John May; E. Pierce; Linda Stanberry
Petascale Data Storage Workshop | 2009
Esteban Molina-Estolano; Maya Gokhale; Carlos Maltzahn; John May; John M. Bent; Scott A. Brandt
International Conference on Computational Science | 2005
Martin Schulz; John May; John C. Gyllenhaal
Parallel and Distributed Processing Techniques and Applications | 2003
John May; John C. Gyllenhaal