Publication


Featured research published by Mike Fagan.


Concurrency and Computation: Practice and Experience | 2009

HPCTOOLKIT: tools for performance analysis of optimized parallel programs

Laksono Adhianto; S. Banerjee; Mike Fagan; Mark W. Krentel; Gabriel Marin; John M. Mellor-Crummey; Nathan R. Tallent

HPCTOOLKIT is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCTOOLKIT can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Recently, new capabilities were added to HPCTOOLKIT for collecting call path profiles for fully optimized codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs, exploring performance information and source code using a new user interface, and displaying hierarchical space–time diagrams based on traces of asynchronous call path samples. This paper provides an overview of HPCTOOLKIT and illustrates its utility for performance analysis of parallel applications.
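
The call path profiling described above rests on asynchronous sampling: a recurring profiling timer fires, and a handler records the calling context of the interrupted code. The sketch below illustrates that general mechanism in C using SIGPROF and glibc's backtrace; it is illustrative only, since HPCTOOLKIT performs its own binary analysis-based unwinding rather than calling backtrace (which is not async-signal-safe) from a handler.

```c
/* Minimal sketch of timer-driven call path sampling, the general idea
 * behind HPCTOOLKIT's measurement. Illustrative only: glibc backtrace()
 * is not async-signal-safe; a production profiler unwinds differently. */
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

#define MAX_FRAMES 64

static void sample_handler(int sig) {
    (void)sig;
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);   /* capture the call path */
    /* A real profiler would insert this path into a calling context tree
     * and increment its sample count; here we only report its depth. */
    fprintf(stderr, "sample: %d frames deep\n", n);
}

int main(void) {
    signal(SIGPROF, sample_handler);
    struct itimerval t = { {0, 10000}, {0, 10000} };   /* ~100 samples/s */
    setitimer(ITIMER_PROF, &t, NULL);

    volatile double x = 0.0;
    for (long i = 0; i < 300000000L; i++)    /* some work to sample */
        x += i * 1e-9;
    return 0;
}
```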


ACM Transactions on Mathematical Software | 2008

OpenAD/F: A Modular Open-Source Tool for Automatic Differentiation of Fortran Codes

Jean Utke; Uwe Naumann; Mike Fagan; Nathan R. Tallent; Michelle Mills Strout; Patrick Heimbach; Chris Hill; Carl Wunsch

The OpenAD/F tool allows the evaluation of derivatives of functions defined by a Fortran program. The derivative evaluation is performed by a Fortran code resulting from the analysis and transformation of the original program that defines the function of interest. OpenAD/F has been designed with a particular emphasis on modularity, flexibility, and the use of open-source components. While the code transformation follows the basic principles of automatic differentiation, the tool implements new algorithmic approaches at various levels, for example, for basic block preaccumulation and call graph reversal. Unlike most other automatic differentiation tools, OpenAD/F uses components provided by the OpenAD framework, which supports a comparatively easy extension of the code transformations in a language-independent fashion. It uses code analysis results implemented in the OpenAnalysis component. The interface to the language-independent transformation engine is an XML-based format, specified through an XML schema. The implemented transformation algorithms allow efficient derivative computations using locally optimized cross-country sequences of vertex, edge, and face elimination steps. Specifically, for the generation of adjoint codes, OpenAD/F supports various code reversal schemes with hierarchical checkpointing at the subroutine level. As an example from geophysical fluid dynamics, a nonlinear, time-dependent, scalable, yet simple, barotropic ocean model is considered. OpenAD/F's reverse mode is applied to compute sensitivities of some of the model's transport properties with respect to gridded fields such as bottom topography as independent (control) variables.
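
To make the adjoint (reverse) mode concrete, the sketch below is a hand-written C analogue of what a source transformation tool conceptually generates for a tiny function: a forward sweep that computes intermediates, followed by a reverse sweep that propagates adjoints from the output back to the inputs. The function and names are hypothetical; real generated code must also handle control flow reversal and checkpointing, as the abstract notes.

```c
/* Hand-written analogue of reverse-mode AD for f(x1, x2) = x1*x2 + sin(x1). */
#include <math.h>
#include <stdio.h>

void f_adjoint(double x1, double x2, double *x1_b, double *x2_b) {
    /* forward sweep: compute (and in general record) intermediates */
    double t1 = x1 * x2;
    double t2 = sin(x1);
    double y  = t1 + t2;
    (void)y;

    /* reverse sweep: seed the output adjoint and propagate backwards */
    double y_b  = 1.0;
    double t1_b = y_b;              /* from y = t1 + t2 */
    double t2_b = y_b;
    *x1_b  = t1_b * x2;             /* from t1 = x1 * x2 */
    *x2_b  = t1_b * x1;
    *x1_b += t2_b * cos(x1);        /* from t2 = sin(x1) */
}

int main(void) {
    double x1_b, x2_b;
    f_adjoint(1.0, 2.0, &x1_b, &x2_b);
    printf("df/dx1 = %f (x2 + cos(x1) = 2.540302)\n", x1_b);
    printf("df/dx2 = %f (x1 = 1.000000)\n", x2_b);
    return 0;
}
```

The property that makes adjoints attractive for the applications above is that one reverse sweep yields derivatives with respect to all inputs at a cost independent of their number.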


7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization | 1998

Preliminary Results from the Application of Automated Adjoint Code Generation to CFL3D

Alan Carle; Mike Fagan; Lawrence L. Green

This report describes preliminary results obtained using an automated adjoint code generator for Fortran to augment a widely-used computational fluid dynamics flow solver to compute derivatives. These preliminary results with this augmented code suggest that, even in its infancy, the automated adjoint code generator can accurately and efficiently deliver derivatives for use in transonic Euler-based aerodynamic shape optimization problems with hundreds to thousands of independent design variables.


Programming Language Design and Implementation | 2009

Binary analysis for measurement and attribution of program performance

Nathan R. Tallent; John M. Mellor-Crummey; Mike Fagan

Modern programs frequently employ sophisticated modular designs. As a result, performance problems cannot be identified from costs attributed to routines in isolation; understanding code performance requires information about a routine's calling context. Existing performance tools fall short in this respect. Prior strategies for attributing context-sensitive performance at the source level either compromise measurement accuracy, remain too close to the binary, or require custom compilers. To understand the performance of fully optimized modular code, we developed two novel binary analysis techniques: 1) on-the-fly analysis of optimized machine code to enable minimally intrusive and accurate attribution of costs to dynamic calling contexts; and 2) post-mortem analysis of optimized machine code and its debugging sections to recover its program structure and reconstruct a mapping back to its source code. By combining the recovered static program structure with dynamic calling context information, we can accurately attribute performance metrics to calling contexts, procedures, loops, and inlined instances of procedures. We demonstrate that the fusion of this information provides unique insight into the performance of complex modular codes. This work is implemented in the HPCToolkit performance tools (http://hpctoolkit.org).
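
The first technique, attributing costs to dynamic calling contexts, requires unwinding the call stack of optimized code at arbitrary sample points. As a rough illustration of the unwinding problem, the C sketch below walks the current stack with libunwind (link with -lunwind); HPCToolkit cannot rely on such libraries or on frame pointers in fully optimized binaries, which is precisely why it synthesizes unwind recipes via on-the-fly binary analysis.

```c
/* Walk the current call stack with libunwind. Illustrates stack unwinding;
 * it can fail on fully optimized code, motivating the paper's approach. */
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

void print_calling_context(void) {
    unw_context_t ctx;
    unw_cursor_t cursor;
    unw_getcontext(&ctx);
    unw_init_local(&cursor, &ctx);

    while (unw_step(&cursor) > 0) {            /* one stack frame per step */
        unw_word_t ip, off;
        char name[128] = "?";
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        unw_get_proc_name(&cursor, name, sizeof name, &off);
        printf("  %s + 0x%lx\n", name, (unsigned long)off);
    }
}

void leaf(void)  { print_calling_context(); }
void inner(void) { leaf(); }

int main(void) {
    inner();   /* prints the chain: leaf <- inner <- main <- ... */
    return 0;
}
```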


Journal of Physics: Conference Series | 2008

HPCToolkit: performance tools for scientific computing

Nathan R. Tallent; John M. Mellor-Crummey; Laksono Adhianto; Mike Fagan; Mark W. Krentel

As part of the U.S. Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, science teams are tackling problems that require simulation and modeling on petascale computers. As part of activities associated with the SciDAC Center for Scalable Application Development Software (CScADS) and the Performance Engineering Research Institute (PERI), Rice University is building software tools for performance analysis of scientific applications on the leadership-class platforms. In this poster abstract, we briefly describe the HPCToolkit performance tools and how they can be used to pinpoint bottlenecks in SPMD and multi-threaded parallel codes. We demonstrate HPCToolkit's utility by applying it to two SciDAC applications: the S3D code for simulation of turbulent combustion and the MFDn code for ab initio calculations of microscopic structure of nuclei.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

Diagnosing performance bottlenecks in emerging petascale applications

Nathan R. Tallent; John M. Mellor-Crummey; Laksono Adhianto; Mike Fagan; Mark W. Krentel

Cutting-edge science and engineering applications require petascale computing. It is, however, a significant challenge to use petascale computing platforms effectively. Consequently, there is a critical need for performance tools that enable scientists to understand impediments to performance on emerging petascale systems. In this paper, we describe HPCToolkit, a suite of multi-platform tools that supports sampling-based analysis of application performance on emerging petascale platforms. HPCToolkit uses sampling to pinpoint and quantify both scaling and node performance bottlenecks. We study several emerging petascale applications on the Cray XT and IBM BlueGene/P platforms and use HPCToolkit to identify specific source lines, in their full calling context, associated with performance bottlenecks in these codes. Such information is exactly what application developers need to know to improve their applications to take full advantage of the power of petascale systems.
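
The scaling analysis described here can be viewed as differential profiling: collect call path profiles of the same code at two scales and, for each calling context, compare measured cost against ideal scaling. The fragment below sketches one simple way to score weak-scaling loss per context; the node type and field names are hypothetical, not HPCToolkit's internal representation.

```c
/* Hypothetical sketch of per-context weak-scaling loss. Under ideal weak
 * scaling, a context's inclusive cost is the same at both scales; any
 * excess at the larger scale is scaling loss attributable to that context. */
typedef struct ctx_node {
    const char *name;            /* procedure, loop, or line in full context */
    double cost_small;           /* inclusive cost at the small scale */
    double cost_large;           /* inclusive cost at the large scale */
    struct ctx_node *child, *sibling;
} ctx_node_t;

/* Fraction of the large run's total cost wasted in this context. */
double scaling_loss(const ctx_node_t *n, double total_cost_large) {
    double excess = n->cost_large - n->cost_small;
    return excess > 0.0 ? excess / total_cost_large : 0.0;
}
```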


Source Code Analysis and Manipulation | 2004

Control flow reversal for adjoint code generation

Uwe Naumann; Jean Utke; Andrew Lyons; Mike Fagan

We describe an approach to the reversal of the control flow of structured programs. It is used to automatically generate adjoint code for numerical programs by semantic source transformation. After a short introduction to applications and the implementation tool set, we describe the building blocks using a simple example. We then illustrate the code reversal within basic blocks. The main part of the paper covers the reversal of structured control flow graphs. We show the algorithmic steps for simple branches and loops and give a detailed algorithm for the reversal of arbitrary combinations of loops and branches in a general control flow graph.
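
As a concrete taste of the technique, the sketch below shows the standard recipe for reversing a simple loop: the forward sweep records the trip count it actually executed, and the reverse sweep pops that count to run the adjoint body the same number of times. This is a hand-written C analogue under assumed names, not output of the tool set described in the paper.

```c
/* Hand-written analogue of control flow reversal for a single loop.
 * Forward sweep: run the loop and push its trip count. Reverse sweep:
 * pop the count and execute the adjoint body that many times, backwards. */
#include <stdio.h>

#define STACK_MAX 1024
static double tape[STACK_MAX];
static int sp = 0;
static void push(double v) { tape[sp++] = v; }
static double pop(void)    { return tape[--sp]; }

/* forward: y = x; while (y < limit) y = 2.0 * y; */
double forward(double x, double limit) {
    double y = x;
    int trips = 0;
    while (y < limit) { y = 2.0 * y; trips++; }
    push((double)trips);       /* record the control flow taken */
    return y;
}

/* reverse: propagate the output adjoint y_b back to the input adjoint x_b */
double reverse(double y_b) {
    int trips = (int)pop();    /* replay the recorded control flow backwards */
    for (int i = 0; i < trips; i++)
        y_b = 2.0 * y_b;       /* adjoint of the statement y = 2.0 * y */
    return y_b;                /* x_b; equals dy/dx when seeded with 1.0 */
}

int main(void) {
    double y = forward(1.0, 10.0);   /* 1 -> 2 -> 4 -> 8 -> 16, 4 trips */
    printf("y = %g, dy/dx = %g\n", y, reverse(1.0));   /* dy/dx = 2^4 */
    return 0;
}
```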


International Conference on Supercomputing | 2013

A new approach for performance analysis of OpenMP programs

Xu Liu; John M. Mellor-Crummey; Mike Fagan

The number of hardware threads is growing with each new generation of multicore chips; thus, one must effectively use threads to fully exploit emerging processors. OpenMP is a popular directive-based programming model that helps programmers exploit thread-level parallelism. In this paper, we describe the design and implementation of a novel performance tool for OpenMP. Our tool distinguishes itself from existing OpenMP performance tools in two principal ways. First, we develop a measurement methodology that attributes blame for work and inefficiency back to program contexts. We show how to integrate prior work on measurement methodologies that employ directed and undirected blame shifting and extend the approach to support dynamic thread-level parallelism in both time-shared and dedicated environments. Second, we develop a novel deferred context resolution method that supports online attribution of performance metrics to full calling contexts within an OpenMP program execution. This approach enables us to collect compact call path profiles for OpenMP program executions without the need for traces. Support for our approach is an integral part of an emerging standard performance tool application programming interface for OpenMP. We demonstrate the effectiveness of our approach by applying our tool to analyze four well-known application benchmarks that cover the spectrum of OpenMP features. In case studies with these benchmarks, insights from our tool helped us significantly improve the performance of these codes.
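
Blame shifting, mentioned above, redirects samples taken in threads that are idle or spinning to the code responsible for the delay. The sketch below illustrates the directed variant for a simple spin lock in C: waiters deposit blame on the lock itself, and the holder accepts it at release. All names are hypothetical; the paper integrates such measurement with full calling context attribution inside an OpenMP runtime.

```c
/* Directed blame shifting for a spin lock (hypothetical sketch).
 * A sample landing in a waiting thread charges the lock, not the waiter;
 * the holder accepts the accumulated blame when it releases, so waiting
 * cost is attributed to the code that caused the contention. */
#include <stdatomic.h>

typedef struct {
    atomic_int  held;    /* 1 while some thread holds the lock   */
    atomic_long blame;   /* samples deposited by waiting threads */
} blame_lock_t;

void blame_lock(blame_lock_t *l) {
    while (atomic_exchange(&l->held, 1)) {
        /* spinning: a profiler sample firing here would execute
         *   atomic_fetch_add(&l->blame, 1);
         * instead of charging this thread's own calling context */
    }
}

long blame_unlock(blame_lock_t *l) {
    long accepted = atomic_exchange(&l->blame, 0);   /* accept waiters' blame */
    atomic_store(&l->held, 0);
    return accepted;   /* to be charged to the holder's calling context */
}
```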


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2015

High performance locks for multi-level NUMA systems

Milind Chabbi; Mike Fagan; John M. Mellor-Crummey

Efficient locking mechanisms are critically important for high performance computers. On highly-threaded systems with a deep memory hierarchy, the throughput of traditional queueing locks, e.g., MCS locks, falls off due to NUMA effects. Two-level cohort locks perform better on NUMA systems, but fail to deliver top performance for deep NUMA hierarchies. In this paper, we describe a hierarchical variant of the MCS lock that adapts the principles of cohort locking for architectures with deep NUMA hierarchies. We describe analytical models for throughput and fairness of Cohort-MCS (C-MCS) and Hierarchical MCS (HMCS) locks that enable us to tailor these locks for high performance on any target platform without empirical tuning. Using these models, one can select parameters such that an HMCS lock will deliver better fairness than a C-MCS lock for a given throughput, or deliver better throughput for a given fairness. Our experiments show that, under high contention, a three-level HMCS lock delivers up to 7.6x higher lock throughput than a C-MCS lock on a 128-thread IBM Power 755 and a five-level HMCS lock delivers up to 72x higher lock throughput on a 4096-thread SGI UV 1000. On the K-means clustering code from the MineBench suite, a three-level HMCS lock reduces the running time by up to 55% compared to the C-MCS lock on an IBM Power 755.
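
For readers unfamiliar with the base primitive: an MCS lock queues waiters, each spinning on a flag in its own node, so a handoff touches only one remote cache line. A minimal C11 sketch of the classic MCS lock follows; it is the textbook algorithm, not the paper's tuned implementation. HMCS composes such locks hierarchically, roughly one per level of the NUMA hierarchy, so the lock tends to circulate among nearby threads before crossing domains.

```c
/* Classic MCS queueing lock in C11 atomics: the building block that the
 * HMCS lock arranges into a NUMA-aware hierarchy. Textbook sketch only. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node_t;

typedef struct {
    _Atomic(mcs_node_t *) tail;
} mcs_lock_t;

void mcs_acquire(mcs_lock_t *lock, mcs_node_t *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    mcs_node_t *pred = atomic_exchange(&lock->tail, me);  /* join the queue */
    if (pred) {                       /* lock held: link in, spin locally */
        atomic_store(&pred->next, me);
        while (atomic_load(&me->locked))
            ;                         /* spin on our own node's flag */
    }
}

void mcs_release(mcs_lock_t *lock, mcs_node_t *me) {
    mcs_node_t *succ = atomic_load(&me->next);
    if (!succ) {                      /* no visible successor */
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;                   /* queue empty: lock is free */
        while (!(succ = atomic_load(&me->next)))
            ;                         /* successor mid-enqueue: wait for link */
    }
    atomic_store(&succ->locked, false);   /* hand off to the next waiter */
}
```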


Source Code Analysis and Manipulation | 2006

Data Representation Alternatives in Semantically Augmented Numerical Models

Mike Fagan; Laurent Hascoët; Jean Utke

Transformations of numerical source code may require the augmentation of the original variables with new data to represent additional data the transformed program operates on. Automatic differentiation makes extensive use of this concept. We describe the two principal approaches to implement the variable augmentation, complete encapsulation and complete separation. The paper concentrates on two major aspects. First, we characterize the advantages of each approach and illustrate the effort needed to realize these advantages in Fortran, C, and C++ as the languages we are most interested in. Second, we discuss the practical solutions that in effect represent hybrids of the two approaches.
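
To make the two approaches concrete, the C fragment below augments a program variable with a derivative component both ways; the names are illustrative only.

```c
/* The paper's two augmentation schemes, sketched for a derivative d
 * attached to each original value v (illustrative names). */

/* Complete encapsulation: value and derivative travel in one type. */
typedef struct { double v; double d; } active_t;
active_t x;     /* x.v is the original variable, x.d its derivative */

/* Complete separation: the derivative lives in a parallel variable,
 * associated with the original only by naming convention. */
double y;       /* original variable */
double y_d;     /* its derivative    */
```

Roughly, encapsulation keeps each value/derivative pair together, which suits operator overloading in C++, while separation preserves the original data layout; the paper characterizes these trade-offs, and the hybrids used in practice, across Fortran, C, and C++.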

Collaboration


Dive into Mike Fagan's collaborations.

Top Co-Authors

Nathan R. Tallent
Pacific Northwest National Laboratory

Jean Utke
Argonne National Laboratory

Kazutomo Yoshii
Argonne National Laboratory

Stefan M. Wild
Argonne National Laboratory

Sven Leyffer
Argonne National Laboratory