Publication


Featured research published by Gregory L. Lee.


International Parallel and Distributed Processing Symposium | 2007

Stack Trace Analysis for Large Scale Debugging

Dorian C. Arnold; Dong H. Ahn; Bronis R. de Supinski; Gregory L. Lee; Barton P. Miller; Martin Schulz

We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce problem exploration spaces from thousands of processes to a few by sampling stack traces to form process equivalence classes, groups of processes exhibiting similar behavior. We can then use full-featured debuggers on representatives from these behavior classes for root cause analysis. STAT scalably collects stack traces over a sampling period to assemble a profile of the application's behavior. STAT routines process the samples to form a call graph prefix tree that encodes common behavior classes over the program's process space and time. STAT leverages MRNet, an infrastructure for tool control and data analysis, to overcome scalability barriers faced by heavy-weight debuggers. We present STAT's design and an evaluation showing that STAT gathers informative process traces from thousands of processes with sub-second latencies, a significant improvement over existing tools. Our case studies of production codes verify that STAT supports the quick identification of errors that were previously difficult to locate.
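
The central data structure is easy to picture: sampled stack traces are merged into a call-prefix tree whose leaves identify the set of ranks sharing each call path, and those sets are the process equivalence classes. The following Python sketch illustrates only that idea; the trace format and function names are assumptions for the example, not STAT's implementation.

```python
# Minimal sketch of merging stack traces into a call-prefix tree and
# reading off process equivalence classes (illustrative only, not STAT code).

def merge_traces(traces):
    """traces: {rank: ["main", "solve", ...]} -> call-prefix tree."""
    root = {"ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}})
            node["ranks"].add(rank)
    return root

def equivalence_classes(node, path=()):
    """Yield (call path, ranks) for each leaf, i.e. each behavior class."""
    if not node["children"]:
        yield path, sorted(node["ranks"])
    for frame, child in node["children"].items():
        yield from equivalence_classes(child, path + (frame,))

if __name__ == "__main__":
    traces = {
        0: ["main", "solve", "MPI_Waitall"],
        1: ["main", "solve", "MPI_Waitall"],
        2: ["main", "solve", "compute_residual"],
        3: ["main", "checkpoint", "write_file"],
    }
    for path, ranks in equivalence_classes(merge_traces(traces)):
        print(" > ".join(path), "ranks:", ranks)
```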


IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

Scalable temporal order analysis for large scale debugging

Dong H. Ahn; Bronis R. de Supinski; Ignacio Laguna; Gregory L. Lee; Ben Liblit; Barton P. Miller; Martin Schulz

We present a scalable temporal order analysis technique that supports debugging of large-scale applications by classifying MPI tasks based on their logical program execution order. Our approach combines static analysis techniques with dynamic analysis to determine this temporal order scalably. It uses scalable stack trace analysis techniques to guide the selection of critical program execution points in anomalous application runs. Our novel temporal ordering engine then leverages this information along with the application's static control structure to apply data flow analysis techniques that determine key application data such as loop control variables. We then use lightweight techniques to gather the dynamic data that determines the temporal order of the MPI tasks. Our evaluation, which extends the Stack Trace Analysis Tool (STAT), demonstrates that this temporal order analysis technique can isolate bugs in benchmark codes with injected faults as well as a real-world hang in AMG2006.
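
One way to picture the final ordering step: once tasks sharing a call path are grouped, the extracted loop control values impose a progress order, and the least-progressed tasks are the first suspects in a hang. The sketch below, with made-up per-rank data, illustrates only that ordering idea and is not the paper's implementation.

```python
# Illustrative only: order MPI ranks by logical progress using loop
# control variables extracted at a chosen execution point.

def order_by_progress(ranks):
    """ranks: {rank: {"iter": i, "subiter": j}} -> ranks, least progressed first."""
    return sorted(ranks, key=lambda r: (ranks[r]["iter"], ranks[r]["subiter"]))

if __name__ == "__main__":
    progress = {
        0: {"iter": 41, "subiter": 3},
        1: {"iter": 42, "subiter": 0},
        2: {"iter": 40, "subiter": 7},   # least progressed: likely hang culprit
        3: {"iter": 42, "subiter": 0},
    }
    print("suspect order:", order_by_progress(progress))
```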


IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

Lessons learned at 208K: towards debugging millions of cores

Gregory L. Lee; Dong H. Ahn; Dorian C. Arnold; Bronis R. de Supinski; Matthew P. LeGendre; Barton P. Miller; Martin Schulz; Ben Liblit

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales, each tool itself becomes a large parallel application: already, debugging the full Blue Gene/L (BG/L) installation at Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster and at up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.


International Conference on Parallel Processing | 2008

Overcoming Scalability Challenges for Tool Daemon Launching

Dong H. Ahn; Dorian C. Arnold; Bronis R. de Supinski; Gregory L. Lee; Barton P. Miller; Martin Schulz

Many tools that target parallel and distributed environments must co-locate a set of daemons with the distributed processes of the target application. However, efficient and portable deployment of these daemons on large-scale systems is an unsolved problem. We address this gap with LaunchMON, a scalable, robust, portable, secure, and general-purpose infrastructure for launching tool daemons. Its API allows tool builders to identify all processes of a target job, launch daemons on the relevant nodes, and control daemon interaction. Our results show that LaunchMON scales to very large daemon counts and substantially enhances performance over existing ad hoc mechanisms.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

The Spack package manager: bringing order to HPC software chaos

Todd Gamblin; Matthew P. LeGendre; Michael R. Collette; Gregory L. Lee; Adam Moody; Bronis R. de Supinski; Scott Futral

Large HPC centers spend considerable time supporting software for thousands of users, but the complexity of HPC software is quickly outpacing the capabilities of existing software management tools. Scientific applications require specific versions of compilers, MPI, and other dependency libraries, so using a single, standard software stack is infeasible. However, managing many configurations is difficult because the configuration space is combinatorial in size. We introduce Spack, a tool used at Lawrence Livermore National Laboratory to manage this complexity. Spack provides a novel, recursive specification syntax to invoke parametric builds of packages and dependencies. It allows any number of builds to coexist on the same system, and it ensures that installed packages can find their dependencies, regardless of the environment. We show through real-world use cases that Spack supports diverse and demanding applications, bringing order to HPC software chaos.
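
Spack package recipes are themselves small Python classes, and users request parametric builds with a spec syntax that pins versions, compilers, variants, and dependencies. The recipe below is a hedged sketch in the style of Spack's package DSL; the package name, URL, checksum, and spec string are invented for illustration and are not from the paper.

```python
# Sketch of a Spack-style package recipe (name, URL, and checksum are
# invented; see the Spack documentation for real recipes).
# A user might request a specific configuration with a spec such as:
#   spack install examplelib@1.2 %gcc@9.3.0 +mpi ^openmpi@4.0.5

from spack import *

class Examplelib(Package):
    """Hypothetical library used to illustrate Spack's package DSL."""
    homepage = "https://example.org/examplelib"
    url = "https://example.org/examplelib-1.2.tar.gz"

    version("1.2", sha256="0" * 64)          # placeholder checksum
    variant("mpi", default=True, description="Build with MPI support")

    depends_on("mpi", when="+mpi")           # dependency chosen per spec
    depends_on("zlib")

    def install(self, spec, prefix):
        configure_args = ["--prefix=" + prefix]
        if "+mpi" in spec:
            configure_args.append("--enable-mpi")
        configure(*configure_args)
        make()
        make("install")
```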


Parallel Computing | 2013

LIBI: A framework for bootstrapping extreme scale software systems

J.D. Goehner; Dorian C. Arnold; Dong H. Ahn; Gregory L. Lee; Bronis R. de Supinski; Matthew P. LeGendre; Barton P. Miller; Martin Schulz

As the sizes of high-end computing systems continue to grow to massive scales, efficient bootstrapping of distributed software infrastructures is becoming a greater challenge. Distributed software infrastructure bootstrapping is the procedure of instantiating all processes of the distributed system on the appropriate hardware nodes and disseminating to these processes the information that they need to complete the infrastructure's start-up phase. In this paper, we describe the Lightweight Infrastructure-Bootstrapping Infrastructure (LIBI), both a bootstrapping API specification and a reference implementation. We describe a classification system for process launching mechanisms and then present a performance evaluation of different process launching schemes based on our LIBI prototype.


International Parallel and Distributed Processing Symposium | 2016

ARCHER: Effectively Spotting Data Races in Large OpenMP Applications

Simone Atzeni; Ganesh Gopalakrishnan; Zvonimir Rakamarić; Dong H. Ahn; Ignacio Laguna; Martin Schulz; Gregory L. Lee; Joachim Protze; Matthias S. Müller

OpenMP plays a growing role as a portable programming model to harness on-node parallelism, yet existing data race checkers for OpenMP have high overheads and generate many false positives. In this paper, we propose ARCHER, the first OpenMP data race checker that achieves high accuracy, low overhead on large applications, and portability. ARCHER incorporates scalable happens-before tracking, exploits structured parallelism via combined static and dynamic analysis, and modularly interfaces with OpenMP runtimes. ARCHER significantly outperforms ThreadSanitizer (TSan) and Intel Inspector XE, while providing the same or better precision. It has helped detect critical data races in the Hypre library, which is central to many projects at Lawrence Livermore National Laboratory and elsewhere.
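
The dynamic side of such checkers rests on happens-before reasoning: two accesses to the same location by different threads race if at least one is a write and neither access is ordered before the other by synchronization. The vector-clock toy below illustrates only that definition; it is not ARCHER's algorithm or data structures.

```python
# Toy happens-before race check with vector clocks (illustrative; not ARCHER).

def hb(a, b):
    """True if the event with clock a happens-before the event with clock b."""
    return all(a[i] <= b[i] for i in range(len(a))) and a != b

def races(accesses):
    """accesses: list of (thread, kind, vector_clock) to one shared location."""
    found = []
    for i in range(len(accesses)):
        for j in range(i + 1, len(accesses)):
            t1, k1, c1 = accesses[i]
            t2, k2, c2 = accesses[j]
            if t1 != t2 and "write" in (k1, k2):
                if not hb(c1, c2) and not hb(c2, c1):
                    found.append((accesses[i], accesses[j]))
    return found

if __name__ == "__main__":
    # Two writes from different threads with incomparable clocks are a race;
    # the later read is ordered after both, so it is not reported.
    log = [(0, "write", [1, 0]), (1, "write", [0, 1]), (1, "read", [2, 2])]
    for a, b in races(log):
        print("race:", a, "vs", b)
```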


Communications of the ACM | 2015

Debugging high-performance computing applications at massive scales

Ignacio Laguna; Dong H. Ahn; Bronis R. de Supinski; Todd Gamblin; Gregory L. Lee; Martin Schulz; Saurabh Bagchi; Milind Kulkarni; Bowen Zhou; Zhezhe Chen; Feng Qin

Dynamic analysis techniques help programmers find the root cause of bugs in large-scale parallel applications.


IEEE International Symposium on Workload Characterization | 2007

Pynamic: the Python Dynamic Benchmark

Gregory L. Lee; Dong H. Ahn; Bronis R. de Supinski; John C. Gyllenhaal; P. Miller

Python is widely used in scientific computing to facilitate application development and to support features such as computational steering. Making full use of some of Python's popular features, which improve programmer productivity, leads to applications that access extremely high numbers of dynamically linked libraries (DLLs). As a result, some important Python-based applications severely stress a system's dynamic linking and loading capabilities and also cause significant difficulties for most development environment tools, such as debuggers. Furthermore, using the Python paradigm for large-scale MPI-based applications can create significant file I/O and further stress tools and operating systems. In this paper, we present Pynamic, the first benchmark program to support configurable emulation of a wide range of the DLL usage of Python-based applications for large-scale systems. Pynamic has already accurately reproduced system software and tool issues encountered by important large Python-based scientific applications on our supercomputers. Pynamic provided insight for our system software and tool vendors, and our application developers, into the impact of several design decisions. As we describe the Pynamic benchmark, we highlight some of the issues discovered in our large-scale system software and tools using Pynamic.
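
A minimal way to mimic the access pattern Pynamic targets is to generate many small modules and import them all while timing the loader. The sketch below uses plain Python modules so it stays self-contained; Pynamic itself generates and links real shared libraries, so this is only an illustration of the idea, not the benchmark.

```python
# Illustrative stand-in for the Pynamic idea: generate many small modules,
# import them all, and time the loader.  Pynamic builds real shared libraries;
# plain .py files are used here only to keep the sketch runnable.

import importlib
import os
import sys
import tempfile
import time

def generate_modules(directory, count):
    for i in range(count):
        with open(os.path.join(directory, f"gen_mod_{i}.py"), "w") as f:
            f.write(f"def entry():\n    return {i}\n")

def import_all(count):
    start = time.perf_counter()
    modules = [importlib.import_module(f"gen_mod_{i}") for i in range(count)]
    return modules, time.perf_counter() - start

if __name__ == "__main__":
    count = 500
    with tempfile.TemporaryDirectory() as tmp:
        generate_modules(tmp, count)
        sys.path.insert(0, tmp)
        mods, secs = import_all(count)
        print(f"imported {len(mods)} generated modules in {secs:.3f}s")
```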


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Towards providing low-overhead data race detection for large OpenMP applications

Joachim Protze; Simone Atzeni; Dong H. Ahn; Martin Schulz; Ganesh Gopalakrishnan; Matthias S. Müller; Ignacio Laguna; Zvonimir Rakamarić; Gregory L. Lee

Neither static nor dynamic data race detection methods, by themselves, have proven to be sufficient for large HPC applications, as they often result in high runtime overheads and/or low race-checking accuracy. While combined static and dynamic approaches can fare better, creating such combinations, in practice, requires attention to many details. Specifically, existing state-of-the-art dynamic race detectors are aimed at low-level threading models, and cannot handle high-level models such as OpenMP. Further, they do not provide mechanisms by which static analysis methods can target selected regions of code with sufficient precision. In this paper, we present our solutions to both challenges. Specifically, we identify patterns within OpenMP runtimes that tend to mislead existing dynamic race checkers and provide mechanisms that help establish an explicit happens-before relation to prevent such misleading checks. We also implement a fine-grained blacklist mechanism to allow a runtime analyzer to exclude regions of code at line number granularity. We support race checking by adapting ThreadSanitizer, a mature data-race checker developed at Google that is now an integral part of Clang and GCC; and we have implemented our techniques within the state-of-the-art Intel OpenMP Runtime. Our results demonstrate that these techniques can significantly improve runtime analysis accuracy and overhead in the context of data race checking of OpenMP applications.
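
The blacklist idea is the easiest piece to picture: the runtime analyzer drops race reports whose source location falls inside a region that static analysis has already shown to be race-free. The sketch below shows only that filtering step with a made-up report format; it is not the paper's ThreadSanitizer or Intel OpenMP Runtime integration.

```python
# Illustrative line-granularity blacklist filter (made-up report format;
# not the ThreadSanitizer or Intel OpenMP Runtime integration from the paper).

blacklist = {
    # file: (first_line, last_line) ranges shown race-free by static analysis
    "kernels.c": [(120, 180), (300, 342)],
}

def suppressed(report):
    """report: {"file": str, "line": int, ...}; True if inside a blacklisted range."""
    for lo, hi in blacklist.get(report["file"], []):
        if lo <= report["line"] <= hi:
            return True
    return False

if __name__ == "__main__":
    reports = [
        {"file": "kernels.c", "line": 150, "desc": "write/write on A[i]"},
        {"file": "driver.c",  "line": 42,  "desc": "read/write on counter"},
    ]
    for r in reports:
        if not suppressed(r):
            print("data race:", r["file"], r["line"], r["desc"])
```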

Collaboration


Dive into Gregory L. Lee's collaborations.

Top Co-Authors

Dong H. Ahn
Lawrence Livermore National Laboratory

Martin Schulz
Lawrence Livermore National Laboratory

Barton P. Miller
University of Wisconsin-Madison

Ignacio Laguna
Lawrence Livermore National Laboratory

Bronis R. de Supinski
Lawrence Livermore National Laboratory

Matthew P. LeGendre
Lawrence Livermore National Laboratory