Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Matthew P. LeGendre is active.

Publication


Featured research published by Matthew P. LeGendre.


international conference on supercomputing | 2013

Automatically adapting programs for mixed-precision floating-point computation

Michael O. Lam; Jeffrey K. Hollingsworth; Bronis R. de Supinski; Matthew P. LeGendre

As scientific computation continues to scale, it is crucial to use floating-point arithmetic processors as efficiently as possible. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set will produce inaccurate results. In this paper, we present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double-precision. This allows developers to experiment easily with mixed-precision configurations without modifying their source code, and it permits auto-tuning of floating-point precision. We also implement a simple search algorithm that automatically identifies which code regions can use lower precision. We include results for several benchmarks that show both the efficacy and the overhead of our tool.
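The search idea is straightforward to illustrate. Below is a minimal Python sketch of a greedy precision search, assuming a simulated accuracy oracle in place of the paper's binary instrumentation; the region names are invented for illustration.

```python
# Greedy search sketch: demote one code region at a time from double to
# single precision; keep a demotion only if results stay within tolerance.
# The oracle below is simulated; the paper's framework rebuilds and reruns
# the instrumented binary instead. All region names are invented.

def make_oracle(tolerant_regions):
    """Simulated accuracy check: a configuration passes only if every
    demoted region is one that actually tolerates single precision."""
    def passes(demoted):
        return demoted <= tolerant_regions
    return passes

def greedy_precision_search(regions, passes):
    demoted = set()
    for region in regions:
        candidate = demoted | {region}
        if passes(candidate):          # accuracy held: keep the demotion
            demoted = candidate
    return demoted

regions = ["dot_product", "residual_norm", "preconditioner", "rhs_setup"]
oracle = make_oracle({"dot_product", "rhs_setup"})
print(greedy_precision_search(regions, oracle))
# -> {'dot_product', 'rhs_setup'}: safe to run in single precision
```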


ieee international conference on high performance computing data and analytics | 2008

Lessons learned at 208K: towards debugging millions of cores

Gregory L. Lee; Dong H. Ahn; Dorian C. Arnold; Bronis R. de Supinski; Matthew P. LeGendre; Barton P. Miller; Martin Schulz; Ben Liblit

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales, each tool itself becomes a large parallel application: already, debugging the full BlueGene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster, and results at up to 208K processes on BG/L, to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
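The core of STAT's analysis, merging per-process stack traces into equivalence classes, can be sketched as a prefix tree whose nodes record which ranks share each calling context. The Python below is an illustrative toy, assuming traces are simple lists of frame names; the real tool gathers traces through a scalable daemon network.

```python
# Toy version of STAT's merge: fold stack traces from many MPI ranks into
# a prefix tree, recording at each frame the set of ranks that share that
# calling prefix. Divergent ranks stand out immediately.

def merge_traces(traces):
    """traces: {rank: [frame0, frame1, ...]} -> nested prefix tree."""
    root = {"ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}})
            node["ranks"].add(rank)
    return root

def dump(node, name="<root>", depth=0):
    print("  " * depth + f"{name}  ranks={sorted(node['ranks'])}")
    for child, sub in node["children"].items():
        dump(sub, child, depth + 1)

traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "io_write"],           # rank 2 diverges: likely suspect
}
dump(merge_traces(traces))
```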


ieee international conference on high performance computing data and analytics | 2015

The Spack package manager: bringing order to HPC software chaos

Todd Gamblin; Matthew P. LeGendre; Michael R. Collette; Gregory L. Lee; Adam Moody; Bronis R. de Supinski; Scott Futral

Large HPC centers spend considerable time supporting software for thousands of users, but the complexity of HPC software is quickly outpacing the capabilities of existing software management tools. Scientific applications require specific versions of compilers, MPI, and other dependency libraries, so using a single, standard software stack is infeasible. However, managing many configurations is difficult because the configuration space is combinatorial in size. We introduce Spack, a tool used at Lawrence Livermore National Laboratory to manage this complexity. Spack provides a novel, recursive specification syntax to invoke parametric builds of packages and dependencies. It allows any number of builds to coexist on the same system, and it ensures that installed packages can find their dependencies, regardless of the environment. We show through real-world use cases that Spack supports diverse and demanding applications, bringing order to HPC software chaos.
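For a flavor of the specification syntax described here: a spec such as mpileaks@1.0 %gcc@4.7 ^mvapich2@1.9 constrains a package's version, compiler, and one of its dependencies in a single string. Spack packages themselves are small Python recipes; the sketch below follows the general shape of one, with a hypothetical package name, URL, and checksum (and an import path from present-day Spack, which post-dates this paper).

```python
# Shape of a Spack package recipe; all specifics here are placeholders.
from spack.package import *

class Mypkg(Package):
    """Hypothetical example package."""
    homepage = "https://example.com/mypkg"
    url = "https://example.com/mypkg-1.0.tar.gz"

    version("1.0", sha256="0" * 64)    # placeholder checksum
    variant("debug", default=False, description="Build with debug symbols")

    depends_on("mpi")                  # any MPI provider can satisfy this
    depends_on("zlib")

    def install(self, spec, prefix):
        args = ["--prefix=" + prefix]
        if "+debug" in spec:           # react to the resolved spec
            args.append("--enable-debug")
        configure(*args)
        make()
        make("install")
```

One recipe like this can drive arbitrarily many concrete builds, since the concrete version, compiler, variants, and dependency choices are all supplied by the spec at install time.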


parallel computing | 2013

LIBI: A framework for bootstrapping extreme scale software systems

J.D. Goehner; Dorian C. Arnold; Dong H. Ahn; Gregory L. Lee; Bronis R. de Supinski; Matthew P. LeGendre; Barton P. Miller; Martin Schulz

As the sizes of high-end computing systems continue to grow to massive scales, efficient bootstrapping of distributed software infrastructures is becoming a greater challenge. Distributed software infrastructure bootstrapping is the procedure of instantiating all processes of the distributed system on the appropriate hardware nodes and disseminating to these processes the information that they need to complete the infrastructure's start-up phase. In this paper, we describe the lightweight infrastructure-bootstrapping infrastructure (LIBI), both a bootstrapping API specification and a reference implementation. We describe a classification system for process-launching mechanisms and then present a performance evaluation of different process-launching schemes based on our LIBI prototype.
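The performance gap between launching schemes comes down to fan-out. A sequential launcher starts one daemon at a time, while a tree-based launcher lets every started daemon launch more, finishing in logarithmically many rounds. The Python below is a back-of-the-envelope model of that difference, not LIBI's actual API, which abstracts real mechanisms such as rsh/ssh trees and resource-manager interfaces.

```python
# Rounds needed to start n daemons: sequential vs. tree-based fan-out.
# Purely illustrative arithmetic, not a launching implementation.

def launch_rounds_sequential(n):
    return n                          # one launch per step, single launcher

def launch_rounds_tree(n, fanout):
    # Each already-running daemon starts up to `fanout` children per round.
    running, rounds = 1, 0
    while running < n:
        running = min(n, running * (fanout + 1))
        rounds += 1
    return rounds

for n in (1_000, 100_000, 1_000_000):
    print(n, launch_rounds_sequential(n), launch_rounds_tree(n, fanout=32))
# Even a million daemons need only a handful of tree rounds.
```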


international conference on supercomputing | 2013

Massively parallel loading

Wolfgang Frings; Dong H. Ahn; Matthew P. LeGendre; Todd Gamblin; Bronis R. de Supinski; Felix Wolf

Dynamic linking has many advantages for managing large code bases, but dynamically linked applications have not typically scaled well on high performance computing systems. Splitting a monolithic executable into many dynamic shared object (DSO) files decreases compile time for large codes, reduces runtime memory requirements by allowing modules to be loaded and unloaded as needed, and allows common DSOs to be shared among many executables. However, launching an executable that depends on many DSOs causes a flood of file system operations at program start-up, when each process in the parallel application loads its dependencies. At large scales, this operation has an effect similar to a site-wide denial-of-service attack, as even large parallel file systems struggle to service so many simultaneous requests. In this paper, we present SPINDLE, a novel approach to parallel loading that coordinates simultaneous file system operations with a scalable network of cache server processes. Our approach is transparent to user applications. We extend the GNU loader, which is used in Linux as well as proprietary operating systems, to limit the number of simultaneous file system operations, quickly loading DSOs without thrashing the file system. Our experiments show that our prototype implementation has a low overhead and increases the scalability of Pynamic, a benchmark that stresses the dynamic loader, by a factor of 20.
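The essence of the approach is a read-once, serve-many pattern: one process pays for the file system access and everyone else is served from a cache. The Python toy below illustrates that accounting with an in-process dictionary standing in for SPINDLE's network of cache servers; the real system intercepts loads inside the GNU loader rather than in application code.

```python
# Toy accounting for SPINDLE's idea: the first request for a DSO touches
# the file system; all later requests are served from a cache, so n
# processes cause O(1) file reads instead of O(n).
import os
import pathlib
import tempfile

class DSOCache:
    """Stand-in for SPINDLE's cache-server network: a dict in one process."""
    def __init__(self):
        self.cache = {}
        self.fs_reads = 0

    def load(self, path):
        if path not in self.cache:
            self.fs_reads += 1                 # the only real file system I/O
            self.cache[path] = pathlib.Path(path).read_bytes()
        return self.cache[path]

# Simulate many processes loading the same shared library.
lib = tempfile.NamedTemporaryFile(delete=False, suffix=".so")
lib.write(b"\x7fELF stand-in contents")        # placeholder, not a real DSO
lib.close()

cache = DSOCache()
for _rank in range(4096):                      # 4096 "processes"
    cache.load(lib.name)
print("file system reads:", cache.fs_reads)    # -> 1, not 4096
os.unlink(lib.name)
```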


international parallel and distributed processing symposium | 2013

Efficient and Scalable Retrieval Techniques for Global File Properties

Dong H. Ahn; Michael J. Brim; Bronis R. de Supinski; Todd Gamblin; Gregory L. Lee; Matthew P. LeGendre; Barton P. Miller; Martin Schulz

Large-scale systems typically mount many different file systems with distinct performance characteristics and capacities. Applications must use this storage efficiently in order to realize their full performance potential. Users must take into account potential file replication throughout the storage hierarchy as well as contention in lower levels of the I/O system, and must consider communicating the results of file I/O between application processes to reduce file system accesses. Addressing these issues and optimizing file accesses requires detailed runtime knowledge of file system performance characteristics and the location(s) of files on them. In this paper, we propose Fast Global File Status (FGFS), a scalable mechanism to retrieve file information, such as a file's degree of distribution or replication and its consistency. We use a novel node-local technique that turns expensive, non-scalable file system calls into simple string comparison operations. FGFS raises the namespace of a locally-defined file path to a global namespace with few or no file system calls, obtaining global file properties efficiently. Our evaluation on a large multi-physics application shows that most FGFS file status queries on its executable and 848 shared library files complete in 272 milliseconds or faster at 32,768 MPI processes. Even the most expensive operation, which checks global file consistency, completes in under 7 seconds at this scale, an improvement of several orders of magnitude over the traditional checksum technique.
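The node-local trick can be illustrated in a few lines: if every locally visible path is rewritten against its mount's globally unique source, then questions like "do two ranks see the same file?" reduce to string equality. The sketch below uses a made-up mount table and naming scheme, not FGFS's actual resolution logic.

```python
# Rewrite a local path against its mount's globally unique source so that
# file-equality questions become string comparisons. Mount table and
# naming scheme are invented for illustration.

MOUNTS = {
    "/g/g0": "nfs://home-server/exports/g0",   # shared across all nodes
    "/tmp":  "local://{host}/tmp",             # private to each node
}

def to_global(path, host):
    """Raise a node-local path into a global namespace."""
    for mount, source in sorted(MOUNTS.items(), key=lambda m: -len(m[0])):
        if path.startswith(mount):             # longest mount prefix wins
            return source.format(host=host) + path[len(mount):]
    return "local://" + host + path            # unknown mounts: node-private

# Same global name on two hosts => one shared file, no per-rank stat() calls.
print(to_global("/g/g0/user/app.so", "node1") ==
      to_global("/g/g0/user/app.so", "node2"))   # True: shared
print(to_global("/tmp/app.so", "node1") ==
      to_global("/tmp/app.so", "node2"))         # False: node-local copies
```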


Proceedings of the Third International Workshop on HPC User Support Tools | 2016

Managing combinatorial software installations with Spack

Gregory Becker; Peter Scheibel; Matthew P. LeGendre; Todd Gamblin

HPC centers deploy a variety of scientific software, but the complexity of building scientific applications makes package management increasingly difficult. Users demand combinatorial versions of packages, but site administrators may need to perform in-place upgrades for security and bug fixes. This paper describes an extension of the Spack package manager that allows HPC centers to navigate a compromise between fully combinatorial versioning and a stable, upgradable software stack. Spack provides a set of templated packages that can be deployed in arbitrarily many combinatorial build configurations. We introduce subspaces, an extension of Spack's versioning system that allows HPC sites to choose an arbitrary combinatorial complexity for the packages they deploy. Subspaces allow us to use a single Spack package to generate binary packages for systems such as RPM. Using subspaces, support staff can configure the degree of combinatorial versioning exposed to the user. This capability enables an intuitive and flexible user environment that can be leveraged across multiple HPC sites.
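Conceptually, a subspace fixes some axes of the combinatorial build space site-wide and exposes only the remaining axes to users. A small Python illustration of that idea, with invented parameter names and values:

```python
# A full build space is the cross product of all parameters; a "subspace"
# pins some of them and enumerates only the user-visible rest. Names and
# values here are made up; this is not Spack's implementation.
from itertools import product

space = {
    "version":  ["1.8", "2.0"],
    "compiler": ["gcc@4.9", "intel@16"],
    "mpi":      ["mvapich2", "openmpi"],
    "blas":     ["openblas", "mkl"],
}

def subspace(space, fixed):
    """Fix some axes; enumerate the remaining combinations."""
    free = {k: v for k, v in space.items() if k not in fixed}
    for combo in product(*free.values()):
        yield {**fixed, **dict(zip(free.keys(), combo))}

# The site pins BLAS and compiler; users still choose version and MPI,
# so 4 configurations are exposed instead of all 16.
for spec in subspace(space, fixed={"blas": "mkl", "compiler": "gcc@4.9"}):
    print(spec)
```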


international parallel and distributed processing symposium | 2015

A Scalable Prescriptive Parallel Debugging Model

Nicklas Bo Jensen; Niklas Quarfot Nielsen; Gregory L. Lee; Sven Karlsson; Matthew P. LeGendre; Martin Schulz; Dong H. Ahn

Debugging is a critical step in the development of any parallel program. However, the traditional interactive debugging model, where users manually step through code and inspect their application, does not scale well even for current supercomputers due to its centralized nature. While lightweight debugging models, which have been proposed as an alternative, scale well, they can currently only debug a subset of bug classes. We therefore propose a new model, which we call prescriptive debugging, to fill the gap between these two approaches. This user-guided model allows programmers to express and test their debugging intuition in a way that helps to reduce the error space. We introduce a prototype implementation embodying this model, DySectAPI, which allows programmers to construct probe trees for automatic, event-driven debugging at scale. In this paper we introduce the concepts behind DySectAPI and, using both experimental results and analytical modeling, we show that the DySectAPI implementation runs with low overhead on current systems. We achieve logarithmic scaling of the prototype, and our predictions show that even for a large system the overhead of the prescriptive debugging model will remain small.
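The probe-tree concept is easiest to see in miniature: each probe has a condition and an action, and only the ranks that satisfy a probe flow onward to its children, narrowing the search space level by level. The Python below is a hypothetical rendering of that structure for illustration only; the actual DySectAPI is a C++ API built on STAT's tool infrastructure.

```python
# Hypothetical probe tree: ranks that trigger a probe's condition flow to
# its child probes, so each level prunes the set of suspect ranks.

class Probe:
    def __init__(self, condition, action, children=()):
        self.condition = condition
        self.action = action
        self.children = children

    def fire(self, ranks, state):
        hits = {r for r in ranks if self.condition(r, state)}
        if hits:
            self.action(hits)
            for child in self.children:
                child.fire(hits, state)

# Invented scenario: attach at a suspect function, then refine the search.
state = {"in_solver": {0, 1, 2, 3}, "residual_nan": {2}}
tree = Probe(
    condition=lambda r, s: r in s["in_solver"],
    action=lambda hits: print("ranks in solver:", sorted(hits)),
    children=[Probe(
        condition=lambda r, s: r in s["residual_nan"],
        action=lambda hits: print("NaN residual on ranks:", sorted(hits)),
    )],
)
tree.fire({0, 1, 2, 3}, state)   # narrows from 4 ranks down to rank 2
```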


ieee international conference on high performance computing data and analytics | 2012

Abstract: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation

Michael O. Lam; Bronis R. de Supinski; Matthew P. LeGendre; Jeffrey K. Hollingsworth

As scientific computation continues to scale, efficient use of floating-point arithmetic processors is critical. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set leads to inaccurate results. We present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double-precision. Initial results with the Algebraic MultiGrid kernel demonstrate a nearly 2x speedup.


ieee international conference on high performance computing data and analytics | 2014

DySectAPI: Scalable Prescriptive Debugging

Nicklas Bo Jensen; Sven Karlsson; Niklas Quarfot Nielsen; Gregory L. Lee; Dong H. Ahn; Matthew P. LeGendre; Martin Schulz

Collaboration


Dive into Matthew P. LeGendre's collaborations.

Top Co-Authors

Dong H. Ahn (Lawrence Livermore National Laboratory)
Gregory L. Lee (Lawrence Livermore National Laboratory)
Bronis R. de Supinski (Lawrence Livermore National Laboratory)
Martin Schulz (Lawrence Livermore National Laboratory)
Barton P. Miller (University of Wisconsin-Madison)
Todd Gamblin (Lawrence Livermore National Laboratory)
Nicklas Bo Jensen (Technical University of Denmark)
Sven Karlsson (Technical University of Denmark)