Publication


Featured research published by Todd A. Inglett.


International Journal of Parallel Programming | 2007

The blue gene/L supercomputer: a hardware and software story

José E. Moreira; Valentina Salapura; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Edward Bergner; Randy Bickford; Matthias A. Blumrich; José R. Brunheroto; Arthur A. Bright; Michael Brian Brutman; José G. Castaños; Dong Chen; Paul W. Coteus; Paul G. Crumley; Sam Ellis; Thomas Eugene Engelsiepen; Alan Gara; Mark E. Giampapa; Tom Gooding; Shawn A. Hall; Ruud A. Haring; Roger L. Haskin; Philip Heidelberger; Dirk Hoenicke; Todd A. Inglett; Gerard V. Kopcsay; Derek Lieber; David Roy Limpert; Patrick Joseph McCarthy

The Blue Gene/L system at the Department of Energy's Lawrence Livermore National Laboratory in Livermore, California is the world's most powerful supercomputer. It has achieved groundbreaking performance on both standard benchmarks and real scientific applications. In that process, it has enabled new science that simply could not be done before. Blue Gene/L was developed by a relatively small team of dedicated scientists and engineers. This article is both a description of the Blue Gene/L supercomputer and an account of how that system was designed, developed, and delivered. It reports on the technical characteristics of the system that made it possible to build such a powerful supercomputer. It also reports on how teams across the world worked around the clock to accomplish this milestone of high-performance computing.


International Parallel and Distributed Processing Symposium | 2012

Evaluating the Impact of TLB Misses on Future HPC Systems

Alessandro Morari; Roberto Gioiosa; Robert W. Wisniewski; Bryan S. Rosenburg; Todd A. Inglett; Mateo Valero

TLB misses have been considered an important source of system overhead and one of the causes that limit scalability on large supercomputers. This assumption led to HPC lightweight kernel designs that usually map page table entries to TLB entries statically and never take TLB misses. While this approach worked for petascale clusters, programming and debugging exascale applications composed of billions of threads is not a trivial task, and users have started to explore novel programming models and tools that require richer system software support. In this study we present a quantitative analysis of the effect of TLB misses on current and future parallel applications at scale. To provide a fair evaluation, we compare a noiseless OS (CNK) with a custom version of the same OS capable of handling TLB misses on a BG/P system (up to 4096 cores). Our methodology follows a two-step approach: we first analyze the effects of TLB misses with a low-overhead, range-checking TLB miss handler, and then simulate a more complex TLB management system through TLB noise injection. We analyze the system behavior with different page sizes and an increasing number of nodes and perform a sensitivity analysis. Our results show that the overhead introduced by TLB misses on complex HPC applications from the LLNL and ANL benchmarks is below 2% if the TLB pressure is contained and/or the TLB miss handler overhead is low, even with 1 MB pages and under large TLB noise injection. These results open the possibility of implementing richer OS memory management services to satisfy the requirements of future applications and users.
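The low-overhead handler the abstract refers to resolves a miss with a bounds check and a fixed virtual-to-physical offset instead of a full page-table walk. The sketch below illustrates that idea in C under stated assumptions: the region layout, the handler signature, and the tlb_install_entry() call are hypothetical stand-ins for illustration, not CNK's actual interface.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of one statically mapped region: a contiguous
 * virtual range backed by physical memory at a fixed offset, using one
 * large page size. */
typedef struct {
    uint64_t vbase;      /* first virtual address of the region */
    uint64_t pbase;      /* corresponding physical base address */
    uint64_t size;       /* region length in bytes              */
    uint64_t page_size;  /* e.g. 1 MB pages                     */
} static_region_t;

/* Stand-in for the privileged operation that writes a TLB entry. */
static void tlb_install_entry(uint64_t vpage, uint64_t ppage, uint64_t page_size)
{
    printf("install TLB entry: V 0x%llx -> P 0x%llx (%llu-byte page)\n",
           (unsigned long long)vpage, (unsigned long long)ppage,
           (unsigned long long)page_size);
}

/* Range-checking miss handler: no page-table walk, just a bounds check
 * and an offset computation. Returns 0 on success, -1 if the address
 * falls outside the statically mapped region. */
static int handle_tlb_miss(const static_region_t *r, uint64_t fault_addr)
{
    if (fault_addr < r->vbase || fault_addr >= r->vbase + r->size)
        return -1;                               /* genuine fault */

    uint64_t offset = (fault_addr - r->vbase) & ~(r->page_size - 1);
    tlb_install_entry(r->vbase + offset, r->pbase + offset, r->page_size);
    return 0;
}

int main(void)
{
    /* Illustrative region: 256 MB of application memory in 1 MB pages. */
    static_region_t region = {
        .vbase     = 0x10000000ULL,
        .pbase     = 0x80000000ULL,
        .size      = 256ULL << 20,
        .page_size = 1ULL << 20,
    };

    if (handle_tlb_miss(&region, 0x10234567ULL) != 0)
        printf("address not mapped\n");
    return 0;
}
```

Because the check and the offset arithmetic amount to a handful of instructions, the per-miss cost stays small, which is the low-overhead regime in which the abstract reports overheads below 2%.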


Operating Systems Review | 2006

HPC-Colony: services and interfaces for very large systems

Sayantan Chakravorty; Celso L. Mendes; Laxmikant V. Kalé; Terry Jones; Andrew T. Tauferner; Todd A. Inglett; José E. Moreira

Traditional full-featured operating systems are known to have properties that limit the scalability of distributed-memory parallel programs, the most common programming paradigm in high-end computing. Furthermore, as processor counts increase on the most capable systems, the activity necessary to manage the system becomes more of a burden. To make a general-purpose operating system scale to such levels, new technology is required for parallel resource management and global system management (including fault management). In this paper, we describe the shortcomings of full-featured operating systems and runtime systems and discuss an approach to scale such systems to one hundred thousand processors with both scalable parallel application performance and efficient system management.


Archive | 2010

Linux OS Jitter Measurements at Large Node Counts using a BlueGene/L

Terry Jones; Andrew T. Tauferner; Todd A. Inglett

We present experimental results for a coordinated scheduling implementation for the Linux operating system. Results were collected on an IBM Blue Gene/L machine at scales of up to 16K nodes. Our results indicate that coordinated scheduling provided a dramatic improvement in scaling performance for two applications characterized as bulk synchronous parallel programs.
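To see why coordinating OS activity matters for bulk synchronous programs, consider a simple back-of-the-envelope model (my own illustration, not the paper's analysis): every compute phase ends in a barrier, so a single delayed node stalls the whole machine. The C sketch below compares the chance that some node is interrupted during a phase when OS daemons fire independently versus when they are aligned across nodes; the per-phase noise probability p is an assumed parameter.

```c
#include <math.h>
#include <stdio.h>

/* Toy model of OS jitter in a bulk synchronous parallel (BSP) program.
 * Assumption (not from the paper): each node is hit by a noise event
 * during a compute phase with independent probability p. Because every
 * phase ends in a barrier, one late node delays all the others. */
int main(void)
{
    const double p = 0.01;  /* assumed per-node, per-phase noise probability */
    const int node_counts[] = { 64, 1024, 4096, 16384 };

    printf("%8s  %24s  %24s\n",
           "nodes", "P(phase delayed), indep", "P(phase delayed), sync");
    for (size_t i = 0; i < sizeof node_counts / sizeof node_counts[0]; i++) {
        int n = node_counts[i];

        /* Uncoordinated: the phase is delayed if ANY node is interrupted. */
        double p_indep = 1.0 - pow(1.0 - p, n);

        /* Coordinated scheduling aligns the interruptions, so the phase is
         * delayed only about as often as a single node would be. */
        double p_sync = p;

        printf("%8d  %24.4f  %24.4f\n", n, p_indep, p_sync);
    }
    return 0;
}
```

At 16K nodes with p = 0.01 the uncoordinated case delays essentially every phase, while the coordinated case does not grow with node count; this toy model is consistent with the dramatic scaling improvement the abstract reports.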


Archive | 2007

Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks

Charles J. Archer; Todd A. Inglett; Joseph D. Ratterman; Brian E. Smith


Archive | 2007

Executing Multiple Instructions Multiple Data ('MIMD') Programs on a Single Instruction Multiple Data ('SIMD') Machine

Todd A. Inglett; Patrick Joseph McCarthy; Amanda Peters


Archive | 2008

Template based parallel checkpointing in a massively parallel computer system

Charles J. Archer; Todd A. Inglett


Archive | 2007

Moving processing operations from one MIMD booted SIMD partition to another to enlarge a SIMD partition

Todd A. Inglett; Patrick Joseph McCarthy; Amanda Peters


Archive | 2005

Method and apparatus for template based parallel checkpointing

Charles J. Archer; Todd A. Inglett


Archive | 2010

Routing data communications packets in a parallel computer

Charles J. Archer; Michael A. Blocksome; Todd A. Inglett; Joseph D. Ratterman; Brian E. Smith
