Publications
Featured research published by Kurt Walter Pinnow.
IBM Journal of Research and Development | 2005
George S. Almasi; Charles J. Archer; José G. Castaños; John A. Gunnels; C. Christopher Erway; Philip Heidelberger; Xavier Martorell; José E. Moreira; Kurt Walter Pinnow; Joe Ratterman; Burkhard Steinmacher-Burow; William Gropp; Brian R. Toonen
The Blue Gene®/L (BG/L) supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of the Message Passing Interface (MPI), which leverages the hardware features of BG/L. MPI for BG/L is implemented on top of a more basic message-passing infrastructure called the message layer. This message layer can serve as the foundation for other higher-level libraries and can also be used directly by applications. MPI and the message layer are used in the two BG/L modes of operation: the coprocessor mode and the virtual node mode. Performance measurements show that our message-passing services deliver performance close to the hardware limits of the machine. They also show that dedicating one of the processors of a node to communication functions (coprocessor mode) greatly improves the message-passing bandwidth, whereas running two processes per compute node (virtual node mode) can have a positive impact on application performance.
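To illustrate the kind of message-passing program this MPI implementation serves, here is a minimal generic-MPI ping-pong bandwidth microbenchmark. This is a sketch only, not code from the paper: nothing in it is BG/L-specific, and the message size and iteration count are arbitrary choices.

/* Minimal MPI ping-pong bandwidth microbenchmark (generic MPI;
 * illustrative only, not code from the paper). Run with 2+ ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (1 << 20)  /* 1 MiB message, an arbitrary choice */
#define ITERS 100

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    char *buf = malloc(MSG_BYTES);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0) {
        /* Each iteration moves the message twice (there and back). */
        double gb = 2.0 * ITERS * MSG_BYTES / 1e9;
        printf("bandwidth: %.2f GB/s\n", gb / (t1 - t0));
    }
    free(buf);
    MPI_Finalize();
    return 0;
}

On a machine like BG/L, a benchmark of this shape is what "performance close to the hardware limits" is measured against; the coprocessor-mode versus virtual-node-mode comparison in the paper amounts to running such workloads under the two configurations.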
European Conference on Parallel Processing | 2004
George S. Almasi; Charles J. Archer; José G. Castaños; C. Christopher Erway; Philip Heidelberger; Xavier Martorell; José E. Moreira; Kurt Walter Pinnow; Joe Ratterman; Nils Smeds; Burkhard Steinmacher-Burow; William Gropp; Brian R. Toonen
The BlueGene/L supercomputer will consist of 65,536 dual-processor compute nodes interconnected by two high-speed networks: a three-dimensional torus network and a tree topology network. Each compute node can address only its own local memory, making message passing the natural programming model for BlueGene/L. In this paper we present our implementation of MPI for BlueGene/L. In particular, we discuss how we leveraged the architectural features of BlueGene/L to arrive at an efficient implementation of MPI on this machine. We validate our approach by comparing MPI performance against the hardware limits and also the relative performance of the different modes of operation of BlueGene/L. We show that dedicating one of the processors of a node to communication functions greatly improves the bandwidth achieved by MPI operations, whereas running two MPI tasks per compute node can have a positive impact on application performance.
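The three-dimensional torus described above can be mirrored at the MPI level with a Cartesian communicator. The following generic-MPI sketch is illustrative only, not the authors' implementation; letting MPI_Dims_create pick the grid shape is an assumption of convenience.

/* Sketch: mapping MPI ranks onto a 3D torus with a Cartesian
 * communicator (generic MPI; grid dimensions are illustrative). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int dims[3] = {0, 0, 0};    /* zeros let MPI choose a balanced grid */
    int periods[3] = {1, 1, 1}; /* wraparound links, as on a torus */
    int nprocs, rank, coords[3];

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 3, dims);

    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

    MPI_Comm_rank(torus, &rank);
    MPI_Cart_coords(torus, rank, 3, coords);

    /* Nearest neighbors along the x dimension. */
    int left, right;
    MPI_Cart_shift(torus, 0, 1, &left, &right);

    printf("rank %d at (%d,%d,%d), x-neighbors %d/%d\n",
           rank, coords[0], coords[1], coords[2], left, right);

    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}

Matching the logical communicator layout to the physical torus keeps nearest-neighbor exchanges on short hardware paths, which is the general spirit of the architecture-aware MPI work the paper describes.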
IBM Journal of Research and Development | 2008
Yuan-Ping Pang; Timothy J. Mullins; Brent Allen Swartz; Jeff S. McAllister; Brian E. Smith; Charles J. Archer; Roy Glenn Musselman; Amanda Peters; Brian Paul Wallenfelt; Kurt Walter Pinnow
EUDOC™ is a molecular docking program that has successfully helped to identify new drug leads. This virtual screening (VS) tool identifies drug candidates by computationally testing the binding of candidate compounds to biologically important protein targets. This approach can reduce the research time required of biochemists, accelerating the identification of therapeutically useful drugs and helping to transfer discoveries from the laboratory to the patient. Migration of the EUDOC application code to the IBM Blue Gene/L™ (BG/L) supercomputer has been highly successful. This migration led to a 200-fold improvement in elapsed time for a representative VS application benchmark. Three focus areas provided benefits. First, we enhanced the performance of serial code through application redesign, hand-tuning, and increased use of SIMD (single-instruction, multiple-data) floating-point unit operations. Second, we studied computational load-balancing schemes to maximize processor utilization and application scalability for the massively parallel architecture of the BG/L system. Third, we greatly enhanced system I/O interaction design. We also identified and resolved severe performance bottlenecks, allowing for efficient performance on more than 4,000 processors. This paper describes specific improvements in each of the areas of focus.
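As a rough illustration of the kind of load-balancing scheme the abstract alludes to (EUDOC's actual scheme is not reproduced here), the sketch below distributes hypothetical docking tasks dynamically from a master rank to worker ranks, so that fast and slow tasks even out across processors. The function score_ligand, the constant N_TASKS, and the tag names are all invented placeholders.

/* Sketch of a dynamic master-worker load-balancing scheme of the
 * general kind a virtual-screening run might use (illustrative only).
 * Rank 0 hands out "ligand" indices; workers request more as they finish. */
#include <mpi.h>

#define N_TASKS 1000   /* hypothetical number of ligands to screen */
#define TAG_WORK 1
#define TAG_DONE 2

static void score_ligand(int id) { (void)id; /* placeholder docking work */ }

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* master */
        int next = 0, active = size - 1, req;
        MPI_Status st;
        while (active > 0) {
            /* A worker signals readiness by sending its rank. */
            MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (next < N_TASKS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next++;
            } else {                       /* no work left: retire worker */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD);
                active--;
            }
        }
    } else {                               /* worker */
        int task;
        MPI_Status st;
        for (;;) {
            MPI_Send(&rank, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) break;
            score_ligand(task);            /* do the docking computation */
        }
    }

    MPI_Finalize();
    return 0;
}

Dynamic hand-out of work like this is one standard way to keep thousands of processors busy when individual task times vary, which is the utilization problem the paper's load-balancing study addresses.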
Archive | 1995
Daniel Manual Dias; Randy L. Egan; Roy Louis Hoffman; Richard P. King; Kurt Walter Pinnow; Christos A. Polyzois
Archive | 2000
Abdo Esmail Abdo; Kevin James Kathmann; Kurt Walter Pinnow
Archive | 2007
Thomas M. Gooding; David L. Hermsmeier; Roy Glenn Musselman; Amanda Peters; Kurt Walter Pinnow; Brent Allen Swartz
Archive | 2007
Charles J. Archer; Amanda Peters; Kurt Walter Pinnow; Brent Allen Swartz
Archive | 2007
Charles J. Archer; Roy Glenn Musselman; Amanda Peters; Kurt Walter Pinnow; Brent Allen Swartz; Brian Paul Wallenfelt
Archive | 2006
Charles J. Archer; Roy Glenn Musselman; Amanda Peters; Kurt Walter Pinnow; Brent Allen Swartz; Brian Paul Wallenfelt
Archive | 2007
Charles J. Archer; Roy Glenn Musselman; Amanda Peters; Kurt Walter Pinnow; Brent Allen Swartz