Robert Hood
Ames Research Center
Publication
Featured research published by Robert Hood.
measurement and modeling of computer systems | 1996
Robert Hood
In the p2d2 project at NAS we have built a portable debugger for distributed programs. At the heart of the design is a client-server architecture that promotes portability of the user interface code by isolating system-dependent code in a debugger server. In addition, we have designed and implemented a user interface that is capable of displaying and controlling many processes without requiring a different interaction window for each process. The user interface presents the computation at three levels of abstraction: all of the processes, a process group, and a single process. As the number of processes decreases, the amount of information given per process increases. The specific information displayed is under user control through programmable view specifications. This process navigation paradigm is useful both for controlling processes and for examining their states.
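The three-level navigation scheme can be pictured as a simple policy that maps the number of processes in view to the amount of per-process detail shown. The C sketch below is purely illustrative; the type and function names are invented and are not taken from the p2d2 sources.

/* Illustrative sketch of the three-level process-navigation idea; names are
 * hypothetical, not from the actual p2d2 implementation. */
#include <stdio.h>

typedef enum { VIEW_ALL, VIEW_GROUP, VIEW_PROCESS } ViewLevel;

/* Pick how much per-process detail to show: fewer processes => more detail. */
static ViewLevel choose_view(int nprocs)
{
    if (nprocs == 1)  return VIEW_PROCESS;  /* full source-level state     */
    if (nprocs <= 16) return VIEW_GROUP;    /* per-process status line     */
    return VIEW_ALL;                        /* one state icon per process  */
}

int main(void)
{
    int counts[] = { 256, 8, 1 };
    for (int i = 0; i < 3; i++)
        printf("%3d processes -> view level %d\n", counts[i], choose_view(counts[i]));
    return 0;
}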
conference on high performance computing (supercomputing) | 1994
Doreen Cheng; Robert Hood
We describe the design and implementation of a portable debugger for parallel and distributed programs. The design incorporates a client-server model in order to isolate nonportable debugger code from the user interface. The precise definition of a protocol for client-server interaction facilitates a high degree of client portability. Replication of server components permits the implementation of a debugger for distributed computations. Portability across message-passing implementations is achieved with a protocol that specifies the interaction between a message-passing library and the debugger. This permits the same debugger to be used on both PVM and MPI programs. The process abstractions used for debugging message-passing programs can be adapted to debug HPF programs at the source level. This permits the meaningful display of information obscured in tool-generated code.
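A rough idea of the client-server split is a fixed request/reply message format that the portable user interface sends to a system-specific debugger server. The struct layout and request kinds below are assumptions for illustration only, not the actual protocol from the paper.

/* Hypothetical client-server debug request/reply format; for illustration
 * only, not the protocol defined in the paper. */
#include <stdint.h>
#include <stdio.h>

typedef enum {
    DBG_SET_BREAKPOINT,   /* set a breakpoint in one target process */
    DBG_CONTINUE,         /* resume execution                       */
    DBG_READ_VARIABLE     /* fetch the value of a named variable    */
} DbgRequestKind;

typedef struct {
    uint32_t       process_id;    /* which debugger server / target process */
    DbgRequestKind kind;
    char           location[128]; /* e.g. "solver.c:42" or a variable name  */
} DbgRequest;

typedef struct {
    uint32_t process_id;
    int32_t  status;              /* 0 = ok, nonzero = server-side error    */
    char     payload[256];        /* textual reply, e.g. a variable value   */
} DbgReply;

int main(void)
{
    DbgRequest req = { 3, DBG_SET_BREAKPOINT, "solver.c:42" };
    printf("request: process %u, kind %d, location %s\n",
           (unsigned)req.process_id, (int)req.kind, req.location);
    return 0;
}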
scientific cloud computing | 2012
Piyush Mehrotra; Jahed Djomehri; Steve Heistand; Robert Hood; Haoqiang Jin; Arthur Lazanoff; Subhash Saini; Rupak Biswas
Cloud computing environments are now widely available and are being increasingly utilized for technical computing. They are also being touted for high-performance computing (HPC) applications in science and engineering. For example, Amazon EC2 Services offers a specialized Cluster Compute instance to run HPC applications. In this paper, we compare the performance characteristics of Amazon EC2 HPC instances to those of NASA's Pleiades supercomputer, an SGI ICE cluster. For this study, we utilized the HPCC kernels and the NAS Parallel Benchmarks along with four full-scale applications from the repertoire of codes that are being used by NASA scientists and engineers. We compare the total runtime of these codes for varying numbers of cores. We also break out the computation and communication times for a subset of these applications to explore the effect of interconnect differences on the two systems. In general, the single-node performance of the two platforms is equivalent. However, when scaling to larger core counts, the performance of the EC2 HPC instances generally lags that of Pleiades for most of the codes, due to the weaker network performance of the former. In addition to analyzing application performance, we also briefly touch upon the overhead due to virtualization and the usability of cloud environments such as Amazon EC2.
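Separating computation time from communication time, as in the breakdown described above, can be sketched with MPI_Wtime() around the two phases of an iteration. The loop body below is a stand-in kernel, not one of the NASA applications, and the iteration counts are arbitrary.

/* Sketch: separate compute and communication time with MPI_Wtime().
 * The work loop and message pattern are placeholders. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double comp = 0.0, comm = 0.0, t;
    double local = 0.0, buf = (double)rank;

    for (int iter = 0; iter < 100; iter++) {
        t = MPI_Wtime();                       /* computation phase */
        for (int i = 0; i < 100000; i++)
            local += buf * 1.0000001;
        comp += MPI_Wtime() - t;

        t = MPI_Wtime();                       /* communication phase */
        MPI_Allreduce(MPI_IN_PLACE, &buf, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        comm += MPI_Wtime() - t;
    }

    if (rank == 0)
        printf("compute %.3f s, communicate %.3f s on %d ranks (checksum %.1f)\n",
               comp, comm, size, local);
    MPI_Finalize();
    return 0;
}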
conference on high performance computing (supercomputing) | 2005
Rupak Biswas; M. Jahed Djomehri; Robert Hood; Haoqiang Jin; Cetin Kiris; Subhash Saini
Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and is currently ranked as one of the fastest computers in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message-passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks, and some of the NAS Parallel Benchmarks, including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA: one from molecular dynamics and two from computational fluid dynamics. Our results show that both the NUMAlink4 and InfiniBand interconnects hold promise for multi-node application scaling to at least 2048 processors.
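The memory-bandwidth portion of such an evaluation typically relies on a STREAM-triad-style kernel like the one in the HPCC suite. The sketch below is a generic, single-threaded version with an arbitrary array size, not the benchmark configuration used in the paper.

/* STREAM-triad-style bandwidth sketch; array size and timing are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 24)   /* ~16M doubles per array, large enough to defeat caches */

int main(void)
{
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b),
           *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)           /* triad: a = b + scalar * c */
        a[i] = b[i] + 3.0 * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    double bytes = 3.0 * N * sizeof(double);     /* two reads + one write */
    printf("triad bandwidth: %.2f GB/s\n", bytes / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}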
international parallel and distributed processing symposium | 2010
Robert Hood; Haoqiang Jin; Piyush Mehrotra; Johnny Chang; M. Jahed Djomehri; Sharad Gavali; Dennis C. Jespersen; Kenichi Taylor; Rupak Biswas
Resource sharing in commodity multicore processors can have a significant impact on the performance of production applications. In this paper we use a differential performance analysis methodology to quantify the costs of contention for resources in the memory hierarchy of several multicore processors used in high-end computers. In particular, by comparing runs that bind MPI processes to cores in different patterns, we can isolate the effects of resource sharing. We use this methodology to measure how such sharing affects the performance of four applications of interest to NASA: OVERFLOW, MITgcm, Cart3D, and NCC. We also use a subset of the HPCC benchmarks and hardware counter data to help interpret and validate our findings. We conduct our study on high-end computing platforms that use four different quad-core microprocessors: Intel Clovertown, Intel Harpertown, AMD Barcelona, and Intel Nehalem-EP. The results help further our understanding of the requirements these codes place on their production environments and also of each computer's ability to deliver performance.
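The placement patterns used in such a differential analysis amount to binding each MPI process to a chosen core so that pairs of ranks either do or do not share a socket, cache, or memory controller. In practice the MPI launcher's affinity options do this; the Linux sketch below only shows the underlying mechanism and is not the tooling used in the paper.

/* Sketch of explicit core binding on Linux via sched_setaffinity(). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Pin the calling process to a single core, e.g. so that two MPI ranks either
 * share or do not share a cache, isolating the cost of that sharing. */
static int bind_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* 0 = this process */
}

int main(void)
{
    if (bind_to_core(0) != 0) { perror("sched_setaffinity"); return 1; }
    printf("pid %d now bound to core 0\n", (int)getpid());
    return 0;
}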
ieee international conference on high performance computing, data, and analytics | 2011
Subhash Saini; Haoqiang Jin; Robert Hood; David Barker; Piyush Mehrotra; Rupak Biswas
Intel provides Hyper-Threading (HT) in processors based on its Pentium and Nehalem micro-architectures, such as the Westmere-EP. HT enables two threads to execute on each core in order to hide latencies related to data access. These two threads can execute simultaneously, filling unused stages in the functional unit pipelines. To aid better understanding of HT-related issues, we collect Performance Monitoring Unit (PMU) data (instructions retired; unhalted core cycles; L2 and L3 cache hits and misses; vector and scalar floating-point operations, etc.). We then use the PMU data to calculate a new metric of efficiency in order to quantify processor resource utilization and compare that utilization between single-threading (ST) and HT modes. We also study performance gain using unhalted core cycles, the efficiency of code in using the processor's vector units, and the impact of HT mode on shared resources such as the L2 and L3 caches. Results using four full-scale, production-quality scientific applications from computational fluid dynamics (CFD) used by NASA scientists indicate that HT generally improves processor resource utilization efficiency, but this does not necessarily translate into an overall application performance gain.
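The paper's efficiency metric is not reproduced here, but the flavor of the calculation, relating instructions retired to unhalted core cycles and comparing ST and HT runs, can be sketched as follows. All counter values and the per-core combining rule below are hypothetical.

/* Hypothetical IPC-style comparison built from PMU counters; illustrative only. */
#include <stdio.h>

typedef struct {
    unsigned long long instructions_retired;
    unsigned long long unhalted_core_cycles;
} PmuCounts;

static double ipc(PmuCounts c)
{
    return (double)c.instructions_retired / (double)c.unhalted_core_cycles;
}

int main(void)
{
    /* Made-up counts: one ST run vs. the two hardware threads of an HT run. */
    PmuCounts st  = { 800000000000ULL, 500000000000ULL };
    PmuCounts ht0 = { 450000000000ULL, 520000000000ULL };
    PmuCounts ht1 = { 440000000000ULL, 520000000000ULL };

    /* Per-core view of the HT run: both threads retire instructions on the
     * same core's cycles. */
    PmuCounts ht_core = { ht0.instructions_retired + ht1.instructions_retired,
                          ht0.unhalted_core_cycles };

    printf("ST IPC %.2f, HT per-core IPC %.2f (change %+.1f%%)\n",
           ipc(st), ipc(ht_core), 100.0 * (ipc(ht_core) / ipc(st) - 1.0));
    return 0;
}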
merged international parallel processing symposium and symposium on parallel and distributed processing | 1998
Michael Frumkin; Robert Hood; Luis Lopez
We report on features added to a parallel debugger to simplify the debugging of message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
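One standard way to collect message traces of this kind is the MPI profiling interface (PMPI): intercept a communication call, log an event, and forward to the real implementation. The wrapper below, with its made-up log format, illustrates the mechanism; it is not necessarily the instrumentation used in the paper.

/* PMPI wrapper sketch: link this into an MPI application to trace sends.
 * The log format is invented for illustration. */
#include <mpi.h>
#include <stdio.h>

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    int rank;
    PMPI_Comm_rank(comm, &rank);
    /* A replay/causality tool would write these events to a per-process
     * trace file rather than stderr. */
    fprintf(stderr, "TRACE send: %d -> %d tag=%d count=%d\n",
            rank, dest, tag, count);
    return PMPI_Send(buf, count, type, dest, tag, comm);
}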
Proceedings of the Third Conference on Partitioned Global Address Space Programming Models | 2009
Haoqiang Jin; Robert Hood; Piyush Mehrotra
We describe a performance study of the UPC PGAS model applied to three application benchmarks from the NAS Parallel Benchmark suite. The work focuses on the performance implications of programming choices made for data affinity and data access. We compare runs of multiple versions of each benchmark encoded in UPC on both shared-memory and cluster-based parallel systems. This study points out the potential of UPC and some issues in achieving high performance when using the language.
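The data-affinity choices in question come from UPC's block-cyclic layout of shared arrays: a declaration such as shared [B] double a[N] places element i with thread (i / B) % THREADS. The plain-C sketch below just evaluates that rule; the thread count and block size are arbitrary.

/* Plain-C evaluation of UPC's block-cyclic affinity rule; THREADS and B are
 * assumed values, and no actual UPC code is shown. */
#include <stdio.h>

#define THREADS 4     /* number of UPC threads (assumed for illustration) */
#define B       8     /* block size from the layout qualifier             */

static int affinity(long i) { return (int)((i / B) % THREADS); }

int main(void)
{
    for (long i = 0; i < 32; i++)
        printf("a[%2ld] has affinity to thread %d\n", i, affinity(i));
    return 0;
}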
high performance distributed computing | 2002
Gregory Matthews; Robert Hood; S. P. Johnson; P. F. Leggett
In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of that program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.
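The comparison step at the heart of relative debugging can be sketched as finding the first element where values captured from the serial and parallel runs diverge beyond a tolerance. The re-execution and backtracking machinery described in the paper is not shown, and the function below is illustrative only.

/* Sketch: locate the first divergent value between serial and parallel data
 * (already gathered to one process). Tolerance and data are placeholders. */
#include <math.h>
#include <stdio.h>

static long first_difference(const double *serial, const double *parallel,
                             long n, double tol)
{
    for (long i = 0; i < n; i++)
        if (fabs(serial[i] - parallel[i]) > tol)
            return i;          /* index of first divergent value */
    return -1;                 /* arrays agree within tolerance  */
}

int main(void)
{
    double s[] = { 1.0, 2.0, 3.0, 4.0 };
    double p[] = { 1.0, 2.0, 3.5, 4.0 };   /* injected discrepancy */
    long i = first_difference(s, p, 4, 1e-9);
    if (i >= 0)
        printf("first difference at index %ld: %g vs %g\n", i, s[i], p[i]);
    return 0;
}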
ieee international conference on high performance computing data and analytics | 2012
Subhash Saini; Steve Heistand; Haoqiang Jin; Johnny Chang; Robert Hood; Piyush Mehrotra; Rupak Biswas
The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing because of its promised benefits. In this paper, we examine the feasibility, performance, and scalability of production-quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work presents a comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall, the results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
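The short-message latency figure mentioned above is conventionally measured with a two-rank ping-pong. The sketch below shows the usual pattern with an arbitrary repetition count; it is not the specific benchmark used in the study.

/* Two-rank MPI ping-pong latency sketch; run with exactly two ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char byte = 0;
    const int reps = 10000;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("one-way latency: %.2f us\n", 1e6 * (t1 - t0) / (2.0 * reps));
    MPI_Finalize();
    return 0;
}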