Publications

Featured research published by Bob Kuhn.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2005

Automated, scalable debugging of MPI programs with Intel® Message Checker

Jayant DeSouza; Bob Kuhn; Bronis R. de Supinski; Victor Samofalov; Sergey N. Zheltov; Stanislav Viktorovich Bratanov

The trend towards many-core multi-processor systems and clusters will make systems with tens and hundreds of processors more widely available. Current manual debugging techniques do not scale well to such large systems, so advanced automated debugging tools are needed for standard programming models based on commodity computing, such as threads and MPI. We surveyed MPI users to identify the kinds of MPI errors they encounter and classified those errors into several types. We describe how automated tools can detect such errors and present the Intel® Message Checker (IMC) technology being developed at the Intel Advanced Computing Center. IMC's unique technology automatically detects several kinds of MPI errors, such as various types of mismatches, race conditions, deadlocks and potential deadlocks, and resource misuse. Finally, we review the usability and uniqueness of IMC and discuss our future plans.
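
The abstract names error classes rather than showing code, so the following minimal sketch (our illustration, not from the paper) shows one such class: a send-before-receive exchange whose outcome depends on MPI's internal buffering, i.e. a potential deadlock of the kind a message checker is built to flag.

/* potential_deadlock.c - illustrative only, not from the paper.
 * Both ranks issue a blocking MPI_Send before the matching MPI_Recv.
 * Whether this hangs depends on the MPI library's internal buffering,
 * which is exactly the "potential deadlock" case a checker flags. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, peer, out = 42, in = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                       /* assumes exactly 2 ranks */

    /* Unsafe ordering: send-before-receive on both sides. */
    MPI_Send(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* A safe alternative pairs the calls and lets MPI resolve order:
     * MPI_Sendrecv(&out, 1, MPI_INT, peer, 0,
     *              &in,  1, MPI_INT, peer, 0,
     *              MPI_COMM_WORLD, MPI_STATUS_IGNORE);              */

    printf("rank %d received %d\n", rank, in);
    MPI_Finalize();
    return 0;
}

A checker can report the unsafe ordering even on runs where buffering happens to hide it, which is what makes detecting potential rather than manifest deadlocks valuable.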


Parallel Computing | 2001

Producing scalable performance with OpenMP: experiments with two CFD applications

Jay Hoeflinger; Prasad Alavilli; T. L. Jackson; Bob Kuhn

OpenMP is a relatively new programming paradigm, which can easily deliver good parallel performance for small numbers (
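
As a minimal sketch of the programming model the paper evaluates (our illustration, not code from the paper; the array names and sizes are placeholders), a single OpenMP directive is enough to parallelize a CFD-style stencil sweep:

/* Illustrative sketch, not from the paper: loop-level OpenMP
 * parallelism of the kind used to scale CFD kernels. */
#include <omp.h>
#include <stdio.h>
#define N 1000000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++) b[i] = (double)i;

    /* One directive parallelizes the sweep; thread creation and
     * scheduling are left to the OpenMP runtime. */
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++)
        a[i] = 0.5 * (b[i - 1] + b[i + 1]);   /* simple 1-D stencil */

    printf("a[1] = %f (up to %d threads)\n", a[1], omp_get_max_threads());
    return 0;
}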


Concurrency and Computation: Practice and Experience | 2000

OpenMP versus threading in C/C++

Bob Kuhn; Paul M. Petersen; Eamonn O'Toole

When comparing OpenMP to other parallel programming models, it is easier to choose between OpenMP and MPI than between OpenMP and POSIX Threads (Pthreads). With languages like C and C++, developers have frequently chosen Pthreads to incorporate parallelism in applications. Few developers currently use OpenMP C/C++, but they should. We show that converting Genehunter, a hand-threaded C program, to OpenMP increases robustness without sacrificing performance. It is also a good case study because it highlights several issues that are important in understanding how OpenMP uses threads. Genehunter is a genetics program which analyzes DNA assays from members of a family tree where a disease is present in certain members and not in others, in an attempt to identify the gene most likely to cause the disease. This problem is called linkage analysis. The same sections of Genehunter were parallelized first by hand-threading and then with OpenMP on Compaq Alpha Tru64 systems. We present examples using both methods and illustrate the tools that proved useful in the process. Our basic conclusion is that, although we could express the parallelism using either Pthreads or OpenMP, it was easier to express at the higher level of abstraction OpenMP provides: OpenMP allowed enough control to express the parallelism without exposing the implementation details. Also, because parallelism is specified at a higher level with OpenMP, the tools available to assist in the construction of correct and efficient programs provide more useful information than the equivalent tools for hand-threaded programs. The following concepts are presented: differences between coding styles for OpenMP and Pthreads; data scoping specification for correct parallel programming; adapting a signal-based exception mechanism to a parallel program; OpenMP tools, including the debuggers Ladebug, TotalView, and Assure and the profilers Hiprof and GuideView; and performance tuning with memory allocation, synchronization, and scheduling. Genehunter does not cover a few important topics in C/C++ programming style, which will be discussed separately: interfacing a GUI team of threads with an OpenMP compute team, and coordinating data structures with scheduling.
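
To make the abstract's central contrast concrete, here is a minimal sketch (ours, not taken from the Genehunter sources) of the same reduction written both ways; the fixed 4-thread partitioning in the Pthreads version is an arbitrary illustrative choice.

/* Illustrative contrast, not from the Genehunter sources: the same
 * reduction written with Pthreads and with OpenMP. */
#include <pthread.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000
#define T 4
static double x[N];

/* Pthreads: explicit partitioning, thread management, combining. */
struct chunk { int lo, hi; double sum; };

static void *partial_sum(void *arg) {
    struct chunk *c = arg;
    c->sum = 0.0;
    for (int i = c->lo; i < c->hi; i++) c->sum += x[i];
    return NULL;
}

static double sum_pthreads(void) {
    pthread_t tid[T];
    struct chunk c[T];
    double total = 0.0;
    for (int t = 0; t < T; t++) {
        c[t].lo = t * (N / T);
        c[t].hi = (t == T - 1) ? N : (t + 1) * (N / T);
        pthread_create(&tid[t], NULL, partial_sum, &c[t]);
    }
    for (int t = 0; t < T; t++) {
        pthread_join(tid[t], NULL);
        total += c[t].sum;
    }
    return total;
}

/* OpenMP: scoping and combining stated declaratively. */
static double sum_openmp(void) {
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < N; i++) total += x[i];
    return total;
}

int main(void) {
    for (int i = 0; i < N; i++) x[i] = 1.0;
    printf("pthreads: %f  openmp: %f\n", sum_pthreads(), sum_openmp());
    return 0;
}

The OpenMP version states the data scoping and combining rule in a single reduction clause, while the Pthreads version spells out partitioning, thread creation, joining, and the final combine by hand; this is the difference in abstraction level the paper describes.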


International Workshop on OpenMP | 2001

An Integrated Performance Visualizer for MPI/OpenMP Programs

Jay Hoeflinger; Bob Kuhn; Wolfgang E. Nagel; Paul M. Petersen; Hrabri Rajic; Sanjiv Shah; Jeffrey S. Vetter; Michael Voss; Renee Woo

As cluster computing has grown, so has its use for large scientific calculations. Recently, many researchers have experimented with using MPI between the nodes of a clustered machine and OpenMP within a node to manage the use of parallel processing. Unfortunately, very few tools are available for doing an integrated analysis of an MPI/OpenMP program. KAI Software, Pallas GmbH, and the US Department of Energy have partnered to build such a tool, VGV. VGV is designed for scalable performance analysis, that is, to make the performance analysis process qualitatively the same for small cluster machines as for the largest ASCI systems. This paper describes VGV and gives a flavor of how to find performance problems using it.
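
For readers unfamiliar with the hybrid structure VGV targets, the sketch below (our illustration, not from the paper) shows its typical shape: MPI ranks across nodes, OpenMP threads within each rank.

/* Illustrative sketch, not from the paper: the hybrid MPI/OpenMP
 * program structure that an integrated tool like VGV profiles. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, provided;
    double local = 0.0, global = 0.0;

    /* Request an MPI library that tolerates threaded ranks. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Node level: OpenMP threads share the rank's portion of work. */
    #pragma omp parallel for reduction(+ : local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i);

    /* Cluster level: ranks combine their partial results via MPI. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}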


International Parallel and Distributed Processing Symposium | 2002

VGV: supporting performance analysis of object-oriented mixed MPI/OpenMP parallel applications

Seon Wook Kim; Michael Voss; Bob Kuhn; Hans-Christian Hoppe; Wolfgang E. Nagel

In the past, developers of parallel science and engineering applications have been reluctant to embrace object-oriented languages due to the high abstraction penalties they incur at runtime. However, recent advances in design techniques and compiler technology have allowed C++ to emerge as a practical choice for these applications. Due to the lag in acceptance of these languages for parallel computing, there has also been a lag in commercial tool support for this application domain. This paper presents extensions made to the Vampir/GuideView (VGV) tool set to support performance analysis of object-oriented mixed MPI/OpenMP parallel applications. First, a performance data abstraction for such applications is proposed. Next, the implementation of this abstraction within the VGV tool set is presented. Finally, our tool set is demonstrated by applying it to two parallel applications. VGV is the first commercial tool to provide performance analysis facilities for these types of applications.


Languages, Compilers, and Tools for Embedded Systems | 2011

Scheduling of stream-based real-time applications for heterogeneous systems

Bruno Virlet; Xing Zhou; Jean Pierre Giacalone; Bob Kuhn; María Jesús Garzarán; David A. Padua

Designers of mobile devices face the challenge of providing the user with more processing power while increasing battery life. Heterogeneous systems offer some opportunities to meet this challenge. In a heterogeneous system, multiple classes of processors with dynamic voltage and frequency scaling (DVFS) functionality are embedded in the mobile device. With such a system it is possible to maximize performance while minimizing power consumption if tasks are mapped to the class of processors on which they execute most efficiently. In this paper, we study the scheduling of tasks in a real-time context on a heterogeneous system-on-chip with DVFS functionality. We develop a heuristic scheduling algorithm that minimizes energy while still meeting deadlines. We introduce the concept of cross-platform task heterogeneity and model sets of tasks to conduct extensive experiments. The experimental results show that our heuristic has a much higher success rate than existing state-of-the-art heuristics and derives solutions whose energy requirements are close to those of the optimal solution.
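
As a toy illustration of the time/energy trade-off such a scheduler navigates (not the paper's heuristic, which handles whole task sets and cross-platform task heterogeneity), the sketch below picks, for a single task, the operating point with the lowest energy that still meets the deadline; the operating points and cycle count are made-up numbers.

/* Toy sketch, not the paper's algorithm: choose the processor class
 * and DVFS frequency with the lowest energy that meets the deadline.
 * Ignores cross-platform heterogeneity (the same task may need a
 * different cycle count on each processor class). */
#include <stdio.h>

struct op_point { const char *cls; double freq_ghz, power_w; };

int main(void) {
    /* Hypothetical operating points for two processor classes. */
    struct op_point pts[] = {
        { "big",    2.0, 4.0 }, { "big",    1.0, 1.5 },
        { "little", 1.2, 0.8 }, { "little", 0.6, 0.3 },
    };
    double cycles = 2.0e9;      /* task's work, in cycles         */
    double deadline_s = 1.5;    /* real-time deadline, in seconds */
    int best = -1;
    double best_energy = 1e30;

    for (int i = 0; i < 4; i++) {
        double t = cycles / (pts[i].freq_ghz * 1e9);  /* exec time  */
        double e = t * pts[i].power_w;                /* energy (J) */
        if (t <= deadline_s && e < best_energy) {
            best_energy = e;
            best = i;
        }
    }
    if (best >= 0)
        printf("run on %s @ %.1f GHz: %.2f J\n",
               pts[best].cls, pts[best].freq_ghz, best_energy);
    else
        printf("no operating point meets the deadline\n");
    return 0;
}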


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2014

Vector seeker: a tool for finding vector potential

G. Carl Evans; Seth Abraham; Bob Kuhn; David A. Padua

The importance of vector instructions is growing in modern computers. Almost all architectures include some form of vector instructions, and the tendency is for the width of these instructions to grow with newer designs. To take advantage of the performance that these systems offer, it is imperative that programs use these instructions, and yet they do not always do so. The tools for exploiting these extensions require programmer assistance, either through hand coding or through hints to the compiler. We present Vector Seeker, a tool to help investigate vector parallelism in existing codes. Vector Seeker runs alongside the execution of a program to optimistically measure the vector parallelism that is present. Besides describing Vector Seeker, the paper also evaluates its effectiveness using two applications from Petascale Application Collaboration Teams (PACT) and eight applications from Media Bench II. These results are compared to known results from manual vectorization studies. Finally, we use the tool to automatically analyze codes from Numerical Recipes and TSVC and compare the results with the automatic vectorization algorithms of Intel's ICC.
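
To show the property Vector Seeker measures, the sketch below (our illustration, not from the tool or the paper) contrasts a loop whose independent iterations give it vector potential with one whose loop-carried dependence blocks straightforward vectorization.

/* Illustrative sketch, not from the paper: two loops with very
 * different vector potential. */
#include <stdio.h>
#define N 1024

int main(void) {
    static float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = (float)i; c[i] = 2.0f * i; }

    /* Independent iterations: mappable directly to SIMD lanes. */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    /* Loop-carried dependence: iteration i needs iteration i-1, so
     * straightforward vectorization is blocked. */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];

    printf("a[%d] = %f\n", N - 1, a[N - 1]);
    return 0;
}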


IEEE International Symposium on Workload Characterization | 2015

PC Design, Use, and Purchase Relations

Al Rashid; Bob Kuhn; Bijan Arbab; David J. Kuck

For 25 years, industry-standard benchmarks have proliferated, attempting to approximate user activities. This has helped drive the success of PCs to commodity levels by characterizing apps for designers and offering performance information for users. However, the many new configurations in each PC release cycle often leave users unsure about how to choose among them. This paper takes a different approach, with tools based on new metrics to analyze the real usage of millions of people. Our goal is to develop a methodology for a deeper understanding of usage that can help designers satisfy users. These metrics demonstrate that usage differs consistently between high- and low-end CPU-based systems, regardless of why a user bought a given system. We outline how this data can be used to partition markets and make more effective hardware (hw) and software (sw) design decisions, tailoring systems for prospective markets.


Network and Parallel Computing | 2004

Productivity in HPC Clusters

Bob Kuhn

This presentation discusses HPC productivity in terms of: (1) effective architectures, (2) parallel programming models, and (3) applications development tools. The demands placed on HPC by owners and users of systems ranging from public research laboratories to private scientific and engineering companies enrich the topic with many competing technologies and approaches. Rather than expecting to eliminate each other in the short run, these HPC competitors should be learning from one another in order to stay in the race. Here we examine how these competing forces form the engine of improvement for overall HPC cost-effectiveness.

First, what will the effective architectures be? Moore's law is likely to still hold at the processor level over the next few years. Those words are, of course, typical from a semiconductor manufacturer. More important for this conference, our roadmap projects that this growth will accelerate over the next couple of years due to chip multiprocessors (CMPs). It has also been observed that cluster size has been growing at the same rate. Few people really know how successful Grid and utility computing will be, but virtual organizations may add another level of parallelism to the problem-solving process.

Second, on parallel programming models: hybrid parallelism, i.e. parallelism at multiple levels with multiple programming models, will be used in many applications. Hybrid parallelism may emerge because application speedup at each level can be multiplied by future architectures, but such applications can also adapt best to a wide variety of data and problems. Robustness of this type is needed to avoid the high software cost of converting or incrementally tuning existing programs. This leads to OpenMP, MPI, and Grid programming model investments.

Third, application tools are needed for programmer productivity. Frankly, integrated programming environments have not made much headway in HPC. Tools for debugging and performance analysis still define the basic needs. The term debugging is used advisedly, because even today there are limits to the scalability of debuggers in the amount of code and the number of processors. How can we break through? Perhaps through automated tools for finding bugs at the threading and process level. Performance analysis capability will similarly be outpaced by the growth of hardware parallelism unless progress is made.


Archive | 2006

Methods and apparatus to perform process placement for distributed applications

Hu Chen; Wenguang Chen; Bob Kuhn; Eric Huang

