Paul M. Petersen
Intel
Publications
Featured research published by Paul M. Petersen.
Workshop on I/O in Parallel and Distributed Systems | 2006
Utpal Banerjee; Brian E. Bliss; Zhiqiang Ma; Paul M. Petersen
This paper presents a rigorous mathematical theory for the detection of data races in threaded programs. After establishing a framework of precise definitions and theorems, it develops four algorithms aimed at detecting at least one race when only a limited history of previous memory accesses is kept. The algorithms demonstrate the tradeoff between the amount of access history kept and the kinds of data races that can be detected. One of the four is a reformulation of a previously known algorithm; the other three are new, and one of the new algorithms is used in the Intel® Thread Checker tool.
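The abstract does not reproduce the four algorithms, but the access-history tradeoff can be sketched in C. The toy detector below is invented for illustration (check_access, the one-entry History table, and the hashing are not from the paper): it keeps only the single most recent access per address, the limited-history extreme, and unlike a real detector it consults no synchronization information, so it over-reports.

/* Toy limited-history conflict check (illustrative only, not one of the
 * paper's four algorithms): each address keeps just its most recent
 * access. A conflict is flagged when a new access touches the same
 * address from a different thread and at least one access is a write.
 * A real detector also consults synchronization (happens-before) before
 * reporting, which this sketch omits. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { READ, WRITE } AccessType;

typedef struct {
    int thread;        /* id of the last thread to touch the address */
    AccessType type;   /* kind of that last access */
    bool valid;        /* has this slot been filled yet? */
} History;             /* one-entry history: the limited-history extreme */

#define SLOTS 1024
static History table[SLOTS];

static unsigned slot_for(const void *addr) {
    return (unsigned)(((uintptr_t)addr >> 3) % SLOTS);
}

/* Record an access; return true if it conflicts with the stored one. */
bool check_access(const void *addr, int thread, AccessType type) {
    History *h = &table[slot_for(addr)];
    bool conflict = h->valid
                 && h->thread != thread
                 && (h->type == WRITE || type == WRITE);
    h->thread = thread;   /* overwrite: older accesses are forgotten,    */
    h->type = type;       /* which is why some races become undetectable */
    h->valid = true;
    return conflict;
}

int main(void) {
    int x = 0;
    check_access(&x, 1, WRITE);          /* thread 1 writes x */
    if (check_access(&x, 2, READ))       /* thread 2 reads x  */
        puts("potential data race on x");
    return 0;
}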
Architectural Support for Programming Languages and Operating Systems | 2006
Paul Sack; Brian E. Bliss; Zhiqiang Ma; Paul M. Petersen; Josep Torrellas
Debugging data races in parallel applications is a difficult task. Error-causing data races may appear to vanish due to changes in an application's optimization level, thread scheduling, whether or not a debugger is used, and other effects. Further, many race conditions cause incorrect program behavior only in rare scenarios and may lie undetected during software testing. Tools exist today that do a decent job of finding data races in multi-threaded applications. Some data-race detection tools are very efficient and can detect data races with less than a 2x performance penalty. Most such tools, however, do not provide enough information to the user, require recompilation, or impose other usage restrictions. Other tools, such as the one considered in this paper (Intel's Thread Checker), provide users with plenty of useful information and can be used with any application binary, but have high overheads, often over 200x. The goal of this paper is to speed up Thread Checker by filtering out the vast majority of memory references that are highly unlikely to be involved in data races. In our work, we develop filters that remove 90-100% of all memory references from the data-race detection algorithm, resulting in speedups of 2.2-5.5x, with an average improvement of 3.3x.
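The paper's actual filters are not given here, but one cheap filter of the general kind it evaluates can be sketched in C: references to a thread's own stack are thread-private and cannot race, so they need not reach the expensive detection algorithm. The names (ThreadInfo, needs_race_check) and the stack-bounds bookkeeping are hypothetical, not taken from the paper.

/* Illustrative sketch of a thread-private-memory filter: accesses that
 * fall inside the current thread's own stack cannot participate in a
 * data race, so they are dropped before the detection algorithm runs. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uintptr_t stack_lo;  /* lowest address of this thread's stack  */
    uintptr_t stack_hi;  /* highest address of this thread's stack */
} ThreadInfo;

/* Returns true if the reference must be forwarded to the race detector;
 * false means it was filtered out as provably thread-private. */
bool needs_race_check(const ThreadInfo *t, const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    if (a >= t->stack_lo && a < t->stack_hi)
        return false;    /* own-stack access: filter it out */
    return true;         /* heap/global access: run full detection */
}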
International Workshop on OpenMP | 2003
Paul M. Petersen; Sanjiv Shah
The Intel® Thread Checker is the second incarnation of the projection-based dynamic analysis technology first introduced with Assure, which greatly simplifies application development with OpenMP. The ability to dynamically analyze multiple sibling OpenMP teams extends the previous Assure support and complements previous work on static analysis. In addition, binary instrumentation capabilities allow detection of thread-safety violations in the system and third-party libraries that most applications use.
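As a hedged illustration (not an example from the paper), the classic kind of OpenMP defect such a dynamic analysis tool reports looks like this: a shared accumulator updated without a reduction or atomic.

/* A textbook OpenMP data race: `sum` is shared by default and updated
 * without a reduction or atomic, so threads can interleave the
 * read-modify-write. Compile with OpenMP enabled (e.g. -fopenmp or
 * /Qopenmp, depending on the compiler). */
#include <stdio.h>

int main(void) {
    int sum = 0;
    #pragma omp parallel for        /* racy: should be reduction(+:sum) */
    for (int i = 0; i < 1000; i++)
        sum += i;                   /* unsynchronized shared update */
    printf("sum = %d\n", sum);      /* often wrong with multiple threads */
    return 0;
}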
High-Level Parallel Programming Models and Supportive Environments | 2003
Xinmin Tian; Milind Girkar; Sanjiv Shah; Douglas R. Armstrong; Ernesto Su; Paul M. Petersen
Exploiting thread-level parallelism (TLP) is a promising way to improve the performance of applications with the advent of general-purpose, cost-effective uniprocessor and shared-memory multiprocessor systems. In this paper, we describe the OpenMP* implementation in the Intel® C++ and Fortran compilers for Intel platforms. We present our major design considerations and decisions in the Intel compiler for generating efficient multithreaded code guided by OpenMP directives and pragmas. We describe several transformation phases in the compiler for OpenMP* parallelization. In addition to compiler support, the OpenMP runtime library is a critical part of the Intel compiler. We present runtime techniques developed in the Intel OpenMP runtime library for exploiting thread-level parallelism as well as integrating the OpenMP support with other forms of threading, termed sibling parallelism. The performance results for a set of benchmarks show good speedups over well-optimized serial code on Intel® Pentium- and Itanium-processor based systems.
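To make the directive-driven code generation concrete, here is a minimal OpenMP loop of the kind such a compiler transforms (an illustrative sketch, not code from the paper). Conceptually, the compiler outlines the loop body into a separate routine and emits a call into the OpenMP runtime library, which forks the thread team, partitions the iterations according to the schedule clause, and joins at the implicit barrier.

/* Source-level input of the kind the paper's transformation phases
 * handle: the directive tells the compiler how to generate the
 * multithreaded code; the runtime library manages the thread team. */
#include <stdio.h>

#define N 1000
static double a[N];

int main(void) {
    #pragma omp parallel for schedule(static)  /* directive guides codegen */
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;                        /* iterations split across team */
    printf("a[%d] = %f\n", N - 1, a[N - 1]);
    return 0;
}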
Concurrency and Computation: Practice and Experience | 2000
Bob Kuhn; Paul M. Petersen; Eamonn O'Toole
When comparing OpenMP to other parallel programming models, it is easier to choose between OpenMP and MPI than between OpenMP and POSIX Threads (Pthreads). With languages like C and C++, developers have frequently chosen Pthreads to incorporate parallelism in applications. Few developers are currently using OpenMP C/C++, but they should. We show that converting Genehunter, a hand-threaded C program, to OpenMP increases robustness without sacrificing performance. It is also a good case study, as it highlights several issues that are important in understanding how OpenMP uses threads. Genehunter is a genetics program which analyzes DNA assays from members of a family tree in which a disease is present in certain members and not in others, in an attempt to identify the gene most likely to cause the disease. This problem is called linkage analysis. The same sections of Genehunter were parallelized first by hand-threading and then with OpenMP on Compaq Alpha Tru64 systems. We present examples using both methods and illustrate the tools that proved useful in the process. Our basic conclusion is that, although we could express the parallelism using either Pthreads or OpenMP, it was easier to express it at OpenMP's higher level of abstraction. OpenMP allowed enough control to express the parallelism without exposing the implementation details. Also, because the parallelism is specified at a higher level with OpenMP, the tools available to assist in the construction of correct and efficient programs provide more useful information than the equivalent tools for hand-threaded programs. The following concepts are presented: differences between coding styles for OpenMP and Pthreads; data scoping specification for correct parallel programming; adapting a signal-based exception mechanism to a parallel program; OpenMP tools, namely debuggers (Ladebug, TotalView, and Assure) and profilers (Hiprof and GuideView); and performance tuning with memory allocation, synchronization, and scheduling. A few important topics in C/C++ programming style that Genehunter does not cover will be discussed separately: interfacing a GUI team of threads with an OpenMP compute team, and coordinating data structures with scheduling.
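To illustrate the abstraction gap the authors describe (this is a generic sketch, not Genehunter code), here is the same loop parallelized both ways in C. The Pthreads version must marshal arguments, partition the iterations, and create and join threads by hand; the OpenMP version leaves all of that to the runtime.

/* Same computation, two coding styles: explicit Pthreads management
 * versus a single OpenMP directive. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4
static double a[N];

typedef struct { int lo, hi; } Range;   /* hand-marshalled argument */

static void *worker(void *arg) {
    Range *r = (Range *)arg;
    for (int i = r->lo; i < r->hi; i++)
        a[i] = i * 0.5;
    return NULL;
}

static void pthreads_version(void) {
    pthread_t tid[NTHREADS];
    Range ranges[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        ranges[t].lo = t * (N / NTHREADS);       /* hand partitioning */
        ranges[t].hi = (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, worker, &ranges[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);              /* hand joining */
}

static void openmp_version(void) {
    #pragma omp parallel for     /* partitioning and joining are implicit */
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;
}

int main(void) {
    pthreads_version();
    openmp_version();
    printf("a[42] = %f\n", a[42]);
    return 0;
}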
International Workshop on OpenMP | 2001
Jay Hoeflinger; Bob Kuhn; Wolfgang E. Nagel; Paul M. Petersen; Hrabri Rajic; Sanjiv Shah; Jeffrey S. Vetter; Michael Voss; Renee Woo
As cluster computing has grown, so has its use for large scientific calculations. Recently, many researchers have experimented with using MPI between the nodes of a clustered machine and OpenMP within a node to manage the use of parallel processing. Unfortunately, very few tools are available for an integrated analysis of an MPI/OpenMP program. KAI Software, Pallas GmbH, and the US Department of Energy have partnered to build such a tool, VGV. VGV is designed for scalable performance analysis, that is, to make the performance analysis process qualitatively the same for small cluster machines as it is for the largest ASCI systems. This paper describes VGV and gives a flavor of how to find performance problems using it.
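For context, a minimal skeleton of the hybrid MPI/OpenMP style such a tool analyzes might look like the following (an illustrative sketch, unrelated to VGV itself): MPI distributes work across the nodes while OpenMP parallelizes within each node.

/* Hybrid parallelism skeleton: inter-node via MPI, intra-node via
 * OpenMP. Compile with an MPI wrapper (e.g. mpicc) and OpenMP enabled. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)   /* intra-node threads  */
    for (int i = rank; i < 1000000; i += nprocs)  /* inter-node cyclic partition */
        local += 1.0 / (i + 1);

    double total = 0.0;                           /* combine node results */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total = %f\n", total);
    MPI_Finalize();
    return 0;
}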
Archive | 1997
David K. Poulsen; Paul M. Petersen; Sanjiv Shah
Archive | 1999
Paul M. Petersen; Flint Pellett
Archive | 2011
Richard A. Hankins; Gautham N. Chinya; Hong Wang; Shivnandan D. Kaushik; Bryant Bigbee; John Paul Shen; Trung A. Diep; Xiang Zou; Baiju V. Patel; Paul M. Petersen; Sanjiv Shah; Ryan N. Rakvic; Prashant Sethi
Concurrency and Computation: Practice and Experience | 2000
Sanjiv Shah; Grant E. Haab; Paul M. Petersen; Joe Throop