Paul Stodghill

United States Department of Agriculture

Publication

Featured research published by Paul Stodghill.

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2003

Automated application-level checkpointing of MPI programs

Greg Bronevetsky; Daniel Marques; Keshav Pingali; Paul Stodghill

The running times of many computational science applications, such as protein folding using ab initio methods, are much longer than the mean time to failure of high-performance computing platforms. To run to completion, therefore, these applications must tolerate hardware failures. In this paper, we focus on the stopping failure model, in which a faulty process hangs and stops responding to the rest of the system. We argue that tolerating such faults is best done by an approach called application-level coordinated non-blocking checkpointing, and that existing fault-tolerance protocols in the literature are not suitable for implementing this approach. We then present a suitable protocol, which is implemented by a coordination layer that sits between the application program and the MPI library. We show how this protocol can be used with a precompiler that instruments C/MPI programs to save application and MPI library state. An advantage of our approach is that it is independent of the MPI implementation. We present experimental results suggesting that the overhead of using our system can be small.
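
A coordinated non-blocking protocol must handle messages that cross the recovery line formed by the processes' local checkpoints. The sketch below shows that bookkeeping in a minimal form. It assumes that each process numbers the intervals between its checkpoints as epochs and that every message carries the sender's epoch; this is an assumption of the sketch, not necessarily the paper's message format. It uses the late/early terminology common in the coordinated-checkpointing literature, and all names are hypothetical.

```python
from enum import Enum


class MessageKind(Enum):
    LATE = "late"          # sent before the sender's checkpoint, received after the receiver's
    EARLY = "early"        # sent after the sender's checkpoint, received before the receiver's
    INTRA_EPOCH = "intra"  # sent and received within the same epoch


def classify(send_epoch: int, recv_epoch: int) -> MessageKind:
    """Classify a message relative to the recovery line.

    Epoch k (a hypothetical numbering for this sketch) is the interval between
    a process's k-th and (k+1)-th local checkpoints.
    """
    if send_epoch < recv_epoch:
        # After restart the receiver expects this message again, but the sender
        # will not resend it, so it must be logged and replayed.
        return MessageKind.LATE
    if send_epoch > recv_epoch:
        # After restart the sender resends this message, but the receiver's
        # saved state already reflects it, so the resend must be suppressed.
        return MessageKind.EARLY
    return MessageKind.INTRA_EPOCH


print(classify(0, 1), classify(1, 0), classify(2, 2))
```

Only late and early messages need special treatment on recovery. Intra-epoch messages are simply re-executed along with the rest of the epoch.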

Proceedings of the IEEE | 2005

Is Search Really Necessary to Generate High-Performance BLAS?

Kamen Yotov; Xiaoming Li; Gang Ren; María Jesús Garzarán; David A. Padua; Keshav Pingali; Paul Stodghill

A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and loop unrolling factors. Traditional compilers use simple analytical models to compute these values. In contrast, library generators like ATLAS use global search over the space of parameter values by generating programs with many different combinations of parameter values, and running them on the actual hardware to determine which values give the best performance. It is widely believed that traditional model-driven optimization cannot compete with search-based empirical optimization because tractable analytical models cannot capture all the complexities of modern high-performance architectures, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the global search engine in ATLAS with a model-driven optimization engine and measured the relative performance of the code produced by the two systems on a variety of architectures. Since both systems use the same code generator, any differences in the performance of the code produced by the two systems can come only from differences in optimization parameter values. Our experiments show that model-driven optimization can be surprisingly effective and can generate code with performance comparable to that of code generated by ATLAS using global search.
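
The sketch below illustrates what a simple analytical model looks like. It picks the tile size NB as the largest value satisfying NB*NB + NB + 1 <= capacity, where capacity is the L1 data-cache size measured in matrix elements. The inequality encodes an assumed capacity requirement: one NB x NB tile, one row of another tile, and one accumulator element must fit in the cache. This is a simplified model chosen for illustration. The model in the paper may also account for effects such as cache-line size and associativity, and the function name is hypothetical.

```python
def largest_tile(capacity: int) -> int:
    """Largest NB (at least 1) with NB*NB + NB + 1 <= capacity.

    Assumed capacity model: an NB x NB tile, one row of length NB, and one
    accumulator element must all fit in the L1 data cache.
    """
    nb = 1
    while (nb + 1) ** 2 + (nb + 1) + 1 <= capacity:
        nb += 1
    return nb


# Hypothetical example: a 32 KiB L1 data cache holding 8-byte doubles.
print(largest_tile(32 * 1024 // 8))  # 63
```

A search-based generator would instead time many candidate NB values on the machine. The model reaches its answer without running any code.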

Journal of Bacteriology | 2010

Transcriptome Analysis of Pseudomonas syringae Identifies New Genes, Noncoding RNAs, and Antisense Activity

Melanie J. Filiatrault; Paul Stodghill; Philip A. Bronstein; Simon Moll; Magdalen Lindeberg; George Grills; Peter A. Schweitzer; Wei Wang; Gary P. Schroth; Shujun Luo; Irina Khrebtukova; Yong Yang; Theodore Thannhauser; Bronwyn G. Butcher; Samuel Cartinhour; David J. Schneider

To fully understand how bacteria respond to their environment, it is essential to assess genome-wide transcriptional activity. New high-throughput sequencing technologies make it possible to query the transcriptome of an organism in an efficient, unbiased manner. We applied a strand-specific method to sequence bacterial transcripts using Illumina's high-throughput sequencing technology. The resulting sequences were used to construct genome-wide transcriptional profiles. Novel bioinformatics analyses were developed and used in combination with proteomics data for the qualitative classification of transcriptional activity in defined regions. As expected, most transcriptional activity was consistent with predictions from the genome annotation. Importantly, we identified and confirmed transcriptional activity in areas of the genome inconsistent with the annotation and in unannotated regions. Further analyses revealed potential RpoN-dependent promoter sequences upstream of several noncoding RNAs (ncRNAs), suggesting a role for these ncRNAs in RpoN-dependent phenotypes. We were also able to validate a number of transcriptional start sites, many of which were consistent with predicted promoter motifs. Overall, our approach provides an efficient way to survey global transcriptional activity in bacteria and enables rapid discovery of specific areas in the genome that merit further investigation.
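
Strand-specific reads let an analysis distinguish sense transcription from antisense and unannotated activity. The toy sketch below labels each read against a gene annotation. It makes several simplifying assumptions:

- The read strand equals the transcript strand. Depending on the library protocol it may instead be reversed.
- Each read is reduced to a single coordinate.
- The gene coordinates and reads are hypothetical.

This is not the paper's analysis pipeline.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Gene(NamedTuple):
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def classify_read(pos: int, strand: str, genes: Iterable[Gene]) -> str:
    """Label a strand-specific read as sense, antisense, or unannotated."""
    covering = [g for g in genes if g.start <= pos <= g.end]
    if any(g.strand == strand for g in covering):
        return "sense"        # same strand as an annotated gene
    if covering:
        return "antisense"    # opposite strand of an annotated gene
    return "unannotated"      # no annotated gene at this position


genes = [Gene(100, 400, "+"), Gene(600, 900, "-")]        # hypothetical annotation
reads = [(150, "+"), (150, "-"), (500, "+"), (700, "-")]  # hypothetical reads
print(Counter(classify_read(p, s, genes) for p, s in reads))
```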

Symposium on Principles of Programming Languages | 1991

Dependence flow graphs: an algebraic approach to program dependencies

Keshav Pingali; Micah Beck; Richard Johnson; Mayan Moudgill; Paul Stodghill

The topic of intermediate languages for optimizing and parallelizing compilers has received much attention lately. In this paper, we argue that any good representation must have two crucial properties: first, the representation of a program must be a data structure that can be rapidly traversed to determine dependence information; second, the representation must be a program in its own right, with a parallel, local model of execution. We illustrate the importance of these points by examining algorithms for a standard optimization, global constant propagation. We discuss the problems in working with current representations. Then we propose a novel representation, called the dependence flow graph, which has both of the properties mentioned above. In this representation, dependencies are part of the computational model, in that there is an algebra of operators over dependencies. We show that this representation leads to a simple algorithm, based on abstract interpretation, for solving the constant propagation problem. Our algorithm is simpler than, and as fast as, the best known algorithms for the problem. An interesting feature of our representation is that it naturally incorporates the best aspects of many other representations, including continuation-passing style, data and program dependence graphs, static single assignment form, and dataflow program graphs.
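
Constant propagation by abstract interpretation evaluates the program over the usual three-level constant lattice: TOP, the individual constants, and BOTTOM. The sketch below shows only that lattice, its meet operation, and one abstract operator. It uses a common convention: TOP means "no information yet" and BOTTOM means "not a constant". The dependence flow graph itself is not modeled, and all names are hypothetical.

```python
class _Top:
    """Lattice top: no information yet (e.g. the definition has not been reached)."""
    def __repr__(self) -> str:
        return "TOP"


class _Bottom:
    """Lattice bottom: the value is known not to be a single constant."""
    def __repr__(self) -> str:
        return "BOTTOM"


TOP, BOTTOM = _Top(), _Bottom()


def meet(a, b):
    """Combine the values reaching a merge point."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    return a if a == b else BOTTOM


def abstract_add(a, b):
    """Abstract version of '+' over the lattice."""
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    if a is TOP or b is TOP:
        return TOP
    return a + b


# x = 3 on both branches of an if; y = x + 1 after the merge.
x = meet(3, 3)
print(x, abstract_add(x, 1))  # 3 4
print(meet(3, 4))             # BOTTOM: the branches disagree
print(meet(TOP, 4))           # 4: one branch not (yet) known to execute
```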

Conference on High Performance Computing (Supercomputing) | 2004

Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs

Martin Schulz; Greg Bronevetsky; Rohit Fernandes; Daniel Marques; Keshav Pingali; Paul Stodghill

The running times of many computational science applications are much longer than the mean time to failure of current high-performance computing platforms. To run to completion, such applications must tolerate hardware failures. Checkpoint-and-restart (CPR) is the most commonly used scheme for accomplishing this: the state of the computation is saved periodically on stable storage, and when a hardware failure is detected, the computation is restarted from the most recently saved state. Most automatic CPR schemes in the literature can be classified as system-level checkpointing schemes because they take core-dump-style snapshots of the computational state when all the processes are blocked at global barriers in the program. Unfortunately, a system that implements this style of checkpointing is tied to a particular platform; in addition, it cannot be used if there are no global barriers in the program. We are exploring an alternative called application-level, non-blocking checkpointing. In our approach, programs are transformed by a preprocessor so that they become self-checkpointing and self-restartable on any platform; there is also no assumption about the existence of global barriers in the code. In this paper, we describe our implementation of application-level, non-blocking checkpointing. We present experimental results on both a Windows cluster and a Compaq Alpha cluster, which show that the overheads introduced by our approach are small.
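
The basic checkpoint-and-restart cycle described above can be sketched for a single process. The sketch makes these assumptions:

- The program's live state is small enough to serialize as JSON.
- The state is saved every `every` iterations.
- Each save writes to a temporary file and then renames it, so a crash during saving cannot corrupt the last good checkpoint. On POSIX systems `os.replace` is atomic.
- A failure is simulated with a hypothetical `SimulatedFailure` exception.

This is not the paper's preprocessor or its multi-process non-blocking protocol. All names are hypothetical.

```python
import json
import os
import tempfile


class SimulatedFailure(Exception):
    """Hypothetical stand-in for a hardware failure."""


def load_checkpoint(path: str) -> dict:
    # Restart from the most recently saved state, if there is one.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"i": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temporary file in the same directory, then rename it over
    # the old checkpoint so the old one is replaced in a single step.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def sum_of_squares(n: int, path: str, every: int = 10, fail_at=None) -> int:
    """A self-checkpointing computation of 0**2 + 1**2 + ... + (n-1)**2."""
    state = load_checkpoint(path)
    while state["i"] < n:
        if fail_at is not None and state["i"] == fail_at:
            raise SimulatedFailure(state["i"])
        state["total"] += state["i"] ** 2
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
    try:
        sum_of_squares(100, ckpt, fail_at=25)  # fails after checkpointing i == 20
    except SimulatedFailure:
        pass
    print(sum_of_squares(100, ckpt))  # restarts from i == 20; prints 328350
```

Work done between the last checkpoint and the failure (iterations 20 to 24 here) is simply recomputed after restart.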

Languages and Compilers for Parallel Computing | 1994

Solving Alignment Using Elementary Linear Algebra

David Bau; Induprakas Kodukula; Vladimir Kotlyar; Keshav Pingali; Paul Stodghill

Data and computation alignment is an important part of compiling sequential programs to architectures with non-uniform memory access times. In this paper, we show that elementary matrix methods can be used to determine communication-free alignment of code and data. We also solve the problem of replicating read-only data to eliminate communication. Our matrix-based approach leads to algorithms which are simpler and faster than existing algorithms for the alignment problem.
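
One simplified way to see why elementary matrix methods suffice is to ignore offsets and use a one-dimensional processor space. Under those assumptions, communication-free alignment reduces to the linear equations C = D F, one per array reference. Here C is the computation alignment, D is the array's data alignment, and F is the reference's access matrix. Every solution is then a vector in the null space of the stacked constraint matrix. The zero vector is trivially a solution, but it places everything on one processor, so useful alignments are the nonzero ones. This is our own formulation under the stated assumptions, not necessarily the paper's algorithm. The example statement and the helper `null_space` are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def null_space(rows: Sequence[Sequence[int]], ncols: int) -> List[List[Fraction]]:
    """Basis of {x : M x = 0}, computed exactly by Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots: List[int] = []
    r = 0
    for c in range(ncols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: x_c is a free variable
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        x = [Fraction(0)] * ncols
        x[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            x[pc] = -m[row][free]  # solve each pivot variable in terms of x_free
        basis.append(x)
    return basis


# Statement A[i][j] = B[j][i] in a doubly nested loop, 1-D processor space.
# Unknowns x = (c1, c2, a1, a2, b1, b2): C = (c1, c2), D_A = (a1, a2), D_B = (b1, b2).
# C = D_A F_A with F_A = identity, and C = D_B F_B with F_B swapping i and j.
constraints = [
    [1, 0, -1, 0, 0, 0],  # c1 - a1 = 0
    [0, 1, 0, -1, 0, 0],  # c2 - a2 = 0
    [1, 0, 0, 0, 0, -1],  # c1 - b2 = 0
    [0, 1, 0, 0, -1, 0],  # c2 - b1 = 0
]
for v in null_space(constraints, 6):
    print([int(t) for t in v])
```

The basis vector (1, 0, 1, 0, 0, 1) maps iteration (i, j) to processor i, row i of A to processor i, and column i of B to processor i. Both references are therefore local.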

International Conference on Supercomputing | 2000

Next-generation generic programming and its application to sparse matrix computations

Nikolay Mateev; Keshav Pingali; Paul Stodghill; Vladimir Kotlyar

The contributions of this paper are the following. We introduce a new variety of generic programming in which algorithm implementors use a different API from data structure designers, with the gap between the two APIs bridged by restructuring compilers. One view of this approach is that it exploits restructuring compiler technology to perform a novel kind of template instantiation. We demonstrate the usefulness of this new generic programming technology by deploying it in a system that generates efficient sparse codes from high-level algorithms and specifications of sparse matrix formats. We argue that sparse matrix formats should be viewed as indexed-sequential access data structures (in the database sense), and show that appropriate abstractions of the index structure of common formats can be conveyed to a restructuring compiler through the type system of a modern language that supports inheritance and templates.
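
In the paper, a restructuring compiler bridges the algorithm-side API and the format-side API. The sketch below does something much simpler, as an illustration of the API separation only. Each hypothetical format class exposes an enumeration of its nonzeros, so that one generic matrix-vector product works on both a compressed sparse row (CSR) matrix and a coordinate (COO) matrix. The compiler-driven specialization and index-structure abstractions described in the paper are not reproduced, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Protocol, Tuple

Entry = Tuple[int, int, float]  # (row, column, value)


class SparseMatrix(Protocol):
    nrows: int

    def nonzeros(self) -> Iterator[Entry]: ...


@dataclass
class COOMatrix:
    nrows: int
    entries: List[Entry]  # unordered (row, col, value) triples

    def nonzeros(self) -> Iterator[Entry]:
        return iter(self.entries)


@dataclass
class CSRMatrix:
    nrows: int
    row_ptr: List[int]  # row i occupies positions row_ptr[i] .. row_ptr[i+1]-1
    col_idx: List[int]
    values: List[float]

    def nonzeros(self) -> Iterator[Entry]:
        for i in range(self.nrows):
            for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
                yield i, self.col_idx[k], self.values[k]


def matvec(a: SparseMatrix, x: List[float]) -> List[float]:
    """Generic y = A x, written only against the nonzeros() interface."""
    y = [0.0] * a.nrows
    for i, j, v in a.nonzeros():
        y[i] += v * x[j]
    return y


# The same 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in two formats.
csr = CSRMatrix(3, [0, 2, 3, 5], [0, 2, 2, 0, 1], [1.0, 2.0, 3.0, 4.0, 5.0])
coo = COOMatrix(3, [(2, 1, 5.0), (0, 0, 1.0), (1, 2, 3.0), (2, 0, 4.0), (0, 2, 2.0)])
print(matvec(csr, [1.0, 1.0, 1.0]), matvec(coo, [1.0, 1.0, 1.0]))  # both [3.0, 3.0, 9.0]
```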

Measurement and Modeling of Computer Systems | 2005

Automatic measurement of memory hierarchy parameters

Kamen Yotov; Keshav Pingali; Paul Stodghill

The running time of many applications is dominated by the cost of memory operations. To optimize such applications for a given platform, it is necessary to have a detailed knowledge of the memory hierarchy parameters of that platform. In practice, this information is poorly documented, if at all. Moreover, there is growing interest in self-tuning, autonomic software systems that can optimize themselves for different platforms; these systems must determine memory hierarchy parameters automatically without human intervention. One solution is to use micro-benchmarks to determine the parameters of the memory hierarchy. In this paper, we argue that existing micro-benchmarks are inadequate, and present novel micro-benchmarks for determining parameters of all levels of the memory hierarchy, including registers, all data caches and the translation lookaside buffer. We have implemented these micro-benchmarks in a tool called X-Ray that can be ported easily to new platforms. We present experimental results that show that X-Ray successfully determines memory hierarchy parameters on current platforms, and compare its accuracy with that of existing tools.
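
Many memory micro-benchmarks rely on pointer chasing. The benchmark follows a random single cycle through an array, so that each load's address depends on the previous load. This defeats prefetching and exposes latency as the working set grows past each cache level. The sketch below shows that general technique, not X-Ray's code. It builds the cycle with Sattolo's algorithm and times the chase.

The working-set sizes are arbitrary. In Python, interpreter overhead per iteration will likely mask differences between cache levels, so a real tool would implement the timed loop in C or assembly.

```python
import random
import time
from array import array
from typing import Tuple


def random_cycle(n: int, rng: random.Random) -> array:
    """Sattolo's algorithm: nxt[k] is the successor of k in one random n-cycle."""
    nxt = array("q", range(n))  # contiguous signed integers (at least 8 bytes each)
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def ns_per_load(nxt: array, steps: int) -> Tuple[float, int]:
    """Follow the chain for `steps` dependent loads; return (ns per load, final index)."""
    k = 0
    start = time.perf_counter()
    for _ in range(steps):
        k = nxt[k]  # the next address depends on the value just loaded
    elapsed = time.perf_counter() - start
    return elapsed * 1e9 / steps, k


if __name__ == "__main__":
    rng = random.Random(0)
    for kib in (16, 256, 4096):  # working-set sizes to probe
        chain = random_cycle(kib * 1024 // 8, rng)
        ns, _ = ns_per_load(chain, 200_000)
        print(f"{kib:>5} KiB: {ns:6.1f} ns/load")
```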

Journal of Bacteriology | 2011

Characterization of the Fur Regulon in Pseudomonas syringae pv. tomato DC3000

Bronwyn G. Butcher; Philip A. Bronstein; Christopher R. Myers; Paul Stodghill; James J. Bolton; Eric Markel; Melanie J. Filiatrault; Bryan Swingle; Ahmed Gaballa; John D. Helmann; David J. Schneider; Samuel Cartinhour

The plant pathogen Pseudomonas syringae pv. tomato DC3000 (DC3000) is found in a wide variety of environments and must monitor and respond to various environmental signals such as the availability of iron, an essential element for bacterial growth. An important regulator of iron homeostasis is Fur (ferric uptake regulator), and here we present the first study of the Fur regulon in DC3000. Using chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq), 312 chromosomal regions were highly enriched by coimmunoprecipitation with a C-terminally tagged Fur protein. Integration of these data with previous microarray and global transcriptome analyses allowed us to expand the putative DC3000 Fur regulon to include genes both repressed and activated in the presence of bioavailable iron. Using nonradioactive DNase I footprinting, we confirmed Fur binding in 41 regions, including upstream of 11 iron-repressed genes and the iron-activated genes encoding two bacterioferritins (PSPTO_0653 and PSPTO_4160), a ParA protein (PSPTO_0855), and a two-component system (TCS) (PSPTO_3382 to PSPTO_3380).

International Conference on Supercomputing | 2005

Think globally, search locally

Kamen Yotov; Keshav Pingali; Paul Stodghill

A key step in program optimization is the determination of optimal values for code optimization parameters such as cache tile sizes and loop unrolling factors. One approach, which is implemented in most compilers, is to use analytical models to determine these values. The other approach, used in library generators like ATLAS, is to perform a global empirical search over the space of parameter values. Neither approach is completely suitable for use in general-purpose compilers that must generate high-quality code for large programs running on complex architectures. Model-driven optimization may incur a performance penalty of 10-20% even for a relatively simple code like matrix multiplication. On the other hand, global search is not tractable for optimizing large programs for complex architectures because the optimization space is too large. In this paper, we advocate a methodology for generating high-performance code without increasing search time dramatically. Our methodology has three components: (i) modeling, (ii) local search, and (iii) model refinement. We demonstrate this methodology by using it to eliminate the performance gap between code produced by a model-driven version of ATLAS described by us in prior work, and code produced by the original ATLAS system using global search.
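
The "local search" component can be illustrated with a minimal sketch. Greedy hill climbing on one integer parameter starts from a model's predicted value and moves to a neighboring value only while the measured time improves. The abstract does not specify the paper's search strategy, so this simple scheme is an assumption. On a real machine, `measure` would compile, run, and time the generated code, with repeated runs to handle noise. Here it is a hypothetical stand-in, and all names are hypothetical.

```python
from typing import Callable, Optional, Tuple


def local_search(start: int, measure: Callable[[int], float], step: int = 1,
                 lo: int = 1, hi: Optional[int] = None) -> Tuple[int, float]:
    """Greedy hill climbing from a model-predicted starting value."""
    best, best_time = start, measure(start)
    while True:
        improved = False
        # Both neighbours of the current best are evaluated; the better one wins.
        for cand in (best - step, best + step):
            if cand < lo or (hi is not None and cand > hi):
                continue  # stay inside the legal parameter range
            t = measure(cand)
            if t < best_time:
                best, best_time, improved = cand, t, True
        if not improved:
            return best, best_time  # no neighbour is faster: a local optimum


# Hypothetical timing function whose true optimum (70) differs from the model's guess (63).
def toy_runtime(nb: int) -> float:
    return (nb - 70) ** 2 + 100.0


print(local_search(63, toy_runtime, hi=128))  # (70, 100.0)
```

Starting near the model's prediction keeps the number of measurements small compared with a global search over the whole parameter range.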