Michelle Mills Strout

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michelle Mills Strout is active.

Explore More

Publication

Featured researches published by Michelle Mills Strout.

ACM Transactions on Mathematical Software | 2008

OpenAD/F: A Modular Open-Source Tool for Automatic Differentiation of Fortran Codes

Jean Utke; Uwe Naumann; Mike Fagan; Nathan R. Tallent; Michelle Mills Strout; Patrick Heimbach; Chris Hill; Carl Wunsch

The Open/ADF tool allows the evaluation of derivatives of functions defined by a Fortran program. The derivative evaluation is performed by a Fortran code resulting from the analysis and transformation of the original program that defines the function of interest. Open/ADF has been designed with a particular emphasis on modularity, flexibility, and the use of open source components. While the code transformation follows the basic principles of automatic differentiation, the tool implements new algorithmic approaches at various levels, for example, for basic block preaccumulation and call graph reversal. Unlike most other automatic differentiation tools, Open/ADF uses components provided by the Open/AD framework, which supports a comparatively easy extension of the code transformations in a language-independent fashion. It uses code analysis results implemented in the OpenAnalysis component. The interface to the language-independent transformation engine is an XML-based format, specified through an XML schema. The implemented transformation algorithms allow efficient derivative computations using locally optimized cross-country sequences of vertex, edge, and face elimination steps. Specifically, for the generation of adjoint codes, Open/ADF supports various code reversal schemes with hierarchical checkpointing at the subroutine level. As an example from geophysical fluid dynamics, a nonlinear time-dependent scalable, yet simple, barotropic ocean model is considered. OpenAD/Fs reverse mode is applied to compute sensitivities of some of the models transport properties with respect to gridded fields such as bottom topography as independent (control) variables.

programming language design and implementation | 2003

Compile-time composition of run-time data and iteration reorderings

Michelle Mills Strout; Larry Carter; Jeanne Ferrante

Many important applications, such as those using sparse data structures, have memory reference patterns that are unknown at compile-time. Prior work has developed run-time reorderings of data and computation that enhance locality in such applications.This paper presents a compile-time framework that allows the explicit composition of run-time data and iteration-reordering transformations. Our framework builds on the iteration-reordering framework of Kelly and Pugh to represent the effects of a given composition. To motivate our extension, we show that new compositions of run-time reordering transformations can result in better performance on three benchmarks.We show how to express a number of run-time data and iteration-reordering transformations that focus on improving data locality. We also describe the space of possible run-time reordering transformations and how existing transformations fit within it. Since sparse tiling techniques are included in our framework, they become more generally applicable, both to a larger class of applications, and in their composition with other reordering transformations. Finally, within the presented framework data need be remapped only once at runtime for a given composition thus exhibiting one example of overhead reductions the framework can express.

ieee international conference on high performance computing data and analytics | 2004

Sparse Tiling for Stationary Iterative Methods

Michelle Mills Strout; Larry Carter; Jeanne Ferrante; Barbara Kreaseck

In modern computers, a program’s data locality can affect performance significantly. This paper details full sparse tiling, a run-time reordering transformation that improves the data locality for stationary iterative methods such as Gauss–Seidel operating on sparse matrices. In scientific applications such as finite element analysis, these iterative methods dominate the execution time. Full sparse tiling chooses a permutation of the rows and columns of the sparse matrix, and then an order of execution that achieves better data locality. We prove that full sparsetiled Gauss–Seidel generates a solution that is bitwise identical to traditional Gauss–Seidel on the permuted matrix. We also present measurements of the performance improvements and the overheads of full sparse tiling and of cache blocking for irregular grids, a related technique developed by Douglas et al.

international conference on parallel processing | 2006

Data-Flow Analysis for MPI Programs

Michelle Mills Strout; Barbara Kreaseck; Paul D. Hovland

Message passing via MPI is widely used in single-program, multiple-data (SPMD) parallel programs. Existing data-flow frameworks do not model the semantics of message-passing SPMD programs, which can result in less precise and even incorrect analysis results. We present a data-flow analysis framework for performing interprocedural analysis of message-passing SPMD programs. The framework is based on the MPI-ICFG representation, which is an interprocedural control-flow graph (ICFG) augmented with communication edges between possible send and receive pairs and partial context sensitivity. We show how to formulate nonseparable data-flow analyses within our framework using reaching constants as a canonical example. We also formulate and provide experimental results for the nonseparable analysis, activity analysis. Activity analysis is a domain-specific analysis used to reduce the computation and storage requirements for automatically differentiated MPI programs. Automatic differentiation is important for application domains such as climate modeling, electronic device simulation, oil reservoir simulation, medical treatment planning and computational economics to name a few. Our experimental results show that using the MPI-ICFG data-flow analysis framework improves the precision of activity analysis and as a result significantly reduces memory requirements for the automatically differentiated versions of a set of parallel benchmarks, including some of the NAS parallel benchmarks

conference on high performance computing (supercomputing) | 2007

Multi-level tiling: M for the price of one

DaeGon Kim; Lakshminarayanan Renganarayanan; Dave Rostron; Sanjay V. Rajopadhye; Michelle Mills Strout

Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. High-performance implementations use multiple levels of tiling to exploit the hierarchy of parallelism and cache/register locality. Efficient generation of multi-level tiled code is essential for effective use of multi-level tiling. Parameterized tiled code, where tile sizes are not fixed but left as symbolic parameters can enable several dynamic and run-time optimizations. Previous solutions to multi-level tiled loop generation are limited to the case where tile sizes are fixed at compile time. We present an algorithm that can generate multi-level parameterized tiled loops at the same cost as generating single-level tiled loops. The efficiency of our method is demonstrated on several benchmarks. We also present a method-useful in register tiling-for separating partial and full tiles at any arbitrary level of tiling. The code generator we have implemented is available as an open source tool.

Proceedings of the 2004 workshop on Memory system performance | 2004

Metrics and models for reordering transformations

Michelle Mills Strout; Paul D. Hovland

Irregular applications frequently exhibit poor performance on contemporary computer architectures, in large part because of their inefficient use of the memory hierarchy. Run-time data, and iteration-reordering transformations have been shown to improve the locality and therefore the performance of irregular benchmarks. This paper describes models for determining which combination of run-time data- and iteration-reordering heuristics will result in the best performance for a given dataset. We propose that the data- and iteration-reordering transformations be viewed as approximating minimal linear arrangements on two separate hypergraphs: a spatial locality hypergraph and a temporal locality hypergraph. Our results measure the efficacy of locality metrics based on these hypergraphs in guiding the selection of data-and iteration-reordering heuristics. We also introduce new iteration- and data-reordering heuristics based on the hypergraph models that result in better performance than do previous heuristics.

ieee international conference on high performance computing data and analytics | 1999

Using Apples to Schedule Simple SARA on the Computational Grid

Alan Su; Francine Berman; Richard Wolski; Michelle Mills Strout

Computational Grids, composed of distributed and often heterogeneous computing resources, have become the platform of choice for many performance-challenged applications. Proof-of-concept implementations have demonstrated that both Grids and clustered environments have the potential to provide great performance benefits to distributed resource-intensive applications. However, at the present time, careful staging, scheduling, and/or reservation of resources is essential in order for applications to achieve performance in Grid environments. If Computational Grids and shared computational clusters are to achieve their full potential, it must be possible for users to achieve application performance at any given time, and when other users are present in the system. In this paper, we describe the initial development of an AppLeS (application-level scheduler) for the resource selection portion of the Synthetic Aperture Radar Atlas (SARA) application, developed at the Jet Propulsion Laboratory (JPL) and the San Diego Supercomputer Center (SDSC). We demonstrate the effectiveness of application scheduling for distributed data applications such as SARA by providing a performance-efficient strategy for retrieving SARA data files in everyday, multiple-user Grid environments.

workshop on program analysis for software tools and engineering | 2005

Representation-independent program analysis

Michelle Mills Strout; John M. Mellor-Crummey; Paul D. Hovland

Program analysis has many applications in software engineering and high-performance computation, such as program understanding, debugging, testing, reverse engineering, and optimization. A ubiquitous compiler infrastructure does not exist; therefore, program analysis is essentially reimplemented for each compiler infrastructure. The goal of the OpenAnalysis toolkit is to separate analysis from the intermediate representation (IR) in a way that allows the orthogonal development of compiler infrastructures and program analysis. Separation of analysis from specific IRs will allow faster development of compiler infrastructures, the ability to share and compare analysis implementations, and in general quicker breakthroughs and evolution in the area of program analysis. This paper presents how we are separating analysis implementations from IRs with analysis-specific, IR-independent interfaces. Analysis-specific IR interfaces for alias/pointer analysis algorithms and reaching constants illustrate that an IR interface designed for language dependence is capable of providing enough information to support the implementation of a broad range of analysis algorithms and also represent constructs within many imperative programming languages.

international conference on computational science | 2001

Rescheduling for Locality in Sparse Matrix Computations

Michelle Mills Strout; Larry Carter; Jeanne Ferrante

In modern computer architecture the use of memory hierarchies causes a programs data locality to directly affect performance. Data locality occurs when a piece of data is still in a cache upon reuse. For dense matrix computations, loop transformations can be used to improve data locality. However, sparse matrix computations have nonaffine loop bounds and indirect memory references which prohibit the use of compile time loop transformations. This paper describes an algorithm to tile at runtime called serial sparse tiling. We test a runtime tiled version of sparse Gauss-Seidel on 4 different architectures where it exhibits speedups of up to 2.7. The paper also gives a static model for determining tile size and outlines how overhead affects the overall speedup.

languages and compilers for parallel computing | 2002

Combining performance aspects of irregular gauss-seidel via sparse tiling

Michelle Mills Strout; Larry Carter; Jeanne Ferrante; Jonathan Freeman; Barbara Kreaseck

Finite Element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.

Explore More