Kathryn Mohror
Portland State University
Publication
Featured research published by Kathryn Mohror.
ieee international conference on high performance computing data and analytics | 2009
Kathryn Mohror; Karen L. Karavanic
Event traces are required to correctly diagnose a number of performance problems that arise on today's highly parallel systems. Unfortunately, the collection of event traces can produce a large volume of data that is difficult, or even impossible, to store and analyze. One approach for compressing a trace is to identify repeating trace patterns and retain only one representative of each pattern. However, determining the similarity of sections of traces, i.e., identifying patterns, is not straightforward. In this paper, we investigate pattern-based methods for reducing traces that will be used for performance analysis. We evaluate the different methods against several criteria, including size reduction, introduced error, and retention of performance trends, using both benchmarks with carefully chosen performance behaviors and a real application.
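The idea of keeping one representative per repeating pattern can be illustrated with a minimal sketch. This is not the method evaluated in the paper: the segment layout, the `reduce_trace` helper, and the `event_sequence` similarity rule (two segments match when their ordered event names are equal, ignoring durations) are all assumptions made for illustration.

```python
def reduce_trace(segments, signature):
    """Keep one representative segment per pattern.

    segments:  list of trace segments, each a list of (event_name, duration)
    signature: function mapping a segment to a hashable pattern key
    Returns (representatives, pattern_ids); pattern_ids[i] is the index in
    representatives of the pattern that segment i matched.
    """
    representatives = []
    index_of = {}      # pattern key -> position in representatives
    pattern_ids = []
    for seg in segments:
        key = signature(seg)
        if key not in index_of:
            # First time this pattern is seen: retain the segment itself.
            index_of[key] = len(representatives)
            representatives.append(seg)
        pattern_ids.append(index_of[key])
    return representatives, pattern_ids


def event_sequence(segment):
    # Hypothetical similarity rule: same ordered event names, durations ignored.
    return tuple(name for name, _ in segment)


# Made-up example trace: four segments, two distinct event patterns.
trace = [
    [("compute", 10.0), ("MPI_Send", 1.0)],
    [("compute", 10.4), ("MPI_Send", 1.1)],
    [("compute", 10.0), ("MPI_Recv", 5.0)],
    [("compute", 9.8), ("MPI_Send", 0.9)],
]
reps, ids = reduce_trace(trace, event_sequence)
```

Because this rule discards durations, the retained representative only approximates the timing of the segments it replaces. That trade-off between size reduction and introduced error is what the evaluation criteria above measure.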
conference on high performance computing (supercomputing) | 2005
Karen L. Karavanic; John May; Kathryn Mohror; Brian Miller; Kevin A. Huck; Rashawn L. Knapp; Brian Pugh
PerfTrack is a data store and interface for managing performance data from large-scale parallel applications. Data collected in different locations and formats can be compared and viewed in a single performance analysis session. The underlying data store used in PerfTrack is implemented with a database management system (DBMS). PerfTrack includes interfaces to the data store and scripts for automatically collecting data describing each experiment, such as build and platform details. We have implemented a prototype of PerfTrack that can use Oracle or PostgreSQL for the data store. We demonstrate the prototype's functionality with three case studies: one is a comparative study of an ASC Purple benchmark on high-end Linux and AIX platforms; the second is a parameter study conducted at Lawrence Livermore National Laboratory (LLNL) on two high-end platforms, a 128-node cluster of IBM Power 4 processors and BlueGene/L; the third demonstrates incorporating performance data from the Paradyn Parallel Performance Tool into an existing PerfTrack data store.
international parallel and distributed processing symposium | 2017
Ivo Jimenez; Michael A. Sevilla; Noah Watkins; Carlos Maltzahn; Jay F. Lofstead; Kathryn Mohror; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau
Independent validation of experimental results in the field of systems research is a challenging task, mainly due to differences in software and hardware in computational environments. Recreating an environment that resembles the original is difficult and time-consuming. In this paper we introduce _Popper_, a convention based on a set of modern open source software (OSS) development principles for generating reproducible scientific publications. Concretely, we make the case for treating an article as an OSS project following a DevOps approach and applying software engineering best practices to manage its associated artifacts and maintain the reproducibility of its findings. Popper leverages existing cloud-computing infrastructure and DevOps tools to produce academic articles that are easy to validate and extend. We present a use case that illustrates the usefulness of this approach. We show how, by following the _Popper_ convention, reviewers and researchers can quickly get to the point of getting results without relying on the original authors' intervention.
international parallel and distributed processing symposium | 2005
John J. Hoffman; Andrew Byrd; Kathryn Mohror; Karen L. Karavanic
This paper presents PPerfGrid, a tool that addresses the challenges involved in the exchange of heterogeneous parallel computing performance data. Parallel computing performance data exists in a wide variety of different schemas and formats, from basic text files to relational databases to XML, and it is stored on geographically dispersed host systems of various platforms. PPerfGrid uses grid services to address these challenges. PPerfGrid exposes application and execution semantic objects as grid services and publishes their location in a registry. PPerfGrid clients access this registry, locate the PPerfGrid sites with performance data they are interested in, and bind to a set of grid services that represent this data. This set of application and execution grid services provides a uniform, virtual view of the data available in a particular PPerfGrid session. PPerfGrid addresses scalability by allowing specific questions to be asked about a data store, thereby narrowing the scope of the data returned to a client. In addition, by using a grid services approach, the application and execution grid services involved in a particular query can be dynamically distributed across several hosts, thereby taking advantage of parallelism and improving scalability. We describe our PPerfGrid prototype and include data from preliminary prototype performance tests.
high performance computing and communications | 2007
Kathryn Mohror; Karen L. Karavanic
Although event tracing of parallel applications offers highly detailed performance information, tracing on current leading-edge systems may lead to unacceptable perturbation of the target program and unmanageably large trace files. High-end systems of the near future promise even greater scalability challenges. Development of more scalable approaches requires a detailed understanding of the interactions between current approaches and high-end runtime environments. In this paper we present the results of studies that examine several sources of overhead related to tracing: instrumentation, differing trace buffer sizes, periodic buffer flushes to disk, system changes, and increasing numbers of processors in the target application. As expected, the overhead of instrumentation correlates strongly with the number of events; however, our results indicate that the contribution of writing the trace buffer increases with increasing numbers of processors. We include evidence that the total overhead of tracing is sensitive to the underlying file system.
acm sigplan symposium on principles and practice of parallel programming | 2007
Kathryn Mohror; Karen L. Karavanic
Our goal in this work was to identify and quantify the overheads of tracing parallel applications. We investigate several different sources of overhead related to tracing: trace instrumentation, periodic writing of trace files to disk, differing trace buffer sizes, system changes, and increasing numbers of processors in the target application. We encountered overheads as large as 26.7% for writing the trace file to disk. We found that buffer sizes can make a difference in the overheads, and that differences in system software can also contribute to the level of the perturbation. Our results show that the overhead of instrumentation correlates strongly with the number of events, while the overhead of writing the trace buffer increases with increasing numbers of processors.
conference on high performance computing (supercomputing) | 2004
Kathryn Mohror; Karen L. Karavanic
Programmers of message-passing codes for clusters of workstations face a daunting challenge in understanding the performance bottlenecks of their applications. This is largely due to the vast amount of performance data that is collected, and the time and expertise necessary to use traditional parallel performance tools to analyze that data. This paper reports on our recent efforts developing a performance tool for MPI applications on Linux clusters. Our target MPI implementations were LAM/MPI and MPICH2, both of which support portions of the MPI-2 Standard. We started with an existing performance tool and added support for non-shared file systems, MPI-2 one-sided communications, dynamic process creation, and MPI Object naming. We present results using the enhanced version of the tool to examine the performance of several applications. We describe a new performance tool benchmark suite we have developed, PPerfMark, and present results for the benchmark using the enhanced tool.
international parallel and distributed processing symposium | 2017
Teng Wang; Adam Moody; Yue Zhu; Kathryn Mohror; Kento Sato; Tanzima Islam; Weikuan Yu
Distributed burst buffers are a promising storage architecture for handling I/O workloads for exascale computing. Their aggregate storage bandwidth grows linearly with system node count. However, although scientific applications can achieve scalable write bandwidth by having each process write to its node-local burst buffer, metadata challenges remain formidable, especially for files shared across many processes. This is due to the need to track and organize file segments across the distributed burst buffers in a global index. Because this global index can be accessed concurrently by thousands or more processes in a scientific application, the scalability of metadata management is a severe performance-limiting factor. In this paper, we propose MetaKV: a key-value store that provides fast and scalable metadata management for HPC metadata workloads on distributed burst buffers. MetaKV complements the functionality of an existing key-value store with specialized metadata services that efficiently handle bursty and concurrent metadata workloads: compressed storage management, supervised block clustering, and log-ring based collective message reduction. Our experiments demonstrate that MetaKV outperforms the state-of-the-art key-value stores by a significant margin. It improves put and get metadata operations by as much as 2.66× and 6.29×, respectively, and the benefits of MetaKV increase with increasing metadata workload demand.
parallel computing | 2012
Kathryn Mohror; Karen L. Karavanic
Accurate performance analysis of high-end systems requires event-based traces to correctly identify the root cause of a number of the complex performance problems that arise on these highly parallel systems. These high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Unfortunately, the collection of event-based data presents scalability challenges itself: the large volume of collected data increases tool overhead and results in data files that are difficult to store and analyze. Our solution to these problems is a new measurement technique called trace profiling that collects the information needed to diagnose performance problems that traditionally require traces, but at a greatly reduced data volume. The trace profiling technique reduces the amount of data stored by capitalizing on the repeated behavior of programs, and on the similarity of the behavior and performance of parallel processes in an application run. Trace profiling is a hybrid between profiling and tracing, collecting summary information about the event patterns in an application run. Because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that comprised most of the execution time.
international conference on parallel processing | 2009
Kathryn Mohror; Karen L. Karavanic; Allan Snavely
Parallel event trace visualizations can aid in discovery of the root causes of certain performance problems on high-end systems. However, traditional trace visualizations are not inherently scalable and require considerable effort on the part of the user to identify similarities and differences in performance across parallel entities. In this work, we evaluate several methods for deciding when traces of different processes in a run are similar enough that only one of the traces needs to be retained and rendered in the visualization. We show visualizations of reduced traces and evaluate them for compression, error, and retention of correct diagnostic information.
