Katherine E. Isaacs
University of California, Davis
Publication

Featured research published by Katherine E. Isaacs.
IEEE Transactions on Visualization and Computer Graphics | 2012
Aaditya G. Landge; Joshua A. Levine; Abhinav Bhatele; Katherine E. Isaacs; Todd Gamblin; Martin Schulz; S. H. Langer; Peer-Timo Bremer; Valerio Pascucci
The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever-increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding the network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to the application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D's performance on an IBM Blue Gene/P system.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Abhinav Bhatele; Todd Gamblin; Steven H. Langer; Peer-Timo Bremer; Erik W. Draeger; Bernd Hamann; Katherine E. Isaacs; Aaditya G. Landge; Joshua A. Levine; Valerio Pascucci; Martin Schulz; Charles H. Still
The placement of tasks in a parallel application on specific nodes of a supercomputer can significantly impact performance. Traditionally, this task mapping has focused on reducing the distance between communicating tasks on the physical network. This minimizes the number of hops that point-to-point messages travel and thus reduces link sharing between messages and contention. However, for applications that use collectives over sub-communicators, this heuristic may not be optimal. Many collectives can benefit from an increase in bandwidth even at the cost of an increase in hop count, especially when sending large messages. For example, placing communicating tasks in a cube configuration rather than a plane or a line on a torus network increases the number of possible paths messages might take. This increases the available bandwidth which can lead to significant performance gains. We have developed Rubik, a tool that provides a simple and intuitive interface to create a wide variety of mappings for structured communication patterns. Rubik supports a number of elementary operations such as splits, tilts, or shifts, that can be combined into a large number of unique patterns. Each operation can be applied to disjoint groups of processes involved in collectives to increase the effective bandwidth. We demonstrate the use of Rubik for improving performance of two parallel codes, pF3D and Qbox, which use collectives over sub-communicators.
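The claim about cube versus plane versus line placements can be illustrated with a small sketch. This is not Rubik's interface; the function names, the 16 x 16 x 16 torus, and the three eight-node placements are assumptions chosen for illustration. The sketch counts the network links that directly connect two members of a group, as a rough proxy for the number of paths available to traffic within that group.

```python
from itertools import combinations, product


def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b on a torus with the given dimensions."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def internal_links(nodes, dims):
    """Number of torus links whose two endpoints both belong to `nodes`."""
    return sum(1 for a, b in combinations(nodes, 2) if torus_hops(a, b, dims) == 1)


# Hypothetical torus size and three placements of the same eight tasks.
DIMS = (16, 16, 16)
line = [(x, 0, 0) for x in range(8)]
plane = [(x, y, 0) for x, y in product(range(4), range(2))]
cube = list(product(range(2), repeat=3))

for name, group in (("line", line), ("plane", plane), ("cube", cube)):
    print(name, internal_links(group, DIMS))
```

Under these assumptions the cube placement has the most links among its members, followed by the plane and then the line, which is consistent with the bandwidth argument made in the abstract.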
Eurographics | 2014
Katherine E. Isaacs; Alfredo Gimenez; Ilir Jusufi; Todd Gamblin; Abhinav Bhatele; Martin Schulz; Bernd Hamann; Peer-Timo Bremer
Performance visualization comprises techniques that aid developers and analysts in improving the time and energy efficiency of their software. In this work, we discuss performance as it relates to visualization and survey existing approaches in performance visualization. We present an overview of what types of performance data can be collected and a categorization of the types of goals that performance visualization techniques can address. We develop a taxonomy for the contexts in which different performance visualizations reside and describe the state of the art research pertaining to each. Finally, we discuss unaddressed and future challenges in performance visualization.
IEEE Transactions on Visualization and Computer Graphics | 2014
Katherine E. Isaacs; Peer-Timo Bremer; Ilir Jusufi; Todd Gamblin; Abhinav Bhatele; Martin Schulz; Bernd Hamann
With the continuous rise in complexity of modern supercomputers, optimizing the performance of large-scale parallel programs is becoming increasingly challenging. Simultaneously, the growth in scale magnifies the impact of even minor inefficiencies - potentially millions of compute hours and megawatts in power consumption can be wasted on avoidable mistakes or sub-optimal algorithms. This makes performance analysis and optimization critical elements in the software development process. One of the most common forms of performance analysis is to study execution traces, which record a history of per-process events and interprocess messages in a parallel application. Trace visualizations allow users to browse this event history and search for insights into the observed performance behavior. However, current visualizations are difficult to understand even for small process counts and do not scale gracefully beyond a few hundred processes. Organizing events in time leads to a virtually unintelligible conglomerate of interleaved events, and moderately high process counts overtax even the largest display. As an alternative, we present a new trace visualization approach based on transforming the event history into logical time inferred directly from happened-before relationships. This emphasizes the code's structural behavior, which is much more familiar to the application developer. The original timing data, or other information, is then encoded through color, leading to a more intuitive visualization. Furthermore, we use the discrete nature of logical timelines to cluster processes according to their local behavior, leading to a scalable visualization of even long traces on large process counts. We demonstrate our system using two case studies on large-scale parallel codes.
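The idea of deriving logical time from happened-before relationships can be sketched with a Lamport-style clock. This is a generic textbook technique, not the algorithm from the paper; the event tuple layout, the function name, and the sample trace are assumptions. Each event is placed one step after the latest of its predecessors: the previous event on the same process and, for a receive, the matching send.

```python
def logical_steps(events):
    """Assign a Lamport-style logical step to every event.

    `events` is a list of (event_id, process, matching_send) tuples, where
    matching_send is the id of the send event for a receive, else None.
    Assumption: the list is ordered so every send precedes its receive.
    """
    last_on_process = {}  # process -> logical step of its latest event
    step = {}             # event id -> logical step
    for event_id, process, matching_send in events:
        s = last_on_process.get(process, -1) + 1
        if matching_send is not None:
            # A receive must come after the send it matches.
            s = max(s, step[matching_send] + 1)
        step[event_id] = s
        last_on_process[process] = s
    return step


# Hypothetical three-process trace: p0 sends to p1, which sends to p2.
trace = [
    ("a0", 0, None),   # p0: local work
    ("a1", 0, None),   # p0: send to p1
    ("b0", 1, "a1"),   # p1: receive from p0
    ("b1", 1, None),   # p1: send to p2
    ("c0", 2, "b1"),   # p2: receive from p1
]
print(logical_steps(trace))
```

Plotting events by these steps rather than by timestamps lines up the message chain regardless of how long each process waited in physical time.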
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Abhinav Bhatele; Todd Gamblin; Katherine E. Isaacs; Brian T. N. Gunney; Martin Schulz; Peer-Timo Bremer; Bernd Hamann
Performance analysis of parallel scientific codes is becoming increasingly difficult due to the rapidly growing complexity of applications and architectures. Existing tools fall short in providing intuitive views that facilitate the process of performance debugging and tuning. In this paper, we extend recent ideas of projecting and visualizing performance data for faster, more intuitive analysis of applications. We collect detailed per-level and per-phase measurements for a dynamically load-balanced, structured AMR library and project per-core data collected in the hardware domain onto the application's communication topology. We show how our projections and visualizations lead to a rapid diagnosis of and mitigation strategy for a previously elusive scaling bottleneck in the library that is hard to detect using conventional tools. Our new insights have resulted in a 22% performance improvement for a 65,536-core run of the AMR library on an IBM Blue Gene/P system.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Katherine E. Isaacs; Aaditya G. Landge; Todd Gamblin; Peer-Timo Bremer; Valerio Pascucci; Bernd Hamann
The growth in size and complexity of scaling applications and the systems on which they run poses challenges in analyzing and improving their overall performance. With metrics coming from thousands or millions of processes, visualization techniques are necessary to make sense of the increasing amount of data. To aid the process of exploration and understanding, we announce the initial release of Boxfish, an extensible tool for manipulating and visualizing data pertaining to application behavior. Combining and visually presenting data and knowledge from multiple domains, such as the application's communication patterns and the hardware's network configuration and routing policies, can yield the insight necessary to discover the underlying causes of observed behavior. Boxfish allows users to query, filter and project data across these domains to create interactive, linked visualizations.
IEEE Transactions on Parallel and Distributed Systems | 2016
Katherine E. Isaacs; Todd Gamblin; Abhinav Bhatele; Martin Schulz; Bernd Hamann; Peer-Timo Bremer
Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure from traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. We apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.
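One simple way to quantify an operation's delay relative to its peers is sketched below. It is an assumed metric for illustration, not necessarily the one defined in the paper or implemented in Ravel: operations that share a logical step are treated as peers, and each is scored by how much later it finished than the earliest of them. The tuple layout and the sample numbers are hypothetical.

```python
def delay_vs_peers(ops):
    """Delay of each operation relative to the earliest peer at its logical step.

    `ops` is an iterable of (process, logical_step, exit_time) tuples.
    Returns a dict mapping (process, logical_step) to the delay.
    """
    ops = list(ops)  # allow generators, since the data is traversed twice
    earliest = {}    # logical step -> earliest exit time among peers
    for _, step, exit_time in ops:
        earliest[step] = min(exit_time, earliest.get(step, exit_time))
    return {(p, step): t - earliest[step] for p, step, t in ops}


# Hypothetical exit times for two logical steps.
sample = [
    (0, 0, 10), (1, 0, 12), (2, 0, 10),
    (0, 1, 20), (1, 1, 27),
]
print(delay_vs_peers(sample))
```

In this sample, process 1 finishes 2 time units behind its peers at step 0 and 7 units behind at step 1, so the growth between steps is visible without comparing raw timestamps.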
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
Abhinav Bhatele; Nikhil Jain; Katherine E. Isaacs; Ronak Buch; Todd Gamblin; Steven H. Langer; Laxmikant V. Kalé
Six of the ten fastest supercomputers in the world in 2014 use a torus interconnection network for message passing between compute nodes. Torus networks provide high bandwidth links to near-neighbors and low latencies over multiple hops on the network. However, large diameters of such networks necessitate a careful placement of parallel tasks on the compute nodes to minimize network congestion. This paper presents a methodological study of optimizing application performance on a five-dimensional torus network via the technique of topology-aware task mapping. Task mapping refers to the placement of processes on compute nodes while carefully considering the network topology between the nodes and the communication behavior of the application. We focus on the IBM Blue Gene/Q machine and two production applications - a laser-plasma interaction code called pF3D and a lattice QCD application called MILC. Optimizations presented in the paper improve the communication performance of pF3D by 90% and that of MILC by up to 47%.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2015
Katherine E. Isaacs; Abhinav Bhatele; Jonathan Lifflander; David Böhme; Todd Gamblin; Martin Schulz; Bernd Hamann; Peer-Timo Bremer
Asynchrony and non-determinism in Charm++ programs present a significant challenge in analyzing their event traces. We present a new framework to organize event traces of parallel programs written in Charm++. Our reorganization allows one to more easily explore and analyze such traces by providing context through logical structure. We describe several heuristics to compensate for missing dependencies between events that currently cannot be easily recorded. We introduce a new task ordering that recovers logical structure from the non-deterministic execution order. Using the logical structure, we define several metrics to help guide developers to performance problems. We demonstrate our approach through two proxy applications written in Charm++. Finally, we discuss the applicability of this framework to other task-based runtimes and provide guidelines for tracing to support this form of analysis.
Proceedings of the First Workshop on Visual Performance Analysis | 2014
Collin M. McCarthy; Katherine E. Isaacs; Abhinav Bhatele; Peer-Timo Bremer; Bernd Hamann
Understanding the interactions between a parallel application and the interconnection network over which it exchanges data is critical to optimizing performance in modern supercomputers. However, recent supercomputing architectures use networks that do not have natural low-dimensional representations, making them difficult to comprehend or visualize. In particular, high-dimensional torus networks are common and are used in four of the top ten supercomputers and eight of the top ten on the Graph500 list. We present a new visualization of five-dimensional torus networks. We use four connected views depicting the network at different levels of detail, allowing analysts to observe general large-scale traffic patterns while simultaneously viewing individual links or outliers in any specific section of the network. We demonstrate this approach by analyzing network traffic for a pF3D simulation running on the IBM Blue Gene/Q architecture, and show how it is both intuitive and effective for understanding and optimizing parallel application behavior.
