Catherine Olschanowsky
Colorado State University
Publications
Featured research published by Catherine Olschanowsky.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
Catherine Olschanowsky; Michelle Mills Strout; Stephen M. Guzik; John Loffeld; J. Hittinger
Structured-grid PDE solver frameworks parallelize over boxes, which are rectangular domains of cells or faces in a structured grid. In the Chombo framework, the box sizes are typically 16³ or 32³, but larger box sizes such as 128³ would result in less surface area and therefore less storage, copying, and/or ghost cell communication overhead. Unfortunately, current on-node parallelization schemes perform poorly for these larger box sizes. In this paper, we investigate 30 different inter-loop optimization strategies and demonstrate the parallel scaling advantages of some of these variants on NUMA multicore nodes. Shifted, fused, and communication-avoiding variants for 128³ boxes result in close to ideal parallel scaling and come close to matching the performance of 16³ boxes on three different multicore systems for a benchmark that is a proxy for program idioms found in Computational Fluid Dynamics (CFD) codes.
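To make the inter-loop optimizations concrete, the following is a minimal C++ sketch of the shifted-and-fused idea on a 1D proxy: a baseline version makes two full stencil sweeps over a box, while the fused version runs the second sweep one cell behind the first so that its inputs are always ready. All names here are illustrative assumptions; this is not Chombo code.

```cpp
#include <cstddef>
#include <vector>

// Baseline: two separate sweeps; tmp is fully written before it is read,
// so for large boxes tmp has left the cache by the time stage 2 needs it.
void baseline(const std::vector<double>& u,
              std::vector<double>& tmp,
              std::vector<double>& out) {
    const std::size_t n = u.size();
    for (std::size_t i = 1; i + 1 < n; ++i)      // stage 1: 3-point stencil
        tmp[i] = u[i - 1] + u[i] + u[i + 1];
    for (std::size_t i = 2; i + 2 < n; ++i)      // stage 2: stencil on stage 1
        out[i] = tmp[i - 1] + tmp[i] + tmp[i + 1];
}

// Shifted and fused: stage 2 trails stage 1 by one cell inside a single
// sweep, so each tmp value is consumed while it is still in cache.
void shiftedFused(const std::vector<double>& u,
                  std::vector<double>& tmp,
                  std::vector<double>& out) {
    const std::size_t n = u.size();
    for (std::size_t i = 1; i + 1 < n; ++i) {
        tmp[i] = u[i - 1] + u[i] + u[i + 1];     // stage 1 at cell i
        if (i >= 3)                              // stage 2 at cell i-1: its
            out[i - 1] = tmp[i - 2] + tmp[i - 1] + tmp[i];  // inputs now exist
    }
}
```

Both versions produce identical results; the fused variant simply reorders iterations consistently with the stage-1-to-stage-2 dependence.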
Network-Aware Data Management | 2015
Chengyu Fan; Susmit Shannigrahi; Steve DiBenedetto; Catherine Olschanowsky; Christos Papadopoulos; Harvey B. Newman
Many scientific domains, such as climate science and High Energy Physics (HEP), have data management requirements that are not well supported by the IP network architecture. Named Data Networking (NDN) is a new network architecture whose service model is better aligned with the needs of data-oriented applications. NDN provides features such as best-location retrieval, caching, load sharing, and transparent failover that would otherwise be painstakingly (re-)implemented by each application using point-to-point semantics in an IP network. We present the first scientific data management application designed and implemented on top of NDN. We use this application to manage climate and HEP data over a dedicated, high-performance testbed. Our application has two main components: a UI for dataset discovery queries and a federation of synchronized name catalogs. We show how NDN primitives can be used to implement common data management operations such as publishing, search, efficient retrieval, and publication access control.
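As a rough illustration of the catalog component, the sketch below models a name catalog as an ordered set of hierarchical names, with discovery as a prefix query. The class name and the climate-style namespace are assumptions made for illustration, not the application's actual API or name scheme.

```cpp
#include <iostream>
#include <set>
#include <string>
#include <vector>

// A toy name catalog: datasets are published under hierarchical, NDN-style
// names, and discovery queries are prefix lookups against the catalog.
class NameCatalog {
public:
    void publish(const std::string& name) { names_.insert(name); }

    // Return every published name that begins with the given prefix.
    std::vector<std::string> query(const std::string& prefix) const {
        std::vector<std::string> matches;
        for (auto it = names_.lower_bound(prefix);
             it != names_.end() && it->compare(0, prefix.size(), prefix) == 0;
             ++it)
            matches.push_back(*it);
        return matches;
    }

private:
    std::set<std::string> names_;  // ordered: names sharing a prefix are adjacent
};

int main() {
    NameCatalog catalog;
    catalog.publish("/climate/cmip5/CSU/tas/monthly/2015.nc");  // hypothetical names
    catalog.publish("/climate/cmip5/CSU/pr/monthly/2015.nc");
    for (const auto& name : catalog.query("/climate/cmip5/CSU/tas"))
        std::cout << name << "\n";
}
```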
Parallel Computing | 2016
Michelle Mills Strout; Alan LaMielle; Larry Carter; Jeanne Ferrante; Barbara Kreaseck; Catherine Olschanowsky
We present the Sparse Polyhedral Framework (SPF) for specifying loop transformations for irregular codes, describe a code generator prototype built on SPF, and present experimental results comparing generated code against hand-coded implementations. Applications that manipulate sparse data structures contain memory reference patterns that are unknown at compile time due to indirect accesses such as A[B[i]]. To exploit parallelism and improve locality in such applications, prior work has developed a number of Run-Time Reordering Transformations (RTRTs). This paper presents the Sparse Polyhedral Framework (SPF) for specifying RTRTs and compositions thereof, along with algorithms for automatically generating efficient inspector and executor code to implement such transformations. Experimental results indicate that the performance of automatically generated inspectors and executors competes with the performance of hand-written ones when further optimization is done.
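The inspector/executor pattern that SPF generates code for can be shown with a small hand-rolled example: the inspector examines the index array B at run time and builds a new iteration order with better locality in A, and the executor runs the original loop body in that order. This is a sketch of the pattern only, not IEGenLib-generated code.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Inspector: runs once at run time, after B is known. Here the reordering
// simply sorts iterations by the element of A they touch.
std::vector<std::size_t> inspect(const std::vector<std::size_t>& B) {
    std::vector<std::size_t> order(B.size());
    std::iota(order.begin(), order.end(), 0);       // 0, 1, ..., B.size()-1
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return B[a] < B[b]; });
    return order;
}

// Executor: the original computation, visited in the reordered sequence so
// that the indirect accesses A[B[k]] walk through A nearly contiguously.
void execute(const std::vector<std::size_t>& order,
             const std::vector<std::size_t>& B,
             const std::vector<double>& x,
             std::vector<double>& A) {
    for (std::size_t k : order)
        A[B[k]] += x[k];
}
```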
Workshop on Local and Metropolitan Area Networks | 2014
Catherine Olschanowsky; Susmit Shannigrahi; Christos Papadopoulos
Climate and other big data applications face substantial problems in terms of data storage, retrieval, sharing, and management. While several community repositories and tools are available to help with climate data, these problems still persist and the community is actively looking for better solutions. In this project we apply NDN to support climate modeling applications. The information-centric nature of NDN, where content becomes a first-class entity, simplifies many of the problems in this domain. NDN offers lightweight data publication, discovery, and retrieval compared to IP-based solutions. However, introducing a new network architecture to a mature domain that routinely produces petabytes of data and a plethora of assorted tools to manipulate it is a risky proposition. The advantages of NDN alone may not be sufficient to overcome the natural inertia. Our approach is to introduce NDN while carefully avoiding undue disruption to existing workflows. To that end, we provide a user interface built on familiar filesystem operations to publish, discover, and retrieve data, integrated with domain-specific translators that automatically convert and publish datasets as NDN objects. We outline the advantages of NDN in this application domain and the challenges we faced during the adaptation. We believe this is the first exercise in applying NDN in an existing, large, mature application domain.
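A domain-specific translator of the kind described might, as a rough sketch, split an underscore-delimited climate filename into its components and map them onto a hierarchical, NDN-style name. The field layout and name prefix below are illustrative assumptions, not the project's actual scheme.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Turn "var_model_experiment_time.nc" into "/climate/var/model/experiment/time".
std::string translateFilename(const std::string& filename) {
    std::string stem = filename.substr(0, filename.rfind('.'));  // drop extension
    std::istringstream parts(stem);
    std::string component, name = "/climate";                    // assumed prefix
    while (std::getline(parts, component, '_'))
        name += "/" + component;     // one NDN name component per filename field
    return name;
}

int main() {
    std::cout << translateFilename("tas_CESM_rcp85_200601-210012.nc") << "\n";
    // prints: /climate/tas/CESM/rcp85/200601-210012
}
```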
Languages and Compilers for Parallel Computing | 2012
Michelle Mills Strout; Geri Georg; Catherine Olschanowsky
The Sparse Polyhedral Framework (SPF) extends the Polyhedral Model by using the uninterpreted function call abstraction for the compile-time specification of run-time reordering transformations such as loop and data reordering and sparse tiling approaches that schedule irregular sets of iterations across loops. The Polyhedral Model represents sets of iteration points in imperfectly nested loops with unions of polyhedra and represents loop transformations with affine functions applied to those polyhedra. Existing tools such as ISL, CLooG, and Omega manipulate polyhedral sets and affine functions; however, their ability to represent sets and functions whose constraints include uninterpreted function calls, such as those needed in SPF, is nonexistent or severely restricted. This paper presents algorithms for manipulating sets and relations with uninterpreted function symbols to enable the Sparse Polyhedral Framework. The algorithms have been implemented in IEGenLib (the Inspector/Executor Generator Library), an open-source C++ library.
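The flavor of the notation involved can be sketched as follows: polyhedral-style sets and relations whose constraints mention uninterpreted function symbols (here ptr and sigma, standing for a run-time index array and a run-time permutation). The textual syntax below illustrates the concept and is not necessarily the exact syntax IEGenLib accepts.

```cpp
#include <iostream>
#include <string>

int main() {
    // Iteration set of a sparse computation: i ranges over rows, j over the
    // nonzeros of row i, with the row extents given by the uninterpreted
    // function ptr (whose values are only known at run time).
    std::string iterationSet =
        "{ [i,j] : 0 <= i < N and ptr(i) <= j < ptr(i+1) }";

    // A run-time data reordering as a relation: location x is remapped
    // through the run-time-computed permutation sigma.
    std::string reorderRelation = "{ [x] -> [sigma(x)] : 0 <= x < M }";

    std::cout << iterationSet << "\n" << reorderRelation << "\n";
}
```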
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013
Christopher D. Krieger; Michelle Mills Strout; Catherine Olschanowsky; Andrew Stone; Stephen M. Guzik; Xinfeng Gao; Carlo Bertolli; Paul H. J. Kelly; Gihan R. Mudalige; Brian Van Straalen; Samuel Williams
There is a significant, established code base in the scientific computing community. Some of these codes have been parallelized already but are now encountering scalability issues due to poor data locality, inefficient data distributions, or load imbalance. In this work, we introduce a new abstraction called loop chaining, in which a sequence of parallel and/or reduction loops that explicitly share data are grouped together into a chain. Once specified, a chain of loops can be viewed as a set of iterations under a partial ordering. This partial ordering is dictated by data dependencies that, as part of the abstraction, are exposed, thereby avoiding inter-procedural program analysis. Thus a loop chain is a partially ordered set of iterations that makes scheduling and determining data distributions across loops possible for a compiler and/or run-time system. The flexibility of being able to schedule across loops enables better management of the data locality and parallelism tradeoff. In this paper, we define the loop chaining concept and present three case studies using loop chains in scientific codes: the sparse matrix Jacobi benchmark; OP2, a domain-specific library used in full applications with unstructured grids; and Chombo, a domain-specific library used in full applications with structured grids. Preliminary results for the Jacobi benchmark show that a loop-chain-enabled optimization, full sparse tiling, results in a speedup of as much as 2.68x over a parallelized, blocked implementation on a multicore system with 40 cores.
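A toy C++ sketch of scheduling across a two-loop chain while honoring its partial order appears below: loop 2 reads each cell's neighbors as written by loop 1, so within each block loop 2 runs shifted back one cell, consuming loop 1's output while it is still in cache. The blocking scheme and names are illustrative, not the paper's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Loop 1 scales each cell; loop 2 sums each interior cell's neighborhood of
// loop 1's output. Executed block by block, with loop 2 trailing loop 1 by
// one cell so every value it reads has already been produced.
void chainedBlocks(std::vector<double>& a, std::vector<double>& b,
                   std::size_t blockSize) {
    const std::size_t n = a.size();
    for (std::size_t lo = 0; lo < n; lo += blockSize) {
        const std::size_t hi = std::min(lo + blockSize, n);
        for (std::size_t i = lo; i < hi; ++i)      // loop 1 on this block
            a[i] = 2.0 * a[i];
        // Loop 2 at cell i needs a[i-1..i+1], so it covers [lo-1, hi-1):
        // shifted one cell behind loop 1 and clipped at the boundaries.
        const std::size_t lo2 = (lo > 1) ? lo - 1 : 1;
        const std::size_t hi2 = hi - 1;
        for (std::size_t i = lo2; i < hi2; ++i)    // loop 2, dependences honored
            b[i] = a[i - 1] + a[i] + a[i + 1];
    }
}
```

Over all blocks, loop 2 covers exactly the interior cells 1 through n-2, the same iterations an unfused schedule would execute.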
Computers & Mathematics with Applications | 2016
Stephen M. Guzik; Xinfeng Gao; Catherine Olschanowsky
This work focuses on the development of a high-performance fourth-order finite-volume method to solve the nonlinear partial differential equations governing compressible flow, the Navier–Stokes equations, on a Cartesian grid with adaptive mesh refinement. The novelty of the present study is to introduce the loop chaining concept into this complex fourth-order fluid dynamics algorithm for significant improvement in code performance on parallel machines. Specific operations involved in the algorithm include the finite-volume formulation of fourth-order spatial discretization stencils and optimal inter-loop parallelization strategies. Numerical fluxes of the Navier–Stokes equations comprise hyperbolic (inviscid) and elliptic (viscous) components. The hyperbolic flux is evaluated using a high-resolution Godunov method, and the elliptic flux is based on fourth-order centered-difference methods everywhere in the computational domain. The use of centered-difference methods everywhere supports the idea of fusing modular codes to achieve high efficiency on modern computers. Temporal discretization is performed using the standard fourth-order Runge–Kutta method. The fourth-order accuracy of the solution in space and time is verified with a transient Couette flow problem. The algorithm is applied to solve Sod's shock tube problem and the transient flat-plate boundary layer flow, and the numerical predictions are validated against analytical solutions. The performance of the baseline code is compared to that of the fused scheme, which fuses modular codes via the loop chaining concept, and a significant improvement in execution time is observed.
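For reference, the standard fourth-order Runge–Kutta step named as the time integrator looks like the generic sketch below, where rhs is an assumed placeholder for the paper's finite-volume evaluation of the flux divergence.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using State = std::vector<double>;

// u + h*k, elementwise.
static State axpy(const State& u, double h, const State& k) {
    State r(u.size());
    for (std::size_t i = 0; i < u.size(); ++i) r[i] = u[i] + h * k[i];
    return r;
}

// One classical RK4 step: four evaluations of the right-hand side combined
// with weights 1/6, 2/6, 2/6, 1/6 for fourth-order accuracy in time.
State rk4Step(const State& u, double dt,
              const std::function<State(const State&)>& rhs) {
    const State k1 = rhs(u);
    const State k2 = rhs(axpy(u, 0.5 * dt, k1));
    const State k3 = rhs(axpy(u, 0.5 * dt, k2));
    const State k4 = rhs(axpy(u, dt, k3));
    State next(u.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        next[i] = u[i] + (dt / 6.0) * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    return next;
}
```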
International Parallel and Distributed Processing Symposium | 2014
Michelle Mills Strout; Fabio Luporini; Christopher D. Krieger; Carlo Bertolli; Gheorghe-Teodor Bercea; Catherine Olschanowsky; J. Ramanujam; Paul H. J. Kelly
Many scientific applications are organized in a data parallel way: as sequences of parallel and/or reduction loops. This exposes parallelism well, but does not convert data reuse between loops into data locality. This paper focuses on this issue in parallel loops whose loop-to-loop dependence structure is data-dependent due to indirect references such as A[B[i]]. Such references are a common occurrence in sparse matrix computations, molecular dynamics simulations, and unstructured-mesh computational fluid dynamics (CFD). Previously, sparse tiling approaches were developed for individual benchmarks to group iterations across such loops to improve data locality. These approaches were shown to benefit applications such as moldyn, Gauss-Seidel, and the sparse matrix powers kernel; however, the run-time routines for performing sparse tiling were hand-coded per application. In this paper, we present a generalized full sparse tiling algorithm that uses the newly developed loop chain abstraction as input, improves inter-loop data locality, and creates a task graph to expose shared-memory parallelism at runtime. We evaluate the overhead and performance impact of the generalized full sparse tiling algorithm on two codes: a sparse Jacobi iterative solver and the Airfoil CFD benchmark.
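A much-simplified sketch of a sparse tiling inspector conveys the flavor of the approach: seed-partition the first loop's iterations into tiles, then grow the tiles across a dependent loop by assigning each of its iterations to the latest tile it depends on, so that executing tiles in order respects the cross-loop dependences. This is an illustration only; the paper's generalized algorithm takes an arbitrary loop chain as input and builds a full task graph.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// dep[i] lists the loop 1 iterations whose results loop 2's iteration i reads
// (known only at run time, e.g., derived from an index array like B in A[B[i]]).
struct Tiling {
    std::vector<int> tileOfLoop1;  // tile id for each loop 1 iteration
    std::vector<int> tileOfLoop2;  // tile id for each loop 2 iteration
};

Tiling fullSparseTile(std::size_t n1,
                      const std::vector<std::vector<std::size_t>>& dep,
                      std::size_t tileSize) {
    Tiling t;
    t.tileOfLoop1.resize(n1);
    for (std::size_t i = 0; i < n1; ++i)            // seed partition: even blocks
        t.tileOfLoop1[i] = static_cast<int>(i / tileSize);

    t.tileOfLoop2.assign(dep.size(), 0);
    for (std::size_t i = 0; i < dep.size(); ++i)    // grow across the chain:
        for (std::size_t j : dep[i])                // join the latest tile that
            t.tileOfLoop2[i] =                      // produces one of our inputs
                std::max(t.tileOfLoop2[i], t.tileOfLoop1[j]);
    return t;
}
```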
Proceedings of the Third International Workshop on Accelerator Programming Using Directives | 2016
Ian J. Bertolacci; Michelle Mills Strout; Stephen M. Guzik; Jordan Riley; Catherine Olschanowsky
Exposing opportunities for parallelization while explicitly managing data locality is the primary challenge in porting and optimizing existing computational science simulation codes to improve performance and accuracy. OpenMP provides many mechanisms for expressing parallelism, but it primarily remains the programmer's responsibility to group computations to improve data locality. The loop chain abstraction, where data access patterns are included with the specification of parallel loops, provides compilers with sufficient information to automate the parallelism versus data locality tradeoff. In this paper, we present a loop chain pragma and an extension to the omp for construct that enable the specification of loop chains and high-level specifications of schedules on loop chains. We show example usage of the extensions, describe their implementation, and show preliminary performance results for some simple examples.
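To give a feel for the proposal, here is a hypothetical example of annotated loops. The pragma spelling and clause names are invented for illustration and are not the paper's actual extension syntax; unknown pragmas are ignored by standard compilers, so the file still builds as plain C++.

```cpp
// Two Jacobi-style sweeps grouped into a loop chain. Each loop declares its
// iteration domain and its read/write access patterns, which is exactly the
// information a compiler needs to fuse or tile across the loops safely.
void jacobiSteps(double* A, double* B, int n) {
    #pragma loopchain schedule(fuse)                 // hypothetical directive
    {
        #pragma loopchain for domain(1 : n - 2) \
                read(A : i - 1, i, i + 1) write(B : i)
        for (int i = 1; i < n - 1; ++i)
            B[i] = (A[i - 1] + A[i] + A[i + 1]) / 3.0;

        #pragma loopchain for domain(1 : n - 2) \
                read(B : i - 1, i, i + 1) write(A : i)
        for (int i = 1; i < n - 1; ++i)
            A[i] = (B[i - 1] + B[i] + B[i + 1]) / 3.0;
    }
}
```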
IEEE International Conference on Technologies for Homeland Security | 2013
Catherine Olschanowsky; M. Lawrence Weikum; Jason Smith; Christos Papadopoulos; Daniel Massey
The Internet relies on BGP for global routing, but there are many open questions related to BGP. Some researchers rely on BGP data to better understand routing behavior and develop new routing algorithms. Other researchers use BGP data to investigate issues that range from IP allocations to regional Internet connectivity in the face of political turmoil. And of course BGP data is used in routing security, both to detect issues and evaluate solutions, and even to issue warnings or block invalid routes. All of these research challenges require access to a reliable set of BGP data from geographically diverse locations. This paper presents the BGPmon approach to collecting and distributing BGP data at global scale. BGPmon collects data from a diverse set of peers and distributes the data in real time to any interested client. BGPmon fulfills three main design objectives: it provides a scalable data collection solution, maintains data integrity despite traffic surges and slow client processing, and provides a suite of associated tools that ease the overhead of developing BGP data processing applications. We demonstrate the effectiveness of the framework with a brief characterization of the data collected from direct peers.
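The kind of lightweight client BGPmon aims to make easy to build can be sketched minimally as below. For illustration the feed is read from standard input as whitespace-separated "peer prefix" lines; this is a stand-in for connecting to a live BGPmon stream and parsing its actual message format.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Count route updates per peer from a simplified text feed.
int main() {
    std::map<std::string, std::size_t> updatesPerPeer;
    std::string line;
    while (std::getline(std::cin, line)) {
        std::istringstream fields(line);
        std::string peer, prefix;
        if (fields >> peer >> prefix)          // skip malformed lines
            ++updatesPerPeer[peer];
    }
    for (const auto& [peer, count] : updatesPerPeer)
        std::cout << peer << " announced " << count << " updates\n";
}
```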