Kathleen Knobe | Researchain

Publication

Featured research published by Kathleen Knobe.

Journal of Parallel and Distributed Computing | 1990

Data optimization: allocation of arrays to reduce communication on SIMD machines

Kathleen Knobe; Joan D. Lukas; Guy L. Steele Jr.

An optimizing compiler for a data parallel programming language can significantly improve program performance on a massively parallel computing system by incorporating new strategies for allocating array elements to processors. We discuss techniques for automatic layout of arrays in a compiler targeted to SIMD architectures, such as the Connection Machine computer system. Our primary goal is to minimize the cost of moving data among processors. We also attempt to minimize memory usage. Improved array layout may allow more specialized communication operations with lower cost. We discuss the algorithms to effect such improvement and present some typical examples of code fragments that can be improved significantly with respect to memory consumption and by orders of magnitude with respect to execution time.

Symposium on Principles of Programming Languages | 1998

Array SSA form and its use in parallelization

Kathleen Knobe; Vivek Sarkar

Static single assignment (SSA) form for scalars has been a significant advance. It has simplified the way we think about scalar variables. It has simplified the design of some optimizations and has made other optimizations more effective. Unfortunately, none of this can be said for SSA form for arrays. The current SSA processing of arrays views an array as a single object. But the kinds of analyses that sophisticated compilers need to perform on arrays, for example those that drive loop parallelization, are at the element level. Current SSA form for arrays is incapable of providing the element-level data flow information required for such analyses. In this paper, we introduce an Array SSA form that captures precise element-level data flow information for array variables in all cases. It is general and simple, and coincides with standard SSA form when applied to scalar variables. It can also be used for structures and other variable types that can be modeled as arrays. An important application of Array SSA form is in automatic parallelization. We show how Array SSA form can enable parallelization of any loop that is free of loop-carried true data dependences. This includes loops with loop-carried anti and output dependences, unanalyzable subscript expressions, and arbitrary control flow within an iteration. Array SSA form achieves this level of generality by making manifest its φ functions as runtime computations in cases that are not amenable to compile-time analysis.

Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism | 2010

Concurrent Collections

Zoran Budimlic; Michael G. Burke; Vincent Cavé; Kathleen Knobe; Geoff Lowney; Ryan R. Newton; Jens Palsberg; David M. Peixotto; Vivek Sarkar; Frank Schlimbach; Sagnak Tasirlar

We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC offers performance and scalability equivalent to or better than that offered by lower-level parallel programming models.
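
The interplay of steps and ordering constraints described in this abstract can be illustrated with a small sequential sketch. This is only a toy model written for this listing: the names `ItemCollection`, `run`, and `prefix_sum_demo` are hypothetical and do not correspond to the API of any real CnC implementation. It assumes that data is exchanged through write-once items and that a step may run as soon as its inputs exist, which is why the result does not depend on the order in which steps are requested.

```python
# Toy, sequential illustration of a CnC-style graph.
# All names are hypothetical; this is not a real CnC runtime.

class ItemCollection:
    """Write-once store: each key may be assigned exactly once."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if key in self._data:
            raise ValueError(f"item {key!r} already assigned")
        self._data[key] = value

    def get(self, key):
        return self._data[key]  # raises KeyError if not yet produced

    def __contains__(self, key):
        return key in self._data


def run(tags, step, inputs_of, items):
    """Run `step` once per tag, in any order that respects data availability."""
    pending = list(tags)
    while pending:
        # A step instance is ready when every item it reads has been produced.
        ready = [t for t in pending if all(k in items for k in inputs_of(t))]
        if not ready:
            raise RuntimeError("no step instance has its inputs available")
        for t in ready:
            step(t, items)
            pending.remove(t)


def prefix_sum_demo(values, tag_order):
    """Compute running sums; step i reads item i-1 and writes item i."""
    items = ItemCollection()
    items.put(("sum", -1), 0)  # seed item supplied by the environment

    def step(i, items):
        items.put(("sum", i), items.get(("sum", i - 1)) + values[i])

    run(tag_order, step, lambda i: [("sum", i - 1)], items)
    return [items.get(("sum", i)) for i in range(len(values))]
```

Calling `prefix_sum_demo([1, 2, 3, 4], [0, 1, 2, 3])` and `prefix_sum_demo([1, 2, 3, 4], [3, 1, 0, 2])` gives the same list, `[1, 3, 6, 10]`, because only the data dependences decide when each step instance runs.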

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1999

Space-time memory: a parallel programming abstraction for interactive multimedia applications

Rishiyur S. Nikhil; Nissim Harel; James M. Rehg; Kathleen Knobe

Realistic interactive multimedia involving vision, animation, and multimedia collaboration is likely to become an important aspect of future computer applications. The scalable parallelism inherent in such applications coupled with their computational demands make them ideal candidates for SMPs and clusters of SMPs. These applications have novel requirements that offer new kinds of challenges for parallel system design. We have designed a programming system called Stampede that offers many functionalities needed to simplify development of such applications (such as high-level data sharing abstractions, dynamic cluster-wide threads, and multiple address spaces). We have built Stampede and it runs on clusters of SMPs. To date we have implemented two applications on Stampede, one of which is discussed herein. In this paper we describe a part of Stampede called Space-Time Memory (STM). It is a novel data sharing abstraction that enables interactive multimedia applications to manage a collection of time-sequenced data items simply, efficiently, and transparently across a cluster. STM relieves the application programmer from low-level synchronization and data communication by providing a high-level interface that subsumes buffer management, inter-thread synchronization, and location transparency for data produced and accessed anywhere in the cluster. STM also automatically handles garbage collection of data items that will no longer be accessed by any of the application threads. We discuss ease-of-use issues for developing applications using STM, and present preliminary performance results to show that STM's overhead is low.
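
The idea of time-sequenced items that are reclaimed once no thread will access them again can be sketched as follows. This is a single-threaded toy under stated assumptions, not Stampede's interface: the class `Channel` and its methods `put`, `get`, `consume`, and `live_timestamps` are hypothetical names, and the sketch assumes that each registered consumer explicitly signals when it is finished with a timestamp. The real system is concurrent and cluster-wide, which this example does not attempt to model.

```python
# Toy timestamp-indexed buffer with consumer-driven reclamation.
# Names are hypothetical; this is not Stampede's API.

class Channel:
    def __init__(self, consumers):
        self._consumers = set(consumers)
        self._items = {}    # timestamp -> value
        self._pending = {}  # timestamp -> consumers that have not released it

    def put(self, ts, value):
        """Store the item for timestamp `ts`; each timestamp is written once."""
        if ts in self._items:
            raise ValueError(f"timestamp {ts} already has an item")
        self._items[ts] = value
        self._pending[ts] = set(self._consumers)

    def get(self, ts):
        """Read the item for `ts` without releasing it."""
        return self._items[ts]

    def consume(self, consumer, ts):
        """Record that `consumer` is done with `ts`; reclaim when all are done."""
        self._pending[ts].discard(consumer)
        if not self._pending[ts]:
            del self._items[ts]
            del self._pending[ts]

    def live_timestamps(self):
        """Timestamps whose items are still buffered."""
        return sorted(self._items)
```

For example, with consumers `"tracker"` and `"display"`, an item put at timestamp 0 stays buffered after only the tracker has consumed it and is dropped once the display has consumed it as well.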

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1988

Compiling Fortran 8x array features for the connection machine computer system

Eugene Albert; Kathleen Knobe; Joan D. Lukas; Guy L. Steele

The Connection Machine® computer system supports a data parallel programming style, making it a natural target architecture for Fortran 8x array constructs. The Connection Machine Fortran compiler generates VAX code that performs scalar operations and directs the Connection Machine to perform array operations. The Connection Machine virtual processor mechanism supports elemental operations on very large arrays. Most array operators and intrinsic functions map into single instructions or short instruction sequences. Noncontiguous array sections, array-valued subscripts, and parallel constructs such as WHERE and FORALL are also readily accommodated on the Connection Machine. In addition to such customary optimizations as common subexpression elimination, the CM Fortran compiler minimizes data motion for aligning array operations, minimizes transfers between the Connection Machine and the VAX and minimizes context switching for masked computations.

Static Analysis Symposium | 2000

Unified Analysis of Array and Object References in Strongly Typed Languages

Stephen J. Fink; Kathleen Knobe; Vivek Sarkar

We present a simple, unified approach for the analysis and optimization of object field and array element accesses in strongly typed languages that works in the presence of object references/pointers. This approach builds on Array SSA form [14], a uniform representation for capturing control and data flow properties at the level of array elements. The techniques presented here extend previous analyses at the array element level [15] to handle both array element and object field accesses uniformly.

International Parallel and Distributed Processing Symposium | 2010

Performance evaluation of concurrent collections on high-performance multicore computing systems

Aparna Chandramowlishwaran; Kathleen Knobe; Richard W. Vuduc

This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially-ordered by semantic scheduling constraints. The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems: (i) a recently proposed asynchronous-parallel Cholesky factorization algorithm, (ii) a novel and non-trivial “higher-level” partly-asynchronous generalized eigensolver for dense symmetric matrices. Given a well-tuned sequential BLAS, our implementations match or exceed competing multithreaded vendor-tuned codes by up to 2.6×. Our evaluation compares with alternative models, including ScaLAPACK with a shared memory MPI, OpenMP, Cilk++, and PLASMA 2.0, on Intel Harpertown, Nehalem, and AMD Barcelona systems. Looking forward, we identify new opportunities to improve the CnC language and runtime scheduling and execution.

IEEE Transactions on Parallel and Distributed Systems | 2003

Stampede: a cluster programming middleware for interactive stream-oriented applications

Rishiyur S. Nikhil; James M. Rehg; Yavor Angelov; Arnab Paul; Sameer Adhikari; Kenneth M. Mackenzie; Nissim Harel; Kathleen Knobe

Emerging application domains such as interactive vision, animation, and multimedia collaboration display dynamic scalable parallelism and high computational requirements, making them good candidates for executing on parallel architectures such as SMPs and clusters of SMPs. Stampede is a programming system that has many of the needed functionalities such as high-level data sharing, dynamic cluster-wide threads and their synchronization, support for task and data parallelism, handling of time-sequenced data items, and automatic buffer management. We present an overview of Stampede, the primary data abstractions, the algorithmic basis of garbage collection, and the issues in implementing these abstractions on a cluster of SMPs. We also present a set of micromeasurements along with two multimedia applications implemented on top of Stampede, through which we demonstrate the low overhead of this runtime and its suitability for streaming multimedia applications.

Workshop on Declarative Aspects of Multicore Programming | 2009

Declarative aspects of memory management in the concurrent collections parallel programming model

Zoran Budimlic; Aparna Chandramowlishwaran; Kathleen Knobe; Geoff Lowney; Vivek Sarkar; Leo Treggiari

Concurrent Collections (CnC) is a declarative parallel language that allows the application developer to express their parallel application as a collection of high-level computations called steps that communicate via single-assignment data structures called items. A CnC program is specified in two levels. At the bottom level, an existing imperative language implements the computations within the individual computation steps. At the top level, CnC describes the relationships (ordering constraints) among the steps. The memory management mechanism of the existing imperative language manages data whose lifetime is within a computation step. A key limitation in the use of CnC for long-running programs is the lack of memory management and garbage collection for data items with lifetimes that are longer than a single computation step. Although the goal here is the same as that of classical garbage collection, the nature of the problem, and therefore of the solution, is distinct. The focus of this paper is the memory management problem for these data items in CnC. We introduce a new declarative slicing annotation for CnC that can be transformed into a reference counting procedure for memory management. Preliminary experimental results obtained from a Cholesky example show that our memory management approach can result in space reductions for CnC data items of up to 28x relative to the baseline case of standard CnC without memory management.
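
The reference counting idea mentioned in this abstract can be sketched in a few lines. The sketch assumes that the number of reads each item will receive is known when the item is produced; in the paper that information is derived from the slicing annotation, whereas here it is simply passed in by hand. The class `CountedItems` and its `reads` parameter are hypothetical names chosen for illustration.

```python
# Toy item store that frees an item after its declared number of reads.
# Names are hypothetical; this is not the paper's implementation.

class CountedItems:
    def __init__(self):
        self._data = {}  # key -> [value, remaining_reads]

    def put(self, key, value, reads):
        """Store `value`, to be reclaimed after exactly `reads` calls to get()."""
        if reads < 1:
            raise ValueError("reads must be at least 1")
        if key in self._data:
            raise ValueError(f"item {key!r} already assigned")
        self._data[key] = [value, reads]

    def get(self, key):
        """Return the item and drop it once its read count reaches zero."""
        entry = self._data[key]  # KeyError if never produced or already freed
        entry[1] -= 1
        if entry[1] == 0:
            del self._data[key]
        return entry[0]

    def live(self):
        """Number of items currently held in memory."""
        return len(self._data)
```

An item put with `reads=2` is still live after the first `get` and is gone after the second; a third `get` raises `KeyError`, which in this toy model signals that the declared count was too low.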

Static Analysis Symposium | 1998

Enabling Sparse Constant Propagation of Array Elements via Array SSA Form

Vivek Sarkar; Kathleen Knobe

We present a new static analysis technique based on Array SSA form [6]. Compared to traditional SSA form, the key enhancement in Array SSA form is that it deals with arrays at the element level instead of as monolithic objects. In addition, Array SSA form improves the φ function used for merging scalar or array variables in traditional SSA form. The computation of a φ function in traditional SSA form depends on the program’s control flow in addition to the arguments of the φ function. Our improved φ function (referred to as a Φ function) includes the relevant control flow information explicitly as arguments through auxiliary variables that are called @ variables.
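
A merge that takes its control flow information as explicit arguments can be sketched as an element-wise selection. The sketch below assumes that each @ variable holds, per element, an integer timestamp of the most recent write, with a larger value meaning a more recent write and `NEVER` meaning the element was not written in that version. The function name `phi_merge` and this integer encoding are assumptions made for illustration, not the paper's exact definition.

```python
# Element-wise merge of two array versions guided by per-element timestamps.
# The encoding of @ variables as integers is an assumption of this sketch.

NEVER = -1  # marks an element that this version never wrote


def phi_merge(a1, at1, a0, at0):
    """For each index, keep the value from whichever version wrote it last."""
    return [
        x1 if t1 > t0 else x0
        for x1, t1, x0, t0 in zip(a1, at1, a0, at0)
    ]


# Version 0 defines every element at time 0.
a0 = [10, 20, 30, 40]
at0 = [0, 0, 0, 0]

# Version 1 writes only index 2, at time 1.
a1 = [None, None, 99, None]
at1 = [NEVER, NEVER, 1, NEVER]

merged = phi_merge(a1, at1, a0, at0)
```

Here `merged` is `[10, 20, 99, 40]`: the choice at each index depends only on the arguments passed in, not on which control flow path reached the merge point.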
