Robert H. Kuhn
Intel
Publications
Featured research published by Robert H. Kuhn.
symposium on principles of programming languages | 1981
David J. Kuck; Robert H. Kuhn; David A. Padua; Bruce Leasure; Michael Wolfe
Dependence graphs can be used as a vehicle for formulating and implementing compiler optimizations. This paper defines such graphs and discusses two kinds of transformations. The first are simple rewriting transformations that remove dependence arcs. The second are abstraction transformations that deal more globally with a dependence graph. These transformations have been implemented and applied to several different types of high-speed architectures.
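The paper defines these transformations abstractly; as a hedged illustration (not taken from the paper), the C sketch below shows one classic arc-removing rewrite, scalar expansion, in which a loop-carried dependence on a scalar temporary is eliminated by giving each iteration its own copy of the value.

```c
/* A minimal sketch of scalar expansion. The scalar t carries a
 * dependence arc between iterations (it is written and read every
 * iteration), which ties the iterations together. */
void before(const float *a, float *b, int n) {
    float t;
    for (int i = 0; i < n; i++) {
        t = a[i] + 1.0f;      /* same scalar reused each iteration */
        b[i] = t * t;
    }
}

/* Expanding t into an array gives each iteration a private copy,
 * removing the arc; t here is a caller-provided scratch array. */
void after(const float *a, float *b, float *t, int n) {
    for (int i = 0; i < n; i++)
        t[i] = a[i] + 1.0f;
    for (int i = 0; i < n; i++)
        b[i] = t[i] * t[i];
}
```

Once the arc is removed, the rewritten loops can be distributed, vectorized, or scheduled independently, which is the kind of opportunity a dependence-graph-driven optimizer looks for.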
symposium on code generation and optimization | 2012
Xing Zhou; Jean Pierre Giacalone; María Jesús Garzarán; Robert H. Kuhn; Yang Ni; David A. Padua
This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant amount of redundant computation. Hierarchical overlapped tiling performs overlapped tiling hierarchically to balance communication overhead and redundant computation, and thus has the potential to provide better performance. In this paper, we describe the hierarchical overlapped tiling optimization and its implementation in an OpenCL compiler. We also evaluate the effectiveness of this optimization using 8 programs that implement different forms of stencil computation. Our results show that hierarchical overlapped tiling achieves an average 37% speedup over traditional tiling on a 32-core workstation.
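As a rough illustration of the base transformation only (a single level of overlapped tiling for a 1D three-point stencil; the tile size, halo width, and names are assumptions, not the paper's hierarchical OpenCL implementation), the C sketch below trades a small amount of redundant computation at tile borders for the ability to run several time steps with no communication between tiles.

```c
/* Single-level overlapped tiling of a 1D 3-point stencil over two
 * time steps. Each tile loads STEPS halo points per side and
 * redundantly recomputes the overlap, so both sweeps stay tile-local. */
#define TILE  256
#define STEPS 2                      /* time-tile depth = halo width */

void stencil_overlapped(const float *in, float *out, int n) {
    float buf[TILE + 2 * STEPS], tmp[TILE + 2 * STEPS];
    for (int t0 = 0; t0 < n; t0 += TILE) {
        /* Load the tile plus STEPS halo points on each side
         * (out-of-domain points treated as zero for simplicity). */
        for (int j = -STEPS; j < TILE + STEPS; j++) {
            int g = t0 + j;
            buf[j + STEPS] = (g >= 0 && g < n) ? in[g] : 0.0f;
        }
        /* Two sweeps entirely inside the tile; the valid region
         * shrinks by one point per sweep (redundant computation). */
        for (int s = 0; s < STEPS; s++) {
            int lo = s + 1, hi = TILE + 2 * STEPS - s - 1;
            for (int j = lo; j < hi; j++)
                tmp[j] = 0.25f * buf[j - 1] + 0.5f * buf[j] + 0.25f * buf[j + 1];
            for (int j = lo; j < hi; j++)
                buf[j] = tmp[j];
        }
        /* Write back only the points owned by this tile. */
        for (int j = 0; j < TILE && t0 + j < n; j++)
            out[t0 + j] = buf[j + STEPS];
    }
}
```

The hierarchical scheme described in the paper composes this idea across levels (for example, groups of cores and cores within a group), so the redundant work is paid mostly at the cheaper, inner levels.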
international parallel and distributed processing symposium | 2002
Rudolf Eigenmann; Jay Hoeflinger; Robert H. Kuhn; David A. Padua; Ayon Basumallik; Seung-Jai Min; Jiajing Zhu
This paper presents an overview of an ongoing NSF-sponsored project for the study of runtime systems and compilers to support the development of efficient OpenMP parallel programs for distributed memory systems. The first part of the paper discusses a prototype compiler, now under development, that will accept OpenMP and will target TreadMarks, a Software Distributed Shared Memory system (SDSM), and Message Passing Interface (MPI) library routines. The second part of the paper presents ideas for OpenMP extensions that enable the programmer to override the compiler whenever automatic methods fail to generate high-quality code.
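For context, the C sketch below shows the kind of plain OpenMP loop such a compiler would accept and translate to an SDSM or MPI target; it uses only standard OpenMP, not the extensions proposed in the paper.

```c
/* A shared-memory OpenMP loop. Because the iterations are
 * independent, a distributed-memory back end is free to partition
 * them across nodes and communicate only the owned pieces of y. */
#include <omp.h>

void daxpy(double a, const double *x, double *y, int n) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```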
international symposium on computer architecture | 1980
Robert H. Kuhn
In this paper, we consider the problem of restructuring or transforming algorithms to efficiently use a single-stage interconnection network. All algorithms contain some freedom in the way they are mapped to a machine. We use this freedom to show that superior interconnection efficiency can be obtained by implementing the interconnections required by the algorithm within the context of the algorithm rather than attempting to implement each request individually. The interconnection considered is the bidirectional shuffle-shift. It is shown that two algorithm transformations are useful for implementing several lower triangular and tridiagonal system algorithms on the shuffle-shift network. Of the 14 algorithms considered, 85% could be implemented on this network. The transformations developed to produce these results are described. They are general-purpose in nature and can be applied to a much larger class of algorithms.
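As a hedged illustration (the constants and the single routing pass are assumptions for exposition, not the paper's algorithms), the C sketch below models the two permutations a shuffle-shift network provides for N = 2^k processors and applies one combined shuffle-then-shift pass to an array of per-processor values; algorithm transformations of the kind described aim to express an algorithm's data movement as compositions of exactly these moves.

```c
/* Model of a shuffle-shift interconnection for N = 2^K processors. */
#include <stdio.h>

#define K 3
#define N (1 << K)

/* Perfect shuffle: rotate the K-bit processor index left by one. */
static int shuffle(int i) { return ((i << 1) | (i >> (K - 1))) & (N - 1); }

/* Shift: move data to a neighboring processor (mod N). */
static int shift(int i, int d) { return (i + d) & (N - 1); }

int main(void) {
    int data[N], next[N];
    for (int i = 0; i < N; i++) data[i] = i;

    /* One network pass: each processor sends its value through the
     * shuffle, then one step to its right neighbor. */
    for (int i = 0; i < N; i++)
        next[shift(shuffle(i), 1)] = data[i];

    for (int i = 0; i < N; i++)
        printf("PE %d holds %d\n", i, next[i]);
    return 0;
}
```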
international parallel and distributed processing symposium | 2006
Bingchen Li; Kang Chen; Zhiteng Huang; Hrabri Rajic; Robert H. Kuhn
Grid computing provides a very rich environment for scientific calculations. In addition to the challenges it poses, it also offers new opportunities for optimization. In this paper we use the DFS (distributed file streaming) framework to speed up the NAS Grid Benchmark (NGB) workflows. By studying the I/O patterns of the NGB codes, we identify program locations where computation and data-movement phases can be overlapped. By integrating DFS into NGB, we demonstrate a useful method of improving overall workflow efficiency: the output of the current process is streamed to become the input of the following stage, reducing a workflow to a series of distributed producer-consumer stages. The DFS framework eliminates intermediate file transfers and makes process scheduling more efficient, improving the turnaround time of the HC (helical chain) data-flow graph under the Globus grid environment relative to the original version of the benchmark.
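As a loose single-machine analogy for the streaming idea (a POSIX pipe standing in for the DFS framework, which works across grid nodes; names and sizes here are illustrative), the C sketch below replaces a file hand-off between two workflow stages with a producer-consumer stream, so the consumer starts working before the producer finishes.

```c
/* Producer-consumer streaming between two stages via a pipe instead
 * of writing a complete intermediate file and transferring it. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    if (pipe(fd) != 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                  /* producer stage */
        close(fd[0]);
        for (int i = 0; i < 1000; i++) {
            double record = i * 0.5;    /* stand-in for a computed result */
            write(fd[1], &record, sizeof record);
        }
        close(fd[1]);
        _exit(0);
    }

    close(fd[1]);                       /* consumer stage */
    double record, sum = 0.0;
    while (read(fd[0], &record, sizeof record) == (ssize_t)sizeof record)
        sum += record;                  /* consume records as they arrive */
    close(fd[0]);
    wait(NULL);
    printf("consumed sum = %g\n", sum);
    return 0;
}
```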
Archive | 1984
David J. Kuck; Robert H. Kuhn; Bruce Leasure; Michael Wolfe
Archive | 1984
David J. Kuck; Robert H. Kuhn; Bruce Leasure; Michael Wolfe
Archive | 2006
Eric Huang; Hu Chen; Wenguang Chen; Robert H. Kuhn
Archive | 2002
Hrabri Rajic; Robert H. Kuhn
IEEE Computer Society | 1981
Robert H. Kuhn; David A. Padua