
Publication


Featured research published by Ken Kennedy.


ACM Transactions on Programming Languages and Systems | 1987

Automatic translation of FORTRAN programs to vector form

Randy Allen; Ken Kennedy

The recent success of vector computers such as the Cray-1 and array processors such as those manufactured by Floating Point Systems has increased interest in making vector operations available to the FORTRAN programmer. The FORTRAN standards committee is currently considering a successor to FORTRAN 77, usually called FORTRAN 8x, that will permit the programmer to explicitly specify vector and array operations. Although FORTRAN 8x will make it convenient to specify explicit vector operations in new programs, it does little for existing code. In order to benefit from the power of vector hardware, existing programs will need to be rewritten in some language (presumably FORTRAN 8x) that permits the explicit specification of vector operations. One way to avoid a massive manual recoding effort is to provide a translator that discovers the parallelism implicit in a FORTRAN program and automatically rewrites that program in FORTRAN 8x. Such a translation from FORTRAN to FORTRAN 8x is not straightforward because FORTRAN DO loops are not always semantically equivalent to the corresponding FORTRAN 8x parallel operation. The semantic difference between these two constructs is precisely captured by the concept of dependence. A translation from FORTRAN to FORTRAN 8x preserves the semantics of the original program if it preserves the dependences in that program. The theoretical background is developed here for employing data dependence to convert FORTRAN programs to parallel form. Dependence is defined and characterized in terms of the conditions that give rise to it; accurate tests to determine dependence are presented; and transformations that use dependence to uncover additional parallelism are discussed.
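A minimal sketch of that semantic difference, using Python with NumPy array assignment as a stand-in for FORTRAN 8x vector form (the recurrence and data are invented for illustration):

```python
import numpy as np

# DO-loop semantics: iteration i sees the value a[i] stored by iteration i-1,
# so this loop is a recurrence (a loop-carried dependence).
def do_loop(a, b):
    for i in range(len(b)):
        a[i + 1] = a[i] + b[i]
    return a

# Array-assignment semantics (as in FORTRAN 8x): the entire right-hand side
# is evaluated from the ORIGINAL a before any element is stored.
def vector_form(a, b):
    a[1:] = a[:-1] + b
    return a

a0, b = np.zeros(5), np.ones(4)
print(do_loop(a0.copy(), b))      # [0. 1. 2. 3. 4.] -- the recurrence
print(vector_form(a0.copy(), b))  # [0. 1. 1. 1. 1.] -- not equivalent!
```

Dependence testing is what tells a translator that these two forms disagree here, while a loop such as A(I) = B(I) + C(I), with no loop-carried dependence, can be rewritten safely.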


Symposium on Principles of Programming Languages | 1983

Conversion of control dependence to data dependence

John R. Allen; Ken Kennedy; Carrie Porterfield; Joe D. Warren

Program analysis methods, especially those which support automatic vectorization, are based on the concept of interstatement dependence where a dependence holds between two statements when one of the statements computes values needed by the other. Powerful program transformation systems that convert sequential programs to a form more suitable for vector or parallel machines have been developed using this concept [AllK 82, KKLW 80]. The dependence analysis in these systems is based on data dependence. In the presence of complex control flow, data dependence is not sufficient to transform programs because of the introduction of control dependences. A control dependence exists between two statements when the execution of one statement can prevent the execution of the other. Control dependences do not fit conveniently into dependence-based program translators. One solution is to convert all control dependences to data dependences by eliminating goto statements and introducing logical variables to control the execution of statements in the program. In this scheme, action statements are converted to IF statements. The variables in the conditional expression of an IF statement can be viewed as inputs to the statement being controlled. The result is that control dependences between statements become explicit data dependences expressed through the definitions and uses of the controlling logical variables. This paper presents a method for systematically converting control dependences to data dependences in this fashion. The algorithms presented here have been implemented in PFC, an experimental vectorizer written at Rice University.
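A small Python sketch of the idea, with a NumPy mask playing the role of the controlling logical variables (the loop and data are invented):

```python
import numpy as np

# Control dependence: the assignment runs only when the branch is taken.
def with_branch(a, b):
    for i in range(len(a)):
        if b[i] > 0.0:
            a[i] = a[i] / b[i]
    return a

# After conversion: the guard is stored in logical variables m[i], and the
# guarded statement reads m like any other input -- the control dependence
# is now an explicit data dependence on the definition of m.
def if_converted(a, b):
    m = b > 0.0
    a[m] = a[m] / b[m]
    return a

a, b = np.array([2.0, 3.0, 8.0]), np.array([2.0, 0.0, 4.0])
print(with_branch(a.copy(), b), if_converted(a.copy(), b))  # same result
```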


Architectural Support for Programming Languages and Operating Systems | 1991

Software prefetching

David Callahan; Ken Kennedy; Allan Porterfield

We present an approach, called software prefetching, to reducing cache miss latencies. By providing a nonblocking prefetch instruction that causes data at a specified memory address to be brought into cache, the compiler can overlap the memory latency with other computation. Our simulations show that, even when generated by a very simple compiler algorithm, prefetch instructions can eliminate nearly all cache misses, while causing only modest increases in data traffic between memory and cache.
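Python cannot issue real prefetch instructions, so the toy model below only illustrates the placement decision a compiler makes: fetch the element needed a fixed number of iterations ahead so its latency overlaps the intervening work. The distance and the set-based cache model are invented for illustration.

```python
PREFETCH_DISTANCE = 8   # assumed: miss latency fits within 8 iterations of work

def sum_with_prefetch(a):
    cache, total, misses = set(), 0.0, 0
    for i in range(len(a)):
        future = i + PREFETCH_DISTANCE
        if future < len(a):
            cache.add(future)        # nonblocking prefetch of a future element
        if i not in cache:           # demand miss: latency is NOT overlapped
            misses += 1
            cache.add(i)
        total += a[i]
    return total, misses

print(sum_with_prefetch(list(range(100))))  # (4950.0, 8): only startup misses
```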


Programming Language Design and Implementation | 1990

Improving register allocation for subscripted variables

David Callahan; Steve Carr; Ken Kennedy

Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floating-point registers, which are most often used as temporary repositories for subscripted variables. In this paper, we present a source-to-source transformation, called scalar replacement, that finds opportunities for reuse of subscripted variables and replaces the references involved by references to temporary scalar variables. The objective is to increase the likelihood that these elements will be assigned to registers by the coloring-based register allocators found in most compilers. In addition, we present transformations to improve the overall effectiveness of scalar replacement and show how these transformations can be applied in a variety of loop nest types. Finally, we present experimental results showing that these techniques are extremely effective: capable of achieving integer factor speedups over code generated by good optimizing compilers of conventional design.
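A before/after sketch in Python of what the transformation does (the real transformation operates on Fortran source; the three-point loop here is invented):

```python
# Before: a[i-1] and a[i] were already read on earlier iterations, but an
# allocator that treats the whole array as one object cannot reuse them.
def smooth(a, b, n):
    for i in range(1, n - 1):
        b[i] = a[i - 1] + a[i] + a[i + 1]

# After scalar replacement: one array load per iteration; the reused values
# live in scalar temporaries that a coloring allocator can keep in registers.
def smooth_replaced(a, b, n):
    t0, t1 = a[0], a[1]
    for i in range(1, n - 1):
        t2 = a[i + 1]
        b[i] = t0 + t1 + t2
        t0, t1 = t1, t2     # rotate the temporaries for the next iteration

a, n = [1.0, 2.0, 3.0, 4.0, 5.0], 5
b1, b2 = [0.0] * n, [0.0] * n
smooth(a, b1, n); smooth_replaced(a, b2, n)
print(b1 == b2)   # True: same values, fewer array loads per iteration
```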


The Journal of Supercomputing | 1988

Compiling programs for distributed-memory multiprocessors

David Callahan; Ken Kennedy

We describe a new approach to programming distributed-memory computers. Rather than having each node in the system explicitly programmed, we derive an efficient message-passing program from a sequential shared-memory program annotated with directions on how elements of shared arrays are distributed to processors. This article describes one possible input language for describing distributions and then details the compilation process and the optimization necessary to generate an efficient program.
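A toy Python rendition of the compilation strategy under an assumed BLOCK distribution: an owner-computes rule decides which iterations each process runs, and any operand owned elsewhere becomes a message (the sizes and the loop are invented):

```python
P, N = 4, 16            # invented: 4 processes, 16-element arrays

def owner(i):
    return i // (N // P)             # BLOCK distribution: home of element i

# Iterations of "A(i) = B(i+1), i = 0..N-2" run where A(i) lives:
def local_iterations(p):
    return [i for i in range(N - 1) if owner(i) == p]

# Operands B(i+1) owned by another process must be received (the owner
# gets a matching send); these are the messages the compiler derives.
def remote_reads(p):
    return [i + 1 for i in local_iterations(p) if owner(i + 1) != p]

print([remote_reads(p) for p in range(P)])   # [[4], [8], [12], []]
```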


Cluster Computing and the Grid | 2005

Task scheduling strategies for workflow-based applications in grids

Jim Blythe; Sonal Jain; Ewa Deelman; Yolanda Gil; Karan Vahi; Anirban Mandal; Ken Kennedy

Grid applications require allocating a large number of heterogeneous tasks to distributed resources. A good allocation is critical for efficient execution. However, many existing grid toolkits use matchmaking strategies that do not consider overall efficiency for the set of tasks to be run. We identify two families of resource allocation algorithms: task-based algorithms, which greedily allocate tasks to resources, and workflow-based algorithms, which search for an efficient allocation for the entire workflow. We compare the behavior of workflow-based algorithms and task-based algorithms, using simulations of workflows drawn from a real application and with varying ratios of computation cost to data transfer cost. We observe that workflow-based approaches have the potential to work better for data-intensive applications even when estimates about future tasks are inaccurate.
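A compact Python sketch of the two families on invented data: the task-based scheduler commits greedily task by task, while the workflow-based one scores whole assignments (here by exhaustive search; real planners search heuristically and would also model data transfer):

```python
import itertools

# Invented runtimes of each task on each resource (data movement omitted).
runtime = {("t1", "r1"): 3, ("t1", "r2"): 6,
           ("t2", "r1"): 4, ("t2", "r2"): 5,
           ("t3", "r1"): 2, ("t3", "r2"): 7}
tasks, resources = ["t1", "t2", "t3"], ["r1", "r2"]

def task_based():
    free_at, plan = {r: 0 for r in resources}, {}
    for t in tasks:                  # greedy: earliest finish for THIS task
        r = min(resources, key=lambda r: free_at[r] + runtime[(t, r)])
        plan[t], free_at[r] = r, free_at[r] + runtime[(t, r)]
    return plan

def workflow_based():                # scores the entire allocation at once
    def makespan(assign):
        free_at = {r: 0 for r in resources}
        for t, r in zip(tasks, assign):
            free_at[r] += runtime[(t, r)]
        return max(free_at.values())
    return min(itertools.product(resources, repeat=len(tasks)), key=makespan)

print(task_based(), workflow_based())
```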


International Journal of Parallel Programming | 2005

New grid scheduling and rescheduling methods in the GrADS project

Fran Berman; Henri Casanova; Andrew A. Chien; Keith D. Cooper; Holly Dail; Anshuman Dasgupta; W. Deng; Jack J. Dongarra; Lennart Johnsson; Ken Kennedy; Charles Koelbel; Bo Liu; Xin Liu; Anirban Mandal; Gabriel Marin; Mark Mazina; John M. Mellor-Crummey; Celso L. Mendes; A. Olugbile; Jignesh M. Patel; Daniel A. Reed; Zhiao Shi; Otto Sievert; Huaxia Xia; A. YarKhan

The goal of the Grid Application Development Software (GrADS) Project is to provide programming tools and an execution environment to ease program development for the Grid. This paper presents recent extensions to the GrADS software framework: a new approach to scheduling workflow computations, applied to a 3-D image reconstruction application; a simple stop/migrate/restart approach to rescheduling Grid applications, applied to a QR factorization benchmark; and a process-swapping approach to rescheduling, applied to an N-body simulation. Experiments validating these methods were carried out on both the GrADS MacroGrid (a small but functional Grid) and the MicroGrid (a controlled emulation of the Grid).
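A one-function sketch of the stop/migrate/restart decision as the paper's framing suggests it (the cost model and numbers are invented): reschedule only when the predicted gain on the new resource outweighs the checkpoint-and-restart overhead.

```python
def should_migrate(remaining_work, cur_rate, new_rate, migrate_cost):
    t_stay = remaining_work / cur_rate                 # finish where we are
    t_move = migrate_cost + remaining_work / new_rate  # stop, move, restart
    return t_move < t_stay

print(should_migrate(1000.0, 1.0, 4.0, 200.0))  # True: 450 < 1000
```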


Programming Language Design and Implementation | 1991

Practical dependence testing

Gina Goff; Ken Kennedy; Chau-Wen Tseng

Precise and efficient dependence tests are essential to the effectiveness of a parallelizing compiler. This paper proposes a dependence testing scheme based on classifying pairs of subscripted variable references. Exact yet fast dependence tests are presented for certain classes of array references, as well as empirical results showing that these references dominate scientific Fortran codes. These dependence tests are being implemented at Rice University in both PFC, a parallelizing compiler, and ParaScope, a parallel programming environment.
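One classic exact test in such a classification scheme is the GCD test for a pair of affine subscripts; a minimal Python version, ignoring loop bounds and direction information:

```python
from math import gcd

# Can A(a1*i + c1) (written) and A(a2*j + c2) (read) ever name the same
# element? The equation a1*i - a2*j = c2 - c1 has an integer solution
# iff gcd(a1, a2) divides c2 - c1.
def gcd_test(a1, c1, a2, c2):
    return (c2 - c1) % gcd(a1, a2) == 0   # False => provably no dependence

print(gcd_test(2, 0, 2, 1))   # False: A(2i) never overlaps A(2j+1)
print(gcd_test(1, 0, 1, 3))   # True: dependence possible (test is inexact)
```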


IEEE Transactions on Parallel and Distributed Systems | 1991

An implementation of interprocedural bounded regular section analysis

Paul Havlak; Ken Kennedy

Regular section analysis, which summarizes interprocedural side effects on subarrays in a form useful to dependence analysis, while avoiding the complexity of prior solutions, is shown to be a practical addition to a production compiler. Optimizing compilers should produce efficient code even in the presence of high-level language constructs. However, current programming support systems are significantly lacking in their ability to analyze procedure calls. This deficiency complicates parallel programming, because loops with calls can be a significant source of parallelism. The performance of regular section analysis is evaluated on two benchmarks: the LINPACK library of linear algebra subroutines and the Rice Compiler Evaluation Program Suite (RiCEPS), a set of complete application codes from a variety of scientific disciplines. The experimental results demonstrate that regular section analysis is an effective means of discovering parallelism, given programs written in an appropriately modular programming style.
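A toy Python illustration of the summary idea: each call's side effect on an array is kept as a bounded section per dimension, and summaries from several calls are merged by widening the bounds, trading some precision for a cheap fixed-size representation (the sections below are invented):

```python
# A section is an (lo, hi) bound per array dimension.
def merge(s1, s2):
    return tuple((min(lo1, lo2), max(hi1, hi2))
                 for (lo1, hi1), (lo2, hi2) in zip(s1, s2))

write_call1 = ((1, 50), (1, 1))     # one call site writes A(1:50, 1)
write_call2 = ((26, 100), (1, 1))   # another writes A(26:100, 1)
print(merge(write_call1, write_call2))  # ((1, 100), (1, 1)) covers both
```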


programming language design and implementation | 1989

Coloring heuristics for register allocation

Preston Briggs; Keith D. Cooper; Ken Kennedy; Linda Torczon

We describe an improvement to a heuristic introduced by Chaitin for use in graph coloring register allocation. Our modified heuristic produces better colorings, with less spill code. It has similar compile-time and implementation requirements. We present experimental data to compare the two methods.
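A sketch of the optimistic variant in Python (the interference graph and k are invented): unlike Chaitin's rule, a node of degree >= k is pushed on the stack anyway, and the spill decision is deferred to color-selection time, where neighbors often turn out to share colors:

```python
def color(graph, k):
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:                      # simplify: prefer low-degree nodes,
        n = min(work, key=lambda v: len(work[v]))   # but push any node
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)
    colors = {}
    for n in reversed(stack):        # select: spill only if truly stuck
        used = {colors[m] for m in graph[n] if m in colors}
        free = [c for c in range(k) if c not in used]
        colors[n] = free[0] if free else "spill"
    return colors

# A 4-cycle is 2-colorable even though every node has degree 2 = k, so
# Chaitin's heuristic would spill here; the optimistic version does not.
print(color({"a": {"b", "d"}, "b": {"a", "c"},
             "c": {"b", "d"}, "d": {"a", "c"}}, 2))
```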

Collaboration


Dive into Ken Kennedy's collaborations.

Top Co-Authors

Geoffrey C. Fox

Indiana University Bloomington
