Publication


Featured research published by Richard E. Hank.


International Symposium on Computer Architecture | 1995

A comparison of full and partial predicated execution support for ILP processors

Scott A. Mahlke; Richard E. Hank; James E. McCormick; David I. August; Wen-mei W. Hwu

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing the number of source operands for all instructions. Full predicate support provides for the most flexibility and the largest potential performance improvements. On the other end, partial predicated execution support, such as conditional moves, requires very little change to existing architectures. This paper presents a preliminary study to qualitatively and quantitatively address the benefit of full and partial predicated execution support. With our current compiler technology, we show that the compiler can use both partial and full predication to achieve speedup in large control-intensive programs. Some details of the code generation techniques are shown to provide insight into the benefit of going from partial to full predication. Preliminary experimental results are very encouraging: partial predication provides an average of 33% performance improvement for an 8-issue processor with no predicate support while full predication provides an additional 30% improvement.
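
As a rough illustration only (the code and names below are mine, not the paper's), the C fragment shows the kind of transformation that even partial predicate support such as a conditional move makes possible: a hard-to-predict branch is replaced by a branch-free select written here at the source level.

    #include <stdio.h>

    /* Branchy form: the taken/not-taken pattern depends on the data,
       so a hardware branch predictor may do poorly on it. */
    static int max_branch(int a, int b) {
        if (a > b)
            return a;
        return b;
    }

    /* Branch-free form: the flavour of code a compiler can emit when
       the target offers partial predication (e.g. a conditional-move
       instruction). Both inputs are evaluated; the condition merely
       selects the result, so no branch is needed. */
    static int max_select(int a, int b) {
        int take_a = (a > b);              /* 0 or 1 */
        return take_a * a + (1 - take_a) * b;
    }

    int main(void) {
        printf("%d %d\n", max_branch(3, 7), max_select(3, 7));
        return 0;
    }

Full predication generalizes this idea by letting almost any instruction be guarded by a predicate register instead of only selecting between two finished results.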


International Symposium on Microarchitecture | 1994

Characterizing the impact of predicated execution on branch prediction

Scott A. Mahlke; Richard E. Hank; Roger A. Bringmann; John C. Gyllenhaal; David M. Gallagher; Wen-mei W. Hwu

Branch instructions are recognized as a major impediment to exploiting instruction level parallelism. Even with sophisticated branch prediction techniques, many frequently executed branches remain difficult to predict. An architecture supporting predicated execution may allow the compiler to remove many of these hard-to-predict branches, reducing the number of branch mispredictions and thereby improving performance. We present an in-depth analysis of the characteristics of those branches which are frequently mispredicted and examine the effectiveness of an advanced compiler to eliminate these branches. Over the benchmarks studied, an average of 27% of the dynamic branches and 56% of the dynamic branch mispredictions are eliminated with predicated execution support.
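
To make "hard-to-predict" concrete, the self-contained sketch below (my own example, not from the paper) runs a classic 2-bit saturating-counter predictor over a data-dependent branch whose outcome is essentially random; such branches hover near 50% misprediction and are exactly the candidates that if-conversion can remove.

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal 2-bit saturating-counter predictor: states 0..3,
       predict taken when the counter is 2 or 3. */
    static int counter = 2;

    static int predict(void) { return counter >= 2; }

    static void update(int taken) {
        if (taken  && counter < 3) counter++;
        if (!taken && counter > 0) counter--;
    }

    int main(void) {
        srand(42);
        long mispredicts = 0, trips = 100000;
        for (long i = 0; i < trips; i++) {
            /* Data-dependent branch: taken about half the time with no
               pattern, so even a dynamic predictor stays near 50%. */
            int taken = (rand() & 1);
            if (predict() != taken) mispredicts++;
            update(taken);
        }
        printf("misprediction rate: %.1f%%\n",
               100.0 * mispredicts / trips);
        return 0;
    }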


ACM Transactions on Computer Systems | 1993

Sentinel scheduling: a model for compiler-controlled speculative execution

Scott A. Mahlke; William Y. Chen; Roger A. Bringmann; Richard E. Hank; Wen-mei W. Hwu; B. Ramakrishna Rau; Michael S. Schlansker

Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to efficiently handle exceptions for speculative instructions. In this article, a set of architectural features and compile-time scheduling support collectively referred to as sentinel scheduling is introduced. Sentinel scheduling provides an effective framework for both compiler-controlled speculative execution and exception handling. All program exceptions are accurately detected and reported in a timely manner with sentinel scheduling. Recovery from exceptions is also ensured with the model. Experimental results show the effectiveness of sentinel scheduling for exploiting instruction-level parallelism and the low overhead associated with exception handling.
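
Sentinel scheduling relies on architectural support, but the deferred-reporting idea can be mimicked in plain C. The sketch below is only a loose software analogy under my own names (spec_result, spec_div, sentinel_use), not the paper's mechanism: a speculated operation records an exception tag instead of faulting, and the sentinel, which stays in the instruction's original home block, reports the exception only if control actually reaches it.

    #include <stdio.h>

    /* Toy model of a speculative result: the value plus an exception
       tag standing in for the poison bit real hardware would carry. */
    typedef struct {
        int value;
        int excepted;   /* set instead of trapping immediately */
    } spec_result;

    /* "Speculative divide" hoisted above the branch that guards it. */
    static spec_result spec_div(int num, int den) {
        spec_result r = {0, 0};
        if (den == 0) r.excepted = 1;    /* defer: tag, do not trap */
        else          r.value = num / den;
        return r;
    }

    /* The sentinel stays in the home block: if control reaches it
       with the tag set, the deferred exception is finally reported. */
    static int sentinel_use(spec_result r) {
        if (r.excepted) {
            fprintf(stderr, "deferred exception reported\n");
            return 0;
        }
        return r.value;
    }

    int main(void) {
        int den = 0;
        spec_result r = spec_div(100, den);    /* executed speculatively */
        if (den != 0)                          /* the home block's guard */
            printf("%d\n", sentinel_use(r));   /* sentinel would report here */
        else
            printf("guard not taken, deferred exception correctly ignored\n");
        return 0;
    }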


Proceedings of the IEEE | 1995

Compiler technology for future microprocessors

Wen-mei W. Hwu; Richard E. Hank; David M. Gallagher; Scott A. Mahlke; Daniel M. Lavery; Grant E. Haab; John C. Gyllenhaal; David I. August

Advances in hardware technology have made it possible for microprocessors to execute a large number of instructions concurrently (i.e., in parallel). These microprocessors take advantage of the opportunity to execute instructions in parallel to increase the execution speed of a program. As in other forms of parallel processing, the performance of these microprocessors can vary greatly depending on the quality of the software. In particular, the quality of compilers can make an order of magnitude difference in performance. This paper presents a new generation of compiler technology that has emerged to deliver the large amount of instruction-level parallelism already required by some current state-of-the-art microprocessors and soon to be required by many more. We introduce critical components of the technology which deal with difficult problems that are encountered when compiling programs for a high degree of instruction-level parallelism. We present examples to illustrate the functional requirements of these components. To provide more insight into the challenges involved, we present in-depth case studies on predicated compilation and maintenance of dependence information, two of the components that are largely missing from most current commercial compilers.
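
One of the two case studies concerns maintenance of dependence information. The small C example below (mine, not the paper's) shows why it matters: unless the compiler can prove that a store and a later load touch different memory, it must keep them in order, whereas the restrict qualifier supplies exactly the independence fact that licenses reordering and overlapping across iterations.

    #include <stdio.h>

    /* Without dependence information the compiler must assume dst may
       alias src, so a load of src[i] cannot be moved above the store
       to dst from an earlier iteration. */
    static void scale_may_alias(int *dst, const int *src, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2;
    }

    /* restrict records the independence fact, letting an ILP compiler
       reorder and overlap the loads and stores across iterations. */
    static void scale_no_alias(int *restrict dst,
                               const int *restrict src, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2;
    }

    int main(void) {
        int a[4] = {1, 2, 3, 4}, b[4];
        scale_may_alias(b, a, 4);
        scale_no_alias(b, a, 4);
        printf("%d %d %d %d\n", b[0], b[1], b[2], b[3]);
        return 0;
    }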


International Symposium on Microarchitecture | 1993

Superblock formation using static program analysis

Richard E. Hank; Scott A. Mahlke; Roger A. Bringmann; John C. Gyllenhaal; Wen-mei W. Hwu

To achieve higher instruction-level parallelism, the constraint imposed by a single control flow must be relaxed. Control operations should execute in parallel just like data operations. We present a new software pipelining method called GPMB (Global Pipelining with Multiple Branches), which is based on architectures supporting multi-way branching and multiple control flows. Preliminary experimental results show that, for IF-less loops, GPMB performs as well as modulo scheduling, and for branch-intensive loops, GPMB performs much better than software pipelining under the constraint of one two-way branch per cycle.


International Symposium on Microarchitecture | 1995

Region-based compilation: an introduction and motivation

Richard E. Hank; Wen-mei W. Hwu; B. Ramakrishna Rau

As the amount of instruction-level parallelism required to fully utilize VLIW and superscalar processors increases, compilers must perform increasingly more aggressive analysis, optimization, parallelization and scheduling on the input programs. Traditionally, compilers have been built assuming functions as the unit of compilation. In this framework, function boundaries tend to hide valuable optimization opportunities from the compiler. Function inlining may be applied to assemble strongly coupled functions into the same compilation unit at the cost of very large function bodies. This paper introduces a new technique, called region-based compilation, where the compiler is allowed to repartition the program into more desirable compilation units. Region-based compilation allows the compiler to control problem size while exposing inter-procedural optimization and code motion opportunities.
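
Purely as an illustrative sketch (the block weights, the BUDGET cap, and the greedy walk below are my own invented example, not the paper's algorithm), profile-driven region selection can be pictured as seeding a region at the hottest unassigned basic block and growing it along its most frequent successors until a size budget is reached, which keeps each compilation unit tractable.

    #include <stdio.h>

    #define NBLOCKS 6
    #define BUDGET  3   /* illustrative cap on blocks per region */

    /* Toy CFG after inlining: execution weight and the most frequent
       successor of each basic block (-1 = none). */
    static const long weight[NBLOCKS]   = {100, 90, 80, 10, 75, 5};
    static const int  hot_succ[NBLOCKS] = {1, 2, 4, -1, -1, -1};
    static int region_of[NBLOCKS] = {-1, -1, -1, -1, -1, -1};

    int main(void) {
        int next_region = 0;
        for (;;) {
            /* Seed: hottest block not yet placed in a region. */
            int seed = -1;
            for (int b = 0; b < NBLOCKS; b++)
                if (region_of[b] < 0 &&
                    (seed < 0 || weight[b] > weight[seed]))
                    seed = b;
            if (seed < 0) break;

            /* Grow along the most frequent successors, bounded by
               BUDGET so analysis and scheduling stay tractable. */
            int size = 0;
            for (int b = seed;
                 b >= 0 && region_of[b] < 0 && size < BUDGET;
                 b = hot_succ[b], size++)
                region_of[b] = next_region;
            next_region++;
        }
        for (int b = 0; b < NBLOCKS; b++)
            printf("block %d -> region %d\n", b, region_of[b]);
        return 0;
    }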


International Symposium on Computer Architecture | 1993

Register connection: a new approach to adding registers into instruction set architectures

Tokuzo Kiyohara; Scott A. Mahlke; William Y. Chen; Roger A. Bringmann; Richard E. Hank; Sadun Anik; Wen-mei W. Hwu

Code optimization and scheduling for superscalar and superpipelined processors often increase the register requirement of programs. For existing instruction sets with a small to moderate number of registers, this increased register requirement can be a factor that limits the effectiveness of the compiler. In this paper, we introduce a new architectural method for adding a set of extended registers into an architecture. Using a novel concept of connection, this method allows the data stored in the extended registers to be accessed by instructions that apparently reference core registers. Furthermore, we address the technical issues involved in applying the new method to an architecture: instruction set extension, procedure call convention, context switching considerations, upward compatibility, efficient implementation, compiler support, and performance. Experimental results based on a prototype compiler and execution-driven simulation show that the proposed method can significantly improve the performance of superscalar processors with a small or moderate number of registers.
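
At a very high level, the connection idea can be mimicked with a lookup table. The toy model below uses my own names (reg_connect, reg_read, reg_write) and is only an analogy for the abstract's description, not the paper's ISA: once a core register is connected to an extended register, reads and writes that name the core register are redirected to the extended one.

    #include <stdio.h>

    #define NCORE 8
    #define NEXT  32

    static int core_reg[NCORE];
    static int ext_reg[NEXT];
    /* connection[i] < 0 means core register i is unconnected. */
    static int connection[NCORE] = {-1, -1, -1, -1, -1, -1, -1, -1};

    /* Toy "connect" operation: future accesses naming core register c
       are redirected to extended register e. */
    static void reg_connect(int c, int e) { connection[c] = e; }
    static void reg_disconnect(int c)     { connection[c] = -1; }

    static int reg_read(int c) {
        int e = connection[c];
        return e < 0 ? core_reg[c] : ext_reg[e];
    }

    static void reg_write(int c, int v) {
        int e = connection[c];
        if (e < 0) core_reg[c] = v; else ext_reg[e] = v;
    }

    int main(void) {
        reg_write(5, 11);           /* ordinary write to core r5      */
        reg_connect(5, 12);         /* connect r5 to extended e12     */
        reg_write(5, 99);           /* same register name, but it now */
        printf("connected read: %d\n", reg_read(5));  /* hits e12: 99 */
        reg_disconnect(5);
        printf("core read: %d\n", reg_read(5));       /* back to 11   */
        return 0;
    }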


International Symposium on Microarchitecture | 1993

Speculative execution exception recovery using write-back suppression

Roger A. Bringmann; Scott A. Mahlke; Richard E. Hank; John C. Gyllenhaal; Wen-mei W. Hwu

Compiler-controlled speculative execution has been shown to be effective in increasing the available instruction level parallelism (ILP) found in non-numeric programs. An important problem associated with compiler-controlled speculative execution is to accurately report and handle exceptions caused by speculatively executed instructions. Previous solutions to this problem incur either excessive hardware overhead or significant register pressure. The paper introduces a new architectural scheme referred to as write-back suppression. This scheme systematically suppresses register file updates for subsequent speculative instructions after an exception condition is detected for a speculatively executed instruction. The authors show that with a modest amount of hardware, write-back suppression supports accurate reporting and handling of exceptions for compiler-controlled speculative execution with minimal additional register pressure. Experiments based on a prototype compiler implementation and hardware simulation indicate that ensuring accurate handling of exceptions with write-back suppression incurs little run-time performance overhead.
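
A very rough software model of the scheme (illustrative only; the paper's mechanism lives in hardware, and spec_writeback is an invented name) is sketched below: once a speculative instruction records an exception, later speculative writes to the register file are suppressed, so recovery can re-execute from the excepting point with unclobbered register state.

    #include <stdio.h>

    static int regfile[8];
    static int exception_pending = 0;  /* set by the first speculative fault */

    /* Speculative write-back: suppressed once an exception is pending,
       so that re-execution from the excepting instruction still sees
       the register values it expects. */
    static void spec_writeback(int reg, int value, int faulted) {
        if (exception_pending) return;           /* suppression */
        if (faulted) { exception_pending = 1; return; }
        regfile[reg] = value;
    }

    int main(void) {
        regfile[1] = 5;
        spec_writeback(2, 40, 0);   /* speculative, ok: r2 = 40          */
        spec_writeback(3,  0, 1);   /* speculative load faults: pend it  */
        spec_writeback(1, 77, 0);   /* would clobber r1: suppressed      */
        printf("r1=%d r2=%d pending=%d\n",
               regfile[1], regfile[2], exception_pending);
        /* Recovery (not modeled here): report the exception in the home
           block and re-execute from the faulting instruction; r1 still
           holds its original value, 5. */
        return 0;
    }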


International Symposium on Microarchitecture | 1992

An efficient architecture for loop based data preloading

William Y. Chen; Roger A. Bringmann; Scott A. Mahlke; Richard E. Hank; J.E. Sicolo

Cache prefetching with the assistance of an optimizing compiler is an effective means of reducing the penalty of long memory access time beyond the primary cache. However, cache prefetching can cause cache pollution and its benefit can be unpredictable. A new architectural support for preloading, the preload buffer, is proposed in this paper. Unlike previously proposed methods of non-binding cache loads, the preload is a binding access to the memory system. The preload buffer is simple in design and predictable in performance. Simple interleaving permits accesses to the preload buffer to be free of bank conflicts. Trace-driven system simulation is used to show that the performance achieved with preloading hides memory latency better than either no prefetching or cache prefetching. In addition, both bus traffic rate and cache miss rate are reduced.
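
As a loose software analogy (my sketch with invented names preload and consume, not the paper's hardware), a preload can be thought of as a binding fetch into a small dedicated buffer issued several iterations ahead of its use, so the consumer reads the buffered value instead of stalling on memory.

    #include <stdio.h>

    #define PRELOAD_SLOTS 4

    /* Toy preload buffer: a binding copy of the data, not a cache hint. */
    static int preload_buf[PRELOAD_SLOTS];

    static void preload(const int *addr, int slot) { preload_buf[slot] = *addr; }
    static int  consume(int slot)                  { return preload_buf[slot]; }

    int main(void) {
        int a[8] = {1, 2, 3, 4, 5, 6, 7, 8}, sum = 0;

        /* Software-pipelined flavour: fill the buffer ahead of time,
           then preload for iteration i + PRELOAD_SLOTS while consuming
           the value for iteration i. */
        for (int i = 0; i < PRELOAD_SLOTS; i++)
            preload(&a[i], i);
        for (int i = 0; i < 8; i++) {
            sum += consume(i % PRELOAD_SLOTS);
            if (i + PRELOAD_SLOTS < 8)
                preload(&a[i + PRELOAD_SLOTS], i % PRELOAD_SLOTS);
        }
        printf("sum = %d\n", sum);   /* 36 */
        return 0;
    }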


International Journal of Parallel Programming | 1997

Region-based compilation: introduction, motivation, and initial experience

Richard E. Hank; Wen-mei W. Hwu; B. Ramakrishna Rau

The most important task of a compiler designed to exploit instruction-level parallelism (ILP) is instruction scheduling. If higher levels of ILP are to be achieved, the compiler must use, as the unit of scheduling, regions consisting of multiple basic blocks—preferably those that frequently execute consecutively, and which capture cycles in the program’s execution. Traditionally, compilers have been built using the function as the unit of compilation. In this framework, function boundaries often act as barriers to the formation of the most suitable scheduling regions. Function inlining may be used to circumvent this problem by assembling strongly coupled functions into the same compilation unit, but at the cost of very large function bodies. Consequently, global optimizations whose compile time and space requirements are superlinear in the size of the compilation unit, may be rendered prohibitively expensive. This paper introduces a new approach, called region-based compilation, wherein the compiler, after inlining, repartitions the program into more desirable compilation units, termed regions. Region-based compilation allows the compiler to control problem size and complexity while exposing inter-procedural scheduling, optimization and code motion opportunities.
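
A tiny C example of the function-boundary problem (my own code; clamp is a hypothetical helper, not from the paper): in the first version the call boundary hides the helper's code from a scheduler whose unit of compilation is the function, while the hand-inlined second version shows the larger unit that inlining followed by region selection would hand to the scheduler, without forcing the whole enlarged function to be optimized as one piece.

    #include <stdio.h>

    /* Small helper that ends up on the hot path of its caller. */
    static int clamp(int x, int lo, int hi) {
        if (x < lo) return lo;
        if (x > hi) return hi;
        return x;
    }

    /* Function-at-a-time view: the call boundary separates the loop
       from clamp's body, hiding scheduling opportunities. */
    static int sum_clamped(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += clamp(a[i], 0, 100);
        return s;
    }

    /* After inlining, caller and callee form one unit; region selection
       can then carve out just this hot loop for aggressive scheduling. */
    static int sum_clamped_inlined(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            int x = a[i];
            if (x < 0)   x = 0;
            if (x > 100) x = 100;
            s += x;
        }
        return s;
    }

    int main(void) {
        int a[5] = {-3, 50, 200, 7, 101};
        printf("%d %d\n", sum_clamped(a, 5), sum_clamped_inlined(a, 5));
        return 0;
    }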
