Richard E. Hank
University of Illinois at Urbana–Champaign
Publications
Featured research published by Richard E. Hank.
international symposium on computer architecture | 1995
Scott A. Mahlke; Richard E. Hank; James E. McCormick; David I. August; Wen-mei W. Hwu
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing the number of source operands for all instructions. Full predicate support provides for the most flexibility and the largest potential performance improvements. On the other end, partial predicated execution support, such as conditional moves, requires very little change to existing architectures. This paper presents a preliminary study to qualitatively and quantitatively address the benefit of full and partial predicated execution support. With our current compiler technology, we show that the compiler can use both partial and full predication to achieve speedup in large control-intensive programs. Some details of the code generation techniques are shown to provide insight into the benefit of going from partial to full predication. Preliminary experimental results are very encouraging: partial predication provides an average of 33% performance improvement for an 8-issue processor with no predicate support while full predication provides an additional 30% improvement.
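To make the partial-vs.-full distinction concrete, here is a small illustrative sketch (not taken from the paper; the pseudo-assembly in the comments is an assumption) contrasting an ordinary branch, a conditional-move style of partial predication, and fully guarded instructions:

```python
# Illustrative model of predication support levels. The comment
# syntax "(p) op" for a guarded instruction is an invented notation.

def branchy(a, b):
    # Original control flow: a conditional branch.
    if a > b:
        return a - b
    else:
        return b - a

def partial_predication(a, b):
    # Partial support (conditional move): compute both sides,
    # then select with a cmov-like operation. Ordinary instructions
    # keep their existing operand format.
    t1 = a - b
    t2 = b - a
    p = a > b
    return t1 if p else t2   # cmov-style select on predicate p

def full_predication(a, b):
    # Full support: each instruction carries a predicate source
    # operand and is nullified when its predicate is false.
    p = a > b                # p  <- compare a > b
    r = 0
    if p:
        r = a - b            # (p)  r <- a - b
    if not p:
        r = b - a            # (!p) r <- b - a
    return r
```

All three compute the same result; the point is that partial predication only needs a select at the merge point, while full predication removes the branch entirely by guarding every instruction.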
international symposium on microarchitecture | 1994
Scott A. Mahlke; Richard E. Hank; Roger A. Bringmann; John C. Gyllenhaal; David M. Gallagher; Wen-mei W. Hwu
Branch instructions are recognized as a major impediment to exploiting instruction level parallelism. Even with sophisticated branch prediction techniques, many frequently executed branches remain difficult to predict. An architecture supporting predicated execution may allow the compiler to remove many of these hard-to-predict branches, reducing the number of branch mispredictions and thereby improving performance. We present an in-depth analysis of the characteristics of those branches which are frequently mispredicted and examine the effectiveness of an advanced compiler to eliminate these branches. Over the benchmarks studied, an average of 27% of the dynamic branches and 56% of the dynamic branch mispredictions are eliminated with predicated execution support.
ACM Transactions on Computer Systems | 1993
Scott A. Mahlke; William Y. Chen; Roger A. Bringmann; Richard E. Hank; Wen-mei W. Hwu; B. Ramakrishna Rau; Michael S. Schlansker
Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to efficiently handle exceptions for speculative instructions. In this article, a set of architectural features and compile-time scheduling support collectively referred to as sentinel scheduling is introduced. Sentinel scheduling provides an effective framework for both compiler-controlled speculative execution and exception handling. All program exceptions are accurately detected and reported in a timely manner with sentinel scheduling. Recovery from exceptions is also ensured with the model. Experimental results show the effectiveness of sentinel scheduling for exploiting instruction-level parallelism and overhead associated with exception handling.
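The core idea of deferring a speculative instruction's exception until a non-speculative use checks it can be sketched as a toy model. This is an illustrative assumption about the mechanics, not the paper's architecture; the tag-propagation scheme here is invented:

```python
# Toy model of deferred exception reporting in the spirit of
# sentinel scheduling. A speculative load records an exception
# tag instead of faulting; the sentinel (a non-speculative use)
# reports it at the instruction's home location.

class SpecValue:
    def __init__(self, value=None, excepted=False):
        self.value = value
        self.excepted = excepted

def spec_load(memory, addr):
    # Speculative load: an invalid access sets the exception tag
    # rather than raising immediately, so the load can be hoisted
    # above the branch that guards it.
    if addr not in memory:
        return SpecValue(excepted=True)
    return SpecValue(memory[addr])

def sentinel_use(v):
    # The sentinel checks the tag and reports the deferred
    # exception accurately, in program order.
    if v.excepted:
        raise MemoryError("deferred exception reported by sentinel")
    return v.value
```

A correct load flows through unchanged; a faulting speculative load only raises when (and if) its result is actually used non-speculatively.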
Proceedings of the IEEE | 1995
Wen-mei W. Hwu; Richard E. Hank; David M. Gallagher; Scott A. Mahlke; Daniel M. Lavery; Grant E. Haab; John C. Gyllenhaal; David I. August
Advances in hardware technology have made it possible for microprocessors to execute a large number of instructions concurrently (i.e., in parallel). These microprocessors take advantage of the opportunity to execute instructions in parallel to increase the execution speed of a program. As in other forms of parallel processing, the performance of these microprocessors can vary greatly depending on the quality of the software. In particular, the quality of compilers can make an order of magnitude difference in performance. This paper presents a new generation of compiler technology that has emerged to deliver the large amount of instruction-level parallelism that is already required by some current state-of-the-art microprocessors and will be required by more future microprocessors. We introduce critical components of the technology which deal with difficult problems that are encountered when compiling programs for a high degree of instruction-level parallelism. We present examples to illustrate the functional requirements of these components. To provide more insight into the challenges involved, we present in-depth case studies on predicated compilation and maintenance of dependence information, two of the components that are largely missing from most current commercial compilers.
international symposium on microarchitecture | 1993
Richard E. Hank; Scott A. Mahlke; Roger A. Bringmann; John C. Gyllenhaal; Wen-mei W. Hwu
To achieve higher instruction-level parallelism, the constraint imposed by a single control flow must be relaxed. Control operations should execute in parallel just like data operations. We present a new software pipelining method called GPMB (Global Pipelining with Multiple Branches) which is based on architectures supporting multi-way branching and multiple control flows. Preliminary experimental results show that, for IF-less loops, GPMB performs as well as modulo scheduling, and for branch-intensive loops, GPMB performs much better than software pipelining assuming the constraint of one two-way branch per cycle.
international symposium on microarchitecture | 1995
Richard E. Hank; W.W. Hwu; Bantwal R. Rau
As the amount of instruction-level parallelism required to fully utilize VLIW and superscalar processors increases, compilers must perform increasingly more aggressive analysis, optimization, parallelization and scheduling on the input programs. Traditionally, compilers have been built assuming functions as the unit of compilation. In this framework, function boundaries tend to hide valuable optimization opportunities from the compiler. Function inlining may be applied to assemble strongly coupled functions into the same compilation unit at the cost of very large function bodies. This paper introduces a new technique, called region-based compilation, where the compiler is allowed to repartition the program into more desirable compilation units. Region-based compilation allows the compiler to control problem size while exposing inter-procedural optimization and code motion opportunities.
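One way to picture repartitioning is a profile-driven region former that grows a compilation unit from the hottest unassigned block along its most frequent successors, stopping at a size budget. This is a simplified illustration under assumed heuristics, not the paper's algorithm:

```python
# Toy region former (illustrative): grow each region from the most
# frequently executed unassigned block, following the hottest
# successor edges, bounded by max_size to control problem size.

def form_regions(blocks, freq, succs, max_size=3):
    """blocks: list of block ids; freq: id -> execution count;
    succs: id -> list of successor ids (hypothetical CFG)."""
    unassigned = set(blocks)
    regions = []
    while unassigned:
        seed = max(unassigned, key=lambda b: freq[b])
        region = [seed]
        unassigned.remove(seed)
        cur = seed
        while len(region) < max_size:
            nxt = [s for s in succs.get(cur, []) if s in unassigned]
            if not nxt:
                break
            cur = max(nxt, key=lambda b: freq[b])
            region.append(cur)
            unassigned.remove(cur)
        regions.append(region)
    return regions
```

The size bound is what lets the compiler keep superlinear analyses tractable, while the frequency-guided growth lets a region cross the function boundaries that inlining has dissolved.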
international symposium on computer architecture | 1993
Tokuzo Kiyohara; Scott A. Mahlke; William Y. Chen; Roger A. Bringmann; Richard E. Hank; Sadun Anik; Wen-mei W. Hwu
Code optimization and scheduling for superscalar and superpipelined processors often increase the register requirement of programs. For existing instruction sets with a small to moderate number of registers, this increased register requirement can be a factor that limits the effectiveness of the compiler. In this paper, we introduce a new architectural method for adding a set of extended registers into an architecture. Using a novel concept of connection, this method allows the data stored in the extended registers to be accessed by instructions that apparently reference core registers. Furthermore, we address the technical issues involved in applying the new method to an architecture: instruction set extension, procedure call convention, context switching considerations, upward compatibility, efficient implementation, compiler support, and performance. Experimental results based on a prototype compiler and execution driven simulation show that the proposed method can significantly improve the performance of superscalar processors with a small or moderate number of registers.
international symposium on microarchitecture | 1993
Roger A. Bringmann; Scott A. Mahlke; Richard E. Hank; John C. Gyllenhaal; Wen-mei W. Hwu
Compiler-controlled speculative execution has been shown to be effective in increasing the available instruction level parallelism (ILP) found in non-numeric programs. An important problem associated with compiler-controlled speculative execution is to accurately report and handle exceptions caused by speculatively executed instructions. Previous solutions to this problem incur either excessive hardware overhead or significant register pressure. The paper introduces a new architectural scheme referred to as write-back suppression. This scheme systematically suppresses register file updates for subsequent speculative instructions after an exception condition is detected for a speculatively executed instruction. The authors show that with a modest amount of hardware, write-back suppression supports accurate reporting and handling of exceptions for compiler-controlled speculative execution with minimal additional register pressure. Experiments based on a prototype compiler implementation and hardware simulation indicate that ensuring accurate handling of exceptions with write-back suppression incurs little run-time performance overhead.
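The suppression behavior described above can be modeled in a few lines. This is a toy sketch under assumed semantics (the recovery path and speculative/non-speculative distinction are simplified away), not the scheme's hardware design:

```python
# Toy model of write-back suppression: once a speculative
# instruction raises an exception condition, register file
# updates from subsequent speculative instructions are
# suppressed until recovery clears the state.

class RegFile:
    def __init__(self, n=8):
        self.r = [0] * n
        self.suppress = False

    def spec_write(self, idx, value, excepted=False):
        if self.suppress:
            return               # write-back suppressed
        if excepted:
            self.suppress = True  # first speculative exception seen
            return
        self.r[idx] = value
```

Because later speculative results never reach the register file, recovery can re-execute from the excepting instruction without the compiler having to keep extra registers live for repair code.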
international symposium on microarchitecture | 1992
William Y. Chen; Roger A. Bringmann; S.A. Mahlke; Richard E. Hank; J.E. Sicolo
Cache prefetching with the assistance of an optimizing compiler is an effective means of reducing the penalty of long memory access time beyond the primary cache. However, cache prefetching can cause cache pollution and its benefit can be unpredictable. A new architectural support for preloading, the preload buffer, is proposed in this paper. Unlike previously proposed methods of non-binding cache loads, the preload is a binding access to the memory system. The preload buffer is simple in design and predictable in performance. Simple interleaving permits accesses to the preload buffer to be free of bank conflicts. Trace-driven system simulation is used to show that the performance achieved with preloading hides memory latency better than either no prefetching or cache prefetching. In addition, both bus traffic rate and cache miss rate are reduced.
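The difference between a binding preload and an ordinary access can be illustrated with a toy latency model. The latencies and buffer interface here are invented for illustration; only the binding-access idea comes from the abstract:

```python
# Toy model of a binding preload: the preload reads the datum from
# memory ahead of use, so the later consume hits the buffer and
# avoids the full memory latency. A miss falls back to memory.

MEM_LATENCY = 20   # assumed cycles for an access beyond the cache
BUF_LATENCY = 1    # assumed cycles for a preload-buffer hit

class PreloadBuffer:
    def __init__(self):
        self.slots = {}

    def preload(self, memory, addr):
        # Binding access: the value is actually read now and held
        # in the buffer (unlike a non-binding prefetch hint).
        self.slots[addr] = memory[addr]

    def consume(self, memory, addr):
        # Returns (value, latency). A hit frees the slot; a miss
        # pays full memory latency.
        if addr in self.slots:
            return self.slots.pop(addr), BUF_LATENCY
        return memory[addr], MEM_LATENCY
```

Because the preload is binding, its result is deterministic once issued, which is what makes the buffer's performance predictable and keeps preloaded data out of the cache proper.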
International Journal of Parallel Programming | 1997
Richard E. Hank; Wen-mei W. Hwu; B. Ramakrishna Rau
The most important task of a compiler designed to exploit instruction-level parallelism (ILP) is instruction scheduling. If higher levels of ILP are to be achieved, the compiler must use, as the unit of scheduling, regions consisting of multiple basic blocks—preferably those that frequently execute consecutively, and which capture cycles in the program’s execution. Traditionally, compilers have been built using the function as the unit of compilation. In this framework, function boundaries often act as barriers to the formation of the most suitable scheduling regions. Function inlining may be used to circumvent this problem by assembling strongly coupled functions into the same compilation unit, but at the cost of very large function bodies. Consequently, global optimizations whose compile time and space requirements are superlinear in the size of the compilation unit, may be rendered prohibitively expensive. This paper introduces a new approach, called region-based compilation, wherein the compiler, after inlining, repartitions the program into more desirable compilation units, termed regions. Region-based compilation allows the compiler to control problem size and complexity while exposing inter-procedural scheduling, optimization and code motion opportunities.