Stephen Hines | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Stephen Hines is active.

Explore More

Publication

Featured researches published by Stephen Hines.

international symposium on microarchitecture | 2007

Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache

Stephen Hines; David B. Whalley; Gary S. Tyson

Very small instruction caches have been shown to greatly reduce fetch energy. However, for many applications the use of a small filter cache can lead to an unacceptable increase in execution time. In this paper, we propose the tagless hit instruction cache (TH-IC), a technique for completely eliminating the performance penalty associated with filter caches, as well as a further reduction in energy consumption due to not having to access the tag array on cache hits. Using a few metadata bits per line, we are able to more efficiently track the cache contents and guarantee when hits will occur in our small TH-IC. When a hit is not guaranteed, we can instead fetch directly from the L1 instruction cache, eliminating any additional cycles due to a TH-IC miss. Experimental results show that the overall processor energy consumption can be significantly reduced due to the faster application running time and the elimination of tag comparisons for most of the accesses.

programming language design and implementation | 2004

Fast searches for effective optimization phase sequences

Prasad A. Kulkarni; Stephen Hines; Jason D. Hiser; David B. Whalley; Jack W. Davidson; Douglas L. Jones

It has long been known that a fixed ordering of optimization phases will not produce the best code for every application. One approach for addressing this phase ordering problem is to use an evolutionary algorithm to search for a specific sequence of phases for each module or function. While such searches have been shown to produce more efficient code, the approach can be extremely slow because the application is compiled and executed to evaluate each sequences effectiveness. Consequently, evolutionary or iterative compilation schemes have been promoted for compilation systems targeting embedded applications where longer compilation times may be tolerated in the final stage of development. In this paper we describe two complementary general approaches for achieving faster searches for effective optimization sequences when using a genetic algorithm. The first approach reduces the search time by avoiding unnecessary executions of the application when possible. Results indicate search time reductions of 65% on average, often reducing searches from hours to minutes. The second approach modifies the search so fewer generations are required to achieve the same results. Measurements show that the average number of required generations decreased by 68%. These improvements have the potential for making evolutionary compilation a viable choice for tuning embedded applications.

international symposium on computer architecture | 2005

Improving Program Efficiency by Packing Instructions into Registers

Stephen Hines; Joshua Green; Gary S. Tyson; David B. Whalley

New processors, both embedded and general purpose, often have conflicting design requirements involving space, power, and performance. Architectural features and compiler optimizations often target one or more design goals at the expense of the others. This paper presents a novel architectural and compiler approach to simultaneously reduce power requirements, decrease code size, and improve performance by integrating an instruction register file (IRF) into the architecture. Frequently occurring instructions are placed in the IRF. Multiple entries in the IRF can be referenced by a single packed instruction in ROM or LI instruction cache. Unlike conventional code compression, our approach allows the frequent instructions to be referenced in arbitrary combinations. The experimental results show significant improvements in space and power, as well as some improvement in execution time when using only 32 entries. These advantages make packing instructions into registers an effective approach for improving overall efficiency.

ACM Transactions on Architecture and Code Optimization | 2005

Fast and efficient searches for effective optimization-phase sequences

Prasad A. Kulkarni; Stephen Hines; David B. Whalley; Jason D. Hiser; Jack W. Davidson; Douglas L. Jones

It has long been known that a fixed ordering of optimization phases will not produce the best code for every application. One approach for addressing this phase-ordering problem is to use an evolutionary algorithm to search for a specific sequence of phases for each module or function. While such searches have been shown to produce more efficient code, the approach can be extremely slow because the application is compiled and possibly executed to evaluate each sequences effectiveness. Consequently, evolutionary or iterative compilation schemes have been promoted for compilation systems targeting embedded applications where meeting strict constraints on execution time, code size, and power consumption is paramount and longer compilation times may be tolerated in the final stage of development, when an application is compiled one last time and embedded in a product. Unfortunately, even for small embedded applications, the search process can take many hours or even days making the approach less attractive to developers. In this paper, we describe two complementary general approaches for achieving faster searches for effective optimization sequences when using a genetic algorithm. The first approach reduces the search time by avoiding unnecessary executions of the application when possible. Results indicate search time reductions of 62%, on average, often reducing searches from hours to minutes. The second approach modifies the search so fewer generations are required to achieve the same results. Measurements show this approach decreases the average number of required generations by 59%. These improvements have the potential for making evolutionary compilation a viable choice for tuning embedded applications.

ACM Transactions in Embedded Computing Systems | 2006

VISTA: VPO interactive system for tuning applications

Prasad A. Kulkarni; Wankang Zhao; Stephen Hines; David B. Whalley; Xin Yuan; Robert van Engelen; Kyle A. Gallivan; Jason D. Hiser; Jack W. Davidson; Baosheng Cai; Mark W. Bailey; Hwashin Moon; Kyunghwan Cho; Yunheung Paek

Software designers face many challenges when developing applications for embedded systems. One major challenge is meeting the conflicting constraints of speed, code size, and power consumption. Embedded application developers often resort to hand-coded assembly language to meet these constraints since traditional optimizing compiler technology is usually of little help in addressing this challenge. The results are software systems that are not portable, less robust, and more costly to develop and maintain. Another limitation is that compilers traditionally apply the optimizations to a program in a fixed order. However, it has long been known that a single ordering of optimization phases will not produce the best code for every application. In fact, the smallest unit of compilation in most compilers is typically a function and the programmer has no control over the code improvement process other than setting flags to enable or disable certain optimization phases. This paper describes a new code improvement paradigm implemented in a system called VISTA that can help achieve the cost/performance trade-offs that embedded applications demand. The VISTA system opens the code improvement process and gives the application programmer, when necessary, the ability to finely control it. VISTA also provides support for finding effective sequences of optimization phases. This support includes the ability to interactively get static and dynamic performance information, which can be used by the developer to steer the code improvement process. This performance information is also internally used by VISTA for automatically selecting the best optimization sequence from several attempted. One such feature is the use of a genetic algorithm to search for the most efficient sequence based on specified fitness criteria. We include a number of experimental results that evaluate the effectiveness of using a genetic algorithm in VISTA to find effective optimization phase sequences.

international symposium on microarchitecture | 2005

Reducing instruction fetch cost by packing instructions into register windows

Stephen Hines; Gary S. Tyson; David B. Whalley

Instruction packing is a combination compiler/architectural approach that allows for decreased code size, reduced power consumption and improved performance. The packing is obtained by placing frequently occurring instructions into an instruction register file (IRF). Multiple IRF entries can then be accessed using special packed instructions. Previous IRF efforts focused on using a single 32-entry register file for the duration of an application. This paper presents software and hardware extensions to the IRF supporting multiple instruction register windows to allow a greater number of relevant instructions to be available for packing in each function. Windows are shared among similar functions to reduce the overall costs involved in such an approach. The results indicate that significant improvements in instruction fetch cost can be obtained by using this simple architectural enhancement. We also show that using an IRF with a loop cache, which is also used to reduce energy consumption, results in much less energy consumption than using either feature in isolation.

compilers, architecture, and synthesis for embedded systems | 2006

Adapting compilation techniques to enhance the packing of instructions into registers

Stephen Hines; David B. Whalley; Gary S. Tyson

The architectural design of embedded systems is becoming increasingly idiosyncratic to meet varying constraints regarding energy consumption, code size, and execution time. Traditional compiler optimizations are often tuned for improving general architectural constraints, yet these heuristics may not be as beneficial to less conventional designs. Instruction packing is a recently developed compiler/architectural approach for reducing energy consumption, code size, and execution time by placing the frequently occurring instructions into an Instruction Register File (IRF). Multiple IRF instructions are made accessible via special packed instruction formats. This paper presents the design and analysis of a compilation framework and its associated optimizations for improving the efficiency of instruction packing. We show that several new heuristics can be developed for IRF promotion, instruction selection, register re-assignment and instruction scheduling, leading to significant reductions in energy consumption, code size, and/or execution time when compared to results using a standard optimizing compiler targeting the IRF.

Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture | 2010

Program differentiation

Daniel Chang; Stephen Hines; Paul West; Gary S. Tyson; David B. Whalley

Mobile electronics are undergoing a convergence of formerly multiple dedicated-application devices into a single programmable device -- the smart phone. The programmability of these devices increases their vulnerability to malicious attack. In this paper, we propose a new malware management system that seeks to use program differentiation to reduce the propagation of malware when a software vulnerability exists. By modifying aspects of the application control flow, we allow portions of an application executable to be permuted into unique versions for each distributed instance. Differentiation is achieved using hardware and systems software modifications that are amenable to and scalable in embedded systems. Our initial areas for modification include function call/return and system call semantics, as well as a hardware-supported Instruction Register File. Differentiation of executables hinders analysis for vulnerabilities as well as prevents vulnerability exploitation in a single distributed version from propagating to other instances of that application. Computational demands on any instance of the application are minimized, while the resources required to attack multiple systems grows with the number of systems attacked. By focusing on prevention of malware propagation in addition to traditional absolute defenses, we target the economics of malware in order to make attacks prohibitively expensive and infeasible.

languages, compilers, and tools for embedded systems | 2007

Addressing instruction fetch bottlenecks by using an instruction register file

Stephen Hines; Gary S. Tyson; David B. Whalley

The Instruction Register File (IRF) is an architectural extension for providing improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an applications instructions, resulting in decreased code size, reduced energy consumption and improved execution time primarily due to a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, thus providing a means for tolerating increased fetch latencies, like those experienced by encrypted ICs as well as the presence of low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores the use of increased fetch bandwidth provided by packed instructions. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also be used to supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency, in addition to providing additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique, operating as a buffer tolerating fetch bottlenecks, as well as providing additional fetch bandwidth for an aggressive pipeline backend.

languages, compilers, and tools for embedded systems | 2006

Reducing the cost of conditional transfers of control by using comparison specifications

William C. Kreahling; Stephen Hines; David B. Whalley; Gary S. Tyson

A significant portion of a programs execution cycles are typically dedicated to performing conditional transfers of control. Much of the research on reducing the costs of these operations has focused on the branch, while the comparison has been largely ignored. In this paper we investigate reducing the cost of comparisons in conditional transfers of control. We decouple the specification of the values to be compared from the actual comparison itself, which now occurs as part of the branch instruction. The specification of the register or immediate values involved in the comparison is accomplished via a new instruction called a comparison specification, which is loop invariant. Decoupling the specification of the comparison from the actual comparison performed before the branch reduces the number of instructions in the loop, which provides performance benefits not possible when using conventional comparison instructions. Results from applying this technique on the ARM processor show that both the number of instructions executed and execution cycles are reduced.

Explore More