Publications


Featured research published by Michael S. Schlansker.


Programming Language Design and Implementation | 1992

Register allocation for software pipelined loops

B. R. Rau; M. Lee; P. P. Tirumalai; Michael S. Schlansker

Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. This paper studies the task of register allocation for software pipelined loops, both with and without hardware features that are specifically aimed at supporting software pipelines. Register allocation for software pipelines presents certain novel problems leading to unconventional solutions, especially in the presence of hardware support. This paper formulates these novel problems and presents a number of alternative solution strategies. These alternatives are comprehensively tested against over one thousand loops to determine the best register allocation strategy, both with and without the hardware support for software pipelining.
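The core register-allocation tension the abstract describes can be illustrated with a small sketch (not from the paper; the lifetimes and II below are hypothetical): when a value's lifetime exceeds the initiation interval (II), instances from successive iterations are live simultaneously, so rotating-register hardware needs one register per overlapping instance, while a machine without it must unroll the kernel (modulo variable expansion) so no two live instances share a name.

```python
from math import ceil

def rotating_registers_needed(lifetimes, ii):
    """Registers per variable with rotating-register support: a value live
    for L cycles overlaps ceil(L / II) iteration instances at once."""
    return {v: ceil(l / ii) for v, l in lifetimes.items()}

def kernel_unroll_factor(lifetimes, ii):
    """Without rotating registers, modulo variable expansion unrolls the
    kernel enough that every live instance gets a distinct name."""
    return max(ceil(l / ii) for l in lifetimes.values())

# hypothetical loop: variable -> lifetime in cycles, with II = 2
lifetimes = {"a": 3, "b": 5, "c": 2}
print(rotating_registers_needed(lifetimes, ii=2))  # {'a': 2, 'b': 3, 'c': 1}
print(kernel_unroll_factor(lifetimes, ii=2))       # 3
```

This ignores exact start/end offsets within the schedule, which the paper's strategies must handle; it only captures why hardware support changes the allocation problem.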


Architectural Support for Programming Languages and Operating Systems | 1992

Sentinel scheduling for VLIW and superscalar processors

Scott A. Mahlke; William Y. Chen; Wen-mei W. Hwu; B. Ramakrishna Rau; Michael S. Schlansker

Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to accurately detect and report all program execution errors at the time of occurrence. In this paper, a set of architectural features and compile-time scheduling support referred to as sentinel scheduling is introduced. Sentinel scheduling provides an effective framework for compiler-controlled speculative execution that accurately detects and reports all exceptions. Sentinel scheduling also supports speculative execution of store instructions by providing a store buffer which allows probationary entries. Experimental results show that sentinel scheduling is highly effective for a wide range of VLIW and superscalar processors.
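The exception-deferral idea behind sentinel scheduling can be modeled in a few lines (a toy software model, not the paper's architecture; `spec_load` and `sentinel_check` are illustrative names): a speculative operation that faults writes a poison token instead of trapping, and the sentinel, the first non-speculative use, raises the exception only if that path actually executes.

```python
class Poison:
    """Deferred-exception token recording which speculative op faulted."""
    def __init__(self, source_pc):
        self.source_pc = source_pc

def spec_load(mem, addr, pc):
    """Speculative load: on a fault, return a poison token instead of trapping."""
    if addr not in mem:
        return Poison(pc)            # defer the exception
    return mem[addr]

def sentinel_check(value):
    """Sentinel (non-speculative use): surface any deferred exception here."""
    if isinstance(value, Poison):
        raise RuntimeError(f"deferred fault from speculative op at pc={value.source_pc}")
    return value

mem = {0x10: 42}
v = spec_load(mem, 0x10, pc=7)       # load hoisted above its guarding branch
print(sentinel_check(v))             # 42
bad = spec_load(mem, 0x99, pc=9)     # faulting speculative load: no trap yet
# sentinel_check(bad) raises only if the guarded path is actually taken
```

The point of the model is the division of labor: speculation never traps, and the sentinel preserves precise reporting on the non-speculative path.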


International Symposium on Microarchitecture | 1992

Code generation schema for modulo scheduled loops

B.R. Rau; Michael S. Schlansker; P.P. Tirumalai

Software pipelining is an important instruction scheduling technique for efficiently overlapping successive iterations of loops and executing them in parallel. Modulo scheduling is one approach for generating such schedules. This paper addresses an issue which has received little attention thus far, but which is non-trivial in its complexity: the task of generating correct, high-performance code once the modulo schedule has been generated, taking into account the nature of the loop and the register allocation strategy that will be used. This issue is studied both with and without hardware features that are specifically aimed at supporting modulo scheduling.
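The basic arithmetic of the code shape a modulo scheduler must emit can be sketched as follows (a simplified model assuming a counted loop without rotating registers or predication, so explicit prologue and epilogue stages are required; the function name and numbers are illustrative):

```python
from math import ceil

def modulo_schedule_shape(schedule_length, ii, trip_count):
    """Stage count and code shape for a modulo-scheduled counted loop:
    the schedule is cut into stages of II cycles; the prologue ramps up
    one stage per iteration and the epilogue drains symmetrically."""
    stages = ceil(schedule_length / ii)        # stage count (SC)
    prologue = stages - 1                      # ramp-up stages before the kernel
    kernel_iters = trip_count - (stages - 1)   # iterations run at full overlap
    epilogue = stages - 1                      # drain stages after the kernel
    return {"stages": stages, "prologue": prologue,
            "kernel_iterations": kernel_iters, "epilogue": epilogue}

# hypothetical schedule: 11-cycle schedule, II = 3, 100 iterations
print(modulo_schedule_shape(schedule_length=11, ii=3, trip_count=100))
```

Hardware support such as rotating registers and predicated stages lets the generated code collapse the prologue and epilogue into the kernel, which is one of the schema choices the paper examines.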


IEEE Computer | 2000

EPIC: Explicitly Parallel Instruction Computing

Michael S. Schlansker; B.R. Rau

Over the past two and a half decades, the computer industry has grown accustomed to the spectacular rate of increase in microprocessor performance. The industry accomplished this without fundamentally rewriting programs in parallel form, without changing algorithms or languages, and often without even recompiling programs. Instruction-level parallel processing achieves high performance without major changes to software. However, computers have thus far achieved this goal at the expense of tremendous hardware complexity, a complexity that has grown so large as to challenge the industry's ability to deliver ever-higher performance. The authors developed the Explicitly Parallel Instruction Computing (EPIC) style of architecture to enable higher levels of instruction-level parallelism without unacceptable hardware complexity. The article focuses on the broader concept of EPIC as embodied by the HPL-PD (formerly known as HPL PlayDoh) architecture, which encompasses a large space of possible EPIC ISAs (instruction set architectures); the authors chose HPL-PD because it represents the essence of the EPIC philosophy while avoiding the idiosyncrasies of a specific ISA.


ACM Transactions on Computer Systems | 1993

Sentinel scheduling: a model for compiler-controlled speculative execution

Scott A. Mahlke; William Y. Chen; Roger A. Bringmann; Richard E. Hank; Wen-mei W. Hwu; B. Ramakrishna Rau; Michael S. Schlansker

Speculative execution is an important source of parallelism for VLIW and superscalar processors. A serious challenge with compiler-controlled speculative execution is to efficiently handle exceptions for speculative instructions. In this article, a set of architectural features and compile-time scheduling support collectively referred to as sentinel scheduling is introduced. Sentinel scheduling provides an effective framework for both compiler-controlled speculative execution and exception handling. All program exceptions are accurately detected and reported in a timely manner with sentinel scheduling. Recovery from exceptions is also ensured with the model. Experimental results show the effectiveness of sentinel scheduling for exploiting instruction-level parallelism and overhead associated with exception handling.


Conference on High Performance Computing (Supercomputing) | 1990

Parallelization of loops with exits on pipelined architectures

Parthasarathy P. Tirumalai; Meng Lee; Michael S. Schlansker

Modulo scheduling theory can be applied successfully to overlap Fortran DO loops on pipelined computers issuing multiple operations per cycle, both with and without special loop architectural support. It is shown that a broader class of loops (repeat-until, while, and loops with more than one exit), where the trip count is not known beforehand, can also be overlapped efficiently on multiple-issue pipelined machines. Special features that are required in the architecture, as well as compiler representations for accelerating these loop constructs, are discussed. The approach uses hardware architectural support, program transformation techniques, performance bounds calculations, and scheduling heuristics. Performance results are presented for a few select examples. A prototype scheduler is currently under construction for the Cydra 5 directed dataflow computer.
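Why overlapping a while loop requires speculation can be shown with a toy model (not the paper's machinery; the function and parameters are hypothetical): several iterations are issued before the exit condition of the earliest one is known, and any iterations issued past the taken exit must be squashed, their side effects discarded.

```python
def overlapped_while(data, predicate, body, stages):
    """Toy model of an overlapped while loop: 'stages' iterations are in
    flight at once; iterations issued past the taken exit are squashed."""
    results = []
    i = 0
    while True:
        in_flight = list(range(i, i + stages))       # speculatively issued
        exits = [j for j in in_flight
                 if j >= len(data) or not predicate(data[j])]
        first_exit = min(exits) if exits else None
        keep = (in_flight if first_exit is None
                else [j for j in in_flight if j < first_exit])
        results.extend(body(data[j]) for j in keep)  # commit non-squashed work
        if first_exit is not None:
            return results
        i += stages

vals = [2, 4, 6, 5, 8, 10]
# process elements while they are even, with 3 iterations in flight at once
print(overlapped_while(vals, lambda x: x % 2 == 0, lambda x: x * 10, stages=3))
# [20, 40, 60]
```

The hardware features the paper discusses exist precisely to make this squash-after-exit behavior cheap for loads and other side-effecting operations.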


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2001

Bitwidth cognizant architecture synthesis of custom hardware accelerators

Scott A. Mahlke; Rajiv A. Ravindran; Michael S. Schlansker; Robert Schreiber; Timothy Sherwood

Program-in, chip-out (PICO) is a system for automatically synthesizing embedded hardware accelerators from loop nests specified in the C programming language. A key issue confronted when designing such accelerators is the optimization of hardware by exploiting information that is known about the varying number of bits required to represent and process operands. In this paper, we describe the handling and exploitation of integer bitwidth in PICO. A bitwidth analysis procedure is used to determine bitwidth requirements for all integer variables and operations in a C application. Given known bitwidths for all variables, complex problems arise when determining a program schedule that specifies on which function unit (FU) and at what time each operation executes. If operations are assigned to FUs with no knowledge of bitwidth, bitwidth-related cost benefit is lost when each unit is built to accommodate the widest operation assigned. By carefully placing operations of similar width on the same unit, hardware costs are decreased. This problem is addressed using a preliminary clustering of operations that is based jointly on width and implementation cost. These clusters are then honored during resource allocation and operation scheduling to create an efficient width-conscious design. Experimental results show that exploiting integer bitwidth substantially reduces the gate count of PICO-synthesized hardware accelerators across a range of applications.
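The two steps the abstract pairs, forward bitwidth analysis and width-based clustering, can be sketched together (a simplification, not PICO's actual algorithm; the width rules, `max_gap` threshold, and operand names are illustrative):

```python
def result_width(op, w1, w2):
    """Conservative forward bitwidth rules for a few integer ops."""
    if op == "add":
        return max(w1, w2) + 1      # a carry can widen the result by one bit
    if op == "mul":
        return w1 + w2              # product width is the sum of input widths
    if op == "and":
        return min(w1, w2)          # AND cannot exceed the narrower input
    raise ValueError(op)

def cluster_by_width(op_widths, max_gap=4):
    """Greedy clustering: ops within 'max_gap' bits of each other share a
    function unit, so no unit is built much wider than its ops require."""
    clusters = []
    for name, w in sorted(op_widths.items(), key=lambda kv: kv[1]):
        if clusters and w - clusters[-1]["width"] <= max_gap:
            clusters[-1]["ops"].append(name)
            clusters[-1]["width"] = w          # unit width = widest member
        else:
            clusters.append({"ops": [name], "width": w})
    return clusters

widths = {"t1": result_width("add", 8, 8),     # 9 bits
          "t2": result_width("and", 8, 4),     # 4 bits
          "t3": result_width("mul", 8, 8)}     # 16 bits
print(cluster_by_width(widths))
```

Assigning all three to one 16-bit unit would waste gates on `t1` and `t2`; the clustering keeps narrow and wide operations on separately sized units, which is the cost effect the paper quantifies.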


IEEE Computer | 1997

Challenges to combining general-purpose and multimedia processors

Thomas M. Conte; Pradeep Dubey; Matthew D. Jennings; Ruby B. Lee; Alex Peleg; Salliah Rathnam; Michael S. Schlansker; Peter Song; Andrew Wolfe



International Symposium on Microarchitecture | 2006

Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers

Jack Sampson; Ruben Gonzalez; Jean-Francois Collard; Norman P. Jouppi; Michael S. Schlansker; Brad Calder

We examine the ability of CMPs, due to their lower on-chip communication latencies, to exploit data parallelism at inner-loop granularities similar to those commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this approach. To further exploit the potential of CMPs for fine-grained data-parallel tasks, we present barrier filters, a mechanism for fast barrier synchronization on chip multiprocessors that enables vector computations to be efficiently distributed across the cores of a CMP. We ensure that all threads arriving at a barrier require an unavailable cache line to proceed, and, by placing additional hardware in the shared portions of the memory subsystem, we starve their requests until they have all arrived. Specifically, our approach uses invalidation requests to both make cache lines unavailable and identify when a thread has reached the barrier. We examine two types of barrier filters, one synchronizing through instruction cache lines and the other through data cache lines.
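The starvation protocol the abstract describes can be approximated in software (a behavioral model only; the real mechanism lives in the memory subsystem, and the class name here is invented): each arriving thread signals the filter, and its subsequent request is held back until the last thread has arrived, at which point all waiters are released at once.

```python
import threading

class BarrierFilter:
    """Software model of a barrier filter: arrival plays the role of the
    invalidation request, and the blocked wait models the starved refill
    that is only serviced once every thread has arrived."""
    def __init__(self, n_threads):
        self.n = n_threads
        self.arrived = 0
        self.generation = 0
        self.cv = threading.Condition()

    def barrier(self):
        with self.cv:
            gen = self.generation
            self.arrived += 1                  # "invalidation" marks arrival
            if self.arrived == self.n:
                self.arrived = 0
                self.generation += 1
                self.cv.notify_all()           # service all starved requests
            else:
                while gen == self.generation:  # starved until all arrive
                    self.cv.wait()

results = []
bf = BarrierFilter(4)

def worker(tid):
    bf.barrier()                               # no thread passes early
    results.append(tid)

threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```

The hardware version avoids this model's lock and condition-variable traffic entirely, which is what makes the frequent inner-loop barriers affordable.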


International Symposium on Microarchitecture | 1996

Analysis techniques for predicated code

Richard Johnson; Michael S. Schlansker

Predicated execution offers new approaches to exploiting instruction-level parallelism (ILP), but it also presents new challenges for compiler analysis and optimization. In predicated code, each operation is guarded by a boolean operand whose run-time value determines whether the operation is executed or nullified. While research has shown the utility of predication in enhancing ILP, there has been little discussion of the difficulties surrounding compiler support for predicated execution. Conventional program analysis tools (e.g. data flow analysis) assume that operations execute unconditionally within each basic block and thus make incorrect assumptions about the run-time behavior of predicated code. These tools can be modified to be correct without requiring predicate analysis, but this yields overly conservative results in crucial areas such as scheduling and register allocation. To generate high-quality code for machines offering predicated execution, a compiler must incorporate information about relations between predicates into its analysis. We present new techniques for analyzing predicated code. Operations which compute predicates are analyzed to determine relations between predicate values. These relations are captured in a graph-based data structure, which supports efficient manipulation of boolean expressions representing facts about predicated code. This approach forms the basis for predicate-sensitive data flow analysis. Conventional data flow algorithms can be systematically upgraded to be predicate sensitive by incorporating information about predicates. Predicate-sensitive data flow analysis yields significantly more accurate results than conventional data flow analysis when applied to predicated code.
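The kind of predicate relation such an analysis must answer can be illustrated with a brute-force sketch (not the paper's graph-based structure, which is far more efficient; the class and condition names are invented): represent each predicate as a truth-table bitmask over the branch conditions that define it, so disjointness and implication queries become bitwise tests.

```python
from itertools import product

class PredicateRelations:
    """Predicates as truth-table bitmasks over a fixed set of branch
    conditions; relation queries reduce to bitwise operations."""
    def __init__(self, conditions):
        self.conditions = list(conditions)
        self.rows = list(product([False, True], repeat=len(self.conditions)))

    def pred(self, fn):
        """Build a predicate mask from a function of the condition values."""
        mask = 0
        for i, row in enumerate(self.rows):
            env = dict(zip(self.conditions, row))
            if fn(env):
                mask |= 1 << i
        return mask

    def disjoint(self, p, q):
        return p & q == 0            # the two guards can never both be true

    def implies(self, p, q):
        return p & ~q == 0           # wherever p holds, q holds too

pr = PredicateRelations(["c1", "c2"])
p_true  = pr.pred(lambda e: e["c1"])              # operation guarded by c1
p_false = pr.pred(lambda e: not e["c1"])          # operation guarded by !c1
p_and   = pr.pred(lambda e: e["c1"] and e["c2"])

print(pr.disjoint(p_true, p_false))  # True: the two ops never both execute
print(pr.implies(p_and, p_true))     # True: c1 & c2 implies c1
```

Knowing that two guards are disjoint is exactly what lets a predicate-sensitive allocator assign their destinations to the same register, where a conventional analysis would conservatively see a conflict.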
