Michal Karczmarek
Massachusetts Institute of Technology
Publication
Featured research published by Michal Karczmarek.
compiler construction | 2002
William Thies; Michal Karczmarek; Saman P. Amarasinghe
We characterize high-performance streaming applications as a new and distinct domain of programs that is becoming increasingly important. The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain. At the same time, the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analyses and optimizations. In this paper, we motivate, describe and justify the language features of StreamIt, which include: a structured model of streams, a messaging system for control, a re-initialization mechanism, and a natural textual syntax.
architectural support for programming languages and operating systems | 2002
Michael I. Gordon; William Thies; Michal Karczmarek; Jasper Lin; Ali S. Meli; Andrew A. Lamb; Chris L. Leger; Jeremy Wong; Henry Hoffmann; David Maze; Saman P. Amarasinghe
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures. In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element. We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.
international parallel and distributed processing symposium | 2004
William Thies; Michael I. Gordon; Michal Karczmarek; Jasper Lin; David Maze; Rodric M. Rabbah; Saman P. Amarasinghe
High-performance streaming applications are a new and distinct domain of programs that is increasingly important. The StreamIt language provides novel high-level representations to improve programmer productivity and program robustness within the streaming domain. At the same time, the StreamIt compiler aims to improve the performance of streaming applications via stream-specific analyses and optimizations. In this paper, we motivate, describe and justify the features of the StreamIt language, which include a structured model of streams, a messaging system for control, and a natural textual syntax.
languages, compilers, and tools for embedded systems | 2003
Michal Karczmarek; William Thies; Saman P. Amarasinghe
As embedded DSP applications become more complex, it is increasingly important to provide high-level stream abstractions that can be compiled without sacrificing efficiency. In this paper, we describe scheduler support for StreamIt, a high-level language for signal processing applications. A StreamIt program consists of a set of autonomous filters that communicate with each other via FIFO queues. As in Synchronous Dataflow (SDF), the input and output rates of each filter are known at compile time. However, unlike SDF, the stream graph is represented using hierarchical structures, each of which has a single input and a single output. We describe a scheduling algorithm that leverages the structure of StreamIt to provide a flexible tradeoff between code size and buffer size. The algorithm describes the execution of each hierarchical unit as a set of phases. A complete cycle through the phases represents a single steady-state execution. By varying the granularity of a phase, our algorithm provides a continuum between single appearance schedules and minimum latency schedules. We demonstrate that a minimal latency schedule is effective in decreasing buffer requirements for some applications, while the phased representation mitigates the associated increase in code size.
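The code-size versus buffer-size tradeoff described in this abstract can be illustrated with a minimal sketch. It is not the paper's phased scheduling algorithm: it assumes a flat linear pipeline (no hierarchy, no peeking), and the function names `steady_state_repetitions` and `peak_buffer_occupancy` and the example rates are hypothetical. It computes how many times each filter fires in one steady-state execution from the compile-time rates, then compares the peak FIFO occupancy of a single appearance schedule with that of a more finely interleaved one.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm requires Python 3.9+


def steady_state_repetitions(rates):
    """Smallest positive firing counts for a linear pipeline.

    rates is a list of (pop, push) pairs, one per filter (hypothetical
    representation). On every channel, the items produced per steady state
    must equal the items consumed:
    reps[i] * push[i] == reps[i + 1] * pop[i + 1].
    """
    ratios = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        ratios.append(ratios[-1] * push / pop)
    scale = lcm(*(r.denominator for r in ratios))
    return [int(r * scale) for r in ratios]


def peak_buffer_occupancy(rates, schedule):
    """Simulate a schedule (a list of filter indices) and return the peak
    number of items held in each FIFO channel."""
    buffers = [0] * (len(rates) - 1)
    peaks = [0] * (len(rates) - 1)
    for f in schedule:
        pop, push = rates[f]
        if f > 0:
            if buffers[f - 1] < pop:
                raise ValueError(f"filter {f} fired without enough input")
            buffers[f - 1] -= pop
        if f < len(rates) - 1:
            buffers[f] += push
            peaks[f] = max(peaks[f], buffers[f])
    return peaks


# Illustrative rates: a source pushing 2, a filter popping 3 and pushing 1,
# and a sink popping 2.
rates = [(0, 2), (3, 1), (2, 0)]
reps = steady_state_repetitions(rates)

# Single appearance: each filter appears once, fired reps[i] times in a row.
single_appearance = [i for i, n in enumerate(reps) for _ in range(n)]
# Finer interleaving: downstream filters fire as soon as input is available.
interleaved = [0, 0, 1, 0, 1, 2]

print(reps)
print(peak_buffer_occupancy(rates, single_appearance))
print(peak_buffer_occupancy(rates, interleaved))
```

In this example the single appearance schedule needs a larger first buffer than the interleaved schedule, while the interleaved schedule lists more individual firings, which is the kind of tradeoff the abstract refers to.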
international conference on computer aided design | 2008
Michal Karczmarek; Arvind
One solution to the timing closure problem is to perform infrequent operations in more than one cycle. Despite the simplicity of this solution, it is rarely adopted because it requires changes in RTL, which, in turn, exacerbate the verification problem. We offer a timing closure solution guaranteed to preserve functional correctness of designs expressed using atomic actions or rules. We exploit the fact that the semantics of atomic actions are untimed, that is, the time to execute an action is not specified. The current hardware synthesis technique from atomic actions assumes that each rule takes one clock cycle to complete its computation. Consequently, the rule with the longest combinational path determines the clock cycle of the entire design, often leading to needlessly slow circuits. We present a synthesis procedure for a system where the combinational circuits embodied in a rule can take multiple cycles without changing the semantics of the original design. We also present preliminary results based on an experimental compiler which uses the Bluespec (BSV) compiler front end and generates Verilog. The results show that the clock speed and the performance of circuits can be improved substantially by allowing slow paths to complete over multiple cycles. Our technique is orthogonal to solutions based on multiple clock domains.
ACM Sigarch Computer Architecture News | 2002
William Thies; Michal Karczmarek; Michael I. Gordon; David Maze; Jeremy Wong; Henry Hoffmann; Matthew Brown; Saman P. Amarasinghe
A common machine language is an essential abstraction that allows programmers to express an algorithm in a way that can be efficiently executed on a variety of architectures. The key properties of a common machine language (CML) are: 1) it abstracts away the idiosyncratic differences between one architecture and another so that a programmer doesn’t have to worry about them, and 2) it encapsulates the common properties of the architectures such that a compiler for any given target can still produce an efficient executable. For von-Neumann architectures, the canonical CML is C: instructions consist of basic arithmetic operations, executed sequentially, which operate on either local variables or values drawn from a global block of memory. C has been implemented efficiently on a wide range of architectures, and it saves the programmer from having to adapt to each kind of register layout, cache configuration, and instruction set. However, recent years have seen the emergence of a class of grid-based architectures [2, 3, 4] for which the von-Neumann model no longer holds, and for which C is no longer an adequate CML. The design of these processors is fundamentally different in that they are conscious of wire delays, instead of just arithmetic computations, as the barriers to performance. Accordingly, grid-based architectures support fine-grained, reconfigurable communication between replicated processing units. Rather than a single instruction stream with a monolithic memory, these machines contain multiple instruction streams with distributed memory banks. Though C can still be used to write efficient programs on these machines, doing so requires either architecture-specific directives or a very smart compiler that can extract the parallelism and communication from the C semantics. Both of these options render C obsolete as a CML, since it fails to hide the architectural details from the programmer and it imposes abstractions which are a mismatch for the domain.
To bridge this gap, we propose a new common machine language for grid-based processors: StreamIt. The StreamIt language makes explicit the large-scale parallelism and regular communication patterns that these architectures were designed to exploit. A program is represented not as a monolithic memory and instruction stream, but rather as a composition of autonomous filters, each of which contains its own memory and can only communicate with its immediate neighbors via high-bandwidth data channels. In addition, StreamIt provides a low-bandwidth messaging system that filters can use for non-local communication. We believe that StreamIt abstracts away the variations in grid-based processors while encapsulating their common properties, thereby enabling compilers to efficiently map a single source program to a variety of modern processors.
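The programming model described above, a composition of autonomous filters with private memory that exchange data only with their neighbors, can be sketched roughly as follows. This is an illustration in Python, not StreamIt syntax; the `Filter` class, the `run_pipeline` driver, and the two example filters are hypothetical, and the low-bandwidth messaging system is not modeled.

```python
from collections import deque


class Filter:
    """An autonomous unit with private state and fixed pop/push rates."""

    def __init__(self, pop, push, work):
        self.pop = pop      # items consumed per firing (must be > 0 here)
        self.push = push    # items produced per firing
        self.work = work    # function(state, items) -> list of outputs
        self.state = {}     # private memory, invisible to other filters

    def fire(self, inp, out):
        items = [inp.popleft() for _ in range(self.pop)]
        results = self.work(self.state, items)
        assert len(results) == self.push, "work must respect the push rate"
        out.extend(results)


def run_pipeline(filters, inputs):
    """Connect filters with FIFO channels and fire each one whenever its
    input channel holds enough items, until no filter can fire."""
    channels = [deque(inputs)] + [deque() for _ in filters]
    progress = True
    while progress:
        progress = False
        for i, f in enumerate(filters):
            if len(channels[i]) >= f.pop:
                f.fire(channels[i], channels[i + 1])
                progress = True
    return list(channels[-1])


def pair_sum(state, items):
    # Stateless: adds two adjacent items.
    return [items[0] + items[1]]


def running_total(state, items):
    # Stateful: keeps its accumulator in the filter's own memory.
    state["total"] = state.get("total", 0) + items[0]
    return [state["total"]]


pipeline = [Filter(2, 1, pair_sum), Filter(1, 1, running_total)]
result = run_pipeline(pipeline, [1, 2, 3, 4, 5, 6])
print(result)
```

Because each filter touches only its own state and its adjacent channels, a compiler is free to place filters on separate processing units, which is the property the abstract highlights.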
Archive | 2002
William Thies; Michal Karczmarek; Michael I. Gordon; David Maze; Jeremy Wong; Henry Hoffmann; Matthew Brown; Saman P. Amarasinghe
acm sigplan symposium on principles and practice of parallel programming | 2005
William Thies; Michal Karczmarek; Janis Sermulins; Rodric M. Rabbah; Saman P. Amarasinghe
Archive | 2009
Michal Karczmarek; Arvind Mithal; Muralidaran Vijayaraghavan
Archive | 2014
Wayne Burleson; Dataflow Graphs; Adnan Bouakaz; Thierry Gautier; Buffer Sizing; Joost P. H. M. Hausmans; Stefan J. Geuns; Maarten H. Wiggers; Marco J. G. Bekooij; Michal Karczmarek; Muralidaran Vijayaraghavan; Yu Bai; Klaus Schneider; Nikita Bhardwaj; Badarinath Katti; Tania Shazadi