Emily R. Blem
University of Wisconsin-Madison
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Emily R. Blem.
international symposium on computer architecture | 2011
Hadi Esmaeilzadeh; Emily R. Blem; Renée St. Amant; Karthikeyan Sankaralingam; Doug Burger
Since 2005, processor designers have increased core counts to exploit Moores Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
Eurasip Journal on Embedded Systems | 2006
Philip Garcia; Katherine Compton; Michael J. Schulte; Emily R. Blem; Wenyin Fu
Over the past few years, the realm of embedded systems has expanded to include a wide variety of products, ranging from digital cameras, to sensor networks, to medical imaging systems. Consequently, engineers strive to create ever smaller and faster products, many of which have stringent power requirements. Coupled with increasing pressure to decrease costs and time-to-market, the design constraints of embedded systems pose a serious challenge to embedded systems designers. Reconfigurable hardware can provide a flexible and efficient platform for satisfying the area, performance, cost, and power requirements of many embedded systems. This article presents an overview of reconfigurable computing in embedded systems, in terms of benefits it can provide, how it has already been used, design issues, and hurdles that have slowed its adoption.
international symposium on microarchitecture | 2012
Hadi Esmaeilzadeh; Emily R. Blem; R. St. Amant; Karthikeyan Sankaralingam; Doug Burger
A key question for the microprocessor research and design community is whether scaling multicores will provide the performance and value needed to scale down many more technology generations. To provide a quantitative answer to this question, a comprehensive study that projects the speedup potential of future multicores and examines the underutilization of integration capacity-dark silicon-is timely and crucial.
ACM Transactions on Computer Systems | 2012
Hadi Esmaeilzadeh; Emily R. Blem; Renée St. Amant; Karthikeyan Sankaralingam; Doug Burger
Since 2004, processor designers have increased core counts to exploit Moore’s Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads for the topologies we study, leaving a nearly 24-fold gap from a target of doubled performance per generation.
compilers, architecture, and synthesis for embedded systems | 2005
Suman Mamidi; Emily R. Blem; Michael J. Schulte; C. John Glossner; Daniel Iancu; Andrei Iancu; Mayan Moudgill; Sanjay Jinturkar
Software defined radios, which provide a programmable solution for implementing the physical layer processing of multiple communication standards, are widely recognized as one of the most important new technologies for wireless communication systems. Emerging communication standards, however, require tremendous processing capabilities to perform high-bandwidth physical-layer processing in real time. In this paper, we present instruction set extensions for several important communication algorithms including convolutional encoding, Viterbi decoding, turbo decoding, and Reed-Solomon encoding and decoding. The performance benefits of these extensions are evaluated using a supercomputer class vectorizing compiler and the Sandblaster low-power multithreaded processor for software defined radio. The proposed instruction set extensions provide significant performance improvements, while maintaining a high degree of programmability.
Microprocessors and Microsystems | 2009
Suman Mamidi; Emily R. Blem; Michael J. Schulte; John Glossner; Daniel Iancu; Andrei Iancu; Mayan Moudgill; Sanjay Jinturkar
Software defined radios provide programmable solutions for implementing the physical layer processing of multiple communication standards. Mobile devices implementing these standards require high-performance processors to perform high-bandwidth physical layer processing in real time. In this paper, we present instruction set extensions for several important communication algorithms including cyclic redundancy checking, convolutional encoding, Viterbi decoding, turbo decoding, and Reed-Solomon encoding and decoding. We also present hardware designs for implementing these extensions, along with estimates of their area, critical path delay, and power consumption. The performance benefits of these extensions are evaluated using a supercomputer-class vectorizing compiler and the Sandblaster low-power multithreaded processor for software defined radio. The proposed instruction set extensions provide significant performance improvements at relatively low cost, while maintaining a high degree of programmability.
ACM Transactions on Computer Systems | 2015
Emily R. Blem; Jaikrishnan Menon; Thiruvengadam Vijayaraghavan; Karthikeyan Sankaralingam
RISC versus CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: Growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Furthermore, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important again, and we seek to answer this question through a detailed measurement-based study on real hardware running real applications. We analyze measurements on seven platforms spanning three ISAs (MIPS, ARM, and x86) over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors’ performance and energy efficiency. We find that ARM, MIPS, and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.
IEEE Computer Architecture Letters | 2013
Emily R. Blem; Hadi Esmaeilzadeh; Renée St. Amant; Karthikeyan Sankaralingam; Doug Burger
This paper describes a first order multicore model to project a tighter upper bound on performance than previous Amdahls Law based approaches. The speedup over a known baseline is a function of the core performance, microarchitectural features, application parameters, chip organization, and multicore topology. The model is flexible enough to consider both CPU and GPU like organizations as well as modern topologies from symmetric to aggressive heterogeneous (asymmetric, dynamic, and fused) designs. This extended model incorporates first order effect exposing more bottlenecks than previous applications of Amdahls Law while remaining simple and flexible enough to be adapted for many applications.
Communications of The ACM | 2013
Hadi Esmaeilzadeh; Emily R. Blem; Renée St. Amant; Karthikeyan Sankaralingam; Doug Burger
high-performance computer architecture | 2013
Emily R. Blem; Jaikrishnan Menon; Karthikeyan Sankaralingam