Is this you? Create Your Porfile

Jason E. Miller

Massachusetts Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jason E. Miller is active.

Explore More

Publication

Featured researches published by Jason E. Miller.

international symposium on computer architecture | 2004

Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Michael Bedford Taylor; James Psota; Arvind Saraf; Nathan Shnidman; Volker Strumpen; Matthew I. Frank; Saman P. Amarasinghe; Anant Agarwal; Walter Lee; Jason E. Miller; David Wentzlaff; Ian Rudolf Bratt; Ben Greenwald; Henry Hoffmann; Paul Johnson; Jason Kim

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally-exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor in IBMs 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raws ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware. Compared to a 180nm Pentium-III, using commodity PC memory system components, Raw performs within a factor of 2/spl times/ for sequential applications with a very low degree of ILP, about 2/spl times/ to 9/spl times/ better for higher levels of ILP, and 10/spl times/-100/spl times/ better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.

high-performance computer architecture | 2010

Graphite: A distributed parallel simulator for multicores

Jason E. Miller; Harshad Kasture; George Kurian; Charles Gruenwald; Nathan Beckmann; Christopher Celio; Jonathan Eastep; Anant Agarwal

This paper introduces the Graphite open-source distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multi-core processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development. Several techniques are used to achieve this including: direct execution, seamless multicore and multi-machine distribution, and lax synchronization. Graphite is capable of accelerating simulations by distributing them across multiple commodity Linux machines. When using multiple machines, it provides the illusion of a single process with a single, shared address space, allowing it to run off-the-shelf pthread applications with no source code modification. Our results demonstrate that Graphite can simulate target architectures containing over 1000 cores on ten 8-core servers. Performance scales well as more machines are added with near linear speedup in many cases. Simulation slowdown is as low as 41× versus native execution.

networks on chips | 2012

DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling

Chen Sun; Chia-Hsin Owen Chen; George Kurian; Lan Wei; Jason E. Miller; Anant Agarwal; Li-Shiuan Peh; Vladimir Stojanovic

With the rise of many-core chips that require substantial bandwidth from the network on chip (NoC), integrated photonic links have been investigated as a promising alternative to traditional electrical interconnects. While numerous opto-electronic NoCs have been proposed, evaluations of photonic architectures have thus-far had to use a number of simplifications, reflecting the need for a modeling tool that accurately captures the tradeoffs for the emerging technology and its impacts on the overall network. In this paper, we present DSENT, a NoC modeling tool for rapid design space exploration of electrical and opto-electrical networks. We explain our modeling framework and perform an energy-driven case study, focusing on electrical technology scaling, photonic parameters, and thermal tuning. Our results show the implications of different technology scenarios and, in particular, the need to reduce laser and thermal tuning power in a photonic network due to their non-data-dependent nature.

international conference on parallel architectures and compilation techniques | 2010

ATAC: a 1000-core cache-coherent processor with on-chip optical network

George Kurian; Jason E. Miller; James Psota; Jonathan Eastep; Jifeng Liu; Lionel C. Kimerling; Anant Agarwal

Based on current trends, multicore processors will have 1000 cores or more within the next decade. However, their promise of increased performance will only be realized if their inherent scaling and programming challenges are overcome. Fortunately, recent advances in nanophotonic device manufacturing are making CMOS-integrated optics a reality—interconnect technology which can provide significantly more bandwidth at lower power than conventional electrical signaling. Optical interconnect has the potential to enable massive scaling and preserve familiar programming models in future multicore chips. This paper presents ATAC, a new multicore architecture with integrated optics, and ACKwise, a novel cache coherence protocol designed to leverage ATACs strengths. ATAC uses nanophotonic technology to implement a fast, efficient global broadcast network which helps address a number of the challenges that future multicores will face. ACKwise is a new directory-based cache coherence protocol that uses this broadcast mechanism to provide high performance and scalability. Based on 64-core and 1024-core simulations with Splash2, Parsec, and synthetic benchmarks, we show that ATAC with ACKwise out-performs a chip with conventional interconnect and cache coherence protocols. On 1024-core evaluations, ACKwise protocol on ATAC outperforms the best conventional cache coherence protocol on an electrical mesh network by 2.5x with Splash2 benchmarks and by 61% with synthetic benchmarks.

international conference on autonomic computing | 2010

Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments

Henry Hoffmann; Jonathan Eastep; Marco D. Santambrogio; Jason E. Miller; Anant Agarwal

The rise of multicore computing has greatly increased system complexity and created an additional burden for software developers. This burden is especially troublesome when it comes to optimizing software on modern computing systems. Autonomic or adaptive computing has been proposed as one method to help application programmers handle this complexity. In an autonomic computing environment, system services monitor applications and automatically adapt their behavior to increase the performance of the applications they support. Unfortunately, applications often run as performance black-boxes and adaptive services must infer application performance from low-level information or rely on system-specific ad hoc methods. This paper proposes a standard framework, Application Heartbeats, which applications can use to communicate both their current and target performance and which autonomic services can use to query these values. The Application Heartbeats framework is designed around the well-known idea of a heartbeat. At important points in the program, the application registers a heartbeat. In addition, the interface allows applications to express their performance in terms of a desired heart rate and/or a desired latency between specially tagged heartbeats. Thus, the interface provides a standard method for an application to directly communicate its performance and goals while allowing autonomic services access to this information. Thus, Heartbeat-enabled applications are no longer performance black-boxes. This paper presents the Applications Heartbeats interface, characterizes two reference implementations (one suitable for clusters and one for multicore), and illustrates the use of Heartbeats with several examples of systems adapting behavior based on feedback from heartbeats.

symposium on cloud computing | 2010

An operating system for multicore and clouds: mechanisms and implementation

David Wentzlaff; Charles Gruenwald; Nathan Beckmann; Kevin Modzelewski; Adam Belay; Lamia Youseff; Jason E. Miller; Anant Agarwal

Cloud computers and multicore processors are two emerging classes of computational hardware that have the potential to provide unprecedented compute capacity to the average user. In order for the user to effectively harness all of this computational power, operating systems (OSes) for these new hardware platforms are needed. Existing multicore operating systems do not scale to large numbers of cores, and do not support clouds. Consequently, current day cloud systems push much complexity onto the user, requiring the user to manage individual Virtual Machines (VMs) and deal with many system-level concerns. In this work we describe the mechanisms and implementation of a factored operating system named fos. fos is a single system image operating system across both multicore and Infrastructure as a Service (IaaS) cloud systems. fos tackles OS scalability challenges by factoring the OS into its component system services. Each system service is further factored into a collection of Internet-inspired servers which communicate via messaging. Although designed in a manner similar to distributed Internet services, OS services instead provide traditional kernel services such as file systems, scheduling, memory management, and access to hardware. fos also implements new classes of OS services like fault tolerance and demand elasticity. In this work, we describe our working fos implementation, and provide early performance measurements of fos for both intra-machine and inter-machine operations.

acm sigplan symposium on principles and practice of parallel programming | 2010

Application heartbeats for software performance and health

Henry Hoffmann; Jonathan Eastep; Marco D. Santambrogio; Jason E. Miller; Anant Agarwal

Adaptive, or self-aware, computing has been proposed to help application programmers confront the growing complexity of multicore software development. However, existing approaches to adaptive systems are largely ad hoc and often do not manage to incorporate the true performance goals of the applications they are designed to support. This paper presents an enabling technology for adaptive computing systems: Application Heartbeats. The Application Heartbeats framework provides a simple, standard programming interface that applications can use to indicate their performance and system software (and hardware) can use to query an applications performance. The PARSEC benchmark suite is instrumented with Application Heartbeats to show the broad applicability of the interface and an external resource scheduler demonstrates the use of the interface by assigning cores to an application to maintain a designated performance goal.

design automation conference | 2012

Self-aware computing in the Angstrom processor

Henry Hoffmann; Jim Holt; George Kurian; Eric Lau; Martina Maggio; Jason E. Miller; Sabrina M. Neuman; Mahmut E. Sinangil; Yildiz Sinangil; Anant Agarwal; Anantha P. Chandrakasan; Srinivas Devadas

Addressing the challenges of extreme scale computing requires holistic design of new programming models and systems that support those models. This paper discusses the Angstrom processor, which is designed to support a new Self-aware Computing (SEEC) model. In SEEC, applications explicitly state goals, while other systems components provide actions that the SEEC runtime system can use to meet those goals. Angstrom supports this model by exposing sensors and adaptations that traditionally would be managed independently by hardware. This exposure allows SEEC to coordinate hardware actions with actions specified by other parts of the system, and allows the SEEC runtime system to meet application goals while reducing costs (e.g., power consumption).

international parallel and distributed processing symposium | 2012

Cross-layer Energy and Performance Evaluation of a Nanophotonic Manycore Processor System Using Real Application Workloads

George Kurian; Chen Sun; Chia-Hsin Owen Chen; Jason E. Miller; Lan Wei; Dimitri A. Antoniadis; Li-Shiuan Peh; Lionel C. Kimerling; Vladimir Stojanovic; Anant Agarwal

Recent advances in nanophotonic device research have led to a proliferation of proposals for new architectures that employ optics for on-chip communication. However, since standard simulation tools have not yet caught up with these advances, the quality and thoroughness of the evaluations of these architectures have varied widely. This paper provides the first complete end-to-end analysis of an architecture using on-chip optical interconnect. This analysis incorporates realistic performance and energy models for both electrical and optical devices and circuits into a full-fledged functional simulator, thus enabling detailed analyses when running actual applications. Since on-chip optics is not yet mature and unlikely to see widespread use for several more years, we perform our analysis on a future 1000-core processor implemented in an 11nm technology node. We find that the proposed optical interconnect can provide between 1.8x and 4.8x better energy-delay product than conventional electrical-only interconnects. In addition, based on a detailed energy breakdown of all processor components, we conclude that a thermal ring resonators and on-chip lasers that allow rapid power gating are key areas worthy of additional nanophotonic research. This will help guide future optical device research to the areas likely to provide the best payoff.

symposium on vlsi circuits | 2014

A Self-Aware Processor SoC using Energy Monitors Integrated into Power Converters for Self-Adaptation

Yildiz Sinangil; Sabrina M. Neuman; Mahmut E. Sinangil; Nathan Ickes; George Bezerra; Eric Lau; Jason E. Miller; Henry Hoffmann; Srinivas Devadas; Anantha P. Chandraksan

This paper presents a self-aware processor with energy monitoring circuits that can measure actual energy consumption of the key blocks. The monitors are embedded into on-chip DC/DC converters and generate results within 10% of accuracy with minimal power (<;0.1%) and area (<;1%) overhead. Our system, which is implemented in 0.18μm technology, is designed to be voltage scalable from 1.8V down to 0.6V. Low-voltage SRAM operation is made possible through the use of 8T bit-cells and write-assists. The d-caches are designed to be re-configurable in associativity and size to adapt to compute- versus cache-bound phases of applications. Cache configuration is performed in <; 3 clock cycles including tag invalidation. These hardware features enable a software self-aware computation engine (SEEC) to dynamically adapt the processor to meet performance and energy goals. Measurement results show that up to 8.4× energy savings can be achieved with DVFS and self-adaptation.

Explore More