Publication


Featured research published by Eric S. Chung.


field programmable gate arrays | 2011

CoRAM: an in-fabric memory architecture for FPGA-based computing

Eric S. Chung; James C. Hoe; Ken Mai

FPGAs have been used in many applications to achieve orders-of-magnitude improvement in absolute performance and energy efficiency relative to conventional microprocessors. Despite their promise in both processing performance and efficiency, FPGAs have not yet gained widespread acceptance as mainstream computing devices. A fundamental obstacle to FPGA-based computing today is the FPGA's lack of a common, scalable memory architecture. When developing applications for FPGAs, designers are often directly responsible for crafting the application-specific infrastructure logic that manages and transports data to and from the processing kernels. This infrastructure not only increases design time and effort but will frequently lock a design to a particular FPGA product line, hindering scalability and portability. We propose a new FPGA memory architecture called Connected RAM (CoRAM) to serve as a portable bridge between the distributed computation kernels and the external memory interfaces. In addition to improving performance and efficiency, the CoRAM architecture provides a virtualized memory environment as seen by the hardware kernels to simplify development and to improve an application's portability and scalability.


ACM Transactions on Reconfigurable Technology and Systems | 2009

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

Eric S. Chung; Michael K. Papamichael; Eriko Nurvitadhi; James C. Hoe; Ken Mai; Babak Falsafi

Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating large multiprocessor systems with hundreds or thousands of processors or when instrumentation is introduced. We propose the ProtoFlex simulation architecture, which uses FPGAs to accelerate full-system multiprocessor simulation and to facilitate high-performance instrumentation. Prior FPGA approaches that prototype a complete system in hardware are either too complex when scaling to large-scale configurations or require significant effort to provide full-system support. In contrast, ProtoFlex virtualizes the execution of many logical processors onto a consolidated number of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be judiciously scaled, as needed, to deliver on necessary simulation performance at a large savings in complexity. Further, to achieve low-complexity full-system support, a hybrid simulation technique called transplanting allows implementing in the FPGA only the frequently encountered behaviors, while a software simulator preserves the abstraction of a complete system. We have created a first instance of the ProtoFlex simulation architecture, which is an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server, hosted on a single Xilinx Virtex-II XCV2P70 FPGA. On average, the simulator achieves a 38x speedup (and as high as 49x) over comparable software simulation across a suite of applications, including OLTP on a commercial database server. We also demonstrate the advantages of minimal-overhead FPGA-accelerated instrumentation through a CMP cache simulation technique that runs orders-of-magnitude faster than software.


field programmable gate arrays | 2008

A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs

Eric S. Chung; Eriko Nurvitadhi; James C. Hoe; Babak Falsafi; Ken Mai

Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating systems with hundreds of processors or more. To overcome this bottleneck, we propose the PROTOFLEX simulation architecture, which uses FPGAs to accelerate simulation. Prior FPGA approaches that prototype a complete system in hardware are either too complex when scaling to large-scale configurations or require significant effort to provide full-system support. In contrast, PROTOFLEX reduces complexity by virtualizing the execution of many logical processors onto a consolidated set of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be judiciously scaled, as needed, to deliver on necessary simulation performance. To achieve low-complexity full-system support, a hybrid simulation technique called transplanting allows implementing in the FPGA only the frequently encountered behaviors, while a software simulator preserves the abstraction of a complete system. We have created a first instance of the PROTOFLEX simulation architecture, which is an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server hosted on a single Xilinx Virtex-II XCV2P70 FPGA. On average, the simulator achieves a 39x speedup (and as high as 49x) over comparable software simulation across a suite of applications, including OLTP on a commercial database server.


IEEE Micro | 2005

TRUSS: a reliable, scalable server architecture

Brian T. Gold; Jangwoo Kim; Jared C. Smolens; Eric S. Chung; Vasileios Liaskovitis; Eriko Nurvitadhi; Babak Falsafi; James C. Hoe; Andreas G. Nowatzyk

Traditional techniques that mainframes use to increase reliability (special hardware or custom software) are incompatible with commodity server requirements. The Total Reliability Using Scalable Servers (TRUSS) architecture, developed at Carnegie Mellon, aims to bring reliability to commodity servers. TRUSS features a distributed shared-memory (DSM) multiprocessor that incorporates computation and memory storage redundancy to detect and recover from any single point of transient or permanent failure. Because its underlying DSM architecture presents the familiar shared-memory programming model, TRUSS requires no changes to existing applications and only minor modifications to the operating system to support error recovery.


international parallel and distributed processing symposium | 2007

PROTOFLEX: FPGA-accelerated Hybrid Functional Simulator

Eric S. Chung; Eriko Nurvitadhi; James C. Hoe; Babak Falsafi; Ken Mai

PROTOFLEX is an FPGA-accelerated hybrid simulation/emulation platform designed to support large-scale multiprocessor hardware and software research. Unlike prior attempts at FPGA multiprocessor system emulators, PROTOFLEX emulates full-system fidelity, i.e., it runs stock commercial operating systems with I/O support. This is accomplished without undue effort by leveraging a hybrid emulation technique called transplanting. Our transplant technology uses FPGAs to accelerate only common-case behaviors while relegating infrequent, complex behaviors (e.g., I/O devices) to software simulation. By working in concert with existing full-system simulators, transplanting avoids the costly and unnecessary construction of the entire target system in FPGA. We report preliminary findings from a working hybrid PROTOFLEX emulator of an UltraSPARC workstation running Solaris 8. We have also started developing a novel multiprocessor emulation approach that interleaves the execution of many (10s to 100s) processor contexts onto a shared emulation engine. This approach decouples the scale and complexity of the FPGA host from the simulated system size but nevertheless enables us to scale the desired emulation performance by the number of emulation engines used. Together, the transplant and interleaving techniques enable us to develop full-system FPGA emulators of up to thousands of processors without an overwhelming development effort.
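The two ideas in this abstract, transplanting and context interleaving, can be illustrated with a small software-only sketch. This is not the PROTOFLEX implementation: the operation names, the `COMMON_OPS` set, and the functions `fast_engine_step`, `software_step`, and `run` are all hypothetical stand-ins. The "fast engine" plays the role of the FPGA and handles only common-case operations; anything else is handed to a "software simulator" fallback, and many contexts share the one engine in round-robin order.

```python
from collections import deque

# Hypothetical set of common-case operations the fast engine implements.
COMMON_OPS = {"add", "load", "store"}


def fast_engine_step(ctx, op):
    """Stand-in for the FPGA engine: handles only common-case operations."""
    if op not in COMMON_OPS:
        return False  # not implemented on the fast path
    ctx["fast_count"] += 1
    return True


def software_step(ctx, op):
    """Stand-in for the full-system software simulator: accepts any operation."""
    ctx["transplants"] += 1


def run(programs):
    """Interleave many processor contexts round-robin on one shared engine."""
    contexts = deque(
        {"id": i, "ops": deque(ops), "fast_count": 0, "transplants": 0}
        for i, ops in enumerate(programs)
    )
    finished = []
    while contexts:
        ctx = contexts.popleft()
        if not ctx["ops"]:
            finished.append(ctx)  # this context has no work left
            continue
        op = ctx["ops"].popleft()
        if not fast_engine_step(ctx, op):
            software_step(ctx, op)  # transplant the rare case to software
        contexts.append(ctx)  # requeue so the next context gets the engine
    return sorted(finished, key=lambda c: c["id"])


# Two toy contexts; "io" is the rare operation that the fast engine lacks.
stats = run([["add", "load", "io"], ["store", "add"]])
```

In this toy model the number of simulated contexts is independent of the number of engines, which is the decoupling the abstract describes; a real system would add more engines to raise throughput.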


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2010

High-Level Design and Validation of the BlueSPARC Multithreaded Processor

Eric S. Chung; James C. Hoe

This paper presents our experiences in using high-level methods to design and validate a 16-way multithreaded microprocessor called BlueSPARC. BlueSPARC is an in-order, high-throughput processor supporting complex features such as privileged-mode operations, memory management, and a nonblocking cache subsystem. Using a high-level design language called Bluespec SystemVerilog (BSV), our final implementation achieves comparable synthesis quality to a similar commercial microprocessor developed using conventional register transfer level flows, and is capable of running unmodified commercial applications while hosted on a Xilinx XCV2P70 field-programmable gate array (FPGA) at 90 MHz. To validate our implementation, an FPGA-accelerated approach was developed to efficiently check the correct execution of real, nondeterministic multithreaded programs running on the BlueSPARC processor. Together, the high-level language features of BSV along with our validation approach enabled us to achieve a working FPGA-based implementation in less than one man-year.


field programmable gate arrays | 2012

Prototype and evaluation of the CoRAM memory architecture for FPGA-based computing

Eric S. Chung; Michael K. Papamichael; Gabriel Weisz; James C. Hoe; Ken Mai

The CoRAM memory architecture for FPGA-based computing augments traditional reconfigurable fabric with a natural and effective way for applications to interact with off-chip memory and I/O. The two central tenets of the CoRAM memory architecture are (1) the deliberate separation of concerns between computation versus data marshalling and (2) the use of a multithreaded software abstraction to replace FSM-based memory control logic. To evaluate the viability of the CoRAM memory architecture, we developed a full RTL implementation of a CoRAM microarchitecture instance that can be synthesized for standard cells or emulated on FPGAs. The results of our evaluation show that a soft emulation of the CoRAM memory architecture on current FPGAs can be impractical for memory-intensive, large-scale applications due to the high performance and area penalties incurred by the soft mechanisms. The results further show that in an envisioned FPGA built with CoRAM in mind, the introduction of hard macro blocks for data distribution can mitigate these inefficiencies, allowing applications to take advantage of the CoRAM memory architecture for ease of programmability and portability while still enjoying performance and efficiency comparable to RTL-level application development on conventional FPGAs.


formal methods | 2009

Implementing a high-performance multithreaded microprocessor: a case study in high-level design and validation

Eric S. Chung; James C. Hoe

We have developed a 16-way multithreaded microprocessor called BlueSPARC. This in-order, high-throughput processor incorporates complex features such as privileged operations, memory management, and a non-blocking cache subsystem. When supported by a hybrid simulation technique that handles rare, unimplemented behaviors in a software host, the BlueSPARC microprocessor runs unmodified UltraSPARC III-based commercial applications on Solaris 8 while hosted on a single Xilinx XCV2P70 FPGA clocked at 90 MHz. This significant effort was achieved in under one man-year using a high-level language and a high-level validation approach. In the first part of the paper, we describe our experience in applying the Bluespec SystemVerilog (BSV) language to develop a large hardware design that must meet specific area and performance requirements. In the second part of the paper, we present the FPGA-accelerated validation approach we employed to check the correct execution of real multithreaded programs running on the BlueSPARC processor. We discuss the challenges and our solutions to validation in the presence of full-system interactions and microarchitectural nondeterminism.


Proceedings of the Workshop on Architecture Research Using FPGA Platforms | 2006

ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development

Eric S. Chung; James C. Hoe; Babak Falsafi


Synthesis Lectures on Computer Architecture | 2014

FPGA-Accelerated Simulation of Computer Systems

Hari Angepat; Derek Chiou; Eric S. Chung; James C. Hoe

Collaboration


Eric S. Chung's collaborators.

Top Co-Authors

James C. Hoe

Carnegie Mellon University

Ken Mai

Carnegie Mellon University

Babak Falsafi

École Polytechnique Fédérale de Lausanne

Eriko Nurvitadhi

Carnegie Mellon University

Brian T. Gold

Carnegie Mellon University

Derek Chiou

University of Texas at Austin

Gabriel Weisz

Carnegie Mellon University

Hari Angepat

University of Texas at Austin
