Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Eriko Nurvitadhi is active.

Publication


Featured research published by Eriko Nurvitadhi.


ACM Transactions on Reconfigurable Technology and Systems | 2009

ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

Eric S. Chung; Michael K. Papamichael; Eriko Nurvitadhi; James C. Hoe; Ken Mai; Babak Falsafi

Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating large multiprocessor systems with hundreds or thousands of processors or when instrumentation is introduced. We propose the ProtoFlex simulation architecture, which uses FPGAs to accelerate full-system multiprocessor simulation and to facilitate high-performance instrumentation. Prior FPGA approaches that prototype a complete system in hardware are either too complex when scaling to large-scale configurations or require significant effort to provide full-system support. In contrast, ProtoFlex virtualizes the execution of many logical processors onto a consolidated number of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be judiciously scaled, as needed, to deliver on necessary simulation performance at a large savings in complexity. Further, to achieve low-complexity full-system support, a hybrid simulation technique called transplanting allows implementing in the FPGA only the frequently encountered behaviors, while a software simulator preserves the abstraction of a complete system.

We have created a first instance of the ProtoFlex simulation architecture, which is an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server, hosted on a single Xilinx Virtex-II XC2VP70 FPGA. On average, the simulator achieves a 38× speedup (and as high as 49×) over comparable software simulation across a suite of applications, including OLTP on a commercial database server. We also demonstrate the advantages of minimal-overhead FPGA-accelerated instrumentation through a CMP cache simulation technique that runs orders-of-magnitude faster than software.
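The two key ProtoFlex ideas, interleaving many logical processor contexts onto one shared engine and "transplanting" rare behaviors to a software simulator, can be illustrated with a small sketch. This is not the authors' code; the operation names and data layout are hypothetical.

```python
# Illustrative sketch (hypothetical names, not the ProtoFlex implementation):
# one shared multiple-context execution engine round-robins many logical
# processor contexts; common behaviors run on the fast path, while rare,
# complex behaviors are "transplanted" to a software full-system simulator.

COMMON_OPS = {"add", "load", "store"}   # behaviors implemented in the FPGA engine

def software_simulate(ctx, op):
    # Stand-in for the software simulator handling an infrequent behavior.
    ctx["transplants"] += 1

def run_engine(contexts, trace):
    """Interleave the logical contexts over one shared execution engine."""
    for i, op in enumerate(trace):
        ctx = contexts[i % len(contexts)]   # round-robin context selection
        if op in COMMON_OPS:
            ctx["ops"] += 1                 # fast path: executed on the engine
        else:
            software_simulate(ctx, op)      # slow path: transplant to software

contexts = [{"ops": 0, "transplants": 0} for _ in range(4)]
run_engine(contexts, ["add", "load", "io_read", "store", "add", "mmu_walk"])
```

Because the number of engines is decoupled from the number of simulated processors, scaling performance means adding engines, not replicating the whole target system.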


Field-Programmable Gate Arrays | 2008

A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs

Eric S. Chung; Eriko Nurvitadhi; James C. Hoe; Babak Falsafi; Ken Mai

Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating systems with hundreds of processors or more. To overcome this bottleneck, we propose the PROTOFLEX simulation architecture, which uses FPGAs to accelerate simulation. Prior FPGA approaches that prototype a complete system in hardware are either too complex when scaling to large-scale configurations or require significant effort to provide full-system support. In contrast, PROTOFLEX reduces complexity by virtualizing the execution of many logical processors onto a consolidated set of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be judiciously scaled, as needed, to deliver on necessary simulation performance. To achieve low-complexity full-system support, a hybrid simulation technique called transplanting allows implementing in the FPGA only the frequently encountered behaviors, while a software simulator preserves the abstraction of a complete system.

We have created a first instance of the PROTOFLEX simulation architecture, which is an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server hosted on a single Xilinx Virtex-II XC2VP70 FPGA. On average, the simulator achieves a 39× speedup (and as high as 49×) over comparable software simulation across a suite of applications, including OLTP on a commercial database server.


IEEE Micro | 2005

TRUSS: a reliable, scalable server architecture

Brian T. Gold; Jangwoo Kim; Jared C. Smolens; Eric S. Chung; Vasileios Liaskovitis; Eriko Nurvitadhi; Babak Falsafi; James C. Hoe; Andreas G. Nowatzyk

Traditional techniques that mainframes use to increase reliability, such as special hardware or custom software, are incompatible with commodity server requirements. The Total Reliability Using Scalable Servers (TRUSS) architecture, developed at Carnegie Mellon, aims to bring reliability to commodity servers. TRUSS features a distributed shared-memory (DSM) multiprocessor that incorporates computation and memory storage redundancy to detect and recover from any single point of transient or permanent failure. Because its underlying DSM architecture presents the familiar shared-memory programming model, TRUSS requires no changes to existing applications and only minor modifications to the operating system to support error recovery.


Journal of Systems Architecture | 2005

Dynamic voltage scaling techniques for power efficient video decoding

Ben Lee; Eriko Nurvitadhi; Reshma Dixit; Chansu Yu; Myungchul Kim

This paper presents a comparison of power-aware video decoding techniques that utilize dynamic voltage scaling (DVS). These techniques reduce the power consumption of a processor by exploiting high frame variability within a video stream. This is done through scaling of the voltage and frequency of the processor during the video decoding process. However, DVS causes frame deadline misses due to inaccuracies in decoding time predictions and the granularity of the processor settings used. Four techniques were simulated and compared in terms of power consumption, accuracy, and deadline misses. In addition, this paper proposes the frame-data computation aware (FDCA) technique, which is a useful power-saving technique not only for stored video but also for real-time video applications. The FDCA method is compared with the GOP, Direct, and Dynamic methods, which tend to be more suited for stored video applications. The simulation results indicated that the Dynamic per-frame technique, where the decoding time prediction adapts to the particular video being decoded, provides the most power saving with performance comparable to the ideal case. On the other hand, the FDCA method consumes more power than the Dynamic method but can be used for stored video and real-time video scenarios without the need for any preprocessing. Our findings also indicate that, in general, DVS improves power savings, but the number of deadline misses also increases as the number of available processor settings increases. More importantly, most of these deadline misses are within 10-20% of the playout interval and thus have minimal effect on video quality. However, video clips with high variability in frame complexities combined with inaccurate decoding time predictions may degrade the video quality. Finally, our results show that a processor with 13 voltage/frequency settings is sufficient to achieve near maximum performance with the experimental environment and the video workloads we have used.
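The core DVS idea the paper builds on can be sketched briefly: pick the lowest voltage/frequency setting whose predicted decode time still meets the frame deadline, since dynamic power scales roughly with V² · f. The settings table and cycle counts below are hypothetical, not from the paper's experiments.

```python
# Hedged sketch of the basic DVS decision (not the paper's exact algorithms):
# choose the lowest-power setting that still meets the frame deadline.

# Hypothetical (voltage_V, frequency_MHz) settings of a DVS-capable processor,
# ordered from slowest/lowest power to fastest.
SETTINGS = [(0.9, 200), (1.1, 400), (1.3, 600), (1.5, 800)]

def choose_setting(predicted_cycles, deadline_ms):
    """Lowest-power setting that finishes predicted_cycles within deadline_ms."""
    for volt, freq in SETTINGS:
        decode_ms = predicted_cycles / (freq * 1e3)   # cycles / (cycles per ms)
        if decode_ms <= deadline_ms:
            return volt, freq
    return SETTINGS[-1]            # run flat-out; the frame may miss its deadline

def relative_power(volt, freq):
    return volt ** 2 * freq        # dynamic power ~ C * V^2 * f (C constant)

# An easy frame can run at a low setting; a complex frame forces a higher one.
easy = choose_setting(4_000_000, 33.3)
hard = choose_setting(18_000_000, 33.3)
```

A mispredicted `predicted_cycles` is exactly what produces the deadline misses the paper measures: an underestimate selects too low a setting and the frame finishes late.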


Frontiers in Education | 2003

Do class comments aid Java program understanding?

Eriko Nurvitadhi; Wing Wah Leung; Curtis R. Cook

This paper describes an experiment that investigates the effects of class and method comments on Java program understanding among beginning programmers. Each of the 103 students from a CS1 class at Oregon State University was given one of four versions (no comments, only method comments, only class comments, and both method and class comments) of a Java database program and answered questions about the program. The results indicated that method comments increase low-level program understanding, while class comments do not increase high-level understanding. This raises questions about the role of class comments in object-oriented programs, as well as the kind of commenting guidelines that should be used in teaching CS1 classes.


International Parallel and Distributed Processing Symposium | 2007

PROTOFLEX: FPGA-accelerated Hybrid Functional Simulator

Eric S. Chung; Eriko Nurvitadhi; James C. Hoe; Babak Falsafi; Ken Mai

PROTOFLEX is an FPGA-accelerated hybrid simulation/emulation platform designed to support large-scale multiprocessor hardware and software research. Unlike prior attempts at FPGA multiprocessor system emulators, PROTOFLEX emulates with full-system fidelity; i.e., it runs stock commercial operating systems with I/O support. This is accomplished without undue effort by leveraging a hybrid emulation technique called transplanting. Our transplant technology uses FPGAs to accelerate only common-case behaviors while relegating infrequent, complex behaviors (e.g., I/O devices) to software simulation. By working in concert with existing full-system simulators, transplanting avoids the costly and unnecessary construction of the entire target system in FPGA. We report preliminary findings from a working hybrid PROTOFLEX emulator of an UltraSPARC workstation running Solaris 8. We have also started developing a novel multiprocessor emulation approach that interleaves the execution of many (10s to 100s) processor contexts onto a shared emulation engine. This approach decouples the scale and complexity of the FPGA host from the simulated system size but nevertheless enables us to scale the desired emulation performance by the number of emulation engines used. Together, the transplant and interleaving techniques enable us to develop full-system FPGA emulators of up to thousands of processors without an overwhelming development effort.


Design, Automation, and Test in Europe | 2010

Automatic pipelining from transactional datapath specifications

Eriko Nurvitadhi; James C. Hoe; Timothy Kam; Shih-Lien Lu

We present a transactional datapath specification (T-spec) and the tool (T-piper) that automatically synthesizes an in-order pipelined implementation from it. T-spec abstractly views a datapath as executing one transaction at a time, computing next system states based on current ones. From a T-spec, T-piper can synthesize a pipelined implementation that preserves the original transaction semantics, while allowing simultaneous execution of multiple overlapped transactions across pipeline stages. T-piper not only ensures the correctness of pipelined executions, but can also employ forwarding and speculation to minimize performance loss due to data dependencies. Design case studies on RISC and CISC processor pipeline development are reported.


Field-Programmable Gate Arrays | 2006

Design, implementation, and verification of active cache emulator (ACE)

Jumnit Hong; Eriko Nurvitadhi; Shih-Lien Lu

This paper presents the design, implementation, and verification of the Active Cache Emulator (ACE), a novel FPGA-based emulator that models an L3 cache actively and in real-time. ACE leverages interactions with its host system to model the target system (i.e., the hypothetical system under study). Unlike most existing FPGA-based cache emulators that collect only memory traces from their host system, ACE provides feedback to its host by modeling the impact of the emulated cache on the system. Specifically, delays are injected to time-dilate the host system, which then experiences the hit/miss latencies of the emulated cache. Such active emulation expands the context of performance measurements by capturing processor performance metrics (e.g., cycles per instruction) in addition to measuring the typical cache-specific performance metrics (e.g., miss ratio).

ACE is designed to interface with a front-side bus (FSB) of a typical Pentium®-based PC system. To actively emulate cache latencies, ACE utilizes the snoop stall mechanism of the FSB to inject delays into the system. At present, ACE is implemented using a Xilinx XC2V6000 FPGA running at 66MHz, the same speed as its host's FSB. Verification of ACE includes using the Cache Calibrator and RightMark Memory Analyzer software to confirm proper detection of the emulated cache by the host system, and comparing ACE results with SimpleScalar software simulations.
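The feedback loop described above, emulate a cache and inject a stall whose length matches the hit or miss latency, can be sketched in a few lines. This is a simplified direct-mapped model with made-up latencies, not the ACE hardware design.

```python
# Minimal sketch (hypothetical parameters, not the ACE implementation): emulate
# an L3 cache and "time-dilate" the host by returning a stall length per
# access, so the host observes the hit/miss latencies of the emulated cache.

HIT_STALL_CYCLES, MISS_STALL_CYCLES = 10, 100   # assumed latencies

class EmulatedL3:
    def __init__(self, num_lines, line_bytes=64):
        self.num_lines, self.line_bytes = num_lines, line_bytes
        self.tags = [None] * num_lines          # direct-mapped for simplicity
        self.hits = self.misses = 0

    def access(self, addr):
        """Return the stall (in cycles) to inject on the front-side bus."""
        line = addr // self.line_bytes
        idx, tag = line % self.num_lines, line // self.num_lines
        if self.tags[idx] == tag:
            self.hits += 1
            return HIT_STALL_CYCLES
        self.tags[idx] = tag                    # fill the line on a miss
        self.misses += 1
        return MISS_STALL_CYCLES

cache = EmulatedL3(num_lines=4)
stalls = [cache.access(a) for a in (0, 0, 64, 0)]
```

In ACE the stall is realized physically via the FSB snoop stall mechanism; here it is just a returned cycle count.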


Design Automation Conference | 2010

Automatic multithreaded pipeline synthesis from transactional datapath specifications

Eriko Nurvitadhi; James C. Hoe; Shih-Lien Lu; Timothy Kam

We present a technique to automatically synthesize a multithreaded in-order pipeline from a high-level unpipelined datapath specification. This work extends the previously proposed transactional specification (T-spec) and synthesis technology (T-piper). The technique not only works with instruction processors but is also flexible enough to accept any sequential datapath. It maintains previously proposed non-threaded pipeline features and is enhanced with multithreading features. We report a design space exploration study of 32 multithreaded x86 processor pipelines, all synthesized from a single T-spec.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011

Automatic Pipelining From Transactional Datapath Specifications

Eriko Nurvitadhi; James C. Hoe; Timothy Kam; Shih-Lien Lu

This paper presents a transactional specification framework (T-spec) for describing a datapath and the tool T-piper to automatically synthesize an in-order pipelined implementation with arbitrary user-specified pipeline-stage boundaries. T-spec abstractly views a datapath as executing one transaction at a time, computing the next system states based on the current ones. The synthesized pipeline maintains this semantics, yet allows concurrent execution of multiple overlapped transactions in different pipeline stages, where each stage performs a part of the next-state computation of each transaction. T-spec makes the state reading and writing events in a datapath explicit to enable T-piper to perform exact read-after-write (RAW) hazard analysis between the overlapped transactions. T-piper can automatically generate the pipeline control not only to ensure the correctness of the pipelined executions but also to minimize (using forwarding and speculation) the performance loss due to pipeline stalls in the presence of RAW dependencies. This paper reports design case studies applying T-spec and T-piper to reduced instruction set computing and complex instruction set computing processor pipeline development. In the latter, we report the results from a rapid design space exploration of 60 generated x86-subset pipelines, varying in pipeline depth, forwarding, and speculative execution, all starting from a single T-spec.
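The RAW-hazard decision that T-piper automates can be sketched at a toy level: a younger transaction reading a state element written by an older, still-in-flight transaction must stall unless the value already exists somewhere in the pipeline and can be forwarded. The encoding below is hypothetical, not the tool's internal representation.

```python
# Toy sketch of per-transaction RAW-hazard control (hypothetical encoding,
# not T-piper's IR). For each state element the incoming transaction reads,
# check whether an older in-flight transaction writes it, and if so whether
# the written value is already computed and forwardable.

def raw_action(reads, in_flight):
    """
    reads:     state elements the incoming transaction reads.
    in_flight: {state_element: value_ready} for older transactions' pending
               writes; value_ready is True once the result can be forwarded.
    Returns 'proceed', 'forward', or 'stall'.
    """
    pending = [s for s in reads if s in in_flight]
    if not pending:
        return "proceed"                 # no RAW dependence on older work
    if all(in_flight[s] for s in pending):
        return "forward"                 # bypass the computed value downstream
    return "stall"                       # the needed value is not yet computed
```

Because T-spec makes every state read and write explicit, this analysis can be exact rather than conservative, which is what lets the generated control insert forwarding paths instead of always stalling.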

Collaboration


Dive into Eriko Nurvitadhi's collaborations.

Top Co-Authors

James C. Hoe
Carnegie Mellon University

Eric S. Chung
Carnegie Mellon University

Babak Falsafi
École Polytechnique Fédérale de Lausanne

Ken Mai
Carnegie Mellon University

Ben Lee
Oregon State University

Chansu Yu
Cleveland State University