Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Rolf Kassa is active.

Publication


Featured researches published by Rolf Kassa.


field programmable gate arrays | 2007

An FPGA-based Pentium® in a complete desktop system

Shih-Lien L. Lu; Peter Yiannacouras; Rolf Kassa; Michael Konow; Taeweon Suh

Software simulation has been the predominant method for architects to evaluate microprocessor research proposals. There are three tenets in modeling new designs with software models: simulation speed, model accuracy and model completeness. The increasing complexity of the processor and accelerated trend to have multiple processors on a chip are putting burden on simulators to achieve all tenets mentioned, including accurately capturing OS effects. In this work we perform preliminary experimentation/prototyping with an emulation system which overcomes the tension to satisfy all three requirements. The system is an original Socket-7 based desktop processor system with typical hardware peripherals running modern operating systems such as Fedora Core 4 and Windows XP; however we have inserted a Xilinx Virtex-4 in place of the processor that should sit in the motherboard and have used the Virtex-4 to host a complete version of the Pentium® microprocessor (which consumes less than half its resources). We can therefore apply architectural changes to the processor and evaluate their effects on the complete desktop system. We use this FPGA-based emulation system to conduct preliminary architectural experiments including growing the branch target buffer and the level 1 caches. In addition, we experimented with interfacing hardware accelerators such as DES and AES engines which resulted in 27x speedups.


international symposium on microarchitecture | 2009

Architecting a chunk-based memory race recorder in modern CMPs

Gilles Pokam; Cristiano Pereira; Klaus Danne; Rolf Kassa; Ali-Reza Adl-Tabatabai

Prior work on HW support for memory race recording piggybacks time stamps on coherence messages and logs the outcome of memory races using point-to-point or chunk-based approaches. These memory race recorder (MRR) techniques are effective, but they require modifications to the cache coherence protocol that can hurt performance. In addition, prior work has mostly focused on directory coherence and considered only CMP systems with single-level cache hierarchies. Most modern CMP systems shipped today, however, implement snoop coherence and feature multilevel cache hierarchies. To be practical, a MRR must target CMPs with multilevel caches, mitigate the coherence overhead due to piggybacking, and emphasize on replay speed to broaden applicability of deterministic replay. This paper contributes three new solutions for making chunk-based MRR practical for modern CMPs. We show that MRR interactions with a cache hierarchy can degrade performance and present a novel mechanism that mitigates this degradation. We propose new mechanisms for snoop-based caches that eliminate coherence traffic overhead due to piggybacking. We finally propose new techniques for improving replay speed and introduce a novel framework for evaluating the replay speed potential of MRR designs.


international symposium on computer architecture | 2013

QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs

Gilles Pokam; Klaus Danne; Cristiano Pereira; Rolf Kassa; Tim Kranich; Shiliang Hu; Justin E. Gottschlich; Nima Honarmand; Nathan Dautenhahn; Samuel T. King; Josep Torrellas

There has been significant interest in hardware-assisted deterministic Record and Replay (RnR) systems for multithreaded programs on multiprocessors. However, no proposal has implemented this technique in a hardware prototype with full operating system support. Such an implementation is needed to assess RnR practicality. This paper presents QuickRec, the first multicore Intel Architecture (IA) prototype of RnR for multithreaded programs. QuickRec is based on QuickIA, an Intel emulation platform for rapid prototyping of new IA extensions. QuickRec is composed of a Xeon server platform with FPGA-emulated second-generation Pentium cores, and Capo3, a full software stack for managing the recording hardware from within a modified Linux kernel. This papers focus is understanding and evaluating the implementation issues of RnR on a real platform. Our effort leads to some lessons learned, as well as to some pointers for future research. We demonstrate that RnR can be implemented efficiently on a real multicore IA system. In particular, we show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead. However, the software stack incurs an average recording overhead of nearly 13%, which must be reduced to enable always-on use of RnR.


ACM Transactions on Reconfigurable Technology and Systems | 2008

A Desktop Computer with a Reconfigurable Pentium

Shih-Lien L. Lu; Peter Yiannacouras; Taeweon Suh; Rolf Kassa; Michael Konow

Advancements in reconfigurable technologies, specifically FPGAs, have yielded faster, more power-efficient reconfigurable devices with enormous capacities. In our work, we provide testament to the impressive capacity of recent FPGAs by hosting a complete Pentium® in a single FPGA chip. In addition we demonstrate how FPGAs can be used for microprocessor design space exploration while overcoming the tension between simulation speed, model accuracy, and model completeness found in traditional software simulator environments. Specifically, we perform preliminary experimentation/prototyping with an original Socket 7 based desktop processor system with typical hardware peripherals running modern operating systems such as Fedora Core 4 and Windows XP; however we have inserted a Xilinx Virtex-4 in place of the processor that should sit in the motherboard and have used the Virtex-4 to host a complete version of the Pentium® microprocessor (which consumes less than half its resources). We can therefore apply architectural changes to the processor and evaluate their effects on the complete desktop system. We use this FPGA-based emulation system to conduct preliminary architectural experiments including growing the branch target buffer and the level 1 caches. In addition, we experimented with interfacing hardware accelerators such as DES and AES engines which resulted in a 27x speedup.


field-programmable logic and applications | 2010

An FPGA Based Hybrid Processor Emulation Platform

Qigang Wang; Rolf Kassa; Wenbo Shen; Nelson Ijih; Bhushan Chitlur; Michael Konow; Dong Liu; Arthur Sheiman; Prabhat Gupta

This paper introduces a flexible, hybrid emulation platform for processor related emulation. It is based on a modern Xeon server and FPGA. With Intel processors implemented in FPGA and plugged into the Xeon servers processor socket, and with the Xeon BIOS modified to accommodate different cores, this platform is able to boot OS and run applications while interacting with the Xeon servers native hardware components. It enables researchers to do architectural changes in FPGA and quickly evaluate their effect on the whole platform. This platform is an ideal vehicle for research on processor technology, SoC architecture, reconfigurable computing, heterogeneous core architecture, etc.


Archive | 2012

PROCESSOR WITH MEMORY RACE RECORDER TO RECORD THREAD INTERLEAVINGS IN MULTI-THREADED SOFTWARE

Tim Kranich; Gilles Pokam; Justin E. Gottschlich; Klaus Danne; Rolf Kassa; Shiliang Hu; Cristiano Pereira


Archive | 2013

Methods and systems for performing a replay execution

Justin E. Gottschlich; Klaus Danne; Cristiano Pereira; Gilles Pokam; Rolf Kassa; Shiliang Hu; Tim Kranich


Archive | 2013

VISUALIZING RECORDED EXECUTIONS OF MULTI-THREADED SOFTWARE PROGRAMS FOR PERFORMANCE AND CORRECTNESS

Justin E. Gottschlich; Gilles Pokam; Cristiano Pereira; Klaus Danne; Hu Shiliang; Rolf Kassa


Archive | 2014

Processor With Transactional Capability and Logging Circuitry To Report Transactional Operations

Rolf Kassa; Justin E. Gottschlich; Shiliang Hu; Gilles Pokam; Robert C. Knauerhase


Archive | 2012

Replay execution of instructions in thread chunks in the chunk order recorded during previous execution

Justin E. Gottschlich; Klaus Danne; Cristiano Pereira; Gilles Pokam; Rolf Kassa; Shiliang Hu; Tim Kranich

Collaboration


Dive into the Rolf Kassa's collaboration.

Researchain Logo
Decentralizing Knowledge