Stefan Kraemer
RWTH Aachen University
Publication
Featured research published by Stefan Kraemer.
design automation conference | 2005
Kingshuk Karuri; M.A. Al Faruque; Stefan Kraemer; Rainer Leupers; Gerd Ascheid; Heinrich Meyr
Current application-specific instruction set processor (ASIP) design methodologies are mostly based on iterative architecture exploration that uses Architecture Description Languages (ADLs) and retargetable software development tools. However, for improved design efficiency, additional pre-architecture exploration tools are required to help narrow down the huge design space and make coarse-grained instruction set architecture (ISA) decisions before detailed ADL modeling. Extensive application code profiling is key in such early design stages. Based on a novel code instrumentation technology, we present a micro-profiling approach that fills the current gap between source-level and instruction-level profilers and combines their advantages with respect to speed and accuracy. We show how the micro-profiler is embedded into an advanced ASIP design flow and justify its use in a case study to design an MP3 decoder ASIP.
design automation conference | 2008
Lei Gao; Kingshuk Karuri; Stefan Kraemer; Rainer Leupers; Gerd Ascheid; Heinrich Meyr
With the growing number of programmable processing elements in today's Multiprocessor System-on-Chip (MPSoC) designs, the synergy required between the development of the hardware architecture and the software running on it is also increasing. In an MPSoC development environment, changes in the hardware architecture can require extensive re-partitioning or re-parallelization of the software architecture. Fast and accurate functional simulation and performance estimation techniques are needed to cope with this co-design problem in the early phases of MPSoC design space exploration. This paper addresses the issue by introducing a framework that combines hybrid simulation, cache simulation and online trace-driven replay techniques to accurately predict the performance of programmable elements in an MPSoC environment. The resulting simulation technique can easily cope with the continuous re-organization of software architectures during an Instruction Set Simulator (ISS) based design process. Experimental results show that this framework can improve system simulation speed by 3-5X on average while achieving accuracy closely comparable to traditional ISSes.
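As a rough illustration of what trace-driven cache simulation means in general, the following C sketch replays a list of memory addresses through a direct-mapped cache model and counts hits and misses. It is not the framework from the paper: the cache geometry (64 lines of 32 bytes) and the names `cache_model`, `cache_reset`, `cache_access` and `cache_replay` are assumptions made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry chosen for illustration only. */
#define LINE_BYTES 32u
#define NUM_LINES  64u

typedef struct {
    uint32_t tag[NUM_LINES];     /* tag stored in each cache line */
    uint8_t  valid[NUM_LINES];   /* 1 if the line holds data */
    unsigned long hits;
    unsigned long misses;
} cache_model;

/* Invalidate every line and clear the counters. */
void cache_reset(cache_model *c)
{
    for (size_t i = 0; i < NUM_LINES; ++i) {
        c->tag[i] = 0;
        c->valid[i] = 0;
    }
    c->hits = 0;
    c->misses = 0;
}

/* Simulate one access; returns 1 on a hit and 0 on a miss. */
int cache_access(cache_model *c, uint32_t addr)
{
    uint32_t block = addr / LINE_BYTES;   /* memory block number */
    uint32_t index = block % NUM_LINES;   /* line the block maps to */
    uint32_t tag   = block / NUM_LINES;   /* identifies the block in that line */

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return 1;
    }
    /* Miss: the new block replaces whatever the line held. */
    c->valid[index] = 1;
    c->tag[index] = tag;
    c->misses++;
    return 0;
}

/* Replay a recorded address trace through the model. */
void cache_replay(cache_model *c, const uint32_t *trace, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        (void)cache_access(c, trace[i]);
    }
}
```

A caller would reset the model once, feed it the addresses recorded during execution, and then read the `hits` and `misses` counters to estimate memory stall time.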
international conference on hardware/software codesign and system synthesis | 2007
Stefan Kraemer; Lei Gao; Jan Henrik Weinstock; Rainer Leupers; Gerd Ascheid; Heinrich Meyr
Instruction Set Simulation (ISS) is widely used in system evaluation and software development for embedded processors. Despite significant advancements in ISS technology, it still suffers from low simulation speed compared to real hardware. For embedded software developers in particular, simulation speed close to real time is important for developing complex software efficiently. In this paper, a novel, retargetable, hybrid simulation framework (HySim) is presented which allows switching between native code execution and ISS-based simulation. To reach a certain state of an application as fast as possible, all platform-independent parts of the application are directly executed on the host, while the platform-dependent code executes on the ISS. During native code execution, a performance estimation is conducted. A case study shows that speed-ups ranging from 7× to 72× can be achieved without compromising debugging accuracy. The performance estimation during native code execution shows an average error of 9.5%.
compilers, architecture, and synthesis for embedded systems | 2007
Lei Gao; Stefan Kraemer; Rainer Leupers; Gerd Ascheid; Heinrich Meyr
Instruction Set Simulators (ISSes) are important tools for cross-platform software development. Simulation speed is a major concern, and many approaches have been proposed to improve the performance of ISSes. A prevalent technique is compiled simulation, which translates target programs into host instructions. However, a slowdown of orders of magnitude is inevitable, since the difference between target and host Instruction Set Architectures (ISAs) can be large. An alternative is to emulate the program without insisting on binary compatibility, so that the performance problem is solved by native execution. However, these emulators require either a special programming language or a given Application Programming Interface (API). Last but not least, it is not trivial to integrate an emulator into a system simulator (which provides devices, external memory, etc., that embedded programmers do care about). In this paper, we propose a fast and generic hybrid simulation approach that uses a virtualization technique to accelerate simulation and simulator-based debugging of C programs. A novel virtual coprocessor (VCP) is introduced as a processing element which executes C functions at high speed. This approach is C89 compliant and compatible with third-party libraries and platform-dependent code. It is also retargetable and can be integrated with existing ISSes. Two different ISAs are supported at present: MIPS and mAgic DSP. The average execution speed of the coprocessor is about 100 million simulated instructions per second.
international symposium on system-on-chip | 2009
Stefan Kraemer; Rainer Leupers; Dietmar Petras; Thomas Philipp
The ability to restore a Virtual Platform from a previously saved simulation state can considerably shorten the typical edit-compile-debug cycle for software developers and therefore enhance productivity. This paper presents a Checkpoint/Restore solution specifically tailored towards the needs of SystemC-based Virtual Platforms. Apart from restoring the simulation process from a checkpoint image, it also takes care of re-attaching debuggers and interactive GUIs to the restored Virtual Platform. The checkpointing is handled automatically for most of the SystemC modules; only the usage of host OS resources requires user provision. Two concrete code examples demonstrate that the required changes to an existing Virtual Platform are a simple developer task consisting of minor source code modifications. A case study based on the SHAPES Virtual Platform is conducted to investigate the applicability of the proposed framework in a realistic system environment.
design, automation, and test in europe | 2007
Stefan Kraemer; Rainer Leupers; Gerd Ascheid; Heinrich Meyr
SIMD instructions are used to speed up multimedia applications in high performance embedded computing. Vendors often use proprietary platforms which are incompatible with others. Therefore, porting software is a very complex and time-consuming task. Moreover, many existing embedded processors do not have SIMD extensions at all, but they do provide a wide data path of 32 bits or more. Usually, multimedia applications work on short data types of 8 or 16 bits. Thus, only the lower bits of the data path are used and therefore only a fraction of the available computing power is exploited for such algorithms. This paper discusses the possibility of making use of the upper bits of the data path by emulating true SIMD instructions. These instructions are implemented purely in software using a high-level language such as C. Therefore, the application can be modified by making use of source code transformations which are inherently portable. The benefit of this approach is that the computing resources are used more efficiently without compromising the portability of the code. Experiments have shown that a significant speedup can be obtained by this approach.
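The idea of using the upper bits of a 32-bit data path for short data types can be illustrated with a well-known bit-masking trick. The C sketch below adds four unsigned 8-bit values packed into one 32-bit word, with each lane wrapping modulo 256 and no carry leaking into its neighbour. It is a generic example of software-emulated SIMD, not necessarily the transformation used in the paper; the function names `pack_u8x4` and `packed_add_u8x4` are assumptions made up for this example.

```c
#include <stdint.h>

/* Pack four 8-bit values into one 32-bit word; l0 is the lowest lane. */
uint32_t pack_u8x4(uint8_t l3, uint8_t l2, uint8_t l1, uint8_t l0)
{
    return ((uint32_t)l3 << 24) | ((uint32_t)l2 << 16) |
           ((uint32_t)l1 << 8)  |  (uint32_t)l0;
}

/* Lane-wise addition of four packed unsigned 8-bit values.
 * Each lane wraps around modulo 256 independently. */
uint32_t packed_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t high = 0x80808080u;  /* top bit of every lane */
    const uint32_t low  = 0x7F7F7F7Fu;  /* lower seven bits of every lane */

    /* Add the lower seven bits of each lane. A carry can reach bit 7 of
     * its own lane but can never cross into the next lane. */
    uint32_t sum_low = (a & low) + (b & low);

    /* Bit 7 of each lane is (a7 XOR b7 XOR carry-in). The carry-in is
     * already sitting in bit 7 of sum_low, so XOR in a7 XOR b7. */
    return sum_low ^ ((a ^ b) & high);
}
```

One ordinary 32-bit addition, three bitwise operations and two masks replace four separate byte additions, and the code stays plain portable C.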
application-specific systems, architectures, and processors | 2005
Mohammad Mostafizur Rahman Mozumdar; Kingshuk Karuri; Anupam Chattopadhyay; Stefan Kraemer; Hanno Scharwaechter; Heinrich Meyr; Gerd Ascheid; Rainer Leupers
The growth of the Internet in the last decade has made current networking applications immensely complex. Systems running such applications need special architectural support to meet the tight constraints of power and performance. This paper presents a case study of architecture exploration and optimization of an application-specific instruction set processor (ASIP) for networking applications. The case study particularly focuses on the effects of instruction set customization for applications from different layers of the protocol stack. Using a state-of-the-art VLIW processor as the starting template, together with architecture description language (ADL) based architecture exploration tools, this case study suggests possible instruction set and architectural modifications that can speed up some networking applications by up to 6.8 times. Moreover, this paper also shows that there are very few similarities between diverse networking applications. Our results suggest that it is extremely difficult to have a common set of architectural features for efficient network protocol processing, and that ASIPs with specialized instruction sets can become viable solutions for such an application domain.
International Journal of Embedded and Real-time Communication Systems | 2011
Rainer Leupers; Stefan Kraemer; Dietmar Petras; Thomas Philipp; Andreas Hoffmann
The ability to restore a virtual platform from a previously saved simulation state can considerably shorten the typical edit-compile-debug cycle for software developers and therefore enhance productivity. For SystemC-based virtual platforms (VPs), dedicated checkpoint/restore (C/R) solutions are required that take into account the specific characteristics of such platforms. Apart from restoring the simulation process from a checkpoint image, the proposed checkpoint solution also takes care of re-attaching debuggers and interactive GUIs to the restored virtual platform. The checkpointing is handled automatically for most of the SystemC modules; only the usage of host OS resources requires user provision. A C/R approach based on process checkpointing has been selected in order to minimize the adaptation required for existing VPs, at the expense of large checkpoint sizes. This drawback is overcome by introducing online compression into the checkpoint process. A case study based on the SHAPES Virtual Platform is conducted to investigate the applicability of the proposed framework as well as the impact of checkpoint compression in a realistic system environment.
Archive | 2010
Rainer Leupers; Stefan Kraemer; Lei Gao; Christoph Schumacher
Multi-processor systems-on-chip (MPSoCs) are gaining a lot of attention due to their good performance-to-power ratio. In order to cope with the complexity of such systems, early availability of full system simulation is of high importance. The simulation time increases with the growing number of processors. Therefore, scalable simulation techniques are required to mitigate this problem. Two new concepts to increase the simulation speed are becoming popular. First, raising the abstraction level increases simulation speed at the expense of lower simulation accuracy. Second, exploiting all available processor cores in today’s host systems increases the simulation speed without sacrificing accuracy. Depending on the individual use case, one technique alone or a mixture of both techniques can be applied to create a fast simulation environment which is suitable for design space exploration, software development, performance estimation, and debugging.
Audio Engineering Society Conference: 32nd International Conference: DSP For Loudspeakers | 2007
Iuliana Bacivarov; Piergiovanni Bazzana; Michael Beckinger; Jianjiang Ceng; Wolfgang Haid; Kai Huang; Stefan Kraemer; Rainer Leupers; Pier Stanislao Paolucci; Thomas Sporer; Lothar Thiele; P. Vicini