Is this you? Create Your Porfile

Ralf Koenig

Karlsruhe Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ralf Koenig is active.

Explore More

Publication

Featured researches published by Ralf Koenig.

design, automation, and test in europe | 2010

KAHRISMA: a novel hypermorphic reconfigurable-instruction-set multi-grained-array architecture

Ralf Koenig; Lars Bauer; Timo Stripf; Muhammad Shafique; Waheed Ahmed; Juergen Becker; Jörg Henkel

Facing the requirements of next generation applications, current approaches of embedded systems design will soon hit the limit where they may no longer perform efficiently. The unpredictable nature and diverse processing behavior of future applications requires to transgress the barrier of tailor-made, application-/domain-specific embedded system designs. As a consequence, next generation architectures for embedded systems have to react much more flexible to unforeseeable run-time scenarios. In this paper we present our innovative processor architecture concept KAHRISMA (KArlsruhes Hypermorphic Reconfigurable-Instruction-Set Multi-grained-Array). It tightly integrates coarse- and fine-grained run-time reconfigurable fabrics that can incorporate to realize hardware acceleration for computationally complex algorithms. Furthermore, the fabrics can be combined to realize different Instruction Set Architectures that may execute in parallel. With the help of an encrypted H.264 en-/decoding case study we demonstrate that our novel KAHRISMA architecture will deliver the required flexibility to design future-proof embedded systems that are not limited to a certain computational domain.

ieee international symposium on parallel & distributed processing, workshops and phd forum | 2011

A Scalable Microarchitecture Design that Enables Dynamic Code Execution for Variable-Issue Clustered Processors

Ralf Koenig; Timo Stripf; Jan Heisswolf; Juergen Becker

The dynamic run-time complexity of embedded applications is steadily increasing. Currently, only specialized Multiprocessor System-on-Chip (MPSoC) architectures can deliver the required processing power as well as energy efficiency. Although todays MPSoCs incorporate different, potentially reconfigurable cores, their ability to dynamically balance exploitable instruction-, data-, and thread-level parallelism is still very limited. In this paper, we present a novel coarse-grained reconfigurable architecture that can be adapted to operate on different computation granularities and types of parallelism at run time, depending on the applications needs. Our contributions comprise different micro architectural techniques realizing dynamic operation execution for Run-time Scalable Issue Width (RSIW) processor instances. These enable to adapt on demand the issue width of out-of-order RSIW processor instances. Our results show that significant performance improvements can be obtained by our dynamic operation execution technique compared to atomic instruction execution.

design, automation, and test in europe | 2012

A cycle-approximate, mixed-ISA simulator for the KAHRISMA architecture

Timo Stripf; Ralf Koenig; Juergen Becker

Processor architectures that are capable to reconfigure their instruction set and instruction format dynamically at run time offer a new flexibility exploiting instruction level parallelism vs. thread level parallelism. Based on the characteristics of an application or thread the instruction set architecture (ISA) can be adapted to increase performance or reduce resource/power consumption. To benefit from this run-time flexibility automatic selection of an appropriate ISA for each function of a given application is envisioned. This demands a cycle-accurate simulator that is capable of measuring the performance characteristics of an ISA dependent on the target application. However, simulation speed of a cycle-accurate simulator of our reconfigurable VLIW-like processor instances featuring dynamic operation execution would become relatively slow due to the superscalar-like microarchitecture. Within this paper we address this problem by presenting our cycle-approximate simulator approach containing a heuristic dynamic operation execution and memory model that provides a good trade-off between performance and accuracy. Additionally, the simulator features measurement of instruction level parallelism (ILP) that could be theoretically exploited by VLIW processor instances running on our architecture. The theoretical ILP could be used as an indicator for the ISA selection process without the need to simulate any combination of the different ISAs and applications.

international conference on embedded computer systems: architectures, modeling, and simulation | 2011

Architecture design space exploration of run-time scalable issue-width processors

Ralf Koenig; Timo Stripf; Jan Heisswolf; Juergen Becker

Reconfigurable chip multiprocessors realizing very long instruction word (VLIW) processors of dynamically-scalable issue width enable resource-aware adaptation to diverse processing requirements. The execution performance of such clustered VLIW processors is significantly influenced by different design parameters of the fundamental processing cores. In this paper we present a design space exploration addressing the following design parameters: the register file size, number of issue slots, inter cluster move bandwidth, and latency. We thereby investigate the quantitative performance impact of each parameter as well their interdependency for 18 benchmarks of different processing domains. Our results show that the cluster configuration significantly influences the processing performance: the performance loss compared to theirs unclustered architectures can be as low as 2% but also may exceed 100%.

asia and south pacific design automation conference | 2012

Hardware prototyping of novel invasive multicore architectures

Jürgen Becker; Stephanie Friederich; Jan Heisswolf; Ralf Koenig; David May

The sustained advance in technology will enable integrating hundreds of processing cores on a single die in near future. However, it already can be foreseen that the management of the resources of such large systems will not scale in the same way as the hardware using todays entirely software based and centralized management approaches. The invasive paradigm addresses this problem and proposes concepts to enable resource awareness and scalability - especially focusing the resource management perspective - in future multicore systems. These concepts are based on distributed and software-hardware partitioned resource management strategies. High level management decision that are made by software thereby trigger lower level management strategies that are autonomously carried out in hardware. Sufficiently accurate modeling of the overall invasive system is required to study and optimize such a decentralized, software-hardware partitioned control loop where decisions significantly depend on runtime dynamic effects. Software based simulation cannot deliver the required speed or accuracy making FPGA based prototyping of invasive systems necessary. This paper describes our prototyping concepts and discusses possible implementation alternatives for invasive multicore architectures.

computational science and engineering | 2012

A Compilation- and Simulation-Oriented Architecture Description Language for Multicore Systems

Timo Stripf; Oliver Oey; Thomas Bruckschloegl; Ralf Koenig; George Goulas; Panayiotis Alefragis; Nikolaos S. Voros; Jordy Potman; Kim Sunesen; Steven Derrien; Olivier Sentieys; Juergen Becker

Todays reconfigurable multicore architectures become more and more complex. They consist of several processing units, not necessarily identical, different interconnecting modules, memories and possibly other components. Programming such kind of architectures requires deep knowledge of the underlying hardware and is thus very time consuming and error prone. On the other hand, automated tool chains that target multicore architectures are typically tailored to one specific architecture type and require a platform-specific programming model. Within the EU FP7 project Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb (ALMA) we address this shortcoming by a flexible tool chain featuring platform-independence on the architecture level as well as on the programming model. Thus, the tool chain is kept retarget able by using a novel architecture description language (ADL) for multiprocessor system on chip devices. Applications are expressed using the Scilab programming language allowing the end user to develop optimized programs without specific knowledge of the target architectures. Thereby, the ADL guides the code generation of the integrated tool flow through coarse- and fine grain parallelism extraction, parallel code optimizations and multicore simulations.

design, automation, and test in europe | 2008

A novel recursive algorithm for bit-efficient realization of arbitrary length inverse modified cosine transforms

Ralf Koenig; Timo Stripf; Juergen Becker

In this paper a novel approach for inverse modified cosine transform (IMDCT) computation is presented, based on a recursive algorithm. Due to its nature, this IMDCT calculation can be performed on a reduced bit width datapath without loss of accuracy, compared to alternative recursive architectures. Combined with the regular structure, the approach allows for a much more area efficient VLSI implementation compared to existing systems. Due to its bit efficiency this approach is attractive to be implemented on reconfigurable architectures of the DSP domain as well.

reconfigurable communication centric systems on chip | 2012

A flexible approach for compiling scilab to reconfigurable multi-core embedded systems

Timo Stripf; Oliver Oey; Thomas Bruckschloegl; Ralf Koenig; Michael Huebner; Jürgen Becker; George Goulas; Panayiotis Alefragis; Nikolaos S. Voros; Gerard K. Rauwerda; Daniel Menard; Olivier Sentieys; Nikolaos Kavvadias; Grigoris Dimitroulakos; Kostas Masselos; Diana Goehringer; Thomas Perschke; Dimitrios Kritharidis; Nikolaos Mitas

The mapping process of high performance embedded applications to todays reconfigurable multiprocessor System-on-Chip devices suffers from a complex toolchain and programming process. Thus, the efficient programming of such architectures in terms of achievable performance and power consumption is limited to experts only. Enabling them to nonexperts requires a simplified programming process that hides the complexity of the underlying hardware - introduced by software parallelism of multiple cores and the flexibility of reconfigurable architectures - to the end user. The Architecture oriented paraLlelization for high performance embedded Multi-core systems using scilAb (ALMA) European project aims to bridge these hurdles through the introduction and exploitation of a Scilab- and architecture-description-language-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from high level of abstraction. This holistic solution of the toolchain allows the complexity of both the application and the architecture to be hidden, which leads to a better acceptance, reduced development costs, and shorter time-to-market.

international conference on embedded computer systems: architectures, modeling, and simulation | 2011