Steffen Köhler
Dresden University of Technology
Publication
Featured research published by Steffen Köhler.
international parallel and distributed processing symposium | 2002
Steffen Köhler; Jens Braunes; Sergej Sawitzki; Rainer G. Spallek
High code efficiency (operations per instruction) combined with a high degree of instruction-level parallelism can rarely be obtained by hardwired microprocessor designs for a broad application domain. The implementation of reconfigurable execution units is a promising way to enhance code efficiency and microprocessor performance. However, the unit reconfiguration process introduces an additional dimension to the code generation phase, which complicates scheduling and may lead to code deficiencies if resource conflicts occur. This paper discusses code generation issues for a runtime-reconfigurable VLIW processor model, which combines fixed and flexible functional units (FU) in one template. Reconfigurable units (RFU) can be adapted to the application demands, exploiting more coarse-grain parallelism than common instruction-level FUs. A case study illustrates the extraction of conditions for reconfigurable instructions and proves scheduling possibilities for a set of common DSP benchmark algorithms. The software environment described includes a retargetable, parallelizing C compiler based on the SUIF compiler kit and a simulator, which can be used for identifying application-specific SIMD-instruction candidates and for evaluating the runtime behavior of the created object code.
reconfigurable computing and fpgas | 2005
Martin Zabel; Steffen Köhler; M. Zimmerling; T.B. Preußer; Rainer G. Spallek
This work introduces a new digital signal processor (DSP) architecture concept, which provides increased instruction-level parallelism (ILP), flexibility and scalability compared to state-of-the-art DSPs. The concept can be characterized by an enhanced RISC microprocessor with a tightly coupled reconfigurable ALU array, a vector load/store unit and a control flow manipulation unit. These units implement coarse-grain reconfigurable structures by means of switchable contexts. In contrast to previous work, context activation is performed event-driven according to the instruction pointer of the RISC microprocessor. The synchronous operation of the context-controlled functional units enables an ILP comparable to complex VLIW/SIMD processors, without introducing additional instruction overhead. The reconfigurable units can be adapted to the application demands, exploiting parallelism more coarse-grain than common instruction-level functional units. To evaluate the concept, we present a parametrizable template model of the DSP architecture based on a standard ARM7 RISC microprocessor. The DSP model includes an architecture description based on our own ADL/simulation environment and a VHDL RTL model for the purpose of FPGA prototype evaluation. Further, we show detailed quantitative performance and utilization evaluation results related to the ALU array geometry, memory transfer bandwidth and the number of configuration contexts. First experiments executing DSP algorithms have indicated that the proposed architecture can exploit more of the potential application parallelism at a reasonable hardware cost compared to conventional digital signal processors.
automation, robotics and control systems | 2004
Jens Braunes; Steffen Köhler; Rainer G. Spallek
Coarse-grain reconfigurable processors are increasingly becoming an alternative to FPGA-based fine-grain reconfigurable devices due to their reduced configuration overhead. This provides a higher degree of flexibility for the design of dynamically reconfigurable systems. However, to make them more interesting for industrial applications, suitable frameworks supporting design space exploration as well as the automatic generation of dedicated design tools are still missing.
field-programmable logic and applications | 2013
Patrick Lehmann; Thomas Frank; Oliver Knodel; Steffen Köhler; Thomas B. Preußer; Rainer G. Spallek
Field-Programmable Gate Arrays, which are widely used as prototyping platforms, are intruding into the domain of custom-specific high-performance hardware accelerators, which operate highly efficiently by exploiting bit- and word-level parallelism. One opportunity to feed these FPGA accelerators with Gbytes of data is the direct attachment of mass-storage devices through a Serial-ATA link. State-of-the-art SATA controllers are designed and optimized for microprocessor-based systems with a random memory access pattern. Our approach, named Weasel, introduces a modularized, platform-independent and streaming-optimized SATA controller, which supports link speeds up to 6 Gbit/s. We demonstrate how to customize the given ATA standard and how to design a generic interface for different vendor-specific multi-gigabit transceivers. Implementations of the platform-independent interface for the Xilinx Virtex-5 and the Altera Stratix II GX devices prove our concept. Finally, our measurements using hard-disk and solid-state drives show that a sustained throughput of 540 Mbytes/s over a SATA 6 Gbit/s link is achievable. This is close to the theoretical maximum, which is constrained by the attached devices as well as by the speed of their flash memory.
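To put the reported 540 Mbytes/s into perspective, the following minimal sketch estimates the payload ceiling of a 6 Gbit/s SATA link. It assumes only the 8b/10b line coding used by SATA (10 line bits per payload byte) and ignores framing and protocol overhead, so the result is an upper bound rather than a figure taken from the paper. The function name and variables are illustrative.

```python
# Rough upper bound on SATA payload throughput.
# Assumption: 8b/10b line coding only; frame and primitive overhead is ignored.

def sata_payload_limit_mbytes(line_rate_gbit: float) -> float:
    """Return the payload ceiling in Mbytes/s for a given line rate in Gbit/s."""
    payload_bits_per_s = line_rate_gbit * 1e9 * 8 / 10  # 8 payload bits per 10 line bits
    return payload_bits_per_s / 8 / 1e6                 # bits -> bytes -> Mbytes

limit = sata_payload_limit_mbytes(6.0)  # ceiling for a 6 Gbit/s link
measured = 540.0                        # sustained throughput stated in the abstract
utilization = measured / limit          # fraction of the coding-limited ceiling

print(f"ceiling: {limit:.0f} Mbytes/s, utilization: {utilization:.0%}")
```

Under these assumptions the ceiling is 600 Mbytes/s, so the measured throughput corresponds to roughly 90% of the coding-limited maximum.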
field programmable logic and applications | 2001
Sergej Sawitzki; Steffen Köhler; Rainer G. Spallek
During the last decade it has been shown that reconfigurable computing systems are able to compete with their non-reconfigurable counterparts in terms of performance, functional density or power dissipation. Several concept and prototyping studies have introduced reconfigurability into the general-purpose microprocessor world. This paper introduces a prototyping environment for the design of simple reconfigurable microprocessors. The work differs from previous approaches in that it discusses a systematic way (concerning both the hardware and software sides) to design, test and debug a class of reconfigurable computing cores instead of one particular application. First experiments with a simple 8-bit prototype have shown that reconfiguration allows performance gains by a factor of 2 to 28 for different applications. The study has revealed some directions for further architectural improvements.
international parallel processing symposium | 1999
Steffen Köhler; Sergej Sawitzki; Achim Gratz; Rainer G. Spallek
This paper compares selected digital signal processing algorithms on a variety of computing platforms in terms of achievable performance and cost. The experiments were carried out on a standard PC platform, a DSP, a RISC microcontroller and a Xilinx XC4013XL FPGA. Our results confirm that general-purpose microprocessors are not well suited to these tasks. Both DSP and FPGA achieve higher performance and/or better cost/performance ratios at the expense of less generality and a more complicated development cycle. Porting the algorithms to DSP and FPGA requires about the same amount of work, and the cost/performance ratio of the reconfigurable FPGA solution is very attractive.
field-programmable logic and applications | 2004
Steffen Köhler; Jens Braunes; Thomas B. Preußer; Martin Zabel; Rainer G. Spallek
This work introduces a new concept of enhancing a RISC microprocessor with a tightly coupled reconfigurable ALU array, a vector load/store unit and a control flow manipulation unit. These units implement coarse-grain reconfigurable structures by means of switchable contexts. Context activation is performed event-driven according to the instruction pointer of the RISC microprocessor. The synchronous operation of the context-controlled functional units enables instruction-level parallelism (ILP) comparable to complex VLIW processors, without introducing instruction overhead. The reconfigurable units can be adapted to the application demands, exploiting parallelism more coarse-grain than common instruction-level functional units. To evaluate the concept, a standard ARM RISC microprocessor was chosen to be tightly coupled to these reconfigurable units. Architecture description and simulation were performed using RECAST, a reconfiguration-enabled architecture description language and simulation tool-set. The software environment also includes a retargetable, parallelizing C compiler based on the SUIF compiler kit. First experiments executing DSP algorithms have indicated that the proposed architecture can exploit more of the potential application parallelism than conventional VLIW processors.
applied reconfigurable computing | 2008
Steffen Köhler; Jan Schirok; Jens Braunes; Rainer G. Spallek
In this paper, we examine the efficiency of the ARRIVE architecture, a coarse-grain reconfigurable datapath extension to an embedded RISC microprocessor. It is considered platform specific, optimized for the media and communication processing domain. Detailed chip area requirements are obtained through the mapping to a UMC 0.18μm standard cell ASIC process layout. Furthermore, we present hardware utilization and power simulation results of six media/communication benchmark applications based on post-layout process information. As a result, we can recognize increased area efficiency ($\frac{operations}{mm^2 \cdot s}$) and power efficiency (
GI-Jahrestagung | 2017
Steffen Köhler; Rainer G. Spallek
field programmable logic and applications | 2002
Sebastian Friebe; Steffen Köhler; Rainer G. Spallek; Henrik Juhr; Klaus Künanz