Dmitrij Kissler
University of Erlangen-Nuremberg
Publication
Featured research published by Dmitrij Kissler.
field-programmable technology | 2006
Dmitrij Kissler; Frank Hannig; Alexey Kupriyanov; Jürgen Teich
In this paper, a new class of highly parameterizable coarse-grained reconfigurable architectures called weakly programmable processor arrays is discussed. The main advantages of the proposed architecture template are the possibility of partial and differential reconfiguration and the systematic classification of architectural parameters, which allows flexibility to be traded off against hardware cost. The applicability of the approach is tested in a case study with different interconnect topologies on an FPGA platform. The results show substantial flexibility gains at only marginal additional hardware cost.
software and compilers for embedded systems | 2007
Alexey Kupriyanov; Dmitrij Kissler; Frank Hannig; Jürgen Teich
In this paper we present a new approach for generating high-speed, optimized, event-driven instruction-set-level simulators for adaptive massively parallel processor architectures. The simulator generator is part of a methodology for the systematic mapping, evaluation, and exploration of massively parallel processor architectures designed for special-purpose applications in embedded computing. The generation of high-speed cycle-accurate simulators is of utmost importance here, because they are used directly for debugging and evaluating parallel processor architectures as well as during time-consuming architecture/compiler co-exploration. We developed a modeling environment that automatically generates a C++ simulation model either from a graphical input or directly from an XML-based architecture description. Here, we focus on the underlying event-driven simulation model and present our modeling environment, in particular the features of the graphical parallel processor architecture editor and the automatic instruction-set-level simulator generator. Finally, in a case study, we demonstrate the pertinence of our approach by simulating different processor arrays and show that the generated simulators outperform existing simulators and simulator generation approaches.
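The abstract does not spell out the simulation kernel, so the following is only a minimal sketch of the general event-driven idea, written in Python rather than the generated C++. The `EventSimulator` and `ProcessingElement` classes, the fixed per-operation latency, and the two-stage pipeline are all hypothetical illustrations. The point is that work is scheduled only when data arrives, so the simulator jumps from event to event instead of evaluating every component in every cycle.

```python
import heapq
from typing import Callable, List, Optional, Tuple


class EventSimulator:
    """Minimal discrete-event kernel: actions fire in timestamp order."""

    def __init__(self) -> None:
        self.now = 0
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        self._seq = 0  # tie-breaker: equal timestamps fire in scheduling order

    def schedule(self, delay: int, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self, until: int) -> None:
        # Jump straight from one event to the next; idle cycles cost nothing.
        while self._queue and self._queue[0][0] <= until:
            time, _, action = heapq.heappop(self._queue)
            self.now = time
            action()


class ProcessingElement:
    """Hypothetical PE: applies one operation with a fixed latency."""

    def __init__(self, sim: EventSimulator, op: Callable[[int], int],
                 latency: int, downstream: Optional["ProcessingElement"] = None) -> None:
        self.sim = sim
        self.op = op
        self.latency = latency
        self.downstream = downstream
        self.trace: List[Tuple[int, int]] = []  # (completion time, result)

    def receive(self, value: int) -> None:
        # Work is scheduled only when data arrives, not polled every cycle.
        def complete() -> None:
            result = self.op(value)
            self.trace.append((self.sim.now, result))
            if self.downstream is not None:
                self.downstream.receive(result)
        self.sim.schedule(self.latency, complete)


# Usage: a two-PE pipeline, pe0 (x + 1, latency 1) feeding pe1 (x * 2, latency 2).
sim = EventSimulator()
pe1 = ProcessingElement(sim, lambda x: x * 2, latency=2)
pe0 = ProcessingElement(sim, lambda x: x + 1, latency=1, downstream=pe1)
for t, v in enumerate([3, 5]):
    sim.schedule(t, lambda v=v: pe0.receive(v))
sim.run(until=10)
print(pe1.trace)  # [(3, 8), (4, 12)]
```

The sequence counter in the heap entries keeps events with equal timestamps in the order they were scheduled and prevents Python from ever comparing the callables themselves.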
IEEE Embedded Systems Letters | 2011
Dmitrij Kissler; Daniel Gran; Zoran Salcic; Frank Hannig; Jürgen Teich
This letter presents a systematic approach to efficiently handling a very large number of power domains in modern coarse-grained reconfigurable arrays, in order to tightly match power consumption to the varying computational demands of the processed algorithms. It is based on a new, highly scalable, and generic power control network and additionally uses a state-of-the-art front-to-back-end design methodology based on the Common Power Format (CPF) for a fully automated implementation. The power management is transparent to the user and is seamlessly integrated into the overall reconfiguration process: reconfiguration-controlled power gating. Furthermore, for the first time, a coarse-grained reconfigurable case study design with as many as 24 switchable power domains is presented, together with detailed results on power savings and overheads. Applying the proposed technique yields a 60% reduction in active leakage power and a 90% reduction in standby leakage power for several digital signal processing algorithms.
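The letter implements its power control network in hardware; the sketch below only models the decision behind reconfiguration-controlled power gating in software, under simplifying assumptions. A configuration declares which power domains it occupies, every other domain is gated off, and gated domains keep a small residual leakage. The domain names (`pd0` to `pd23`), the uniform leakage value, and the 5% residual fraction are hypothetical and are not taken from the letter.

```python
from typing import Dict, Iterable, Set


def gating_plan(all_domains: Iterable[str], used: Set[str]) -> Dict[str, bool]:
    """Map each power domain to True (kept on) or False (gated off)."""
    domains = set(all_domains)
    unknown = used - domains
    if unknown:
        raise ValueError(f"configuration uses unknown domains: {sorted(unknown)}")
    return {d: d in used for d in sorted(domains)}


def leakage(plan: Dict[str, bool], leak_per_domain: Dict[str, float],
            residual_fraction: float) -> float:
    """Total leakage: full leakage for powered domains, a residual share for gated ones."""
    return sum(leak if plan[d] else leak * residual_fraction
               for d, leak in leak_per_domain.items())


# Usage with hypothetical numbers: 24 domains, uniform leakage of 1.0 (arbitrary units),
# 5% residual leakage when gated, and a configuration that occupies 12 domains.
domains = [f"pd{i}" for i in range(24)]
config_uses = {f"pd{i}" for i in range(12)}
plan = gating_plan(domains, config_uses)
total = leakage(plan, {d: 1.0 for d in domains}, residual_fraction=0.05)
print(f"leakage: {total:.2f} of {len(domains)} units")
```

Because the plan is derived from the configuration itself, switching configurations automatically changes which domains are powered, which mirrors the idea that power management is folded into the reconfiguration process rather than being a separate user task.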
Microprocessors and Microsystems | 2009
Hritam Dutta; Dmitrij Kissler; Frank Hannig; Alexey Kupriyanov; Jürgen Teich; Bernard Pottier
New standards in signal, multimedia, and network processing for embedded electronics are characterized by computationally intensive algorithms and by high flexibility requirements due to rapidly changing specifications. To meet the demanding challenges of increasing computational requirements and stringent area and power constraints in embedded engineering, there is a gradual trend towards coarse-grained parallel embedded processors. Furthermore, such processors are equipped with dynamic reconfiguration features that support time- and space-multiplexed execution of the algorithms. However, the formidable problem of efficiently mapping applications (mostly loop algorithms) onto such architectures has hindered their mass acceptance. In this paper we present (a) a highly parameterizable, tightly coupled, and reconfigurable parallel processor architecture, together with the corresponding power breakdown and reconfiguration time analysis of a case study application, (b) a retargetable methodology for mapping loop algorithms, (c) a co-design framework for modeling, simulating, and programming such architectures, and (d) loosely coupled communication with the host processor.
automation, robotics and control systems | 2007
Alexey Kupriyanov; Frank Hannig; Dmitrij Kissler; Jürgen Teich; Julien Lallet; Olivier Sentieys; Sébastien Pillement
In this paper, we present a new concept for modeling interconnection networks in massively parallel embedded processor architectures. The paper focuses on two concepts for defining reconfigurable interconnection networks, namely the interconnect wrapper and the DyRIBox. We compare the two concepts and formally prove their equivalence. Both concepts allow many different reconfigurable inter-processor networks to be modeled efficiently. Furthermore, we show how the interconnect can be defined using MAML, an architecture description language for massively parallel processor architectures. Finally, we demonstrate the pertinence of our approach by modeling and evaluating different reconfigurable interconnect topologies.
power and timing modeling optimization and simulation | 2009
Dmitrij Kissler; Andreas Strawetz; Frank Hannig; Jürgen Teich
Coarse-grained reconfigurable architectures deliver high performance and energy efficiency for computationally intensive applications such as mobile multimedia and wireless communication. This paper deals with power-efficient dynamic reconfiguration control techniques in such architectures. Proper clock domain partitioning with custom clock gating, combined with automatic clock gating, resulted in a 35% total power reduction. This is more than three times the reduction achieved by either clock gating technique applied separately. The corresponding case study application, with a power efficiency of 0.064 mW/MHz and 124 MOPS/mW, outperforms major coarse-grained and general-purpose embedded processor architectures by a factor of 1.7 to 28.
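The two efficiency figures are linked by simple unit arithmetic. Assuming both refer to the same operating point of the case study application, multiplying them gives the implied average number of operations per clock cycle. This is a back-of-the-envelope consistency check, not a figure reported in the paper.

```latex
\[
124\,\frac{\text{MOPS}}{\text{mW}} \times 0.064\,\frac{\text{mW}}{\text{MHz}}
= 7.936\,\frac{\text{MOPS}}{\text{MHz}}
= 7.936\,\frac{10^{6}\ \text{operations/s}}{10^{6}\ \text{cycles/s}}
\approx 7.9\ \text{operations per cycle}
\]
```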
Journal of Low Power Electronics | 2011
Dmitrij Kissler; Frank Hannig; Jürgen Teich
The presented evaluation framework allows extremely fast yet accurate power, area, and latency characterization of different design alternatives in the multidimensional design space of highly parameterized coarse-grained reconfigurable processor arrays. For the first time, we propose using a relational database system to manage table-based, probabilistic macro-models, constructed with the help of a new non-uniform parameter sampling technique, for average power estimation of the corresponding processor arrays at the architectural level. This yields power estimates within milliseconds and within 10% estimation error compared to a state-of-the-art commercial gate-level post-layout power estimator. Furthermore, our approach fully accounts for important power reduction techniques such as clock gating and operand isolation, which are otherwise commonly ignored. Feasibility and accuracy were tested in several case study implementations in a commercial 90 nm standard cell library. Experimental results show the superior scalability of the proposed technique: a heterogeneous 100-core coarse-grained processor array with a circuit complexity of ≈0.5 × 10⁶ logic gates, implementing a signal processing algorithm, can be analyzed for power and area in less than a minute on a standard consumer PC. Since no architecture-level power/area estimation framework for coarse-grained, software-programmable architectures has been published so far, our work aims to address this shortcoming.
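The general pattern of a table-based macro-model kept in a relational database can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the framework from the paper: each component's average power is assumed to depend on a single hypothetical switching-activity parameter, the sample points and mW values are invented, and estimates between samples use plain linear interpolation. The samples are deliberately non-uniform (denser where the curve bends), echoing the idea of non-uniform parameter sampling.

```python
import sqlite3
from typing import Iterable, Tuple

Sample = Tuple[str, float, float]  # (component, activity, average power in mW)


def build_model_db(samples: Iterable[Sample]) -> sqlite3.Connection:
    """Store sampled macro-model points in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE macro_model ("
        " component TEXT, activity REAL, power_mw REAL,"
        " PRIMARY KEY (component, activity))"
    )
    conn.executemany("INSERT INTO macro_model VALUES (?, ?, ?)", samples)
    return conn


def estimate_power(conn: sqlite3.Connection, component: str, activity: float) -> float:
    """Linearly interpolate between the nearest sampled points around `activity`."""
    lo = conn.execute(
        "SELECT activity, power_mw FROM macro_model WHERE component = ? AND activity <= ?"
        " ORDER BY activity DESC LIMIT 1", (component, activity)).fetchone()
    hi = conn.execute(
        "SELECT activity, power_mw FROM macro_model WHERE component = ? AND activity >= ?"
        " ORDER BY activity ASC LIMIT 1", (component, activity)).fetchone()
    if lo is None or hi is None:
        raise ValueError(f"{component}: activity {activity} outside sampled range")
    (a0, p0), (a1, p1) = lo, hi
    if a1 == a0:  # exact hit on a sample point
        return p0
    return p0 + (p1 - p0) * (activity - a0) / (a1 - a0)


def array_power(conn: sqlite3.Connection, instances: Iterable[Tuple[str, float]]) -> float:
    """Sum per-component estimates over all component instances of the array."""
    return sum(estimate_power(conn, comp, act) for comp, act in instances)


# Hypothetical samples: denser at low activity, where the adder curve bends most.
conn = build_model_db([
    ("adder", 0.0, 0.10), ("adder", 0.1, 0.30), ("adder", 0.2, 0.45), ("adder", 1.0, 1.00),
    ("mul", 0.0, 0.20), ("mul", 0.5, 1.20), ("mul", 1.0, 2.00),
])
total_mw = array_power(conn, [("adder", 0.15), ("mul", 0.75)])
print(f"{total_mw:.3f} mW")
```

Each estimate costs two indexed lookups and a few arithmetic operations, which is why table-based macro-models can be orders of magnitude faster than gate-level estimation once the tables have been characterized.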
Processor Description Languages: Applications and Methodologies | 2008
Alexey Kupriyanov; Frank Hannig; Dmitrij Kissler; Jürgen Teich
This chapter focuses on the machine markup language (MAML), an architecture description language (ADL) used for modeling and simulating both single-processor and multiprocessor architectures. MAML is based on XML, which allows the resources of complex processor architectures to be characterized conveniently at both the structural and the behavioral level. MAML has its roots in the design of application-specific instruction set processors. A MAML description contains a clearly arranged list of the architecture's resources (functional units, pipeline stages, and register files), operation sets (binding possibilities of operations to functional units, operand directions), communication structures (buses and ports), and timing behavior (operation latencies and the behavior of multicycle operations). The extracted parameters are used for fast, interactive, cycle-accurate simulation and for compiler retargeting. Finally, the processor architecture described in MAML is automatically synthesized for rapid prototyping.
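How parameters can be extracted from an XML-based architecture description is sketched below with Python's standard `xml.etree.ElementTree`. The element and attribute names (`processor`, `functionalUnit`, `registerFile`, `operation`, `unit`, `latency`) are hypothetical and do not reproduce the real MAML schema; the sketch only shows the general idea of reading resources, operation-to-unit bindings, and latencies, and checking that every operation is bound to a declared functional unit.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical, simplified MAML-like description (element names are illustrative,
# not the real MAML schema).
DESCRIPTION = """
<processor name="pe_example">
  <resources>
    <functionalUnit name="adder" count="2"/>
    <functionalUnit name="multiplier" count="1"/>
    <registerFile name="rd" size="16"/>
  </resources>
  <operations>
    <operation name="ADD" unit="adder" latency="1"/>
    <operation name="MUL" unit="multiplier" latency="2"/>
  </operations>
</processor>
"""


def extract_parameters(xml_text: str):
    """Return (functional units, register files, operation bindings) from the description."""
    root = ET.fromstring(xml_text.strip())
    units: Dict[str, int] = {fu.get("name"): int(fu.get("count"))
                             for fu in root.iter("functionalUnit")}
    regs: Dict[str, int] = {rf.get("name"): int(rf.get("size"))
                            for rf in root.iter("registerFile")}
    ops: Dict[str, Tuple[str, int]] = {}
    for op in root.iter("operation"):
        unit = op.get("unit")
        # Reject operations bound to a unit that the resources section does not declare.
        if unit not in units:
            raise ValueError(f"operation {op.get('name')} bound to undeclared unit {unit}")
        ops[op.get("name")] = (unit, int(op.get("latency")))
    return units, regs, ops


units, regs, ops = extract_parameters(DESCRIPTION)
print(ops["MUL"])  # ('multiplier', 2)
```

Parameter tables like `ops` are the kind of information a cycle-accurate simulator or a retargetable compiler would consume; in this sketch they are simply returned as dictionaries.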
international symposium on system-on-chip | 2006
Dmitrij Kissler; Frank Hannig; Alexey Kupriyanov; Jürgen Teich
Growing complexity and speed requirements in modern application areas such as wireless communication and multimedia in embedded devices demand flexible and efficient parallel hardware architectures. The inherent parallelism in these application fields has to be reflected at the hardware level to achieve high performance. Coarse-grained reconfigurable architectures support a high degree of parallelism at multiple levels. In this paper, a technology-independent hardware cost analysis is performed for a new class of highly parameterizable coarse-grained reconfigurable architectures called weakly programmable processor arrays.
field-programmable logic and applications | 2008
Sven Eisenhardt; Thomas Schweizer; J.A. de Oliveira Filho; Tobias Oppold; Wolfgang Rosenstiel; Alexander Thomas; Jürgen Becker; Frank Hannig; Dmitrij Kissler; Hritam Dutta; Juergen Teich; Heiko Hinkelmann; Peter Zipf; Manfred Glesner
In recent years, aside from fine-grained reconfigurable architectures such as FPGAs, coarse-grained reconfigurable architectures (CGRAs), which typically have building blocks of a fixed bit width (8 bit, 16 bit, etc.), have gained importance in academia as well as in industry. CGRAs are usually used for domain-specific computations and have advantages over traditional FPGAs in terms of area and power cost, performance, and reconfiguration time. Architectures with coarse-grained reconfiguration features have therefore also been studied in projects (Sec. 1, 2, 4) within the priority program Reconfigurable Computing Systems and in the project CoMap (Sec. 3), all of which are sponsored by the German Research Foundation (DFG).