Rainer Leupers
RWTH Aachen University
Publication
Featured research published by Rainer Leupers.
design automation conference | 2002
Achim Nohl; Gunnar Braun; Oliver Schliebusch; Rainer Leupers; Heinrich Meyr; Andreas Hoffmann
Today, designers of next-generation embedded processors and software are increasingly faced with short product lifetimes. The resulting time-to-market constraints conflict with continually growing processor complexity. Nevertheless, extensive design-space exploration and product verification are indispensable for a successful market launch. In the last decade, instruction-set simulators have become an essential development tool for the design of new programmable architectures. Consequently, simulator performance is a key factor in overall design efficiency. Motivated by the extremely poor performance of commonly used interpretive simulators, research on fast compiled instruction-set simulation started ten years ago. However, due to the restrictiveness of the compiled technique, it has not been able to establish itself in commercial products. In this paper, we build on our previous research on retargetable compiled simulation techniques and discuss their benefits and limitations, using a particular compiled scheme, static scheduling, as an example. We then present a novel retargetable simulation technique that combines the performance of traditional compiled simulators with the flexibility of interpretive simulation. This technique is not limited to any class of architectures or applications and can be used from architecture exploration up to end-user software development. We demonstrate the workflow and applicability of the so-called just-in-time cache-compiled simulation technique by means of state-of-the-art real-world architectures.
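The abstract above names the technique but does not describe its mechanics. As a rough illustration of the general idea of caching decoded instructions at run time, the following minimal sketch decodes each instruction only on first execution and reuses the result afterwards. The toy two-instruction ISA, the tuple encoding, and every name in it (`decode`, `run`, `decode_count`) are hypothetical and are not taken from the paper or its tools.

```python
# Toy illustration only: the two-instruction ISA, the encoding, and all names
# below are hypothetical and not taken from the paper.
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]            # register file plus program counter
Word = Tuple[str, str, int]       # (opcode, register, immediate)
Op = Callable[[State], None]      # a decoded, directly executable instruction

decode_count = 0                  # counts how often the slow decode path runs


def decode(word: Word) -> Op:
    """Slow path: turn an instruction word into an executable closure."""
    global decode_count
    decode_count += 1
    opcode, reg, imm = word
    if opcode == "addi":          # reg += imm, then fall through
        def op(s: State) -> None:
            s[reg] += imm
            s["pc"] += 1
    elif opcode == "bnez":        # branch to address imm if reg != 0
        def op(s: State) -> None:
            s["pc"] = imm if s[reg] != 0 else s["pc"] + 1
    else:
        raise ValueError(f"unknown opcode {opcode!r}")
    return op


def run(program: List[Word], state: State) -> State:
    """Execute until the program counter leaves the program."""
    cache: Dict[int, Tuple[Word, Op]] = {}
    while 0 <= state["pc"] < len(program):
        pc = state["pc"]
        word = program[pc]
        entry = cache.get(pc)
        # Re-decode if the address is new or the stored word has changed
        # (for example, because the program modified its own code).
        if entry is None or entry[0] != word:
            entry = (word, decode(word))
            cache[pc] = entry
        entry[1](state)
    return state


# A loop that counts r0 down from 5 to 0 and adds 2 to r1 on every pass.
program = [("addi", "r1", 2), ("addi", "r0", -1), ("bnez", "r0", 0)]
final = run(program, {"pc": 0, "r0": 5, "r1": 0})
```

In this sketch the loop body executes fifteen instructions but the decoder runs only three times, while the word comparison keeps the interpreter-like ability to cope with changed program memory.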
international conference on computer aided design | 1996
Rainer Leupers; Peter Marwedel
This paper presents DSP code optimization techniques, which originate from dedicated memory address generation hardware. We define a generic model of DSP address generation units. Based on this model, we present efficient heuristics for computing memory layouts for program variables, which optimize utilization of parallel address generation units. Improvements and generalizations of previous work are described, and the efficacy of the proposed algorithms is demonstrated through experimental evaluation.
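To make the link between memory layout and address generation concrete, here is a minimal sketch under a strong simplifying assumption: a single address register that can step by at most one memory word for free, with every larger jump costing one extra instruction. The abstract does not spell out the paper's generic model or its heuristics, so the cost model, the exhaustive search, and the names `addressing_cost` and `best_layout` are all illustrative assumptions.

```python
# Illustrative assumption: one address register with free +/-1 stepping.
# This is not the paper's generic AGU model or one of its heuristics.
from itertools import permutations
from typing import Dict, List, Tuple


def addressing_cost(layout: Dict[str, int], accesses: List[str],
                    max_step: int = 1) -> int:
    """Count consecutive accesses whose address distance exceeds max_step."""
    return sum(
        1
        for a, b in zip(accesses, accesses[1:])
        if abs(layout[a] - layout[b]) > max_step
    )


def best_layout(accesses: List[str]) -> Tuple[int, Dict[str, int]]:
    """Exhaustively search all layouts; only feasible for a few variables."""
    variables = sorted(set(accesses))
    best_cost, best = None, None
    for perm in permutations(variables):
        layout = {v: i for i, v in enumerate(perm)}
        cost = addressing_cost(layout, accesses)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, layout
    return best_cost, best


accesses = ["a", "b", "c", "a", "d", "a", "c", "b"]
naive = {"a": 0, "b": 1, "c": 2, "d": 3}      # declaration-order layout
naive_cost = addressing_cost(naive, accesses)
optimal_cost, optimal = best_layout(accesses)
```

For this access sequence the declaration-order layout needs four explicit address updates, whereas the best layout found by the search needs one. Exhaustive search grows factorially with the number of variables, which is why heuristics are used in practice.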
Archive | 2002
Andreas Hoffmann; Heinrich Meyr; Rainer Leupers
Foreword. Preface. 1: Introduction. 1. Processor Categories. 2. Advent of ASIPs in System-on-Chip Design. 3. Organization of this Book. 2: Traditional ASIP Design Methodology. 1. Related Work. 2. Motivation of this Work. 3: Processor Models for ASIP Design. 1. LISA Language. 2. Model Requirements of Tools. 3. Abstraction Levels. 4. Concluding Remarks. 4: LISA Processor Design Platform. 1. Hardware Designer Platform. 2. Software Designer Platform. 3. System Integrator Platform. 4. Concluding Remarks. 5: Architecture Exploration. 1. From Specification to Implementation. 2. Architecture Exploration Using LISA. 3. Concluding Remarks. 6: Architecture Implementation. 1. The ICORE Architecture. 2. Architecture Generation from LISA. 3. Case Study. 4. Concluding Remarks. 7: Software Tools for Application Design. 1. Code Generation Tools. 2. Simulation. 3. Debugging. 4. Case Studies. 5. Concluding Remarks. 8: System Integration and Verification. 1. Platform-Based Design. 2. Enabling Platform-Based Design. 3. Software Simulator Integration. 4. Case Study: CoCentric System Studio. 5. Concluding Remarks. 9: Summary and Outlook. 1. Processor Modeling. 2. Architecture Exploration. 3. Software Development Tools. 4. Architecture Implementation. 5. Concluding Remarks. Appendices: Abbreviations. Grammar of the LISA Language. Sample ARM7 LISA Model. The ICORE Architecture. List of Figures. List of Examples. List of Tables. Bibliography. About the Authors.
design automation conference | 2008
Jianjiang Ceng; Jeronimo Castrillon; Weihua Sheng; Hanno Scharwächter; Rainer Leupers; Gerd Ascheid; Heinrich Meyr; Tsuyoshi Isshiki; Hiroaki Kunieda
In the past few years, MPSoCs have become the most popular solution for embedded computing. However, the challenge of programming MPSoCs is the biggest side effect of this solution. Tool support is mostly unsatisfactory, especially when designers have to deal with legacy C code accumulated over the years. In this paper, we propose an integrated framework, MAPS, which aims at parallelizing C applications for MPSoC platforms. It extracts coarse-grained parallelism on a novel granularity level. A set of tools has been developed for the framework. We introduce the major components and their functionalities. Two case studies demonstrate the use of MAPS on two different kinds of applications. In both cases the proposed framework helps the programmer to extract parallelism efficiently.
Archive | 1997
Rainer Leupers
Foreword. Preface. 1. Introduction. 2. Processor Modelling. 3. Instruction-Set Extraction. 4. Code Generation. 5. Instruction-Level Parallelism. 6. The Record Compiler. 7. Conclusions. References. Index.
design, automation, and test in europe | 2005
Torsten Kempf; Malte Doerper; Rainer Leupers; Gerd Ascheid; Heinrich Meyr; Tim Kogel; Bart Vanthournout
Heterogeneous multi-processor SoC (MP-SoC) platforms bear the potential to optimize conflicting performance, flexibility and energy efficiency constraints as imposed by demanding signal processing and networking applications. However, in order to take advantage of the available processing and communication resources, an optimal mapping of the application tasks onto the platform resources is of crucial importance. We propose a SystemC-based simulation framework, which enables the quantitative evaluation of application-to-platform mappings by means of an executable performance model. The key element of our approach is a configurable event-driven virtual processing unit to capture the timing behavior of multi-processor/multi-threaded MP-SoC platforms. The framework features an XML-based declarative construction mechanism of the performance model to significantly accelerate navigation in large design spaces. The capabilities of the proposed framework in terms of design space exploration are presented by a case study of a commercially available MP-SoC platform for networking applications. Focusing on the application-to-architecture mapping, the framework highlights the optimization potential of an efficient design space exploration environment.
design, automation, and test in europe | 2006
Torsten Kempf; Kingshuk Karuri; Stefan Wallentowitz; Gerd Ascheid; Rainer Leupers; Heinrich Meyr
The increasing demand for high performance in embedded applications under shortening time-to-market has recently prompted system architects to opt for multi-processor systems-on-chip (MP-SoCs) employing several programmable devices. The programmable cores provide a high degree of flexibility and reusability, and can also be optimized to the requirements of the application to deliver high performance. Since application software forms the basis of such designs, the need to tune the underlying SoC architecture for extracting maximum performance from the software code has become imperative. In this paper, we propose a framework that enables software development, verification and evaluation from the very beginning of the MP-SoC design cycle. Unlike traditional SoC design flows where software design starts only after the initial SoC architecture is ready, our framework allows co-development of the hardware and the software components in a tightly coupled loop where the hardware can be refined stepwise by considering the requirements of the software. The key element of this framework is the integration of a fine-grained software instrumentation tool into a system-level-design (SLD) environment to obtain accurate software performance and memory access statistics. The accuracy of such statistics is comparable to that obtained through instruction set simulation (ISS), while the execution speed of the instrumented software is almost an order of magnitude faster than ISS. Such a combined design approach helps system architects optimize both the hardware and the software through fast exploration cycles, and can result in far shorter design cycles and high productivity. We demonstrate the generality and the efficiency of our methodology with two case studies selected from two of the most prominent and computationally intensive embedded application domains.
international conference on parallel architectures and compilation techniques | 2000
Rainer Leupers
Recent digital signal processors (DSPs) show a homogeneous VLIW-like data path architecture, which allows C compilers to generate efficient code. However, some special restrictions still have to be obeyed in code generation for VLIW DSPs. In order to reduce the number of register file ports needed to provide data for multiple functional units working in parallel, the DSP data path may be clustered into several sub-paths, with very limited capabilities of exchanging values between the different clusters. An example is the well-known Texas Instruments C6201 DSP. For such an architecture, the tasks of scheduling and partitioning instructions between the clusters are highly interdependent. This paper presents a new instruction scheduling approach, which, in contrast to earlier work, integrates partitioning and scheduling into a single technique, so as to achieve high code quality. We show experimentally that the proposed technique is capable of generating more efficient code than a commercial code generator for the TI C6201.
european design and test conference | 1997
Rainer Leupers; Peter Marwedel
Besides high code quality, a primary issue in embedded code generation is retargetability of code generators. This paper presents techniques for automatic generation of code selectors from externally specified processor models. In contrast to previous work, our retargetable compiler RECORD does not require tool-specific modelling formalisms, but starts from general HDL processor models. From an HDL model, all processor aspects needed for code generation are automatically derived. As demonstrated by experimental results, short turnaround times for retargeting are achieved, which permits study of the HW/SW trade-off between processor architectures and program execution speed.
Archive | 2013
Shuvra S. Bhattacharyya; Ed F. Deprettere; Rainer Leupers; Jarmo Takala
Handbook of Signal Processing Systems is organized in three parts. The first part motivates representative applications that drive and apply state-of-the-art methods for design and implementation of signal processing systems; the second part discusses architectures for implementing these applications; the third part focuses on compilers and simulation tools, and describes models of computation and their associated design tools and methodologies. This handbook is an essential tool for professionals in many fields and researchers of all levels.