Maximilian Odendahl | Researchain

Publication

Featured research published by Maximilian Odendahl.

parallel computing | 2014

A compiler infrastructure for embedded heterogeneous MPSoCs

Weihua Sheng; Stefan Schürmans; Maximilian Odendahl; Mark Bertsch; Vitaliy Volevach; Rainer Leupers; Gerd Ascheid

Programming heterogeneous MPSoCs (Multi-Processor Systems on Chip) is a grand challenge for embedded SoC providers and users today. In this paper, we argue the need for and significance of positioning the language and tool design from the perspective of practicality to address this challenge. We motivate, describe and justify such a practical design of a compilation framework for heterogeneous MPSoCs targeting the domain of streaming applications, named MAPS (MPSoC Application Programming Studio). MAPS defines a clean, light-weight C language extension to capture streaming programming models. A retargetable source-to-source compiler is developed to provide key capabilities to construct practical compilation frameworks for real-world, complex MPSoC platforms. Our results have shown that MAPS is a promising compiler infrastructure that enables programming of heterogeneous MPSoCs and increases productivity of MPSoC software developers.

international symposium on system-on-chip | 2011

Automatic calibration of streaming applications for software mapping exploration

Weihua Sheng; Stefan Schürmans; Maximilian Odendahl; Rainer Leupers; Gerd Ascheid

This article investigates how to construct fast and accurate MPSoC virtual platforms to enable software mapping exploration. The proposed framework can fully automate the calibration of abstract MPSoC virtual platforms for mapping streaming applications.

2012 5th European DSP Education and Research Conference (EDERC) | 2012

Automated code generation of streaming applications for C6000 multicore DSPs

Maximilian Odendahl; Weihua Sheng; Miguel Angel Aguilar; Rainer Leupers; Gerd Ascheid

With the increasing complexity of modern, state-of-the-art Multiprocessor Systems on Chip (MPSoCs), recent trends in embedded software design show a rising interest in using dataflow models of computation for parallel programming. These models of computation not only match the requirements of streaming applications found in the telecommunication, wireless and multimedia domains, but also provide an easier entry path to dealing with parallelism by reusing large parts of the sequential programming paradigm. As a consequence, it is highly desirable to have an automated approach for bridging the gap between an application's specification using a dataflow model of computation and the actual binary which is executed on the hardware. In this paper, we present a toolflow which completely automates the process of code generation for Texas Instruments C6000 high-performance multicore Digital Signal Processor (DSP) platforms. The mapping of individual processes to specific cores is specified in a user-defined mapping file. This automated code generation for streaming applications opens up a wide range of possibilities as a research and educational platform.

design, automation, and test in europe | 2014

Optimized buffer allocation in multicore platforms

Maximilian Odendahl; Andrés Goens; Rainer Leupers; Gerd Ascheid; Benjamin Ries; Berthold Vöcking; Tomas Henriksson

With the availability of advanced MPSoCs and emerging Dynamic RAM (DRAM) interface technologies, an optimal allocation of logical data buffers to physical memory can no longer be handled manually due to the huge design space. An allocation not only needs to decide between on- and off-chip memory, but also needs to take an increasing number of available memory channels, different bandwidth capacities and several routing possibilities into account. We formalize this problem and introduce a Mixed Integer Linear Programming (MILP) model based on two different optimization criteria. We implement the MILP model in a retargetable tool and present a case study with representative data of the Long-Term Evolution (LTE) standard to show the real-life applicability of our approach.
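
As a rough illustration of the allocation problem described in this abstract, the sketch below assigns logical buffers to memories so that a total access cost is minimized without exceeding any memory capacity. It is not the MILP model from the paper: it uses exhaustive search on a toy instance, ignores memory channels, bandwidth and routing, and all names (`allocate`, the buffer and memory tuples, the cost figures) are assumptions made for this example.

```python
from itertools import product


def allocate(buffers, memories):
    """Toy buffer-to-memory allocation by exhaustive search (illustrative only).

    buffers:  {buffer_name: (size, access_count)}
    memories: {memory_name: (capacity, cost_per_access)}
    Returns (assignment, total_cost), or (None, None) if nothing fits.
    """
    names = list(buffers)
    mems = list(memories)
    best_cost, best = None, None
    # Enumerate every possible assignment of buffers to memories.
    for choice in product(mems, repeat=len(names)):
        used = {m: 0 for m in mems}
        cost = 0
        for b, m in zip(names, choice):
            size, accesses = buffers[b]
            used[m] += size
            cost += accesses * memories[m][1]
        # Discard assignments that overflow any memory.
        if any(used[m] > memories[m][0] for m in mems):
            continue
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, dict(zip(names, choice))
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical instance: a small, cheap on-chip SRAM and a large, costly DRAM.
    buffers = {"a": (4, 100), "b": (4, 10), "c": (2, 50)}
    memories = {"sram": (6, 1), "dram": (100, 10)}
    print(allocate(buffers, memories))
```

The search space grows as the number of memories raised to the number of buffers, which is why enumeration like this only works for tiny instances and a solver-based formulation becomes necessary at realistic sizes.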

international symposium on system-on-chip | 2013

Split-cost communication model for improved MPSoC application mapping

Maximilian Odendahl; Jeronimo Castrillon; Vitaliy Volevach; Rainer Leupers; Gerd Ascheid

Automated mapping of dataflow applications to state-of-the-art, heterogeneous Multiprocessor Systems on Chip (MPSoCs) with complex interconnects and communication means is an ongoing research endeavor. We implement, measure and analyze three different communication libraries for a representative, off-the-shelf platform of this kind. The results of the analysis are used to show the need for a new cost model to properly characterize inter-task communication. Afterwards, this paper presents an algorithm to solve the mapping problem jointly for computation and communication using this cost model. A case study with four real streaming applications shows that the obtained mapping is able to reduce the execution time. Compared to a mapping decision where all channels are mapped to shared memory, the makespan was reduced by up to 10% due to an automated selection of a more appropriate communication library.

Journal of Systems Architecture | 2016

An optimal allocation of memory buffers for complex multicore platforms

Andrés Goens; Jeronimo Castrillon; Maximilian Odendahl; Rainer Leupers

In deeply embedded heterogeneous multicores the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, rendering manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints. In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating computational tasks that access the buffers within well-specified time intervals. In addition, we use an architecture model that makes the allocation aware of the topology of the multicore and the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem as well as related subproblems, using mixed-integer linear programming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that found an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.

international parallel and distributed processing symposium | 2015

Buffer Allocation Based On-Chip Memory Optimization for Many-Core Platforms

Maximilian Odendahl; Andrés Goens; Rainer Leupers; Gerd Ascheid; Tomas Henriksson

The problem of finding an optimal allocation of logical data buffers to memory has emerged as a new research challenge due to the increased complexity of applications and newly emerging Dynamic RAM (DRAM) interface technologies. This new opportunity of a large off-chip memory accessible with ample bandwidth makes it possible to reduce the on-chip Static RAM (SRAM) significantly and save production cost of future manycore platforms. We thus propose changes to an existing work that make it possible to uniformly reduce the on-chip memory size for a given application. We additionally introduce a novel linear programming model to automatically generate all necessary on-chip memory sizes for a given application based on an optimal allocation of data buffers. An extension further reduces the required on-chip memory in multi-application scenarios. We conduct a case study to validate all our models and show the applicability of our approach.

high performance computing and communications | 2015

Extraction of Kahn Process Networks from While Loops in Embedded Software

Miguel Angel Aguilar; Juan Fernando Eusse; Rainer Leupers; Gerd Ascheid; Maximilian Odendahl

Many embedded applications such as multimedia, signal processing and wireless communications exhibit streaming processing behavior. In order to take full advantage of modern multi- and many-core embedded platforms, these applications have to be parallelized by describing them in a given parallel Model of Computation (MoC). One of the most prominent MoCs is the Kahn Process Network (KPN), as it allows multiple forms of parallelism to be expressed and is suitable for efficient mapping and scheduling onto parallel embedded platforms. However, describing streaming applications manually as a KPN is a challenging task, especially since they spend most of their execution time in loops with an unbounded number of iterations. These loops are in several cases implemented as while loops, which are difficult to analyze. In this paper, we present an approach to guide the derivation of KPNs from embedded streaming applications dominated by multiple types of while loops. We evaluate the applicability of our approach on a commercial embedded platform with eight DSP cores using realistic benchmarks. Results measured on the platform showed that we are able to speed up sequential benchmarks on average by a factor of up to 4.3x and in the best case by up to 7.7x. Additionally, to evaluate the effectiveness of our approach, we compared it against a state-of-the-art parallelization framework.
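
For readers unfamiliar with the model, the following minimal sketch shows what a Kahn Process Network looks like in practice: independent processes that communicate only through FIFO channels with blocking reads, each built around a while loop whose iteration count depends on the input stream. It does not reproduce the extraction approach of the paper; the process names, the `END` sentinel convention and the use of Python threads are assumptions chosen for illustration.

```python
import queue
import threading

END = object()  # sentinel marking end of stream (an illustrative convention)


def producer(out_ch, samples):
    """Source process: emits tokens until the input is exhausted."""
    it = iter(samples)
    while True:
        try:
            x = next(it)
        except StopIteration:
            break
        out_ch.put(x)
    out_ch.put(END)


def scale(in_ch, out_ch, gain):
    """Worker process: blocking read, compute, write."""
    while True:
        x = in_ch.get()  # blocks until a token is available
        if x is END:
            break
        out_ch.put(gain * x)
    out_ch.put(END)


def collect(in_ch, result):
    """Sink process: gathers tokens in arrival order."""
    while True:
        x = in_ch.get()
        if x is END:
            break
        result.append(x)


def run_pipeline(samples, gain=2):
    # Unbounded FIFO channels connect the three processes.
    a, b = queue.Queue(), queue.Queue()
    result = []
    procs = [
        threading.Thread(target=producer, args=(a, samples)),
        threading.Thread(target=scale, args=(a, b, gain)),
        threading.Thread(target=collect, args=(b, result)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return result


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Because every process reads from its input channel with a blocking `get` and never tests a channel for emptiness, the output does not depend on how the threads are scheduled, which is the determinism property that makes KPNs attractive for mapping onto parallel platforms.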

APPLEPIES 2013 | 2014

A New Space Digital Signal Processor Design

Massimiliano Donati; Sergio Saponara; Luca Fanucci; Walter Errico; Annamaria Colonna; Giuseppe Piscopiello; Giovanni Tuccio; Franco Bigongiari; Maximilian Odendahl; Rainer Leupers; Antonio Spada; Vincenzo Pii; Elena Cordiviola; Francesco Nuzzolo; Frederic Reiter

The increasing demand for on-board real-time processing represents one of the critical issues in forthcoming scientific and commercial European space missions. Ever faster signal and image processing algorithms are required to accomplish planetary observation, surveillance, Synthetic Aperture Radar imaging and telecommunications, especially due to the importance of processing the sensing data before sending them to Earth, in order to exploit the bandwidth to the ground station effectively. The only available space-qualified Digital Signal Processor (DSP) free of International Traffic in Arms Regulations restrictions (ATMEL TSC21020) offers a peak performance of only 60 MFLOPS, and it is becoming inadequate to fulfill the computation demand of space missions. For this reason, the need to develop a new generation of space-qualified DSPs is well known in the European space community. The space-qualified DSP architecture proposed in this work fills the gap between the computational requirements and the available devices. Additionally, it has been implemented using technologies available in Europe without any restriction. The DSP processor leverages a pipelined and massively parallel core based on the Very Long Instruction Word paradigm, with 64 registers and 8 operational units. The rest of the System-on-Chip architecture consists of the instruction and data cache memories, the memory controllers and two SpaceWire interfaces. The processor, implemented in CMOS 65 nm technology, reaches an operational frequency of 120 MHz and an area occupation of around 350 Kgates. The associated Software Development Environment (SDE), with compiler, assembler, linker, debugger and instruction-level simulator, allows for easy programming of the device in the C language.

ieee aess european conference on satellite telecommunications | 2012

A next generation digital signal processor for European space missions

Maximilian Odendahl; Sergey Yakoushkin; Rainer Leupers; Walter Errico; Massimiliano Donati; Luca Fanucci

Future European space missions are in critical need of a new digital signal processor (DSP) free of any International Traffic in Arms Regulations (ITAR) restrictions. We present a new, highly parallel, 8-slot Very Long Instruction Word (VLIW) DSP modeled in LISA, a high-level architecture description language. The necessary software development tools as well as a synthesizable hardware model are generated automatically from this abstract model, significantly improving productivity and stability. To the best of our knowledge, it is the first processor intended for use in space that is modeled at a higher abstraction level than RTL. The synthesis of the generated VHDL code using a 180 nm CMOS standard cell library results in an area of 380 kGates and shows an aggregated peak performance of 1.0 GOPS and 750 MFLOPS, outperforming the only other available option, Atmel's TSC21020F processor, by an order of magnitude.
