Oliver Arnold
Dresden University of Technology
Publication
Featured research published by Oliver Arnold.
international solid-state circuits conference | 2014
Benedikt Noethen; Oliver Arnold; Esther P. Adeva; Tobias Seifert; Erik Fischer; Steffen Kunze; Emil Matus; Gerhard P. Fettweis; Holger Eisenreich; Georg Ellguth; Stephan Hartmann; Sebastian Höppner; Stefan Schiefer; Jens-Uwe Schlüßler; Stefan Scholze; Dennis Walter; René Schüffny
Modern mobile communication systems face conflicting design constraints. On the one hand, the expanding variety of transmission modes calls for highly flexible solutions supporting the ever-growing number and diversity of application requirements. On the other hand, stringent power restrictions (e.g., at femto base stations and terminals) must be considered, while satisfying the demanding performance requirements. In order to cope with these issues, existing SDR platforms, e.g. [1-2], propose an MPSoC with a heterogeneous array of processing elements (PEs). MPSoC solutions provide programmability and parallelism yielding flexibility, processing performance and power efficiency. To schedule the resources and to apply power gating, a static approach is employed. In contrast, we present a heterogeneous MPSoC platform (Tomahawk2) with runtime scheduling and fine-grained hierarchical power management. This solution can fully adapt to the dynamically varying workload and semi-deterministic behavior in modern concurrent wireless applications. The proposed dynamic scheduler (CoreManager, CM) can be implemented either in software on a general-purpose processor or on a dedicated application-specific hardware unit. It is evident that the software approach offers the highest degree of flexibility; however, it may become a performance-bottleneck for complex applications. A high-throughput ASIC was presented in [3], but this solution does not permit scheduling algorithms to be adjusted. In this work, these limitations are overcome by implementing the CM on an ASIP.
ACM Transactions on Embedded Computing Systems | 2014
Oliver Arnold; Emil Matus; Benedikt Noethen; Markus Winter; Torsten Limberg; Gerhard P. Fettweis
Heterogeneity and parallelism in MPSoCs for 4G (and beyond) communications signal processing are inevitable in order to meet stringent power constraints and performance requirements. The question arises on how to cope with the problem of system programmability and runtime management incurred by the statically or even dynamically varying number and type of processing elements. This work addresses this challenge by proposing the concept of a heterogeneous many-core platform called Tomahawk. Apart from the definition of the system architecture, in this approach a unified framework including a model of computation, a programming interface and a dedicated runtime management unit called CoreManager is proposed. The increase of system complexity in terms of application parallelism and number of resources may lead to a dramatic increase of the management costs, hence causing performance degradation. For this reason, the efficient implementation of the CoreManager becomes a major issue in system design. This work compares the performance and capabilities of various CoreManager HW/SW solutions, based on ASIC, RISC and ASIP paradigms. The results demonstrate that the proposed ASIP-based solution approaches the performance of the ASIC realization, while preserving the full flexibility of the software (RISC-based) implementation.
international conference on embedded computer systems: architectures, modeling, and simulation | 2011
Oliver Arnold; Gerhard P. Fettweis
This paper analyzes the impact of dynamic task scheduling, processing element allocation and data transfer management on system performance of heterogeneous MPSoCs. Therefore, all parts of a runtime scheduling unit are analyzed. Bottlenecks are identified and their complexity evaluated. Furthermore, traced information is processed with two newly introduced tools. The first one generates an annotated SDF3 file. The second one creates a static schedule which applies the same scheduling and allocation decision as the dynamic scheduler. The execution of the static schedule reduces the burden of task management to a minimum. The resulting static execution is compared with the execution of the dynamic schedule. Hence, runtime overhead of dynamic scheduling is unveiled.
international conference on embedded computer systems: architectures, modeling, and simulation | 2010
Oliver Arnold; Gerhard P. Fettweis
A new heterogeneous multiprocessor system with dynamic memory and power management for improved performance and power consumption is presented. Increased data locality is automatically revealed, leading to enhanced memory access capabilities. Several applications can run in parallel, sharing processing elements, memories, and the interconnection network. Real-time constraints are taken into account by prioritizing processing element allocation, scheduling, and data transfers. Scheduling and allocation are done dynamically according to runtime data dependency checking. We are able to show that execution times, bandwidth demands, and power consumption are decreased. A tool flow is introduced for easy generation of the hardware platform and software binaries for cycle-accurate simulations. Further newly developed tools are available for power analysis, data transfer observation, and task execution visualization.
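As a rough illustration of what scheduling "according to runtime data dependency checking" can look like, the following minimal Python sketch derives dependencies from overlapping buffers and dispatches ready tasks to idle processing elements by priority. It is not the implementation from the paper: the `Task` fields, the buffer-overlap rule, the convention that a lower `priority` value is more urgent, and the abstract time units are all assumptions made for this example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    priority: int                               # assumption: lower value = more urgent
    duration: int                               # abstract time units
    inputs: set = field(default_factory=set)    # buffers the task reads
    outputs: set = field(default_factory=set)   # buffers the task writes


def depends_on(later, earlier):
    """True if `later` must wait for `earlier` (RAW, WAR or WAW hazard)."""
    return bool(
        later.inputs & earlier.outputs       # read-after-write
        or later.outputs & earlier.inputs    # write-after-read
        or later.outputs & earlier.outputs   # write-after-write
    )


def schedule(tasks, num_pes):
    """Return (start_time, pe_index, task_name) tuples; requires num_pes >= 1."""
    n = len(tasks)
    # A task can only be blocked by tasks submitted before it.
    blockers = [{j for j in range(i) if depends_on(tasks[i], tasks[j])}
                for i in range(n)]
    ready = [(tasks[i].priority, i) for i in range(n) if not blockers[i]]
    heapq.heapify(ready)
    free_pes = list(range(num_pes))   # min-heap of idle PE indices
    running = []                      # min-heap of (finish_time, task_index, pe)
    timeline, now, done = [], 0, 0
    while done < n:
        # Start as many ready tasks as there are idle PEs, most urgent first.
        while ready and free_pes:
            _, i = heapq.heappop(ready)
            pe = heapq.heappop(free_pes)
            timeline.append((now, pe, tasks[i].name))
            heapq.heappush(running, (now + tasks[i].duration, i, pe))
        # Advance time to the next completion and release its PE.
        now, finished, pe = heapq.heappop(running)
        heapq.heappush(free_pes, pe)
        done += 1
        # The completed task may unblock later tasks.
        for i in range(finished + 1, n):
            if finished in blockers[i]:
                blockers[i].discard(finished)
                if not blockers[i]:
                    heapq.heappush(ready, (tasks[i].priority, i))
    return timeline


example = [
    Task("fft", priority=1, duration=3, outputs={"a"}),
    Task("decode", priority=0, duration=2, inputs={"a"}, outputs={"b"}),
    Task("log", priority=2, duration=1, outputs={"c"}),
]
print(schedule(example, num_pes=2))
# [(0, 0, 'fft'), (0, 1, 'log'), (3, 0, 'decode')]
```

In this example `decode` is the most urgent task, yet it starts only at time 3 because it reads buffer `a`, which `fft` is still writing. Priority decides the order only among tasks whose dependencies are already satisfied.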
ieee computer society annual symposium on vlsi | 2012
Oliver Arnold; Benedikt Noethen; Gerhard P. Fettweis
In this paper, a heterogeneous Multiprocessor System-on-Chip (MPSoC) is controlled by a dynamic task scheduling unit called Core Manager. The instruction set architecture of this unit is extended to improve performance for dynamic data dependency checking, task scheduling, processing element (PE) allocation and data transfer management. In order to analyze and compare different implementations and trade-offs, a tool flow was developed. Area and timing results are provided as well. A significant performance improvement can be shown for all parts of the Core Manager.
international conference on management of data | 2014
Oliver Arnold; Sebastian Haas; Gerhard P. Fettweis; Benjamin Schlegel; Thomas Kissinger; Wolfgang Lehner
The key task of database systems is to efficiently manage large amounts of data. A high query throughput and a low query latency are essential for the success of a database system. Lately, research has focused on exploiting hardware features like superscalar execution units, SIMD, or multiple cores to speed up processing. Apart from these software optimizations for given hardware, even tailor-made processing circuits running on FPGAs are built to run mostly stateless query plans with incredibly high throughput. A similar idea, which was already considered three decades ago, is to build tailor-made hardware like a database processor. Despite their superior performance, such application-specific processors were not considered to be beneficial because general-purpose processors eventually always caught up so that the high development costs did not pay off. In this paper, we show that the development of a database processor is much more feasible nowadays through the availability of customizable processors. We illustrate by example how to create an instruction set extension for set-oriented database primitives. The resulting application-specific processor not only provides high performance but also enables very energy-efficient processing. Our processor requires in various configurations more than 960x less energy than a high-end x86 processor while providing the same performance.
2011 Semiconductor Conference Dresden | 2011
Oliver Arnold; Gerhard P. Fettweis
This paper introduces a failure-aware dynamic task scheduling approach for unreliable heterogeneous MPSoCs. Global and local errors are sporadically injected into the system. Two new dynamic task scheduling modes are introduced to compensate for these errors, one for each error injection method. Error-free processing elements are favored; faulty ones are isolated. In case of an error, the erroneous task is detected and dynamically compensated to guarantee error-free execution. Different applications are used to demonstrate the feasibility of this approach. The failure-aware dynamic task scheduling approach ensures error-free execution of all applications.
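The idea of isolating faulty processing elements and re-executing the affected task can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not either of the two scheduling modes from the paper: the function name `run_failure_aware`, the round-robin placement, and the model in which a faulty PE corrupts every task it runs are all hypothetical.

```python
def run_failure_aware(tasks, pes, faulty):
    """Place each task on a PE, re-executing elsewhere when an error is detected.

    `faulty` is a hypothetical set of PEs that corrupt every task they run.
    The scheduler does not know it in advance; it only learns about a faulty
    PE when an error is detected after a task has run there.
    """
    healthy = list(pes)   # PEs the scheduler still trusts
    isolated = []         # PEs taken out of service after a detected error
    placement = {}        # task -> PE that produced the accepted result
    next_pe = 0           # round-robin cursor (assumed placement policy)
    for task in tasks:
        while True:
            if not healthy:
                raise RuntimeError("no error-free processing element left")
            pe = healthy[next_pe % len(healthy)]
            if pe in faulty:          # error detected after execution
                healthy.remove(pe)    # isolate the faulty PE
                isolated.append(pe)
                continue              # compensate: re-execute on another PE
            placement[task] = pe
            next_pe += 1
            break
    return placement, isolated


placement, isolated = run_failure_aware(
    tasks=["t0", "t1", "t2", "t3"],
    pes=["pe0", "pe1", "pe2"],
    faulty={"pe1"},
)
print(placement)  # {'t0': 'pe0', 't1': 'pe2', 't2': 'pe0', 't3': 'pe2'}
print(isolated)   # ['pe1']
```

Here `t1` first lands on `pe1`, the error is detected, `pe1` is isolated, and `t1` is re-executed on `pe2`. All later tasks are placed only on the remaining error-free PEs.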
design, automation, and test in europe | 2009
Bastian Ristau; Torsten Limberg; Oliver Arnold; Gerhard P. Fettweis
In embedded computing we face a continuously growing algorithm complexity combined with a constantly rising number of applications running on a single system. Multi-core systems are becoming popular to cope with these requirements. Growing computational complexity is handled by increasing the number of cores and core types within one system, leading to heterogeneous many-core MPSoCs in the near future. One key challenge in designing such systems is to determine the number of cores required to meet performance, power and area constraints. In this paper we present a methodology that helps dimension these systems within seconds via a novel parallelism analysis. The presented methodology has an average performance estimation error of less than 4% compared to transaction level simulation.
design automation conference | 2016
Sebastian Haas; Oliver Arnold; Benedikt Nöthen; Stefan Scholze; Georg Ellguth; Andreas Dixius; Sebastian Höppner; Stefan Schiefer; Stephan Hartmann; Stephan Henker; Thomas Hocker; Jörg Schreiter; Holger Eisenreich; Jens-Uwe Schlüßler; Dennis Walter; Tobias Seifert; Friedrich Pauls; Mattis Hasler; Yong Chen; Hermann Hensel; Sadia Moriam; Emil Matus; Christian Mayr; René Schüffny; Gerhard P. Fettweis
This paper presents a heterogeneous database hardware accelerator MPSoC manufactured in 28 nm SLP CMOS. The 18 mm² chip integrates a runtime task scheduling unit for energy-efficient query processing and hierarchical power management supported by ultra-fast dynamic voltage and frequency scaling. Four processing elements, connected by a star-mesh network-on-chip, are accelerated by an instruction set extension tailored to fundamental data-intensive applications. We evaluate the MPSoC with typical database benchmarks focusing on scans and bitmap operations. When the processing elements operate on data stored in local memories, the chip consumes 250 mW and shows a 96x energy efficiency improvement compared to state-of-the-art platforms.
system on chip conference | 2014
Oliver Arnold; Benedikt Noethen; Gerhard P. Fettweis
In this paper, a dynamic task scheduling unit for many-core systems with over 1000 cores, called CoreManager, is introduced. It dynamically schedules thousands of tasks of several applications, allocates processing elements, controls the prefetching of data transfers, and explicitly manages the on-chip memories. For many-core systems, a high task throughput and a low latency are essential for their success. Therefore, the CoreManager integrates a newly developed application-specific instruction set, called CM_ISA++, for superior scheduling performance. The design of the instruction set of the CoreManager is presented and explained, and the performance of each component is analyzed. Furthermore, the CoreManager is integrated and evaluated in a many-core system with 1008 processing elements. Our CoreManager implementation outperforms a RISC-based implementation by 193x in scheduling and 419x in processing element allocation performance. Consequently, the scalability of the system as well as task throughput and latency are dramatically improved compared to RISC-based scheduling approaches.