Tobias Schwarzer | Researchain

Publication

Featured research published by Tobias Schwarzer.

Proceedings of the 6th ACM SIGPLAN Workshop on X10 | 2016

ActorX10: an actor library for X10

Sascha Roloff; Alexander Pöppl; Tobias Schwarzer; Stefan Wildermann; Michael Bader; Michael Glaß; Frank Hannig; Jürgen Teich

The APGAS programming model is a powerful computing paradigm for multi-core and massively parallel computer architectures. It allows for the dynamic creation and distribution of thousands of threads among hundreds of nodes in a cluster computer within a single application. For programs of such complexity, appropriate higher-level abstractions of computation and communication are necessary for performance analysis and optimization. In this work, we present actorX10, an X10 library of a formally specified actor model based on the APGAS principles. The realized actor model explicitly exposes communication paths and decouples these from the control flow of the concurrently executed application components. Our approach provides the right abstraction for a wide range of applications. Its capabilities and advantages are introduced and demonstrated for two applications from the embedded system and HPC domains, i.e., an object detection chain and a proxy application for the simulation of tsunami events.

Software and Compilers for Embedded Systems | 2015

Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling

Tobias Schwarzer; Joachim Falk; Michael Glaß; Jürgen Teich; Christian Zebelein; Christian Haubelt

Application modeling using dynamic dataflow graphs is well-suited for multi-core platforms. However, there is often a mismatch between the fine granularity of the application and the platform. Tailoring this granularity to the platform promises performance gains by (a) reducing dynamic scheduling overhead and (b) exploiting compiler optimizations. In this paper, we propose a throughput-optimizing compilation approach that uses Quasi-Static Schedules (QSSs) to combine actors of static dataflow subgraphs. Our proposed approach combines core allocation, QSSs, and actor binding in a Design Space Exploration (DSE), optimizing the throughput for a number of available cores. During the DSE, each implementation candidate is compiled to and evaluated on the target hardware (here, an Intel i7 and an ARM Cortex-A9). Experimental results including synthetic benchmarks as well as a real-world control application show that our proposed holistic compilation approach outperforms classic DSEs that are agnostic of QSS as well as a DSE that employs QSS as a post-processing step. Among other results, we show a case where the compilation approach obtains a speedup of 9.91x for a 4-core implementation, while a classic DSE obtains a speedup of only 2.12x.

Microprocessors and Microsystems | 2015

Automatic communication-driven virtual prototyping and design for networked embedded systems

Joachim Falk; Tobias Schwarzer; Liyuan Zhang; Michael Glaß; Jürgen Teich

This work presents a communication-driven virtual prototyping approach integrated in an existing ESL design methodology to automatically synthesize, evaluate, and optimize a data-flow application for mixed hardware/software and even networked MPSoCs. While existing synthesis tools are suitable for individual subsystems (e.g., software tasks for CPUs, hardware accelerators), the problem of establishing the communication between different subsystems that may even be simulated at different levels of abstraction is still challenging. As a remedy, we introduce the concept of bridge components in our architecture model that, during virtual prototyping, serve as integrators between subsystems that may have different communication protocols and be simulated at different levels of abstraction (e.g., TLM, behavioral level, RTL). We propose to consider bridges throughout the complete ESL design flow: Already during Design Space Exploration (DSE), the characteristics of bridge components such as implementation cost and additional latency on the application can be taken into account. Moreover, we extend the exploration model of the DSE to include required communication-related design decisions, i.e., the mapping of binary code for software tasks and the selection of different synchronization patterns for the communication. For virtual prototyping of implementation candidates derived by the DSE, the bridge components make it possible to automatically disassemble the system into subsystems and hand each subsystem over to an individual synthesis tool. When integrating the subsystems, our methodology also synthesizes the interfaces for all bridges, which significantly simplifies system integration. As a proof of concept, we present (I) a distributed control application that is transformed into a virtual prototype consisting of six subsystems and (II) a data-flow application from the video processing domain transformed into a virtual prototype consisting of three subsystems. The resulting subsystems can be concurrently simulated at TLM, behavioral level, and RTL. The experiments give evidence of the applicability of the proposed techniques, the achieved productivity gain, and the resulting simulation performance at the considered levels of abstraction.

Information Technology | 2016

Invasive computing for timing-predictable stream processing on MPSoCs

Stefan Wildermann; Michael Bader; Lars Bauer; Marvin Damschen; Dirk Gabriel; Michael Gerndt; Michael Glaß; Jörg Henkel; Johny Paul; Alexander Pöppl; Sascha Roloff; Tobias Schwarzer; Gregor Snelting; Walter Stechele; Jürgen Teich; Andreas Weichslgartner; Andreas Zwinkau

Multi-Processor Systems-on-a-Chip (MPSoCs) provide sufficient computing power for many applications in scientific as well as embedded domains. Unfortunately, when real-time requirements need to be guaranteed, applications suffer from interference with other applications and from uncertainty about the dynamic workload and the state of the hardware. Composable application/architecture design and timing analysis is therefore a must for guaranteeing that real-time applications satisfy their timing requirements independently of the dynamic workload. Here, Invasive Computing is used as the key enabler for compositional timing analysis on MPSoCs, as it provides the required isolation of resources allocated to each application. On the basis of this paradigm, this work proposes a hybrid application mapping methodology that combines design-time analysis of application mappings with run-time management. Design space exploration delivers several resource reservation configurations with verified real-time guarantees for individual applications. These timing properties can then be guaranteed at run-time, as long as dynamic resource allocations comply with the offline analyzed resource configurations. This article describes our methodology and presents programming, optimization, analysis, and hardware techniques for enforcing timing predictability. A case study illustrates the timing-predictable management of real-time computer vision applications in dynamic robot system scenarios.

Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware | 2016

SWE-X10: simulating shallow water waves with lazy activation of patches using actorX10

Alexander Pöppl; Michael Bader; Tobias Schwarzer; Michael Glaß

We present an efficient Finite Volume solver for the shallow water equations using an actor extension of the X10 programming language, ActorX10, as the programming model. Each actor is assigned to a Cartesian patch of the computational grid. Using the actor's finite state machine to control patch updates, we realize lazy activation of patches: a patch is activated only when a propagating wave enters it. Overlapping of communication and computation in the fully non-central actor-based control, as well as careful optimization (esp. vectorization) of kernels, leads to high performance and parallel efficiency in shared and distributed memory. Benefits of lazy activation are demonstrated via reduced CPU hours for a benchmark scenario.
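The lazy-activation idea can be illustrated with a minimal Python sketch. This is not the ActorX10 API and not the SWE-X10 solver: the class `PatchActor`, the two-state machine, the 1D row of patches, and the averaging "update" are all hypothetical stand-ins chosen only to show how a per-patch state machine can skip work until a wave arrives.

```python
from enum import Enum


class State(Enum):
    DORMANT = 0  # no wave has reached this patch yet
    ACTIVE = 1   # patch takes part in every update


class PatchActor:
    """Hypothetical stand-in for one actor that owns one grid patch."""

    def __init__(self, height=0.0):
        self.height = height
        self.state = State.ACTIVE if height != 0.0 else State.DORMANT
        self.updates = 0  # counts how often the (expensive) kernel ran

    def receive(self, incoming):
        # State transition: a non-zero incoming wave wakes a dormant patch.
        if self.state is State.DORMANT and incoming != 0.0:
            self.state = State.ACTIVE

    def update(self, incoming):
        # Only active patches pay for an update. The averaging below is a
        # placeholder, NOT a finite volume scheme.
        if self.state is State.ACTIVE:
            self.height = 0.5 * (self.height + incoming)
            self.updates += 1


def step(patches):
    """Advance a 1D row of patches by one step using neighbor heights."""
    heights = [p.height for p in patches]  # snapshot before any update
    for i, patch in enumerate(patches):
        left = heights[i - 1] if i > 0 else 0.0
        right = heights[i + 1] if i + 1 < len(patches) else 0.0
        incoming = max(left, right)
        patch.receive(incoming)
        patch.update(incoming)
```

In this toy setup, a wave that starts in the leftmost of four patches reaches one additional patch per step, so patches far from the wave perform no updates until it arrives. That saved work is the kind of benefit the abstract reports as reduced CPU hours.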

Embedded Systems for Real-Time Multimedia | 2015

Quasi-static scheduling of data flow graphs in the presence of limited channel capacities

Joachim Falk; Tobias Schwarzer; Michael Glaß; Jürgen Teich; Christian Zebelein; Christian Haubelt

Signal processing algorithms as found in multimedia applications are often modeled by dynamic Data Flow Graphs (DFGs), especially when targeting heterogeneous multi-core platforms. However, there is often a mismatch between the fine granularity of the application and the coarse granularity of the platform. Tailoring the granularity of the DFG to a given platform by employing Quasi-Static Schedules (QSSs) promises performance gains by reducing dynamic scheduling overhead and enabling optimizations targeting groups of actors instead of individual actors in isolation. Unfortunately, all approaches known from the literature to compute QSSs implicitly assume DFGs with unbounded First In First Out (FIFO) channels. In contrast, mappings of DFGs to multi-core platforms must adhere to FIFO channels with limited capacities. In this paper, we present a novel FIFO channel capacity adjustment algorithm that enables QSSs for DFGs with limited channel capacities, thus extending the scope of QSS refinements to general multi-core targets.

Design, Automation, and Test in Europe | 2014

Model-based actor multiplexing with application to complex communication protocols

Christian Zebelein; Christian Haubelt; Joachim Falk; Tobias Schwarzer; Jürgen Teich

We propose a dynamic scheduling approach for the concurrent execution of logical actor instances on a single synthesized actor instance. Based on a formal dataflow model of computation, the proposed approach can be applied to a wide range of applications in a model-based design flow. As a case study, we evaluate a bus-cycle-accurate SystemC RTL model based on an InfiniBand network adapter in a PCI Express system.

Design Automation Conference | 2018

Architecture decomposition in system synthesis of heterogeneous many-core systems

Valentina Richthammer; Tobias Schwarzer; Stefan Wildermann; Jürgen Teich; Michael Glaß

Determining feasible application mappings for Design Space Exploration (DSE) and run-time embedding is a challenge for modern many-core systems. The underlying NP-complete system-synthesis problem faces tremendously complex problem instances due to the hundreds of heterogeneous processing elements, their communication infrastructure, and the resulting number of mapping possibilities. Thus, we propose to employ a search-space splitting (SSS) technique using architecture decomposition to increase the performance of existing design-time and run-time synthesis approaches. The technique first restricts the search for application embeddings to selected sub-architectures at substantially reduced complexity; the complete architecture therefore needs to be searched only if no embedding is found on any sub-architecture. Furthermore, we introduce a basic learning mechanism to detect promising sub-architectures and subsequently restrict the search to those. We exemplify the SSS for a SAT-based and a problem-specific backtracking-based system synthesis as part of DSE for NoC-based many-core systems. Experimental results show drastically reduced execution times (≈ 15–50× on a 24×24 architecture) and an enhanced quality of the embedding, since fewer mappings (≈ 20–40×, compared to the non-decomposing procedures) need to be discarded due to a timeout.
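The control flow of search-space splitting, as described in this abstract, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_search` and `make_embedder` are hypothetical, an architecture is modeled as a plain set of free core IDs, and the toy embedder stands in for the SAT-based or backtracking-based synthesis used in the paper. The learning mechanism for ranking sub-architectures is not modeled.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Arch = FrozenSet[int]        # assumption: an architecture is a set of core IDs
Mapping = Tuple[int, ...]    # assumption: a mapping is the tuple of cores used


def split_search(
    sub_architectures: Iterable[Arch],
    full_architecture: Arch,
    try_embed: Callable[[Arch], Optional[Mapping]],
) -> Optional[Mapping]:
    """Search small sub-architectures first, then fall back to the full one."""
    for sub in sub_architectures:
        mapping = try_embed(sub)
        if mapping is not None:
            return mapping  # found on a cheap sub-problem; stop early
    # No sub-architecture admits an embedding: search the complete architecture.
    return try_embed(full_architecture)


def make_embedder(cores_needed: int) -> Callable[[Arch], Optional[Mapping]]:
    """Toy embedder: succeeds if the architecture has enough cores."""

    def try_embed(arch: Arch) -> Optional[Mapping]:
        free = sorted(arch)
        if len(free) < cores_needed:
            return None
        return tuple(free[:cores_needed])

    return try_embed


full = frozenset(range(16))
subs = [frozenset({0, 1}), frozenset({4, 5, 6, 7})]
mapping = split_search(subs, full, make_embedder(3))
```

With these toy inputs, a three-core application does not fit the first sub-architecture but fits the second, so the full 16-core architecture is never searched. A five-core application fits neither sub-architecture and falls through to the full search.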

Digital Systems Design | 2014

Communication-Driven Automatic Virtual Prototyping for Networked Embedded Systems

Liyuan Zhang; Joachim Falk; Tobias Schwarzer; Michael Glaß; Jürgen Teich

Today, parts of an ESL model can be automatically synthesized to a low-level implementation, e.g., via high-level synthesis. However, to build a complete working virtual prototype directly from a given ESL model, one still has to perform several design steps manually. The work at hand tackles this problem by introducing bridge components already in the ESL model. These components influence Design Space Exploration (DSE) by adding their characteristics, such as cost and latency, to the evaluation. The complete system is divided into several subsystems connected through bridges; we call this process communication-driven decomposition. Once an optimized implementation solution is found by DSE and selected by the designer, every subsystem of this ESL model is handed over to an individual synthesis tool. Here, if two subsystems are to be synthesized by different tools, the bridge connecting these two subsystems is automatically duplicated into two instances, one assigned to each subsystem. Then, the synthesis tools generate code for each subsystem (including the bridge inside each subsystem). In the last step, the system integration process merges the corresponding bridge pairs to build a complete virtual prototype. To automate the proposed design flow, we have developed a framework that automatically divides an ESL model into subsystems and synthesizes the interfaces for all bridges, which strongly simplifies system integration. The designer is therefore freed from the interface realization. Hence, the overall design development cycle is shortened. As a proof of concept, a distributed control application is presented to give evidence of the applicability of the proposed techniques and the achieved productivity gain.

Forum on Specification and Design Languages | 2013