Sascha Roloff

University of Erlangen-Nuremberg

Publication

Featured research published by Sascha Roloff.

Software and Compilers for Embedded Systems | 2011

Resource-aware programming and simulation of MPSoC architectures through extension of X10

Frank Hannig; Sascha Roloff; Gregor Snelting; Jürgen Teich; Andreas Zwinkau

The efficient use of future MPSoCs with 1000 or more processor cores requires new means of resource-aware programming to deal with increasing imperfections such as process variation, fault rates, aging effects, and power as well as thermal problems. In this paper, we apply a new approach called invasive computing that enables an application programmer to spread computations to processors deliberately and on purpose at certain points of the program. Such decisions can be made depending on the degree of application parallelism and the state of the underlying resources such as utilization, load, and temperature. The introduced programming constructs for resource-aware programming are embedded into the parallel computing language X10 as developed by IBM using a library-based approach. Moreover, we show how individual heterogeneous MPSoC architectures may be modeled for subsequent functional simulation by defining compute resources such as processors themselves by lightweight threads that are executed in parallel together with the application threads by the X10 run-time system. Thus, the state changes of each hardware resource may be simulated including temperature, aging, and other useful monitor functionality to provide a first high-level programming test-bed for invasive computing.

Proceedings of the 6th ACM SIGPLAN Workshop on X10 | 2016

ActorX10: an actor library for X10

Sascha Roloff; Alexander Pöppl; Tobias Schwarzer; Stefan Wildermann; Michael Bader; Michael Glaß; Frank Hannig; Jürgen Teich

The APGAS programming model is a powerful computing paradigm for multi-core and massively parallel computer architectures. It allows for the dynamic creation and distribution of thousands of threads amongst hundreds of nodes in a cluster computer within a single application. For programs of such a complexity, appropriate higher level abstractions on computation and communication are necessary for performance analysis and optimization. In this work, we present actorX10, an X10 library of a formally specified actor model based on the APGAS principles. The realized actor model explicitly exposes communication paths and decouples these from the control flow of the concurrently executed application components. Our approach provides the right abstraction for a wide range of applications. Its capabilities and advantages are introduced and demonstrated for two applications from the embedded system and HPC domain, i.e., an object detection chain and a proxy application for the simulation of tsunami events.

Software and Compilers for Embedded Systems | 2012

Fast architecture evaluation of heterogeneous MPSoCs by host-compiled simulation

Sascha Roloff; Frank Hannig; Jürgen Teich

Many domain-specific MPSoCs are heterogeneous and tiled by nature. For evaluating important architectural decisions such as tile structure and core selection within each tile for future 100-1000 core designs, fast and flexible simulation approaches are mandatory. Thus, cycle-accurate simulation techniques or co-simulation approaches using simulator coupling are unsuitable. In this paper, we evaluate heterogeneous tiled MPSoCs using a timing-approximate simulation approach. This simulation approach particularly takes into account applications with highly dynamic thread and workload distributions and resource-aware program behavior. Here, the application itself may decide which set of resources is claimed depending on run-time status information of the resources (e.g., temperature, load). In order to verify performance goals of the heterogeneous MPSoC apart from functional correctness, we propose a timing-approximate simulation approach, which is based on a discrete-event host-compiled simulation and a time-warping mechanism to scale the elapsed execution times on the simulation host to the simulated target. It allows the investigation of phases of thread (re-)distribution and resource-awareness with an appropriate accuracy. For selected case studies, it is shown how architectural parameters may be varied very quickly, enabling the exploration of different designs for cost, performance, and other design objectives.
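The time-warping idea mentioned in this abstract, scaling execution time measured on the simulation host to the simulated target, can be illustrated with a minimal Python sketch. The function names, the clock-frequency parameters, and the `cpi_ratio` factor are assumptions made for illustration only; the abstract does not state how the scaling factor is actually derived.

```python
import time


def warp_time(host_seconds, host_hz, target_hz, cpi_ratio=1.0):
    """Scale elapsed host time to an estimated time on the target core.

    cpi_ratio is a hypothetical factor: target cycles needed per host cycle
    for the same work (1.0 means both cores do equal work per cycle).
    """
    host_cycles = host_seconds * host_hz
    target_cycles = host_cycles * cpi_ratio
    return target_cycles / target_hz


def simulate_task(task, host_hz, target_hz, cpi_ratio=1.0):
    """Run task natively on the host and return (result, warped target time)."""
    start = time.perf_counter()
    result = task()
    elapsed = time.perf_counter() - start
    return result, warp_time(elapsed, host_hz, target_hz, cpi_ratio)
```

For example, one second measured on a 3 GHz host, warped to a 1 GHz target core that needs twice as many cycles for the same work, corresponds to six seconds of simulated time.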

Asia and South Pacific Design Automation Conference | 2012

Approximate time functional simulation of resource-aware programming concepts for heterogeneous MPSoCs

Sascha Roloff; Frank Hannig; Jürgen Teich

The design and the programming of heterogeneous future MPSoCs including thousands of processor cores is a hard challenge. Means are necessary to program and simulate the dynamic behavior of such systems in order to dimension the hardware design and to verify the software functionality as well as performance goals. Cycle-accurate simulation of multiple parallel applications simultaneously running on different cores of the architecture would be much too slow and is not the desired level of detail. In this paper, we therefore present a novel high-level simulation approach which tackles the complexity and the heterogeneity of such systems and enables the investigation of a new computing paradigm called invasive computing. Here, the workload and its distribution are not known at compile-time but are highly dynamic and have to be adapted to the status (load, temperature, etc.) of the underlying architecture at run-time. We propose an approach for the modeling of tiled MPSoC architectures and the simulation of resource-aware programming concepts on these. This approach delivers important timing information about the parallel execution and also takes into account the computational properties of possibly different types of cores.

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC) | 2016

Language and Compilation of Parallel Programs for *-Predictable MPSoC Execution Using Invasive Computing

Jürgen Teich; Michael Glaß; Sascha Roloff; Gregor Snelting; Andreas Weichslgartner; Stefan Wildermann

The predictability of execution qualities including timeliness, power consumption, and fault-tolerability is of utmost importance for the successful introduction of multi-core architectures in embedded systems requiring guarantees rather than best effort behavior. Examples are real-time and/or safety-critical parallel applications. In particular for future many-core architectures, analysis tools for proving such properties to hold for a given application irrespective of other workload either suffer from computational complexity, or their sound bounds are of no practical interest due to severe interferences of resources and software at multiple levels. In view of abundant computational and memory resources becoming available, we propose to apply the principles of invasive computing to avoid sharing of resources at run time as much as possible. We subsequently show that statically proven quality guarantees may be enforced on many multi-core architectures by a presented hybrid mapping approach. Rather than fixed resource mappings, this approach provides only constellations of resource allocations to the run-time system that searches for such constellations and assigns the invader a suitable claim of resources, if possible. We have implemented this hybrid approach and the interface to the language InvadeX10, a library-based extension of the X10 programming language. In this extension, so-called requirements on execution qualities such as deadlines (e.g., in the form of latency constraints) may be annotated to individual programs or even program segments. These are then translated into satisfying resource constellations that need to be found at run time prior to admitting a parallel application to start or to continue, in view of the required execution qualities. We give a real-world case study from the domain of heterogeneous robot vision to demonstrate the capabilities of this approach to guarantee statically analyzed best and worst-case timing requirements on latency and throughput.

Design Automation Conference | 2015

Execution-driven parallel simulation of PGAS applications on heterogeneous tiled architectures

Sascha Roloff; David Schafhauser; Frank Hannig; Jürgen Teich

We present a parallel execution-driven simulator for the efficient simulation of heterogeneous tile-based multi-core architectures. Here, the architecture is composed of several tiles connected via a network-on-chip and each tile contains local memory as well as several possibly different types of compute resources. Partitioned Global Address Space (PGAS) is a programming model matching very well the needs for programming of such modern multi-core architectures. In order to provide performance estimations for parallel software and enable architecture design space exploration, fast functional and timing simulation techniques are required. Thus, we present a simulator that meets this requirement by combining a fast direct-execution simulation approach with different parallelization strategies. Here, we propose four novel parallel discrete-event simulation techniques, which map thread-level parallelism within the applications to core-level parallelism on the target architecture and back to thread-level parallelism on the host machine. In order to achieve this, the correct synchronization and activation of the host threads is necessary, which is the main focus of this paper. Experiments with parallel real-world applications are used to compare the different techniques against each other and demonstrate that 10.4 times faster simulations than a sequential simulation can be achieved on a 12-core Intel Xeon processor.

Information Technology | 2016

Invasive computing for timing-predictable stream processing on MPSoCs

Stefan Wildermann; Michael Bader; Lars Bauer; Marvin Damschen; Dirk Gabriel; Michael Gerndt; Michael Glaß; Jörg Henkel; Johny Paul; Alexander Pöppl; Sascha Roloff; Tobias Schwarzer; Gregor Snelting; Walter Stechele; Jürgen Teich; Andreas Weichslgartner; Andreas Zwinkau

Multi-Processor Systems-on-a-Chip (MPSoCs) provide sufficient computing power for many applications in scientific as well as embedded applications. Unfortunately, when real-time requirements need to be guaranteed, applications suffer from the interference with other applications, uncertainty of dynamic workload and state of the hardware. Composable application/architecture design and timing analysis is therefore a must for guaranteeing real-time applications to satisfy their timing requirements independent from dynamic workload. Here, Invasive Computing is used as the key enabler for compositional timing analysis on MPSoCs, as it provides the required isolation of resources allocated to each application. On the basis of this paradigm, this work proposes a hybrid application mapping methodology that combines design-time analysis of application mappings with run-time management. Design space exploration delivers several resource reservation configurations with verified real-time guarantees for individual applications. These timing properties can then be guaranteed at run-time, as long as dynamic resource allocations comply with the offline analyzed resource configurations. This article describes our methodology and presents programming, optimization, analysis, and hardware techniques for enforcing timing predictability. A case study illustrates the timing-predictable management of real-time computer vision applications in dynamic robot system scenarios.

Embedded Systems for Real-Time Multimedia | 2015

Invasive computing for predictable stream processing: a simulation-based case study

Sascha Roloff; Stefan Wildermann; Frank Hannig; Jürgen Teich

Heterogeneous many-core systems enable the integration of more and more applications into a single system. Executing multiple applications in the same system inevitably leads to resource sharing, e.g., when accessing on-chip communication and memory. This poses a challenge when applications are expected to guarantee user requirements regarding timing, reliability, security, etc. In this paper, we review a design methodology that (a) allows an application designer to model a stream processing application and user requirements, and then (b) automatically generates a set of resource requirements that guarantee the fulfillment of these user requirements. Techniques from the Invasive Computing paradigm enable the program-driven dynamic reservation of resources according to these generated resource requirements. We demonstrate that this provides means for predictable execution of stream processing applications by evaluating a simulation-based case study.

Software and Compilers for Embedded Systems | 2013

NoC simulation in heterogeneous architectures for PGAS programming model

Sascha Roloff; Andreas Weichslgartner; Jan Heißwolf; Frank Hannig; Jürgen Teich

Multi- and many-core systems become more and more mainstream and therefore new communication infrastructures like Networks-on-Chip (NoC) and new programming languages like IBM's X10 with its partitioned global address space (PGAS) are introduced. In this paper, we present an X10-based simulator, which is capable of simulating the network traffic that occurs inside the X10 program. This holistic approach makes it possible to simulate the functionality and the indicated traffic together, in contrast to pure network simulators where usually only synthetic traffic or traces are used. We explain how the communication overhead is extracted from the X10 run-time and how to simulate the NoC behavior. In experiments, we show that the proposed simulator is up to 10x faster than a comparable SystemC-based simulator and at the same time preserves high accuracy. Furthermore, we present a quality and simulation speed tradeoff by using different simulation modes for a set of real-world parallel applications.

Embedded Systems for Real-Time Multimedia | 2017

High performance network-on-chip simulation by interval-based timing predictions

Sascha Roloff; Frank Hannig; Jürgen Teich

Current multi- and many-core computer architectures heavily use Network-on-Chip (NoC) communication in order to meet the increased bandwidth demands between the processors and for reasons of scalability. For the proper analysis of concurrency, utilization, and workload distribution of parallel multi-media applications running on such NoC-based architectures, high-speed simulation techniques are required. Apart from accurate timing simulation of compute resources, it is of utmost importance also to accurately model the delays caused by the packet-based network communication in order to reliably verify performance numbers, or to identify any bottlenecks of the underlying architecture, or to study workload distribution techniques or routing algorithms. In this paper, we present a novel simulation approach for NoCs that allows such communication delays to be simulated equally accurately but, on average, much faster than on a flit-by-flit basis. We propose novel algorithmic and analytical techniques that predict the transmission intervals dynamically based on the arrival of communication requests, actual congestion in the NoC, routing information, packet lengths, and other parameters. According to such predictions, the simulation time may in many cases be automatically advanced, thus reducing the number of events to process in the simulator to a large extent. The presented NoC simulation technique has been integrated into a system-level multi-core architecture simulator. Experiments in running parallel real-world and multi-media applications on a simulated scalable NoC architecture show that we are able to achieve speedups of three orders of magnitude compared to cycle-accurate NoC simulators, while preserving a timing accuracy of above 95%.
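As a rough illustration of why predicting whole transmission intervals reduces the number of simulation events, the following Python sketch compares a flit-by-flit model with an interval-based one. It is a deliberately simplified toy that assumes a single FIFO link, packets sorted by arrival time, and a fixed `cycles_per_flit`; it ignores routing and multi-hop congestion and is not the technique of the paper itself. All names are hypothetical.

```python
def flit_by_flit(packets, cycles_per_flit=1):
    """Reference model: one simulation event per transmitted flit.

    packets is a list of (arrival_cycle, length_in_flits), sorted by arrival.
    Returns (finish_times, number_of_events).
    """
    link_free = 0
    events = 0
    finish = []
    for arrival, length in packets:
        t = max(arrival, link_free)  # wait until the link is idle
        for _ in range(length):
            t += cycles_per_flit     # advance one flit at a time
            events += 1
        link_free = t
        finish.append(t)
    return finish, events


def interval_based(packets, cycles_per_flit=1):
    """Predict each packet's whole transmission interval in one event."""
    link_free = 0
    events = 0
    finish = []
    for arrival, length in packets:
        start = max(arrival, link_free)
        end = start + length * cycles_per_flit  # analytic interval prediction
        events += 1
        link_free = end
        finish.append(end)
    return finish, events
```

In this toy setting both models report the same finish times, but the interval-based model processes one event per packet instead of one per flit.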
