Sebastian Weis
University of Augsburg
Publications
Featured research published by Sebastian Weis.
Microprocessors and Microsystems | 2014
Roberto Giorgi; Rosa M. Badia; François Bodin; Albert Cohen; Paraskevas Evripidou; Paolo Faraboschi; Bernhard Fechner; Guang R. Gao; Arne Garbade; Rahulkumar Gayatri; Sylvain Girbal; Daniel Goodman; Behram Khan; Souad Koliai; Joshua Landwehr; Nhat Minh Lê; Feng Li; Mikel Luján; Avi Mendelson; Laurent Morin; Nacho Navarro; Tomasz Patejko; Antoniu Pop; Pedro Trancoso; Theo Ungerer; Ian Watson; Sebastian Weis; Stéphane Zuckerman; Mateo Valero
The improvements in semiconductor technologies are gradually enabling extreme-scale systems such as teradevices (i.e., chips composed of 1,000 billion transistors), most likely by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a Future and Emerging Technology (FET) large-scale project funded by the European Union, which addresses all of these challenges at once by leveraging dataflow principles. This paper presents an overview of the research carried out by the TERAFLUX partners and some preliminary results. Our platform comprises 1,000+ general-purpose cores per chip in order to properly explore the above challenges. An architectural template has been proposed, and applications have been ported to the platform. Programming models, compilation tools, and reliability techniques have been developed. The evaluation is carried out using modifications of the HP Labs COTSon simulator.
Digital Systems Design | 2013
Marco Solinas; Rosa M. Badia; François Bodin; Albert Cohen; Paraskevas Evripidou; Paolo Faraboschi; Bernhard Fechner; Guang R. Gao; Arne Garbade; Sylvain Girbal; Daniel Goodman; Behram Khan; Souad Koliai; Feng Li; Mikel Luján; Laurent Morin; Avi Mendelson; Nacho Navarro; Antoniu Pop; Pedro Trancoso; Theo Ungerer; Mateo Valero; Sebastian Weis; Ian Watson; Stéphane Zuckerman; Roberto Giorgi
Thanks to the improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., chips composed of 1,000 billion transistors) will enable systems with 1,000+ general-purpose cores per chip, probably by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a Future and Emerging Technology (FET) large-scale project funded by the European Union, which addresses all of these challenges at once by leveraging dataflow principles. This paper describes the project and provides an overview of the research carried out by the TERAFLUX consortium.
International Journal of Parallel Programming | 2016
Sebastian Weis; Arne Garbade; Bernhard Fechner; Avi Mendelson; Roberto Giorgi; Theo Ungerer
The high parallelism of future teradevices, which will contain more than 1,000 complex cores on a single die, requires new execution paradigms. Coarse-grained dataflow execution models are able to exploit such parallelism, since they combine side-effect-free execution with reduced synchronization overhead. However, the tera-scale transistor integration of such future chips makes them orders of magnitude more vulnerable to voltage fluctuations, radiation, and process variations. This means dynamic fault-tolerance mechanisms have to be an essential part of such future systems. In this paper, we present a fault-tolerant architecture for a coarse-grained dataflow system that leverages the inherent features of the dataflow execution model. In particular, we provide methods to dynamically detect and manage permanent, intermittent, and transient faults at runtime. Furthermore, we exploit the dataflow execution model for a thread-level recovery scheme. Our results show that redundant execution of dataflow threads can efficiently make use of underutilized resources in a multi-core, while the overhead in a fully utilized system stays reasonable. Moreover, thread-level recovery incurred only moderate overhead, even at high fault rates.
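The thread-level scheme the abstract describes rests on one property: a side-effect-free dataflow thread can always be safely re-fired from its inputs. A minimal sketch of that idea, with all names and the fault model as our own illustrative assumptions (not the authors' implementation):

```python
def run_redundant(thread_fn, inputs, max_retries=3):
    """Execute a side-effect-free dataflow thread twice and compare
    results; on a mismatch (a detected transient fault), simply
    re-fire the thread from its unchanged inputs -- side-effect
    freedom makes this thread-level rollback trivial."""
    for _ in range(max_retries):
        primary = thread_fn(*inputs)
        shadow = thread_fn(*inputs)   # redundant copy, e.g. on a spare core
        if primary == shadow:          # outputs are committed only on a match
            return primary
    raise RuntimeError("persistent mismatch: suspect a permanent fault")

# Example: a pure dataflow thread (its result depends only on its
# inputs), so re-execution is always safe.
result = run_redundant(lambda a, b: a * b + 1, (6, 7))
```

The "underutilized resources" observation from the abstract maps onto the shadow run: on a partially idle multi-core, the second execution occupies cores that would otherwise do nothing.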
First Workshop on Data-Flow Execution Models for Extreme Scale Computing | 2011
Sebastian Weis; Arne Garbade; Julian Wolf; Bernhard Fechner; Avi Mendelson; Roberto Giorgi; Theo Ungerer
Future computing systems (teradevices) will probably contain more than 1,000 cores on a single die. To exploit this parallelism, threaded dataflow execution models are promising, since they provide side-effect-free execution and reduced synchronization overhead. But the tera-scale transistor integration of such chips makes them orders of magnitude more vulnerable to voltage fluctuations, radiation, and process variations. This means reliability techniques have to be an essential part of such future systems, too. In this paper, we conceptualize a fault-tolerant architecture for a scalable threaded dataflow system. We provide methods to detect permanent, intermittent, and transient faults during execution. Furthermore, we propose a recovery technique for dataflow threads.
Parallel, Distributed and Network-Based Processing | 2013
Arne Garbade; Sebastian Weis; Sebastian Schlingmann; Bernhard Fechner; Theo Ungerer
Future many-cores will accommodate a high number of cores, but tera-scale transistor integration increases the failure rates in the cores and interconnection networks of such chips. Message-based fault detection techniques have been developed to mitigate the influence of faults on the system. In this paper, we investigate the message overhead of fault detection monitoring with decentralized Fault Detection Units in a unified 2D mesh and assess the resulting delays of application messages. We investigate routing algorithms for the different message types and demonstrate a 19% reduction in the impact of fault detection messages on application messages. We also show the limitations of prioritized fault detection messages for different application-message packet injection rates.
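The prioritization studied above can be pictured as a router output port that arbitrates between two traffic classes. The sketch below is our own simplification (a single priority queue, names invented for illustration), not the paper's NoC model; it only shows why fault-detection traffic delays application packets:

```python
import heapq

FD, APP = 0, 1  # lower value = higher priority for fault-detection traffic

def drain(router_queue):
    """Pop queued messages in priority order: fault-detection (FD)
    messages bypass application (APP) packets, which is exactly the
    source of the application-message delay being measured."""
    order = []
    while router_queue:
        _, _, msg = heapq.heappop(router_queue)
        order.append(msg)
    return order

# Four messages arrive interleaved; the sequence number keeps
# arrival order stable within each priority class.
q = []
for seq, (prio, msg) in enumerate([(APP, "app-0"), (FD, "fd-0"),
                                   (APP, "app-1"), (FD, "fd-1")]):
    heapq.heappush(q, (prio, seq, msg))
```

Draining `q` yields both FD messages first, then the APP packets in arrival order: the FD class never waits, and the APP class absorbs all the added latency.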
Parallel, Distributed and Network-Based Processing | 2011
Sebastian Schlingmann; Arne Garbade; Sebastian Weis; Theo Ungerer
Future many-core chips are envisioned to feature up to a thousand cores. With an increasing number of cores on a chip, the problem of distributing load becomes more prevalent. Even if a piece of software is designed to exploit parallelism, it is not easy to place parallel tasks on the cores so as to achieve maximum performance. This paper proposes a connectivity-sensitive algorithm for static task placement onto a 2D mesh of interconnected cores. The decreased feature sizes of future VLSI chips will increase the number of permanent and transient faults. To accommodate partially faulty hardware, the algorithm is designed to allow placement on irregular core structures, in particular meshes with faulty nodes and links. The quality of the placement is measured by comparing the results to two baseline algorithms in terms of communication efficiency.
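A toy version of the placement problem helps make the abstract concrete. The greedy heuristic below (our own simplification, not the connectivity-sensitive algorithm itself) places each task on the healthy core that minimizes hop-weighted distance to its already-placed communication partners, skipping faulty nodes:

```python
from itertools import product

def place(tasks, edges, mesh, faulty):
    """tasks: task ids in placement order; edges: {(t1, t2): traffic};
    mesh: (width, height); faulty: set of (x, y) dead cores.
    Greedily maps each task to the free, healthy core minimizing the
    traffic-weighted Manhattan distance to placed partners."""
    free = set(product(range(mesh[0]), range(mesh[1]))) - faulty
    pos = {}
    for t in tasks:
        def cost(c):
            return sum(w * (abs(c[0] - pos[u][0]) + abs(c[1] - pos[u][1]))
                       for (a, b), w in edges.items()
                       if t in (a, b)
                       for u in ((b if a == t else a),)
                       if u in pos)
        best = min(sorted(free), key=cost)  # sorted: deterministic ties
        pos[t] = best
        free.remove(best)
    return pos

# A 3x3 mesh whose center core is dead; A and B communicate heavily,
# B and C lightly, so the chain is laid out around the faulty node.
placement = place(["A", "B", "C"],
                  {("A", "B"): 10, ("B", "C"): 1},
                  mesh=(3, 3), faulty={(1, 1)})
```

Heavily communicating pairs end up on adjacent cores, and the faulty core is never assigned, which is the behavior the paper's algorithm targets on irregular meshes.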
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013
Arne Garbade; Sebastian Weis; Sebastian Schlingmann; Bernhard Fechner; Theo Ungerer
This paper presents a novel fault localization approach for NoCs by leveraging so-called timed heartbeat messages. While these messages are periodically sent to report the health states of processor cores to a fault detection unit, information about the network's health state (topology) can be extracted from their timing behavior. We show how this health-state information can easily be extracted from the message arrival times and give an estimate of the expected costs of this technique.
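The timing argument can be sketched in a few lines: since the hop count from each core to the fault detection unit is known, so is each heartbeat's nominal arrival time, and deviations point at faults along the route. The thresholds, names, and fault classes below are our illustrative assumptions, not the paper's cost model:

```python
def classify(expected_hops, arrivals, period, hop_delay, slack=2):
    """expected_hops: {core: hops to the fault detection unit};
    arrivals: {core: observed heartbeat arrival time, or None}.
    Returns per-core health inferred purely from timing:
      'ok'            -- heartbeat on time,
      'path-degraded' -- arrived but late, suggesting a detour
                         around a broken link on its route,
      'unreachable'   -- no heartbeat within the period."""
    health = {}
    for core, hops in expected_hops.items():
        t = arrivals.get(core)
        nominal = hops * hop_delay
        if t is None or t > period:
            health[core] = "unreachable"
        elif t > nominal + slack * hop_delay:
            health[core] = "path-degraded"
        else:
            health[core] = "ok"
    return health

# c0 is on time, c1's heartbeat took far longer than its 3 hops
# warrant, and c2's heartbeat never arrived.
status = classify({"c0": 1, "c1": 3, "c2": 2},
                  {"c0": 5, "c1": 40, "c2": None},
                  period=100, hop_delay=5)
```

The appeal of the approach is visible even here: localization costs no extra messages, only bookkeeping on traffic the fault detection unit receives anyway.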
Defect and Fault Tolerance in VLSI and Nanotechnology Systems | 2014
Florian Haas; Sebastian Weis; Stefan Metzlaff; Theo Ungerer
Safety-critical systems demand increasing computational power, which calls for high-performance embedded systems. While commercial off-the-shelf (COTS) processors offer high computational performance at a low price, they do not provide hardware support for fault-tolerant execution. However, purely software-based fault-tolerance methods entail high design complexity and runtime overhead. In this paper, we present an efficient software/hardware-based redundant execution scheme for a COTS x86 processor, which exploits the Transactional Synchronization Extensions (TSX) introduced with the Intel Haswell microarchitecture. Our approach extends a static binary instrumentation tool to insert fault-tolerant transactions and fault-detection instructions at function granularity. TSX hardware support is used for error containment and recovery. The average runtime overhead for selected SPEC2006 benchmarks was only 49% compared to non-fault-tolerant execution.
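The control flow of transaction-based redundant execution at function granularity can be sketched without real TSX hardware. In the sketch below the transaction is emulated in plain Python (a real x86 implementation would wrap the region in RTM's `_xbegin()`/`_xend()` intrinsics for containment and rollback); the fault injector and all names are our own illustration, not the paper's instrumentation:

```python
def transactional_redundant(fn, args, inject_faults=0, max_retries=5):
    """Run fn twice inside an (emulated) transaction and commit only
    if both results match; on a mismatch the transaction aborts,
    discarding speculative state, and the function is retried --
    mirroring how TSX rollback provides error containment."""
    for attempt in range(max_retries):
        leading = fn(*args)            # _xbegin(): speculative region
        trailing = fn(*args)           # redundant copy of the region
        if attempt < inject_faults:
            leading ^= 1               # emulate a transient bit flip
        if leading == trailing:        # inserted fault check
            return leading             # _xend(): commit
        # mismatch: emulated _xabort(), state discarded, retry
    raise RuntimeError("retries exhausted: uncorrectable fault")

# A transient fault on the first attempt is detected by the result
# comparison and masked by one transactional retry.
val = transactional_redundant(lambda x: x * x, (9,), inject_faults=1)
```

The integer-result comparison stands in for whatever output signature the instrumented binary actually checks; the point is only the detect-abort-retry cycle.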
International Conference on High Performance Computing and Simulation | 2013
Bernhard Fechner; Arne Garbade; Sebastian Weis; Theo Ungerer
The enormous growth in integration density enables building processors with more and more cores on a single die, but it also makes them orders of magnitude more vulnerable to faults due to voltage fluctuations, radiation, process variations [4], etc. Since this trend will continue in the future, fault-tolerance mechanisms must be an essential part of such future systems if computations are to be carried out on a reliable basis. Chip manufacturers have already taken measures to handle faults in current multi-core processors, such as error-correcting codes for busses, caches, etc. With a huge number of cores, common strategies like dual and triple modular redundant processing [5] combined with massively parallel computing become possible. Threaded dataflow execution models are one way to exploit the parallelism of future 1,000-core systems; current GPU architectures reflect this [3]. The side-effect-free execution of threads within the dataflow execution model can not only provide massively parallel computational capacity, but also enables simple and efficient rollback mechanisms [16]. In this paper, we describe fault detection and tolerance mechanisms investigated within the TERAFLUX EC project [17], which offers a solution to exploit the massive parallelism of dataflow architectures at all abstraction levels.
Real-Time Networks and Systems | 2013
Stefan Metzlaff; Sebastian Weis; Theo Ungerer
In this paper, we utilise transactional memory (TM) to limit the interference between concurrent hard real-time (HRT) and best-effort (BE) tasks in a shared-memory multi-core. We first propose a way to calculate the worst-case execution time (WCET) bound of HRT transactions when the set of concurrent transactions is known. In the next step, we enhance our TM contention manager to prioritise transactions depending on their real-time requirements. With our approach, it is possible to bound the interference of any BE transaction and thus ensure predictable execution of concurrently running HRT transactions. Our evaluation shows that the impact of BE tasks on the WCET bound of HRT tasks is minimal, while still allowing them to share data.
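The shape of such a WCET bound can be illustrated with a deliberately simplified formula. The bound below (one re-execution per conflicting HRT peer, plus a single bounded blocking term for an already-running BE transaction that the prioritising contention manager then aborts) is our own back-of-the-envelope assumption, not the paper's analysis:

```python
def hrt_transaction_wcet(t_exec, n_conflicting_hrt, max_be_block):
    """Illustrative upper bound on one HRT transaction's response time:
    its own run, at most one re-execution per conflicting HRT peer
    (equal-priority conflicts may abort it), and at most one bounded
    blocking interval from a best-effort transaction before the
    contention manager aborts that BE transaction in its favour."""
    return t_exec * (1 + n_conflicting_hrt) + max_be_block

# An HRT transaction of 10 time units with 2 potentially conflicting
# HRT peers and a BE blocking term of at most 3 units is bounded by
# 10 * 3 + 3 = 33 units.
bound = hrt_transaction_wcet(10, 2, 3)
```

The key property survives the simplification: because BE transactions can always be aborted, their contribution to the bound is a constant rather than a function of BE behaviour, which is what makes data sharing between HRT and BE tasks predictable.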