Claus Braun | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Claus Braun is active.

Explore More

Publication

Featured researches published by Claus Braun.

international test conference | 2013

Module diversification: Fault tolerance and aging mitigation for runtime reconfigurable architectures

Hongyan Zhang; Lars Bauer; Michael A. Kochte; Eric Schneider; Claus Braun; Michael E. Imhof; Hans-Joachim Wunderlich; Jörg Henkel

Runtime reconfigurable architectures based on Field-Programmable Gate Arrays (FPGAs) are attractive for realizing complex applications. However, being manufactured in latest semiconductor process technologies, FPGAs are increasingly prone to aging effects, which reduce the reliability of such systems and must be tackled by aging mitigation and application of fault tolerance techniques. This paper presents module diversification, a novel design method that creates different configurations for runtime reconfigurable modules. Our method provides fault tolerance by creating the minimal number of configurations such that for any faulty Configurable Logic Block (CLB) there is at least one configuration that does not use that CLB. Additionally, we determine the fraction of time that each configuration should be used to balance the stress and to mitigate the aging process in FPGA-based runtime reconfigurable systems. The generated configurations significantly improve reliability by fault-tolerance and aging mitigation.

IEEE Transactions on Computers | 2013

Test Strategies for Reliable Runtime Reconfigurable Architectures

Lars Bauer; Claus Braun; Michael E. Imhof; Michael A. Kochte; Eric Schneider; Hongyan Zhang; Jörg Henkel; Hans-Joachim Wunderlich

Field-programmable gate array (FPGA)-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. The reliability of FPGAs, being manufactured in latest technologies, is threatened by soft errors, as well as aging effects and latent defects. To ensure reliable reconfiguration, it is mandatory to guarantee the correct operation of the reconfigurable fabric. This can be achieved by periodic or on-demand online testing. This paper presents a reliable system architecture for runtime-reconfigurable systems, which integrates two nonconcurrent online test strategies: preconfiguration online tests (PRET) and postconfiguration online tests (PORT). The PRET checks that the reconfigurable hardware is free of faults by periodic or on-demand tests. The PORT has two objectives: It tests reconfigured hardware units after reconfiguration to check that the configuration process completed correctly and it validates the expected functionality. During operation, PORT is used to periodically check the reconfigured hardware units for malfunctions in the programmable logic. Altogether, this paper presents PRET, PORT, and the system integration of such test schemes into a runtime-reconfigurable system, including the resource management and test scheduling. Experimental results show that the integration of online testing in reconfigurable systems incurs only minimum impact on performance while delivering high fault coverage and low test latency.

international on-line testing symposium | 2012

Transparent structural online test for reconfigurable systems

Mohamed S. Abdelfattah; Lars Bauer; Claus Braun; Michael E. Imhof; Michael A. Kochte; Hongyan Zhang; Jörg Henkel; Hans-Joachim Wunderlich

FPGA-based reconfigurable systems allow the online adaptation to dynamically changing runtime requirements. However, the reliability of modern FPGAs is threatened by latent defects and aging effects. Hence, it is mandatory to ensure the reliable operation of the FPGAs reconfigurable fabric. This can be achieved by periodic or on-demand online testing. In this paper, a system-integrated, transparent structural online test method for runtime reconfigurable systems is proposed. The required tests are scheduled like functional workloads, and thorough optimizations of the test overhead reduce the performance impact. The proposed scheme has been implemented on a reconfigurable system. The results demonstrate that thorough testing of the reconfigurable fabric can be achieved at negligible performance impact on the application.

international on-line testing symposium | 2013

Efficacy and efficiency of algorithm-based fault-tolerance on GPUs

Hans-Joachim Wunderlich; Claus Braun; Sebastian Halder

Computer simulations drive innovations in science and industry, and they are gaining more and more importance. However, their high computational demand generates extraordinary challenges for computing systems. Typical high-performance computing systems, which provide sufficient performance and high reliability, are extremely expensive. Modern GPUs offer high performance at very low costs, and they enable simulation applications on the desktop. However, they are increasingly prone to transient effects and other reliability threats. To fulfill the strict reliability requirements in scientific computing and simulation technology, appropriate fault tolerance measures have to be integrated into simulation applications for GPUs. Algorithm-Based Fault Tolerance on GPUs has the potential to meet these requirements. In this work we investigate the efficiency and the efficacy of ABFT for matrix operations on GPUs. We compare ABFT against fault tolerance schemes that are based on redundant computations and we evaluate its error detection capabilities.

dependable systems and networks | 2014

A-ABFT: Autonomous Algorithm-Based Fault Tolerance for Matrix Multiplications on Graphics Processing Units

Claus Braun; Sebastian Halder; Hans-Joachim Wunderlich

Graphics processing units (GPUs) enable large-scale scientific applications and simulations on the desktop. To allow scientific computing on GPUs with high performance and reliability requirements, the application of software-based fault tolerance is attractive. Algorithm-Based Fault Tolerance (ABFT) protects important scientific operations like matrix multiplications. However, the application to floating-point operations necessitates the runtime classification of errors into inevitable rounding errors, allowed compute errors in the magnitude of such rounding errors, and into critical errors that are larger than those and not tolerable. Hence, an ABFT scheme needs suitable rounding error bounds to detect errors reliably. The determination of such error bounds is a highly challenging task, especially since it has to be integrated tightly into the algorithm and executed autonomously with low performance overhead. In this work, A-ABFT for matrix multiplications on GPUs is introduced, which is a new, parallel ABFT scheme that determines rounding error bounds autonomously at runtime with low performance overhead and high error coverage.

international conference on computer design | 2012

Acceleration of Monte-Carlo molecular simulations on hybrid computing architectures

Claus Braun; Stefan Holst; Hans-Joachim Wunderlich; Juan Manuel Castillo; Joachim Gross

Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly. In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of a MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87× despite the additional communication overheads.

defect and fault tolerance in vlsi and nanotechnology systems | 2015

Low-overhead fault-tolerance for the preconditioned conjugate gradient solver

Alexander Scholl; Claus Braun; Michael A. Kochte; Hans-Joachim Wunderlich

Linear system solvers are an integral part for many different compute-intensive applications and they benefit from the compute power of heterogeneous computer architectures. However, the growing spectrum of reliability threats for such nano-scaled CMOS devices makes the integration of fault tolerance mandatory. The preconditioned conjugate gradient (PCG) method is one widely used solver as it finds solutions typically faster compared to direct methods. Although this iterative approach is able to tolerate certain errors, latest research shows that the PCG solver is still vulnerable to transient effects. Even single errors, for instance, caused by marginal hardware, harsh environments, or particle radiation, can considerably affect execution times, or lead to silent data corruption. In this work, a novel fault-tolerant PCG solver with extremely low runtime overhead is proposed. Since the error detection method does not involve expensive operations, it scales very well with increasing problem sizes. In case of errors, the method selects between three different correction methods according to the identified error. Experimental results show a runtime overhead for error detection ranging only from 0.04% to 1.70%.

international on-line testing symposium | 2015

Efficient on-line fault-tolerance for the preconditioned conjugate gradient method

Alexander Scholl; Claus Braun; Michael A. Kochte; Hans-Joachim Wunderlich

Linear system solvers are key components of many scientific applications and they can benefit significantly from modern heterogeneous computer architectures. However, such nano-scaled CMOS devices face an increasing number of reliability threats, which make the integration of fault tolerance mandatory. The preconditioned conjugate gradient method (PCG) is a very popular solver since it typically finds solutions faster than direct methods, and it is less vulnerable to transient effects. However, as latest research shows, the vulnerability is still considerable. Even single errors caused, for instance, by marginal hardware, harsh operating conditions or particle radiation can increase execution times considerably or corrupt solutions without indication. In this work, a novel and highly efficient fault-tolerant PCG method is presented. The method applies only two inner products to reliably detect errors. In case of errors, the method automatically selects between roll-back and efficient on-line correction. This significantly reduces the error detection overhead and expensive re-computations.

european test symposium | 2010

Algorithm-based fault tolerance for many-core architectures

Claus Braun; Hans-Joachim Wunderlich

Modern many-core architectures with hundreds of cores provide a high computational potential. This makes them particularly interesting for scientific high-performance computing and simulation technology. Like all nano scaled semiconductor devices, many-core processors are prone to reliability harming factors like variations and soft errors. One way to improve the reliability of such systems is software-based hardware fault tolerance. Here, the software is able to detect and correct errors introduced by the hardware. In this work, we propose a software-based approach to improve the reliability of matrix operations on many-core processors. These operations are key components in many scientific applications.

dependable systems and networks | 2016

Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations

Alexander Scholl; Claus Braun; Michael A. Kochte; Hans-Joachim Wunderlich

We propose a fault tolerance approach for sparse matrix operations that detects and implicitly locates errors in the results for efficient local correction. This approach reduces the runtime overhead for fault tolerance and provides high error coverage. Existing algorithm-based fault tolerance approaches for sparse matrix operations detect and correct errors, but they often rely on expensive error localization steps. General checkpointing schemes can induce large recovery cost for high error rates. For sparse matrix-vector multiplications, experimental results show an average reduction in runtime overhead of 43.8%, while the error coverage is on average improved by 52.2% compared to related work. The practical applicability is demonstrated in a case study using the iterative Preconditioned Conjugate Gradient solver. When scaling the error rate by four orders of magnitude, the average runtime overhead increases only by 31.3% compared to low error rates.

Explore More