Christoph Borchert
Technical University of Dortmund
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Christoph Borchert.
dependable systems and networks | 2013
Christoph Borchert; Horst Schirmeier; Olaf Spinczyk
Recent studies indicate that the number of system failures caused by main memory errors is much higher than expected. In contrast to the commonly used hardware-based countermeasures, for example using ECC memory, software-based fault-tolerance measures are much more flexible and can exploit application knowledge, such as the criticality of specific data structures. This paper presents a software-based memory error protection approach, which we used to harden the eCos operating system in a case study. The main benefits of our approach are the flexibility to choose from an extensible toolbox of easily pluggable error detection and correction schemes as well as its very low runtime overhead, which totals in a range of 0.09-1.7 %. The implementation is based on aspect-oriented programming and exploits the object-oriented program structure of eCos to identify well-suited code locations for the insertion of generative fault-tolerance measures.
dependable systems and networks | 2015
Horst Schirmeier; Christoph Borchert; Olaf Spinczyk
Since the first identification of physical causes for soft errors in memory circuits, fault injection (FI) has grown into a standard methodology to assess the fault resilience of computer systems. A variety of FI techniques trying to mimic these physical causes has been developed to measure and compare program susceptibility to soft errors. In this paper, we analyze the process of evaluating programs, which are hardened by software-based hardware fault-tolerance mechanisms, under a uniformly distributed soft-error model. We identify three pitfalls in FI result interpretation widespread in the literature, even published in renowned conference proceedings. Using a simple machine model and transient single-bit faults in memory, we find counterexamples that reveal the unfitness of common practices in the field, and substantiate our findings with real-world examples. In particular, we demonstrate that the fault coverage metric must be abolished for comparing programs. Instead, we propose to use extrapolated absolute failure counts as a valid comparison metric.
international symposium on object/component/service-oriented real-time distributed computing | 2014
Martin Hoffmann; Christoph Borchert; Christian Dietrich; Horst Schirmeier; Rüdiger Kapitza; Olaf Spinczyk; Daniel Lohmann
Developers of embedded (real-time) systems can choose from a variety of operating systems. While some embedded operating systems provide very flexible APIs, e.g., a POSIX-compliant interface for run-time management, others have a completely static structure, which is generated at compile time by utilizing detailed application knowledge. A prominent example for the latter class from the domain of automotive operating systems is OSEK/OS and its successor AUTOSAR/OS. As we have shown in previous work, the design of the operating system has a strong impact on its vulnerability for system failure caused by hardware faults. This observation is gaining importance, because there is an ongoing trend towards low-power and low-cost, yet less reliable, hardware. This work quantifies the difference in vulnerability for soft errors in main memory of a flexible (dynamic) operating systems (eCos) and a static system (CiAO), which has an OSEK-compliant structure. We also analyze the additional degree of robustness that is achieved by hardening an operating system with software-based and hardware-based fault-tolerance measures and the corresponding costs. Covering this design space gives developers a better chance for good design decisions with respect to the trade-off between fault tolerance, resource consumption, and interface convenience. Our results indicate that with a combination of hardware- and software-based fault-tolerance measures, silent data corruptions in both operating systems can be reduced to below one percent (compared to eCos). However, the analyzed fault-tolerance mechanisms are expensive for the dynamic system, whereas the statically designed operating system can be hardened at much lower price.
international conference on mobile systems, applications, and services | 2012
Christoph Borchert; Daniel Lohmann; Olaf Spinczyk
Internet protocols are constantly gaining relevance for the domain of mobile and embedded systems. However, building complex network protocol stacks for small resource-constrained devices is more than just porting a reference implementation. Due to the cost pressure in this area especially the memory footprint has to be minimized. Therefore, embedded TCP/IP implementations tend to be statically configurable with respect to the concrete application scenario. This paper describes our software engineering approach for building CiAO/IP - a tailorable TCP/IP stack for small embedded systems, which pushes the limits of static configurability while retaining source code maintainability. Our evaluation results show that CiAO/IP thereby outperforms both lwIP and uIP in terms of code size (up to 90% less than uIP), throughput (up to 20% higher than lwIP), energy consumption (at least 40% lower than uIP) and, most importantly, tailorability.
programming languages and operating systems | 2015
Christoph Borchert; Olaf Spinczyk
Transient hardware faults in computer systems have become widespread as shrinking structures and low supply voltages reduce the amount of energy needed to trigger a fault. This paper describes the latest improvements of a software-based fault-tolerance mechanism called Generic Object Protection (GOP). It is based on Aspect-Orientied Programming in AspectC++ and has been used in a case study to harden the L4/Fiasco.OC microkernel. As a result, the improved GOP avoids 60% of kernel failures at an acceptable overhead of 19% code size and less than 1% runtime. The GOP improvements use static whole-program analysis and have been implemented in a prototypical manner. As an outlook, the paper presents envisioned language extensions providing whole-program control-flow and data-flow analyses in future AspectC++ versions.Transient hardware faults in computer systems have become widespread as shrinking structures and low supply voltages reduce the amount of energy needed to trigger a fault. This paper describes the latest improvements of a software-based fault-tolerance mechanism called Generic Object Protection (GOP). It is based on Aspect-Orientied Programming in AspectC++ and has been used in a case study to harden the L4/Fiasco.OC microkernel. As a result, the improved GOP avoids 60% of kernel failures at an acceptable overhead of 19% code size and less than 1% runtime. The GOP improvements use static whole-program analysis and have been implemented in a prototypical manner. As an outlook, the paper presents envisioned language extensions providing whole-program control-flow and data-flow analyses in future AspectC++ versions.
Information Technology | 2015
Muhammad Shafique; Philip Axer; Christoph Borchert; Jian-Jia Chen; Kuan-Hsun Chen; Björn Döbel; Rolf Ernst; Hermann Härtig; Andreas Heinig; Rüdiger Kapitza; Florian Kriebel; Daniel Lohmann; Peter Marwedel; Semeen Rehman; Florian Schmoll; Olaf Spinczyk
Abstract This paper presents a multi-layer software reliability approach that leverages multiple software layers (e. g., programming language, compiler, and operating system) to improve the overall system reliability considering unreliable or partly-reliable hardware. We present a comprehensive design flow that integrates multiple software layers while accounting for the knowledge from lower hardware layers. We show how multiple software layers synergistically operate to achieve a high degree of reliability.
dependable systems and networks | 2014
Arthur Martens; Christoph Borchert; Tobias Oliver Geiβler; Daniel Lohmann; Olaf Spinczyk; Rüdiger Kapitza
State-machine replication has received widespread attention for the provisioning of highly available services in data centers. However, current production systems focus on tolerating crash faults only and prominent service outages caused by state corruptions have indicated that this is a risky strategy. In the future, state corruptions due to transient faults (such as bit flips) become even more likely, caused by ongoing hardware trends regarding the shrinking of structure sizes and reduction of operating voltages. In this paper we present Crosscheck, an approach to tolerate arbitrary state corruption (ASC) in the context of fault-tolerant replication of multithreaded services. Crosscheck is able to detect silent data corruptions ahead of execution, and by crosschecking state changes with co-executing replicas, even ASCs can be detected. Finally, fault tolerance is achieved by a fine-grained recovery using fault-free replicas. Our implementation is transparent to the application by utilizing fine-grained software-hardening mechanisms using aspect-oriented programming. To validate Crosscheck we present a replicated multithreaded key-value store that is resilient to state corruptions.
automation, robotics and control systems | 2017
Thiago Santini; Christoph Borchert; Christian Dietrich; Horst Schirmeier; Martin Hoffmann; Olaf Spinczyk; Daniel Lohmann; Flávio Rech Wagner; Paolo Rech
For decades, radiation-induced failures have been a known issue for aero-space systems, in which redundancy mechanisms are employed as a protection method. Due to the shrinking of structures and operating voltages, these failures are increasingly becoming an issue even for terrestrial applications. Unfortunately, redundancy increases costs, area usage, and power consumption, which can hinder its utilization in cost- and power-sensitive safety-critical applications, such as automotive. To overcome this limitation, multiple software-based approaches have been proposed, which assume the existence of an underlying error-free operating system. In this paper, we investigate the radiation reliability of two dependability-oriented real-time operating systems, namely, the popular eCos operating system hardened through aspect-oriented programming methods, and dOSEK, an embedded kernel designed from the ground up having reliability as a major concern. Both operating systems were evaluated through extensive neutron-beam testings on a 28 nm ARM-based state-of-the-art system-on-chip, and their fault tolerance mechanisms reached reductions in the overall cross-sections relative to their baselines up to 91% and 74%, respectively.
Operating Systems Review | 2016
Christoph Borchert; Olaf Spinczyk
Transient hardware faults in computer systems have become widespread as shrinking structures and low supply voltages reduce the amount of energy needed to trigger a fault. This paper describes the latest improvements of a software-based fault-tolerance mechanism called Generic Object Protection (GOP). It is based on Aspect-Orientied Programming in AspectC++ and has been used in a case study to harden the L4/Fiasco.OC microkernel. As a result, the improved GOP avoids 60% of kernel failures at an acceptable overhead of 19% code size and less than 1% runtime. The GOP improvements use static whole-program analysis and have been implemented in a prototypical manner. As an outlook, the paper presents envisioned language extensions providing whole-program control-flow and data-flow analyses in future AspectC++ versions.
international conference on computer safety reliability and security | 2014
Horst Schirmeier; Christoph Borchert; Olaf Spinczyk
Recent studies suggest that future microprocessors need low-cost fault-tolerance solutions for reliable operation. Several competing software-implemented error-detection methods have been shown to increase the overall resiliency when applied to critical spots in the system. Fault injection (FI) is a common approach to assess a systems vulnerability to hardware faults. In an FI campaign comprising multiple runs of an application benchmark, each run simulates the impact of a fault in a specific hardware location at a specific point in time. Unfortunately, exhaustive FI campaigns covering all possible fault locations are infeasible even for small target applications. Commonly used sampling techniques, while sufficient to measure overall resilience improvements, lack the level of detail and accuracy needed for the identification of critical spots, such as important variables or program phases. Many faults are sampled out, leaving the developer without any information on the application parts they would have targeted. We present a methodology and tool implementation that application-specifically reduces experimentation efforts, allows to freely trade the number of FI runs for result accuracy, and provides information on all possible fault locations. After training a set of Pareto-optimal heuristics, the experimenting user is enabled to specify a maximum number of FI experiments. A detailed evaluation with a set of benchmarks running on the eCos embedded OS, including MiBenchs automotive benchmark category, emphasizes the applicability and effectiveness of our approach: For example, when the user chooses to run only 1.5% of all FI experiments, the average result accuracy is still 99.84%.