2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks – Supplemental Volume (DSN-S) | 2019
Short-Liveness of Error Propagation in Kernel Can Improve Operating Systems Availability
Abstract
The reliability of operating systems is crucial to achieving high availability of computer systems. Unfortunately, Linux, a widely used operating system, is far from bug-free. Some recent studies point out that error propagation in the kernel is short-lived, so most kernel data remain uncorrupted even when a failure occurs. This paper explores the possibility of exploiting this short-liveness of error propagation in the kernel to improve operating system availability. Our novel memory management scheme allows the kernel to be recovered by removing the inconsistent data structures corrupted during error propagation.