High Velocity Kernel File Systems with Bento
Samantha Miller, Kaiyuan Zhang, Mengqi Chen, Ryan Jennings, Ang Chen, Danyang Zhuo, Tom Anderson
arXiv [cs.OS], May
Samantha Miller, Kaiyuan Zhang, Danyang Zhuo†, Thomas Anderson
University of Washington, †Duke University
Abstract
High development velocity is critical for modern cloud systems. However, rapid development and release cycles have mostly skipped operating systems. Modifications to behavior in Linux, the most widely used server operating system in the cloud, must be done slowly to minimize risk of introducing bugs, be limited in scope, or be implemented in userspace with a potential performance penalty.

We propose Bento, a framework for high velocity development of Linux kernel file systems. Bento is inspired by the recent availability of type-safe, non-garbage collected languages like Rust. It interposes a thin layer between kernel calls to the file system and file system calls back to the kernel, exposing alternative interfaces to enable kernel file systems written in safe Rust. Future work will provide support for online upgrades, userspace debugging, and composable file systems. We evaluate Bento by using it to implement the xv6 file system and comparing against baselines written using the kernel VFS layer and FUSE. We find that the Bento file system achieves comparable performance to the VFS version and much better performance than the FUSE version. We also evaluate against ext4 on the macrobenchmarks and find that ext4 performs between 33% and 3.2× better than the Bento xv6 file system.

High development velocity has become a widespread talisman for cloud software development [28]. Many popular cloud systems roll out new software releases on a weekly or even daily basis, to give users faster access to new features, to gain insight into priorities for further development, and to reduce integration costs. While this design pattern may seem inappropriate for mission critical software, cloud vendors have shown it is practical to use short release cycles for many high reliability services, including databases [8] and network stacks [10, 13, 22].

Rapid release cycles have largely skipped the operating system kernel development community, however.
Linux is the most widely used server operating system for the cloud, but new versions drop only every few months, with major changes limited to once every few years. Of course, Linux is open source, and so anyone is free to iterate more rapidly, at the cost of the later pain of reintegration with the mainline development tree.

The Linux community has adopted several approaches to improving feature velocity, none entirely successful. One approach is to try to future-proof the kernel by adding features before they are needed. We can see that in action with the popular Docker container manager [11]. Docker leverages several recently-added Linux kernel features, but in the process exposed a number of potentially critical flaws in those kernel services that could compromise the security of the entire operating system. Alternately, we can move kernel services to user level, such as with the FUSE file system abstraction [14] and Open vSwitch [26]. However, these can impose a prohibitively high performance penalty [1, 32], necessitating a kernel caching layer [5] that poses its own set of tradeoffs. Certain Linux kernel interfaces can be rapidly reconfigured with eBPF [30] scripts, but only for small linear snippets of code. The widespread belief that in the future, all high performance operations must “bypass the kernel” is an illustration of how operating systems are losing the race [3, 25, 36].

Our goal is to enable high velocity development for high-performance, general-purpose operating system kernel extensions. Our trust model is that of a slightly harried kernel developer, rather than an untrusted application developer. We want to provide a way for kernel developers to add kernel features in a manner that isolates bugs to within the extension and also allows for dynamic replacement of that functionality without the need to restart applications [35].
To be as widely applicable as possible, we focus on enabling rapid development of Linux kernel code, rather than assuming a new code base designed for extensibility, such as Exokernels [12], Spin [4], or Barrelfish [35]. To be concrete, we restrict ourselves, at first, to file system extensibility. We leave dynamic replacement of file systems to future work.

Our approach is inspired by the recent availability of type-safe, non-garbage collected, performant languages like Rust. Writing kernel extensions in Rust eliminates a class of cross-module bugs that could compromise kernel security, without the performance overhead of running at user level or the restrictions on extension behavior imposed by eBPF. However, supporting compatibility with existing operating systems and features for high development velocity, such as dynamic module replacement, debugging, and code reuse, is challenging.

For this work, we focus on high velocity kernel file systems. We have built a framework, called Bento, for injecting general-purpose file systems and file system extensions, written in Rust, into Linux. Surprisingly, Linux’s existing pluggable file system interface, VFS, is poorly suited to our needs, as it assumes shared data structures can pass freely across the extension interface, complicating compile-time type checking. Instead, Bento interposes a thin layer for calls into the file system, and calls from the file system back into the kernel, providing safety, high performance, generality, and compatibility with Linux. While our architecture is designed to be compatible with graceful online upgrades of running file systems along with support for other features for high development velocity, we leave that for future work.

We have used Bento to implement the xv6 file system to run in the Linux kernel.
We have additionally implemented baseline versions, one written in C against the VFS layer and one (written in Rust) running in userspace using FUSE. We found that our framework has performance very similar to, and sometimes better than, the VFS C version while the FUSE version performed much worse than both.

In this paper, we make the following contributions:
• We design and implement Bento, a framework that enables high-velocity development of safe, performant file systems in Linux.
• We present techniques for allowing safe Rust code to run in the Linux kernel and access kernel functionality.
• We implement a file system using Bento and evaluate its performance characteristics.
One of the existing barriers to fast evolution in Linux comes from buggy code. New code often introduces bugs, disincentivizing fast evolution for mission-critical pieces of code like operating systems. Kernel code is particularly affected by this because kernel bugs are often difficult to find and can have severe non-local consequences. In particular, memory bugs, such as memory reuse and dangling pointers, can have catastrophic consequences on the reliability of the system, potentially even leading to security violations.

Bug                     Number  Effect on Kernel
Use Before Allocate     6       Likely oops
Double Free             4       Undefined
NULL Dereference        5       oops
Use After Free          3       Likely oops
Over Allocation         1       Overutilization
Out of Bounds           4       Likely oops
Dangling Pointer        1       Likely oops
Missing Free            18      Memory Leak
Reference Count Leak    7       Memory Leak
Other Memory            1       Variable
Deadlock                5       Deadlock
Race Condition          5       Variable
Other Concurrency       1       Variable
Unchecked Error Value   5       Variable
Other Type Error        8       Variable

Table 1: Count of analyzed bugs with effects of each bug, categorized as memory, concurrency, or type.

To understand the properties of bugs in existing Linux kernel extensions, we analyzed bug reports for three extensions used by Docker: AppArmor for security, Open vSwitch Datapath for networking, and Overlay FS for file system support. We analyzed all bug-fix git commits from 2014-2018 and categorized them by the type of bug that was fixed.

Our analysis focused on what we call low-level bugs: bugs that are unrelated to the specific logic of the extensions. These bugs can be caught without knowing specific correctness properties needed by the extension. This is opposed to semantic bugs, which are caused by violations of high-level correctness properties. Low-level bugs made up 50% of the total bugs. We divided the low-level bugs into three categories: memory bugs, concurrency bugs, and type errors. Memory bugs refer to incorrect usage of memory, including
NULL pointer dereferences, out-of-bounds errors, and memory leaks. Concurrency bugs are caused by incorrect concurrency patterns, such as deadlocks and race conditions. Type errors are caused by incorrect usage of kernel types, most often by interpreting error values as valid data.

The results of the analysis are shown in Table 1. We found that 68% of these bugs were memory bugs. Of the memory bugs, 50% were a type of memory leak. Many of the bugs occurred along error handling pathways, often due to incorrect checking of returned values (unchecked error values) or missing cleanup (memory leaks,
NULL pointer dereferences, etc.). Based on our analysis of these low-level bugs, 93% would be prevented by using Rust. The remaining 7% of low-level bugs were primarily deadlocks.

Many of the bugs could have serious impacts on the integrity of the system. Of the identified low-level bugs, 26% caused a kernel oops, which either kills the offending process or panics the kernel. An additional 34% of the analyzed bugs would result in a memory leak, potentially leading the system to run out of memory and opening up the system to DoS attacks.

        Safety  Performance  Generality  Online Upgrade
VFS     ✗       ✓            ✓           ✗
FUSE    ✓       ✗            ✓           ✗
eBPF    ✓       ✓            ✗           ✗
Bento   ✓       ✓            ✓           tbd

Table 2: A comparison of Linux file system extensibility mechanisms. None of Linux’s existing mechanisms provide all the desired features.
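The percentages quoted in the bug analysis can be re-derived from the counts in Table 1. The following sketch transcribes the table and recomputes them with rounding to the nearest percent. The struct, the helper names, the per-row category labels, and the reading that the 26% oops figure counts both the "oops" and "Likely oops" rows are illustrative assumptions, not part of the original analysis.

```rust
/// One row of Table 1. The struct and its field names are illustrative.
pub struct BugRow {
    pub name: &'static str,
    pub count: u32,
    pub category: &'static str,
    pub effect: &'static str,
}

const fn row(
    name: &'static str,
    count: u32,
    category: &'static str,
    effect: &'static str,
) -> BugRow {
    BugRow { name, count, category, effect }
}

/// Counts transcribed from Table 1; the category column is inferred from
/// the row order and the table caption.
pub const TABLE_1: [BugRow; 15] = [
    row("Use Before Allocate", 6, "memory", "Likely oops"),
    row("Double Free", 4, "memory", "Undefined"),
    row("NULL Dereference", 5, "memory", "oops"),
    row("Use After Free", 3, "memory", "Likely oops"),
    row("Over Allocation", 1, "memory", "Overutilization"),
    row("Out of Bounds", 4, "memory", "Likely oops"),
    row("Dangling Pointer", 1, "memory", "Likely oops"),
    row("Missing Free", 18, "memory", "Memory Leak"),
    row("Reference Count Leak", 7, "memory", "Memory Leak"),
    row("Other Memory", 1, "memory", "Variable"),
    row("Deadlock", 5, "concurrency", "Deadlock"),
    row("Race Condition", 5, "concurrency", "Variable"),
    row("Other Concurrency", 1, "concurrency", "Variable"),
    row("Unchecked Error Value", 5, "type", "Variable"),
    row("Other Type Error", 8, "type", "Variable"),
];

/// Sum the counts of all rows matching a predicate.
pub fn count_where(pred: impl Fn(&BugRow) -> bool) -> u32 {
    TABLE_1.iter().filter(|r| pred(*r)).map(|r| r.count).sum()
}

/// Integer percentage, rounded to the nearest whole percent.
pub fn percent(part: u32, whole: u32) -> u32 {
    (part * 100 + whole / 2) / whole
}
```

Under these assumptions the table holds 74 bugs, of which 50 are memory bugs (68%), 25 are leaks (50% of memory bugs, 34% of all bugs), 19 have an "oops" or "Likely oops" effect (26%), and 5 are deadlocks (7%).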
Linux has several existing techniques to support rapid evolution of file system functionality. These include the Virtual File System (or VFS) layer built into Linux, FUSE for userspace file systems, and eBPF for running small portions of user-space code safely in the kernel. However, none of these approaches provide all of the properties we need for high velocity development. A summary is shown in Table 2, and details are discussed below. Note that compatibility with existing Linux code is implicit in all of these approaches and in Bento.
VFS:
Linux provides a mechanism for adding new file systems called the Virtual File System (or VFS) layer. This layer defines a set of function pointers to be implemented by new file system modules and calls these functions inside related system calls. It is used by all major file systems in Linux.

This interface prioritizes generality and performance, allowing file systems maximum flexibility when interacting with core kernel components. The resulting interface is complex and has few guardrails, making it difficult for developers to implement new functionality without introducing bugs. While a new file system can be loaded dynamically, an existing file system cannot be modified except by mount/unmount and quiescing application use of the file system. Likewise, debugging support is limited.
FUSE:
Filesystem in Userspace, or FUSE [14], enables running file system code in userspace, via a small kernel VFS layer that forwards operations to the userspace implementation. Thus, FUSE is able to achieve safety and generality, along with the ability to use normal user-level debuggers. This comes at a cost, however. All file system operations pass through VFS and the FUSE kernel driver before being packaged up and copied to userspace, reducing performance by up to 83% [32]. Despite this slowdown, FUSE is frequently used for prototyping new file systems, especially in circumstances where performance is not critical. FUSE does not provide a mechanism for transparent online modification of running file systems, although such a system could theoretically be implemented at user level.
eBPF:
Another approach to safe extensibility in Linux is eBPF (extended Berkeley Packet Filter) [23], an in-kernel virtual machine that allows short extensions, with limited control flow and written in a restricted language, to be run at predefined points in the kernel. While the mainline Linux kernel doesn’t support eBPF for file systems, a project (ExtFUSE [5]) has provided support for parts of a FUSE file system to be run in the kernel using eBPF. For kernel code that can fit within the eBPF model, this provides safe extensibility without significant performance overhead. However, the restrictions placed on eBPF extensions make it very difficult to implement whole file systems or even significant file system extensions using eBPF. ExtFUSE does not support dynamic reconfiguration.
The goal of Bento is to provide for high-velocity development of Linux file systems. To make our design goals concrete, consider the OverlayFS extension to Linux used by Docker. OverlayFS allows for the name space of a file system to be layered on top of another, allowing containers to be configured with a base file system plus changes. Or consider improving the support for non-volatile memory (NVM) in Linux. Systems such as Strata [17] have shown that prepending an operation log stored in NVM can dramatically improve write performance while reducing vulnerability to application-level bugs. These operation logs can be replicated for high availability [2].

Finally, consider what would be needed to add data provenance to Linux: the ability to track all of the data sources and executable images that could have affected a particular output file [31]. If a data source becomes invalid (e.g., because of a change to sensor calibration), provenance can be used to track down what derived data needs to be regenerated. Further, old versions of data files may need to be retained (and later garbage collected) if they are part of the provenance of live output files.

In all three cases, the functionality needs to work with existing, unmodified Linux binaries, has complex internal logic and data structures, is performance-sensitive, benefits from ongoing development, and, to be deployable, must not compromise the security of the rest of the operating system. We assume the developer is well-intentioned but a bit clumsy; it is not our intent to prevent malicious insider attacks for newly developed code.

Thus, our framework must support several, seemingly conflicting, goals:
• Safety: Any bugs in a newly installed file system should be limited, as much as possible, to applications or containers that use that file system. These bugs should be kept to a minimum.
• Performance: Performance should be similar to that achievable by the same functionality implemented directly in the kernel.
• Generality: There is a large variety of file system designs that developers might want to implement. The framework should not limit the types of file systems that can be developed.
• Compatibility: New functionality should be deployable to existing, unmodified Linux binaries without recompiling or relinking, and without substantial changes to Linux’s internal architecture.
• Development velocity: The framework should support dynamic upgrades to running file system code, transparently to applications, except for a small delay. Further, code should be easily migratable between user level and the kernel, to enable use of modern debugging and software analysis tools. This last goal is supported architecturally by our approach, but experimental demonstration is beyond the scope of this paper.

Our high level approach for Bento is to enable writing file systems in a safe, non-garbage collected language, specifically Rust. This is able to provide the first three goals detailed above. Rust’s strict type system is able to provide safety, eliminating certain classes of bugs such as
NULL pointer dereferences or use-after-free bugs. Since Rust is compiled like C and does not use garbage collection, it has performance similar to C and does not suffer from performance unpredictability caused by garbage collectors. Rust is a general purpose programming language and provides the necessary generality to enable writing a wide variety of file systems.

To realize this approach, we need to address several challenges. Compatibility with existing operating systems and online upgrades, the other two goals for this work, are not inherently provided by writing file systems in Rust. Bento must provide additional support in order to achieve these properties. However, challenges arise when trying to provide that support.
In order for a Rust file system to execute in the Linux kernel, there must be a way for the Rust file system to interact with the C kernel. A naive approach is simply to compile the Rust file system into a binary and load it into the kernel. Rust is designed to interface easily with code written in other languages, particularly C, using its Foreign Function Interface. Rust code can call functions written in C and vice versa, and Rust data structures can be tagged so they use C-style memory layout. In fact, without considering any other factors, running Rust code in the Linux kernel is fairly straightforward.

However, this naive approach does not maintain the safety of the Rust file systems. Rust code that calls external functions or dereferences raw pointers must be tagged as unsafe. Rust’s type system is not able to provide the same guarantees about unsafe code, e.g.
NULL pointer dereferences and out-of-bounds accesses are possible, so unsafe code cannot provide the safety we require for Bento. Simple techniques for introducing safety, such as wrapping C functions in safe wrappers or replacing pointers with references, are not enough to fully provide safety due to fundamental challenges caused by kernel design patterns, which we now describe. We assume that the kernel is correct.
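As a minimal sketch of the naive approach, the following shows a struct tagged for C-style layout, a function with the C calling convention that takes a raw pointer, and a simple safe wrapper. All names are hypothetical; the "C" function is written in Rust here only so the example is self-contained, whereas in a kernel module it would be implemented in C and merely declared on the Rust side.

```rust
use std::os::raw::c_int;

/// Tagged with `repr(C)` so the field layout matches what C code expects.
#[repr(C)]
pub struct BlockRequest {
    pub block_no: u64,
    pub len: c_int,
}

/// Stand-in for a function implemented in C (hypothetical name). It is
/// marked `unsafe` to mirror the rule that foreign functions must be
/// called from unsafe code.
///
/// # Safety
/// `req` must be null or point to a valid `BlockRequest`.
pub unsafe extern "C" fn c_request_len(req: *const BlockRequest) -> c_int {
    if req.is_null() {
        return -1;
    }
    // Dereferencing a raw pointer is only allowed in unsafe code: the
    // compiler cannot check that the pointer is valid.
    unsafe { (*req).len }
}

/// A simple safe wrapper: a reference is never null or dangling, so the
/// precondition of the foreign function holds for this one call.
pub fn request_len(req: &BlockRequest) -> c_int {
    unsafe { c_request_len(req as *const BlockRequest) }
}
```

A caller can write `request_len(&BlockRequest { block_no: 7, len: 4096 })` without any unsafe code. As the text notes, wrappers of this kind cover only the easy cases; they do not address ownership transfer across the boundary.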
One challenge is caused by memory management for data structures passed across the boundary between the file system and the kernel. Rust is able to provide memory safety and automatic memory management by doing compile-time tracking of data structures. However, the VFS interface requires that some data structures created by the file system be passed across the kernel/file system boundary and back again. Since the Rust compiler is not able to analyze the code outside the file system, it is not able to verify the safety of taking ownership of data structures from the kernel. Therefore, the VFS file system interface cannot be implemented in safe Rust.
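The asymmetry can be illustrated with a small sketch using the standard `Box::into_raw` and `Box::from_raw` functions. The `Inode` type and the two function names are hypothetical stand-ins for a VFS-style allocate/destroy pair, not the kernel's actual interface.

```rust
/// Hypothetical in-memory inode allocated by the file system.
pub struct Inode {
    pub ino: u64,
    pub size: u64,
}

/// Hand an inode to the other side of the boundary as a raw pointer.
/// This is safe Rust: giving up ownership can at worst leak memory,
/// which does not violate memory safety.
pub fn alloc_inode(ino: u64) -> *mut Inode {
    Box::into_raw(Box::new(Inode { ino, size: 0 }))
}

/// Take the inode back and free it.
///
/// # Safety
/// `ptr` must have been returned by `alloc_inode`, must not have been
/// freed already, and must not be used again by anyone. The compiler
/// cannot verify any of this, so the function has to be `unsafe`.
pub unsafe fn destroy_inode(ptr: *mut Inode) -> u64 {
    let inode = unsafe { Box::from_raw(ptr) };
    inode.ino
    // `inode` is dropped here, releasing the allocation.
}
```

Only the second function needs `unsafe`, which matches the observation that handing objects to the kernel is unproblematic while taking them back is not.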
Another challenge stems from the file system’s need to access services provided by the kernel. However, the interfaces exposed by kernel services are not designed for Rust’s safety guarantees, so kernel services cannot necessarily be exposed safely to Rust file systems without modifications. To allow the file system to use kernel services safely, Bento must translate the unsafe kernel-provided interfaces into interfaces that can be used by the file systems safely.
Online upgrades, that is, updating a file system without bringing it offline, are also not provided by writing the file systems in a safe language. In Linux today, file system module upgrades are done by shutting down all services relying on the file system, unmounting the file system, removing the module, inserting the new module, mounting the new file system, and then restarting all services. In order to support online upgrades, additional functionality must be added to enable updating to a new version of the file system without requiring the file system or services running on top of it to be shut down. Trying to implement that functionality in Linux gives rise to the following challenges.
The memory management pattern described in §3.1.1, where data structures created by the file system are passed to the kernel, also introduces challenges for online upgrades. The kernel holds data structures backed by file system memory, and the file system has no way to control when that memory should be reclaimed. If the file system were updated when there were outstanding data structures held by the kernel, those kernel pointers would become invalid. To avoid this case, the file system must wait for the kernel to have completed all operations on the file system and have returned all shared-ownership data structures to the file system. There is no guarantee of this happening until the file system is unmounted, so upgrades cannot be done online.
Another challenge is caused by the need to track data structures that the file system is currently using, both data structures from kernel services and in-memory data used by the file system. For example, a running file system will execute block I/O or possibly network operations and may be using kernel data structures for those operations when the upgrade occurs. The file system could also have internal, in-memory state such as which blocks need to be written to a commit log or a cache of on-disk data structures. If the file system updates without transferring any of its in-use data structures, potentially bad behavior can occur. In the best case, caches of on-disk data structures need to be rebuilt, and performance temporarily suffers. In the worst case, correctness conditions could be violated if the file system requires long-lived state. Since the existing techniques for upgrades in Linux assume that the file system will be completely shut down during the upgrade, there are no mechanisms to transfer data structures.
The ability to quickly and effectively debug code is critical for fast development in practice. Kernel code is notoriously difficult to debug because of the often non-local effects of kernel bugs and the potential for a buggy operating system to interfere with the process of debugging. In order to enable effective debugging, we propose allowing file systems written using Bento to be run in userspace without requiring code modifications.
To support running the same code in the kernel and in userspace, we must provide an API that can be implemented in both. All APIs, both for Bento to call file system functions and for the file system to access necessary services, must be the same in both the kernel and userspace. Providing compatibility with Linux will not necessarily provide this because the interfaces provided by kernel services may not be compatible with the system call interface.
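One way to picture this requirement is a single service interface with interchangeable backends. The sketch below assumes a hypothetical `BlockStore` trait and an in-memory backend standing in for the userspace side; a kernel backend would implement the same trait on top of kernel block I/O. None of these names are taken from Bento.

```rust
pub const BLOCK_SIZE: usize = 512;

/// Hypothetical service interface seen by the file system. The same
/// trait would be implemented once for the kernel and once for userspace.
pub trait BlockStore {
    fn read_block(&self, block_no: usize) -> Option<[u8; BLOCK_SIZE]>;
    fn write_block(&mut self, block_no: usize, data: &[u8; BLOCK_SIZE]) -> bool;
}

/// Userspace-style backend: blocks live in ordinary heap memory.
pub struct MemStore {
    blocks: Vec<[u8; BLOCK_SIZE]>,
}

impl MemStore {
    pub fn new(num_blocks: usize) -> Self {
        MemStore { blocks: vec![[0u8; BLOCK_SIZE]; num_blocks] }
    }
}

impl BlockStore for MemStore {
    fn read_block(&self, block_no: usize) -> Option<[u8; BLOCK_SIZE]> {
        self.blocks.get(block_no).copied()
    }

    fn write_block(&mut self, block_no: usize, data: &[u8; BLOCK_SIZE]) -> bool {
        match self.blocks.get_mut(block_no) {
            Some(slot) => {
                *slot = *data;
                true
            }
            None => false,
        }
    }
}

/// File system logic written once against the trait, so it does not
/// depend on which backend it runs over.
pub fn write_magic<S: BlockStore>(store: &mut S, magic: u32) -> bool {
    let mut block = [0u8; BLOCK_SIZE];
    block[..4].copy_from_slice(&magic.to_le_bytes());
    store.write_block(0, &block)
}

pub fn read_magic<S: BlockStore>(store: &S) -> Option<u32> {
    let block = store.read_block(0)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&block[..4]);
    Some(u32::from_le_bytes(bytes))
}
```

Because `write_magic` and `read_magic` only name the trait, the same source could be compiled against either backend, which is the property that makes userspace debugging of unmodified file system code possible.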
The ability to reuse code is also important for development velocity. This is particularly relevant for file systems because there are many circumstances when a user would want to modify the behavior of an underlying file system, such as enabling encryption or tracking data provenance. In Linux today, developers can implement these types of file systems by stacking layers of file systems (e.g., the ecryptfs file system can be layered on top of another file system to add encryption). The higher layer file systems call top-level VFS functions to access the lower file systems as if the relevant system call had been executed. This support for stackable, or composable, file systems allows developers to provide services as file system modules that can be used with any existing file system.
Linux’s existing model for composable file systems can be supported by exposing the top-level VFS functions to Bento file systems. However, it is not clear that this is the best solution. Calling top-level VFS functions can add overhead to each call to a lower file system, resulting in potentially large overhead if several file systems are layered on top of one another. Bento may be able to provide a different interface for supporting composable file systems that does not introduce this overhead but still provides the necessary flexibility.
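As a purely illustrative sketch of what direct layering could look like, the following composes a toy "encryption" layer over a lower layer through a shared trait, so that calls between layers are ordinary statically dispatched function calls. The trait, the types, and the XOR scrambling (which is not real encryption) are all assumptions made for the example; this is not Bento's design, which the text leaves open.

```rust
/// Hypothetical interface shared by every layer in a stack.
pub trait FileLayer {
    fn write(&mut self, data: &[u8]);
    fn read(&self) -> Vec<u8>;
}

/// Bottom layer: stores the bytes of a single file in memory.
pub struct MemFile {
    data: Vec<u8>,
}

impl MemFile {
    pub fn new() -> Self {
        MemFile { data: Vec::new() }
    }
}

impl FileLayer for MemFile {
    fn write(&mut self, data: &[u8]) {
        self.data = data.to_vec();
    }

    fn read(&self) -> Vec<u8> {
        self.data.clone()
    }
}

/// Upper layer: scrambles data with a one-byte XOR key before passing it
/// down. This stands in for an encryption layer and is NOT secure.
pub struct XorLayer<L: FileLayer> {
    lower: L,
    key: u8,
}

impl<L: FileLayer> XorLayer<L> {
    pub fn new(lower: L, key: u8) -> Self {
        XorLayer { lower, key }
    }

    /// Access to the lower layer, used here only to inspect stored bytes.
    pub fn lower(&self) -> &L {
        &self.lower
    }
}

impl<L: FileLayer> FileLayer for XorLayer<L> {
    fn write(&mut self, data: &[u8]) {
        let scrambled: Vec<u8> = data.iter().map(|b| b ^ self.key).collect();
        // A direct call into the lower layer, with no system-call-like
        // entry point in between.
        self.lower.write(&scrambled);
    }

    fn read(&self) -> Vec<u8> {
        self.lower.read().iter().map(|b| b ^ self.key).collect()
    }
}
```

With `XorLayer::new(MemFile::new(), 0x5a)`, reads through the upper layer return the original bytes while the lower layer only ever sees the scrambled form. Since `XorLayer` itself implements `FileLayer`, layers can be nested to any depth.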
The design of Bento is shown in Figure 1. Shaded portions are the framework. The framework runs as a thin layer that sits between the unmodified Linux kernel and kernel-level file systems designed for our framework. The Linux kernel is unmodified other than the introduction of Bento. Like the VFS layer, Bento defines a set of function calls that file systems must implement and provides a mechanism for file systems to register themselves with the framework by providing the necessary function pointers. Unlike the VFS layer, Bento is designed to support file systems written in Rust, a type-safe language that provides memory safety and data race freedom.

Challenge                            Solution                                  Problem Description  Detailed Solution
Unsafe Shared Memory Management      Restricted Memory Sharing                 §3.1.1, §3.2.1       §4.3
Unsafe Kernel Interfaces             Safe Abstractions Around Kernel Services  §3.1.2               §4.5
Transferring Objects During Upgrade  Online Upgrade Component                  §3.2.2               §4.8

Table 3: Summary of Challenges and the Associated Solutions

Figure 1: The design of Bento. (a) Kernel Bento: VFS, BentoFS, File System, BentoKS, Kernel Services, with ① the File Operations API and ② the Kernel Services API. (b) Userspace Bento for Debugging: Posix, BentoFS-User, File System, BentoKS-User, clib.

Table 3 shows a summary of the challenges and solutions in Bento. Bento currently consists of two components. One component of the framework interposes between the VFS calls and the file system, handling calls into the file system. This component provides the file operations API, translating from the VFS interface. The other component interposes below the file system, handling calls out of the file system into the kernel. This component provides wrappers around kernel data structures and functions, allowing the Rust file system to safely access relevant kernel functionality. For file systems, this primarily handles block I/O.
To write a file system using Bento, developers write a safe Rust kernel module using the provided APIs and insert that module into their running Linux kernel like any other kernel module. File system functions are exposed to the operating system by implementing the file operations API and providing those function pointers to Bento when the file system is inserted. When file system functions need to access kernel functionality, they can do so by calling the safe Rust functions provided by the kernel services API.
The VFS layer and the patterns it introduces cause fundamental challenges to safety when handling memory management of shared data structures, in particular inodes. Inodes are allocated and destroyed using functions implemented in the file system and called by the kernel. When the kernel needs a new inode, it requests one from the file system, which allocates the inode using its own memory pool. When the kernel is finished with the inode, it returns the inode to the file system so the memory can be reclaimed. Giving ownership from Rust to C can be implemented in Rust by leaking the memory behind the data structure; this is safe because leaking memory does not violate Rust’s memory safety, but is not ideal. Taking ownership from C to Rust cannot be implemented safely. Rust must trust that the data structure will not be used anymore and was originally allocated by Rust. Since these properties cannot be validated by the Rust compiler, this is inherently unsafe.

In order to enable safe file systems, Bento must provide a different interface than the VFS layer for file system operations. Calls from the VFS layer are intercepted by BentoFS and translated into this new interface, shown in Figure 1 at ①. This interface calls from BentoFS to the file system, so the interface must be designed so it can be implemented safely. To support this, we define a model that our interface must follow.

Our interface follows what we call an “ownership model”, borrowing the terminology from Rust. In this model, ownership of an object can never be passed across the interface, but objects can be “borrowed”. For each object, one side of the interface is responsible for both the lifetime management (tracking when the object is no longer needed) and memory management. To share an object, the caller passes a reference to the object to the callee. This does not pass ownership (the callee has no control over the underlying memory) but does allow the callee to access the object.
This is analogous to a borrow in Rust and similarly can be mutable or immutable, allowing modification of the object or not, respectively. To the file system developer, this is just writing typical Rust code.

This model implies a contract between the caller and the callee. The caller is responsible for ensuring that the object is not freed while it has been borrowed, that the object is valid, and that only one mutable borrow exists at one time. The callee is responsible for only accessing an object during the borrow window, accessing objects correctly (i.e., no pointer arithmetic), and only mutating objects during a mutable borrow.

In this case, the callee is the file system, written in Rust. All of the callee’s responsibilities are checked by the Rust compiler when using safe Rust, so the file system is guaranteed to uphold the model. Our framework is the caller and must be carefully designed to fulfill its side of the contract.

This ownership model can be viewed as a relaxed version of what is needed across address space boundaries where no memory can be shared. This observation led us to leverage the FUSE kernel module and the FUSE low-level API when developing BentoFS and the file operations API. The file operations API is a Rust version of the FUSE low-level API augmented with a reference to the super_block data structure needed for file system block operations.

This model should not introduce significant performance overhead. The loan/borrow model is only used to check compile-time properties, so it does not add performance overhead at runtime. The performance impact of the interface change is more difficult to predict, but should still be low. The interface design does not increase the functionality needed to implement a file system; it just splits the behavior implemented by a VFS file system between BentoFS and the file system.
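The contract can be sketched with ordinary Rust borrows. In the example below, the framework side owns every object and only lends references for the duration of one call. The trait, the types, and the method signature are hypothetical and much simpler than the FUSE-derived file operations API described above.

```rust
/// Framework-owned objects (hypothetical stand-ins for kernel-backed data).
pub struct SuperBlock {
    pub block_size: u32,
}

pub struct Attr {
    pub ino: u64,
    pub size: u64,
}

/// Hypothetical file operations interface. Every argument is a borrow:
/// `&SuperBlock` is an immutable loan, `&mut Attr` is a mutable loan.
/// Nothing passes ownership across the interface.
pub trait FileSystem {
    fn getattr(&self, sb: &SuperBlock, ino: u64, out: &mut Attr);
}

/// The callee. The compiler enforces its side of the contract: it cannot
/// keep `sb` or `out` after the call returns, and it cannot mutate `sb`.
pub struct ToyFs;

impl FileSystem for ToyFs {
    fn getattr(&self, sb: &SuperBlock, ino: u64, out: &mut Attr) {
        out.ino = ino;
        // Toy rule for the example: file size grows with the inode number.
        out.size = u64::from(sb.block_size) * ino;
    }
}

/// The caller (the framework side). It owns the objects, keeps them alive
/// for the whole call, and hands out at most one mutable borrow.
pub fn dispatch_getattr<F: FileSystem>(fs: &F, ino: u64) -> Attr {
    let sb = SuperBlock { block_size: 4096 };
    let mut attr = Attr { ino: 0, size: 0 };
    fs.getattr(&sb, ino, &mut attr);
    // Both borrows have ended; the caller still owns `sb` and `attr`.
    attr
}
```

Calling `dispatch_getattr(&ToyFs, 3)` yields an `Attr` with `ino == 3`. An attempt by `ToyFs` to stash `sb` in a field for later use would be rejected at compile time, which is the sense in which the callee's obligations are checked for free.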
File systems need access to kernel functionality implemented outside the file system, such as block I/O for access to the underlying storage device. These kernel interfaces, like those in the VFS layer, are not designed to abide by type-safety properties and so cannot be directly used in the file system. In order to enable use of necessary kernel services, BentoKS provides safe abstractions around kernel data structures and functions.

As an example, we will focus on the kernel block I/O functions. File systems in Linux access block devices using the buffer cache. In this API, a file system that needs to read or write to a block device calls sb_bread, passing in a pointer to the super_block data structure and a block number. This function returns a buffer_head data structure representing the requested block. The block’s data is represented as a pointer and size in the buffer_head, and the file system can read and/or write to this memory region. When the file system is done using the buffer_head, it must call brelse or buffers can be leaked.

The widespread use of pointers and pointer manipulation in the Linux kernel makes this challenging. Safe Rust disallows dereferencing raw pointers because the compiler cannot check the validity of the memory being pointed to. Rust instead relies on typed references that cannot be offset, cast to nonequivalent types, or
NULL safely. However, many kernel interfaces rely on pointers, so these interfaces cannot be used by the file system safely.
In order to access kernel functionality, the file system must be able to use kernel data structures, both for calling kernel functions and for making use of objects provided to the file system by the file operations API described above. The kernel operates on pointers, but directly exposing these pointers to the file system results in safety errors. If the block I/O functions exposed to the file system accept a pointer to the superblock, no guarantees can be made about the memory layout underlying that pointer.

We use a capability-based model to safely expose kernel pointers to the file system, where pointers are replaced by capability-style types defined in Bento. These types give the file system the right to access the fields of the data structure and to call functions that are exposed by that type. Creation of these capability types is limited; they cannot be safely cast from other types, and initialization is predefined and sometimes entirely disallowed. Bento converts between the capability type and the analogous kernel type. For example, the file system often receives the SuperBlock capability type from the file operations API to represent the kernel super_block data structure. It can use the SuperBlock capability type to read fields of the kernel super_block and call kernel functions like sb_bread for block I/O that require a kernel super_block. The SuperBlock type cannot be created by the file system, so having this type is proof that the file system has access to a valid kernel super_block. Bento can then safely convert the capability type to a pointer and directly access kernel functions.

The capability types are compile-time wrappers around pointers so the Rust compiler can enforce correctness properties at compile time. It is assumed that the kernel passes in valid pointers, so no properties need to be checked at runtime and no runtime overhead is added.
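To make the capability model concrete, the following is a minimal userspace sketch, not Bento’s actual code. The names RawSuperBlock, s_blocksize, dispatch, and fs_block_size are assumptions introduced for illustration; RawSuperBlock stands in for the kernel super_block and models a single field.

```rust
pub mod bento {
    /// Hypothetical stand-in for the kernel's C `super_block`; one field only.
    #[repr(C)]
    pub struct RawSuperBlock {
        pub s_blocksize: u64,
    }

    /// Capability type: a wrapper around a kernel pointer. The field is private,
    /// so code outside this module cannot construct or forge a `SuperBlock`.
    #[repr(transparent)]
    pub struct SuperBlock {
        raw: *const RawSuperBlock,
    }

    impl SuperBlock {
        /// Safe accessor exposed to the file system.
        pub fn block_size(&self) -> u64 {
            // Sound only because `dispatch` is the sole constructor and it
            // starts from a valid reference that outlives the capability.
            unsafe { (*self.raw).s_blocksize }
        }
    }

    /// Framework entry point (assumed name): wraps the kernel object and lends
    /// the capability to a file system operation for the duration of the call.
    pub fn dispatch<R>(raw: &RawSuperBlock, fs_op: impl FnOnce(&SuperBlock) -> R) -> R {
        let cap = SuperBlock { raw: raw as *const RawSuperBlock };
        fs_op(&cap)
    }
}

/// File system code: safe Rust only, with no raw pointers visible.
pub fn fs_block_size(sb: &bento::SuperBlock) -> u64 {
    sb.block_size()
}
```

In this sketch the wrapper is marked repr(transparent), so it has the same size as the pointer it holds, and holding a SuperBlock is the only way to reach block_size.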
To enable file systems written in safe Rust, Bento must also provide safe abstractions wrapping kernel services. These abstractions can be used by the file system like any other Rust data structures and functions.

To be concrete, we address the example discussed above. We provide a safe abstraction to wrap the kernel buffer_head. We implement a method on the BufferHead wrapper to convert the separate pointer and size fields for the contained memory region into a sized memory region that can be used safely. That method must use unsafe code to make a sized memory region out of the unsized pointer and size fields, but the file system can call the method safely. To prevent accidental memory leaks, we call the brelse function in the drop method of the BufferHead wrapper, which is called when the wrapper goes out of scope. With this, buffer management has the same properties as memory management in Rust: memory leaks are possible but difficult.

These abstractions can, in some cases, add a small amount of performance overhead. If a kernel function has requirements on its arguments, the wrapping method will most likely need to perform a runtime check to ensure that the requirements hold. This overhead should be small since such checks are simple and are not performed often.
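The wrapper described above can be sketched in userspace as follows. This is an illustration under stated assumptions rather than Bento’s implementation: fake_bread and fake_brelse are hypothetical stand-ins for sb_bread and brelse, RawBufferHead models only a pointer and a size, and the OUTSTANDING counter exists solely to make the release observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of buffers handed out and not yet released (demo bookkeeping only).
static OUTSTANDING: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the kernel `buffer_head`: a raw pointer plus a size.
#[repr(C)]
struct RawBufferHead {
    b_data: *mut u8,
    b_size: usize,
}

/// Userspace stand-in for `sb_bread`: allocates a zeroed block.
fn fake_bread(size: usize) -> *mut RawBufferHead {
    let data = Box::into_raw(vec![0u8; size].into_boxed_slice()) as *mut u8;
    OUTSTANDING.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(RawBufferHead { b_data: data, b_size: size }))
}

/// Userspace stand-in for `brelse`: frees what `fake_bread` allocated.
/// Safety: `bh` must come from `fake_bread` and must not be used afterwards.
unsafe fn fake_brelse(bh: *mut RawBufferHead) {
    unsafe {
        let head = Box::from_raw(bh);
        let block = std::ptr::slice_from_raw_parts_mut(head.b_data, head.b_size);
        drop(Box::from_raw(block));
    }
    OUTSTANDING.fetch_sub(1, Ordering::SeqCst);
}

/// Safe wrapper handed to the file system.
pub struct BufferHead {
    raw: *mut RawBufferHead,
}

impl BufferHead {
    pub fn read_block(size: usize) -> BufferHead {
        BufferHead { raw: fake_bread(size) }
    }

    /// Turns the separate pointer and size fields into a bounds-checked slice.
    pub fn data_mut(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut((*self.raw).b_data, (*self.raw).b_size) }
    }
}

impl Drop for BufferHead {
    /// Releasing the buffer on drop means the file system cannot forget to do it.
    fn drop(&mut self) {
        unsafe { fake_brelse(self.raw) }
    }
}

pub fn outstanding_buffers() -> usize {
    OUTSTANDING.load(Ordering::SeqCst)
}
```

The unsafe code is confined to the wrapper; a caller only sees a mutable byte slice whose length is checked on every access, and the buffer is released when the wrapper goes out of scope.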
In order to enable online upgrades, Bento will provide a mediating layer that maintains any state that needs to be preserved through the upgrade, such as long-lived kernel data structures like a network connection for a networked file system or internal file system state like an in-memory cache of on-disk data structures. Bento is already a runtime in the kernel, so it can easily be extended to include the necessary functionality.

This component will need to have a data structure transfer mechanism so important data structures can be passed from the old version of the file system to the new version during the upgrade. Kernel data structures can already be tracked by Bento through the kernel services API, and functionality can be added to support transferring these data structures.

To transfer file system internal data structures, the online upgrade component will extend Bento’s interface with new functions for storing in-memory state and initializing from that provided state. When the old version of the file system is about to be stopped, the online upgrade component will call the file system’s provided function. This function will perform any necessary shutdown, such as flushing state, and will return in-memory state that should be transferred. This state will then be passed to the new version of the file system when it starts up so it can restore the necessary in-memory state.
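Since this interface is future work, its shape is not fixed. One possible shape for the two functions is sketched below, purely as an assumption: the trait Upgradable, its methods prepare_upgrade and init_from, the TransferState struct, and the FsV1 and FsV2 types are all hypothetical names introduced for illustration.

```rust
/// Hypothetical in-memory state carried across an upgrade.
pub struct TransferState {
    pub cached_blocks: Vec<u64>,
}

/// Hypothetical hooks a file system would provide to the upgrade component.
pub trait Upgradable: Sized {
    /// Called on the old version just before it is stopped:
    /// flush what must be flushed, then hand the in-memory state over.
    fn prepare_upgrade(&mut self) -> TransferState;
    /// Called on the new version at startup to restore the transferred state.
    fn init_from(state: TransferState) -> Self;
}

pub struct FsV1 {
    pub cache: Vec<u64>,
    pub dirty: bool,
}

pub struct FsV2 {
    pub cache: Vec<u64>,
    pub upgraded: bool,
}

impl Upgradable for FsV1 {
    fn prepare_upgrade(&mut self) -> TransferState {
        self.dirty = false; // stands in for flushing dirty state to disk
        TransferState { cached_blocks: std::mem::take(&mut self.cache) }
    }
    fn init_from(state: TransferState) -> Self {
        FsV1 { cache: state.cached_blocks, dirty: false }
    }
}

impl Upgradable for FsV2 {
    fn prepare_upgrade(&mut self) -> TransferState {
        TransferState { cached_blocks: std::mem::take(&mut self.cache) }
    }
    fn init_from(state: TransferState) -> Self {
        FsV2 { cache: state.cached_blocks, upgraded: true }
    }
}

/// Mediating layer: stop the old version, then start the new one from its state.
pub fn upgrade<Old: Upgradable, New: Upgradable>(mut old: Old) -> New {
    let state = old.prepare_upgrade();
    drop(old);
    New::init_from(state)
}
```

Under these assumptions, calling upgrade on an FsV1 value yields an FsV2 whose cache holds the blocks the old version had cached.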
To support easy debugging, Bento will enable developers to run the same code in userspace and in the kernel and so use userspace debuggers. To enable this, Bento will provide alternate implementations of the BentoFS and BentoKS components that interface with user-level interfaces, specifically the POSIX API instead of VFS and C library functions instead of kernel services. Since the interfaces exposed by the kernel and by userspace libraries are different, it is not obvious that the APIs written for the kernel can be implemented without modification. We will analyze and implement this as part of our future work.
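One way to picture running the same file system code against two backends is to write the file system logic against a trait and swap the implementation. The sketch below is an assumption about how this could look, not the planned design: the BlockIo trait, MemDisk, write_superblock, and read_superblock are hypothetical names, and only an in-memory userspace backend is shown.

```rust
/// Hypothetical block-I/O service the file system logic is written against.
pub trait BlockIo {
    fn read_block(&self, blockno: usize) -> Vec<u8>;
    fn write_block(&mut self, blockno: usize, data: &[u8]);
}

pub const BLOCK_SIZE: usize = 512;

/// Userspace backend held in memory; a kernel build would supply another
/// implementation of the same trait on top of the buffer cache.
pub struct MemDisk {
    blocks: Vec<[u8; BLOCK_SIZE]>,
}

impl MemDisk {
    pub fn new(nblocks: usize) -> Self {
        MemDisk { blocks: vec![[0u8; BLOCK_SIZE]; nblocks] }
    }
}

impl BlockIo for MemDisk {
    fn read_block(&self, blockno: usize) -> Vec<u8> {
        self.blocks[blockno].to_vec()
    }

    fn write_block(&mut self, blockno: usize, data: &[u8]) {
        // Copy at most one block; extra bytes are ignored in this sketch.
        let n = data.len().min(BLOCK_SIZE);
        self.blocks[blockno][..n].copy_from_slice(&data[..n]);
    }
}

/// File system logic, generic over the backend: stores a magic number in block 1.
pub fn write_superblock<D: BlockIo>(disk: &mut D, magic: u32) {
    disk.write_block(1, &magic.to_le_bytes());
}

/// Reads the magic number back from block 1.
pub fn read_superblock<D: BlockIo>(disk: &D) -> u32 {
    let block = disk.read_block(1);
    u32::from_le_bytes([block[0], block[1], block[2], block[3]])
}
```

Because write_superblock and read_superblock never name a concrete backend, the same functions can run under a userspace debugger with MemDisk and, in principle, in the kernel with a different implementation.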
Bento is built in Linux kernel version 4.15. It is implemented as a Linux kernel module in 1409 lines of Rust code for BentoKS and 7409 lines of C code for BentoFS.
Writing a kernel module in Rust is different from writing userspace Rust code. The basic structure of our kernel module is borrowed from tsgates/rust.ko on GitHub. The kernel module is compiled as a static library which is then linked with any required C code to generate the kernel module (a .ko file). This kernel module can then be inserted into the kernel as normal by any sudo user. Kernel code in Rust, like all kernel code, cannot use the standard library, but the Rust core library can still be used. We found that we had to additionally limit the Rust implementation to code that can’t cause a panic.

The Rust portions of the Bento kernel module must interface with C code. Rust data structures can be tagged with #[repr(C)] to force the memory layout to match the C layout of the same structure, allowing the data structure to be passed across the language boundary. Rust functions can be called from C as long as they are tagged with #[no_mangle], preventing the Rust compiler from mangling the name of the function. Rust’s FFI (Foreign Function Interface) enables Rust code to seamlessly call functions implemented in C. The Rust code only needs to define the function interface in an extern block and the functions will be linked at compile time. The Rust bindgen tool can be used to automatically generate these bindings from C header files.
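As a small illustration of the language boundary, the sketch below shows a structure with C layout and a function using the C calling convention. BlockRequest and request_end are hypothetical names, not part of Bento. The symbol-export attribute and the extern block for calling into C are left out so that the example compiles on its own without any C code to link against.

```rust
/// Hypothetical request record shared with C code. With C layout the fields
/// keep their declared order and receive C-style padding.
#[repr(C)]
pub struct BlockRequest {
    pub blockno: u64,
    pub len: u32,
    pub write: u8,
}

/// Uses the C calling convention, so C code could call it through a function
/// pointer. Returns the first block number past the request, or 0 for null.
/// Safety: `req` must be null or point to a valid `BlockRequest`.
pub unsafe extern "C" fn request_end(req: *const BlockRequest) -> u64 {
    if req.is_null() {
        return 0;
    }
    let req = unsafe { &*req };
    req.blockno + u64::from(req.len)
}
```

With this layout the structure occupies 16 bytes on common targets: 8 for blockno, 4 for len, 1 for write, and 3 bytes of trailing padding.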
One of the primary jobs of Bento is to interpose between the VFS layer and the file system. As part of this translation, the file operations component of Bento must handle the interactions with core kernel data structures that are expected of a file system written against the VFS layer.

We use the FUSE kernel module and the FUSE low-level interface as starting points for BentoFS and the file operations API. The FUSE kernel module must implement much of the same functionality as BentoFS, so we use a modified version of it to implement BentoFS.

Unlike in FUSE, the file operations layer and the file system reside in the same address space and trust domain. Bento can therefore communicate with the file system using function calls. Our framework implements this like the VFS layer; function pointers to file system operations are stored in a data structure that is provided to Bento when the file system is mounted and upgraded.

Implementing Safe Wrappers
Each Rust capability type is implemented as a Rust type with one field: a pointer to the relevant kernel type. This enables BentoFS to pass a pointer to the kernel data structure to the file system functions with no overhead. BentoKS implements methods on these capability types that the file system can use to safely access kernel functionality. These methods can be called from the Rust file system on the capability type data structures even though these were originally allocated as C data structures.
We evaluate the performance of Bento to determine what, if any, overheads exist to using it. For this, we have implemented the file system from the xv6 teaching operating system and two variants: one written in C, running in the kernel using the VFS layer, and one written in Rust, running in userspace using FUSE. By comparing against the VFS layer, we can determine the overhead Bento introduces. By comparing against FUSE, we can quantify the benefits of Bento relative to a purely user-level file system.

Since xv6 is a toy operating system, its file system is missing optimizations that a commercial-grade file system would have. This can heavily impact the FUSE baseline because the unoptimized operations may be particularly expensive from userspace. The VFS baseline is also less optimized than Bento because Bento inherits optimizations from the FUSE kernel module while the VFS baseline was just written for this evaluation. Therefore, the xv6 evaluation could be somewhat unfairly optimistic to Bento when compared to the same evaluation on a commercial-grade file system. We therefore also compare against ext4 on the macrobenchmarks. Ext4 is more optimized than the xv6 file system, but the performance results can still be compared to understand ballpark performance differences. Relatively small differences can indicate that our results may be similar to those we would achieve on a commercial-grade file system. We mount ext4 with the data=journal option so it logs file data in the journal like the xv6 file system.
In order to write the xv6 file system in Rust and run the benchmarks, some changes needed to be made to the file system design. In all versions of the file system, we needed to add locks around inode and block number allocations due to race conditions on the block device. We also added double indirect blocks to all three versions of the file system so files up to 4GB could be created. In general, the Rust versions include more locks than the C version and the official xv6 repository [34], specifically on global mutable variables that are only modified during initialization. Otherwise, the Rust file systems are nearly identical to the C file systems.
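The extra locks follow from a language rule: safe Rust does not allow unsynchronized access to a mutable global, even one that is written only once. A minimal userspace sketch of the pattern is shown below, using the standard library mutex. SuperInfo, SUPER_INFO, init_super, and total_blocks are hypothetical names, and the kernel version would use a kernel lock instead.

```rust
use std::sync::Mutex;

/// Hypothetical global file system metadata, written once at mount time.
pub struct SuperInfo {
    pub nblocks: u32,
    pub ninodes: u32,
}

/// In C this could be a plain global variable. In safe Rust, a global that is
/// ever mutated must sit behind a synchronization primitive.
static SUPER_INFO: Mutex<Option<SuperInfo>> = Mutex::new(None);

/// Called once during initialization.
pub fn init_super(nblocks: u32, ninodes: u32) {
    *SUPER_INFO.lock().unwrap() = Some(SuperInfo { nblocks, ninodes });
}

/// Every later read still takes the lock, even though the value never changes.
pub fn total_blocks() -> Option<u32> {
    SUPER_INFO.lock().unwrap().as_ref().map(|sb| sb.nblocks)
}
```

Reads after initialization pay for a lock acquisition that the C version can skip, which is the cost the text refers to.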
As a baseline, we implement a VFS file system written in C. It is implemented in 1862 lines of C code. This file system is as close to our framework’s version as possible to enable accurate comparison between the two approaches. This baseline allows us to analyze any overhead that Bento may introduce over the VFS layer.

The other baseline is a user-level version using FUSE. This is 1744 lines of Rust and uses a Rust reimplementation of the FUSE userspace library [15] with minor changes such as enabling the writeback cache. The code for this version is nearly identical to the code written using our framework. Minor changes to the code are needed to swap out kernel services for Rust user-level services, such as using the Rust standard library mutex instead of the kernel semaphore. Additionally, block I/O from userspace is done by opening the Linux disk file using the O_DIRECT flag. We note that future work will make it possible to run the same code in Bento and at user level.
The benchmarks were run on a machine with 8 × Intel Core i7 CPU, 31 GiB DDR4 RAM, and a 512GB Samsung PM981 NVMe SSD. All benchmarks were run using the SSD as the backing device. Due to the present health concerns, the benchmarks were run through a virtual machine using PCIe passthrough to access the SSD.
From filebench, we run the single-threaded and 32-threaded read, write, file creation and file deletion microbenchmarks and the varmail and fileserver macrobenchmarks. In addition, we measure untarring the Linux kernel. These benchmarks were run on an NVMe SSD and executed for one minute in all cases. The varmail macrobenchmark simulates a mail server. It repeatedly generates file creates, file deletes, file reads and writes, and appends to an operation log, syncing after writing. The fileserver macrobenchmark simulates a file serving application.

The FUSE version sees significant slowdown because of slow block I/O from userspace, even through the O_DIRECT file interface. Each block operation from userspace must pass across the user/kernel boundary and through the VFS layer before reaching the disk, adding 200-400ns to each operation. On top of that, the file interface imposes additional overheads. The file system must occasionally sync blocks to disk (such as during log operations), but the file interface provides no way to sync parts of a file, so the whole disk file must be synced every time one block needs to be synced, making fsyncs very costly.

[Figure 2: Read Performance (4KB), Ops/sec. Bars for Bento, C-Kernel, and FUSE over seq-1t, seq-32t, rnd-1t, and rnd-32t; y-axis in Ops/sec (x1000).]
The performance results for the read microbenchmarks are shown in Figure 2 and Figure 3. All figures include single-threaded and 32-threaded benchmarks for both sequential read and random read. Figure 2 shows performance for 4KB reads in operations per second. The other graphs show performance for 32KB, 128KB, and 1024KB reads in throughput, measured in MBps.

All three versions of the file system show very similar performance results for all sizes of reads for both random and sequential reads. The similarity in performance is due to in-kernel caching and the small size of the file. All three versions of the file system use the same technique for caching read requests, implemented in the file system in the C-kernel version, in the file operations layer in Bento, and in the FUSE kernel module for the FUSE version. Since the file is small and read requests are fast, the file is cached very quickly. After this, all requests hit the same in-kernel cache, and all versions execute the exact same code. The xv6 file system cannot support files larger than 4GB, so we cannot run a benchmark that evaluates the differences.
The performance results for the write microbenchmarks are shown in Figure 4. The graphs include single-threaded sequential writes and single-threaded and 32-threaded random writes for 32KB, 128KB and 1024KB writes. Our evaluation does not include 4KB writes because these often triggered a segmentation fault in Filebench. Performance is measured in throughput in MBps. The performance of the FUSE file system was so low that its bars are nearly flush with the bottom of the graphs.

The Bento file system shows similar performance to the C version of the file system, and both perform much better than the FUSE version of the file system. The versions written in Rust using Bento and in C in the kernel implement nearly identical behavior, so it is expected for them to have similar performance. The Bento file system performs somewhat better than the VFS file system on large writes because Bento, which inherits from the FUSE kernel module, uses a more optimized technique for writing pages. Bento uses the writepages method instead of writepage, allowing sequential pages to be batched.

The FUSE version’s slowdown has the causes described above: slow block I/O from userspace, even through the O_DIRECT file interface, and costly fsyncs that must sync the whole disk file whenever one block needs to be synced.

Table 4: Create Microbenchmark Performance (Ops/sec)
            1 Thread    32 Threads
Bento       1126        1072
C-Kernel    933         881
FUSE        24          24
Performance for the file creation microbenchmark on an SSD is shown in Table 4 for both single-threaded and 32-threaded creates. In these benchmarks, Bento shows competitive performance to the C version in the kernel and much better performance than the FUSE version. File creation involves many small writes (and so syncs in the log), so the FUSE performance is heavily impacted by slow syncs. FUSE shows less slowdown for creates than it does for writes. This occurs because the create microbenchmarks spend a smaller percentage of the time executing slow disk operations.
Performance results for the file deletion microbenchmark on an SSD are shown in Table 5 for both single-threaded and 32-threaded benchmarks. These results show similar trends to the file creation microbenchmarks because both are metadata-heavy benchmarks and so generate many small writes.

[Figure 3: Read Performance (32KB-1024KB), Throughput MBps (x1000). Panels: (a) Reads (32KB), (b) Reads (128KB), (c) Reads (1024KB). Bars for Bento, C-Kernel, and FUSE over seq-1t, seq-32t, rnd-1t, and rnd-32t.]

[Figure 4: Write Performance, Throughput (MBps). Panels: (a) Writes (32KB), (b) Writes (128KB), (c) Writes (1024KB). Bars for Bento, C-Kernel, and FUSE over seq-1t, rnd-1t, and rnd-32t.]

Table 5: Delete Microbenchmark Performance (Ops/sec)
            1 Thread    32 Threads
Bento       7499        7502
C-Kernel    7500        8253
FUSE        118         116

Table 6: Macrobenchmark Performance
            Varmail (ops/s)    Fileserver (ops/s)    Untar (s)
Bento       320                3860                  19.8
C-Kernel    303                2947                  31.6
FUSE        24                 7                     3404.9
Ext4        785                5172                  6.2
Performance results for the varmail macrobenchmark on an SSD are shown in Table 6. The file system implemented using Bento and the C version in the kernel have very similar performance while the FUSE version shows much worse performance. Since this is a metadata-heavy macrobenchmark, the xv6 file system results are similar to the metadata-heavy microbenchmarks (file creation and deletion). The FUSE version performs comparatively better on the varmail benchmark than it does on the other benchmarks because varmail executes fsyncs on files. While fsyncs are slower for the FUSE version than they are for the other two xv6 file systems, the slowdown is not as large when whole files are being synced instead of individual blocks. On all three versions of the xv6 file system, the fsyncs take up the majority of the runtime, so the performance properties of the fsyncs are reflected in the overall performance numbers. For this benchmark, ext4 performs about 2.5x faster than either of the in-kernel xv6 implementations.
Performance results for the fileserver benchmark on the SSD are shown in Table 6. This benchmark involves many reads, writes, and file creates and deletes. Like the other benchmarks, these results show that the file system implemented using Bento and the version using the VFS layer in the kernel have very similar performance, and both outperform the FUSE version. This benchmark is particularly affected by the FUSE slowdowns because it involves many writes and creates, both of which introduce significant overhead. Ext4 performs only 33% better than the xv6 file system written using Bento. At that point, ext4 appears to be bounded by the throughput of the SSD.
This benchmark (shown in Table 6) untars the Linux kernel onto the relevant file system, generating many file creations and writes across many directories. Unlike the other benchmarks, this measures total execution time instead of operations per second, so lower is better. This benchmark shows somewhat more performance difference between the Bento file system and the VFS file system. This is likely caused by the same difference seen in the write microbenchmarks: Bento is able to batch sequential writes while the VFS implementation of xv6 is not.
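The ext4 comparisons quoted for the macrobenchmarks can be recomputed from the Bento and Ext4 rows of Table 6. The short sketch below does this arithmetic; the struct and function names are introduced here for illustration only, and the numbers are copied from the table.

```rust
/// One row of Table 6.
pub struct MacroResult {
    pub varmail_ops: f64,
    pub fileserver_ops: f64,
    pub untar_secs: f64,
}

pub const BENTO: MacroResult = MacroResult { varmail_ops: 320.0, fileserver_ops: 3860.0, untar_secs: 19.8 };
pub const EXT4: MacroResult = MacroResult { varmail_ops: 785.0, fileserver_ops: 5172.0, untar_secs: 6.2 };

/// Throughput ratio: higher ops/s is better, so divide ext4 by Bento.
pub fn varmail_speedup() -> f64 {
    EXT4.varmail_ops / BENTO.varmail_ops
}

pub fn fileserver_speedup() -> f64 {
    EXT4.fileserver_ops / BENTO.fileserver_ops
}

/// Runtime ratio: lower seconds is better, so divide Bento by ext4.
pub fn untar_speedup() -> f64 {
    BENTO.untar_secs / EXT4.untar_secs
}
```

The three ratios come out to roughly 2.45, 1.34, and 3.19, which matches the reported figures of about 2.5x on varmail, about 33% on fileserver, and 3.2 × on untar.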
As future work, we will demonstrate development velocity by implementing and evaluating real-world file systems using Bento. The simplicity of the xv6 file system was ideal for the proof-of-concept work, but a more full-featured file system will better demonstrate the velocity, generality, and low performance impact of Bento. We plan to focus on a research file system that shows promise for practical use, both proving that Bento can support a more complicated file system and providing a high-performance implementation of a file system that has demand in the community.
FUSE (Filesystem in Userspace) [14] is a framework for implementing file systems that run in userspace. The FUSE framework consists of two pieces: a kernel driver that translates VFS calls to FUSE-internal requests that are sent to userspace, and libFUSE, a userspace library that interfaces with the user file system. Like Bento, FUSE targets safety and ease of development for Linux file systems. It is able to provide these properties quite well by running code in userspace and providing a simplified interface. However, running the file system in userspace introduces extra kernel crossings, leading to up to 83% overhead. This overhead is too severe for many applications, and production systems rarely employ FUSE. Bento takes advantage of the interface work done in FUSE, using the FUSE low-level interface and a modified version of the FUSE kernel driver.
The extended Berkeley Packet Filter (eBPF) [23] is another technique for safe extensibility in Linux. It allows users to insert limited pieces of code to run in the Linux kernel at a set of predefined locations with restricted permissions. As implied by the name, eBPF was originally aimed at network packet processing, but has since been expanded to support broader functionality. While the mainline eBPF code has no support for file systems, a project [5] has enabled writing parts of a stackable file system using eBPF. Broadly, eBPF provides a high level of safety and good performance but can’t easily support the large modules with complex logic and data structures that Bento targets. eBPF programs are limited in size and type of operations. While eBPF programs can be chained together with tail calls, maintaining state across the tail calls is complicated at best.
Over the last few years, several papers have been published on verified operating systems and file systems [7, 24, 29]. These projects use formal verification to ensure that an implementation of a system abides by some defined correctness properties. By enforcing proofs of correctness properties, this technique is able to eliminate many bugs without necessarily adding performance overhead. However, verified file systems are still difficult to design and implement, requiring specialized knowledge. Additionally, there are currently no practical mechanisms for verifying concurrent code.
Software fault isolation (SFI) is a technique for limiting the impact of faults in a module and has seen several implementations [6, 33], including one for Linux modules [20]. Using this technique, faults in a protected module are unable to impact the correctness of surrounding code. SFI can have significant performance overhead in many cases, ranging anywhere from 0 × to 4 × CPU overhead. This overhead manifests both while executing the isolated module and while transitioning into and out of the module. Additionally, while SFI can address some of the safety concerns when developing Linux modules, it does not reduce the number of bugs in the module; it only ensures that bugs in the module will be isolated to the module.
Other projects have also employed a high-level language with a strict type system to provide safety in the operating system. Like Bento, the SPIN operating system [4] provides safe extensibility by combining a safe, modular interface with modules implemented in a safe language, though SPIN designed the whole operating system around extensibility. Bento applies these techniques and the associated benefits to Linux. Other operating systems projects, such as Singularity [16], Biscuit [9], and Redox [27], explore writing the entire operating system in a high-level language. Additionally, we’re not the first group to integrate Rust code into the Linux kernel. Other projects have implemented device drivers in Rust [18, 19].
The work we have done on Bento so far is the beginning of a larger project. Over the next year, we plan to continue work on Bento both for file systems and for other interfaces across the Linux kernel. Over the next six months, we plan to conclude our work on the file system extensibility layer and submit a paper on the topic. After that, we intend to apply the concepts in this paper to other interfaces across the Linux kernel, starting with networking.
On top of addressing the remaining challenges and implementing future work discussed thus far (online upgrades, debugging API, composable file systems, real-world evaluation), we will update to more recent kernel versions to take advantage of new functionality. Recent versions of the Linux kernel have included a new abstraction for performing asynchronous I/O from userspace called io_uring. Using this interface for the I/O accesses from the FUSE version of the xv6 file system in the evaluation could result in better performance numbers, potentially decreasing the overhead seen by using FUSE. More interestingly, our framework could hook into the file I/O part of io_uring along with the VFS layer, allowing users to entirely bypass the VFS layer. We cannot currently use io_uring because our framework is built in Linux kernel version 4.15 and io_uring requires version 5.1. Updating to the new kernel version and incorporating io_uring is part of our future work on this project.
While this project focuses on file systems, none of the goals, challenges, or techniques are unique to file systems; other types of extensions can also benefit from this design. We plan to first focus on networking, particularly the TCP/IP functionality (or more broadly, OSI layers 3 and 4). Recent work has shown demand for specialized, userspace network stacks [21, 22] to improve performance for specific applications. Similarly, exokernel operating systems [12] and kernel-bypass networking [25] seek to improve performance by using optimized network stacks and/or avoiding the overhead of the Linux kernel network stack. However, kernel-bypass networking has downsides, requiring the whole NIC to be given to the userspace process and burning CPU cores for polling. Applying the concepts from this project to kernel networking could enable running specialized network stacks in the kernel, providing the performance gains of a specialized network stack without the downsides of kernel-bypass networking.

In the long term, we intend to apply the concepts from this project to interfaces across the Linux kernel. Many interfaces could benefit from improved safety and increased development velocity. Past work has used Rust to write drivers [18], but these drivers still have a significant amount of unsafe code. Other pieces of the kernel, such as scheduling algorithms or security modules, could be targets for future work.
In this paper we present Bento, a framework to improve development velocity in operating systems, focusing on Linux kernel file systems for this work. We’ve identified several properties an extensibility framework must provide for high development velocity: safety, performance, generality, compatibility with existing operating systems, and other features for fast development velocity such as online upgrades and code reuse. By taking advantage of Rust, a modern type-safe, non-garbage-collected language, and enforcing restricted memory sharing between the file system and the kernel, safe abstractions around kernel services, and isolated file system modules, Bento is able to provide the first four of these properties for Linux kernel file systems, with the fifth coming soon. We’ve implemented the xv6 file system using Bento and shown that it has similar performance to a VFS kernel file system using the same design. As future work, we’ll continue development on Bento, adding more features such as support for online upgrades, and we’ll use Bento to implement more full-featured file systems. Code will be available once this additional work has been completed.
References [1] Abutalib Aghayev, Sage Weil, Michael Kuchnik, MarkNelson, Gregory R. Ganger, and George Amvrosiadis.File systems unfit as distributed storage backends:Lessons from 10 years of ceph evolution. In
Proceed-ings of the 27th ACM Symposium on Operating SystemsPrinciples , SOSP ’19, page 353–369, New York, NY,USA, 2019. Association for Computing Machinery.[2] Thomas E. Anderson, Marco Canini, Jongyul Kim, De-jan Kostic, Youngjin Kwon, Simon Peter, Waleed Reda,Henry N. Schuh, and Emmett Witchel. Assise: Perfor-mance and availability via NVM colocation in a dis-tributed file system.
CoRR , abs/1910.05106, 2019.[3] Adam Belay, George Prekas, Ana Klimovic, SamuelGrossman, Christos Kozyrakis, and Edouard Bugnion.IX: A protected dataplane operating system for highthroughput and low latency. In , pages 49–65, Broomfield, CO, October2014. USENIX Association. [4] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E.Fiuczynski, D. Becker, C. Chambers, and S. Eggers. Ex-tensibility Safety and Performance in the SPIN Operat-ing System. In
SOSP , 1995.[5] Ashish Bijlani and Umakishore Ramachandran. Exten-sion framework for file systems in user space. In
Pro-ceedings of the 2019 USENIX Conference on UsenixAnnual Technical Conference , USENIX ATC ’19, page121–134, USA, 2019. USENIX Association.[6] Miguel Castro, Manuel Costa, Jean-Philippe Martin,Marcus Peinado, Periklis Akritidis, Austin Donnelly,Paul Barham, and Richard Black. Fast Byte-granularitySoftware Fault Isolation. In
SOSP , 2009.[7] Haogang Chen, Daniel Ziegler, Tej Chajed, Adam Chli-pala, M. Frans Kaashoek, and Nickolai Zeldovich. Us-ing Crash Hoare Logic for Certifying the FSCQ FileSystem. In
SOSP , 2015.[8] James C. Corbett, Jeffrey Dean, Michael Epstein,Andrew Fikes, Christopher Frost, JJ Furman, SanjayGhemawat, Andrey Gubarev, Christopher Heiser, Pe-ter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eu-gene Kogan, Hongyi Li, Alexander Lloyd, Sergey Mel-nik, David Mwaura, David Nagle, Sean Quinlan, Ra-jesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szyma-niak, Christopher Taylor, Ruth Wang, and Dale Wood-ford. Spanner: Google’s globally-distributed database.In , pages 261–264,Hollywood, CA, 2012. USENIX Association.[9] Cody Cutler, M. Frans Kaashoek, and Robert T. Morris.The benefits and costs of writing a POSIX kernel in ahigh-level language. In
OSDI , 2018.[10] Michael Dalton, David Schultz, Jacob Adriaens, AhsanArefin, Anshuman Gupta, Brian Fahs, Dima Rubinstein,Enrique Cauich Zermeno, Erik Rubow, James Alexan-der Docauer, Jesse Alpert, Jing Ai, Jon Olson, KevinDeCabooter, Marc de Kruijf, Nan Hua, Nathan Lewis,Nikhil Kasinadhuni, Riccardo Crepaldi, Srinivas Krish-nan, Subbaiah Venkata, Yossi Richter, Uday Naik, andAmin Vahdat. Andromeda: Performance, isolation, andvelocity at scale in cloud network virtualization. In , pages 373–387,Renton, WA, April 2018. USENIX Association.[11] Docker. , 2018.[12] D. R. Engler, M. F. Kaashoek, and J. O’Toole, Jr.Exokernel: An Operating System Architecture forApplication-level Resource Management. In
SOSP ,1995.1413] Daniel Firestone, Andrew Putnam, Sambhrama Mund-kur, Derek Chiou, Alireza Dabagh, Mike Andrewartha,Hari Angepat, Vivek Bhanu, Adrian Caulfield, EricChung, Harish Kumar Chandrappa, Somesh Chatur-mohta, Matt Humphrey, Jack Lavier, Norman Lam,Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, GauthamPopuri, Shachar Raindel, Tejas Sapre, Mark Shaw,Gabriel Silva, Madhan Sivakumar, Nisheeth Srivastava,Anshuman Verma, Qasim Zuhair, Deepak Bansal, DougBurger, Kushagra Vaid, David A. Maltz, and AlbertGreenberg. Azure accelerated networking: Smartnicsin the public cloud. In , pages 51–66, Renton, WA, April 2018. USENIXAssociation.[14] Filesystem in Userspace. https://github.com/libfuse/libfuse , 2018.[15] Rust FUSE. https://github.com/zargony/fuse-rs .[16] Galen C. Hunt and James R. Larus. Singularity: Re-thinking the Software Stack.
SIGOPS OSR , 2007.[17] Youngjin Kwon, Henrique Fingler, Tyler Hunt, SimonPeter, Emmett Witchel, and Thomas E. Anderson.Strata: A cross media file system. In
Proceedings ofthe 26th Symposium on Operating Systems Principles,Shanghai, China, October 28-31, 2017 , pages 460–477.ACM, 2017.[18] Zhuohua Li, Jincheng Wang, Mingshen Sun, andJohn C.S. Lui. Securing the device drivers of your em-bedded systems: Framework and prototype. In
Proceed-ings of the 14th International Conference on Availabil-ity, Reliability and Security , ARES ’19, New York, NY,USA, 2019. Association for Computing Machinery.[19] Linux-kernel-module-rust. https://github.com/fishinabarrel/linux-kernel-module-rust .[20] Yandong Mao, Haogang Chen, Dong Zhou, Xi Wang,Nickolai Zeldovich, and M. Frans Kaashoek. SoftwareFault Isolation with API Integrity and Multi-principalModules. In
SOSP , 2011.[21] Ilias Marinos, Robert N.M. Watson, and Mark Hand-ley. Network stack specialization for performance.
SIG-COMM Comput. Commun. Rev. , 44(4):175–186, Au-gust 2014.[22] Michael Marty, Marc de Kruijf, Jacob Adriaens,Christopher Alfeld, Sean Bauer, Carlo Contavalli,Michael Dalton, Nandita Dukkipati, William C. Evans,Steve Gribble, and et al. Snap: A microkernel approachto host networking. In
Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP ’19, pages 399–413, New York, NY, USA, 2019. Association for Computing Machinery.
[23] Steven McCanne and Van Jacobson. The BSD Packet Filter: A New Architecture for User-level Packet Capture. In
Winter USENIX, 1993.
[24] Luke Nelson, Helgi Sigurbjarnarson, Kaiyuan Zhang, Dylan Johnson, James Bornholt, Emina Torlak, and Xi Wang. Hyperkernel: Push-Button Verification of an OS Kernel. In
SOSP, 2017.
[25] Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. In , pages 1–16, Broomfield, CO, October 2014. USENIX Association.
[26] Ben Pfaff, Justin Pettit, Teemu Koponen, Ethan J. Jackson, Andy Zhou, Jarno Rajahalme, Jesse Gross, Alex Wang, Jonathan Stringer, Pravin Shelar, Keith Amidon, and Martín Casado. The Design and Implementation of Open vSwitch. NSDI, 2015.
[27] Redox. , 2018.
[28] Chuck Rossi. Rapid release at massive scale.
[29] Helgi Sigurbjarnarson, James Bornholt, Emina Torlak, and Xi Wang. Push-button Verification of File Systems via Crash Refinement. In
OSDI, 2016.
[30] William Tu, Joe Stringer, Yifeng Sun, and Yi-Hung Wei. Bringing the Power of eBPF to Open vSwitch. In
Linux Plumbers Conference, 2018.
[31] Amin Vahdat and Thomas E. Anderson. Transparent result caching. In . USENIX Association, 1998.
[32] Bharath Kumar Reddy Vangoor, Vasily Tarasov, and Erez Zadok. To FUSE or not to FUSE: Performance of user-space file systems. In
Proceedings of the 15th USENIX Conference on File and Storage Technologies, FAST ’17, pages 59–72, USA, 2017. USENIX Association.
[33] Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient Software-based Fault Isolation. In
SOSP, 1993.
[34] xv6 OS. https://github.com/mit-pdos/xv6-public.
[35] Gerd Zellweger, Simon Gerber, Kornilios Kourtis, and Timothy Roscoe. Decoupling cores, kernels, and operating systems. In Jason Flinn and Hank Levy, editors, 15 , pages 17–31. USENIX Association, 2014.
[36] Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, and Anirudh Badam. I’m not dead yet!: The role of the operating system in a kernel-bypass era. In