High Velocity Kernel File Systems with Bento

Abstract

High development velocity is critical for modern systems. This is especially true for Linux file systems which are seeing increased pressure from new storage devices and new demands on storage systems. However, high velocity Linux kernel development is challenging due to the ease of introducing bugs, the difficulty of testing and debugging, and the lack of support for redeployment without service disruption. Existing approaches to high-velocity development of file systems for Linux have major downsides, such as the high performance penalty for FUSE file systems, slowing the deployment cycle for new file system functionality. We propose Bento, a framework for high velocity development of Linux kernel file systems. It enables file systems written in safe Rust to be installed in the Linux kernel, with errors largely sandboxed to the file system. Bento file systems can be replaced with no disruption to running applications, allowing daily or weekly upgrades in a cloud server setting. Bento also supports userspace debugging. We implement a simple file system using Bento and show that it performs similarly to VFS-native ext4 on a variety of benchmarks and outperforms a FUSE version by 7x on 'git clone'. We also show that we can dynamically add file provenance tracking to a running kernel file system with only 15ms of service interruption.


arXiv [cs.OS], May

High Velocity Kernel File Systems with Bento

Samantha Miller, Kaiyuan Zhang, Danyang Zhuo†, Thomas Anderson
University of Washington, †Duke University

Abstract

High development velocity is critical for modern cloud systems. However, rapid development and release cycles have mostly skipped operating systems. Modifications to behavior in Linux, the most widely used server operating system in the cloud, must be done slowly to minimize risk of introducing bugs, be limited in scope, or be implemented in userspace with a potential performance penalty.

We propose Bento, a framework for high velocity development of Linux kernel file systems. Bento is inspired by the recent availability of type-safe, non-garbage collected languages like Rust. It interposes a thin layer between kernel calls to the file system and file system calls back to the kernel, exposing alternative interfaces to enable kernel file systems written in safe Rust. Future work will provide support for online upgrades, userspace debugging, and composable file systems. We evaluate Bento by using it to implement the xv6 file system and comparing against baselines written using the kernel VFS layer and FUSE. We find that the Bento file system achieves comparable performance to the VFS version and much better performance than the FUSE version. We also evaluate against ext4 on the macrobenchmarks and find that ext4 performs between 33% and 3.2× better than the Bento xv6 file system.

High development velocity has become a widespread talisman for cloud software development [28]. Many popular cloud systems roll out new software releases on a weekly or even daily basis, to give users faster access to new features, to gain insight into priorities for further development, and to reduce integration costs. While this design pattern may seem inappropriate for mission critical software, cloud vendors have shown it is practical to use short release cycles for many high reliability services, including databases [8] and network stacks [10, 13, 22].

Rapid release cycles have largely skipped the operating system kernel development community, however.
Linux is the most widely used server operating system for the cloud, but new versions drop only every few months, with major changes limited to once every few years. Of course, Linux is open source, and so anyone is free to iterate more rapidly, at the cost of the later pain of reintegration with the mainline development tree.

The Linux community has adopted several approaches to improving feature velocity, none entirely successful. One approach is to try to future-proof the kernel by adding features before they are needed. We can see that in action with the popular Docker container manager [11]. Docker leverages several recently-added Linux kernel features, but in the process exposed a number of potentially critical flaws in those kernel services that could compromise the security of the entire operating system. Alternately, we can move kernel services to user level, such as with the FUSE file system abstraction [14] and Open vSwitch [26]. However, these can impose a prohibitively high performance penalty [1, 32], necessitating a kernel caching layer [5] that poses its own set of tradeoffs. Certain Linux kernel interfaces can be rapidly reconfigured with eBPF [30] scripts, but only for small linear snippets of code. The widespread belief that in the future, all high performance operations must “bypass the kernel” is an illustration of how operating systems are losing the race [3, 25, 36].

Our goal is to enable high velocity development for high-performance, general-purpose operating system kernel extensions. Our trust model is that of a slightly harried kernel developer, rather than an untrusted application developer. We want to provide a way for kernel developers to add kernel features in a manner that isolates bugs to within the extension and also allows for dynamic replacement of that functionality without the need to restart applications [35].
To be as widely applicable as possible, we focus on enabling rapid development of Linux kernel code, rather than assuming a new code base designed for extensibility, such as Exokernels [12], Spin [4], or Barrelfish [35]. To be concrete, we restrict ourselves, at first, to file system extensibility. We leave dynamic replacement of file systems to future work.

Our approach is inspired by the recent availability of type-safe, non-garbage collected, performant languages like Rust. Writing kernel extensions in Rust eliminates a class of cross-module bugs that could compromise kernel security, without the performance overhead of running at user level or the restrictions on extension behavior imposed by eBPF. However, supporting compatibility with existing operating systems and features for high development velocity, such as dynamic module replacement, debugging, and code reuse, is challenging.

For this work, we focus on high velocity kernel file systems. We have built a framework, called Bento, for injecting general-purpose file systems and file system extensions, written in Rust, into Linux. Surprisingly, Linux’s existing pluggable file system interface, VFS, is poorly suited to our needs, as it assumes shared data structures can pass freely across the extension interface, complicating compile-time type checking. Instead, Bento interposes a thin layer for calls into the file system, and calls from the file system back into the kernel, providing safety, high performance, generality, and compatibility with Linux. While our architecture is designed to be compatible with graceful online upgrades of running file systems along with support for other features for high development velocity, we leave that for future work.

We have used Bento to implement the xv6 file system to run in the Linux kernel.
We have additionally implemented baseline versions, one written in C against the VFS layer and one (written in Rust) running in userspace using FUSE. We found that our framework has performance very similar to, and sometimes better than, the VFS C version while the FUSE version performed much worse than both.

In this paper, we make the following contributions:

• We design and implement Bento, a framework that enables high-velocity development of safe, performant file systems in Linux.
• We present techniques for allowing safe Rust code to run in the Linux kernel and access kernel functionality.
• We implement a file system using Bento and evaluate its performance characteristics.

One of the existing barriers to fast evolution in Linux comes from buggy code. New code often introduces bugs, disincentivizing fast evolution for mission-critical pieces of code like operating systems. Kernel code is particularly affected by this because kernel bugs are often difficult to find and can have severe non-local consequences. In particular, memory bugs, such as memory reuse and dangling pointers, can have catastrophic consequences on the reliability of the system, potentially even leading to security violations.

Table 1: Count of analyzed bugs with effects of each bug, categorized as memory, concurrency, or type.

Bug                      Number   Effect on Kernel
Use Before Allocate      6        Likely oops
Double Free              4        Undefined
NULL Dereference         5        oops
Use After Free           3        Likely oops
Over Allocation          1        Overutilization
Out of Bounds            4        Likely oops
Dangling Pointer         1        Likely oops
Missing Free             18       Memory Leak
Reference Count Leak     7        Memory Leak
Other Memory             1        Variable
Deadlock                 5        Deadlock
Race Condition           5        Variable
Other Concurrency        1        Variable
Unchecked Error Value    5        Variable
Other Type Error         8        Variable

To understand the properties of bugs in existing Linux kernel extensions, we analyzed bug reports for three extensions used by Docker: AppArmor for security, Open vSwitch Datapath for networking, and Overlay FS for file system support. We analyzed all bug-fix git commits from 2014-2018 and categorized them by the type of bug that was fixed.

Our analysis focused on what we call low-level bugs: bugs that are unrelated to the specific logic of the extensions. These bugs can be caught without knowing specific correctness properties needed by the extension. This is opposed to semantic bugs, which are caused by violations of high-level correctness properties. Low-level bugs made up 50% of the total bugs. We divided the low-level bugs into three categories: memory bugs, concurrency bugs, and type errors. Memory bugs refer to incorrect usage of memory, including NULL pointer dereferences, out-of-bounds errors, and memory leaks. Concurrency bugs are caused by incorrect concurrency patterns, such as deadlocks and race conditions. Type errors are caused by incorrect usage of kernel types, most often by interpreting error values as valid data.

The results of the analysis are shown in Table 1. We found that 68% of these bugs were memory bugs. Of the memory bugs, 50% were a type of memory leak. Many of the bugs occurred along error handling pathways, often due to incorrect checking of returned values (unchecked error values) or missing cleanup (memory leaks, NULL pointer dereferences, etc.). Based on our analysis of these low-level bugs, 93% would be prevented by using Rust. The remaining 7% of low-level bugs were primarily deadlocks.

Many of the bugs could have serious impacts on the integrity of the system. Of the identified low-level bugs, 26% caused a kernel oops, which either kills the offending process or panics the kernel. An additional 34% of the analyzed bugs would result in a memory leak, potentially leading the system to run out of memory and opening up the system to DoS attacks.

Table 2: A comparison of Linux file system extensibility mechanisms. None of Linux’s existing mechanisms provide all the desired features.

         Safety   Performance   Generality   Online Upgrade
VFS      ✗        ✓             ✓            ✗
FUSE     ✓        ✗             ✓            ✗
eBPF     ✓        ✓             ✗            ✗
Bento    ✓        ✓             ✓            tbd
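The percentages quoted in this analysis can be re-derived from the counts in Table 1. The following is a small sketch of that arithmetic; the names `Category`, `BUGS`, `count`, and `percent` are illustrative and not part of the paper, and the memory/concurrency/type grouping of each row is our reading of the table order.

```rust
/// Bug categories used in Table 1.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Category {
    Memory,
    Concurrency,
    Type,
}

/// One table row: (bug name, count, category).
type Row = (&'static str, u32, Category);

/// Counts transcribed from Table 1 (grouping is an assumption based on row order).
static BUGS: [Row; 15] = [
    ("Use Before Allocate", 6, Category::Memory),
    ("Double Free", 4, Category::Memory),
    ("NULL Dereference", 5, Category::Memory),
    ("Use After Free", 3, Category::Memory),
    ("Over Allocation", 1, Category::Memory),
    ("Out of Bounds", 4, Category::Memory),
    ("Dangling Pointer", 1, Category::Memory),
    ("Missing Free", 18, Category::Memory),
    ("Reference Count Leak", 7, Category::Memory),
    ("Other Memory", 1, Category::Memory),
    ("Deadlock", 5, Category::Concurrency),
    ("Race Condition", 5, Category::Concurrency),
    ("Other Concurrency", 1, Category::Concurrency),
    ("Unchecked Error Value", 5, Category::Type),
    ("Other Type Error", 8, Category::Type),
];

/// Sum the counts of all rows accepted by `keep`.
fn count(keep: impl Fn(&Row) -> bool) -> u32 {
    BUGS.iter().filter(|row| keep(*row)).map(|row| row.1).sum()
}

/// Percentage of `part` in `whole`, rounded to the nearest integer.
fn percent(part: u32, whole: u32) -> u32 {
    (100 * part + whole / 2) / whole
}

fn main() {
    let total = count(|_| true);
    let memory = count(|r| r.2 == Category::Memory);
    let leaks = count(|r| r.0 == "Missing Free" || r.0 == "Reference Count Leak");
    let deadlocks = count(|r| r.0 == "Deadlock");

    println!("memory bugs:      {}% of low-level bugs", percent(memory, total));
    println!("leaks:            {}% of memory bugs", percent(leaks, memory));
    println!("leaks:            {}% of low-level bugs", percent(leaks, total));
    println!("deadlocks:        {}% of low-level bugs", percent(deadlocks, total));
    println!("all but deadlock: {}% of low-level bugs", percent(total - deadlocks, total));
}
```

Under this reading, the 93% figure corresponds to every row except the five deadlocks.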

Linux has several existing techniques to support rapid evolution of file system functionality. These include the Virtual File System (or VFS) layer built into Linux, FUSE for userspace file systems, and eBPF for running small portions of user space code safely in the kernel. However, none of these approaches provides all of the properties we need for high velocity development. A summary is shown in Table 2, and details are discussed below. Note that compatibility with existing Linux code is implicit in all of these approaches and in Bento.

VFS: Linux provides a mechanism for adding new file systems called the Virtual File System (or VFS) layer. This layer defines a set of function pointers to be implemented by new file system modules and calls these functions inside related system calls. It is used by all major file systems in Linux.

This interface prioritizes generality and performance, allowing file systems maximum flexibility when interacting with core kernel components. The resulting interface is complex and has few guardrails, making it difficult for developers to implement new functionality without introducing bugs. While a new file system can be loaded dynamically, an existing file system cannot be modified except by mount/unmount and quiescing application use of the file system. Likewise, debugging support is limited.

FUSE: Filesystem in Userspace, or FUSE [14], enables running file system code in userspace, via a small kernel VFS layer that forwards operations to the userspace implementation. Thus, FUSE is able to achieve safety and generality, along with the ability to use normal user-level debuggers. This comes at a cost, however. All file system operations pass through VFS and the FUSE kernel driver before being packaged up and copied to userspace, reducing performance by up to 83% [32]. Despite this slowdown, FUSE is frequently used for prototyping new file systems, especially in circumstances where performance is not critical. FUSE does not provide a mechanism for transparent online modification of running file systems, although such a system could theoretically be implemented at user level.

eBPF: Another approach to safe extensibility in Linux is eBPF (extended Berkeley Packet Filter) [23], an in-kernel virtual machine that allows short extensions with limited control flow, written in a restricted language, to be run at predefined points in the kernel. While the mainline Linux kernel doesn’t support eBPF for file systems, a project (ExtFUSE [5]) has provided support for parts of a FUSE file system to be run in the kernel using eBPF. For kernel code that can fit within the eBPF model, this provides safe extensibility without significant performance overhead. However, the restrictions placed on eBPF extensions make it very difficult to implement whole file systems or even significant file system extensions using eBPF. ExtFUSE does not support dynamic reconfiguration.

The goal of Bento is to provide for high-velocity development of Linux file systems. To make our design goals concrete, consider the OverlayFS extension to Linux used by Docker. OverlayFS allows for the name space of a file system to be layered on top of another, allowing containers to be configured with a base file system plus changes. Or consider improving the support for non-volatile memory (NVM) in Linux. Systems such as Strata [17] have shown that prepending an operation log stored in NVM can dramatically improve write performance while reducing vulnerability to application-level bugs. These operation logs can be replicated for high availability [2].

Finally, consider what would be needed to add data provenance to Linux - the ability to track all of the data sources and executable images that could have affected a particular output file [31]. If a data source becomes invalid (e.g., because of a change to sensor calibration), provenance can be used to track down what derived data needs to be regenerated. Further, old versions of data files may need to be retained (and later garbage collected) if they are part of the provenance of live output files.

In all three cases, the functionality needs to work with existing, unmodified Linux binaries, has complex internal logic and data structures, is performance-sensitive, benefits from ongoing development, and to be deployable, must not compromise the security of the rest of the operating system. We assume the developer is well-intentioned but a bit clumsy - it is not our intent to prevent malicious insider attacks for newly developed code.

Thus, our framework must support several, seemingly conflicting, goals:

• Safety: Any bugs in a newly installed file system should be limited, as much as possible, to applications or containers that use that file system. These bugs should be kept to a minimum.
• Performance: Performance should be similar to that achievable by the same functionality implemented directly in the kernel.
• Generality: There is a large variety of file system designs that developers might want to implement. The framework should not limit the types of file systems that can be developed.
• Compatibility: New functionality should be deployable to existing, unmodified Linux binaries without recompiling or relinking, and without substantial changes to Linux’s internal architecture.
• Development velocity: The framework should support dynamic upgrades to running file system code, transparently to applications, except for a small delay. Further, code should be easily migratable between user level and the kernel, to enable use of modern debugging and software analysis tools. This last goal is supported architecturally by our approach, but experimental demonstration is beyond the scope of this paper.

Our high level approach for Bento is to enable writing file systems in a safe, non-garbage collected language, specifically Rust. This provides the first three goals detailed above. Rust’s strict type system provides safety, eliminating certain classes of bugs such as NULL pointer dereferences or use-after-free bugs. Since Rust is compiled like C and does not use garbage collection, it has performance similar to C and does not suffer from performance unpredictability caused by garbage collectors. Rust is a general purpose programming language and provides the necessary generality to enable writing a wide variety of file systems.

To realize this approach, we need to address several challenges. Compatibility with existing operating systems and online upgrades, the other two goals for this work, are not inherently provided by writing file systems in Rust. Bento must provide additional support in order to achieve these properties. However, challenges arise when trying to provide that support.

In order for a Rust file system to execute in the Linux kernel, there must be a way for the Rust file system to interact with the C kernel. A naive approach is to compile the Rust file system into a binary format and load it into the kernel. Rust is designed to interface easily with code written in other languages, particularly C, using its Foreign Function Interface. Rust code can call functions written in C and vice versa, and Rust data structures can be tagged so they use C-style memory layout. In fact, without considering any other factors, running Rust code in the Linux kernel is fairly straightforward.

However, this naive approach does not maintain the safety of the Rust file systems. Rust code that calls external functions or dereferences raw pointers must be tagged as unsafe. Rust’s type system is not able to provide the same guarantees about unsafe code, e.g., NULL pointer dereferences and out-of-bounds accesses are possible, so unsafe code cannot provide the safety we require for Bento. Simple techniques for introducing safety, such as wrapping C functions in safe wrappers or replacing pointers with references, are not enough to fully provide safety due to fundamental challenges caused by kernel design patterns, which we now describe. We assume that the kernel is correct.
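To make the "simple technique" concrete, here is a minimal sketch of a safe wrapper around a C-ABI function that takes a raw pointer. The function `c_style_sum` is a hypothetical stand-in, defined in Rust with the C calling convention so that the example is self-contained; it is not a kernel function. The wrapper is sound only because a slice carries its own length; as the text notes, wrappers of this kind do not by themselves resolve the ownership problems described next.

```rust
/// Hypothetical stand-in for a C function (C ABI, raw pointer plus length).
/// It is `unsafe` because the caller must guarantee that `ptr` points to
/// `len` readable `u32` values; the compiler cannot check this.
unsafe extern "C" fn c_style_sum(ptr: *const u32, len: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..len {
        // Raw pointer dereference: only allowed in unsafe code.
        total = total.wrapping_add(unsafe { *ptr.add(i) });
    }
    total
}

/// Safe wrapper: the slice type guarantees a valid pointer and length,
/// so callers never touch a raw pointer.
fn sum(values: &[u32]) -> u32 {
    // SAFETY: `values` points to exactly `values.len()` initialized elements.
    unsafe { c_style_sum(values.as_ptr(), values.len()) }
}

fn main() {
    println!("sum = {}", sum(&[1, 2, 3]));
}
```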

One challenge is caused by memory management for data structures passed across the boundary between the file system and the kernel. Rust is able to provide memory safety and automatic memory management by doing compile-time tracking of data structures. However, the VFS interface requires that some data structures created by the file system be passed across the kernel/file system boundary and back again. Since the Rust compiler is not able to analyze the code outside the file system, it is not able to verify the safety of taking ownership of data structures from the kernel. Therefore, the VFS file system interface cannot be implemented in safe Rust.
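The asymmetry can be seen in the Rust standard library itself: `Box::into_raw` (giving up ownership) is a safe function, while `Box::from_raw` (taking ownership back) is unsafe. The sketch below uses a hypothetical `Inode` type and hypothetical function names to mimic the round trip across the boundary; it is an illustration, not Bento or kernel code.

```rust
/// Hypothetical file system object that would cross the boundary.
#[derive(Debug, PartialEq)]
struct Inode {
    ino: u64,
}

/// Hand an inode to the "kernel" as a raw pointer. This is safe: Rust only
/// gives up ownership, and at worst the memory is leaked.
fn give_to_kernel(inode: Box<Inode>) -> *mut Inode {
    Box::into_raw(inode)
}

/// Take the inode back. This must be `unsafe`: the compiler cannot verify
/// that `ptr` came from `give_to_kernel`, that it has not been freed, or
/// that nobody else will use it afterwards.
unsafe fn take_from_kernel(ptr: *mut Inode) -> Box<Inode> {
    unsafe { Box::from_raw(ptr) }
}

fn main() {
    let raw = give_to_kernel(Box::new(Inode { ino: 7 }));
    // SAFETY: `raw` was produced by `give_to_kernel` above and is reclaimed
    // exactly once. In a real kernel this is a promise the compiler cannot check.
    let inode = unsafe { take_from_kernel(raw) };
    println!("reclaimed inode {}", inode.ino);
}
```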

Another challenge stems from the file system’s need to access services provided by the kernel. However, the interfaces exposed by kernel services are not designed for Rust’s safety guarantees, so kernel services cannot necessarily be exposed safely to Rust file systems without modifications. To allow the file system to use kernel services safely, Bento must translate the unsafe kernel-provided interfaces into interfaces that can be used by the file systems safely.

Online upgrades, that is, updating a file system without bringing it offline, are also not provided by writing the file systems in a safe language. In Linux today, a file system module upgrade is done by shutting down all services relying on the file system, unmounting the file system, removing the module, inserting the new module, mounting the new file system, and then restarting all services. In order to support online upgrades, additional functionality must be added to enable updating to a new version of the file system without requiring the file system or services running on top of it to be shut down. Trying to implement that functionality in Linux gives rise to the following challenges.

The memory management pattern described in §3.1.1, where data structures created by the file system are passed to the kernel, also introduces challenges for online upgrades. The kernel holds data structures backed by file system memory, and the file system has no way to control when that memory should be reclaimed. If the file system were updated when there were outstanding data structures held by the kernel, those kernel pointers would become invalid. To avoid this case, the file system must wait for the kernel to have completed all operations on the file system and have returned all shared-ownership data structures to the file system. There is no guarantee of this happening until the file system is unmounted, so upgrades cannot be done online.

Another challenge is caused by the need to track data structures that the file system is currently using, both data structures from kernel services and in-memory data used by the file system. For example, a running file system will execute block I/O or possibly network operations and may be using kernel data structures for those operations when the upgrade occurs. The file system could also have internal, in-memory state such as which blocks need to be written to a commit log or a cache of on-disk data structures. If the file system updates without transferring any of its in-use data structures, potentially bad behavior can occur. In the best case, caches of on-disk data structures need to be rebuilt, and performance temporarily suffers. In the worst case, correctness conditions could be violated if the file system requires long-lived state. Since the existing techniques for upgrades in Linux assume that the file system will be completely shut down during the upgrade, there are no mechanisms to transfer data structures.

The ability to quickly and effectively debug code is critical for fast development in practice. Kernel code is notoriously difficult to debug because of the often non-local effects of kernel bugs and the potential for a buggy operating system to interfere with the process of debugging. In order to enable effective debugging, we propose allowing file systems written using Bento to be run in userspace without requiring code modifications.

To support running the same code in the kernel and in userspace, we must provide an API that can be implemented in both. All APIs, both for Bento to call file system functions and for the file system to access necessary services, must be the same in both the kernel and userspace. Providing compatibility with Linux will not necessarily provide this because the interfaces provided by kernel services may not be compatible with the system call interface.
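One way to picture "the same API in both environments" is a Rust trait that file system code is written against, with a different backend in each environment. The sketch below is an assumption for illustration only: the trait `BlockStore`, the in-memory backend `MemStore`, and the helper functions are hypothetical names, not Bento's API. Only a userspace-style backend is shown; a kernel backend would implement the same trait on top of kernel block I/O.

```rust
const BLOCK_SIZE: usize = 16;

/// Hypothetical service interface that both a kernel backend and a
/// userspace backend would implement.
trait BlockStore {
    fn read_block(&self, n: usize) -> Option<[u8; BLOCK_SIZE]>;
    fn write_block(&mut self, n: usize, data: &[u8; BLOCK_SIZE]) -> bool;
}

/// Userspace-style backend: blocks live in ordinary memory.
struct MemStore {
    blocks: Vec<[u8; BLOCK_SIZE]>,
}

impl BlockStore for MemStore {
    fn read_block(&self, n: usize) -> Option<[u8; BLOCK_SIZE]> {
        self.blocks.get(n).copied()
    }

    fn write_block(&mut self, n: usize, data: &[u8; BLOCK_SIZE]) -> bool {
        match self.blocks.get_mut(n) {
            Some(slot) => {
                *slot = *data;
                true
            }
            None => false,
        }
    }
}

/// "File system" logic written once against the trait: store a short
/// label in block 0, padded with zero bytes.
fn store_label<S: BlockStore>(store: &mut S, label: &str) -> bool {
    let bytes = label.as_bytes();
    if bytes.len() > BLOCK_SIZE {
        return false;
    }
    let mut block = [0u8; BLOCK_SIZE];
    block[..bytes.len()].copy_from_slice(bytes);
    store.write_block(0, &block)
}

/// Read the label back from block 0, stopping at the first zero byte.
fn load_label<S: BlockStore>(store: &S) -> Option<String> {
    let block = store.read_block(0)?;
    let end = block.iter().position(|&b| b == 0).unwrap_or(BLOCK_SIZE);
    String::from_utf8(block[..end].to_vec()).ok()
}

fn main() {
    let mut store = MemStore { blocks: vec![[0u8; BLOCK_SIZE]; 2] };
    store_label(&mut store, "bento");
    println!("label = {:?}", load_label(&store));
}
```

Because `store_label` and `load_label` depend only on the trait, they can be exercised under a normal debugger with `MemStore` and reused unchanged with any other backend.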

The ability to reuse code is also important for development velocity. This is particularly relevant for file systems because there are many circumstances when a user would want to modify the behavior of an underlying file system, such as enabling encryption or tracking data provenance. In Linux today, developers can implement these types of file systems by stacking layers of file systems (e.g., the ecryptfs file system can be layered on top of another file system to add encryption). The higher layer file systems call top-level VFS functions to access the lower file systems as if the relevant system call had been executed. This support for stackable, or composable, file systems allows developers to provide services as file system modules that can be used with any existing file system.

Linux’s existing model for composable file systems can be supported by exposing the top-level VFS functions to Bento file systems. However, it is not clear that this is the best solution. Calling top-level VFS functions can add overhead to each call to a lower file system, resulting in potentially large overhead if several file systems are layered on top of one another. Bento may be able to provide a different interface for supporting composable file systems that does not introduce this overhead but still provides the necessary flexibility.

The design of Bento is shown in Figure 1. Shaded portions are the framework. The framework runs as a thin layer that sits between the unmodified Linux kernel and kernel-level file systems designed for our framework. The Linux kernel is unmodified other than the introduction of Bento. Like the VFS layer, Bento defines a set of function calls that file systems must implement and provides a mechanism for file systems to register themselves with the framework by providing the necessary function pointers. Unlike the VFS layer, Bento is designed to support file systems written in Rust, a type-safe language that provides memory safety and data race freedom.

Table 3: Summary of Challenges and the Associated Solutions

Challenge                             Solution                                   Problem Description   Detailed Solution
Unsafe Shared Memory Management       Restricted Memory Sharing                  §3.1.1, §3.2.1        §4.3
Unsafe Kernel Interfaces              Safe Abstractions Around Kernel Services   §3.1.2                §4.5
Transferring Objects During Upgrade   Online Upgrade Component                   §3.2.2                §4.8

Figure 1: The design of Bento. (a) Kernel Bento: VFS, BentoFS, File System, BentoKS, Kernel Services; ① File Operations API, ② Kernel Services API. (b) Userspace Bento for Debugging: Posix, BentoFS-User, File System, BentoKS-User, clib.

Table 3 shows a summary of the challenges and solutions in Bento. Bento currently consists of two components. One component of the framework interposes between the VFS calls and the file system, handling calls into the file system. This component provides the file operations API, translating from the VFS interface. The other component interposes below the file system, handling calls out of the file system into the kernel. This component provides wrappers around kernel data structures and functions, allowing the Rust file system to safely access relevant kernel functionality. For file systems, this primarily handles block I/O.

To write a file system using Bento, developers write a safe Rust kernel module using the provided APIs and insert that module into their running Linux kernel like any other kernel module. File system functions are exposed to the operating system by implementing the file operations API and providing those function pointers to Bento when the file system is inserted. When file system functions need to access kernel functionality, they can do so by calling the safe Rust functions provided by the kernel services API.

The VFS layer and the patterns it introduces cause fundamental challenges to safety when handling memory management of shared data structures, in particular inodes. Inodes are allocated and destroyed using functions implemented in the file system and called by the kernel. When the kernel needs a new inode, it requests one from the file system, which allocates the inode using its own memory pool. When the kernel is finished with the inode, it returns the inode to the file system so the memory can be reclaimed. Giving ownership from Rust to C can be implemented in Rust by leaking the memory behind the data structure; this is safe because leaking memory does not violate Rust’s memory safety, but is not ideal. Taking ownership from C to Rust cannot be implemented safely. Rust must trust that the data structure will not be used anymore and was originally allocated by Rust. Since these properties cannot be validated by the Rust compiler, this is inherently unsafe.

In order to enable safe file systems, Bento must provide a different interface than the VFS layer for file system operations. Calls from the VFS layer are intercepted by BentoFS and translated into this new interface, shown in Figure 1 at ①. This interface calls from BentoFS to the file system, so the interface must be designed so it can be implemented safely. To support this, we define a model that our interface must follow.

Our interface follows what we call an “ownership model”, borrowing the terminology from Rust. In this model, ownership of an object can never be passed across the interface, but objects can be “borrowed”. For each object, one side of the interface is responsible for both the lifetime management (tracking when the object is no longer needed) and memory management. To share an object, the caller passes a reference to the object to the callee. This does not pass ownership (the callee has no control over the underlying memory) but does allow the callee to access the object.
This is analogous to a borrow in Rust and similarly can be mutable or immutable, allowing modification of the object or not, respectively. To the file system developer, this is just writing typical Rust code.

This model implies a contract between the caller and the callee. The caller is responsible for ensuring that the object is not freed while it has been borrowed, that the object is valid, and that only one mutable borrow exists at one time. The callee is responsible for only accessing an object during the borrow window, accessing objects correctly (i.e., no pointer arithmetic), and only mutating objects during a mutable borrow.

In this case, the callee is the file system, written in Rust. All of the callee’s responsibilities are checked by the Rust compiler when using safe Rust, so the file system is guaranteed to uphold the model. Our framework is the caller and must be carefully designed to fulfill its side of the contract.

This ownership model can be viewed as a relaxed version of what is needed across address space boundaries where no memory can be shared. This observation led us to leverage the FUSE kernel module and the FUSE low-level API when developing BentoFS and the file operations API. The file operations API is a Rust version of the FUSE low-level API augmented with a reference to the super_block data structure needed for file system block operations.

This model should not introduce significant performance overhead. This loan/borrow model is only used to check compile-time properties, so it does not add performance overhead at runtime. The performance impact of the interface change is more difficult to predict, but should still be low. The interface design does not increase the functionality needed to implement a file system; it just splits the behavior implemented by a VFS file system between BentoFS and the file system.
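The borrow-only contract can be sketched with ordinary Rust references. In the following illustration the framework side (`dispatch`) owns both the request and the reply buffer, and the file system only ever receives `&` or `&mut` borrows that end when the call returns. The names `Request`, `FileOps`, `EchoFs`, and `dispatch` are hypothetical and much simpler than the FUSE-style API the text describes.

```rust
/// Hypothetical request object; owned by the framework (caller) side.
struct Request {
    ino: u64,
    data: Vec<u8>,
}

/// Hypothetical file operations interface: every argument is a borrow,
/// so ownership never crosses the interface.
trait FileOps {
    /// Immutable borrow: the callee may inspect the request only.
    fn getattr(&self, req: &Request) -> (u64, usize);
    /// Mutable borrow: the callee may fill `reply`, but only during the call.
    fn read(&self, req: &Request, reply: &mut Vec<u8>);
}

/// Toy file system (callee) that echoes the request payload.
struct EchoFs;

impl FileOps for EchoFs {
    fn getattr(&self, req: &Request) -> (u64, usize) {
        (req.ino, req.data.len())
    }

    fn read(&self, req: &Request, reply: &mut Vec<u8>) {
        reply.extend_from_slice(&req.data);
    }
}

/// Framework (caller): creates, lends, and finally frees every object.
fn dispatch(fs: &dyn FileOps, ino: u64, payload: &[u8]) -> ((u64, usize), Vec<u8>) {
    let req = Request { ino, data: payload.to_vec() };
    let mut reply = Vec::new();
    let attr = fs.getattr(&req);
    fs.read(&req, &mut reply);
    // `req` is dropped here by its owner; the callee kept no reference to it.
    (attr, reply)
}

fn main() {
    let (attr, reply) = dispatch(&EchoFs, 2, b"hi");
    println!("attr = {:?}, reply = {:?}", attr, reply);
}
```

If `EchoFs` tried to store `req` or `reply` beyond the call, the borrow checker would reject the program, which is the callee half of the contract being enforced at compile time.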

File systems need access to kernel functionality implemented outside the file system, such as block I/O for access to the underlying storage device. These kernel interfaces, like those in the VFS layer, are not designed to abide by type-safety properties and so cannot be directly used in the file system. In order to enable use of necessary kernel services, BentoKS provides safe abstractions around kernel data structures and functions.

As an example, we will focus on the kernel block I/O functions. File systems in Linux access block devices using the buffer cache. In this API, a file system that needs to read or write to a block device calls sb_bread, passing in a pointer to the super_block data structure and a block number. This function returns a buffer_head data structure representing the requested block. The block’s data is represented as a pointer and size in the buffer_head, and the file system can read and/or write to this memory region. When the file system is done using the buffer_head, it must call brelse or buffers can be leaked.

The widespread use of pointers and pointer manipulation in the Linux kernel makes this challenging. Safe Rust disallows dereferencing raw pointers because the compiler cannot check the validity of the memory being pointed to. Rust instead relies on typed references that cannot be offset, cast to nonequivalent types, or NULL safely. However, many kernel interfaces rely on pointers, so these interfaces cannot be used by the file system safely.

In order to access kernel functionality, the file system must be able to use kernel data structures, both for calling kernel functions and for making use of objects provided to the file system by the file operations API described above. The kernel operates on pointers, but directly exposing these pointers to the file system results in safety errors. If the block I/O functions exposed to the file system accept a pointer to the superblock, no guarantees can be made about the memory layout underlying that pointer.

We use a capability-based model to safely expose kernel pointers to the file system, where pointers are replaced by capability-style types defined in Bento. These types give the file system the right to access the fields of the data structure and to call functions that are exposed by that type. Creation of these capability types is limited; they cannot be safely cast from other types, and initialization is predefined and sometimes entirely disallowed. Bento converts between the capability type and the analogous kernel type. For example, the file system often receives the SuperBlock capability type from the file operations API to represent the kernel super_block data structure. It can use the SuperBlock capability type to read fields of the kernel super_block and call kernel functions like sb_bread for block I/O that require a kernel super_block. The SuperBlock type cannot be created by the file system, so having this type is proof that the file system has access to a valid kernel super_block. Bento can then safely convert the capability type to a pointer and directly access kernel functions.

The capability types are compile-time wrappers around pointers so the Rust compiler can enforce correctness properties at compile time. It is assumed that the kernel passes in valid pointers, so no properties need to be checked at runtime and no runtime overhead is added.
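A capability-style type of this kind can be sketched in a few lines of Rust. The sketch below runs in userspace and makes several assumptions: `RawSuperBlock` is a one-field stand-in for the kernel `super_block`, and the module name `bento` and the functions `with_super_block`, `blocksize`, and `blocks_for` are hypothetical. In a real framework the wrapping function would not be callable by the file system; it is public here only so the example can be exercised.

```rust
mod bento {
    // Stand-in for the kernel's `struct super_block`; the real structure
    // is defined in C and is far larger.
    #[repr(C)]
    pub struct RawSuperBlock {
        pub s_blocksize: u64,
    }

    // Capability type: the pointer field is private, so code outside this
    // module cannot construct a `SuperBlock` or cast one from another type.
    pub struct SuperBlock {
        ptr: *const RawSuperBlock,
    }

    impl SuperBlock {
        // Safe accessor. Soundness rests on the framework only ever
        // wrapping pointers that the kernel guarantees to be valid.
        pub fn blocksize(&self) -> u64 {
            unsafe { (*self.ptr).s_blocksize }
        }
    }

    // Framework side: wrap a kernel object in a capability and lend it to
    // a file system callback for the duration of the call.
    pub fn with_super_block<R>(
        raw: &RawSuperBlock,
        f: impl FnOnce(&SuperBlock) -> R,
    ) -> R {
        let cap = SuperBlock { ptr: raw as *const RawSuperBlock };
        f(&cap)
    }
}

// File system side: receives the capability and never sees a raw pointer.
pub fn blocks_for(sb: &bento::SuperBlock, bytes: u64) -> u64 {
    let bs = sb.blocksize();
    (bytes + bs - 1) / bs
}
```

Because `SuperBlock` holds nothing but the pointer, handing it to the file system costs the same as handing over the pointer itself.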

Bento must also provide safe abstractions that wrap kernel services so that a file system written in safe Rust can use them. These abstractions can be used by the file system like any other Rust data structures and functions.

To be concrete, we address the example discussed above. We provide a safe abstraction to wrap the kernel buffer_head. We implement a method on the BufferHead wrapper to convert the separate pointer and size fields for the contained memory region into a sized memory region that can be used safely. That method must use unsafe code to make a sized memory region out of the unsized pointer and size fields, but the file system can call the method safely. To prevent accidental memory leaks, we call the brelse function in the drop method of the BufferHead wrapper, which is called when the wrapper goes out of scope. With this, buffer management has the same properties as memory management in Rust: memory leaks are possible but difficult.

These abstractions can, in some cases, add a small amount of performance overhead. If a kernel function has requirements on its arguments, the wrapping method will most likely need to perform a runtime check to ensure that the requirements hold. This overhead should be small since such checks are simple and are not performed often.
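The wrapper pattern described above can be sketched in userspace Rust. This is an illustration under stated assumptions rather than Bento's implementation: `RawBufferHead`, `mock_sb_bread`, and `mock_brelse` are heap-backed stand-ins for the kernel's buffer_head, sb_bread, and brelse, and the `released` counter exists only so the release can be observed.

```rust
use std::ptr;
use std::slice;
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the kernel buffer_head: a raw data pointer plus a size.
#[repr(C)]
pub struct RawBufferHead {
    b_data: *mut u8,
    b_size: usize,
}

static RELEASED: AtomicUsize = AtomicUsize::new(0);

// Number of buffers released so far (for demonstration only).
pub fn released() -> usize {
    RELEASED.load(Ordering::SeqCst)
}

// Mock of sb_bread: allocates a zeroed block and its descriptor.
fn mock_sb_bread(size: usize) -> *mut RawBufferHead {
    let data = Box::into_raw(vec![0u8; size].into_boxed_slice()) as *mut u8;
    Box::into_raw(Box::new(RawBufferHead { b_data: data, b_size: size }))
}

// Mock of brelse: frees the block and its descriptor.
unsafe fn mock_brelse(bh: *mut RawBufferHead) {
    unsafe {
        let raw = Box::from_raw(bh);
        drop(Box::from_raw(ptr::slice_from_raw_parts_mut(raw.b_data, raw.b_size)));
    }
    RELEASED.fetch_add(1, Ordering::SeqCst);
}

// Safe wrapper handed to the file system.
pub struct BufferHead {
    raw: *mut RawBufferHead,
}

impl BufferHead {
    pub fn read(size: usize) -> BufferHead {
        BufferHead { raw: mock_sb_bread(size) }
    }

    // Unsafe inside, safe to call: the pointer and size fields become a
    // bounds-checked slice whose lifetime is tied to the wrapper.
    pub fn data_mut(&mut self) -> &mut [u8] {
        unsafe { slice::from_raw_parts_mut((*self.raw).b_data, (*self.raw).b_size) }
    }
}

impl Drop for BufferHead {
    // Runs when the wrapper goes out of scope, so the release call
    // cannot be forgotten by the file system.
    fn drop(&mut self) {
        unsafe { mock_brelse(self.raw) }
    }
}
```

The file system only ever touches `data_mut()`, so an out-of-bounds index panics instead of corrupting memory, and dropping the wrapper releases the buffer exactly once.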

In order to enable online upgrades, Bento will provide a mediating layer that maintains any state that needs to be preserved through the upgrade, such as long-lived kernel data structures like a network connection for a networked file system or internal file system state like an in-memory cache of on-disk data structures. Bento is already a runtime in the kernel, so it can easily be extended to include the necessary functionality.

This component will need a data structure transfer mechanism so that important data structures can be passed from the old version of the file system to the new version during the upgrade. Kernel data structures can already be tracked by Bento through the kernel services API, and functionality can be added to support transferring these data structures. To transfer file system internal data structures, the online upgrade component will extend Bento’s interface with new functions for storing in-memory state and initializing from that provided state. When the old version of the file system is about to be stopped, the online upgrade component will call the file system’s provided function. This function will perform any necessary shutdown, such as flushing state, and will return in-memory state that should be transferred. This state will then be passed to the new version of the file system when it starts up so it can restore the necessary in-memory state.
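One possible shape for this state-transfer interface is sketched below. Since the mechanism is future work, everything here is an assumption made for illustration: the traits `PrepareUpgrade` and `InitFromState`, the `TransferState` type, and the two toy file system versions are hypothetical names.

```rust
use std::collections::HashMap;

// In-memory state handed from the old file system version to the new one.
// Hypothetical contents: a cache of on-disk blocks keyed by block number.
pub struct TransferState {
    pub cached_blocks: HashMap<u64, Vec<u8>>,
}

// Hypothetical hook implemented by the version being replaced.
pub trait PrepareUpgrade {
    // Perform shutdown work (e.g. flushing) and return the state to keep.
    fn prepare_upgrade(self) -> TransferState;
}

// Hypothetical hook implemented by the version being installed.
pub trait InitFromState {
    fn init_from_state(state: TransferState) -> Self;
}

pub struct FsV1 {
    pub cache: HashMap<u64, Vec<u8>>,
}

impl PrepareUpgrade for FsV1 {
    fn prepare_upgrade(self) -> TransferState {
        // A real file system would flush dirty state to disk here.
        TransferState { cached_blocks: self.cache }
    }
}

// The new version adds a feature (a hit counter) but reuses the old cache.
pub struct FsV2 {
    cache: HashMap<u64, Vec<u8>>,
    pub hits: u64,
}

impl InitFromState for FsV2 {
    fn init_from_state(state: TransferState) -> Self {
        FsV2 { cache: state.cached_blocks, hits: 0 }
    }
}

impl FsV2 {
    pub fn read_cached(&mut self, blockno: u64) -> Option<&[u8]> {
        let block = self.cache.get(&blockno)?;
        self.hits += 1;
        Some(block.as_slice())
    }
}

// What the mediating layer would do at upgrade time: stop the old
// version, collect its state, and start the new version from that state.
pub fn upgrade<Old: PrepareUpgrade, New: InitFromState>(old: Old) -> New {
    New::init_from_state(old.prepare_upgrade())
}
```

Taking `self` by value in `prepare_upgrade` means the old version cannot be used after the handoff, which the compiler enforces.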

To support easy debugging, Bento will enable developers to run the same code in userspace and in the kernel and so use userspace debuggers. To enable this, Bento will provide alternate implementations of the BentoFS and BentoKS components that interface with user-level interfaces, specifically the POSIX API instead of VFS and C library functions instead of kernel services. Since the interfaces exposed by the kernel and by userspace libraries are different, it is not obvious that the APIs written for the kernel will be able to be implemented without modification. We will analyze and implement this as part of our future work.
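The idea of swapping the implementation underneath unchanged file system code can be sketched with a Rust trait. This is only an illustration of the approach, with hypothetical names throughout (`BlockDevice`, `MemDisk`, `write_magic`, `read_magic`); a kernel build would supply a different implementation of the same trait.

```rust
pub const BSIZE: usize = 512;

// Hypothetical block interface the file system code is written against.
pub trait BlockDevice {
    fn read_block(&self, blockno: usize, buf: &mut [u8; BSIZE]);
    fn write_block(&mut self, blockno: usize, buf: &[u8; BSIZE]);
}

// Userspace backend: blocks live in ordinary heap memory, so the file
// system can be run under a regular debugger.
pub struct MemDisk {
    blocks: Vec<[u8; BSIZE]>,
}

impl MemDisk {
    pub fn new(nblocks: usize) -> MemDisk {
        MemDisk { blocks: vec![[0u8; BSIZE]; nblocks] }
    }
}

impl BlockDevice for MemDisk {
    fn read_block(&self, blockno: usize, buf: &mut [u8; BSIZE]) {
        *buf = self.blocks[blockno];
    }

    fn write_block(&mut self, blockno: usize, buf: &[u8; BSIZE]) {
        self.blocks[blockno] = *buf;
    }
}

// Block 1 holds the superblock in this toy layout.
const SUPERBLOCK_NO: usize = 1;

// File system logic, generic over the backend. The same source would
// compile against a kernel-backed `BlockDevice` implementation.
pub fn write_magic<D: BlockDevice>(dev: &mut D, magic: u32) {
    let mut buf = [0u8; BSIZE];
    dev.read_block(SUPERBLOCK_NO, &mut buf);
    buf[..4].copy_from_slice(&magic.to_le_bytes());
    dev.write_block(SUPERBLOCK_NO, &buf);
}

pub fn read_magic<D: BlockDevice>(dev: &D) -> u32 {
    let mut buf = [0u8; BSIZE];
    dev.read_block(SUPERBLOCK_NO, &mut buf);
    u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]])
}
```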

Bento is built in Linux kernel version 4.15. It is implemented as a Linux kernel module in 1409 lines of Rust code for BentoKS and 7409 lines of C code for BentoFS.

Writing a kernel module in Rust is different from writing userspace Rust code. The basic structure of our kernel module is borrowed from tsgates/rust.ko on GitHub. The kernel module is compiled as a static library which is then linked with any required C code to generate the kernel module (a .ko file). This kernel module can then be inserted into the kernel as normal by any sudo user. Kernel code in Rust, like all kernel code, cannot use the standard library, but the Rust core library can still be used. We found that we had to additionally limit the Rust implementation to code that can’t cause a panic.

The Rust portions of the Bento kernel module must interface with C code. Rust data structures can be tagged with #[repr(C)] to force the memory layout to match the C layout of the same structure, allowing the data structure to be passed across the language boundary. Rust functions can be called from C as long as they are tagged with #[no_mangle], preventing the Rust compiler from mangling the name of the function. Rust’s FFI (Foreign Function Interface) enables Rust code to seamlessly call functions implemented in C. The Rust code only needs to define the function interface in an extern block and the functions will be linked at compile time. The Rust bindgen tool can be used to automatically generate these bindings from C header files.
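The three FFI mechanisms mentioned above can be shown together in a short example. This sketch runs in userspace with the standard library, unlike kernel code, and the names `BlockRange`, `block_range_end`, and `c_string_len` are hypothetical. The only external function it declares is `strlen` from the C library.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// #[repr(C)] gives the struct the field order and padding a C compiler
// would use, so a pointer to it can cross the language boundary.
#[repr(C)]
pub struct BlockRange {
    pub start: u64,
    pub count: u32,
}

// #[no_mangle] keeps the symbol name `block_range_end` unmangled and
// `extern "C"` selects the C calling convention, so C code can call it.
#[no_mangle]
pub extern "C" fn block_range_end(r: *const BlockRange) -> u64 {
    if r.is_null() {
        return 0;
    }
    let r = unsafe { &*r };
    r.start + r.count as u64
}

// An extern block declares a function implemented in C; the symbol is
// resolved at link time. Here it is the C library's strlen.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Safe wrapper: a `&CStr` is guaranteed to be NUL-terminated, which is
// exactly what strlen requires.
pub fn c_string_len(s: &CStr) -> usize {
    unsafe { strlen(s.as_ptr()) }
}
```

On the C side, `block_range_end` would be declared as `uint64_t block_range_end(const struct BlockRange *r);` with a matching struct definition. Writing such declarations by hand is the step that bindgen automates in the other direction.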

One of the primary jobs of Bento is to interpose between the VFS layer and the file system. As part of this translation, the file operations component of Bento must handle the interactions with core kernel data structures that are expected of a file system written against the VFS layer.

We use the FUSE kernel module and the FUSE low-level interface as starting points for BentoFS and the file operations API. The FUSE kernel module must implement much of the same functionality as BentoFS, so we use a modified version of it to implement BentoFS.

Unlike in FUSE, the file operations layer and the file system reside in the same address space and trust domain. Bento can therefore communicate with the file system using function calls. Our framework implements this like the VFS layer; function pointers to file system operations are stored in a data structure that is provided to Bento when the file system is mounted and upgraded.

Implementing Safe Wrappers

The Rust capability types are implemented as a Rust type with one field: a pointer to the relevant kernel type. This enables BentoFS to pass a pointer to the kernel data structure to the file system functions with no overhead. BentoKS implements methods of these capability types that the file system can use to safely access kernel functionality. These functions can be called from the Rust file system on the capability type data structures even though these were originally allocated as C data structures.
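The "one field, no overhead" property can be checked directly in Rust. The sketch below is an assumption-laden illustration: `RawInode` is a stand-in for a kernel structure, the names `Inode`, `from_raw`, and `fs_getattr` are hypothetical, and the use of `#[repr(transparent)]` is our choice for the example rather than something the text specifies.

```rust
use std::mem::size_of;

// Stand-in for a C-allocated kernel structure.
#[repr(C)]
pub struct RawInode {
    pub i_ino: u64,
}

// Capability type with a single pointer field. #[repr(transparent)]
// guarantees it has the same layout and ABI as the pointer itself, so a
// C caller can pass the raw pointer where Rust expects this type.
#[repr(transparent)]
pub struct Inode(*mut RawInode);

impl Inode {
    // Only the framework should call this; the pointer must be valid for
    // as long as the capability is in use.
    pub unsafe fn from_raw(ptr: *mut RawInode) -> Inode {
        Inode(ptr)
    }

    // Method provided by the framework; callable from safe Rust even
    // though the underlying object was allocated as a C structure.
    pub fn ino(&self) -> u64 {
        unsafe { (*self.0).i_ino }
    }
}

// A file system callback as the C side would invoke it: the argument is
// a kernel pointer, received on the Rust side as the capability type.
pub extern "C" fn fs_getattr(inode: Inode) -> u64 {
    inode.ino()
}

// The wrapper adds no storage beyond the pointer.
pub fn wrapper_is_pointer_sized() -> bool {
    size_of::<Inode>() == size_of::<*mut RawInode>()
}
```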

We evaluate the performance of Bento to determine what, if any, overheads exist to using it. For this, we have implemented the file system from the xv6 teaching operating system and two variants: one written in C, running in the kernel using the VFS layer, and one written in Rust, running in userspace using FUSE. By comparing against the VFS layer, we can determine the overhead Bento introduces. By comparing against FUSE, we can quantify the benefits of Bento relative to a purely user-level file system.

Since xv6 is a toy operating system, it is missing optimizations that a commercial-grade file system would have. This can heavily impact the FUSE baseline because the unoptimized operations may be particularly expensive from userspace. The VFS baseline is also less optimized than Bento because Bento inherits optimizations from the FUSE kernel module while the VFS baseline was just written for this evaluation. Therefore, the xv6 evaluation could be somewhat unfairly optimistic to Bento when compared to the same evaluation on a commercial-grade file system. We therefore also compare against ext4 on the macrobenchmarks. Ext4 is more optimized than the xv6 file system, but the performance results can still be compared to understand ballpark performance differences. Relatively small differences can indicate that our results may be similar to those we would achieve on a commercial-grade file system. We mount ext4 with the data=journal option so it logs file data in the journal like the xv6 file system.

In order to write the xv6 file system in Rust and run the benchmarks, some changes needed to be made to the file system design. In all versions of the file system, we needed to add locks around inode and block number allocations due to race conditions on the block device. We also added double indirect blocks to all three versions of the file system so files up to 4GB could be created. In general, the Rust versions include more locks than the C version and the official xv6 repository [34], specifically on global mutable variables that are only modified during initialization. Otherwise, the Rust file systems are nearly identical to the C file systems.
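The effect of double indirect blocks on the maximum file size can be worked out with a short calculation. The parameters below are assumptions, since the text does not state them: 4 KiB blocks, 32-bit block numbers, and the 12 direct pointers of the stock xv6 inode. With those values the limit comes out at roughly 4 GiB, which is consistent with the figure quoted above.

```rust
// Assumed parameters (not given in the text).
pub const BSIZE: u64 = 4096; // bytes per block
pub const NDIRECT: u64 = 12; // direct block pointers in the inode
pub const NINDIRECT: u64 = BSIZE / 4; // 32-bit block numbers per block

// Number of data blocks addressable from one inode.
pub fn max_blocks(double_indirect: bool) -> u64 {
    let base = NDIRECT + NINDIRECT;
    if double_indirect {
        // The double indirect block points to NINDIRECT indirect blocks,
        // each of which points to NINDIRECT data blocks.
        base + NINDIRECT * NINDIRECT
    } else {
        base
    }
}

pub fn max_file_bytes(double_indirect: bool) -> u64 {
    max_blocks(double_indirect) * BSIZE
}
```

Under these assumptions a single indirect block alone limits files to about 4 MiB (1036 blocks), while adding one double indirect block raises the limit to 1,049,612 blocks, or 4,299,210,752 bytes.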

As a baseline, we implement a VFS file system written in C. It is implemented in 1862 lines of C code. This file system is as close to our framework’s version as possible to enable accurate comparison between the two approaches. This baseline allows us to analyze any overhead that Bento may introduce over the VFS layer.

The other baseline is a user-level version using FUSE. This is 1744 lines of Rust and uses a Rust reimplementation of the FUSE userspace library [15] with minor changes such as enabling the writeback cache. The code for this version is nearly identical to the code written using our framework. Minor changes to the code are needed to swap out kernel services for Rust user-level services, such as using the Rust standard library mutex instead of the kernel semaphore. Additionally, block I/O from userspace is done by opening the Linux disk file using the O_DIRECT flag. We note that future work will be able to run the same code in Bento and at user level.

The benchmarks were run on a machine with 8 × Intel Core i7 CPU, 31 GiB DDR4 RAM, and a 512GB Samsung PM981 NVMe SSD. All benchmarks were run using the SSD as the backing device. Due to the present health concerns, the benchmarks were run through a virtual machine using PCIe passthrough to access the SSD.

From filebench, we run the single-threaded and 32-threaded read, write, file creation, and file deletion microbenchmarks and the varmail and fileserver macrobenchmarks. In addition, we measure untarring the Linux kernel. These benchmarks were run on an NVMe SSD and executed for one minute in all cases. The varmail macrobenchmark simulates a mail server. It repeatedly generates file creates, file deletes, file reads and writes, and appends to an operation log, syncing after writing. The fileserver macrobenchmark simulates a file serving application.

[Figure 2: Read Performance (4KB), Ops/sec (x1000). Series: Bento, C-Kernel, FUSE. Workloads: seq-1t, seq-32t, rnd-1t, rnd-32t.]

The performance results for the read microbenchmarks are shown in Figure 2 and Figure 3. All figures include single-threaded and 32-threaded benchmarks for both sequential read and random read. Figure 2 shows performance for 4KB reads in operations per second. The other graphs show performance for 32KB, 128KB, and 1024KB reads in throughput, measured in MBps.

All three versions of the file system show very similar performance results for all sizes of reads for both random and sequential reads. The similarity in performance is due to in-kernel caching and the small size of the file. All three versions of the file system use the same technique for caching read requests, implemented in the file system in the C-kernel version, in the file operations layer in Bento, and in the FUSE kernel module for the FUSE version. Since the file is small and read requests are fast, the file is cached very quickly. After this, all requests hit the same in-kernel cache, and all versions execute the exact same code. The xv6 file system cannot support files larger than 4GB, so we cannot run a benchmark that evaluates the differences.

The performance results for the write microbenchmarks are shown in Figure 4. The graphs include single-threaded sequential writes and single-threaded and 32-threaded random writes for 32KB, 128KB, and 1024KB writes. Our evaluation does not include 4KB writes because these often triggered a segmentation fault in Filebench. Performance is measured in throughput in MBps. The performance of the FUSE file system was so low that its bars are nearly flush with the bottom of the graphs.

The Bento file system shows similar performance to the C version of the file system, and both perform much better than the FUSE version of the file system. The versions written in Rust using Bento and in C in the kernel implement nearly identical behavior, so it is expected for them to have similar performance. The Bento file system performs somewhat better than the VFS file system on large writes because Bento, which inherits from the FUSE kernel module, uses a more optimized technique for writing pages. Bento uses the writepages method instead of writepage, allowing sequential pages to be batched.

We see significant slowdown because of slow block I/O from userspace, even through the O_DIRECT file interface. Each block operation from userspace must pass across the user/kernel boundary and through the VFS layer before reaching the disk, adding 200-400ns to each operation. On top of that, the file interface imposes additional overheads. The file system must occasionally sync blocks to disk (such as during log operations), but the file interface provides no way to sync parts of a file, so the whole disk file must be synced every time one block needs to be synced, making fsyncs very costly.

Table 4: Create Microbenchmark Performance (Ops/sec)

            1 Thread    32 Threads
Bento       1126        1072
C-Kernel    933         881
FUSE        24          24

Performance for the file creation microbenchmark on an SSD is shown in Table 4 for both single-threaded and 32-threaded creates. In these benchmarks, Bento shows competitive performance to the C version in the kernel and much better performance than the FUSE version. File creation involves many small writes (and so syncs in the log), so the FUSE performance is heavily impacted by slow syncs. FUSE shows less slowdown for creates than it does for writes. This occurs because the create microbenchmarks spend a smaller percentage of the time executing slow disk operations.

Performance results for the file deletion microbenchmark on an SSD are shown in Table 5 for both single-threaded and 32-threaded benchmarks. These results show similar trends as the file creation microbenchmarks because both are metadata-heavy benchmarks and so generate many small writes.

[Figure 3: Read Performance (32KB-1024KB), Throughput MBps (x1000). Panels: (a) Reads (32KB), (b) Reads (128KB), (c) Reads (1024KB). Series: Bento, C-Kernel, FUSE.]

[Figure 4: Write Performance, Throughput (MBps). Panels: (a) Writes (32KB), (b) Writes (128KB), (c) Writes (1024KB). Series: Bento, C-Kernel, FUSE. Workloads: seq-1t, rnd-1t, rnd-32t.]

Table 5: Delete Microbenchmark Performance (Ops/sec)

            1 Thread    32 Threads
Bento       7499        7502
C-Kernel    7500        8253
FUSE        118         116

Table 6: Macrobenchmark Performance

            Varmail (ops/s)    Fileserver (ops/s)    Untar (s)
Bento       320                3860                  19.8
C-Kernel    303                2947                  31.6
FUSE        24                 7                     3404.9
Ext4        785                5172                  6.2

Performance results for the varmail macrobenchmark on an SSD are shown in Table 6. The file system implemented using Bento and the C version in the kernel have very similar performance while the FUSE version shows much worse performance. Since this is a metadata-heavy macrobenchmark, the xv6 file system results are similar to the metadata-heavy microbenchmarks (file creation and deletion). The FUSE version performs comparatively better on the varmail benchmark than it does on the other benchmarks because varmail executes fsyncs on files. While fsyncs are slower for the FUSE version than they are for the other two xv6 file systems, the slowdown is not as large when whole files are being synced instead of individual blocks. On all three versions of the xv6 file system, the fsyncs take up the majority of the runtime, so the performance properties of the fsyncs are reflected in the overall performance numbers. For this benchmark, ext4 performs about 2.5x faster than either of the in-kernel xv6 implementations.

Performance results for the fileserver benchmark on the SSD are shown in Table 6. This benchmark involves many reads, writes, and file creates and deletes. Like the other benchmarks, these results show that the file system implemented using Bento and the version using the VFS layer in the kernel have very similar performance, and both outperform the FUSE version. This benchmark is particularly affected by the FUSE slowdowns because it involves many writes and creates, both of which introduce significant overhead. Ext4 performs only 33% better than the xv6 file system written using Bento. At that point, ext4 appears to be bounded by the throughput of the SSD.

This benchmark (shown in Table 6) untars the Linux kernel onto the relevant file system, generating many file creations and writes across many directories. Unlike the other benchmarks, this measures total execution time instead of operations per second, so lower is better. This benchmark shows somewhat more performance difference between the Bento file system and the VFS file system. This is likely caused by the same difference seen in the write microbenchmarks: Bento is able to batch sequential writes while the VFS implementation of xv6 is not.
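The relative numbers quoted for ext4 follow directly from Table 6. As a small worked check, the snippet below recomputes them from the Bento and Ext4 rows; the `Row` type and the `speedup` function are names introduced only for this illustration.

```rust
// One row of Table 6. Varmail and fileserver are in ops/s (higher is
// better); untar is in seconds (lower is better).
pub struct Row {
    pub varmail: f64,
    pub fileserver: f64,
    pub untar_secs: f64,
}

pub const BENTO: Row = Row { varmail: 320.0, fileserver: 3860.0, untar_secs: 19.8 };
pub const EXT4: Row = Row { varmail: 785.0, fileserver: 5172.0, untar_secs: 6.2 };

// How many times faster `a` is than `b` on each benchmark. For untar the
// ratio is inverted because a smaller time means a faster run.
pub fn speedup(a: &Row, b: &Row) -> (f64, f64, f64) {
    (
        a.varmail / b.varmail,
        a.fileserver / b.fileserver,
        b.untar_secs / a.untar_secs,
    )
}
```

Calling `speedup(&EXT4, &BENTO)` gives roughly 2.45 for varmail, 1.34 for fileserver, and 3.19 for untar, matching the "about 2.5x", "33%", and "3.2 ×" figures in the text.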

As future work, we will demonstrate development velocity by implementing and evaluating real-world file systems using Bento. The simplicity of the xv6 file system was ideal for the proof-of-concept work, but a more full-featured file system will better demonstrate the velocity, generality, and low performance impact of Bento. We plan to focus on a research file system that shows promise for practical use, both proving that Bento can support a more complicated file system and providing a high-performance implementation of a file system that has demand in the community.

FUSE (Filesystem in Userspace) [14] is a framework that enables implementing a file system to run in userspace. The FUSE framework consists of two pieces: a kernel driver that translates VFS calls to FUSE-internal requests that are sent to userspace, and libFUSE, a userspace library that interfaces with the user file system. Like Bento, FUSE targets safety and ease of development for Linux file systems. It’s able to provide these properties quite well by running code in userspace and providing a simplified interface. However, running the file system in userspace introduces extra kernel crossings, leading to up to 83% overhead. This overhead is too severe for many applications, and production systems rarely employ FUSE. Bento takes advantage of the interface work done in FUSE, using the FUSE low-level interface and a modified version of the FUSE kernel driver.

The extended Berkeley Packet Filter (eBPF) [23] is another technique for safe extensibility in Linux. It allows users to insert limited pieces of code to run in the Linux kernel at a set of predefined locations with restricted permissions. As implied by the name, eBPF was originally aimed at network packet processing, but has since been expanded to support broader functionality. While the mainline eBPF code has no support for file systems, a project [5] has enabled writing parts of a stackable file system using eBPF. Broadly, eBPF provides a high level of safety and good performance but can’t easily support the large modules with complex logic and data structures that Bento targets. eBPF programs are limited in size and type of operations. While eBPF programs can be chained together with tail calls, maintaining state across the tail calls is complicated at best.

Over the last few years, several papers have been published on verified operating systems and file systems [7, 24, 29]. These projects use formal verification to ensure that an implementation of a system abides by some defined correctness properties. By enforcing proofs of correctness properties, this technique is able to eliminate many bugs without necessarily adding performance overhead. However, verified file systems are still difficult to design and implement, requiring specialized knowledge. Additionally, there are currently no practical mechanisms for verifying concurrent code.

Software fault isolation (SFI) is a technique for limiting the impact of faults in a module and has seen several implementations [6, 33], including one for Linux modules [20]. Using this technique, faults in a protected module are unable to impact the correctness of surrounding code. SFI can have significant performance overhead in many cases, ranging anywhere from 0× to 4× CPU overhead. This overhead manifests both while executing the isolated module and while transitioning into and out of the module. Additionally, while SFI can address some of the safety concerns when developing Linux modules, it does not reduce the number of bugs in the module, only ensuring that bugs in the module will be isolated to the module.

Other projects have also employed a high-level language with a strict type system to provide safety in the operating system. Like Bento, the SPIN operating system [4] provides safe extensibility by combining a safe, modular interface with modules implemented in a safe language, though SPIN designed the whole operating system around extensibility. Bento applies these techniques and the associated benefits to Linux. Other operating systems projects, such as Singularity [16], Biscuit [9], and Redox [27], explore writing the entire operating system in a high-level language. Additionally, we’re not the first group to integrate Rust code into the Linux kernel. Other projects have implemented device drivers in Rust [18, 19].

The work we have done on Bento so far is the beginning of a larger project. Over the next year, we plan further work on Bento both for file systems and for other interfaces across the Linux kernel. Over the next six months, we plan to conclude our work on the file system extensibility layer and submit a paper on the topic. After that, we intend to apply the concepts in this paper to other interfaces across the Linux kernel, starting with networking.

On top of addressing the remaining challenges and implementing future work discussed thus far (online upgrades, debugging API, composable file systems, real-world evaluation), we will update to more recent kernel versions to take advantage of new functionality. Recent versions of the Linux kernel have included a new abstraction for performing asynchronous I/O from userspace called io_uring. Using this interface for the I/O accesses from the FUSE version of the xv6 file system in the evaluation could result in better performance numbers, potentially decreasing the overhead seen by using FUSE. More interestingly, our framework could hook into the file I/O part of io_uring along with the VFS layer, allowing users to entirely bypass the VFS layer. We cannot currently use io_uring because our framework is built in Linux kernel version 4.15 and io_uring requires version 5.1. Updating to the new kernel version and incorporating io_uring is part of our future work on this project.

While this project focuses on file systems, none of the goals, challenges, or techniques are unique to file systems; other types of extensions can also benefit from this design. We plan to first focus on networking, particularly the TCP/IP functionality (or more broadly, OSI layers 3 and 4). Recent work has shown demand for specialized, userspace network stacks [21, 22] to improve performance for specific applications. Similarly, exokernel operating systems [12] and kernel-bypass networking [25] seek to improve performance by using optimized network stacks and/or avoiding the overhead of the Linux kernel network stack. However, kernel-bypass networking has downsides, requiring the whole NIC to be given to the userspace process and burning CPU cores for polling. Applying the concepts from this project to kernel networking could enable running specialized network stacks in the kernel, providing the performance gains of a specialized network stack without the downsides of kernel-bypass networking.

In the long term, we intend to apply the concepts from this project to interfaces across the Linux kernel. Many interfaces could benefit from improved safety and increased development velocity. Past work has used Rust to write drivers [18], but these drivers still have a significant amount of unsafe code. Other pieces of the kernel, such as scheduling algorithms or security modules, could be targets for future work.

In this paper we present Bento, a framework to improve development velocity in operating systems, focusing on Linux kernel file systems for this work. We’ve identified several properties an extensibility framework must provide for high development velocity: safety, performance, generality, compatibility with existing operating systems, and other features for fast development velocity such as online upgrades and code reuse. By taking advantage of Rust, a modern type-safe, non-garbage-collected language, and enforcing restricted memory sharing between the file system and the kernel, safe abstractions around kernel services, and isolated file system modules, Bento is able to provide the first four of these properties for Linux kernel file systems, with the fifth coming soon. We’ve implemented the xv6 file system using Bento and shown that it has similar performance to a VFS kernel file system using the same design. As future work, we’ll continue development on Bento, adding more features such as support for online upgrades, and we’ll use Bento to implement more full-featured file systems. Code will be available once this additional work has been completed.

References [1] Abutalib Aghayev, Sage Weil, Michael Kuchnik, MarkNelson, Gregory R. Ganger, and George Amvrosiadis.File systems unﬁt as distributed storage backends:Lessons from 10 years of ceph evolution. In

Proceed-ings of the 27th ACM Symposium on Operating SystemsPrinciples , SOSP ’19, page 353–369, New York, NY,USA, 2019. Association for Computing Machinery.[2] Thomas E. Anderson, Marco Canini, Jongyul Kim, De-jan Kostic, Youngjin Kwon, Simon Peter, Waleed Reda,Henry N. Schuh, and Emmett Witchel. Assise: Perfor-mance and availability via NVM colocation in a dis-tributed ﬁle system.

CoRR , abs/1910.05106, 2019.[3] Adam Belay, George Prekas, Ana Klimovic, SamuelGrossman, Christos Kozyrakis, and Edouard Bugnion.IX: A protected dataplane operating system for highthroughput and low latency. In , pages 49–65, Broomﬁeld, CO, October2014. USENIX Association. [4] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E.Fiuczynski, D. Becker, C. Chambers, and S. Eggers. Ex-tensibility Safety and Performance in the SPIN Operat-ing System. In

SOSP , 1995.[5] Ashish Bijlani and Umakishore Ramachandran. Exten-sion framework for ﬁle systems in user space. In

Pro-ceedings of the 2019 USENIX Conference on UsenixAnnual Technical Conference , USENIX ATC ’19, page121–134, USA, 2019. USENIX Association.[6] Miguel Castro, Manuel Costa, Jean-Philippe Martin,Marcus Peinado, Periklis Akritidis, Austin Donnelly,Paul Barham, and Richard Black. Fast Byte-granularitySoftware Fault Isolation. In

SOSP , 2009.[7] Haogang Chen, Daniel Ziegler, Tej Chajed, Adam Chli-pala, M. Frans Kaashoek, and Nickolai Zeldovich. Us-ing Crash Hoare Logic for Certifying the FSCQ FileSystem. In

SOSP , 2015.[8] James C. Corbett, Jeffrey Dean, Michael Epstein,Andrew Fikes, Christopher Frost, JJ Furman, SanjayGhemawat, Andrey Gubarev, Christopher Heiser, Pe-ter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eu-gene Kogan, Hongyi Li, Alexander Lloyd, Sergey Mel-nik, David Mwaura, David Nagle, Sean Quinlan, Ra-jesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szyma-niak, Christopher Taylor, Ruth Wang, and Dale Wood-ford. Spanner: Google’s globally-distributed database.In , pages 261–264,Hollywood, CA, 2012. USENIX Association.[9] Cody Cutler, M. Frans Kaashoek, and Robert T. Morris.The beneﬁts and costs of writing a POSIX kernel in ahigh-level language. In

OSDI , 2018.[10] Michael Dalton, David Schultz, Jacob Adriaens, AhsanAreﬁn, Anshuman Gupta, Brian Fahs, Dima Rubinstein,Enrique Cauich Zermeno, Erik Rubow, James Alexan-der Docauer, Jesse Alpert, Jing Ai, Jon Olson, KevinDeCabooter, Marc de Kruijf, Nan Hua, Nathan Lewis,Nikhil Kasinadhuni, Riccardo Crepaldi, Srinivas Krish-nan, Subbaiah Venkata, Yossi Richter, Uday Naik, andAmin Vahdat. Andromeda: Performance, isolation, andvelocity at scale in cloud network virtualization. In , pages 373–387,Renton, WA, April 2018. USENIX Association.[11] Docker. , 2018.[12] D. R. Engler, M. F. Kaashoek, and J. O’Toole, Jr.Exokernel: An Operating System Architecture forApplication-level Resource Management. In

SOSP, 1995.

[13] Daniel Firestone, Andrew Putnam, Sambhrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian Caulfield, Eric Chung, Harish Kumar Chandrappa, Somesh Chaturmohta, Matt Humphrey, Jack Lavier, Norman Lam, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Silva, Madhan Sivakumar, Nisheeth Srivastava, Anshuman Verma, Qasim Zuhair, Deepak Bansal, Doug Burger, Kushagra Vaid, David A. Maltz, and Albert Greenberg. Azure accelerated networking: Smartnics in the public cloud. In , pages 51–66, Renton, WA, April 2018. USENIX Association.

[14] Filesystem in Userspace. https://github.com/libfuse/libfuse, 2018.

[15] Rust FUSE. https://github.com/zargony/fuse-rs.

[16] Galen C. Hunt and James R. Larus. Singularity: Rethinking the Software Stack.
SIGOPS OSR, 2007.

[17] Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, and Thomas E. Anderson. Strata: A cross media file system. In
Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, October 28-31, 2017, pages 460–477. ACM, 2017.

[18] Zhuohua Li, Jincheng Wang, Mingshen Sun, and John C.S. Lui. Securing the device drivers of your embedded systems: Framework and prototype. In
Proceedings of the 14th International Conference on Availability, Reliability and Security, ARES ’19, New York, NY, USA, 2019. Association for Computing Machinery.

[19] Linux-kernel-module-rust. https://github.com/fishinabarrel/linux-kernel-module-rust.

[20] Yandong Mao, Haogang Chen, Dong Zhou, Xi Wang, Nickolai Zeldovich, and M. Frans Kaashoek. Software Fault Isolation with API Integrity and Multi-principal Modules. In
SOSP, 2011.

[21] Ilias Marinos, Robert N.M. Watson, and Mark Handley. Network stack specialization for performance.
SIGCOMM Comput. Commun. Rev., 44(4):175–186, August 2014.

[22] Michael Marty, Marc de Kruijf, Jacob Adriaens, Christopher Alfeld, Sean Bauer, Carlo Contavalli, Michael Dalton, Nandita Dukkipati, William C. Evans, Steve Gribble, et al. Snap: A microkernel approach to host networking. In
Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP ’19, pages 399–413, New York, NY, USA, 2019. Association for Computing Machinery.

[23] Steven McCanne and Van Jacobson. The BSD Packet Filter: A New Architecture for User-level Packet Capture. In
Winter USENIX, 1993.

[24] Luke Nelson, Helgi Sigurbjarnarson, Kaiyuan Zhang, Dylan Johnson, James Bornholt, Emina Torlak, and Xi Wang. Hyperkernel: Push-Button Verification of an OS Kernel. In
SOSP, 2017.

[25] Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. Arrakis: The operating system is the control plane. In , pages 1–16, Broomfield, CO, October 2014. USENIX Association.

[26] Ben Pfaff, Justin Pettit, Teemu Koponen, Ethan J. Jackson, Andy Zhou, Jarno Rajahalme, Jesse Gross, Alex Wang, Jonathan Stringer, Pravin Shelar, Keith Amidon, and Martín Casado. The Design and Implementation of Open vSwitch. NSDI, 2015.

[27] Redox. , 2018.

[28] Chuck Rossi. Rapid release at massive scale.

[29] Helgi Sigurbjarnarson, James Bornholt, Emina Torlak, and Xi Wang. Push-button Verification of File Systems via Crash Refinement. In
OSDI, 2016.

[30] William Tu, Joe Stringer, Yifeng Sun, and Yi-Hung Wei. Bringing the Power of eBPF to Open vSwitch. In
Linux Plumbers Conference, 2018.

[31] Amin Vahdat and Thomas E. Anderson. Transparent result caching. In . USENIX Association, 1998.

[32] Bharath Kumar Reddy Vangoor, Vasily Tarasov, and Erez Zadok. To fuse or not to fuse: Performance of user-space file systems. In
Proceedings of the 15th Usenix Conference on File and Storage Technologies, FAST’17, pages 59–72, USA, 2017. USENIX Association.

[33] Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient Software-based Fault Isolation. In
SOSP, 1993.

[34] xv6 OS. https://github.com/mit-pdos/xv6-public.

[35] Gerd Zellweger, Simon Gerber, Kornilios Kourtis, and Timothy Roscoe. Decoupling cores, kernels, and operating systems. In Jason Flinn and Hank Levy, editors, 15 , pages 17–31. USENIX Association, 2014.

[36] Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, and Anirudh Badam. I’m not dead yet!: The role of the operating system in a kernel-bypass era. In