A FaaS File System for Serverless Computing

Johann Schleier-Smith, Leonhard Holz, Nathan Pemberton, Joseph M. Hellerstein
UC Berkeley, Packet Computing
Abstract
Serverless computing with cloud functions is quickly gaining adoption, but constrains programmers with its limited support for state management. We introduce a shared file system for cloud functions. It offers familiar POSIX semantics while taking advantage of distinctive aspects of cloud functions to achieve scalability and performance beyond what traditional shared file systems can offer. We take advantage of the function-grained fault tolerance model of cloud functions to proceed optimistically using local state, safe in the knowledge that we can restart if cache reads or lock activity cannot be reconciled upon commit. The boundaries of cloud functions provide implicit commit and rollback points, giving us the flexibility to use transaction processing techniques without changing the programming model or API. This allows a variety of stateful server-based applications to benefit from the simplicity and scalability of serverless computing, often with little or no modification.
1 Introduction

In some ways programming the cloud has never been easier—serverless computing puts the power of thousands of computers at developers' fingertips [43, 30]. Autoscaling and pay-per-use mean that developers using Function as a Service (FaaS) platforms experience an illusion of infinite scale, and need not worry about allocating or administering the underlying resources [22, 44]. Still, serverless computing remains a new field, and users quickly run into limitations and difficulties [38], especially when it comes to managing application state.

Perhaps the most familiar and time-tested solution to state management is the POSIX file system API. This led us to wonder about offering a POSIX filesystem API to serverless functions. Our hypothesis was that the design patterns of cloud functions could offer opportunities for file system performance that could match or even exceed existing POSIX implementations for stateful VMs in the cloud. In this paper we explore this design space. Our Function as a Service File System (FaaSFS) implementation allows programmers to use familiar system interfaces, as well as existing software and libraries, while obtaining the scalability benefits of serverless computing and performance that is competitive with "serverful" NFS filesystems offered for cloud VMs.
Figure 1: Serverless applications often use stateless cloud functions together with stateful services such as object storage or key-value storage. FaaSFS provides a shared file system for cloud functions. It provides a familiar POSIX API, and through caching and transactional isolation achieves performance near that of a local file system.

When using FaaS, also known as cloud functions, for serverless computing, programmers upload their applications as fragments of code written in high-level programming languages. They then specify when these bits of code should run, e.g., in response to web requests or to events published on queues. Cloud functions are stateless in the sense that once a function finishes executing, the cloud provider can purge its execution environment, including the content of its local disk, and reassign resources to another customer.

Figure 1 illustrates a common solution to state management in serverless computing, which is to use a stateless FaaS tier with one of a range of stateful storage services. Each cloud provider has different offerings (examples include S3 and DynamoDB on AWS, Storage and CosmosDB on Azure, and Cloud Object Storage and Datastore on Google Cloud), which forces programmers to conform to provider-specific APIs. This limits portability, and makes it particularly difficult to use serverless computing with existing code.

A shared POSIX file system API for the serverless cloud setting is easy to ask for, but can it be made to work? Cloud functions are subject to all the vagaries of cloud execution [27], including resource contention and failures. These challenges are exacerbated by a high degree of elasticity, which can raise parallelism from zero to thousands of concurrent executions in seconds [43]. One naturally concludes that the serverless environment is not conducive to implementing POSIX, which promises linearizability and has a chatty API that is vulnerable to latency.

FaaSFS processes POSIX calls optimistically, using locally cached state and wrapping cloud function file system interactions in a transaction mechanism to recover consistency when conflicts ensue. This approach is practical because programmers are accustomed to cutting up their applications into independent pieces when building cloud functions. This style of programming is driven by two distinctive properties of the cloud functions environment:

• Limited execution time. Cloud functions typically do one thing and then return, a property enforced by a configurable execution time limit (ranging from seconds to minutes). FaaSFS can thus form transactions transparently by starting and ending them at function boundaries; the programming model stays the same (see the sketch below).

• Function-grained fault tolerance. Cloud functions are routinely re-executed by the FaaS platform, and their programmer contract generally requires developers to write idempotent code. FaaSFS also requires programmers to write retry-safe code, but provides them with a transaction mechanism that makes it simpler to do so correctly.

Working with existing FaaS platforms is challenging because the environment is restricted in many ways, but we were able to work around these problems.
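To make the function-boundary transactions concrete, the sketch below shows how a client library might wrap a handler so that a transaction begins on invocation and commits on return, with the FaaS retry contract covering optimistic aborts. The package, types, and method names are hypothetical stand-ins, not an actual FaaSFS API.

```go
// Hypothetical sketch of transparent transactions at function
// boundaries. The faasfs package, Txn type, and method names are
// illustrative stand-ins, not an actual FaaSFS API.
package faasfs

import (
	"context"
	"errors"
)

// ErrConflict stands in for a commit-time validation failure: cached
// reads or elided locks could not be reconciled with committed state.
var ErrConflict = errors.New("faasfs: optimistic validation failed")

// Txn represents client-side transaction state (read set, write set,
// read timestamp T_R). Methods here are stubs.
type Txn struct{}

func Begin() *Txn            { return &Txn{} } // would fetch T_R from the backend
func (t *Txn) Commit() error { return nil }    // would ship R and W for validation
func (t *Txn) Abort()        {}

// Handler is ordinary cloud function code that uses plain POSIX file
// I/O; it never sees the transaction machinery.
type Handler func(ctx context.Context, payload []byte) ([]byte, error)

// Wrap forms a transaction transparently at the function boundary:
// begin on invocation, commit on return, and re-execute the
// (retry-safe) handler if optimistic validation fails.
func Wrap(h Handler) Handler {
	return func(ctx context.Context, payload []byte) ([]byte, error) {
		for {
			txn := Begin()
			out, err := h(ctx, payload)
			if err != nil {
				txn.Abort()
				return nil, err
			}
			cerr := txn.Commit()
			if cerr == nil {
				return out, nil // all file system effects appear atomically
			}
			if !errors.Is(cerr, ErrConflict) {
				return nil, cerr
			}
			// Conflict: safe to re-run, matching the FaaS retry contract.
		}
	}
}
```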
Mapping POSIX into a transactional database context requires some care because the memory consistency guarantees that describe file systems do not correspond directly to the transaction isolation levels provided by databases. Our implementation also needed to surmount several algorithmic challenges in order to achieve good performance: allowing high concurrency when the file length changes requires special treatment, and we utilize a fine-grained cache update mechanism (both described in Section 4.2).

Our aim in this work is to demonstrate proof of design feasibility. We focus on showing the performance necessary to run existing software, on integrating with the cloud functions environment, and on developing the protocols and mechanisms for ensuring transactional consistency and cache consistency. None of these evaluations are exhaustive, and we leave construction of a fault-tolerant and scalable back end to future work. We believe, however, that the work presented here makes a good case for our approach and represents a full implementation of the salient serverless-specific aspects of our design.

Our evaluation demonstrates the value of POSIX compliance by running both real-world applications and synthetic benchmarks including Filebench and TPC-C. By running a popular blogging application, we also show how we can support an unmodified real-world application. Setting up this software with FaaSFS is much like running it in a local development environment, and involves none of the configuration necessary for a scalable and fault-tolerant server-based deployment. For workloads that benefit from optimistic execution, FaaSFS can outperform traditional shared file systems accessed from statically allocated (serverful) VMs. For all workloads, FaaSFS provides a bridge between a world of existing software and programming practice and the autoscaling and zero-configuration world of serverless cloud computing.
2 Motivation

2.1 Serverless use cases

Scalable image processing is the canonical example use case for serverless [18], and it illustrates the awkwardness of working with state. While there are countless image processing tools that operate directly on files, these do not speak the various provider-specific APIs of cloud storage. Using them in a serverless function involves rewriting them to first stage objects in local temporary storage, then run the tools, and then write outputs back to cloud storage, a process that introduces complexity.

Cloud functions also seem like a natural fit for web and API serving, but using them is not as simple as it might be because of state management. FaaS offers an autoscaling model well matched to these use cases, and makes it simple to run web application code, which is typically stateless. However, even a simple web site such as a blog running on a single server will often use both a database and a local file system. Meeting these state needs with cloud services is possible (see Figure 1), however it requires programmers to use cloud-provider-specific APIs and may add complex configuration. We return to this example in our evaluation in Section 5.4.

PyWren [43] highlighted the benefit of cloud functions in making cloud computing accessible to a broad range of scientific computing users. However, PyWren requires a customized distribution of Python designed for pre-loading into cloud functions. This distribution is crafted to be sufficiently inclusive to be practically useful, while remaining small enough to fit into the limited storage space available on the cloud function. The needs of this use case can be met more naturally with a shared file system.
2.2 Shared storage systems

In the cloud there are many storage systems that are superficially similar to a file system, but which for good reason do not provide a POSIX API. POSIX makes a linearizability guarantee, which is attainable within a single server, but is costly in a distributed and fault-tolerant setting. To take an early example, the designers of the Google File System [32, 51] chose weaker consistency guarantees in favor of achieving greater robustness to server failures, in part because their workloads simply did not require stronger guarantees. Object stores might look like file systems for the cloud, but typically store only immutable items. Some, such as AWS S3, also provide only eventual consistency for many metadata operations. Key-value stores allow programmers to select from a range of consistency guarantees (e.g., as in Dynamo [28] and AWS DynamoDB [3]). They map names to bytes, as file systems do, however their APIs are more restrictive than POSIX, requiring key-level replacement for modification rather than supporting updates or appends.

There are numerous shared file systems and protocols that promise POSIX compliance, or something close to it. These include NFS [61, 37], SMB [13], Lustre [65], and GPFS [63]. These can work quite well in settings such as high-performance computing, where file transfer units are large and concurrent access is coarse-grained and controlled by a job framework. They are more challenged in environments where concurrent access is common, e.g., as documented for NFS [46]. What all of these shared file systems have in common is that a client must either hold exclusive access to shared state (which could be a file, a directory, or a file range), or must wait on the network to perform operations on shared state maintained at the server. The established approach to this is leases [34], but it is vulnerable to client failure or slowness, and is inconsistent with the common assumptions for cloud infrastructure [27], which hold that failures and delays can be common.
3 Design

In designing FaaSFS we needed to simultaneously meet requirements for performance, POSIX conformance, and cloud function integration. We touch upon each in turn.
3.1 Performance

A key motivation for providing a file system API is compatibility with existing software ecosystems. In practice this means matching not only the API semantics, but also the performance characteristics. Ideally, FaaSFS should deliver a service that works like a local disk, but in a distributed cloud setting.

The POSIX file system API is a chatty one, which makes this goal challenging and elusive. Many applications still perform a positioned read by first performing a seek and then a read, even though the newer pread has now been available for over 20 years. Some applications open and close files repeatedly, something that programmers see as relatively cheap but which every time induces access control checks along the entire path to root. In common implementations, the cp command issues a sequence of writes with block size of 64 KB or less. Compare that to the minimum part size for multi-part uploads to AWS S3 object storage, which is 5 MB.

Even on a single server, kernel caching of data and metadata is essential to achieving acceptable file system performance. In a shared file system setting caching is equally important, but becomes more difficult to achieve since clients are distributed.

Given these challenges, a key innovation in FaaSFS is the use of optimistic execution to achieve greater performance than traditional shared file systems attain. We attribute the advantages of FaaSFS to four benefits of its mechanisms (the first two are sketched in code below):

• Optimistic lock elision—lock requests always succeed locally and speculatively, rather than traversing the network as in other shared file systems. Our commit validation phase ensures strict serializable transaction isolation [39], which ensures that lock semantics are respected (see Section 3.2).

• Optimistic use of cached state—we serve read requests from local cache, speculatively assuming that this state will still be valid at commit time.

• Snapshot reads—we support multiversioned state, which allows readers to make progress irrespective of write activity. Our implementation achieves this through a pull-based cache update mechanism and with a multiversioned back end.

• Fine-grained cache updates—we update or invalidate client caches at a block level, whereas other systems operate at a file level. Our cache update mechanism takes advantage of a transactional backend, which retains the information necessary for block-level change tracking in the transaction log.

See Section 4.2 for additional discussion of these mechanisms. Among the advantages listed above, optimistic lock elision and optimistic use of cached state require speculative execution capabilities. Snapshot reads could be provided in a blocking transactional implementation, however the optimistic mechanism helps because there is no need to enter a snapshot mode explicitly; we can optimistically assume the transaction will be read-only. Fine-grained cache updates are easy to provide once the change tracking needed for snapshots is in place, and could be incorporated into traditional shared file systems.
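To illustrate the first two mechanisms, the sketch below shows how a client-side transaction might elide a lock and serve a read from cache while recording both for commit-time validation. All names are hypothetical; this is a simplified rendering, not the FaaSFS source.

```go
// Hypothetical sketch of optimistic lock elision and optimistic reads
// from cache in the Local Server. Types and fields are illustrative.
package faasfs

type blockID struct {
	inode uint64
	num   uint64
}

type lockRecord struct {
	inode      uint64
	start, end int64
	exclusive  bool
}

type txn struct {
	readTS  uint64             // filesystem-wide read timestamp T_R
	readSet map[blockID]uint64 // blocks read, asserted unchanged as of T_R
	lockSet []lockRecord       // elided locks, re-checked at commit
	cache   map[blockID][]byte // locally cached block data
}

// Lock never traverses the network: it succeeds speculatively and is
// recorded so that commit validation can confirm lock semantics held.
func (t *txn) Lock(inode uint64, start, end int64, exclusive bool) error {
	t.lockSet = append(t.lockSet, lockRecord{inode, start, end, exclusive})
	return nil // always succeeds locally
}

// ReadBlock serves from cache when possible, optimistically assuming
// the cached version is still current at T_R; recording the read lets
// the backend detect a stale read and abort the transaction instead.
func (t *txn) ReadBlock(b blockID) ([]byte, error) {
	if data, ok := t.cache[b]; ok {
		t.readSet[b] = t.readTS
		return data, nil
	}
	data, err := t.fetchAt(b, t.readTS) // snapshot read from the backend
	if err != nil {
		return nil, err
	}
	t.cache[b] = data
	t.readSet[b] = t.readTS
	return data, nil
}

// fetchAt is a stub for an RPC requesting the block version visible at
// timestamp ts (served from the undo log when an older version is needed).
func (t *txn) fetchAt(b blockID, ts uint64) ([]byte, error) {
	return make([]byte, 4096), nil
}
```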
3.2 Reconciling POSIX with transactions

Perhaps counterintuitively, our decision to use transactions in FaaSFS is motivated by performance. The perennial challenge for shared file systems is overcoming network latency when enforcing locks and obtaining up-to-date file system state. The use of optimistic transactions to transparently remove locks in programs has been established in the hardware context [60]. We draw upon this approach, in addition to well-established optimistic techniques for providing serializable isolation in databases [20]. Transactions are often used to simplify crash recovery or concurrent programming, but in FaaSFS these benefits are ancillary to their motivating purpose, which is to provide a mechanism for optimistic execution.

Databases traditionally offer one or more transactional guarantees [15], whereas the behavior of file systems is dictated by the POSIX specification [45]. We address how the two can be reconciled after first reviewing each.

POSIX [45] uses informal language in describing its guarantees, saying "Writes can be serialized with respect to other reads and writes" and "If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write()... A similar requirement applies to multiple write operations to the same file position." In more formal language we can understand this as requiring both atomicity and linearizability. Atomicity ensures that POSIX operations are indivisible units, each of which must be seen to have happened in its entirety or not have happened at all. This means, for example, that a read must never observe just part of a write. Similarly, an object subject to a rename always appears with either the old name or the new one; it may not disappear or appear under both names, even transiently. Linearizability is the consistency property requiring that all operations on the file system reflect a single global total order, and that this order corresponds to real-time, as observed at each client using wall clocks [39]; each operation must slot into the total order at a point between its observed start time and its observed completion time.

Transactions form envelopes around multiple operations and provide guarantees relating to these groups rather than to operations individually. Transactional guarantees are commonly described by one of several isolation levels, which originally described locking schemes but now have mechanism-agnostic definitions [15, 25]. Alternatives to locking in databases include optimistic concurrency control [48], multiversioning [20], and deterministic databases [73]. At lower isolation levels such as read committed or repeatable read a transaction may experience some effects of concurrent execution, whereas with serializable isolation, each transaction always sees the database as if it were the only transaction executing.

How does serializable isolation compare to linearizability? Database isolation guarantees alone make no promises about real-time behavior, as measured by observers comparing clocks, or even just running transactions one by one. For example, a database retains the serializable guarantee for a read-only transaction even when evaluating it against a snapshot of its state at an arbitrary point in its past. Preventing such behavior calls for real-time correspondence, which for databases is known as external consistency [24, 33].
Serializable isolation with external consistency is also known as strict serializability. When a transaction consists of a single operation, strict serializability is equivalent to linearizability [39], however this equivalence does not hold in general. In this paper, we demonstrate that transactions with strict serializability offer good semantics and performance in many use cases. But there are some tradeoffs due to the strong semantics. To understand the limitations of wrapping cloud functions in serializable transactions, consider the fact that such functions are not able to communicate with one another through the file system, since their updates are isolated from each other. This precludes functionality that might be desirable, e.g., enabling pipelined processing of a file while it is still being written. We defer exploration of other isolation models in FaaSFS to future work.

In sum, running cloud functions in a transactional wrapper with strict serializability does not correspond directly to running them with shared access to a linearizable POSIX file system. However, it is equivalent to running them one at a time, in sequence. Cloud function applications get the semantics they expect from POSIX, provided they do not rely upon interactions with others running concurrently.

3.3 The cloud functions environment

The design of FaaSFS is heavily influenced by the characteristics of the cloud functions environment, which differs in several important ways from a traditional server. This includes some limitations which are quite fundamental, as well as others that we accept only because we are not in a position to change the provider's infrastructure. Key characteristics of the environment include:

• No root privileges. Lambda offers a controlled environment, and operations like mounting an NFS share are prohibited. We are also unable to mount FUSE user space file systems [4] or load kernel modules. Thus we implement FaaSFS as a user space file server.

• Function instances freeze between invocations. Lambda instances only run when they are processing a request invoked through the API. In the frozen state no processing occurs, even if data arrives on the network or if a timer is scheduled to raise a signal. Function instances retain FaaSFS cache state while frozen, but there is no way to update that state until the instance begins processing a new request. Also, we cannot safely maintain delegations [40] or leases [34] across successive invocations because we may be unable to revoke them on demand.

• No inbound network connections. Cloud functions live behind a NAT layer that prohibits inbound network connections. While some workarounds have been demonstrated [30, 76], direct communication between functions is not part of the programming model. Functions using FaaSFS must communicate through a shared back end instead.

• No names for function instances. While Lambda may create many instances of a cloud function (as dictated by load), there is no way to route an invocation to a particular instance. Each invocation could go to any function instance, which prevents us from partitioning our cache.

• Limited execution time. See Section 1. Cloud functions typically run for fractions of a second. While they can run for minutes, we always have a bound on their execution time. We use limited execution time to our advantage, creating transactions transparently and thus relieving programmers of using a separate transactional API.

• Function-grained fault tolerance.
See Section 1. Cloud functions must be safe to retry, a requirement that is usually met by requiring idempotent code. Prior work on serverless key-value storage points out that transactional atomicity relaxes this requirement [68]: atomic visibility of updates ensures that all or none of a function's effects are visible in storage. Hence the transactions provided by FaaSFS simplify idempotence for filesystem interactions, though the programmer must still take care to ensure the safety of side effects outside of FaaSFS.

Our approach aligns with the final two characteristics, both of which are tied tightly to the cloud functions programming model. Bounded execution is a signature attribute that sets cloud functions apart from serverful services. Function-grained fault tolerance with retries is the simplest way to provide reliability.

We also rely upon the ability to retain cache state in cloud function instance memory between repeated function invocations. Even though this may be viewed as a defect in the FaaS model [42], a concession on functional purity, and a security threat vector, we believe that it is here to stay. For performance reasons, providers typically reuse function instances for many requests, only shutting them down and reclaiming them after they have been idle for minutes, or even hours [77]. Starting up a new cloud function instance requires transferring over code, starting up a language runtime, and perhaps executing application-specific initialization, all before the invoked function can start executing. This takes time, perhaps less than a second but possibly much longer [54]. Some applications may load external data sets, and JIT language runtimes like JVMs maintain substantial internal state to improve performance. As a consequence, FaaSFS is not alone in benefiting from cached state; cloud providers and applications benefit as well.
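The warm-instance pattern that FaaSFS relies upon can be sketched in a few lines. This is a generic illustration of state retention across invocations of a reused instance; the refreshCache helper is a hypothetical stand-in for the cache update protocol of Section 4.2.

```go
// Generic sketch of cache retention across warm invocations, assuming
// a reused Go function instance. Package-level variables survive
// between invocations of the same instance, but nothing runs while the
// instance is frozen, so the cache is reconciled only when the next
// request arrives.
package main

var (
	blockCache = map[string][]byte{} // retained while the instance stays warm
	lastSyncTS uint64                // commit timestamp of the last cache sync
)

func handler(payload []byte) ([]byte, error) {
	// At transaction begin, ask the backend what changed since
	// lastSyncTS; it may send updated blocks, invalidations, or
	// nothing at all (leaving staleness to commit-time validation).
	lastSyncTS = refreshCache(blockCache, lastSyncTS)
	// ... application file I/O now runs against a mostly warm cache ...
	return nil, nil
}

// refreshCache stands in for the RPC that applies block updates or
// invalidations accumulated while the instance was frozen.
func refreshCache(cache map[string][]byte, since uint64) uint64 {
	return since
}

func main() { _, _ = handler(nil) } // stand-in for the FaaS runtime driver
```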
4 Implementation

We chose to implement FaaSFS from scratch, rather than modifying an existing file system or database implementation, or building on top of one. This choice is important to achieving our aims because it allows us to integrate the caching mechanism with the concurrency control mechanism, which we do by using the transaction log as a source of updates for intermittently connected cloud function clients. We break this review of the implementation into two parts: a discussion of the functionality of each component, and a discussion of state management and transactions.
4.1 Components

Figure 2 shows the components of FaaSFS in the context of an application. In the discussion that follows, we work our way down the stack from the application. In our workloads, all system calls originate in a limited set of shared libraries: the C standard library, the pthread library, or the dynamic linker library.
System call intercept:
Operating in the AWS Lambda environment, we have no ability to modify or configure the kernel. The system calls used by ptrace are also blocked, so we resort to binary modification via hot patching to intercept system calls. We do this using the System Call Intercepting Library [8] provided as part of the Persistent Memory Development Kit [62].
Routing:
The FaaSFS Routing Library runs in the address space of the application and registers a handler with the System Call Intercepting Library. This handler gets invoked ahead of all system calls, and can either let them pass unchanged or substitute an alternative implementation. For those calls corresponding to the POSIX file system API it performs a routing decision, using arguments such as the path name or the file descriptor to determine whether the operation should go to FaaSFS or to the underlying operating system. For paths, we test the prefixes, e.g., /mnt/tsfs, normalizing them to account for relative paths. Some delicate bookkeeping is required, and in the case of forked processes this information must be carried in environment variables.
Figure 2: Overview of FaaSFS. Yellow indicates the software that we wrote.
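The routing decision itself is simple prefix matching over normalized paths. The sketch below renders it in Go for readability; the actual Routing Library is C code, and the details here are illustrative.

```go
// Sketch of the routing decision, written in Go for clarity (the
// actual Routing Library is C code running inside the application
// process). The mount prefix matches the example in the text.
package main

import (
	"path"
	"strings"
)

const mountPrefix = "/mnt/tsfs"

// routeToFaaSFS decides whether a path-based system call should be
// served by FaaSFS or passed through to the kernel. Relative paths
// are normalized against the current working directory first.
func routeToFaaSFS(cwd, p string) bool {
	if !path.IsAbs(p) {
		p = path.Join(cwd, p) // Join also collapses "." and ".." segments
	} else {
		p = path.Clean(p)
	}
	return p == mountPrefix || strings.HasPrefix(p, mountPrefix+"/")
}

func main() {
	println(routeToFaaSFS("/mnt/tsfs/app", "data/db.sqlite")) // true
	println(routeToFaaSFS("/tmp", "/etc/hosts"))              // false
}
```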
Shared memory IPC:
The Routing Library and the Local Server communicate using a shared memory area. We maintain a set of buffers, configurable in number and size, to allow for concurrent requests. By default we provide 10 buffers, each 2 MB in size. A client, which may correspond to either a thread or a process, first checks out a buffer using atomic operations. It then writes the request data and marks it as ready for processing by the server. The server will busy wait, spinning for up to 16 µs before falling back to wait on a semaphore. The response works the same way, with the client first spinning in hopes of receiving a low-latency response, before turning to operating system support for coordination. We analyze the efficiency of the IPC mechanism in Section 5.1.
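A sketch of the spin-then-block wait follows. It is written with Go concurrency primitives for clarity, whereas the real mechanism spins on a flag in shared memory and falls back to a POSIX semaphore; the names here are illustrative.

```go
// Sketch of the spin-then-block wait on an IPC buffer, in Go for
// clarity; the real mechanism spins on a flag in shared memory and
// falls back to a POSIX semaphore. The 16 µs budget mirrors the text.
package main

import (
	"runtime"
	"sync/atomic"
	"time"
)

const spinBudget = 16 * time.Microsecond

// await spins briefly on the buffer's ready flag, hoping for a
// low-latency response, then blocks on a wake-up channel standing in
// for the semaphore.
func await(ready *atomic.Bool, wake <-chan struct{}) {
	deadline := time.Now().Add(spinBudget)
	for time.Now().Before(deadline) {
		if ready.Load() {
			return // fast path: the other side responded while we spun
		}
		runtime.Gosched()
	}
	for !ready.Load() {
		<-wake // slow path: sleep until the other side signals
	}
}

func main() {
	var ready atomic.Bool
	wake := make(chan struct{}, 1)
	go func() { ready.Store(true); wake <- struct{}{} }() // simulated server
	await(&ready, wake)
}
```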
User Space Local Server:

The FaaSFS User Space Local Server runs in a separate process from the application, and each instance of a cloud function runs one such process. The Local Server maintains a cache as well as a write buffer, and intermediates all network communication with the FaaSFS Backend Service. Both the Local Server and the Backend Service are written in Go, and we use Go's built-in RPC for communication between the two. A further discussion of the transactional mechanisms follows in Section 4.2.
Figure 3: Components of the FaaSFS transactional implementation. The Local Server comprises an IPC server, write buffer, block cache, RPC manager, and transactional client; the Backend comprises an RPC server, commit validator, sequencer, indexed undo log, snapshot reads, and a block-version-stamped in-memory file system.
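Since both sides are Go and communicate over Go's built-in RPC, the split between the Local Server and the Backend Service can be sketched with net/rpc. The Commit method and its argument types below are hypothetical simplifications, not the real protocol.

```go
// Minimal sketch of the Local Server / Backend Service split using
// Go's built-in net/rpc, which the text says connects the two. The
// method and type names here are hypothetical.
package main

import (
	"log"
	"net"
	"net/rpc"
)

type CommitArgs struct {
	ReadTS   uint64   // transaction read timestamp T_R
	ReadSet  []uint64 // block numbers read (simplified)
	WriteSet [][]byte // changed block data (simplified)
}

type CommitReply struct {
	OK       bool
	CommitTS uint64
}

// Backend exposes commit validation over RPC.
type Backend struct{}

// Commit would validate the read set against block write timestamps
// and, on success, apply the write set under a fresh commit timestamp.
func (b *Backend) Commit(args *CommitArgs, reply *CommitReply) error {
	reply.OK, reply.CommitTS = true, args.ReadTS+1 // stub
	return nil
}

func main() {
	rpc.Register(new(Backend))
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	go rpc.Accept(ln) // Backend Service side

	// Local Server side: dial the backend and submit a commit request.
	client, err := rpc.Dial("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	var reply CommitReply
	if err := client.Call("Backend.Commit", &CommitArgs{ReadTS: 42}, &reply); err != nil {
		log.Fatal(err)
	}
	log.Printf("committed=%v T_W=%d", reply.OK, reply.CommitTS)
}
```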
Backend Service:
Our focus in this work is on the protocols necessary to maintain performance and POSIX-compliant consistency across a large number of distributed cloud functions. For the purposes of this work we provide a prototype backend implemented as a monolithic server that maintains state in memory. The techniques for building a scalable transactional backend service are well documented (e.g., [1, 24, 74]) and we believe they can be combined with this work in the future.

FaaSFS comprises 3,500 lines of C and 15,000 lines of Go. We delve deeper into the implementation of our caching and consistency protocols in the next section.
4.2 State management and transactions

We now turn to the transactional mechanisms implemented in FaaSFS. The diagram in Figure 3 provides an overview of the major components comprising it. As discussed in Section 3.2, we provide strict serializability with an optimistic concurrency control implementation. At the beginning of each transaction, the client communicates with the server to obtain a filesystem-wide read timestamp T_R, which corresponds to the most recently committed version of the file system. This introduces a round trip but allows us to guarantee strict serializability. All reads must be returned to the application as if they had been issued at T_R, unless the transaction itself has modified the state being read, in which case the effect of its operations must be reflected. Each file is represented as a number of blocks, each of which has an associated timestamp T reflecting the commit time of its last change.

Throughout the course of a transaction, the Transactional Client at the Local Server maintains a read set R and a write set W. Each read occurring during the course of a transaction adds a record of the form (blocknum, T_R) to R. Similarly, write records of the form (blocknum, changed_data) are added to W, where changed_data is of the form (offset, byte[]) and can represent a partial update to the state of the block. Associated with each read and write, we record an assertion on the length of the file, which is discussed in more detail below.

At commit time, the Transactional Client of the Local Server sends R and W to the Backend Service, where they are validated according to the rules of optimistic concurrency control [20]. Each block B stored in the Backend Service has an associated write timestamp T_BW. For each record in R, the Backend Service verifies that T_BW ≤ T_R. If this verification fails for any block, then the transaction is aborted. When verification succeeds, the Backend obtains a commit timestamp T_W from the Sequencer. For each write recorded in W, it copies the pre-commit state of the block to an Undo Log, then updates the block to apply the transaction.

As a consequence of POSIX semantics, each file read is also implicitly a read of the file length, as its result can depend on two things other than writes to the range of bytes requested. First, it depends on any truncate operations that specify a length less than the last byte requested. It can also depend on writes that begin at an offset greater than the last byte requested, because POSIX zero-fills files that have gaps. The file length can change frequently, but it typically grows more often than it shrinks. In FaaSFS, operations in R also represent a predicate read on the file length, i.e., an assertion of filelength ≥ lastbyteread. Reads beginning beyond the end of the file return 0 bytes and add the assertion filelength ≤ firstbyteread to R, whereas only those reads limited by reaching the end of file assert filelength = lastbyteread. These assertions are also validated at commit time.
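The commit validation just described can be rendered as a short sketch. This is a hypothetical rendering of the rules above (read-set timestamps, file-length assertions, undo logging, sequencer timestamps); the Go types are invented for illustration.

```go
// Hypothetical sketch of commit validation at the Backend Service,
// following the optimistic concurrency control rules described above.
// All types are simplified stand-ins for the real structures.
package main

type blockKey struct {
	inode, num uint64
}

type lengthAssert struct {
	inode  uint64
	op     byte // '>', '<', or '=' comparison against the current length
	length int64
}

type commitReq struct {
	readTS  uint64              // T_R assigned at transaction begin
	reads   []blockKey          // read set R: blocks asserted valid at T_R
	writes  map[blockKey][]byte // write set W: (possibly partial) updates
	asserts []lengthAssert      // implicit predicate reads on file length
}

type backend struct {
	writeTS map[blockKey]uint64 // T_BW: commit time of each block's last change
	fileLen map[uint64]int64    // current length of each file
	nextTS  uint64              // sequencer state
}

// validateAndApply aborts unless every block in R is unchanged since
// T_R and every file-length assertion still holds; on success it logs
// pre-images to the undo log and applies W at a new commit timestamp.
func (b *backend) validateAndApply(req commitReq) (uint64, bool) {
	for _, r := range req.reads {
		if b.writeTS[r] > req.readTS { // a later commit touched this block
			return 0, false
		}
	}
	for _, a := range req.asserts {
		l := b.fileLen[a.inode]
		ok := (a.op == '>' && l >= a.length) ||
			(a.op == '<' && l <= a.length) ||
			(a.op == '=' && l == a.length)
		if !ok {
			return 0, false
		}
	}
	b.nextTS++ // obtain commit timestamp T_W from the sequencer
	for k, data := range req.writes {
		b.logUndo(k)     // copy pre-commit block state to the undo log
		b.apply(k, data) // install the new version
		b.writeTS[k] = b.nextTS
	}
	return b.nextTS, true
}

func (b *backend) logUndo(k blockKey)         {}
func (b *backend) apply(k blockKey, d []byte) {}

func main() {}
```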
There is a large design space and extensive previous work on providing caching for distributed transactional clients [31]. One limitation of the cloud functions environment that FaaSFS must account for is that instances become frozen between invocations, and their cached state can only be updated once they are running again. The Local Server typically contacts the Backend Service at the beginning of a transaction, requesting a record of any blocks that might have changed. In the simplest implementation, the Backend Service checks the transaction log to see which blocks have changed since the Local Server cache was last updated, then sends all of them over. This does not scale, so our protocol allows it to send either a block-level or a file-level invalidation instead of broadcasting the updated data. Additionally, FaaSFS allows the Backend Service to do nothing at all to update a cache, leaving it stale and relying on the commit mechanism to abort any transaction that reads it. There is a large space of possible policies that might be implemented with this mechanism, a proper exploration of which is beyond the scope of this paper. Our implementation includes a simple frequency-based heuristic that sends commonly fetched blocks and invalidates others.

Because of Local Server caches, the FaaSFS Backend Service does not need to do very much work to operate as a multiversioned database. In many cases, the lag in updating caches means that multiversioning is possible with no server interaction at all. When a client accesses a block that is not found in cache, it sends its read timestamp T_R along with its request to fetch the block from the Backend Service, which subsequently uses the Undo Log to retrieve an older version of the block.

5 Evaluation

5.1 Operation latency

We begin our evaluation with a comparison of the per-operation latency of common operations in FaaSFS, which illustrates some consequences of our design. The test loops over a sequence of operations on one file which it opens, reads at a random location, writes at a random location, syncs, and closes. In the case of FaaSFS, these operations are wrapped in a transaction.

Figure 4 shows the resulting latencies for a block size of 1 KB. In order to facilitate comparisons to NFS, which is not available in AWS Lambda, we used a Docker image designed to replicate the AWS Lambda environment, running it on an EC2 instance (c5.large instance type). Our back-end server is a c5.9xlarge instance, for both NFS and FaaSFS. All of our environments use a Linux 4.14 kernel on both clients and servers.

The fastest operation is seek, which has a median latency of about 520 ns for a local file system, 750 ns for calls against an NFS target, and 1.6 µs for FaaSFS file handles on EC2. Seek is a trivial operation, the performance of which is dominated by system call overhead, or in our case, by the latency of our IPC implementation. On Lambda this latency increases to 1.9 µs.

Our implementation of FaaSFS also pays a latency penalty on account of its IPC mechanism, which is slower than a native system call. Ext4 and NFS pay a similar latency cost for reading 1 KB, requiring 1.25 µs and 1.4 µs, respectively, whereas our implementation of FaaSFS requires 5.6 µs on EC2 and 7.6 µs on Lambda. When writing, Ext4 and NFS require 2.0 µs and 2.2 µs, whereas FaaSFS takes 4.8 µs on EC2 and 6.2 µs on Lambda. Most of the added latency in FaaSFS comes from our IPC mechanism, which must copy the data two times. Reads take slightly longer than writes because our implementation of multiversion isolation incurs greater overheads on reads than on writes.

Figure 4: Median latency measurements show the overheads of our implementation of FaaSFS and the difference in remote access latencies.

We believe that an implementation of FaaSFS as a kernel module could bring many of these overheads in line with the other implementations.
As discussed in Section 3, our user-space approach is driven by a desire to deploy FaaSFS in AWS Lambda and other cloud function platforms commercially available today.
5.2 Filebench

To demonstrate the ability of FaaSFS to execute a variety of simulated applications, we used the Filebench [72] test suite. We ran six of the standard included "personalities": file server, network file server, mail server, video server, web proxy, and web server. These classic workloads represent a variety of I/O patterns, and thus provide a flavor for the diversity of applications that FaaSFS can support. In adapting Filebench to the cloud functions setting, we wrap each iteration of the workload in a transaction.

Figure 5 compares FaaSFS running on EC2 to NFS with four concurrent Filebench clients. For each operation, we plot the difference in the time spent during one iteration of the workload, then divide by the average iteration duration in the NFS base case: (FaaSFS op time − NFS op time) / (NFS all ops time). This indicates how changes in the speed of each operation impact overall benchmark performance. In the last column we also show this overall performance difference, which is the cumulative total of the differences for each operation.

Our implementation of FaaSFS outperforms NFS in some workloads, but lags it in others. For example, in the file server workload, FaaSFS gains significant advantages from faster file open and close operations (O_f and C_f), but pays a penalty when opening directories (O_d), beginning transactions (B), and committing (C). Overall it is 61% slower. The web server workload, by contrast, has wins and losses on the same operations, but overall runs about 2.1x faster. This discrepancy is driven primarily by the number of operations executed in each transaction, which is 3x greater for the web server than it is for the file server. The network file server gains a large advantage in read operations (R), and an overall win in performance, attributable to more effective caching in FaaSFS. With the web proxy, by contrast, FaaSFS sees a disadvantage for read operations (R), this time attributable to the increased overhead of accessing cached data. For the mail server, we note the reduction in time spent in sync operations (S), though this does not outweigh the cost of time spent in begin (B) and commit (C). In the video server, Filebench implements a per-client rate limit. Here we see that much of the time added in begin and commit gets absorbed by the rate limit (Z), so the overall performance impact is minimal.

5.3 TPC-C

When considering workloads that challenge FaaSFS, it is hard to come up with something more demanding than an OLTP database. We chose the TPC-C benchmark [9] to explore the limits of what FaaSFS could achieve under a contended, write-heavy workload with strict correctness requirements. Whereas Filebench issues operations without confirming that they do the right thing, a database will quickly detect corruption if the underlying file system does not live up to the POSIX guarantees.

We chose to run TPC-C on SQLite [6], a database that runs as a library and persists its state in a file system. SQLite is primarily designed for embedded environments, and aims to meet high standards for efficiency and management-free operation. The target environments may have limited operating system facilities, e.g., they may not support shared memory, and by default SQLite communicates through the file system when coordinating among multiple processes. SQLite does only limited caching itself, relying heavily on caching in the underlying file system to achieve good performance. Its page cache is maintained on a per-session basis, and gets cleared any time the database file changes.
Figure 5: Filebench workload. Difference in average time elapsed between FaaSFS and NFS (lower is better), per workload: file server, network file server, mail server, video server, web proxy, and web server. Columns are (O_f) open file, (C_f) close file, (R) read file, (W) write file, (O_d) open directory, (C_d) close directory, (S) fsync, (Z) rate limit, (B) begin, (C) commit, and (T) total overall.

SQLite allows only one writer to access the database at a time, however it supports multiple readers. Readers may run concurrently with one another, and concurrently with writers when multiversion concurrency control is enabled. In most cases one can think of a SQLite database as a single file, however SQLite allows an application to open multiple such database files, and can perform transactions spanning them using a two-phase commit protocol [35].

The TPC-C workload models the operation of a large company with many regional warehouses and geographically distributed customers. The database serves queries for customers inspecting stock and placing orders, as well as for processing payments and tracking deliveries. It is a write-heavy workload, with about 70% of queries resulting in a modification to the database. There is some locality of reference, since 90% of orders are served entirely from the customer's regional warehouse, whereas 10% include one or more items from another warehouse. This means that when partitioned by warehouse, many queries can complete locally.

In our configuration we use 64 warehouses, split across 64 SQLite database files, and a total database size of 756 MB. Figure 6 compares performance on NFS to two configurations of FaaSFS. In the eager configuration client caches receive updates for all changes to the file system at the beginning of the transaction, whereas in the lazy configuration FaaSFS updates cached data for each file only when it is opened.

Figure 6: TPC-C scaling. We compare FaaSFS to NFS and consider two configurations, one in which caches are updated eagerly for all files, and the other in which the cache for each file is updated the first time the transaction accesses it.

This experiment highlights both strengths and weaknesses of our approach: there is a significant improvement over NFS, nearly 30x in some cases, however the fraction of transactions aborted due to concurrent modification rises rapidly. When there is only one client active, NFS and eager FaaSFS have comparable performance, with 2,500 and 3,000 tpmC, respectively. With two clients NFS performance is degraded by a factor of 10, as clients must invalidate an entire cached file whenever any part of it changes. FaaSFS, by contrast, ships changed blocks rather than invalidating caches, and sees a 70% increase in performance when going from one client to two. The eager update policy in FaaSFS sends changes for all files at the beginning of each transaction, whereas the lazy update policy defers fetching changes until the file is accessed. While the eager policy improves throughput 2-9x, it uses significant amounts of network bandwidth and server resources. To ensure that back end capacity would not be a bottleneck, we used a c5.18xlarge EC2 instance.

While this experiment demonstrates the completeness of our FaaSFS implementation and highlights some of its scaling characteristics, we do not advocate running a write-intensive application like that modeled by TPC-C using the combination of SQLite and FaaSFS.
One potential objection is that layering one transactional system on top of another is bound to be inefficient; however, SQLite helpfully provides a mode that turns off all of its crash recovery mechanisms. A more serious problem is that SQLite maintains a sequence id that is updated with every transaction, so even though FaaSFS can elide locks it is unable to support concurrent updates to a SQLite database file. This represents false sharing, where transactions that otherwise operate on disjoint sets of database blocks nonetheless contend for the one that records the database version. We come back to how this might be addressed in Section 7.

A back-of-the-envelope calculation suggests that the cost of running TPC-C using Lambda and FaaSFS is comparable to that of server-based implementations. According to published commercial benchmarks [10], the capital cost of a database is approximately $1 USD per tpmC; since one tpmC sustained over 3 years amounts to about 1.6 million transactions, this works out to roughly $0.63 per million transactions when amortized over 3 years. By way of comparison, a cloud function in AWS Lambda with 1 GB memory costs $0.001 per minute, which based on our experiments suggests a cost on the order of $1 per million transactions. Neither of these represents an all-in cost. Cost figures for commercial systems leave out power and other data center costs, and assume 100% utilization, whereas our FaaSFS estimate includes the cost of Lambda but not the cost of providing the back end and storage. For suitable workloads, those that are read-heavy and have limited write contention, running an existing database like SQLite on top of FaaSFS could be a practical solution. Interestingly, the SQLite authors described it as a "serverless database" before serverless was used in the cloud context [7]. Perhaps FaaSFS can turn the world's most widely deployed embedded database into a viable cloud database.

5.4 A full-stack web application

In order to understand how FaaSFS performs in a full-stack application, we chose to evaluate it running Mezzanine [5], a popular open source blogging platform written in Python. Mezzanine is a web application adhering to Python's WSGI standard, and we were able to deploy it to AWS Lambda using Zappa [12], an open source project that packages traditional WSGI applications as cloud functions. Zappa uses AWS API Gateway to make cloud functions accessible over the internet. The resulting application, including libraries, is approximately 100 MB uncompressed, 30 MB compressed. In addition to the application payload, the cloud function includes a custom runtime, provided as a Lambda layer, that is 134 MB uncompressed, 62 MB compressed. We run Mezzanine in its default storage configuration, which uses a SQLite [6] database.

Figure 7: A blog application backed by FaaSFS on AWS Lambda achieves serverless scaling, whereas a 2-server configuration which may have ample capacity most of the time does not scale to handle a load spike. In this experiment, the number of clients increases over time, with a target given by the gray line.

For comparison, we provisioned a cluster of two m5.large
EC2 instances, placing them behind an AWS Application Load Balancer for load balancing, and connecting them to a shared AWS Aurora MySQL database.

Figure 7 shows a workload where a number of clients repeatedly load the home page of the blog with a target rate of one request per second each. To simulate remote clients we run the load generator on AWS infrastructure in a different region (approximately 90 ms round-trip network latency). We ramp up the number of clients from 2 to 800, measuring service latency and throughput. While the two servers are able to sustain 70 requests per second, AWS Lambda with FaaSFS adds resources to meet much greater demand, showing only brief latency spikes as it provisions more resources. We were not able to identify an impact on latency from making simple updates (adding blog entries, posting comments).

At low load, average latency with servers is 122 ms, whereas with Lambda it is 207 ms. We believe that this is mostly caused by the overheads of API Gateway and Lambda invocation. Note that as the service warms up it gets faster: running 1,200 concurrent clients we average 7,200 requests/sec with a latency of 156 ms.

This test is designed to illustrate one compelling use case for FaaSFS: making it possible to host a small web site on a pay-per-use basis, usually at very little cost, while maintaining the ability to reach large scale quickly when needed.

Table 1 shows a breakdown of the costs for the serverless and server-based implementations. This makes it appear that serverless is 10x more expensive than servers, but in practice this is unlikely to be the case. First, one must maintain and pay for idle capacity on the servers, both in case of failure and in case of load spikes. For a two-server configuration, fault tolerance requires a 2x over-provisioning, and intra-day peak-to-average skew can add another 2x. The cost of serverless can also be reduced by replacing API Gateway with an Application Load Balancer, as used with EC2 (though this is presently not supported by Zappa). We may be able to further reduce the cost of using FaaSFS on Lambda by optimizing our IPC mechanism, which spins the CPU aggressively. An alternative implementation might allow us to achieve similar performance using cheaper functions configured with less memory (high-memory includes high CPU in Lambda).

             Component            Cost per million requests
Serverless   API Gateway          $3.50
             Lambda               $5.21
             Total                $8.71
Servers      Load Balancer        $0.12
             2x EC2 m5.large      $0.76
             Total                $0.88

Table 1: Pricing breakdown and comparison for the blog application. The per-request cost of serverless could be up to 10x that of fully loaded servers. As we discuss in Section 5.4, several factors are likely to reduce the difference in practice.
6 Related work

A number of research efforts have recognized that serverless computing is a unique environment with state management needs that remain unmet. The Anna key-value store focuses on the elastic scalability demands of a serverless environment, demonstrating performance over a wide range of scale, and automatic storage tiering that adapts to application needs [80, 79]. Anna is extended by Cloudburst [69], which adds a FaaS execution layer with integrated caching of key-value store data. Pocket [47] provides ephemeral storage for serverless analytics, focusing on efficient resource allocation for short time durations, a challenge also studied by Locus [59]. AFT [68] provides an atomicity shim that sits between cloud functions and cloud storage, which makes it easier to achieve the idempotence that cloud functions require. Serverless computing owes its popularity in part to compatibility with existing software ecosystems, and FaaSFS is unique in adapting POSIX APIs to cloud functions.
Shared file systems originated with NFS [61] and have subsequently been subject to extensive research. A consistent theme in this work has been achieving both consistency and performance, with notable work including Coda [40], Sprite [53], and V [34]. Ideas from this work have found their way into contemporary protocols, including NFSv4 [37] and SMB [13], however these systems are still subject to the limitations discussed in Section 2.2, as they fundamentally rely on locks, leases, and write-through caching to provide consistency with shared state. Variations on this theme occur in the high-performance computing space, where Lustre [65] is popular and incorporates intent-based locks that allow write-back caching. Another category of shared file systems includes cluster file systems like GFS [58] and OCFS2 [29]. These file systems assume that all participants have access to shared block storage, but this makes them vulnerable to misbehaving clients, and they are thus not candidates for cloud storage. In the cloud context, shared file system research has focused on back end scalability, especially for metadata, e.g., in Ceph [78], as well as extreme scale, as in the Google File System [32, 26] and HDFS [67]. Delta Lake [2] is a recent transactional shared file system designed specifically for analytics workloads. However, it does not offer POSIX semantics.
QuickSilver [64, 36] is perhaps the most direct predecessor to FaaSFS. It provides operating system support for distributed transactions. Similar to the transactions transparently delineated by cloud function boundaries in FaaSFS, QuickSilver creates a per-process default transaction if none was specified explicitly. However, QuickSilver implements blocking transactions rather than using optimistic concurrency control, provides weaker isolation, and does not implement client-side caching. It does not explore any of the performance boosting elements of this work.

The Inversion file system [55], built on top of POSTGRES [70], maps directory data and file blocks to relational tables, inheriting the isolation guarantees of the underlying database. Its POSIX compatibility is limited, as it only implemented a small number of basic operations, while adding transactional extensions. Inversion benefits from caching at the backend database server (via the buffer pool), but it does not incorporate any client-side caching.

Various other efforts have sought to combine file systems and databases. Informix patented the idea of providing a file system API atop a database backend [17], and since 2006 Microsoft has shipped TxF, a non-shared and now deprecated [11] transactional file system for Windows. Transactional file system APIs designed to make crash recovery simpler have been provided in AdvFS [75], CFS [52], TxFS [41], and TxOS [57]. However, these systems do not consider the distributed setting.
A key challenge we encountered in FaaSFS is how state makes its way to client caches. The database community has studied this problem, especially in the context of object databases, and a body of work has been surveyed, compared, and categorized [31]. There are also middle-tier caching accelerator implementations such as Ganymed [56], which routes read transactions to replicas, and MTCache [50], an extension to SQL Server, which provides semantics equivalent to executing transactions at the database server, but with the twin benefits of offloading work from the centralized system and lowering latency. MTCache relies on materialized views, similar to the approach used by TimesTen [49]. This work might provide useful approaches to updating client caches in FaaSFS. One notable recent system is Sundial [81], which is particularly close in spirit to our work since it uses optimistic concurrency control and integrates this concurrency control mechanism with its caching mechanism, as we do. Sundial promises improved concurrency, but more work is needed to determine whether its approach can be reconciled with the consistency needs of POSIX workloads.
7 Future work

FaaSFS as implemented today is a prototype with a monolithic in-memory back end. This has been appropriate for validating our design choices and testing them in the cloud functions environment, including caching and concurrent transaction conflict resolution, yet it remains a research artifact. Future work can rely on proven techniques for building scalable and distributed transactional systems [24, 71, 1] to create a system that can be deployed in practice.

This work has focused on the mechanisms of optimistic transactions, but leaves open a number of policy questions. Cache update policy is one area in which there have been many proposals [31]. We are also interested in exploring more optimal techniques for providing strict serializability, e.g., with loosely synchronized clocks [14].

The choice of consistency model is also open to exploration. We chose to implement serializability with external consistency, but some applications may run correctly with weaker guarantees. The optimistic implementation of snapshot isolation [19] is similar to that for serializability, and in some cases this might suffice. An interesting question in this context is under what circumstances lower isolation still allows lock elision, which is important for some applications, but which others might choose to forego. Weak consistency models, including eventual consistency methods [16, 66, 28, 23], offer an alternative approach [21], and may offer a more appropriate set of guarantees for some applications.
8 Conclusion

We report on our experience developing FaaSFS, a shared file system designed specifically to meet the needs of serverless applications. FaaSFS brings together the widely used POSIX filesystem API with the operational and economic benefits of serverless computing. This demonstrates the potential to broaden the scope of serverless computing to encompass some of the most popular applications in use today, and to welcome developers who are used to a traditional file system API.

The design of FaaSFS exploits the fact that cloud functions are finite in duration, with clear begin and end points. This allows us to transparently use transaction mechanisms to achieve the consistency guarantees that POSIX applications expect. Transactions also match the cloud function model of reliability, in which function executions are the unit of fault tolerance. Atomic commit, coupled with automatic retry of idempotent functions, gives exactly-once semantics.

While transactional file systems traditionally pay a performance penalty, we show that optimistic concurrency control can make transactions perform well in the serverless file system context. Our evaluation shows that the performance of FaaSFS is usually comparable to that of traditional file systems, and that our POSIX implementation is sufficiently complete and compliant to run a variety of application benchmarks, as well as a full-stack web application. We hope that FaaSFS will help open up serverless computing to a broader range of applications.

References

[1] Cockroach DB.

[2] Delta Lake. https://delta.io/.

[3] DynamoDB. https://aws.amazon.com/dynamodb/.

[4] Filesystem in userspace. https://github.com/libfuse/libfuse.

[5] Mezzanine: An open source content management platform built using the Django framework. http://mezzanine.jupo.org/.

[6] SQLite. https://sqlite.org/.

[7] SQLite is serverless.

[8] The system call intercepting library. https://github.com/pmem/syscall_intercept.

[9] TPC-C.

[10] TPC-C - all results.

[11] Transactional NTFS (TxF). https://docs.microsoft.com/en-us/windows/win32/fileio/transactional-ntfs-portal.

[12] Zappa: Serverless Python. https://github.com/Miserlou/Zappa.

[13] [MS-SMB]: Server Message Block (SMB) protocol. https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-SMB/[MS-SMB].pdf, 2018. v20180912.

[14] A. Adya, R. Gruber, B. Liskov, and U. Maheshwari. Efficient optimistic concurrency control using loosely synchronized clocks. ACM SIGMOD Record, 24(2):23–34, 1995.
[15] A. Adya, B. Liskov, and P. O'Neil. Generalized isolation level definitions. In Proceedings of the 16th International Conference on Data Engineering, pages 67–78. IEEE, 2000.
[16] P. Bailis and A. Ghodsi. Eventual consistency today: Limitations, extensions, and beyond. Queue, 11(3):20, 2013.
[17] I. V. Balabine, R. Kandasamy, and J. A. Skier. File system interface to a database, 1999. US Patent 5,937,406.
[18] I. Baldini, P. Castro, K. Chang, P. Cheng, S. Fink, V. Ishakian, N. Mitchell, V. Muthusamy, R. Rabbah, A. Slominski, et al. Serverless computing: Current trends and open problems. In Research Advances in Cloud Computing, pages 1–20. Springer, 2017.
[19] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. ACM SIGMOD Record, 24(2):1–10, 1995.
[20] P. A. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Computing Surveys (CSUR), 13(2):185–221, 1981.
[21] E. Brewer. CAP twelve years later: How the "rules" have changed. Computer, (2):23–29, 2012.
[22] P. Castro, V. Ishakian, V. Muthusamy, and A. Slominski. The rise of serverless computing. Communications of the ACM, 62(12):44–54, 2019.
[23] N. Conway, W. R. Marczak, P. Alvaro, J. M. Hellerstein, and D. Maier. Logic and lattices for distributed programming. In Proceedings of the Third ACM Symposium on Cloud Computing, page 1. ACM, 2012.
[24] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google's globally distributed database. ACM Transactions on Computer Systems (TOCS), 31(3):8, 2013.
[25] N. Crooks, Y. Pu, L. Alvisi, and A. Clement. Seeing is believing: A client-centric specification of database isolation. In Proceedings of the ACM Symposium on Principles of Distributed Computing, pages 73–82, 2017.
[26] J. Dean. Evolution and future directions of large-scale storage and computation systems at Google. 2010.
[27] J. Dean and L. A. Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, 2013.
[28] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In ACM SIGOPS Operating Systems Review, volume 41, pages 205–220. ACM, 2007.
[29] M. Fasheh. OCFS2: The Oracle clustered file system, version 2. In Proceedings of the 2006 Linux Symposium, volume 1, pages 289–302. Citeseer, 2006.
[30] S. Fouladi, F. Romero, D. Iter, Q. Li, S. Chatterjee, C. Kozyrakis, M. Zaharia, and K. Winstein. From laptop to Lambda: Outsourcing everyday jobs to thousands of transient functional containers. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 475–488, 2019.
[31] M. J. Franklin, M. J. Carey, and M. Livny. Transactional client-server cache consistency: Alternatives and performance. ACM Transactions on Database Systems (TODS), 22(3):315–363, 1997.
[32] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. 2003.
[33] D. K. Gifford. Information Storage in a Decentralized Computer System. PhD thesis, Stanford University, 1981.
[34] C. Gray and D. Cheriton. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. ACM SIGOPS Operating Systems Review, 23(5), 1989.
[35] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Elsevier, 1992.
[36] R. Haskin, Y. Malachi, and G. Chan. Recovery management in QuickSilver. ACM Transactions on Computer Systems (TOCS), 6(1):82–108, 1988.
[37] T. Haynes and D. Noveck. Network File System (NFS) version 4 protocol. RFC 7530, March 2015. https://tools.ietf.org/html/rfc7530.
[38] J. M. Hellerstein, J. Faleiro, J. E. Gonzalez, J. Schleier-Smith, V. Sreekanti, A. Tumanov, and C. Wu. Serverless computing: One step forward, two steps back. CIDR, 2019.
[39] M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(3):463–492, 1990.
[40] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, and M. J. West. Scale and performance in a distributed file system. ACM Transactions on Computer Systems (TOCS), 6(1):51–81, 1988.
[41] Y. Hu, Z. Zhu, I. Neal, Y. Kwon, T. Cheng, V. Chidambaram, and E. Witchel. TxFS: Leveraging file-system crash consistency to provide ACID transactions. ACM Trans. Storage, 15(2):1–20, May 2019.
[42] A. Jangda, D. Pinckney, Y. Brun, and A. Guha. Formal foundations of serverless computing. Proc. ACM Program. Lang., 3(OOPSLA):1–26, Oct. 2019.
[43] E. Jonas, Q. Pu, S. Venkataraman, I. Stoica, and B. Recht. Occupy the cloud: Distributed computing for the 99%. In Proceedings of the 2017 Symposium on Cloud Computing, pages 445–451. ACM, 2017.
[44] E. Jonas, J. Schleier-Smith, V. Sreekanti, C.-C. Tsai, A. Khandelwal, Q. Pu, V. Shankar, J. Carreira, K. Krauth, N. Yadwadkar, et al. Cloud programming simplified: A Berkeley view on serverless computing. UC Berkeley Technical Report No. UCB/EECS-2019-3, 2019.
[45] A. Josey, E. Blake, G. Clare, et al. The Open Group Base Specifications Issue 7. https://pubs.opengroup.org/onlinepubs/9699919799/, 2018.
[46] O. Kirch. Why NFS sucks. In Linux Symposium, volume 2, pages 51–64, 2006.
[47] A. Klimovic, Y. Wang, P. Stuedi, A. Trivedi, J. Pfefferle, and C. Kozyrakis. Pocket: Elastic ephemeral storage for serverless analytics. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 427–444, 2018.
[48] H.-T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
[49] T. Lahiri, M.-A. Neimat, and S. Folkman. Oracle TimesTen: An in-memory database for enterprise applications. IEEE Data Eng. Bull., 36(2):6–13, 2013.
[50] P.-A. Larson, J. Goldstein, and J. Zhou. MTCache: Transparent mid-tier database caching in SQL Server. In Proceedings of the 20th International Conference on Data Engineering, pages 177–188. IEEE, 2004.
[51] K. McKusick and S. Quinlan. GFS: Evolution on fast-forward. Commun. ACM, 53(3):42–49, Mar. 2010.
[52] C. Min, W.-H. Kang, T. Kim, S.-W. Lee, and Y. I. Eom. Lightweight application-level crash consistency on transactional flash storage. In 2015 USENIX Annual Technical Conference (USENIX ATC 15), pages 221–234, 2015.
[53] M. N. Nelson, B. B. Welch, and J. K. Ousterhout. Caching in the Sprite network file system. ACM Transactions on Computer Systems (TOCS), 6(1):134–154, 1988.
[54] E. Oakes, L. Yang, D. Zhou, K. Houck, T. Harter, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. SOCK: Rapid task provisioning with serverless-optimized containers. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 57–70, 2018.
[55] M. A. Olson et al. The design and implementation of the Inversion file system. In USENIX Winter, pages 205–218, 1993.
[56] C. Plattner and G. Alonso. Ganymed: Scalable replication for transactional web applications. In Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware, pages 155–174. Springer-Verlag, 2004.
[57] D. E. Porter, O. S. Hofmann, C. J. Rossbach, A. Benn, and E. Witchel. Operating system transactions. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09), pages 161–176, New York, NY, USA, Oct. 2009. Association for Computing Machinery.
[58] K. W. Preslan, A. P. Barry, J. E. Brassow, G. M. Erickson, E. Nygaard, C. J. Sabol, S. R. Soltis, D. C. Teigland, and M. T. O'Keefe. A 64-bit, shared disk file system for Linux. In 16th IEEE Symposium on Mass Storage Systems, pages 22–41, Mar. 1999.
[59] Q. Pu, S. Venkataraman, and I. Stoica. Shuffling, fast and slow: Scalable analytics on serverless infrastructure. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 193–206, 2019.
[60] R. Rajwar and J. R. Goodman. Transactional lock-free execution of lock-based programs. ACM SIGOPS Operating Systems Review, 36(5):5–17, 2002.
[61] R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and B. Lyon. Design and implementation of the Sun network filesystem. In Proceedings of the Summer USENIX Conference, pages 119–130, 1985.
[62] S. Scargall. Introducing the Persistent Memory Development Kit, pages 63–72. Apress, Berkeley, CA, 2020.
[63] F. Schmuck and R. Haskin. GPFS: A shared-disk file system for large computing clusters. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), 2002.
[64] F. Schmuck and J. Wylie. Experience with transactions in QuickSilver. In ACM SIGOPS Operating Systems Review, volume 25, pages 239–253. ACM, 1991.
[65] P. Schwan et al. Lustre: Building a file system for 1000-node clusters. In Proceedings of the 2003 Linux Symposium, volume 2003, pages 380–386, 2003.
[66] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. Conflict-free replicated data types. In Symposium on Self-Stabilizing Systems, pages 386–400. Springer, 2011.
[67] K. Shvachko, H. Kuang, S. Radia, R. Chansler, et al. The Hadoop distributed file system. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–10, 2010.
[68] V. Sreekanti, C. Wu, S. Chhatrapati, J. E. Gonzalez, J. M. Hellerstein, and J. M. Faleiro. A fault-tolerance shim for serverless computing. In Proceedings of the Fifteenth European Conference on Computer Systems, pages 1–15, 2020.
[69] V. Sreekanti, C. Wu, X. C. Lin, J. Schleier-Smith, J. E. Gonzalez, J. M. Hellerstein, and A. Tumanov. Cloudburst: Stateful Functions-as-a-Service. VLDB, 13(11):2438–2452, 2020.
[70] M. Stonebraker and L. A. Rowe. The design of POSTGRES, volume 15. ACM, 1986.
[71] M. Stonebraker and A. Weisberg. The VoltDB main memory DBMS. IEEE Data Eng. Bull., 36(2):21–27, 2013.
[72] V. Tarasov, E. Zadok, and S. Shepler. Filebench: A flexible framework for file system benchmarking. ;login: The USENIX Magazine, 41(1):6–12, 2016.
[73] A. Thomson and D. J. Abadi. The case for determinism in database systems. Proceedings of the VLDB Endowment, 3(1-2):70–80, 2010.
[74] A. Verbitski, A. Gupta, D. Saha, M. Brahmadesam, K. Gupta, R. Mittal, S. Krishnamurthy, S. Maurice, T. Kharatishvili, and X. Bao. Amazon Aurora: Design considerations for high throughput cloud-native relational databases. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 1041–1052, 2017.
[75] R. Verma, A. A. Mendez, S. Park, S. S. Mannarswamy, T. P. Kelly, and C. B. Morrey, III. Failure-atomic updates of application data in a Linux file system. In 13th USENIX Conference on File and Storage Technologies (FAST 15), pages 203–211, 2015.
[76] T. A. Wagner. Serverless networking is the next step in the evolution of serverless. https://read.acloud.guru/https-medium-com-timawagner-serverless-networking-the-next-step-in-serverless-evolution-95bc8adaa904, 2019.
[77] L. Wang, M. Li, Y. Zhang, T. Ristenpart, and M. Swift. Peeking behind the curtains of serverless platforms. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 133–146, 2018.
[78] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 307–320. USENIX Association, 2006.
[79] C. Wu, J. Faleiro, Y. Lin, and J. Hellerstein. Anna: A KVS for any scale. IEEE Transactions on Knowledge and Data Engineering, 2019.
[80] C. Wu, V. Sreekanti, and J. M. Hellerstein. Autoscaling tiered cloud storage in Anna. Proceedings of the VLDB Endowment, 12, 2019.
[81] X. Yu, Y. Xia, A. Pavlo, D. Sanchez, L. Rudolph, and S. Devadas. Sundial: Harmonizing concurrency control and caching in a distributed OLTP database management system.