Analyzing IO Amplification in Linux File Systems
Jayashree Mohan, Rohan Kadekodi, Vijay Chidambaram
Department of Computer Science, University of Texas at Austin
Department of Computer Science, University of Wisconsin-Madison
Abstract
We present the first systematic analysis of read, write, and space amplification in Linux file systems. While many researchers are tackling write amplification in key-value stores, IO amplification in file systems has been largely unexplored. We analyze data and metadata operations on five widely-used Linux file systems: ext2, ext4, XFS, btrfs, and F2FS. We find that data operations result in significant write amplification (2–32×) and that metadata operations have a large IO cost. For example, a single rename requires 648 KB of write IO in btrfs. We also find that small random reads result in read amplification of 2–13×. Based on these observations, we present the CReWS conjecture about the relationship between IO amplification, consistency, and storage space utilization. We hope this paper spurs people to design future file systems with less IO amplification, especially for non-volatile memory technologies.

Introduction

File systems were developed to enable users to easily and efficiently store and retrieve data. Early file systems such as the Unix Fast File System [1] and ext2 [2] were simple file systems. To enable fast recovery from crashes, crash-consistency techniques such as journaling [3] and copy-on-write [4] were incorporated into file systems, resulting in file systems such as ext4 [5] and XFS [6]. Modern file systems such as btrfs [7] include features such as snapshots and checksums for data, making the file system even more complex.

While the new features and strong crash-consistency guarantees have enabled wider adoption of Linux file systems, they have resulted in the loss of a crucial aspect: efficiency. File systems now maintain a large number of data structures on storage, and both data and metadata paths are complex and involve updating several blocks on storage. In this paper, we ask: what is the IO cost of various Linux file-system data and metadata operations? What is the IO amplification of various operations on Linux file systems? While this question is receiving wide attention in the world of key-value stores [8–13] and databases [14], it has been largely ignored in file systems. File systems have traditionally optimized for latency and overall throughput [15–18], not for IO or space amplification.

We present the first systematic analysis of read, write, and space amplification in Linux file systems. Read amplification is the ratio of total read IO to user-requested data. For example, if the user wanted to read 4 KB, and the file system read 24 KB off storage to satisfy that request, the read amplification is 6×. Write amplification is defined similarly. Space amplification measures how efficiently the file system stores data: if the user writes 4 KB, and the file system consumes 40 KB on storage (including data and metadata), the space amplification is 10×.

We analyze five widely-used Linux file systems that occupy different points in the design space: ext2 (no crash-consistency guarantees), ext4 (metadata journaling), XFS (metadata journaling), F2FS (log-structured file system), and btrfs (copy-on-write file system). We analyze the write IO and read IO resulting from various metadata operations, and the IO amplification arising from data operations. We also analyze these measures for two macro-benchmarks: compiling the Linux kernel, and Filebench Varmail [19]. We break down write IO cost into IO that was performed synchronously (during fsync()) and IO that was performed during delayed background checkpointing.

We find several interesting results.
For data operations such as overwriting a file or appending to a file, there was significant write amplification (2–32×). Small random reads resulted in a read amplification of 2–8×, even with a warm cache. Metadata operations such as directory creation or file rename result in significant storage IO: for example, a single file rename required 12–648 KB to be written to storage. Even though ext4 and XFS both implement metadata journaling, we find XFS significantly more efficient for file updates. Similarly, though F2FS and btrfs are both based on the log-structured approach (copy-on-write is a dual of the log-structured approach), we find F2FS to be significantly more efficient across all workloads. In fact, in all our experiments, btrfs was an outlier, producing the highest read, write, and space amplification. While this may partly arise from the new features of btrfs (that other file systems do not provide), the copy-on-write nature of btrfs is also part of the reason.

We find that IO amplification arises due to three main factors: the block-based interface, the crash-consistency mechanisms of file systems, and the different data structures maintained on storage to support features such as snapshots. Based on these observations, we introduce the CReWS conjecture. The CReWS conjecture states that for a general-purpose file system on a shared storage device, it is impossible to provide strong crash-consistency guarantees while also minimizing read, write, and space amplification. We discuss different designs of file systems, and show that for a general-purpose file system (used by many applications), minimizing write amplification leads to space amplification. We hope the CReWS conjecture helps guide future file-system designers.

With the advent of non-volatile memory technologies such as Phase Change Memory [20] that have limited write cycles, file-system designers can no longer ignore IO amplification. Such technologies offer a byte-based interface, which can greatly help reduce IO amplification. Data structures can be updated byte-by-byte if required, and critical metadata operations can be redesigned to have a low IO footprint. We hope this paper indicates the current state of IO amplification in Linux file systems, and provides a useful guide for the designers of future file systems.
We now analyze five Linux file systems which represent a variety of file-system designs. We first present our methodology, and then describe the file systems we study and the IO costs of their data and metadata operations.

Methodology. We use blktrace [21], dstat [22], and iostat [23] to monitor the block IO trace of different file-system operations such as rename() on five different Linux file systems. These tools allow us to accurately identify the following three metrics.
Write Amplification. The ratio of total storage write IO to user data. For example, if the user wrote 4 KB, and that resulted in the file system writing 8 KB to storage, the write amplification is 2. For operations such as file renames, where there is no user data, we simply report the total write IO. Write IO and write amplification should both be minimized.
Read Amplification. Similar to write amplification, this is the ratio of total storage read IO to user-requested data. For example, if the user wants to read 4 KB, and the file system reads 12 KB off storage to serve the read request, the read amplification is 3. We report the total read IO for metadata operations such as file creation. Read amplification should also be minimized.
Space Amplification. The ratio of bytes consumed on storage to bytes stored by the user. For example, suppose the user wants to store 4 KB. If the file system has to consume 20 KB on storage to store that 4 KB of user data, the space amplification is 5. Space amplification is a measure of how efficiently the file system uses storage, and thus should be minimized. We calculate space amplification based on the unique disk locations written to storage during the workloads.

Note that if the user stores one byte of data, the write and space amplification is trivially 4096, since the file system performs IO in 4096-byte block-sized units. We assume that a careful application will perform reads and writes in multiples of the block size. We also use noatime when mounting the file systems we study. Thus, our results represent amplification that will be observed even for careful real-world applications.
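To make these definitions concrete, the short C sketch below computes the three ratios from raw byte counters. In our setting such counters would be derived from the blktrace/iostat output for the device under test; the values shown here are purely illustrative placeholders.

/* Sketch: computing amplification metrics from measured byte counts.
 * The counter values below are hypothetical; in practice they would be
 * extracted from the block IO trace of the device under test. */
#include <stdio.h>

int main(void) {
    double user_bytes_written    = 10.0 * 1024 * 1024;  /* 10 MB of user writes */
    double storage_bytes_written = 40.0 * 1024 * 1024;  /* total write IO seen on disk */
    double user_bytes_read       = 4096.0;              /* one 4 KB user read */
    double storage_bytes_read    = 24576.0;             /* 24 KB read off storage */
    double unique_bytes_on_disk  = 40.0 * 1024 * 1024;  /* unique disk locations written */

    printf("write amplification: %.2f\n", storage_bytes_written / user_bytes_written);
    printf("read amplification:  %.2f\n", storage_bytes_read / user_bytes_read);
    printf("space amplification: %.2f\n", unique_bytes_on_disk / user_bytes_written);
    return 0;
}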
We analyze five different Linux file systems. Each of these file systems is (or was in the recent past) widely used, and represents a different point in the file-system design spectrum.

ext2. The ext2 file system [24] is a simple file system based on the Unix Fast File System [1]. ext2 does not include machinery for providing crash consistency, instead opting to fix the file system with fsck after reboot. ext2 writes data in place, and stores file metadata in inodes. ext2 uses direct and indirect blocks to find data blocks.

ext4. ext4 [2] builds on the ext2 codebase, but uses journaling [3] to provide strong crash-consistency guarantees. All metadata is first written to the journal before being checkpointed (written in place) to the file system. ext4 uses extents to keep track of allocated blocks.
XFS. The XFS [6] file system also uses journaling to provide crash consistency. However, XFS implements journaling differently from ext4. XFS was designed for high scalability and parallelism. XFS manages allocated inodes through an inode B+ tree, while free-space information is managed by separate B+ trees. The inodes keep track of their own allocated extents.
F2FS. F2FS [25] is a log-structured file system designed specifically for solid-state drives. Similar to the original LFS [26], F2FS writes all updates to storage sequentially. The logs in F2FS are composed of multiple segments, with segment utilization monitored using the Segment Information Table (SIT). Additionally, to avoid the wandering-tree problem [27], F2FS assigns a node ID to metadata structures such as inodes and direct and indirect blocks. The mapping between node ID and the actual block address is maintained in the Node Address Table (NAT), which has to be consulted to read data off storage, resulting in some overhead. Though data is written sequentially to the logs, NAT and SIT updates are first journaled and then written out in place.

btrfs. btrfs [7] is a copy-on-write file system based on B+ trees. The entire file system is composed of different B+ trees (e.g., the file-system tree, the extent tree, the checksum tree, etc.), all emerging from a single tree called the tree of tree roots. All btrfs metadata is located in these trees. The file-system tree stores information about all the inodes, while the extent tree holds the metadata related to each allocated extent. btrfs uses copy-on-write logging, in which any modification to a B+ tree leaf or node is preceded by copying the entire leaf or node to the log tree.
We measure the read IO, write IO, and space consumed by different file-system operations.
First, we focus on data operations: file read, file overwrite, and file append. For such operations, it is easy to calculate write amplification, since the workload involves a fixed amount of user data. The results are presented in Table 1.
Measure                   ext2     ext4     xfs      f2fs     btrfs

File Overwrite
  Write Amplification     2.00     4.00     2.00     2.66     32.65
  Space Amplification     1.00     4.00     2.00     2.66     31.17

File Append
  Write Amplification     3.00     6.00     2.01     2.66     30.85
  Space Amplification     1.00     6.00     2.00     2.66     29.77

File Read (cold cache)
  Read Amplification      6.00     6.00     8.00     9.00     13.00

File Read (warm cache)
  Read Amplification      2.00     2.00     5.00     3.00     8.00

Table 1: Amplification for Data Operations. The table shows the read, write, and space amplification incurred by different file systems when reading and writing files.

File Overwrite. The workload randomly seeks to a 4 KB-aligned location in a 100 MB file, does a 4 KB write (overwriting file data), then calls fsync() to make the data durable. The workload does 10 MB of such writes. From Table 1, we observe that ext2 has the lowest write and space amplification, primarily because it has no extra machinery for crash consistency; the overwrites are simply performed in place. The 2× write amplification arises from writing both the data block and the inode (to reflect the modified time). XFS has a similarly low write amplification, but higher space amplification since the metadata is first written to the journal. Compared to XFS, ext4 has higher write and space amplification: this is because ext4 writes the superblock and other information into its journal with every transaction; in other words, XFS journaling is more efficient than ext4 journaling. Interestingly, F2FS has an efficient implementation of the copy-on-write technique, leading to low write and space amplification. The roll-forward recovery mechanism of F2FS allows F2FS to write only the direct node block and data on every fsync(), with other data checkpointed infrequently [25]. In contrast, btrfs has a complex implementation of the copy-on-write technique (mostly due to a push to provide more features such as snapshots and stronger data integrity) that leads to extremely high space and write amplification. When btrfs is mounted with the default mount options that enable copy-on-write and checksumming of both data and metadata, we see 32× write amplification, as shown in Table 1. However, if checksumming of user data is disabled, the write amplification drops to 28×, and when the copy-on-write feature is also disabled for user data (metadata is still copied on write), the write amplification for overwrites comes down to about 18.6×. An interesting take-away from this analysis is that even if you pre-allocate all your files on these file systems, writes will still lead to 2–30× write amplification.

File Append. Our next workload appends a 4 KB block to the end of a file and calls fsync(). The workload does 10 MB of such writes. The appended file is initially empty. Our analysis for the file-overwrite workload mostly holds for this workload as well; the main difference is that more metadata (for block allocation) has to be persisted, leading to more write and space amplification for the ext2 and ext4 file systems. In F2FS and XFS, the block-allocation information is not persisted at the time of fsync(), leading to behavior similar to file overwrites. Thus, on XFS and F2FS, pre-allocating files does not provide a benefit in terms of write amplification.

We should note that write amplification is high in our workloads because we do small writes followed by an fsync(). The fsync() call forces file-system activity, such as committing metadata transactions, which has a fixed cost regardless of the size of the write. As Figure 1 shows, as the size of the write increases, the write amplification drops close to one. Applications which issue small writes should take note of this effect: even if the underlying hardware does not benefit from big sequential writes (such as SSDs), the file system itself benefits from larger writes.

Figure 1: Write Amplification for Various Write Sizes. The figure shows the write amplification observed for writes of various sizes followed by an fsync() call.
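For concreteness, the overwrite microbenchmark described above can be approximated by the following sketch. This is our reconstruction from the workload description, not the authors' actual harness; the file name and the use of pwrite() are illustrative assumptions. The append workload is the same loop writing at the current end of an initially empty file.

/* Sketch of the file-overwrite workload: random 4 KB-aligned overwrites of a
 * 100 MB file, each followed by fsync(), for roughly 10 MB of user writes. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK  4096
#define FILESZ (100L * 1024 * 1024)   /* pre-created 100 MB file */
#define TOTAL  (10L * 1024 * 1024)    /* 10 MB of user writes in total */

int main(void) {
    char buf[BLOCK];
    memset(buf, 'a', BLOCK);

    int fd = open("testfile", O_WRONLY);   /* file created beforehand */
    if (fd < 0) { perror("open"); return 1; }

    for (long done = 0; done < TOTAL; done += BLOCK) {
        off_t off = (rand() % (FILESZ / BLOCK)) * (off_t)BLOCK;   /* 4 KB-aligned */
        if (pwrite(fd, buf, BLOCK, off) != BLOCK) { perror("pwrite"); return 1; }
        if (fsync(fd) != 0) { perror("fsync"); return 1; }        /* make it durable */
    }
    close(fd);
    return 0;
}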
File Reads. The workload seeks to a random 4 KB-aligned block in a 10 MB file and reads one block. In Table 1, we make a distinction between a cold-cache read and a warm-cache read. On a cold cache, the file read usually involves reading a lot of file-system metadata: for example, the directory, the file inode, the superblock, etc. On subsequent reads (warm cache), reads to these blocks will be served out of memory. The cold-cache read amplification is quite high for all the file systems. Even in the case of simple file systems such as ext2, reading a file requires reading the inode. The inode read triggers a read-ahead of the inode table, increasing the read amplification. Since the read path does not include crash-consistency machinery, ext2 and ext4 have the same read amplification. The high read amplification of XFS results from reading the metadata B+ tree and read-ahead for file data. F2FS read amplification arises from reading extra metadata structures such as the NAT and SIT tables [25]. In btrfs, a cold-cache file read involves reading the tree of tree roots, the file-system tree, and the checksum tree, leading to high read amplification. On a warm cache, the read amplification of all file systems is greatly reduced, since global data structures are likely to be cached in memory. Even in this scenario, there is 2–8× read amplification for Linux file systems.
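A minimal sketch of the random-read workload appears below, again reconstructed from the description rather than taken from the paper's harness. How the cold cache is obtained is our assumption: one common approach is to drop the page cache (as root, by writing 3 to /proc/sys/vm/drop_caches) before the cold-cache run, while the warm-cache numbers come from repeating the read without dropping caches.

/* Sketch of the random-read workload: read one random 4 KB-aligned block
 * from a 10 MB file. Drop the page cache beforehand for the cold-cache case. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK  4096
#define FILESZ (10L * 1024 * 1024)    /* 10 MB file */

int main(void) {
    char buf[BLOCK];
    int fd = open("testfile", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    off_t off = (rand() % (FILESZ / BLOCK)) * (off_t)BLOCK;   /* 4 KB-aligned offset */
    if (pread(fd, buf, BLOCK, off) != BLOCK) { perror("pread"); return 1; }

    close(fd);
    return 0;
}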
We now analyze the read and write IO (and the space consumed) by different file-system metadata operations. We present file create, directory create, and file rename. We have experimentally verified that the behavior of other metadata operations, such as file link, file deletion, and directory deletion, is similar to our presented results. Table 2 presents the results. Overall, we find that metadata operations are very expensive: even a simple file rename results in 12–648 KB being written to storage. On storage with limited write cycles, a metadata-intensive workload may wear out the storage quickly if any of these file systems is used.

In many file systems, there is a distinction between IO performed at the time of the fsync() call, and IO performed later in the background. The fsync() IO is performed in the critical path, and thus contributes to user-perceived latency. However, both kinds of IO ultimately contribute to write amplification. We show this breakdown for the write cost in Table 2.

Measure                   ext2     ext4     xfs      f2fs     btrfs

File Create
  Write Cost (KB)         24       52       52       16       116
    fsync                 4        28       4        4        68
    checkpoint            20       24       48       12       48
  Read Cost (KB)          24       24       32       36       40
  Space Cost (KB)         24       52       20       16       116

Directory Create
  Write Cost (KB)         28       64       80       20       132
    fsync                 4        36       4        8        68
    checkpoint            24       28       76       12       64
  Read Cost (KB)          20       20       60       36       60
  Space Cost (KB)         28       64       54       20       132

File Rename
  Write Cost (KB)         12       32       16       20       648

Table 2: IO Cost for Metadata Operations. The table shows the read, write, and space IO costs incurred by different file systems for different metadata operations. The write cost is broken down into IO at the time of fsync(), and checkpointing IO performed later.
File Create. The workload creates a new file in a pre-existing directory of depth three (e.g., a/b/c) and calls fsync() on the parent directory to ensure the creation is persisted. File creation requires allocating a new inode and updating a directory, and thus requires 16–116 KB of write IO and 24–40 KB of read IO in the various file systems. F2FS is the most efficient in terms of write IO (but requires a lot of read IO). Overall, ext2 is the most efficient at performing file creations. ext2, XFS, and F2FS all strive to perform the minimum amount of IO in the fsync() critical path. Due to metadata journaling, ext4 writes 28 KB in the critical path. btrfs performs the worst, requiring 116 KB of write IO (68 KB in the critical path) and 40 KB in checkpointing IO. The poor performance of btrfs results from having to update a number of data structures, including the file-system tree, the directory index, and backreferences, to create a file [7].
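The file-create workload can be sketched as follows; this is our reconstruction, and the directory path and file name are illustrative. The fsync() on the parent directory is the persistence step described above.

/* Sketch of the file-create workload: create a file in a directory of depth
 * three (a/b/c) and fsync() the parent directory so the creation is durable. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("a/b/c/newfile", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("create"); return 1; }
    close(fd);

    int dirfd = open("a/b/c", O_RDONLY);                       /* parent directory */
    if (dirfd < 0) { perror("open dir"); return 1; }
    if (fsync(dirfd) != 0) { perror("fsync dir"); return 1; }  /* persist the new entry */
    close(dirfd);
    return 0;
}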
Directory Create. The workload creates a new directory in an existing directory of depth four, and calls fsync() on the parent directory. Directory creation follows a similar trend to file creation. The main difference is the additional IO for creating the directory itself. As before, btrfs experiences the highest write IO cost and read IO cost for this workload. ext2 and F2FS are the most efficient.
File Rename. The workload renames a file within the same directory, and calls fsync() on the parent directory to ensure the rename is persisted. Renaming a file requires updating two directories. Performing the rename atomically requires machinery such as journaling or copy-on-write. ext2 is the most efficient, requiring only 32 KB of IO overall. Renaming a file is a surprisingly complex process in btrfs. Apart from linking and unlinking files, renames also change the backreferences of the files involved. btrfs also logs the inode of every file and directory (from the root to the parent directory) involved in the operation. The root directory is persisted twice, once for the unlink, and once for the link. As a result, btrfs is the least efficient, requiring 696 KB of IO to rename a single file. Even if many of these inodes are cached, btrfs renames are significantly less efficient than in other file systems.
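The rename workload follows the same pattern as the create workload; again, the paths below are illustrative, not taken from the authors' harness.

/* Sketch of the file-rename workload: rename a file within a directory and
 * fsync() the parent directory so the rename is durable. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (rename("dir/oldname", "dir/newname") != 0) { perror("rename"); return 1; }

    int dirfd = open("dir", O_RDONLY);                         /* parent directory */
    if (dirfd < 0) { perror("open dir"); return 1; }
    if (fsync(dirfd) != 0) { perror("fsync dir"); return 1; }  /* persist the rename */
    close(dirfd);
    return 0;
}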
Macro-benchmark: Kernel Compilation. To provide a more complete picture of the IO amplification of file systems, we also measure IO amplification for a macro-benchmark: uncompressing a Linux kernel tarball and compiling the kernel. The results are presented in Table 3. The file systems perform 6.09–6.41 GB of write IO and 0.23–0.27 GB of read IO. ext2 is the most efficient file system, achieving the lowest write and space cost. Among file systems providing crash-consistency guarantees, ext4 and XFS perform well, achieving lower write and space cost than the copy-on-write file systems F2FS and btrfs. btrfs performs the most write IO, and uses the most space on storage. The kernel compilation workload does not result in a lot of write amplification (or variation between file systems), because fsync() is not called often; thus each file system is free to group operations together to reduce IO and space cost. Even in this scenario, the higher write and space amplification of btrfs is observed.

Measure             ext2     ext4     xfs      f2fs     btrfs

Kernel Compilation
  Write Cost (GB)   6.09     6.19     6.21     6.38     6.41
  Read Cost (GB)    0.25     0.24     0.24     0.27     0.23
  Space Cost (GB)   5.94     6.03     5.96     6.20     6.25

Filebench Varmail
  Write Cost (GB)   1.52     1.63     1.71     1.82     2.10
  Read Cost (KB)    116      96       116      1028     0
  Space Cost (GB)   1.45     1.57     1.50     1.77     2.02

Table 3: IO Cost for Macro-benchmarks. The table shows the read, write, and space IO costs incurred by different file systems when compiling the Linux kernel 3.0 and when running the Varmail benchmark in the Filebench suite.

Macro-benchmark: Filebench Varmail. We ran the Varmail benchmark from the Filebench benchmark suite [19] with the following parameters: 16 threads, 100K files in total, and a mean file size of 16 KB. Varmail simulates a mail server, and performs small writes followed by fsync() on different files using multiple threads. In this fsync()-heavy workload, the effects of write, read, and space amplification are clear. ext2 still performs the least IO and uses the least storage space. btrfs performs 38% more write IO than ext2, and uses 39% more space on storage. F2FS performs better than btrfs, but has a high read cost (10× that of the other file systems).

Discussion. IO and space amplification arise in Linux file systems from the use of the block interface, from crash-consistency techniques, and from the need to maintain and update a large number of data structures on storage. A comparison of XFS and ext4 shows that even when the same crash-consistency technique (journaling) is used, the implementation leads to a significant difference in IO amplification. With byte-addressable non-volatile memory technologies arriving on the horizon, using such block-oriented file systems will be disastrous. We need to develop lean, efficient file systems where operations such as file renames result in a few bytes written to storage, not tens to hundreds of kilobytes.
The CReWS Conjecture
Inspired by the RUM conjecture [28] from the world of key-value stores, we propose a similar conjecture for file systems: the CReWS conjecture. (We spent some time trying to come up with something cool like RUM, but alas, this is the best we could do.) The CReWS conjecture states that it is impossible for a general-purpose file system to provide strong crash (C)onsistency guarantees while simultaneously achieving low (R)ead amplification, (W)rite amplification, and (S)pace amplification.

By a general-purpose file system we mean a file system used by multiple applications on a shared storage device. If the file system can be customized for a single application on a dedicated storage device, we believe it is possible to achieve the other four properties simultaneously.

For example, consider a file system designed specifically for an append-only log such as Corfu [29] (without the capability to delete blocks). The storage device is dedicated to the append-only log. In this scenario, the file system can drop all metadata and treat the device as a big sequential log; storage block 0 is block 0 of the append-only log, and so on. Since there is no metadata, the file system is implicitly consistent at all times, and there is low write, read, and space amplification. However, this only works if the storage device is completely dedicated to one application.

Note that we can extend our simple file system to a case where there are N applications. In this case, we would divide the storage into N units, and assign one unit to each application. For example, let's say we divide up a 100 GB disk among 10 applications. Even if an application only uses one byte, the rest of its 10 GB is not available to other applications; thus, this design leads to high space amplification.

In general, if multiple applications want to share a single storage device without space amplification, dynamic allocation is required. Dynamic allocation necessitates metadata to keep track of which resources are available; if file data can be dynamically located, metadata such as the inode is required to keep track of the data locations. The end result is a simple file system such as ext2 [24] or NoFS [15]. While such systems offer low read, write, and space amplification, they compromise on consistency: ext2 does not offer any guarantees on a crash, and a crash during a file rename on NoFS could result in the file disappearing.

File systems that offer strong consistency guarantees such as ext4 and btrfs incur significant write amplification and space amplification, as we have shown in previous sections. Thus, to the best of our knowledge, the CReWS conjecture is true.
Implications. The CReWS conjecture has useful implications for the design of storage systems. If we seek to reduce write amplification for a specific application such as a key-value store, it is essential to sacrifice one of the above aspects. For example, by specializing the file system to a single application, it is possible to minimize the three amplification measures. For applications seeking to minimize space amplification, the file-system design might sacrifice low read amplification or strong consistency guarantees. For non-volatile memory file systems [30, 31], given the limited write cycles of non-volatile memory [32], file systems should be designed to trade space amplification for write amplification; given the high density of non-volatile memory technologies [20, 33–36], this should be acceptable. Thus, given a goal, the CReWS conjecture focuses our attention on possible avenues to achieve it.
Conclusion

We analyze the read, write, and space amplification of five Linux file systems. We find that all examined file systems have high write amplification (2–32×) and read amplification (2–13×). File systems that use crash-consistency techniques such as journaling and copy-on-write also suffer from high space amplification (2–30×). Metadata operations such as file renames have a large IO cost, requiring 32–696 KB of IO for a single rename. Based on our results, we present the CReWS conjecture: that a general-purpose file system cannot simultaneously achieve low read, write, and space amplification while providing strong consistency guarantees. With the advent of byte-addressable non-volatile memory technologies, we need to develop leaner file systems without significant IO amplification: the CReWS conjecture will hopefully guide the design of such file systems.

References

[1] Marshall K McKusick, William N Joy, Samuel J Leffler, and Robert S Fabry. A fast file system for UNIX. ACM Transactions on Computer Systems (TOCS), 2(3):181–197, 1984.
[2] Avantika Mathur, Mingming Cao, Suparna Bhattacharya, Andreas Dilger, Alex Tomas, and Laurent Vivier. The new ext4 filesystem: current status and future plans. In Proceedings of the Linux Symposium, volume 2, pages 21–33. Citeseer, 2007.
[3] Robert Hagmann. Reimplementing the Cedar file system using logging and group commit. In SOSP, 1987.
[4] Dave Hitz, James Lau, and Michael Malcolm. File System Design for an NFS File Server Appliance. In Proceedings of the USENIX Winter Technical Conference (USENIX Winter '94), San Francisco, California, January 1994.
[5] Avantika Mathur, Mingming Cao, Suparna Bhattacharya, Andreas Dilger, Alex Tomas, and Laurent Vivier. The New Ext4 Filesystem: Current Status and Future Plans. In Ottawa Linux Symposium (OLS '07), Ottawa, Canada, July 2007.
[6] Adam Sweeney, Doug Doucette, Wei Hu, Curtis Anderson, Mike Nishimoto, and Geoff Peck. Scalability in the XFS file system. In USENIX Annual Technical Conference, volume 15, 1996.
[7] Ohad Rodeh, Josef Bacik, and Chris Mason. Btrfs: The Linux B-tree filesystem. ACM Transactions on Storage (TOS), 9(3):9, 2013.
[8] Michael A Bender, Martin Farach-Colton, Jeremy T Fineman, Yonatan R Fogel, Bradley C Kuszmaul, and Jelani Nelson. Cache-oblivious streaming B-trees. In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 81–92. ACM, 2007.
[9] Leonardo Marmol, Swaminathan Sundararaman, Nisha Talagala, and Raju Rangaswami. NVMKV: A scalable, lightweight, FTL-aware key-value store. In USENIX Annual Technical Conference (USENIX ATC '15), pages 207–219, 2015.
[10] Xingbo Wu, Yuehai Xu, Zili Shao, and Song Jiang. LSM-trie: An LSM-tree-based ultra-large key-value store for small data items. In USENIX Annual Technical Conference (USENIX ATC '15), pages 71–82, 2015.
[11] Russell Sears and Raghu Ramakrishnan. bLSM: A general purpose log structured merge tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 217–228. ACM, 2012.
[12] Pradeep J Shetty, Richard P Spillane, Ravikant R Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok. Building workload-independent storage with VT-trees. In 11th USENIX Conference on File and Storage Technologies (FAST '13), pages 17–30, 2013.
[13] Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C Arpaci-Dusseau, and Remzi H Arpaci-Dusseau. WiscKey: Separating keys from values in SSD-conscious storage. In 14th USENIX Conference on File and Storage Technologies (FAST '16), pages 133–148, 2016.
[14] Percona TokuDB.
[15] Vijay Chidambaram, Tushar Sharma, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Consistency Without Ordering. In Proceedings of the 10th USENIX Symposium on File and Storage Technologies (FAST '12), pages 101–116, San Jose, California, February 2012.
[16] Vijay Chidambaram, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Optimistic Crash Consistency. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13), Farmington, PA, November 2013.
[17] Thanumalayan Sankaranarayana Pillai, Ramnatthan Alagappan, Lanyue Lu, Vijay Chidambaram, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Application crash consistency and performance with CCFS. In 15th USENIX Conference on File and Storage Technologies (FAST '17), pages 181–196, Santa Clara, CA, 2017. USENIX Association.
[18] William Jannen, Jun Yuan, Yang Zhan, Amogh Akshintala, John Esmet, Yizheng Jiao, Ankur Mittal, Prashant Pandey, Phaneendra Reddy, Leif Walsh, et al. BetrFS: A right-optimized write-optimized file system. In FAST, pages 301–315, 2015.
[19] Andrew Wilson. The new and improved filebench, 2008.
[20] Simone Raoux, Geoffrey W. Burr, Matthew J. Breitwisch, Charles T. Rettner, Y. C. Chen, Robert M. Shelby, Martin Salinga, Daniel Krebs, S. H. Chen, H. L. Lung, and C. H. Lam. Phase-change random access memory: A scalable technology. IBM Journal of Research and Development, 52(4.5):465–479, 2008.
[21] Block I/O Layer Tracing. https://linux.die.net/man/8/blktrace, December 2016.
[22] Generating System Resource Statistics. https://linux.die.net/man/1/dstat, December 2016.
[23] Reporting I/O Statistics. https://linux.die.net/man/1/iostat, December 2016.
[24] Remy Card, Theodore Ts'o, and Stephen Tweedie. Design and Implementation of the Second Extended Filesystem. In First Dutch International Symposium on Linux, Amsterdam, Netherlands, December 1994.
[25] Changman Lee, Dongho Sim, Joo Young Hwang, and Sangyeun Cho. F2FS: A new file system for flash storage. In FAST, pages 273–286, 2015.
[26] Mendel Rosenblum and John K Ousterhout. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems (TOCS), 10(1):26–52, 1992.
[27] Artem B Bityutskiy. JFFS3 design issues, 2005.
[28] Manos Athanassoulis, Michael S Kester, Lukas M Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. Designing access methods: The RUM conjecture. In International Conference on Extending Database Technology, pages 461–466, 2016.
[29] Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, and John D Davis. CORFU: A shared log design for flash clusters. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI '12), pages 1–14, 2012.
[30] Jian Xu and Steven Swanson. NOVA: A log-structured file system for hybrid volatile/non-volatile main memories. In FAST, 2016.
[31] Subramanya R Dulloor, Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran, and Jeff Jackson. System software for persistent memory. In Proceedings of the Ninth European Conference on Computer Systems, page 15. ACM, 2014.
[32] Ping Zhou, Bo Zhao, Jun Yang, and Youtao Zhang. A durable and energy efficient main memory using phase change memory technology. In ACM SIGARCH Computer Architecture News, volume 37, pages 14–23. ACM, 2009.
[33] Chun Jason Xue, Youtao Zhang, Yiran Chen, Guangyu Sun, J Jianhua Yang, and Hai Li. Emerging non-volatile memories: Opportunities and challenges. In Proceedings of the Seventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pages 325–334, 2011.
[34] Yenpo Ho, Garng M Huang, and Peng Li. Nonvolatile Memristor Memory: Device Characteristics and Design Implications. In Proceedings of the 2009 International Conference on Computer-Aided Design, pages 485–490. ACM, 2009.
[35] Dmitri B. Strukov, Gregory S. Snider, Duncan R. Stewart, and R. Stanley Williams. The missing memristor found. Nature, 2008.
[36] Leon Chua. Resistance switching memories are memristors. Applied Physics A, 102(4):765–783, 2011.