Position: On Failure Diagnosis of the Storage Stack
Duo Zhang, Om Rameshwar Gatla, Runzhou Han, Mai Zheng
Iowa State University
Abstract
Diagnosing storage system failures is challenging even for professionals. One example is the “When Solid State Drives Are Not That Solid” incident that occurred at the Algolia data center, where Samsung SSDs were mistakenly blamed for failures caused by a Linux kernel bug. As system complexity keeps increasing, such obscure failures will likely occur more often.

As one step to address the challenge, we present our on-going effort called X-Ray. Different from traditional methods that focus on either the software or the hardware, X-Ray leverages virtualization to collect events across layers and correlates them to generate a correlation tree. Moreover, by applying simple rules, X-Ray can highlight critical nodes automatically. Preliminary results based on 5 failure cases show that X-Ray can effectively narrow down the search space for failures.
1 Introduction

The storage stack is witnessing a sea-change driven by the advances in non-volatile memory (NVM) technologies [58, 53, 60, 65, 46, 87, 77, 51, 84]. For example, flash-based solid state drives (SSDs) and persistent memories (PMs) are replacing hard disk drives (HDDs) as the durable device [93, 2, 73, 31, 12, 4]; NVMe [28] and CXL [7] are redefining the host-device interface; blk-mq [49] alleviates the single queue and lock contention bottleneck at the block I/O layer; the SCSI subsystem and the Ext4 file system, which have been tuned for HDDs for decades, are also being adapted for NVM (e.g., scsi-mq [32, 55, 91] and DAX [9]); in addition, various NVM-oriented new designs/optimizations have been proposed (e.g., F2FS [67], NOVA [94], Kevlar [59]), some of which require cohesive modifications throughout the storage stack (e.g., the TRIM support [34]).

The new systems generally offer higher performance. However, as a disruptive technology, the NVM-based components have to co-exist with the traditional storage ecosystem, which is notoriously complex and difficult to get right despite decades of efforts [95, 79, 71, 86]. Compared with the performance gain, the implication for system reliability is much less studied or understood.

One real example is the “When Solid-State Drives Are Not That Solid” incident that occurred in the Algolia data center [31], where a random subset of SSD-based servers crashed and corrupted files for unknown reasons. The developers “spent a big portion of two weeks just isolating machines and restoring data as quickly as possible”. After trying to diagnose almost all software in the stack (e.g., Ext4, mdadm [1]), and switching SSDs from different vendors, they finally (mistakenly) concluded that Samsung’s SSDs were to blame. Samsung’s SSDs were criticized and blacklisted, until one month later Samsung engineers found that it was a TRIM-related Linux kernel bug that caused the failure [30].

As another example, Zheng et al. studied the behavior of SSDs under power faults [100]. The testing framework bypassed the file system, but relied on the block I/O layer to apply workloads and check the behavior of devices. In other words, the SSDs were essentially evaluated together with the block I/O layer. Their initial experiments were performed on Linux kernel v2.6.32, and eight out of fifteen SSDs exhibited a symptom called “serialization errors” [100]. However, in their follow-up work where similar experiments were conducted on a newer kernel (v3.16.0) [101], the authors observed that the failure symptoms on some SSDs changed significantly (see Table 1, adapted from [101]). It was eventually confirmed that the different symptoms were caused by a sync-related Linux kernel bug [101].

OS (Kernel)            SSD-1   SSD-2   SSD-3
Debian 6.0 (2.6.32)    317     27      0
Ubuntu 14.04 (3.16)    88      1       0

Table 1: SSDs exhibit different symptoms when tested on different OSes. Each cell shows the average number of errors. Reported by [101].
One commonality of the two cases above is that people try to infer the behavior of storage devices indirectly through the operating system (OS) kernel, and they tend to believe that the kernel is correct. This is natural in practice because users typically have to access storage devices with the help of the kernel, and they usually do not have the luxury of inspecting the device behavior directly. Also, NVM devices are relatively young compared with the long history of the OS kernel, so they might seem less trustworthy. We call such common practice a top-down approach.

Nevertheless, both cases show that the OS kernel may play a role in causing system failures, while the device may be innocent. More strangely, in both cases, different devices seem to have different sensitivity to the kernel bug, and some devices may even “tolerate” the kernel bug. For example, no failure was observed on Intel SSDs in the Algolia case [31], and SSD-3 in Table 1 never exhibited any serialization errors in Zheng et al.’s experiments [100]. Since switching devices is one simple and common strategy to identify device issues when diagnosing system failures, the different sensitivity of devices to software bugs can easily drive the investigation in the wrong direction, wasting human effort and resulting in wrong conclusions, as manifested in the two cases above.

In fact, similar confusing and debatable failures are not uncommon today [29, 33, 27, 16]. With the trend of storage devices becoming more capable and more special features being exposed to the host-side software [78, 41, 36], the interaction between hardware and software is expected to become more complex. Consequently, analyzing storage system failures solely based on the existing top-down approach will likely become more problematic. In other words, new methodologies for diagnosing failures of the storage stack are much needed.

The rest of the paper is organized as follows: we first discuss the limitations of existing efforts (§2); we then present the design of X-Ray (§3) and the preliminary results (§4); finally, we discuss other related work (§5) and conclude with the discussion topics section (§6).

2 Existing Efforts

In this section, we discuss two groups of existing efforts that may alleviate the challenge of diagnosing storage stack failures to some extent. We defer the discussion of other related work (e.g., diagnosing performance issues and distributed systems) to §5.

Great efforts have been made to test the storage software in the stack [95, 74, 76, 99], with the goal of exposing bugs that could lead to failures. For example, EXPLODE [95] and B3 [76] apply fault injections to detect crash-consistency bugs in file systems. However, testing tools are generally not suitable for diagnosing system failures because they typically require a well-controlled environment (e.g., a highly customized kernel [95, 76]), which may be substantially different from the storage stack that needs to be diagnosed.

To some extent, failure diagnosis is the reverse process of fault injection testing. Due to its importance, many practical tools have been built, including the following:
Debuggers [14, 52, 18] are the de facto way to diagnose system failures. They usually support fine-grained manual inspection (e.g., setting breakpoints, checking memory bytes). However, significant human effort is needed to harness their power and diagnose the storage stack, and the manual effort required will keep increasing as the software becomes more complex. Also, these tools typically cannot collect the device information directly.
Software Tracers [26, 3, 10, 57] can collect various events from a target system to help understand the behavior. However, similar to debuggers, they focus on host-side events only, and usually do not have automation support for failure inspection.
Bus Analyzers [15, 23] are hardware equipment that can capture the communication data between a host system and a device, which is particularly useful for analyzing the device behavior. However, since they only report bus-level information, they cannot help much in understanding system-level behaviors.

Note that both debuggers and software tracers represent the traditional top-down diagnosis approach. On the other hand, bus analyzers have been used to diagnose some of the most obscure failures that involved host-device interactions [8, 80], but they are not as convenient as the software tools.
3 The X-Ray Approach

Our goal is to help practitioners narrow down the root causes of storage system failures quickly. To this end, we are exploring a framework called X-Ray, which is expected to have the following key features:

• Full stack: many critical operations (e.g., sync, TRIM) require cohesive support at both the device and host sides; inspecting only one side (and assuming the other side is correct) is fundamentally limited;

• Isolation: the device information should be collected without relying on the host-side software (which may be problematic itself);

• Usability: no special hardware or software modification is needed; manual inspection should be reduced as much as possible.
Figure 1: The X-Ray Approach. The target software stack is hosted in a virtual machine; DevAgent, HostAgent, and X-Explorer are the three main components; the basic mode visualizes a correlation tree for inspection; the advanced mode highlights critical nodes based on rules.
Figure 1 shows an overview of the X-Ray framework, which includes three major components: DevAgent, HostAgent, and X-Explorer.

First, we notice that the virtualization technology is mature enough to support unmodified OSes today [25, 19, 20]. Moreover, recent research efforts have enabled emulating sophisticated storage devices in a virtual machine (VM), including SATA SSDs (e.g., VSSIM [96]) and NVMe SSDs (e.g., FEMU [68]). Therefore, we can leverage virtualization to support cross-layer analysis with high fidelity and no hardware dependence. Specifically, we host the target storage software stack in a QEMU VM [47]. At the virtual device layer, the DevAgent (§3.1) records the commands received by the emulated device; at the host side, the HostAgent (§3.2) traces system calls and kernel functions; the X-Explorer (§3.3) correlates the events from both sides and visualizes them for diagnosis.

3.1 DevAgent

The device-level information is helpful because storage failures are often related to persistent states, and changing persistent states (in)correctly requires (in)correct device command sequences. The DevAgent records the information in a command log directly, without any dependency on the host-side kernel (which might be buggy itself), similar to a bus analyzer [15].
SCSI Device. The Linux kernel communicates with a SCSI device by sending Command Descriptor Blocks (CDBs) over the bus. QEMU maintains a struct SCSICommand for each SCSI command, which contains a 16-byte buffer (SCSICommand->buf) holding the CDB. Every SCSI command type is identified by the opcode at the beginning of the CDB, and the size of the CDB is determined by the opcode. For example, the CDB for the WRITE 10 command is represented by the first 10 bytes of the buffer. For simplicity, we always transfer 16 bytes from the buffer to the command log and use the opcode to identify the valid bytes. QEMU classifies SCSI commands into either Direct Memory Access (DMA) commands (e.g., READ 10) or Admin commands (e.g., VERIFY 10), and both are handled in the same way in the DevAgent since they share the same structure.
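To make the logging step concrete, the following minimal C sketch appends a fixed 16-byte CDB plus a timestamp to an append-only command log; the cdb_record layout and log_scsi_cdb() helper are hypothetical illustrations of the behavior described above, not actual QEMU or X-Ray code.

/* Minimal sketch of DevAgent-style CDB logging (hypothetical, not QEMU code). */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CDB_MAX_LEN 16           /* always copy 16 bytes; the opcode tells how many are valid */

struct cdb_record {
    struct timespec ts;          /* wall-clock timestamp, used later for correlation */
    uint8_t cdb[CDB_MAX_LEN];    /* raw Command Descriptor Block bytes */
};

/* Append one CDB to the command log. */
static void log_scsi_cdb(FILE *log, const uint8_t *cdb_buf)
{
    struct cdb_record rec;
    clock_gettime(CLOCK_REALTIME, &rec.ts);
    memcpy(rec.cdb, cdb_buf, CDB_MAX_LEN);
    fwrite(&rec, sizeof(rec), 1, log);
    fflush(log);                 /* keep the log current for post-mortem inspection */
}

int main(void)
{
    /* Example: a WRITE 10 command (opcode 0x2a); remaining bytes are placeholders. */
    uint8_t write10[CDB_MAX_LEN] = { 0x2a };
    FILE *log = fopen("cmd.log", "ab");
    if (!log)
        return 1;
    log_scsi_cdb(log, write10);
    fclose(log);
    return 0;
}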
NVMe Device. QEMU maintains a struct NvmeCmd for each NVMe command, and emulates the io_uring [22, 17] interface to transfer NVMe commands to an NVMe device. The interface defines two types of command queues: submission and completion. The submission queues are further classified into either I/O submission queues or Admin submission queues, which are processed via nvme_process_sq_io and nvme_process_sq_admin in QEMU, respectively. The DevAgent intercepts both queues and records both I/O commands and Admin commands, similar to SCSI.
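The NVMe side can be logged in the same spirit; the sketch below assumes a hypothetical record layout that tags each logged command with the submission queue it arrived on (Admin vs. I/O), mirroring the two interception points mentioned above.

/* Sketch of DevAgent-style NVMe command logging with queue tagging (hypothetical layout). */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

enum queue_kind { ADMIN_QUEUE, IO_QUEUE };

struct nvme_cmd_record {
    struct timespec ts;   /* timestamp for later correlation with host-side events */
    uint8_t opcode;       /* NVMe opcode (e.g., 0x01 = Write on the I/O queue) */
    uint8_t queue_kind;   /* which submission queue delivered the command */
};

static void log_nvme_cmd(FILE *log, uint8_t opcode, enum queue_kind kind)
{
    struct nvme_cmd_record rec = { .opcode = opcode, .queue_kind = (uint8_t)kind };
    clock_gettime(CLOCK_REALTIME, &rec.ts);
    fwrite(&rec, sizeof(rec), 1, log);
    fflush(log);
}

int main(void)
{
    FILE *log = fopen("nvme_cmd.log", "ab");
    if (!log)
        return 1;
    log_nvme_cmd(log, 0x01, IO_QUEUE);   /* one Write command from an I/O submission queue */
    fclose(log);
    return 0;
}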
3.2 HostAgent

The HostAgent aims to track host-side events to help understand the high-level semantics of system activities. As mentioned in §2, many tracers have been developed with different tradeoffs [21]. The current prototype of HostAgent is based on ftrace [13], which has native support in Linux based on kprobes [3]. We select ftrace [13] because of its convenient support for tracing caller-callee relationships. When CONFIG_FUNCTION_GRAPH_TRACER is defined, the ftrace_graph_call routine will store the pointer to the parent to a ring buffer at function return via the link register, which is ideal for X-Ray.

On the other hand, ftrace only records function execution time instead of the epoch time needed for synchronization with DevAgent events. To work around the limitation, we modify the front end of ftrace to record the epoch time at system calls, and calculate the epoch time of kernel functions based on their execution time since the corresponding system calls. Another issue we observe is that ftrace may miss executed kernel functions. We are working on improving the completeness.

Figure 2: A partial correlation tree. The tree includes one syscall (green), 704 kernel functions (white nodes), and 3 device commands (blue); the critical path (red) is selected by a simple rule: all ancestors of the command nodes.
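To illustrate the HostAgent timestamp reconstruction described above, the sketch below assumes (hypothetically) that each traced kernel function carries an execution-time offset relative to the entry of its originating system call, which the modified ftrace front end has stamped with epoch time.

/* Sketch of deriving epoch timestamps for kernel functions (hypothetical fields). */
#include <stdint.h>
#include <stdio.h>

struct traced_func {
    const char *name;        /* kernel function name from the trace */
    uint64_t offset_ns;      /* execution-time offset since the enclosing syscall entry */
};

/* The syscall entry is stamped with epoch time by the modified ftrace front end;
 * each kernel function then gets syscall_epoch + its offset. */
static uint64_t func_epoch_ns(uint64_t syscall_epoch_ns, const struct traced_func *f)
{
    return syscall_epoch_ns + f->offset_ns;
}

int main(void)
{
    uint64_t fsync_entry_epoch_ns = 1589000000000000000ULL;   /* example epoch stamp */
    struct traced_func f = { "blkdev_fsync", 1200 };          /* 1.2 us after syscall entry */
    printf("%s @ %llu ns\n", f.name,
           (unsigned long long)func_epoch_ns(fsync_entry_epoch_ns, &f));
    return 0;
}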
3.3 X-Explorer

The events collected by the DevAgent and the HostAgent are valuable for diagnosis. However, their quantity is usually too large for manual inspection. Inspired by the visualization layer of other diagnosis tools [48, 88, 39, 81], the X-Explorer visualizes the relationships among the events and highlights the critical ones.
TreeBuilder. The TreeBuilder generates a correlation tree to represent the relationships among events in the storage stack. The tree contains three types of nodes based on the events from the HostAgent and the DevAgent: (1) SYSCALL nodes represent the system calls invoked in the bug-triggering workload; (2) KERNEL nodes represent the internal kernel functions involved; (3) CMD nodes represent the commands observed at the device.

There are two types of edges in the tree: (1) the edges among SYSCALL and KERNEL nodes represent function invocation relations (i.e., parent and child); (2) the edges between CMD nodes and other nodes represent close relations in terms of time. In other words, the device-level events are correlated to the host-side events based on timestamps. While the idea is straightforward, we observe an out-of-order issue caused by virtualization: the HostAgent timestamp is collected within the VM, while the DevAgent timestamp is collected outside the VM; the device commands may appear to occur before the corresponding system calls based on the raw timestamps. To work around the issue, we set up an NTP server [24] at the DevAgent side and perform NTP synchronization at the HostAgent side. We find that such NTP-based synchronization may mitigate the timestamp gap to a great extent, as will be shown in §4. Another potential solution is to modify the dynamic binary translation (DBT) layer of QEMU to minimize the latency.
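As a concrete illustration of the timestamp-based correlation, the following sketch attaches a device command to the innermost host-side event whose execution interval contains the command's timestamp; the node layout and example values are hypothetical simplifications of the correlation tree described above.

/* Sketch of correlating a device command with host-side events by time (hypothetical). */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct host_node {
    const char *name;        /* syscall or kernel function name */
    uint64_t start_ns;       /* epoch time of function entry */
    uint64_t end_ns;         /* epoch time of function return */
};

/* Return the index of the innermost host event whose interval contains cmd_ts,
 * or -1 if none; innermost = smallest enclosing interval. */
static int correlate_cmd(const struct host_node *nodes, size_t n, uint64_t cmd_ts)
{
    int best = -1;
    uint64_t best_span = UINT64_MAX;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].start_ns <= cmd_ts && cmd_ts <= nodes[i].end_ns) {
            uint64_t span = nodes[i].end_ns - nodes[i].start_ns;
            if (span < best_span) {
                best_span = span;
                best = (int)i;
            }
        }
    }
    return best;
}

int main(void)
{
    struct host_node nodes[] = {
        { "fsync",        100, 900 },   /* syscall node */
        { "blkdev_fsync", 150, 700 },   /* kernel function under fsync */
        { "submit_bio",   200, 300 },   /* deeper kernel function (illustrative) */
    };
    uint64_t write_cmd_ts = 250;        /* WRITE command observed by the DevAgent */
    int idx = correlate_cmd(nodes, 3, write_cmd_ts);
    printf("CMD attached to: %s\n", idx >= 0 ? nodes[idx].name : "(none)");
    return 0;
}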
TreePruner. The correlation tree is typically large due to the complexity of the storage stack. Inspired by rule-based diagnosis tools [48], the TreePruner traverses the tree and highlights the critical paths and nodes (i.e., the paths and nodes of interest) automatically, based on a set of rules stored in the RuleDB, which can be either specified by the user or derived from a normal system execution.
User-specified rules. Users may specify expected relations among system events as rules. For example, the sync-family system calls (e.g., sync, fsync) should generate SYNC CACHE (SCSI) or FLUSH (NVMe) commands to the device, which is crucial for crash consistency; similarly, blkdev_fsync should be triggered when calling fsync on a raw block device. In addition, users may also specify simple rules to reduce the tree (e.g., all ancestor nodes of WRITE commands). Our current prototype hard-codes a few rules as tree traversal operations based on the failure cases we studied (§4).
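For instance, the sync-family rule above can be expressed as a simple check over the commands correlated with an fsync node, as sketched below; the record layout is hypothetical, while the opcodes (0x2a for WRITE, 0x35 for SYNC CACHE) follow the SCSI example used in §4.

/* Sketch of a user-specified rule: fsync must produce a SYNC CACHE command (hypothetical). */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define OP_WRITE_10      0x2a
#define OP_SYNC_CACHE_10 0x35

/* Check whether a SYNC CACHE command appears in the command sequence
 * correlated with one fsync syscall. */
static bool fsync_rule_holds(const uint8_t *opcodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (opcodes[i] == OP_SYNC_CACHE_10)
            return true;
    return false;
}

int main(void)
{
    /* Commands observed during one fsync in the failure case: three writes, no flush. */
    uint8_t observed[] = { OP_WRITE_10, OP_WRITE_10, OP_WRITE_10 };
    printf("fsync rule %s\n",
           fsync_rule_holds(observed, 3) ? "holds" : "violated: no SYNC CACHE issued");
    return 0;
}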
Normal system execution. Failures are often tricky to reproduce due to different environments (e.g., different kernel versions) [6]. In other words, failures may not always manifest even under the same bug-triggering workloads. Based on this observation, and inspired by delta debugging [97, 75], we may leverage a normal system execution as a reference when available.

When a normal system is available, we host the corresponding software stack in the X-Ray VM and build the correlation tree under the same bug-triggering workload. For clarity, we name the tree from the normal system execution the reference tree, which essentially captures the implicit rules among events in the normal execution. By comparing the trees, divergences that cause different symptoms can be identified quickly.
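A minimal way to exploit the reference tree is to diff the device-command sequences along the two critical paths and report the first divergence, as sketched below under a hypothetical flattened representation of the two runs.

/* Sketch of comparing a failure run against a reference run (hypothetical layout). */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Report the index of the first position where the two command sequences differ,
 * or -1 if they are identical. */
static long first_divergence(const uint8_t *a, size_t na, const uint8_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return na == nb ? -1 : (long)n;   /* one sequence is a strict prefix of the other */
}

int main(void)
{
    uint8_t failure[]   = { 0x2a, 0x2a, 0x2a };          /* WRITE, WRITE, WRITE */
    uint8_t reference[] = { 0x2a, 0x2a, 0x2a, 0x35 };    /* ... followed by SYNC CACHE */
    long d = first_divergence(failure, 3, reference, 4);
    if (d >= 0)
        printf("first divergence at command %ld\n", d);
    else
        printf("command sequences match\n");
    return 0;
}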
Figure 3: Comparison. (a) the critical path from Figure 2; (b) the critical path from a reference tree.

4 Preliminary Results

We have built a preliminary prototype of X-Ray and applied it to diagnose 5 failure cases based on real bugs from the literature [101, 70, 76, 63]. We discuss one case in detail and summarize the results at the end.
Case Study. Figure 2 shows a partial correlation tree for diagnosing a failure where synchronous writes appear to be committed out of order on a raw block device. The tree starts with a syscall node (green), which triggers 704 kernel functions (white nodes) and three device commands (blue nodes). The red lines show the critical path and nodes selected by one simple rule: “ancestors of device commands”. The fsync syscall only generates three WRITE (0x2a) commands without explicitly sending a SYNC CACHE (0x35) command to the device, which is incorrect based on POSIX. Further investigation confirms that the root cause lies in the blkdev_fsync node on the critical path.

When a normal system is available, X-Ray may help more. Figure 3 (b) shows the critical path on a reference tree. Apparently, the SYNC CACHE (0x35) command appears, and a new function blkdev_issue_flush is involved. By comparison, it is clear that the difference stems from the blkdev_fsync node.

ID   Original   Rule-1   Rule-2   Rule-3
2    34,083     697      328      22
3    24,355     1,254    1,210    15
4    273,653    10,230   9,953    40
5    284,618    5,621    5,549    50

Table 2: Result Summary.

Summary. Table 2 summarizes the results.
5 Related Work

Analyzing Storage Devices. Many researchers have studied the behaviors of storage devices in depth, including both HDDs [43, 44, 56, 64, 82] and SSDs [38, 54, 58, 60, 66, 69, 83, 89, 72]. For example, Maneas et al. [72] study the reliability of 1.4 million SSDs deployed in NetApp RAID systems. Generally, these studies provide valuable insights for reasoning about complex system failures involving devices, which is complementary to X-Ray.
Diagnosing Distributed Systems. Great efforts have been made on tracing and analyzing distributed systems [40, 45, 88, 62, 81, 42]. For example, Aguilera et al. [40] trace network messages and infer causal relationships and latencies to diagnose performance issues. Similar to X-Ray, these methods need to align traces. However, their algorithms typically make use of unique features of network events (e.g., RPC Send/Receive pairs, IDs in message headers), which are not available for X-Ray. On the other hand, some statistics-based methods [62] are potentially applicable when enough traces are collected.
Software Engineering. Many software engineering techniques have been proposed for diagnosing user-level programs (e.g., program slicing [37, 92, 98], delta debugging [97, 75], checkpoint/re-execution [85, 90]). In general, applying them directly to the storage stack remains challenging due to the complexity. On the other hand, some high-level ideas are likely applicable. For example, Sambasivan et al. [81] apply delta debugging to compare request flows to diagnose performance problems in Ursa Minor [35], similar to the reference tree part of X-Ray.
6 Discussion Topics Section
Emulating Storage Devices. As mentioned in §3, sophisticated SATA/NVMe SSDs have been emulated in QEMU VMs [96, 68]. Among others, such efforts are important for realizing VM-based full-stack tracing and diagnosis. However, we have observed some limitations of existing emulated devices, which may affect failure reproduction (and thus diagnosis) in the VM. For example, advanced features like the TRIM operation are not fully supported on VSSIM or FEMU yet, but the Algolia failure case [31] requires a TRIM-capable device to manifest. As a result, we are not able to reproduce the Algolia failure in the VM. Therefore, emulating storage devices precisely would be helpful for the X-Ray approach and/or failure analysis in general, in addition to the other well-known benefits [96, 68]. We would like to discuss how to improve the emulation accuracy under practical constraints (e.g., confidentiality).
Deriving Rules. The automation of X-Ray depends on the rules. The current prototype hard-codes a number of simple rules based on our preliminary study and domain knowledge, which is limited. We would like to explore other implicit rules in the storage stack with other domain experts. Also, we plan to collect correlation trees from normal system executions and apply machine learning algorithms to derive potential rules. We would like to discuss the feasibility.
Other Usages of X-Ray. We envision that some other analyses could be enabled by X-Ray. For example, with precise latency and causal relationships among events, we may identify the paths that are critical for I/O performance, similar to the request flow analysis in distributed systems [81]. Another possibility is to measure the write amplification at different layers across the stack. We would like to discuss the opportunities.
Other Challenges of Failure Diagnosis. There are other challenges that are not covered by X-Ray. For example, X-Ray assumes that there is a bug-triggering workload that can reliably lead to the failure. In practice, deriving bug-triggering workloads from user workloads (which may be huge or inconvenient to share) is often tricky [11, 5]. We would like to discuss such challenges.
Sharing Failure Logs. The cross-layer approach would be most effective for diagnosing obscure failures that involve both the OS kernel and the device [31, 101]. Based on our communications with storage practitioners, such failures are not uncommon. However, the details of such failures are usually unavailable to the public, which limits the use cases that could shape the design of reliability tools like X-Ray. The availability of detailed failure logs at scale is critical for moving similar research efforts forward. We would like to discuss how to improve log sharing given constraints (e.g., privacy).
Acknowledgments

We thank the anonymous reviewers for their insightful comments and suggestions. This work was supported in part by the National Science Foundation (NSF) under grant CNS-1566554/1855565. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NSF.
References
FAST’05: Proceedings of the 4th conference onUSENIX Conference on File and Storage Tech-nologies , 2005.[36] Ahmed Abulila, Vikram Sharma Mailthody, ZaidQureshi, Jian Huang, Nam Sung Kim, JinjunXiong, and Wen mei Hwu. Flatflash: Exploit-ing the byte-accessibility of ssds within a unifiedmemory-storage hierarchy. In
ASPLOS ’19: Pro-ceedings of the Twenty-Fourth International Con-ference on Architectural Support for Program-ming Languages and Operating Systems , 2019.[37] Hiralal Agrawal, Richard A. DeMillo, and Eu-gene H. Spafford. An execution-backtracking ap-proach to debugging. In
IEEE Software , 1991.[38] Nitin Agrawal, Vijayan Prabhakaran, Ted Wob-ber, John D. Davis, Mark Manasse, and Rina Pan-igrahy. Design Tradeoffs for SSD Performance.In
ATC’08: USENIX 2008 Annual Technical Con-ference , 2008.[39] Marcos K. Aguilera, Jeffrey C Mogul, Janet LynnWiener, Patrick Alexander Reynolds, and AthichaMuthitacharoen. Performance debugging for dis-tributed systems of black boxes. In
ACM SIGOPSOperating Systems Review , 2003.[40] Marcos K. Aguilera, Jeffrey C Mogul, Janet LynnWiener, Patrick Alexander Reynolds, and AthichaMuthitacharoen. Performance debugging for dis-tributed systems of black boxes. In
ACM SIGOPSOperating Systems Review , 2003.[41] Duck-Ho Bae, Insoon Jo, Youra adel Choi, Jooyoung Hwang, Sangyeun Cho, Dong gi Lee, andJaeheon Jeong. 2b-ssd: the case for dual, byte-
and block-addressable solid-state drives. In
ISCA’18: Proceedings of the 45th Annual InternationalSymposium on Computer Architecture , 2018.[42] Paramvir Bahl, Ranveer Chandra, Albert Green-berg, Srikanth Kandula, David Aaron Maltz, andMing Zhang. Towards highly reliable enterprisenetwork services via inference of multi-level de-pendencies. In
ACM SIGCOMM Computer Com-munication Review , 2007.[43] Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Garth R.Goodson, and Bianca Schroeder. An analysis ofdata corruption in the storage stack.
Trans. Stor-age , 4(3):8:1–8:28, November 2008.[44] Lakshmi N. Bairavasundaram, Garth R. Goodson,Shankar Pasupathy, and Jiri Schindler. An analy-sis of latent sector errors in disk drives. In
Pro-ceedings of the 2007 ACM SIGMETRICS Interna-tional Conference on Measurement and Modelingof Computer Systems , SIGMETRICS ’07, pages289–300, New York, NY, USA, 2007. ACM.[45] Paul Barham, Austin Donnelly, Rebecca Isaacs,and Richard Mortier. Using magpie for requestextraction and workload modelling. In
OSDI’04:Proceedings of the 6th conference on Symposiumon Operating Systems Design & Implementation ,2004.[46] Hanmant P Belgal, Nick Righos, Ivan Kalastirsky,Jeff J Peterson, Robert Shiner, and Neal Mielke.A new reliability model for post-cycling chargeretention of flash memories. In
Proceedings ofthe 40th Annual Reliability Physics Symposium ,pages 7–20. IEEE, 2002.[47] Fabrice Bellard. Qemu, a fast and portable dy-namic translator. In
USENIX Annual TechnicalConference, FREENIX Track , volume 41, page 46,2005.[48] Sapan Bhatia, Abhishek Kumar, Marc E. Fiuczyn-ski, and Larry Peterson. Lightweight, high-resolution monitoring for troubleshooting produc-tion systems. In
OSDI’08: Proceedings of the 8thUSENIX conference on Operating systems designand implementation , 2008.[49] Matias Bjorling, Jens Axboe, David Nellans, andPhilippe Bonnet. Linux block io: Introducingmulti-queue ssd access on multi-core systems. In
SYSTOR ’13: Proceedings of the 6th InternationalSystems and Storage Conference . ACM, 2013. [50] James Bornholt, Antoine Kaufmann, JialinLi, Arvind Krishnamurthy, Emina Torlak, andXi Wang. Specifying and checking file systemcrash-consistency models. In
ASPLOS ’16: Pro-ceedings of the Twenty-First International Confer-ence on Architectural Support for ProgrammingLanguages and Operating Systems , 2016.[51] Adam Brand, Ken Wu, Sam Pan, and David Chin.Novel read disturb failure mechanism induced byFLASH cycling. In
Proceedings of the 31st An-nual Reliability Physics Symposium , pages 127–132. IEEE, 1993.[52] Peter A. Buhr, Martin Karsten, and Jun Shih. Kdb:a multi-threaded debugger for multi-threaded ap-plications. In
SPDT ’96: Proceedings of the SIG-METRICS symposium on Parallel and distributedtools , 1996.[53] Yu Cai, Erich F. Haratsch, Onur Mutlu, and KenMai. Error Patterns in MLC NAND Flash Mem-ory: Measurement, Characterization, and Analy-sis. In
Proceedings of the Conference on Design,Automation and Test in Europe , DATE ’12, pages521–526, San Jose, CA, USA, 2012. EDA Con-sortium.[54] Yu Cai, Gulay Yalcin, Onur Mutlu, Erich FHaratsch, Osman Unsal, Adrian Cristal, and KenMai. Neighbor-cell assisted error correction forMLC NAND flash memories. In
ACM SIG-METRICS Performance Evaluation Review , vol-ume 42, pages 491–504. ACM, 2014.[55] Blake Caldwell. Improving block-level efficiencywith scsi-mq. arXiv preprint arXiv:1504.07481 ,2015.[56] Peter M. Chen, Edward K. Lee, Garth A. Gib-son, Randy H. Katz, and David A. Patterson.RAID: high-performance, reliable secondary stor-age.
ACM Comput. Surv. , 26(2):145–185, June1994.[57] Ulfar Erlingsson, Marcus Peinado, Simon Peter,and Mihai Budiu. Fay: Extensible distributed trac-ing from kernels to clusters. In
ACM Transactionson Computer Systems (TOCS) , 2012.[58] Ryan Gabrys, Eitan Yaakobi, Laura M. Grupp,Steven Swanson, and Lara Dolecek. Tacklingintracell variability in TLC flash through tensorproduct codes. In
ISIT’12 , pages 1000–1004,2012.
[59] Vaibhav Gogte, William Wang, Stephan Diestelhorst, Aasheesh Kolli, Peter M. Chen, Satish Narayanasamy, and Thomas F. Wenisch. Software wear management for persistent memories. In USENIX Conference on File and Storage Technologies (FAST), 2019.
[60] Laura M. Grupp, Adrian M. Caulfield, Joel Coburn, Steven Swanson, Eitan Yaakobi, Paul H. Siegel, and Jack K. Wolf. Characterizing flash memory: anomalies, observations, and applications. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 24–33, New York, NY, USA, 2009. ACM.
[61] Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. SQCK: A declarative file system checker. In
OSDI’08: Proceedings of the 8th USENIX con-ference on Operating systems design and imple-mentation , 2008.[62] Srikanth Kandula, Ratul Mahajan, Patrick DVerkaik, Sharad Agarwal, Jitendra DattatrayaPadhye, and Paramvir Bahl. Detailed diagnosisin enterprise networks. In
ACM SIGCOMM Com-puter Communication Review , 2009.[63] Seulbae Kim, Meng Xu, Sanidhya Kashyap,Jungyeon Yoon, Wen Xu, and Taesoo Kim. Find-ing semantic bugs in file systems with an exten-sible fuzzing framework. In
SOSP ’19: Proceed-ings of the 27th ACM Symposium on OperatingSystems Principles , pages 147–161. ACM, 2019.[64] Andrew Krioukov, Lakshmi N Bairavasundaram,Garth R Goodson, Kiran Srinivasan, Randy The-len, Andrea C Arpaci-Dusseau, and Remzi HArpaci-Dusseau. Parity lost and parity regained.In
FAST , volume 8, pages 1–15, 2008.[65] H Kurata, K Otsuga, A Kotabe, S Kajiyama, T Os-abe, Y Sasago, S Narumi, K Tokami, S Kamohara,and O Tsuchiya. The impact of random telegraphsignals on the scaling of multilevel flash memo-ries. In
VLSI Circuits, 2006. Digest of Techni-cal Papers. 2006 Symposium on , pages 112–113.IEEE, 2006.[66] H Kurata, K Otsuga, A Kotabe, S Kajiyama, T Os-abe, Y Sasago, S Narumi, K Tokami, S Kamohara,and O Tsuchiya. The impact of random telegraphsignals on the scaling of multilevel flash memo-ries. In
VLSI Circuits, 2006. Digest of Techni-cal Papers. 2006 Symposium on , pages 112–113.IEEE, 2006. [67] Changman Lee, Dongho Sim, Joo-Young Hwang,and Sangyeun Cho. F2fs: A new file systemfor flash storage. In
Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST '15), pages 273–286, 2015.
[68] Huaicheng Li, Mingzhe Hao, Michael Hao Tong, Swaminathan Sundararaman, Matias Bjørling, and Haryadi S. Gunawi. The CASE of FEMU: Cheap, accurate, scalable and extensible flash emulator. In USENIX Conference on File and Storage Technologies (FAST), pages 83–90, 2018.
[69] Jiangpeng Li, Kai Zhao, Xuebin Zhang, Jun Ma, Ming Zhao, and Tong Zhang. How much can data compressibility help to improve NAND flash memory lifetime? In FAST, pages 227–240, 2015.
[70] Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu. A study of Linux file system evolution. In
Proceedings of the 11th USENIX Conference on File and Storage Technologies, FAST '13, pages 31–44, 2013.
[71] Lanyue Lu, Yupu Zhang, Thanh Do, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Physical disentanglement in a container-based file system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 81–96, Broomfield, CO, 2014. USENIX Association.
[72] Stathis Maneas, Kaveh Mahdaviani, Tim Emami, and Bianca Schroeder. A study of SSD reliability in large scale enterprise storage deployments. In USENIX Conference on File and Storage Technologies (FAST), 2020.
[73] Justin Meza, Qiang Wu, Sanjeev Kumar, and Onur Mutlu. A large-scale study of flash memory failures in the field. In ACM SIGMETRICS Performance Evaluation Review. ACM, 2015.
[74] Changwoo Min, Sanidhya Kashyap, Byoungyoung Lee, Chengyu Song, and Taesoo Kim. Cross-checking semantic correctness: The case of finding file system bugs. In
Proceedings of the25th Symposium on Operating Systems Principles ,pages 361–377. ACM, 2015.[75] Ghassan Misherghi and Zhendong Su. Hdd: hi-erarchical delta debugging. In
Proceedings of the28th international conference on Software engi-neering , pages 142–151. ACM, 2006.
[76] Jayashree Mohan, Ashlie Martinez, Soujanya Ponnapalli, Pandian Raju, and Vijay Chidambaram. Finding crash-consistency bugs with bounded black-box crash testing. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 33–50, 2018.
[77] T. Ong, A. Frazio, N. Mielke, S. Pan, N. Righos, G. Atwood, and S. Lai. Erratic Erase In ETOX/sup TM/ Flash Memory Array. In Symposium on VLSI Technology, VLSI '93, 1993.
[78] Xiangyong Ouyang, David Nellans, Robert Wipfel, David Flynn, and Dhabaleswar K. Panda. Beyond block I/O: Rethinking traditional storage primitives. In , 2011.
[79] Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. IRON File Systems. In
Proceedings ofthe 20th ACM Symposium on Operating SystemsPrinciples (SOSP’05) , pages 206–220, Brighton,United Kingdom, October 2005.[80] Alma Riska and Erik Riedel. Evaluation ofdisk-level workloads at different time-scales. In , 2009.[81] Raja R. Sambasivan, Alice X. Zheng, Michael DeRosa, and Elie Krevat. Diagnosing performancechanges by comparing request flows. In
NSDI’11: 8th USENIX Symposium on Networked Sys-tems Design and Implementation , 2011.[82] Bianca Schroeder and Garth A. Gibson. Disk fail-ures in the real world: What does an MTTF of1,000,000 hours mean to you? In
Proceedings ofthe 5th USENIX Conference on File and StorageTechnologies (FAST’07) , 2007.[83] Bianca Schroeder, Raghav Lagisetty, and ArifMerchant. Flash reliability in production: The ex-pected and the unexpected. In , pages 67–80, Santa Clara, CA, February2016. USENIX Association.[84] Jihye Seo, Wook-Hee Kim, Woongki Baek,Beomseok Nam, and Sam H Noh. Failure-atomicslotted paging for persistent memory.
ACM SIG-PLAN Notices , 52(4):91–104, 2017. [85] Sudarshan M. Srinivasan, Srikanth Kandula,Christopher R. Andrews, , and Yuanyuan Zhou.Flashback: A lightweight extension for rollbackand deterministic replay for software debugging.In
USENIX 2004 Annual Technical Conference ,2004.[86] Richard Stallman, Roland Pesch, Stan Shebs,et al. Debugging with gdb.
Free Software Foun-dation , 51:02110–1301, 2002.[87] Kang-Deog Suh, Byung-Hoon Suh, Young-HoLim, Jin-Ki Kim, Young-Joon Choi, Yong-NamKoh, Sung-Soo Lee, Suk-Chon Kwon, Byung-Soon Choi, Jin-Sun Yum, Jung-Hyuk Choi, Jang-Rae Kim, and Hyung-Kyu Lim. A 3.3V 32MbNAND flash memory with incremental step pulseprogramming scheme. In
IEEE Journal of Solid-State Circuits , JSSC’95, 1995.[88] Jiaqi Tan, Xinghao Pan, Soila P Kavulya, RajeevGandhi, and Priya Narasimhan. Mochi: visuallog-analysis based tools for debugging hadoop. In
HotCloud’09: Proceedings of the 2009 confer-ence on Hot topics in cloud computing , 2009.[89] Huang-Wei Tseng, Laura M. Grupp, and StevenSwanson. Understanding the impact of power losson flash memory. In
Proceedings of the 48th De-sign Automation Conference (DAC’11) , 2011.[90] Joseph Tucek, Shan Lu, Chengdu Huang, SpirosXanthos, and Yuanyuan Zhou. Triage: diagnosingproduction run failures at the user’s site. In
ACMSIGOPS Operating Systems Review , 2007.[91] Bart Van Assche. Increasing scsi lld driver per-formance by using the scsi multiqueue approach,2015.[92] Mark Weiser. Programmers use slices when de-bugging. In
Communications of the ACM, 1982.
[93] Brent Welch and Geoffrey Noer. Optimizing a hybrid SSD/HDD HPC storage system based on file size distributions. In , 2013.
[94] Jian Xu and Steven Swanson. NOVA: A log-structured file system for hybrid volatile/non-volatile main memories. In USENIX Conference on File and Storage Technologies (FAST), 2016.
[95] Junfeng Yang, Can Sar, and Dawson Engler. EXPLODE: a lightweight, general system for finding serious storage system errors. In Proceedings of
the 7th symposium on Operating systems design and implementation, pages 131–146. USENIX Association, 2006.
[96] Jinsoo Yoo, Youjip Won, Joongwoo Hwang, Sooyong Kang, Jongmoo Choi, Sungroh Yoon, and Jaehyuk Cha. VSSIM: Virtual machine based SSD simulator. In , pages 1–14. IEEE, 2013.
[97] Andreas Zeller. Isolating cause-effect chains from computer programs. In
ICSE ’03:Proceedings of the 25th International Conferenceon Software Engineering , 2003.[99] Mai Zheng, Joseph Tucek, Dachuan Huang, FengQin, Mark Lillibridge, Elizabeth S. Yang, Bill WZhao, and Shashank Singh. Torturing Databasesfor Fun and Profit. In , pages 449–464, Broomfield, CO,2014.[100] Mai Zheng, Joseph Tucek, Feng Qin, and MarkLillibridge. Understanding the robustness of SSDsunder power fault. In
Proceedings of the 11thUSENIX Conference on File and Storage Tech-nologies (FAST’13) , 2013.[101] Mai Zheng, Joseph Tucek, Feng Qin, Mark Lillib-ridge, Bill W Zhao, and Elizabeth S. Yang. Re-liability Analysis of SSDs under Power Fault. In
ACM Transactions on Computer Systems (TOCS), 2017.