Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Remzi H. Arpaci-Dusseau is active.

Publication


Featured research published by Remzi H. Arpaci-Dusseau.


Symposium on Operating Systems Principles | 2005

IRON file systems

Vijayan Prabhakaran; Lakshmi N. Bairavasundaram; Nitin Agrawal; Haryadi S. Gunawi; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework to investigate how commodity file systems react to a range of more realistic disk failures. We classify their failure policies in a new taxonomy that measures their Internal RObustNess (IRON), which includes both failure detection and recovery techniques. We show that commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. Finally, we design, implement, and evaluate a prototype IRON file system, Linux ixt3, showing that techniques such as in-disk checksumming, replication, and parity greatly enhance file system robustness while incurring minimal time and space overheads.
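
As a rough sketch of the in-disk checksumming technique the paper evaluates (this is not the ixt3 implementation; the block size, the CRC32 choice, and the in-memory store are assumptions for illustration), the following Python flags silent corruption by verifying a per-block checksum on every read:

import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only

class ChecksummedStore:
    """Toy block store that keeps a CRC32 per block to detect corruption."""
    def __init__(self):
        self.blocks = {}     # block number -> bytes
        self.checksums = {}  # block number -> CRC32 of those bytes

    def write_block(self, bno, data):
        assert len(data) == BLOCK_SIZE
        self.blocks[bno] = data
        self.checksums[bno] = zlib.crc32(data)

    def read_block(self, bno):
        data = self.blocks[bno]
        if zlib.crc32(data) != self.checksums[bno]:
            # An IRON-style policy could retry, read a replica, or rebuild
            # from parity; this sketch only detects and reports.
            raise IOError(f"checksum mismatch on block {bno}")
        return data

store = ChecksummedStore()
store.write_block(7, b"x" * BLOCK_SIZE)
store.blocks[7] = b"y" * BLOCK_SIZE      # simulate silent block corruption
try:
    store.read_block(7)
except IOError as e:
    print("detected:", e)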


Workshop on I/O in Parallel and Distributed Systems | 1999

Cluster I/O with River: making the fast case common

Remzi H. Arpaci-Dusseau; Eric C. Anderson; Noah Treuhaft; David E. Culler; Joseph M. Hellerstein; David A. Patterson; Katherine A. Yelick

We introduce River, a data-flow programming environment and I/O substrate for clusters of computers. River is designed to provide maximum performance in the common case — even in the face of nonuniformities in hardware, software, and workload. River is based on two simple design features: a high-performance distributed queue, and a storage redundancy mechanism called graduated declustering. We have implemented a number of data-intensive applications on River, which validate our design with near-ideal performance in a variety of non-uniform performance scenarios.
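
A minimal single-machine sketch of the distributed-queue idea, not River itself: consumers pull records from a shared queue at their own pace, so a slow consumer automatically receives less work. The thread counts and per-item delays below are invented for illustration.

import queue, threading, time

q = queue.Queue()
processed = {"fast": 0, "slow": 0}

def consumer(name, per_item_delay):
    # Each consumer pulls from the shared queue at its own rate, so records
    # naturally flow toward faster consumers (the balancing idea behind a
    # distributed queue).
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(per_item_delay)
        processed[name] += 1

threads = [
    threading.Thread(target=consumer, args=("fast", 0.001)),
    threading.Thread(target=consumer, args=("slow", 0.01)),
]
for t in threads:
    t.start()
for i in range(200):          # producer: 200 records
    q.put(i)
for _ in threads:             # one sentinel per consumer
    q.put(None)
for t in threads:
    t.join()
print(processed)              # the fast consumer ends up with most records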


Virtual Execution Environments | 2008

VMM-based hidden process detection and identification using Lycosid

Stephen Jones; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

Use of stealth rootkit techniques to hide long-lived malicious processes is a current and alarming security issue. In this paper, we describe, implement, and evaluate a novel VMM-based hidden process detection and identification service called Lycosid that is based on the cross-view validation principle. Like previous VMM-based security services, Lycosid benefits from its protected location. In contrast to previous VMM-based hidden process detectors, Lycosid obtains guest process information implicitly. Using implicit information reduces its susceptibility to guest evasion attacks and decouples it from specific guest operating system versions and patch levels. The implicit information Lycosid depends on, however, can be noisy and unreliable. Statistical inference techniques such as hypothesis testing and linear regression allow Lycosid to trade time for accuracy. Despite low-quality inputs, Lycosid provides a robust, highly accurate service usable even in security environments where the consequences of wrong decisions can be high.
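
A toy sketch of the cross-view validation principle, not Lycosid's VMM code: repeatedly compare a noisy, implicitly inferred process count against the count the guest reports, and flag hiding only when the difference clearly exceeds the noise. The noise model, sample count, and three-sigma threshold are assumptions.

import random, statistics

def vmm_inferred_count(true_count):
    # Stand-in for implicit inference from outside the guest; noisy by assumption.
    return true_count + random.choice([-1, 0, 0, 1])

def guest_reported_count(true_count, hidden):
    return true_count - hidden   # a rootkit hides `hidden` processes from the guest view

def detect_hidden(true_count=80, hidden=0, samples=200):
    diffs = [vmm_inferred_count(true_count) - guest_reported_count(true_count, hidden)
             for _ in range(samples)]
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / (len(diffs) ** 0.5)
    # Crude hypothesis test: is the mean difference well above the sampling noise?
    return mean - 3 * sem > 0, round(mean, 2)

print(detect_hidden(hidden=2))   # (True, ~2.0): evidence of hidden processes
print(detect_hidden(hidden=0))   # (False, ~0.0): no evidence of hiding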


Architectural Support for Programming Languages and Operating Systems | 2006

Geiger: monitoring the buffer cache in a virtual machine environment

Stephen Jones; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

Virtualization is increasingly being used to address server management and administration issues like flexible resource allocation, service isolation and workload migration. In a virtualized environment, the virtual machine monitor (VMM) is the primary resource manager and is an attractive target for implementing system features like scheduling, caching, and monitoring. However, the lack of runtime information within the VMM about guest operating systems, sometimes called the semantic gap, is a significant obstacle to efficiently implementing some kinds of services. In this paper we explore techniques that can be used by a VMM to passively infer useful information about a guest operating system's unified buffer cache and virtual memory system. We have created a prototype implementation of these techniques inside the Xen VMM called Geiger and show that it can accurately infer when pages are inserted into and evicted from a system's buffer cache. We explore several nuances involved in passively implementing eviction detection that have not previously been addressed, such as the importance of tracking disk block liveness, the effect of file system journaling, and the importance of accounting for the unified caches found in modern operating systems. Using case studies we show that the information provided by Geiger enables a VMM to implement useful VMM-level services. We implement a novel working set size estimator which allows the VMM to make more informed memory allocation decisions. We also show that a VMM can be used to drastically improve the hit rate in remote storage caches by using eviction-based cache placement without modifying the application or operating system storage interface. Both case studies hint at a future where inference techniques enable a broad new class of VMM-level functionality.
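
A heavily simplified sketch of the flavor of passive inference Geiger performs (the real system runs inside Xen and must handle block liveness, journaling, and unified caches): watch which disk block each guest memory frame holds and infer an eviction when a frame is reloaded with a different block. The trace format is invented.

# Each trace event is (frame_number, disk_block) for a disk read that the VMM
# observes landing in a particular guest memory frame.
trace = [(1, 100), (2, 101), (3, 102),
         (1, 250),            # frame 1 reused: block 100 was likely evicted
         (2, 101)]            # same block re-read into frame 2: no inference

frame_holds = {}   # frame number -> disk block currently cached there
evictions = []

for frame, block in trace:
    old = frame_holds.get(frame)
    if old is not None and old != block:
        # The guest reloaded this frame with different data, so the block it
        # used to cache must have been evicted from the buffer cache.
        evictions.append(old)
    frame_holds[frame] = block

print("inferred evictions:", evictions)   # [100]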


International Conference on Management of Data | 1997

High-performance sorting on networks of workstations

Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; David E. Culler; Joseph M. Hellerstein; David A. Patterson

We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive with sorting on the large-scale SMPs that have traditionally held the performance records. On a 64-node cluster, we sort 6.0 GB in just under one minute, while a 32-node cluster finishes the Datamation benchmark in 2.41 seconds. Our implementations can be applied to a variety of disk, memory, and processor configurations; we highlight salient issues for tuning each component of the system. We evaluate the use of commodity operating systems and hardware for parallel sorting. We find existing OS primitives for memory management and file access adequate. Due to aggregate communication and disk bandwidth requirements, the bottleneck of our system is the workstation I/O bus.
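
A single-machine sketch of the partition-then-sort structure common to parallel sorts such as NOW-Sort (the real system also overlaps disk I/O and ships partitions across the network): range-partition keys into per-worker buckets, sort the buckets in parallel, and concatenate. The worker count, key range, and record count are illustrative.

import random
from concurrent.futures import ProcessPoolExecutor

WORKERS = 4
KEY_RANGE = 1_000_000

def bucket_of(key):
    # Range partitioning: bucket i holds keys in [i * KEY_RANGE / WORKERS, ...).
    return min(key * WORKERS // KEY_RANGE, WORKERS - 1)

def main():
    records = [random.randrange(KEY_RANGE) for _ in range(100_000)]
    buckets = [[] for _ in range(WORKERS)]
    for key in records:                  # partition pass
        buckets[bucket_of(key)].append(key)
    with ProcessPoolExecutor(max_workers=WORKERS) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))   # sort buckets in parallel
    result = [k for bucket in sorted_buckets for k in bucket]
    assert result == sorted(records)     # concatenation is globally sorted
    print("sorted", len(result), "records")

if __name__ == "__main__":
    main()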


Symposium on Operating Systems Principles | 2011

A file is not a file: understanding the I/O behavior of Apple desktop applications

Tyler Harter; Chris Dragga; Michael Vaughn; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

We analyze the I/O behavior of iBench, a new collection of productivity and multimedia application workloads. Our analysis reveals a number of differences between iBench and typical file-system workload studies, including the complex organization of modern files, the lack of pure sequential access, the influence of underlying frameworks on I/O patterns, the widespread use of file synchronization and atomic operations, and the prevalence of threads. Our results have strong ramifications for the design of next generation local and cloud-based storage systems.
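
As a hint of the kind of trace analysis behind such a study (the iBench traces record far richer detail), the sketch below computes the fraction of sequential reads and the number of fsync calls from a hypothetical, simplified trace format.

# Hypothetical simplified trace: (syscall, file, offset, length) tuples.
trace = [
    ("read",  "doc.key", 0,     4096),
    ("read",  "doc.key", 4096,  4096),   # sequential with the previous read
    ("read",  "doc.key", 65536, 4096),   # seek: not sequential
    ("write", "doc.key", 0,     4096),
    ("fsync", "doc.key", 0,     0),
]

last_end = {}          # file -> end offset of the previous read
sequential = total_reads = fsyncs = 0

for op, f, off, length in trace:
    if op == "read":
        total_reads += 1
        if last_end.get(f) == off:
            sequential += 1
        last_end[f] = off + length
    elif op == "fsync":
        fsyncs += 1

print(f"sequential reads: {sequential}/{total_reads}, fsync calls: {fsyncs}")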


ACM Transactions on Storage | 2008

An analysis of data corruption in the storage stack

Lakshmi N. Bairavasundaram; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Garth R. Goodson; Bianca Schroeder

An important threat to reliable storage of data is silent data corruption. In order to develop suitable protection mechanisms against data corruption, it is essential to understand its characteristics. In this article, we present the first large-scale study of data corruption. We analyze corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months. We study three classes of corruption: checksum mismatches, identity discrepancies, and parity inconsistencies. We focus on checksum mismatches since they are the most frequent class. We find more than 400,000 instances of checksum mismatches over the 41-month period. We find many interesting trends among these instances, including: (i) nearline disks (and their adapters) develop checksum mismatches an order of magnitude more often than enterprise-class disk drives, (ii) checksum mismatches within the same disk are not independent events and they show high spatial and temporal locality, and (iii) checksum mismatches across different disks in the same storage system are not independent. We use our observations to derive lessons for corruption-proof system design.
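
A small sketch in the spirit of the spatial-locality analysis (the block numbers and window size are invented): given the blocks on one disk that showed checksum mismatches, measure the fraction that lie within a fixed distance of another mismatch.

def spatial_locality(mismatch_blocks, window=100):
    """Fraction of mismatched blocks within `window` blocks of another mismatch."""
    blocks = sorted(mismatch_blocks)
    close = 0
    for i, b in enumerate(blocks):
        near_prev = i > 0 and b - blocks[i - 1] <= window
        near_next = i + 1 < len(blocks) and blocks[i + 1] - b <= window
        if near_prev or near_next:
            close += 1
    return close / len(blocks) if blocks else 0.0

# Invented example: a clustered burst of mismatches plus two isolated ones.
print(spatial_locality([1000, 1010, 1025, 1090, 500000, 900000]))  # ~0.67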


Symposium on Operating Systems Principles | 2013

Optimistic crash consistency

Vijay Chidambaram; Thanumalayan Sankaranarayana Pillai; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

We introduce optimistic crash consistency, a new approach to crash consistency in journaling file systems. Using an array of novel techniques, we demonstrate how to build an optimistic commit protocol that correctly recovers from crashes and delivers high performance. We implement this optimistic approach within a Linux ext4 variant which we call OptFS. We introduce two new file-system primitives, osync() and dsync(), that decouple ordering of writes from their durability. We show through experiments that OptFS improves performance for many workloads, sometimes by an order of magnitude; we confirm its correctness through a series of robustness tests, showing it recovers to a consistent state after crashes. Finally, we show that osync() and dsync() are useful in atomic file system and database update scenarios, both improving performance and meeting application-level consistency demands.
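
To make the ordering-versus-durability split concrete, here is a hedged sketch of an atomic update; osync() and dsync() are OptFS primitives rather than standard POSIX calls, so the stub below falls back to fsync() and the comments note where OptFS would behave differently. The file name is illustrative.

import os

def osync(fd):
    # OptFS ordering primitive (assumed interface): guarantee that writes issued
    # before this point are ordered before later writes, without forcing them to
    # disk immediately. On a stock kernel we can only approximate it with fsync(),
    # which also (expensively) waits for durability.
    os.fsync(fd)

def atomic_update(path, data):
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        osync(fd)              # ordering: data must reach disk before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)       # atomic switch to the new version
    # dsync() (durability) would only be needed if the caller must know the update
    # has survived a crash before proceeding, e.g. before replying to a client.

atomic_update("settings.conf", b"mode=safe\n")
with open("settings.conf", "rb") as f:
    print(f.read())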


ACM Transactions on Storage | 2014

A Study of Linux File System Evolution

Lanyue Lu; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Shan Lu

We conduct a comprehensive study of file-system code evolution. By analyzing eight years of Linux file-system changes across 5079 patches, we derive numerous new (and sometimes surprising) insights into the file-system development process; our results should be useful for both the development of file systems themselves as well as the improvement of bug-finding tools.
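
A rough sketch of the kind of patch counting such a study starts from (the paper itself classifies 5079 patches by hand): assuming a local Linux kernel git checkout at the hypothetical path below, count commits that touch each file-system directory over an illustrative tag range.

import subprocess

KERNEL_REPO = "/path/to/linux"          # assumed local kernel git checkout
FILESYSTEMS = ["ext4", "btrfs", "xfs", "jfs", "reiserfs"]

def commit_count(subdir, since="v2.6.12", until="v3.0"):
    # Counts commits in the given tag range that touch fs/<subdir>.
    out = subprocess.run(
        ["git", "-C", KERNEL_REPO, "rev-list", "--count",
         f"{since}..{until}", "--", f"fs/{subdir}"],
        capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

for fs in FILESYSTEMS:
    print(fs, commit_count(fs))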


Workshop on Hot Topics in Operating Systems | 2001

Fail-stutter fault tolerance

Remzi H. Arpaci-Dusseau; Andrea C. Arpaci-Dusseau

Traditional fault models present system designers with two extremes: the Byzantine fault model, which is general and therefore difficult to apply, and the fail-stop fault model, which is easier to employ but does not accurately capture modern device behavior. To address this gap, we introduce the concept of fail-stutter fault tolerance, a realistic and yet tractable fault model that accounts for both absolute failure and a new range of performance failures common in modern components. Systems built under the fail-stutter model will likely perform well, be highly reliable and available, and be easier to manage when deployed.
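
As a small illustration of reasoning under a fail-stutter-style model (the latencies and threshold are invented): instead of classifying a disk as simply up or down, compare each disk's latency with its peers and flag the ones that are alive but stuttering.

import statistics

# Invented mean request latencies (ms) for eight supposedly identical disks.
latencies = {"disk0": 5.1, "disk1": 4.9, "disk2": 5.3, "disk3": 5.0,
             "disk4": 48.7,   # alive, but performing an order of magnitude worse
             "disk5": 5.2, "disk6": 4.8, "disk7": 5.1}

median = statistics.median(latencies.values())
stutterers = [d for d, ms in latencies.items() if ms > 3 * median]

# Under fail-stop thinking disk4 looks "working"; under fail-stutter it is a
# performance fault the system should route around (e.g., send it less work).
print("median latency:", median, "ms; fail-stutter suspects:", stutterers)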

Collaboration


An overview of Remzi H. Arpaci-Dusseau's collaborations.

Top Co-Authors

Andrea C. Arpaci-Dusseau, University of Wisconsin-Madison
Muthian Sivathanu, University of Wisconsin-Madison
Tyler Harter, University of Wisconsin-Madison
Vijay Chidambaram, University of Wisconsin-Madison