Publications

Featured research published by Lakshmi N. Bairavasundaram.


Measurement and Modeling of Computer Systems | 2007

An analysis of latent sector errors in disk drives

Lakshmi N. Bairavasundaram; Garth R. Goodson; Shankar Pasupathy; Jiri Schindler

The reliability measures in today's disk drive-based storage systems focus predominantly on protecting against complete disk failures. Previous disk reliability studies have analyzed empirical data in an attempt to better understand and predict disk failure rates. Yet, very little is known about the incidence of latent sector errors, i.e., errors that go undetected until the corresponding disk sectors are accessed. Our study analyzes data collected from production storage systems over 32 months across 1.53 million disks (both nearline and enterprise class). We analyze factors that impact latent sector errors, observe trends, and explore their implications for the design of reliability mechanisms in storage systems. To the best of our knowledge, this is the first study of such large scale (our sample size is at least an order of magnitude larger than in previously published studies) and the first to focus specifically on latent sector errors and their implications for the design and reliability of storage systems.
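As a rough illustration of the kind of analysis such an error study involves (not the authors' actual pipeline), the sketch below computes, from a hypothetical log of latent-sector-error events, the fraction of disks in each class that developed at least one such error; the log format and field names are assumptions made purely for illustration.

# Hypothetical sketch: estimate the fraction of disks per class that
# develop at least one latent sector error, given a flat event log.
# The event format and field names are illustrative assumptions.
from collections import defaultdict

def error_rate_by_class(events, disk_population):
    """events: iterable of (disk_id, disk_class) tuples, one per observed
    latent sector error; disk_population: dict mapping disk_class to the
    total number of disks of that class in the study."""
    affected = defaultdict(set)
    for disk_id, disk_class in events:
        affected[disk_class].add(disk_id)
    return {cls: len(affected[cls]) / disk_population[cls]
            for cls in disk_population}

if __name__ == "__main__":
    events = [("d1", "nearline"), ("d1", "nearline"), ("d2", "enterprise")]
    population = {"nearline": 100, "enterprise": 200}
    print(error_rate_by_class(events, population))
    # {'nearline': 0.01, 'enterprise': 0.005}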


Symposium on Operating Systems Principles | 2005

IRON file systems

Vijayan Prabhakaran; Lakshmi N. Bairavasundaram; Nitin Agrawal; Haryadi S. Gunawi; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework to investigate how commodity file systems react to a range of more realistic disk failures. We classify their failure policies in a new taxonomy that measures their Internal RObustNess (IRON), which includes both failure detection and recovery techniques. We show that commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. Finally, we design, implement, and evaluate a prototype IRON file system, Linux ixt3, showing that techniques such as in-disk checksumming, replication, and parity greatly enhance file system robustness while incurring minimal time and space overheads.
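The abstract names in-disk checksumming and replication among the IRON detection and recovery techniques; the following is a minimal sketch of that combination (a checksum verified on every read, with fallback to a replica), assuming a toy in-memory block store rather than anything resembling the actual ixt3 implementation.

# Minimal sketch of checksum-plus-replica block reads, in the spirit of
# the IRON techniques described above. Illustrative model only; the
# storage API here is hypothetical.
import zlib

class ChecksummedStore:
    def __init__(self):
        self.primary = {}   # block number -> (checksum, data)
        self.replica = {}   # block number -> (checksum, data)

    def write_block(self, blkno, data: bytes):
        record = (zlib.crc32(data), data)
        self.primary[blkno] = record
        self.replica[blkno] = record        # replication enables recovery

    def read_block(self, blkno) -> bytes:
        for copy in (self.primary, self.replica):
            checksum, data = copy[blkno]
            if zlib.crc32(data) == checksum:
                return data                 # checksum detects corruption
        raise IOError(f"block {blkno}: all copies failed checksum verification")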


Symposium on Operating Systems Principles | 2011

An empirical study on configuration errors in commercial and open source systems

Zuoning Yin; Xiao Ma; Jing Zheng; Yuanyuan Zhou; Lakshmi N. Bairavasundaram; Shankar Pasupathy

Configuration errors (i.e., misconfigurations) are among the dominant causes of system failures. Their importance has inspired many research efforts on detecting, diagnosing, and fixing misconfigurations; such research would benefit greatly from a real-world characteristic study of misconfigurations. Unfortunately, few such studies have been conducted in the past, primarily because historical misconfigurations usually have not been recorded rigorously in databases. In this work, we undertake one of the first attempts to conduct a real-world misconfiguration characteristic study. We study a total of 546 real-world misconfigurations, including 309 from a commercial storage system deployed at thousands of customers and 237 from four widely used open source systems (CentOS, MySQL, Apache HTTP Server, and OpenLDAP). Some of our major findings include: (1) A majority of misconfigurations (70.0%–85.5%) are due to mistakes in setting configuration parameters; however, a significant number are due to compatibility issues or component configurations (i.e., not parameter-related). (2) 38.1%–53.7% of parameter mistakes are caused by illegal parameters that clearly violate some format or rule, motivating the use of an automatic configuration checker to detect these misconfigurations. (3) A significant percentage (12.2%–29.7%) of parameter-based mistakes are due to inconsistencies between different parameter values. (4) 21.7%–57.3% of the misconfigurations involve configurations external to the examined system, some even on entirely different hosts. (5) A significant portion of misconfigurations can cause hard-to-diagnose failures, such as crashes, hangs, or severe performance degradation, indicating that systems should be better equipped to handle misconfigurations.
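Finding (2) motivates an automatic configuration checker for illegal parameter values, and finding (3) points to cross-parameter consistency checks. The sketch below shows one simple form such a checker could take; the parameter names and rules are invented purely for illustration and are not drawn from the paper.

# Hypothetical sketch of a simple configuration checker of the kind
# motivated by findings (2) and (3): per-parameter format rules plus one
# cross-parameter consistency rule. Parameter names are illustrative.
import re

FORMAT_RULES = {
    "max_connections": lambda v: v.isdigit() and int(v) > 0,
    "listen_address":  lambda v: re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}:\d+", v),
    "log_level":       lambda v: v in {"debug", "info", "warn", "error"},
}

def check_config(config: dict) -> list:
    errors = []
    # (2) illegal values that violate a format rule
    for key, rule in FORMAT_RULES.items():
        if key in config and not rule(config[key]):
            errors.append(f"{key}: illegal value {config[key]!r}")
    # (3) inconsistency between two parameter values
    if ("cache_size_mb" in config and "memory_limit_mb" in config
            and int(config["cache_size_mb"]) > int(config["memory_limit_mb"])):
        errors.append("cache_size_mb exceeds memory_limit_mb")
    return errors

print(check_config({"max_connections": "0", "log_level": "verbose",
                    "cache_size_mb": "512", "memory_limit_mb": "256"}))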


International Symposium on Computer Architecture | 2004

X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs

Lakshmi N. Bairavasundaram; Muthian Sivathanu; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

RAID storage arrays often possess gigabytes of RAM for caching disk blocks. Currently, most RAID systems use LRU or LRU-like policies to manage these caches. Since these array caches do not recognize the presence of file system buffer caches, they redundantly retain many of the same blocks as those cached by the file system, thereby wasting precious cache space. In this paper, we introduce X-RAY, an exclusive RAID array caching mechanism. X-RAY achieves a high degree of (but not perfect) exclusivity through gray-box methods: by observing which files have been accessed through updates to file system meta-data, X-RAY constructs an approximate image of the contents of the file system cache and uses that information to determine the exclusive set of blocks that should be cached by the array. We use microbenchmarks to demonstrate that X-RAY's prediction of the file system buffer cache contents is highly accurate, and trace-based simulation to show that X-RAY considerably outperforms LRU and performs as well as other, more invasive approaches. The main strength of the X-RAY approach is that it is easy to deploy: all performance gains are achieved without changes to the SCSI protocol or the file system above.
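As a hedged sketch of the exclusivity idea (not the paper's actual algorithm), the code below models an array cache that declines to admit blocks it believes are already resident in the file system buffer cache, based on gray-box observations of metadata updates.

# Rough sketch of the exclusivity idea behind X-RAY: the array infers
# which blocks the file system likely caches (from observed metadata
# updates such as access-time writes) and avoids caching those blocks
# itself. Illustrative approximation, not the paper's algorithm.
class ExclusiveArrayCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}                # block -> last-use counter (LRU-ish)
        self.fs_cache_estimate = set()  # blocks believed cached above us
        self.clock = 0

    def note_metadata_update(self, blocks_read_by_fs):
        # Gray-box observation: an access-time update implies the file's
        # blocks were recently read and are likely in the FS buffer cache.
        self.fs_cache_estimate.update(blocks_read_by_fs)

    def admit(self, block):
        if block in self.fs_cache_estimate:
            return                      # exclusivity: let the FS cache hold it
        self.clock += 1
        self.blocks[block] = self.clock
        if len(self.blocks) > self.capacity:
            victim = min(self.blocks, key=self.blocks.get)
            del self.blocks[victim]

Real exclusivity also requires evicting array-cached blocks once the file system is believed to hold them; the sketch captures only the admission side.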


Measurement and Modeling of Computer Systems | 2006

Semantically-smart disk systems: past, present, and future

Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Lakshmi N. Bairavasundaram; Timothy E. Denehy; Florentina I. Popovici; Vijayan Prabhakaran; Muthian Sivathanu

In this paper we describe research that has been ongoing within our group for the past four years on semantically-smart disk systems. A semantically-smart system goes beyond typical block-based storage systems by extracting higher-level information from the stream of traffic to disk; doing so enables new and interesting pieces of functionality to be implemented within low-level storage systems. We first describe the development of our efforts over the past four years, highlighting the key technologies needed to build semantically-smart systems as well as the main weaknesses of our approach. We then discuss future directions in the design and implementation of smarter storage systems.
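A toy sketch of the core idea, assuming a made-up on-disk layout: a semantically-smart layer labels raw block traffic with coarse file-system-level types that higher-level functionality could build on. The layout constants are pure assumptions; real systems infer far more than this.

# Toy sketch of a semantically-smart disk's core step: classify raw
# block traffic using knowledge of the file system's on-disk layout.
# The layout constants below are made up for illustration.
INODE_TABLE_START, INODE_TABLE_END = 1024, 2048   # hypothetical block range
JOURNAL_START, JOURNAL_END = 2048, 4096           # hypothetical block range

def classify_block(block_number):
    """Return a coarse semantic label for a block seen in the I/O stream."""
    if INODE_TABLE_START <= block_number < INODE_TABLE_END:
        return "inode"
    if JOURNAL_START <= block_number < JOURNAL_END:
        return "journal"
    return "data"

# A smarter storage layer could, for example, prioritize journal writes
# or track live inodes based on these labels.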


Dependable Systems and Networks | 2008

Analyzing the effects of disk-pointer corruption

Lakshmi N. Bairavasundaram; Meenali Rungta; Nitin Agrawal; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Michael M. Swift

The long-term availability of data stored in a file system depends on how well it safeguards on-disk pointers used to access the data. Ideally, a system would correct all pointer errors. In this paper, we examine how well corruption-handling techniques work in reality. We develop a new technique called type-aware pointer corruption to systematically explore how a file system reacts to corrupt pointers. This approach reduces the exploration space for corruption experiments and works without source code. We use type-aware pointer corruption to examine Windows NTFS and Linux ext3. We find that they rely on type and sanity checks to detect corruption, and NTFS recovers using replication in some instances. However, NTFS and ext3 do not recover from most corruptions, including many scenarios for which they possess sufficient redundant information, leading to further corruption, crashes, and unmountable file systems. We use our study to identify important lessons for handling corrupt pointers.
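The experiment structure implied by type-aware pointer corruption can be sketched as a simple loop over pointer types and corruption values; the fault-injection and workload hooks below are hypothetical placeholders, not the paper's framework.

# Illustrative sketch of a type-aware pointer-corruption experiment loop:
# corrupt one on-disk pointer of each type, exercise the file system, and
# record how it reacts. The hooks are hypothetical placeholders.
POINTER_TYPES = ["inode->data_block", "indirect_block", "directory_block"]
CORRUPT_VALUES = [0, 0xFFFFFFFF, 12345]   # out-of-range and "plausible" targets

def run_experiments(image, corrupt_pointer, run_workload):
    """corrupt_pointer(image, ptr_type, value) injects the fault into a copy
    of a disk image; run_workload(image) mounts it, runs operations, and
    returns an observed outcome such as 'detected', 'crash', or 'silent'."""
    results = {}
    for ptr_type in POINTER_TYPES:
        for value in CORRUPT_VALUES:
            faulty = corrupt_pointer(image, ptr_type, value)
            results[(ptr_type, value)] = run_workload(faulty)
    return results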


Operating Systems Review | 2012

Responding rapidly to service level violations using virtual appliances

Lakshmi N. Bairavasundaram; Gokul Soundararajan; Vipul Mathur; Kaladhar Voruganti; Kiran Srinivasan

One of the key goals in the data center today is providing storage services with service-level objectives (SLOs) for performance metrics such as latency and throughput. Meeting such SLOs is challenging due to the dynamism observed in these environments. In this position paper, we propose dynamic instantiation of virtual appliances, that is, virtual machines with storage functionality, as a mechanism to meet storage SLOs efficiently. In order for dynamic instantiation to be realistic for rapidly changing environments, it should be automated. Therefore, an important goal of this paper is to show that such automation is feasible. We do so through a caching case study. Specifically, we build the automation framework for dynamically instantiating virtual caching appliances. This framework identifies sets of interfering workloads that can benefit from caching, determines the cache-size requirements of workloads, non-disruptively migrates the application to use the cache, and warms the cache to quickly return to acceptable service levels. We show through an experiment that this approach addresses SLO violations while using resources efficiently.
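A hedged sketch of the automation loop described above: on an SLO violation, estimate the smallest cache predicted to restore latency, then instantiate and warm a virtual caching appliance. The function names, parameters, and miss-rate model are illustrative assumptions, not the paper's framework.

# Hypothetical sketch of the control loop: when a workload's latency SLO
# is violated, pick a cache size predicted to bring latency back within
# the objective, then instantiate a virtual caching appliance.
def smallest_sufficient_cache(miss_rate_curve, slo_latency_ms,
                              hit_ms=0.2, miss_ms=8.0):
    """miss_rate_curve: dict mapping cache size (GB) to predicted miss rate."""
    for size_gb in sorted(miss_rate_curve):
        miss = miss_rate_curve[size_gb]
        predicted = miss * miss_ms + (1 - miss) * hit_ms
        if predicted <= slo_latency_ms:
            return size_gb
    return None     # no cache size meets the SLO; needs another remedy

def control_loop(workload, slo_latency_ms, observe, instantiate_appliance):
    if observe(workload) > slo_latency_ms:              # SLO violation
        size = smallest_sufficient_cache(workload.miss_rate_curve,
                                         slo_latency_ms)
        if size is not None:
            instantiate_appliance(workload, cache_gb=size)  # then warm cache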


Workshop on Storage Security and Survivability | 2006

Limiting trust in the storage stack

Lakshmi N. Bairavasundaram; Meenali Rungta; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

We propose a framework for examining trust in the storage stack based on the different levels of trustworthiness present across different channels of information flow. We focus on corruption in one of these channels, the data channel, and, as a case study, we apply type-aware corruption techniques to examine Windows NTFS behavior when on-disk pointers are corrupted. We find that NTFS does not verify on-disk pointers thoroughly before using them and that even established error handling techniques like replication are often used ineffectively. Our study indicates the need to more carefully examine how trust is managed within modern file systems.


File and Storage Technologies | 2008

An analysis of data corruption in the storage stack

Lakshmi N. Bairavasundaram; Garth R. Goodson; Bianca Schroeder; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau


Foundations of Software Engineering | 2011

How do fixes become bugs?

Zuoning Yin; Ding Yuan; Yuanyuan Zhou; Shankar Pasupathy; Lakshmi N. Bairavasundaram

Collaboration

Lakshmi N. Bairavasundaram's top co-authors.

Top Co-Authors

Andrea C. Arpaci-Dusseau, University of Wisconsin-Madison
Remzi H. Arpaci-Dusseau, University of Wisconsin-Madison
Muthian Sivathanu, University of Wisconsin-Madison