Swaminathan Sundararaman

Publication

Featured research published by Swaminathan Sundararaman.

ACM International Conference on Systems and Storage | 2013

HEC: improving endurance of high performance flash-based cache devices

Jingpei Yang; Ned D. Plasson; Greg Gillis; Nisha Talagala; Swaminathan Sundararaman; Robert B. Wood

Flash memory is widely used in enterprise storage applications for its fast random I/O performance. However, because of its limited endurance and asymmetric write performance, minimizing writes to a flash device is critical for both performance and endurance. Previous studies have focused on flash memory as a candidate for primary storage; little is known about its behavior as a Solid State Cache (SSC) device. In this paper, we propose HEC, a High Endurance Cache that aims to improve overall device endurance by reducing media writes and erases while maximizing cache hit rate. We analyze the added write pressure that cache workloads place on flash devices and propose optimizations at both the cache and flash management layers to improve endurance while maintaining or increasing cache hit rate. We demonstrate the individual and cumulative contributions of cache admission policy, cache eviction policy, flash garbage collection policy, and flash device configuration on a) hit rate, b) overall writes, and c) erases as seen by the SSC device. With our improved cache and flash optimizations, 83% of the analyzed workload ensembles maintained or increased their hit rate while reducing writes by up to 20x and erase counts by up to 6x.
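The abstract names cache admission policy as one lever for trading hit rate against flash writes. As a hedged illustration only (a generic technique, not HEC's published policy), the sketch below replays a block trace through an LRU cache and counts admissions as writes to the flash cache device under two hypothetical rules: admit on every miss, or admit only on a block's second miss within a small history window. The function name `simulate` and its parameters are assumptions made for this example.

```python
from collections import OrderedDict


def simulate(trace, capacity, admit_on_second_miss=False, history_size=64):
    """Replay a block-access trace through an LRU cache.

    Returns (hits, cache_writes). Each admission counts as one write to
    the (hypothetical) flash cache device.
    """
    cache = OrderedDict()  # block -> None, least recently used first
    ghost = OrderedDict()  # blocks that missed once but were not admitted
    hits = writes = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
            continue
        if admit_on_second_miss and block not in ghost:
            # First miss: remember the block, but do not write it to flash.
            ghost[block] = None
            if len(ghost) > history_size:
                ghost.popitem(last=False)
            continue
        ghost.pop(block, None)
        cache[block] = None
        writes += 1
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block
    return hits, writes


# A small hot set, interrupted by a one-time scan of ten cold blocks.
trace = [1, 2, 3] * 5 + list(range(100, 110)) + [1, 2, 3] * 5
print(simulate(trace, capacity=4))
print(simulate(trace, capacity=4, admit_on_second_miss=True))
```

On this toy trace both rules reach 24 hits, but admit-always writes 16 blocks (the scan flushes the hot set, which must then be rewritten) while admit-on-second-miss writes only 3, because the one-time scan blocks never reach flash. Real workloads are not this clean; the sketch only shows why admission is a write-reduction knob.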

ACM Transactions on Storage | 2010

Membrane: Operating system support for restartable file systems

Swaminathan Sundararaman; Sriram Subramanian; Abhishek Rajimwale; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Michael M. Swift

We introduce Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system failures, and does so while remaining transparent to running applications; upon failure, the file system restarts, its state is restored, and pending application requests are serviced as if no failure had occurred. Membrane provides transparent recovery through a lightweight logging and checkpoint infrastructure, and includes novel techniques to improve performance and correctness of its fault-anticipation and recovery machinery. We tested Membrane with ext2, ext3, and VFAT. Through experimentation, we show that Membrane induces little performance overhead and can tolerate a wide range of file system crashes. More critically, Membrane does so with little or no change to existing file systems, thus improving robustness to crashes without mandating intrusive changes to existing file-system code.
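Membrane's transparent recovery is described as a combination of checkpointing and lightweight logging. The following is a minimal sketch of that general checkpoint-and-replay idea over a toy in-memory "file system"; the classes `ToyFS` and `RestartableFS` and their methods are hypothetical, and the sketch omits everything that makes the real problem hard (kernel state, locks, in-flight I/O, fault anticipation).

```python
import copy


class ToyFS:
    """A hypothetical in-memory 'file system' mapping path -> contents."""

    def __init__(self):
        self.files = {}

    def apply(self, op, path, data=None):
        if op == "write":
            self.files[path] = data
        elif op == "unlink":
            self.files.pop(path, None)
        else:
            raise ValueError(f"unknown op {op!r}")


class RestartableFS:
    """Checkpoint plus an operation log, replayed after a simulated crash."""

    def __init__(self):
        self.fs = ToyFS()
        self.checkpoint_state = {}
        self.log = []  # completed operations since the last checkpoint

    def do(self, op, path, data=None):
        self.fs.apply(op, path, data)
        self.log.append((op, path, data))  # record only completed operations

    def checkpoint(self):
        self.checkpoint_state = copy.deepcopy(self.fs.files)
        self.log.clear()

    def crash_and_restart(self):
        # Discard the possibly corrupted live state, restore the last
        # checkpoint, then replay the log so callers observe no lost work.
        self.fs = ToyFS()
        self.fs.files = copy.deepcopy(self.checkpoint_state)
        for op, path, data in self.log:
            self.fs.apply(op, path, data)
```

Usage: call `checkpoint()` periodically and `crash_and_restart()` when a failure is detected; operations issued after the checkpoint are reapplied, so the restored state matches what applications expect.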

ACM Transactions on Storage | 2017

Tiny-Tail Flash: Near-Perfect Elimination of Garbage Collection Tail Latencies in NAND SSDs

Shiqin Yan; Huaicheng Li; Mingzhe Hao; Michael Hao Tong; Swaminathan Sundararaman; Andrew A. Chien; Haryadi S. Gunawi

Flash storage has become the mainstream destination for storage users. However, SSDs do not always deliver the performance that users expect. The core culprit of flash performance instability is the well-known garbage collection (GC) process: while GC runs, the SSD cannot serve (it blocks) incoming I/Os, and the resulting long delays cause the long tail latency problem. We present ttFlash as a solution to this problem. ttFlash is a “tiny-tail” flash drive (SSD) that eliminates GC-induced tail latencies by circumventing GC-blocked I/Os with four novel strategies: plane-blocking GC, rotating GC, GC-tolerant read, and GC-tolerant flush. These four strategies leverage the timely combination of modern SSD internal technologies such as powerful controllers, parity-based redundancies, and capacitor-backed RAM. Our strategies depend on intra-plane copyback operations. Through an extensive evaluation, we show that ttFlash comes very close to a “no-GC” scenario. Specifically, between the 99th and 99.99th percentiles, ttFlash is only 1.0 to 2.6× slower than the no-GC case, while a base approach suffers from 5–138× GC-induced slowdowns.
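GC-tolerant read builds on parity-based redundancy: if a read targets a plane that is busy with garbage collection, the requested page can be rebuilt from the other pages in its parity group instead of waiting. A minimal sketch of that XOR reconstruction step follows, assuming a RAID-5-style stripe of equal-sized pages plus one parity page; the stripe layout and the names `make_parity` and `read_page` are hypothetical. Reconstruction only works when at most one page in the stripe is unavailable, which is presumably what rotating GC is meant to guarantee.

```python
from functools import reduce


def xor_pages(pages):
    """Byte-wise XOR of equal-length pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def make_parity(data_pages):
    """Parity page for a stripe: the XOR of all its data pages."""
    return xor_pages(data_pages)


def read_page(stripe, parity, index, busy):
    """Return stripe[index]; if its plane is in GC (index in busy), rebuild it
    from the other data pages and the parity instead of blocking.

    Assumes no other page in the stripe is busy at the same time.
    """
    if index not in busy:
        return stripe[index]
    others = [page for i, page in enumerate(stripe) if i != index]
    return xor_pages(others + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_parity(stripe)
print(read_page(stripe, parity, 1, busy={1}))  # rebuilt without touching page 1
```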

European Conference on Computer Systems | 2014

Snapshots in a flash with ioSnap

Sriram Subramanian; Swaminathan Sundararaman; Nisha Talagala; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

Snapshots are a common and heavily relied-upon feature in storage systems. The high performance of flash-based storage systems brings new, more stringent requirements for this classic capability. We present ioSnap, a flash-optimized snapshot system. Through careful design that exploits common snapshot usage patterns and flash-oriented optimizations, including leveraging native characteristics of Flash Translation Layers, ioSnap delivers low-overhead snapshots with minimal disruption to foreground traffic. Through our evaluation, we show that ioSnap incurs negligible performance overhead during normal operation, and that common-case operations such as snapshot creation and deletion incur little cost. We also demonstrate techniques to mitigate the performance impact on foreground I/O during intensive snapshot operations such as activation. Overall, ioSnap represents a case study of how to integrate snapshots into a modern, well-engineered flash-based storage system.

European Conference on Computer Systems | 2011

Refuse to crash with Re-FUSE

Swaminathan Sundararaman; Laxman Visampalli; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

We introduce Re-FUSE, a framework that provides support for restartable user-level file systems. Re-FUSE monitors the user-level file system and, on a crash, transparently restarts it and restores its state; the restart process is completely transparent to applications. Re-FUSE provides transparent recovery through a combination of novel techniques, including request tagging, system-call logging, and non-interruptible system calls. We tested Re-FUSE with three popular FUSE file systems: NTFS-3g, SSHFS, and AVFS. Through experimentation, we show that Re-FUSE induces little performance overhead and can tolerate a wide range of file-system crashes. More critically, Re-FUSE does so with minimal modification of existing FUSE file systems, thus improving robustness to crashes without mandating intrusive changes.

Workshop on Storage Security and Survivability | 2007

Exploiting type-awareness in a self-recovering disk

Kiron Vijayasankar; Gopalan Sivathanu; Swaminathan Sundararaman; Erez Zadok

Data recoverability in the face of partial disk errors is an important prerequisite in modern storage. We have designed and implemented a prototype disk system that automatically ensures the integrity of stored data, and transparently recovers vital data in the event of integrity violations. We show that by using pointer knowledge, effective integrity assurance can be performed inside a block-based disk with negligible performance overheads. We also show how semantics-aware replication of blocks can help improve the recoverability of data in the event of partial disk errors with small space overheads. Our evaluation results show that for normal user workloads, our disk system has a performance overhead of only 1-5% compared to traditional disks.
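The abstract combines in-disk integrity checking with semantics-aware replication of vital blocks. The sketch below is a hypothetical, much-simplified illustration of that combination: it keeps a checksum per block, replicates only blocks the caller flags as vital (a stand-in for the pointer and type knowledge the paper infers inside the disk), and serves and restores a replica when a checksum mismatch is detected. The class name and the choice of CRC32 from `zlib` are assumptions for this example, not details from the paper.

```python
import zlib


class SelfRecoveringDisk:
    """Toy block store: per-block CRC32, replicas only for vital blocks."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes
        self.checksums = {}  # block number -> CRC32 of the data as written
        self.replicas = {}   # block number -> second copy (vital blocks only)

    def write(self, blkno, data, vital=False):
        self.blocks[blkno] = data
        self.checksums[blkno] = zlib.crc32(data)
        if vital:
            self.replicas[blkno] = data
        else:
            self.replicas.pop(blkno, None)

    def read(self, blkno):
        data = self.blocks[blkno]
        if zlib.crc32(data) == self.checksums[blkno]:
            return data
        replica = self.replicas.get(blkno)
        if replica is not None and zlib.crc32(replica) == self.checksums[blkno]:
            self.blocks[blkno] = replica  # repair the primary copy in place
            return replica
        raise OSError(f"block {blkno}: integrity violation, no valid replica")
```

Replicating only vital blocks is what keeps the space overhead small in this sketch: ordinary data blocks get detection but no recovery.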

Symposium on Operating Systems Principles | 2010

Why panic()?: improving reliability with restartable file systems

Swaminathan Sundararaman; Sriram Subramanian; Abhishek Rajimwale; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Michael M. Swift

The file system is one of the most critical components of the operating system. Almost all applications running on the operating system require the file system to be available in order to operate properly. Though file-system availability is critical in many cases, very little work has been done on tolerating file-system crashes. In this paper, we propose Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file-system failures and does so while remaining transparent to running applications; upon failure, the file system restarts, its state is restored, and pending application requests are serviced as if no failure had occurred. Our initial evaluation of Membrane with ext2 shows that Membrane induces little performance overhead and can tolerate a wide range of file-system crashes. More critically, Membrane does so with few changes to ext2, thus improving robustness to crashes without mandating intrusive changes to existing file-system code.

Symposium on Operating Systems Principles | 2015

Mjölnir: collecting trash in a demanding new world

Zev Weiss; Sriram Subramanian; Swaminathan Sundararaman; Vinay Sridhar; Nisha Talagala; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

As flash devices become ubiquitous in data centers and cost per gigabyte drops, flash systems need to provide data services similar to those of traditional storage. We present Mjölnir, a powerful and scalable engine that addresses the core problems that make efficient flash-based data services challenging: multi-reference management and garbage collection. Additionally, by providing powerful primitives for address remapping, Mjölnir enables a redesign of the I/O stack for greater efficiency and performance with flash. Mjölnir uses techniques from language runtimes for reference management and garbage collection; we show via a prototype and experimental evaluation that this design can deliver predictable performance even with varied user workloads across a range of capacity and reference-count scales.

ACM Transactions on Storage | 2018

Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems

Haryadi S. Gunawi; Riza O. Suminto; Russell Sears; Casey Golliher; Swaminathan Sundararaman; Xing Lin; Tim Emami; Weiguang Sheng; Nematollah Bidokhti; Caitie McCaffrey; Deepthi Srinivasan; Biswaranjan Panda; Andrew Baptist; Gary Grider; Parks Fields; Kevin Harms; Robert B. Ross; Andree Jacobson; Robert Ricci; Kirk Webb; Peter Alvaro; H. Birali Runesha; Mingzhe Hao; Huaicheng Li

Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types, including disk, SSD, CPU, memory, and network components, can exhibit performance faults. We make several important observations: faults convert from one form to another, chains of cascading root causes and impacts can be long, and fail-slow faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.

Symposium on Operating Systems Principles | 2015

Towards software defined persistent memory: rethinking software support for heterogenous memory architectures

Swaminathan Sundararaman; Nisha Talagala; Dhananjoy Das; Amar Mudrankit; Dulcardo Arteaga

The emergence of persistent memories promises a sea change in application and data center architectures, with efficiencies and performance not possible with today's volatile DRAM and slow persistent storage. We present Software Defined Persistent Memory, an approach that enables applications to use persistent memory in a variety of local and remote configurations. The heterogeneity is handled by a middleware layer that manages hardware-specific needs and optimizations. We present the first design and implementation of such an architecture, and illustrate the key abstractions needed to hide hardware-specific details from applications while exposing the characteristics necessary for performance optimization. We evaluate the performance of our implementation on a set of microbenchmarks and database workloads using the MySQL database. Through our evaluation, we show that it is possible to apply Software Defined concepts to persistent memory to improve performance while retaining functionality and optimizing for different hardware architectures.
