Michael M. Swift | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michael M. Swift is active.

Explore More

Publication

Featured researches published by Michael M. Swift.

ACM Transactions on Computer Systems | 2005

Improving the reliability of commodity operating systems

Michael M. Swift; Brian N. Bershad; Henry M. Levy

Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85&percent; of recently reported failures.This article describes Nooks, a reliability subsystem that seeks to greatly enhance operating system (OS) reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to the existing driver and system code. Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a drivers use of kernel resources to facilitate automatic cleanup during recovery.To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering from many faults that would otherwise crash the system. Under a wide range and number of fault conditions, we show that Nooks recovers automatically from 99&percent; of the faults that otherwise cause Linux to crash.While Nooks was designed for drivers, our techniques generalize to other kernel extensions. We demonstrate this by isolating a kernel-mode file system and an in-kernel Internet service. Overall, because Nooks supports existing C-language extensions, runs on a commodity operating system and hardware, and enables automated recovery, it represents a substantial step beyond the specialized architectures and type-safe languages required by previous efforts directed at safe extensibility.

international symposium on computer architecture | 2007

Performance pathologies in hardware transactional memory

Jayaram Bobba; Kevin E. Moore; Haris Volos; Luke Yen; Mark D. Hill; Michael M. Swift; David A. Wood

Transactional memory is a promising approach to ease parallel programming. Hardware transactional memory system designs reflect choices along three key design dimensions: conflict detection, version management, and conflict resolution. The authors identify a set of performance pathologies that could degrade performance in proposed HTM designs. Improving conflict resolution could eliminate these pathologies so designers can build robust HTM systems.

architectural support for programming languages and operating systems | 2006

Supporting nested transactional memory in logTM

Michelle J. Moravan; Jayaram Bobba; Kevin E. Moore; Luke Yen; Mark D. Hill; Ben Liblit; Michael M. Swift; David A. Wood

Nested transactional memory (TM) facilitates software composition by letting one module invoke another without either knowing whether the other uses transactions. Closed nested transactions extend isolation of an inner transaction until the toplevel transaction commits. Implementations may flatten nested transactions into the top-level one, resulting in a complete abort on conflict, or allow partial abort of inner transactions. Open nested transactions allow a committing inner transaction to immediately release isolation, which increases parallelism and expressiveness at the cost of both software and hardware complexity.This paper extends the recently-proposed flat Log-based Transactional Memory (LogTM) with nested transactions. Flat LogTM saves pre-transaction values in a log, detects conflicts with read (R) and write (W) bits per cache block, and, on abort, invokes a software handler to unroll the log. Nested LogTM supports nesting by segmenting the log into a stack of activation records and modestly replicating R/W bits. To facilitate composition with nontransactional code, such as language runtime and operating system services, we propose escape actions that allow trusted code to run outside the confines of the transactional memory system.

european conference on computer systems | 2012

FlashTier: a lightweight, consistent and durable storage cache

Mohit Saxena; Michael M. Swift; Yiying Zhang

The availability of high-speed solid-state storage has introduced a new tier into the storage hierarchy. Low-latency and high-IOPS solid-state drives (SSDs) cache data in front of high-capacity disks. However, most existing SSDs are designed to be a drop-in disk replacement, and hence are mismatched for use as a cache. This paper describes FlashTier, a system architecture built upon solid-state cache (SSC), a flash device with an interface designed for caching. Management software at the operating system block layer directs caching. The FlashTier design addresses three limitations of using traditional SSDs for caching. First, FlashTier provides a unified logical address space to reduce the cost of cache block management within both the OS and the SSD. Second, FlashTier provides cache consistency guarantees allowing the cached data to be used following a crash. Finally, FlashTier leverages cache behavior to silently evict data blocks during garbage collection to improve performance of the SSC. We have implemented an SSC simulator and a cache manager in Linux. In trace-based experiments, we show that FlashTier reduces address translation space by 60% and silent eviction improves performance by up to 167%. Furthermore, FlashTier can recover from the crash of a 100GB cache in only 2.4 seconds.

ACM Transactions on Computer Systems | 2006

Recovering device drivers

Michael M. Swift; Muthukaruppan Annamalai; Brian N. Bershad; Henry M. Levy

This article presents a new mechanism that enables applications to run correctly when device drivers fail. Because device drivers are the principal failing component in most systems, reducing driver-induced failures greatly improves overall reliability. Earlier work has shown that an operating system can survive driver failures [Swift et al. 2005], but the applications that depend on them cannot. Thus, while operating system reliability was greatly improved, application reliability generally was not.To remedy this situation, we introduce a new operating system mechanism called a shadow driver. A shadow driver monitors device drivers and transparently recovers from driver failures. Moreover, it assumes the role of the failed driver during recovery. In this way, applications using the failed driver, as well as the kernel itself, continue to function as expected.We implemented shadow drivers for the Linux operating system and tested them on over a dozen device drivers. Our results show that applications and the OS can indeed survive the failure of a variety of device drivers. Moreover, shadow drivers impose minimal performance overhead. Lastly, they can be introduced with only modest changes to the OS kernel and with no changes at all to existing device drivers.

symposium on cloud computing | 2012

More for your money: exploiting performance heterogeneity in public clouds

Benjamin Farley; Ari Juels; Venkatanathan Varadarajan; Thomas Ristenpart; Kevin D. Bowers; Michael M. Swift

Infrastructure-as-a-system compute clouds such as Amazons EC2 allow users to pay a flat hourly rate to run their virtual machine (VM) on a server providing some combination of CPU access, storage, and network. But not all VM instances are created equal: distinct underlying hardware differences, contention, and other phenomena can result in vastly differing performance across supposedly equivalent instances. The result is striking variability in the resources received for the same price. We initiate the study of customer-controlled placement gaming: strategies by which customers exploit performance heterogeneity to lower their costs. We start with a measurement study of Amazon EC2. It confirms the (oft-reported) performance differences between supposedly identical instances, and leads us to identify fruitful targets for placement gaming, such as CPU, network, and storage performance. We then explore simple heterogeneity-aware placement strategies that seek out better-performing instances. Our strategies require no assistance from the cloud provider and are therefore immediately deployable. We develop a formal model for placement strategies and evaluate potential strategies via simulation. Finally, we verify the efficacy of our strategies by implementing them on EC2; our experiments show performance improvements of 5% for a real-world CPU-bound job and 34% for a bandwidth-intensive job.

computer and communications security | 2012

Resource-freeing attacks: improve your cloud performance (at your neighbor's expense)

Venkatanathan Varadarajan; Thawan Kooburat; Benjamin Farley; Thomas Ristenpart; Michael M. Swift

Cloud computing promises great efficiencies by multiplexing resources among disparate customers. For example, Amazons Elastic Compute Cloud (EC2), Microsoft Azure, Googles Compute Engine, and Rack-space Hosting all offer Infrastructure as a Service (IaaS) solutions that pack multiple customer virtual machines (VMs) onto the same physical server. The gained efficiencies have some cost: past work has shown that the performance of one customers VM can suffer due to interference from another. In experiments on a local testbed, we found that the performance of a cache-sensitive benchmark can degrade by more than 80% because of interference from another VM. This interference incentivizes a new class of attacks, that we call resource-freeing attacks (RFAs). The goal is to modify the workload of a victim VM in a way that frees up resources for the attackers VM. We explore in depth a particular example of an RFA. Counter-intuitively, by adding load to a co-resident victim, the attack speeds up a class of cache-bound workloads. In a controlled lab setting we show that this can improve performance of synthetic benchmarks by up to 60% over not running the attack. In the noisier setting of Amazons EC2, we still show improvements of up to 13%.

symposium on operating systems principles | 2009

Tolerating hardware device failures in software

Asim Kadav; Matthew J. Renzelmann; Michael M. Swift

Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While major operating system and device vendors recommend that drivers detect and recover from hardware failures, we find that there are many drivers that will crash or hang when a device fails. Such bugs cannot easily be detected by regular stress testing because the failures are induced by the device and not the software load. This paper describes Carburizer, a code-manipulation tool and associated runtime that improves system reliability in the presence of faulty devices. Carburizer analyzes driver source code to find locations where the driver incorrectly trusts the hardware to behave. Carburizer identified almost 1000 such bugs in Linux drivers with a false positive rate of less than 8 percent. With the aid of shadow drivers for recovery, Carburizer can automatically repair 840 of these bugs with no programmer involvement. To facilitate proactive management of device failures, Carburizer can also locate existing driver code that detects device failures and inserts missing failure-reporting code. Finally, the Carburizer runtime can detect and tolerate interrupt-related bugs, such as stuck or missing interrupts.

architectural support for programming languages and operating systems | 2008

The design and implementation of microdrivers

Vinod Ganapathy; Matthew J. Renzelmann; Arini Balakrishnan; Michael M. Swift; Somesh Jha

Device drivers commonly execute in the kernel to achieve high performance and easy access to kernel services. However, this comes at the price of decreased reliability and increased programming difficulty. Driver programmers are unable to use user-mode development tools and must instead use cumbersome kernel tools. Faults in kernel drivers can cause the entire operating system to crash. User-mode drivers have long been seen as a solution to this problem, but suffer from either poor performance or new interfaces that require a rewrite of existing drivers. This paper introduces the Microdrivers architecture that achieves high performance and compatibility by leaving critical path code in the kernel and moving the rest of the driver code to a user-mode process. This allows data-handling operations critical to I/O performance to run at full speed, while management operations such as initialization and configuration run at reduced speed in user-level. To achieve compatibility, we present DriverSlicer, a tool that splits existing kernel drivers into a kernel-level component and a user-level component using a small number of programmer annotations. Experiments show that as much as 65% of driver code can be removed from the kernel without affecting common-case performance, and that only 1-6 percent of the code requires annotations.

acm sigops european workshop | 2002

Nooks: an architecture for reliable device drivers

Michael M. Swift; Steven Martin; Henry M. Levy; Susan J. Eggers

With the enormous growth in processor performance over the last decade, it is clear that reliability, rather than performance, is now the greatest challenge for computer systems research. This is particularly true in the context of Internet services that require 24x7 operation and home computers with no professional administration. While operating system products have matured and become more reliable, they are still the source of a significant number of failures. Furthermore, recent studies show that device drivers are frequently responsible for operating system failures. For example, a study at Stanford University found that Linux drivers have 3 to 7 times the bug frequency as the rest of the OS [4]. An analysis of product support calls for Windows 2000 showed that device drivers accounted for 27% of crashes, compared to 2% for the kernel itself [16].

Explore More