
Publication


Featured research published by Adrian M. Caulfield.


architectural support for programming languages and operating systems | 2011

NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories

Joel Coburn; Adrian M. Caulfield; Ameen Akel; Laura M. Grupp; Rajesh K. Gupta; Ranjit Jhala; Steven Swanson

Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent. We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps out-perform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.
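
As background, the fragment below illustrates the write-then-publish ordering that crash-consistent persistent data structures depend on and that a system like NV-heaps automates behind transactional semantics. It is a minimal sketch in plain C over an ordinary memory-mapped file, not the NV-heaps API or real non-volatile memory; the file name and two-slot layout are illustrative assumptions.

    /* Sketch of crash-consistent persistent state using a memory-mapped file
     * and an atomic "active slot" flip: write the new value to the shadow
     * slot, persist it, then publish it. NV-heaps targets byte-addressable
     * NVM and hides this discipline behind transactions. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct pstate {
        uint64_t slot[2];   /* two copies of the value              */
        uint64_t active;    /* which slot currently holds the data  */
    };

    int main(void)
    {
        int fd = open("pstate.img", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, sizeof(struct pstate)) != 0)
            return 1;

        struct pstate *p = mmap(NULL, sizeof *p, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        uint64_t next = 1 - p->active;        /* shadow slot               */
        p->slot[next] = p->slot[p->active] + 1;
        msync(p, sizeof *p, MS_SYNC);         /* persist the new value     */
        p->active = next;                     /* publish it                */
        msync(p, sizeof *p, MS_SYNC);         /* persist the publish step  */

        printf("persistent counter = %llu\n",
               (unsigned long long)p->slot[p->active]);
        munmap(p, sizeof *p);
        close(fd);
        return 0;
    }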


international symposium on microarchitecture | 2009

Characterizing flash memory: anomalies, observations, and applications

Laura M. Grupp; Adrian M. Caulfield; Joel Coburn; Steven Swanson; Eitan Yaakobi; Paul H. Siegel; Jack K. Wolf

Despite flash memory's promise, it suffers from many idiosyncrasies such as limited durability, data integrity problems, and asymmetry in operation granularity. As architects, we aim to find ways to overcome these idiosyncrasies while exploiting flash memory's useful characteristics. To be successful, we must understand the trade-offs between the performance, cost (in both power and dollars), and reliability of flash memory. In addition, we must understand how different usage patterns affect these characteristics. Flash manufacturers provide conservative guidelines about these metrics, and this lack of detail makes it difficult to design systems that fully exploit flash memory's capabilities. We have empirically characterized flash memory technology from five manufacturers by directly measuring the performance, power, and reliability. We demonstrate that performance varies significantly between vendors, devices, and from publicly available datasheets. We also demonstrate and quantify some unexpected device characteristics and show how we can use them to improve responsiveness and energy consumption of solid state disks by 44% and 13%, respectively, as well as increase flash device lifetime by 5.2x.
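
The measurement methodology amounts to timing individual flash operations page by page and comparing the resulting distributions across vendors, devices, and datasheets. The sketch below shows that shape of loop in C; program_page() is a hypothetical stand-in (simulated here with a sleep) for the command a real test bed would issue to the raw NAND chip.

    /* Per-page program-latency sweep over one block. Page count, page size,
     * and the simulated latency range are illustrative, not measured values. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define PAGES_PER_BLOCK 64

    /* Hypothetical stand-in; a real test bed would issue a NAND PROGRAM here. */
    static void program_page(int block, int page, const uint8_t *data, size_t len)
    {
        (void)block; (void)page; (void)data; (void)len;
        struct timespec t = { 0, 200000 + rand() % 600000 };  /* 0.2-0.8 ms */
        nanosleep(&t, NULL);
    }

    int main(void)
    {
        uint8_t data[2048] = { 0 };

        for (int page = 0; page < PAGES_PER_BLOCK; page++) {
            struct timespec a, b;
            clock_gettime(CLOCK_MONOTONIC, &a);
            program_page(0, page, data, sizeof data);
            clock_gettime(CLOCK_MONOTONIC, &b);
            double us = (b.tv_sec - a.tv_sec) * 1e6 +
                        (b.tv_nsec - a.tv_nsec) / 1e3;
            printf("block 0, page %2d: program latency %.1f us\n", page, us);
        }
        return 0;
    }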


international symposium on computer architecture | 2014

A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam; Adrian M. Caulfield; Eric S. Chung; Derek Chiou; Kypros Constantinides; John Demme; Hadi Esmaeilzadeh; Jeremy Fowers; Gopi Prashanth Gopal; Jan Gray; Michael Haselman; Scott Hauck; Stephen Heil; Amir Hormati; Joo-Young Kim; Sitaram Lanka; James R. Larus; Eric C. Peterson; Simon Pope; Aaron Smith; Jason Thong; Phillip Yi Xiao; Doug Burger

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6×8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by a factor of 95% for a fixed latency distribution, or, while maintaining equivalent throughput, reduces the tail latency by 29%.
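
As a side note on the topology, a 6×8 2-D torus gives every FPGA four directly wired neighbors, with wraparound at the edges. The short C sketch below computes those neighbor indices; the index-to-coordinate mapping is an assumption for illustration, not the deployed wiring.

    /* Neighbor computation for a 6x8 2-D torus of FPGAs. */
    #include <stdio.h>

    #define COLS 6
    #define ROWS 8

    static int node_id(int x, int y) { return y * COLS + x; }

    int main(void)
    {
        for (int id = 0; id < COLS * ROWS; id++) {
            int x = id % COLS, y = id / COLS;
            printf("FPGA %2d (x=%d,y=%d): W=%2d E=%2d N=%2d S=%2d\n", id, x, y,
                   node_id((x + COLS - 1) % COLS, y),   /* west, wraps  */
                   node_id((x + 1) % COLS, y),          /* east, wraps  */
                   node_id(x, (y + ROWS - 1) % ROWS),   /* north, wraps */
                   node_id(x, (y + 1) % ROWS));         /* south, wraps */
        }
        return 0;
    }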


architectural support for programming languages and operating systems | 2009

Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications

Adrian M. Caulfield; Laura M. Grupp; Steven Swanson

As our society becomes more information-driven, we have begun to amass data at an astounding and accelerating rate. At the same time, power concerns have made it difficult to bring the necessary processing power to bear on querying, processing, and understanding this data. We describe Gordon, a system architecture for data-centric applications that combines low-power processors, flash memory, and data-centric programming systems to improve performance for data-centric applications while reducing power consumption. The paper presents an exhaustive analysis of the design space of Gordon systems, focusing on the trade-offs between power, energy, and performance that Gordon must make. It analyzes the impact of flash-storage and the Gordon architecture on the performance and power efficiency of data-centric applications. It also describes a novel flash translation layer tailored to data intensive workloads and large flash storage arrays. Our data show that, using technologies available in the near future, Gordon systems can out-perform disk-based clusters by 1.5× and deliver up to 2.5× more performance per Watt.
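
For context, a flash translation layer (FTL) remaps logical pages onto physical flash pages so that rewrites become appends rather than in-place updates. The sketch below shows that basic page-mapped, log-structured idea in C; it is not Gordon's FTL, which additionally targets large flash arrays and data-intensive access patterns, and the sizes used are arbitrary.

    /* Minimal page-mapped, log-structured FTL sketch: each logical write goes
     * to the next free physical page and updates the logical-to-physical map.
     * A real FTL also tracks stale pages and garbage-collects them. */
    #include <stdint.h>
    #include <stdio.h>

    #define LOGICAL_PAGES 16
    #define INVALID UINT32_MAX

    static uint32_t l2p[LOGICAL_PAGES];   /* logical -> physical map */
    static uint32_t next_free = 0;        /* append point of the log */

    static uint32_t ftl_write(uint32_t lpage)
    {
        uint32_t ppage = next_free++;     /* old mapping becomes stale */
        l2p[lpage] = ppage;
        return ppage;
    }

    int main(void)
    {
        for (int i = 0; i < LOGICAL_PAGES; i++)
            l2p[i] = INVALID;

        ftl_write(3);
        ftl_write(7);
        ftl_write(3);                     /* rewrite remaps, no in-place update */

        for (int i = 0; i < LOGICAL_PAGES; i++)
            if (l2p[i] != INVALID)
                printf("logical %2d -> physical %u\n", i, l2p[i]);
        return 0;
    }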


architectural support for programming languages and operating systems | 2012

Providing safe, user space access to fast, solid state disks

Adrian M. Caulfield; Todor I. Mollov; Louis Alex Eisner; Arup De; Joel Coburn; Steven Swanson

Emerging fast, non-volatile memories (e.g., phase change memories, spin-torque MRAMs, and the memristor) reduce storage access latencies by an order of magnitude compared to state-of-the-art flash-based SSDs. This improved performance means that software overheads that had little impact on the performance of flash-based systems can present serious bottlenecks in systems that incorporate these new technologies. We describe a novel storage hardware and software architecture that nearly eliminates two sources of this overhead: Entering the kernel and performing file system permission checks. The new architecture provides a private, virtualized interface for each process and moves file system protection checks into hardware. As a result, applications can access file data without operating system intervention, eliminating OS and file system costs entirely for most accesses. We describe the support the system provides for fast permission checks in hardware, our approach to notifying applications when requests complete, and the small, easily portable changes required in the file system to support the new access model. Existing applications require no modification to use the new interface. We evaluate the performance of the system using a suite of microbenchmarks and database workloads and show that the new interface improves latency and bandwidth for 4 KB writes by 60% and 7.2x, respectively, OLTP database transaction throughput by up to 2.0x, and Berkeley-DB throughput by up to 5.7x. A streamlined asynchronous file IO interface built to fully utilize the new interface enables an additional 5.5x increase in throughput with 1 thread and 2.8x increase in efficiency for 512 B transfers.
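
The access model described here boils down to a per-process submission/completion queue pair that the application drives without entering the kernel. The C sketch below mimics that flow with a simulated device; the structures and field names are illustrative assumptions, not the paper's hardware interface.

    /* User-space I/O sketch: the application fills a command in its private
     * submission queue and later polls the completion queue, with no system
     * call on the data path. device_poll() stands in for the hardware. */
    #include <stdint.h>
    #include <stdio.h>

    #define QUEUE_DEPTH 8

    struct cmd { uint64_t file_off; uint32_t len; uint8_t write; };
    struct cpl { uint32_t tag; int status; };

    static struct cmd sq[QUEUE_DEPTH];
    static struct cpl cq[QUEUE_DEPTH];
    static uint32_t sq_tail, cq_head, cq_tail;

    static uint32_t submit(uint64_t off, uint32_t len, int write)
    {
        uint32_t tag = sq_tail++ % QUEUE_DEPTH;
        sq[tag] = (struct cmd){ off, len, (uint8_t)write };
        return tag;
    }

    /* Stand-in for the device: it would check permissions in hardware and
     * post a completion for each outstanding command. */
    static void device_poll(void)
    {
        while (cq_tail < sq_tail) {
            uint32_t tag = cq_tail % QUEUE_DEPTH;
            cq[tag] = (struct cpl){ tag, 0 };   /* 0 = success */
            cq_tail++;
        }
    }

    int main(void)
    {
        uint32_t tag = submit(4096, 4096, 1);   /* 4 KB write at offset 4 KB */
        device_poll();
        if (cq_head < cq_tail) {
            printf("completed tag=%u status=%d\n", cq[tag].tag, cq[tag].status);
            cq_head++;
        }
        return 0;
    }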


IEEE Computer | 2013

Refactor, Reduce, Recycle: Restructuring the I/O Stack for the Future of Storage

Steven Swanson; Adrian M. Caulfield

Emerging nonvolatile storage technologies promise orders-of-magnitude bandwidth increases and latency reductions, but fully realizing their potential requires minimizing storage software overhead and rethinking the roles of hardware and software in storage systems. The Web extra at http://youtu.be/pAALQ6k-CbE is an audio interview in which guest editor Simon S.Y. Shim interviews Steven Swanson about emerging nonvolatile storage technologies.


international symposium on computer architecture | 2013

QuickSAN: a storage area network for fast, distributed, solid state disks

Adrian M. Caulfield; Steven Swanson

Solid State Disks (SSDs) based on flash and other non-volatile memory technologies reduce storage latencies from 10s of milliseconds to 10s or 100s of microseconds, transforming previously inconsequential storage overheads into performance bottlenecks. This problem is especially acute in storage area network (SAN) environments where complex hardware and software layers (distributed file systems, block servers, network stacks, etc.) lie between applications and remote data. These layers can add hundreds of microseconds to requests, obscuring the performance of both flash memory and faster, emerging non-volatile memory technologies. We describe QuickSAN, a SAN prototype that eliminates most software overheads and significantly reduces hardware overheads in SANs. QuickSAN integrates a network adapter into SSDs, so the SSDs can communicate directly with one another to service storage accesses as quickly as possible. QuickSAN can also give applications direct access to both local and remote data without operating system intervention, further reducing software costs. Our evaluation of QuickSAN demonstrates remote access latencies of 20 μs for 4 KB requests, bandwidth improvements of as much as 163x for small accesses compared with an equivalent iSCSI implementation, and 2.3-3.0x application level speedup for distributed sorting. We also show that QuickSAN improves energy efficiency by up to 96% and that QuickSAN's network connectivity allows for improved cluster-level energy efficiency under varying load.
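
Conceptually, a remote access in such a design is a single small request handled by the target SSD's own network interface, with no block server or kernel in the path. The C sketch below shows what such a request record might contain; the field layout and sizes are assumptions for illustration, not QuickSAN's actual packet format.

    /* Hypothetical remote storage request: small enough to be serviced
     * directly by the SSD's integrated network adapter. A real wire format
     * would also need a packed, endian-defined encoding. */
    #include <stdint.h>
    #include <stdio.h>

    struct remote_req {
        uint8_t  opcode;        /* 0 = read, 1 = write            */
        uint8_t  dst_ssd[6];    /* Ethernet address of target SSD */
        uint64_t block_addr;    /* block address on that SSD      */
        uint32_t length;        /* bytes requested (e.g. 4096)    */
        uint32_t tag;           /* matches request to completion  */
    };

    int main(void)
    {
        struct remote_req r = { 0, {0x02, 0x00, 0x00, 0x00, 0x00, 0x07},
                                1048576, 4096, 42 };
        printf("read %u bytes at block address %llu, tag %u (%zu-byte request)\n",
               r.length, (unsigned long long)r.block_addr, r.tag, sizeof r);
        return 0;
    }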


IEEE Micro | 2015

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

Andrew Putnam; Adrian M. Caulfield; Eric S. Chung; Derek Chiou; Kypros Constantinides; John Demme; Hadi Esmaeilzadeh; Jeremy Fowers; Gopi Prashanth Gopal; Jan Gray; Michael Haselman; Scott Hauck; Stephen Heil; Amir Hormati; Joo-Young Kim; Sitaram Lanka; James R. Larus; Eric C. Peterson; Simon Pope; Aaron Smith; Jason Thong; Phillip Yi Xiao; Doug Burger

To advance datacenter capabilities beyond what commodity server designs can provide, the authors designed and built a composable, reconfigurable fabric to accelerate large-scale software services. Each instantiation of the fabric consists of a 6 x 8 2D torus of high-end field-programmable gate arrays (FPGAs) embedded into a half-rack of 48 servers. The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.


international symposium on microarchitecture | 2010

Gordon: An Improved Architecture for Data-Intensive Applications

Adrian M. Caulfield; Laura M. Grupp; Steven Swanson

Gordon is a system architecture for data-centric applications that combines low-power processors, flash memory, and data-centric programming systems to improve performance and efficiency. This article explores the Gordon design space and the design of a specialized flash translation layer. Gordon systems can outperform disk-based clusters by 1.5x and deliver 2.5x more performance per watt.


global communications conference | 2010

Beyond the datasheet: Using test beds to probe non-volatile memories' dark secrets

Laura M. Grupp; Adrian M. Caulfield; Joel Coburn; John D. Davis; Steven Swanson

Non-volatile memories (such as NAND flash and phase change memories) have the potential to revolutionize computer systems. However, these technologies have complex behavior in terms of performance, reliability, and energy consumption that make fully exploiting their potential a complicated task. As device engineers push bit densities higher, this complexity will only increase. Managing and exploiting the complex and at times surprising behavior of these memories requires a deep understanding of the devices grounded in experimental results. Our research groups have developed several hardware test beds for flash and other memories that allow us to both characterize these memories and experimentally evaluate their performance on full-scale computer systems. We describe several of these test bed systems, outline some of the research findings they have enabled, and discuss some of the methodological challenges they raise.

Collaboration


An overview of Adrian M. Caulfield's collaborations and top co-authors.

Top Co-Authors

Derek Chiou
University of Texas at Austin

Steven Swanson
University of California