Joel Coburn
University of California, San Diego
Publications
Featured research published by Joel Coburn.
Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2011
Joel Coburn; Adrian M. Caulfield; Ameen Akel; Laura M. Grupp; Rajesh K. Gupta; Ranjit Jhala; Steven Swanson
Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent. We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps outperform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.
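The transactional, persistent-object model described above can be summarized with a small sketch. The names below (Tx, UndoRecord, and the write/commit/abort calls) are invented for illustration and are not the NV-heaps API; the sketch only shows the undo-logging shape of an atomic update to a linked structure, with ordinary volatile memory standing in for the non-volatile heap.

// Hypothetical sketch of a transactional persistent-object update in the
// spirit of NV-heaps. All names are invented; the real system maps a
// non-volatile region and enforces pointer safety in its smart pointers.
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

struct UndoRecord { void* addr; std::vector<std::uint8_t> old_bytes; };

class Tx {                       // undo-log transaction: log before write, drop log on commit
  std::vector<UndoRecord> log_;
public:
  template <typename T>
  void write(T& field, const T& value) {
    UndoRecord rec{&field, std::vector<std::uint8_t>(sizeof(T))};
    std::memcpy(rec.old_bytes.data(), &field, sizeof(T));   // save the old value first
    log_.push_back(std::move(rec));
    field = value;                                          // then update in place
    // A real NVM system would also flush and fence here so the log reaches
    // persistence before the data it protects.
  }
  void commit() { log_.clear(); }                           // nothing left to undo
  void abort() {                                            // restore old values in reverse order
    for (auto it = log_.rbegin(); it != log_.rend(); ++it)
      std::memcpy(it->addr, it->old_bytes.data(), it->old_bytes.size());
    log_.clear();
  }
};

struct Node { int key; Node* next; };    // a "persistent" list node in this toy model

int main() {
  Node head{0, nullptr};
  Node extra{42, nullptr};

  Tx tx;
  tx.write(head.key, 1);        // both updates are covered by one transaction
  tx.write(head.next, &extra);
  tx.commit();                  // calling tx.abort() instead would restore {0, nullptr}

  std::cout << head.key << " -> " << head.next->key << "\n";  // prints "1 -> 42"
}

In the real system both the log and the data would live in non-volatile memory, so the flush-and-fence ordering noted in the comment is what lets an update survive an application or system failure.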
International Symposium on Microarchitecture (MICRO) | 2009
Laura M. Grupp; Adrian M. Caulfield; Joel Coburn; Steven Swanson; Eitan Yaakobi; Paul H. Siegel; Jack K. Wolf
Despite flash memory's promise, it suffers from many idiosyncrasies such as limited durability, data integrity problems, and asymmetry in operation granularity. As architects, we aim to find ways to overcome these idiosyncrasies while exploiting flash memory's useful characteristics. To be successful, we must understand the trade-offs between the performance, cost (in both power and dollars), and reliability of flash memory. In addition, we must understand how different usage patterns affect these characteristics. Flash manufacturers provide conservative guidelines about these metrics, and this lack of detail makes it difficult to design systems that fully exploit flash memory's capabilities. We have empirically characterized flash memory technology from five manufacturers by directly measuring the performance, power, and reliability. We demonstrate that performance varies significantly across vendors and devices, and deviates from publicly available datasheets. We also demonstrate and quantify some unexpected device characteristics and show how we can use them to improve responsiveness and energy consumption of solid state disks by 44% and 13%, respectively, as well as increase flash device lifetime by 5.2x.
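A sketch of the kind of measurement such a characterization relies on: time each raw operation on the chip under test and look at the spread, not just the mean. The flash_program_page call below is a hypothetical stand-in for a raw test-bed interface, not the authors' hardware, and the block and page-size constants are illustrative assumptions.

// Sketch of per-operation latency measurement for flash characterization.
// flash_program_page() is a placeholder for a hypothetical raw-device
// interface that issues a PROGRAM command to one physical page.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <ratio>
#include <vector>

void flash_program_page(std::uint32_t /*block*/, std::uint32_t /*page*/,
                        const std::vector<std::uint8_t>& /*data*/) {
  // Placeholder: a real test bed would drive the chip's command interface here.
}

int main() {
  const std::uint32_t kPagesPerBlock = 64;
  std::vector<std::uint8_t> pattern(4096, 0xA5);   // fixed data pattern
  std::vector<double> latency_us;

  for (std::uint32_t page = 0; page < kPagesPerBlock; ++page) {
    auto start = std::chrono::steady_clock::now();
    flash_program_page(/*block=*/0, page, pattern);
    auto end = std::chrono::steady_clock::now();
    latency_us.push_back(
        std::chrono::duration<double, std::micro>(end - start).count());
  }

  // Per-page program latency can vary widely within a block; exposing that
  // variation is exactly the sort of result a characterization study reports.
  auto [mn, mx] = std::minmax_element(latency_us.begin(), latency_us.end());
  double mean = std::accumulate(latency_us.begin(), latency_us.end(), 0.0) /
                latency_us.size();
  std::cout << "program latency (us): min=" << *mn << " mean=" << mean
            << " max=" << *mx << "\n";
}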
Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2012
Adrian M. Caulfield; Todor I. Mollov; Louis Alex Eisner; Arup De; Joel Coburn; Steven Swanson
Emerging fast, non-volatile memories (e.g., phase change memories, spin-torque MRAMs, and the memristor) reduce storage access latencies by an order of magnitude compared to state-of-the-art flash-based SSDs. This improved performance means that software overheads that had little impact on the performance of flash-based systems can present serious bottlenecks in systems that incorporate these new technologies. We describe a novel storage hardware and software architecture that nearly eliminates two sources of this overhead: Entering the kernel and performing file system permission checks. The new architecture provides a private, virtualized interface for each process and moves file system protection checks into hardware. As a result, applications can access file data without operating system intervention, eliminating OS and file system costs entirely for most accesses. We describe the support the system provides for fast permission checks in hardware, our approach to notifying applications when requests complete, and the small, easily portable changes required in the file system to support the new access model. Existing applications require no modification to use the new interface. We evaluate the performance of the system using a suite of microbenchmarks and database workloads and show that the new interface improves latency and bandwidth for 4 KB writes by 60% and 7.2x, respectively, OLTP database transaction throughput by up to 2.0x, and Berkeley-DB throughput by up to 5.7x. A streamlined asynchronous file IO interface built to fully utilize the new interface enables an additional 5.5x increase in throughput with 1 thread and 2.8x increase in efficiency for 512 B transfers.
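A rough sketch of the user-space issue path the abstract describes: the application writes a command into a per-process channel and polls a completion tag, with file permissions enforced by the device rather than by the kernel. Channel, Command, and the completion-tag layout below are invented stand-ins, and the "hardware" side is simulated in software.

// Hypothetical model of a kernel-bypass storage channel. The device-side
// behavior (permission check, completion) is simulated so the sketch is
// self-contained; no real driver or hardware interface is implied.
#include <atomic>
#include <cstdint>
#include <cstring>
#include <iostream>

struct Command {
  std::uint64_t file_id;     // the device looks up a permission record for this file
  std::uint64_t offset;      // byte offset within the file
  std::uint32_t length;      // transfer size
  const void*   buffer;      // user buffer (a DMA address in real hardware)
};

class Channel {
  std::atomic<std::uint32_t> completion_{0};   // stands in for a completion tag in a
                                               // memory-mapped status page
public:
  // Issue without entering the kernel: just a store into the (simulated) queue.
  std::uint32_t issue_write(const Command& cmd) {
    // Simulated device-side permission check; a miss would normally trap to the
    // OS once to install the missing permission record, then retry in user space.
    if (cmd.length == 0) return 0;             // reject a trivially invalid request
    std::uint32_t tag = 1;
    completion_.store(tag, std::memory_order_release);  // "hardware" completes it
    return tag;
  }
  // Poll the status page for the tag; no system call on this path either.
  void wait(std::uint32_t tag) {
    while (completion_.load(std::memory_order_acquire) != tag) { /* spin */ }
  }
};

int main() {
  alignas(4096) static char buf[4096];
  std::memset(buf, 0x5A, sizeof buf);

  Channel ch;
  Command cmd{/*file_id=*/7, /*offset=*/0, /*length=*/sizeof buf, buf};
  std::uint32_t tag = ch.issue_write(cmd);
  ch.wait(tag);
  std::cout << "4 KB write completed without a system call (simulated)\n";
}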
Architectural Support for Programming Languages and Operating Systems (ASPLOS) | 2008
Jayanth Gummaraju; Joel Coburn; Yoshio Turner; Mendel Rosenblum
Recently, the number of cores on general-purpose processors has been increasing rapidly. Using conventional programming models, it is challenging to effectively exploit these cores for maximal performance. An interesting alternative candidate for programming multiple cores is the stream programming model, which provides a framework for writing programs in a sequential style while greatly simplifying the task of automatic parallelization. It has been shown that not only traditional media/image applications but also more general-purpose data-intensive applications can be expressed in the stream programming style. In this paper, we investigate the potential to use the stream programming model to efficiently utilize commodity multicore general-purpose processors (e.g., Intel/AMD). Although several stream languages and stream compilers have recently been developed, they typically target special-purpose stream processors. In contrast, we propose a flexible software system, Streamware, which automatically maps stream programs onto a wide variety of general-purpose multicore processor configurations. We leverage an existing compilation framework for stream processors and design a runtime environment which takes as input the output of these stream compilers in the form of machine-independent stream virtual machine code. The runtime environment assigns work to processor cores considering processor/cache configurations and adapts to workload variations. We evaluate this approach for a few general-purpose scientific applications on real hardware and a cycle-level simulator setup to showcase scaling and contention issues. The results show that the stream programming model is a good choice for efficiently exploiting modern and future multicore CPUs for an important class of applications.
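The core of such a runtime can be suggested with a short sketch: split a stream into cache-sized blocks and assign blocks to worker cores, with the kernel running over one block at a time. The block size, the saxpy kernel, and the static round-robin schedule below are illustrative assumptions; the actual Streamware runtime consumes stream-virtual-machine code and adapts its schedule at run time.

// Minimal block-and-assign sketch of stream execution on a commodity multicore.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

constexpr std::size_t kBlockElems = 1 << 14;   // ~64 KB of floats per block (a guess)

// Stream kernel: purely element-wise, so blocks can run on any core in any order.
void saxpy_kernel(float a, const float* x, const float* y, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a * x[i] + y[i];
}

int main() {
  const std::size_t n = 1 << 20;
  std::vector<float> x(n, 1.0f), y(n, 2.0f), out(n, 0.0f);

  unsigned cores = std::max(1u, std::thread::hardware_concurrency());
  std::size_t blocks = (n + kBlockElems - 1) / kBlockElems;
  std::vector<std::thread> workers;

  // Static round-robin assignment of blocks to cores; a real runtime would also
  // consider the cache configuration and rebalance under workload variation.
  for (unsigned c = 0; c < cores; ++c) {
    workers.emplace_back([&, c] {
      for (std::size_t b = c; b < blocks; b += cores) {
        std::size_t begin = b * kBlockElems;
        std::size_t len = std::min(kBlockElems, n - begin);
        saxpy_kernel(2.0f, &x[begin], &y[begin], &out[begin], len);
      }
    });
  }
  for (auto& w : workers) w.join();

  std::cout << "out[0] = " << out[0] << ", out[n-1] = " << out[n - 1] << "\n";  // 4, 4
}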
Symposium on Operating Systems Principles (SOSP) | 2013
Joel Coburn; Trevor Bunker; Meir Schwarz; Rajesh K. Gupta; Steven Swanson
Transaction-based systems often rely on write-ahead logging (WAL) algorithms designed to maximize performance on disk-based storage. However, emerging fast, byte-addressable, non-volatile memory (NVM) technologies (e.g., phase-change memories, spin-transfer torque MRAMs, and the memristor) present very different performance characteristics, so blithely applying existing algorithms can lead to disappointing performance. This paper presents a novel storage primitive, called editable atomic writes (EAW), that enables sophisticated, highly-optimized WAL schemes in fast NVM-based storage systems. EAWs allow applications to safely access and modify log contents rather than treating the log as an append-only, write-only data structure, and we demonstrate that this can make implementing complex transactions simpler and more efficient. We use EAWs to build MARS, a WAL scheme that provides the same features as ARIES [26] (a widely-used WAL system for databases) but avoids making disk-centric implementation decisions. We have implemented EAWs and MARS in a next-generation SSD to demonstrate that the overhead of EAWs is minimal compared to normal writes, and that they provide large speedups for transactional updates to hash tables, B+trees, and large graphs. In addition, MARS outperforms ARIES by up to 3.7x while reducing software complexity.
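A toy model of the editable-atomic-write idea: staged writes live in a log that the application may keep revising until commit, at which point they are applied to their target locations as a unit. Every type below is invented for illustration; in the paper the primitive is implemented inside the SSD, which performs the atomic commit itself.

// Toy, in-memory rendering of an EAW-style transaction. Not the paper's
// interface; it only shows why an editable (rather than append-only) log
// lets a WAL scheme avoid re-logging revised data.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct Storage { std::vector<std::uint64_t> words = std::vector<std::uint64_t>(16, 0); };

class EawTransaction {
  struct Staged { std::size_t index; std::uint64_t value; };
  Storage& store_;
  std::vector<Staged> log_;
public:
  explicit EawTransaction(Storage& s) : store_(s) {}

  // Stage a write; nothing in Storage changes yet.
  std::size_t stage(std::size_t index, std::uint64_t value) {
    log_.push_back({index, value});
    return log_.size() - 1;                // handle for later edits
  }
  // The log is not append-only: staged data can be edited in place before commit.
  void edit(std::size_t handle, std::uint64_t new_value) { log_[handle].value = new_value; }

  // Apply all staged writes; a real device makes this step atomic across objects.
  void commit() {
    for (const auto& s : log_) store_.words[s.index] = s.value;
    log_.clear();
  }
};

int main() {
  Storage store;
  EawTransaction tx(store);

  std::size_t h = tx.stage(3, 100);   // stage an update
  tx.edit(h, 150);                    // revise it before commit instead of re-logging
  tx.stage(7, 1);
  tx.commit();

  std::cout << store.words[3] << " " << store.words[7] << "\n";  // prints "150 1"
}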
Compilers, Architecture, and Synthesis for Embedded Systems (CASES) | 2005
Joel Coburn; Srivaths Ravi; Anand Raghunathan; Srimat T. Chakradhar
In this work, we propose and investigate the idea of enhancing a System-on-Chip (SoC) communication architecture (the fabric that integrates system components and carries the communication traffic between them) to facilitate higher security. We observe that a wide range of common security attacks are manifested as abnormalities in the system-level communication traffic. Therefore, the communication architecture, with its global system-level visibility, can be used to detect them. The communication architecture can also effectively react to security attacks by disallowing the offending communication transactions, or by notifying appropriate components of a security violation. We describe the general principles involved in a security-enhanced communication architecture (SECA) and show how several security objectives can be encoded in terms of policies that govern the inter-component communication traffic. We detail the implementation of SECA in the context of a popular commercial on-chip bus architecture (the AMBA architecture from ARM) through a combination of a centralized security enforcement module, and enhancements to the bus interfaces of system components. We illustrate how SECA can be used to enhance embedded system security in several application scenarios. A simple instance of SECA has been implemented in a commercial application processor SoC for mobile phones. We provide results of experiments performed to validate the proposed concepts through system-level simulation, and evaluate their overheads through hardware implementation using a commercial design flow.
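One class of policy such a fabric can enforce is easy to model in software: per-master address-range permissions checked on every bus transaction, with offending transfers blocked. The BusFirewall and Rule structures below are illustrative only; SECA implements the equivalent checks in hardware within an AMBA-based bus.

// Software model of address-range protection checks on bus transactions.
#include <cstdint>
#include <iostream>
#include <vector>

enum class Access { Read, Write };

struct Rule {                 // which master may touch which address range, and how
  unsigned master_id;
  std::uint32_t base, limit;  // [base, limit) in the system address map
  bool allow_write;
};

class BusFirewall {
  std::vector<Rule> rules_;
public:
  void add_rule(const Rule& r) { rules_.push_back(r); }

  // Called once per bus transaction; returning false squashes the transfer and,
  // in a real system, would also notify a component of the security violation.
  bool check(unsigned master_id, std::uint32_t addr, Access a) const {
    for (const auto& r : rules_) {
      if (r.master_id == master_id && addr >= r.base && addr < r.limit)
        return a == Access::Read || r.allow_write;
    }
    return false;             // default deny: no matching rule
  }
};

int main() {
  BusFirewall fw;
  fw.add_rule({/*master_id=*/0, 0x00000000, 0x10000000, /*allow_write=*/true});  // CPU
  fw.add_rule({/*master_id=*/1, 0x08000000, 0x09000000, /*allow_write=*/false}); // DMA, read-only

  std::cout << fw.check(1, 0x08100000, Access::Read)  << "\n";  // 1: allowed
  std::cout << fw.check(1, 0x08100000, Access::Write) << "\n";  // 0: blocked
}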
International Conference on Parallel Architectures and Compilation Techniques (PACT) | 2007
Jayanth Gummaraju; Mattan Erez; Joel Coburn; Mendel Rosenblum; William J. Dally
There has recently been much interest in stream processing, both in industry (e.g., Cell, NVIDIA G80, ATI R580) and academia (e.g., Stanford Merrimac, MIT RAW), with stream programs becoming increasingly popular for both media and more general-purpose computing. Although a special style of programming called stream programming is needed to target these stream architectures, huge performance benefits can be achieved. In this paper, we minimally add architectural features to commodity general-purpose processors (e.g., Intel/AMD) to efficiently support the stream execution model. We design the extensions to reuse existing components of the general-purpose processor hardware as much as possible by investigating low-cost modifications to the CPU caches, hardware prefetcher, and the execution core. With a less than 1% increase in die area along with judicious use of a software runtime system, we can efficiently support stream programming on traditional processor cores. We evaluate our techniques by running scientific applications on a cycle-level simulation system. The results show that our system executes stream programs as efficiently as possible, limited only by the ALU performance and the memory bandwidth needed to feed the ALUs.
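The execution model those extensions target can be illustrated in plain code: operands are gathered from memory into a small, cache-resident block, the kernel runs only on that block, and results are scattered back. On the proposed hardware the gather and scatter would be driven by the modified prefetcher and caches; the loop below is just a software rendering of the access pattern, with an invented index stream.

// Gather / compute / scatter over cache-resident blocks, the staple access
// pattern of the stream execution model.
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

constexpr std::size_t kBlock = 256;            // small enough to stay cache-resident

int main() {
  const std::size_t n = 4096;
  std::vector<float> table(n);
  std::iota(table.begin(), table.end(), 0.0f);
  std::vector<std::size_t> index(n);
  for (std::size_t i = 0; i < n; ++i) index[i] = (i * 7) % n;   // irregular access stream

  std::vector<float> block(kBlock);
  for (std::size_t base = 0; base < n; base += kBlock) {
    // Gather: bring the irregularly addressed operands into a dense block.
    for (std::size_t i = 0; i < kBlock; ++i) block[i] = table[index[base + i]];
    // Compute: the kernel touches only the cache-resident block.
    for (std::size_t i = 0; i < kBlock; ++i) block[i] = block[i] * 2.0f + 1.0f;
    // Scatter: write results back to their home locations.
    for (std::size_t i = 0; i < kBlock; ++i) table[index[base + i]] = block[i];
  }

  std::cout << "table[7] = " << table[7] << "\n";   // prints 15
}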
Global Communications Conference (GLOBECOM) | 2010
Laura M. Grupp; Adrian M. Caulfield; Joel Coburn; John D. Davis; Steven Swanson
Non-volatile memories (such as NAND flash and phase change memories) have the potential to revolutionize computer systems. However, these technologies have complex behavior in terms of performance, reliability, and energy consumption that make fully exploiting their potential a complicated task. As device engineers push bit densities higher, this complexity will only increase. Managing and exploiting the complex and at times surprising behavior of these memories requires a deep understanding of the devices grounded in experimental results. Our research groups have developed several hardware test beds for flash and other memories that allow us to both characterize these memories and experimentally evaluate their performance on full-scale computer systems. We describe several of these test bed systems, outline some of the research findings they have enabled, and discuss some of the methodological challenges they raise.
IEEE Design & Test of Computers | 2011
Devi Sravanthi Yalamarthy; Joel Coburn; Rajesh K. Gupta; Glen Edwards; Mark Kelly
The significant growth in the quantity of data in biology and related fields has spawned the need for novel computational solutions. The authors show how a key search task in proteomics, the large-scale study of proteins, can be accelerated by several orders of magnitude by the use of FPGA-based hardware.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2010
Adrian M. Caulfield; Joel Coburn; Todor I. Mollov; Arup De; Ameen Akel; Jiahua He; Arun Jagatheesan; Rajesh K. Gupta; Allan Snavely; Steven Swanson