Stephen M. Rumble | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Stephen M. Rumble is active.

Explore More

Publication

Featured researches published by Stephen M. Rumble.

PLOS Computational Biology | 2009

SHRiMP: Accurate Mapping of Short Color-space Reads

Stephen M. Rumble; Phil Lacroute; Adrian V. Dalca; Marc Fiume; Arend Sidow; Michael Brudno

The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.

Operating Systems Review | 2010

The case for RAMClouds: scalable high-performance storage entirely in DRAM

John K. Ousterhout; Parag Agrawal; David Erickson; Christos Kozyrakis; Jacob Leverich; David Mazières; Subhasish Mitra; Aravind Narayanan; Guru M. Parulkar; Mendel Rosenblum; Stephen M. Rumble; Eric Stratmann; Ryan Stutsman

Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of dataintensive applications.

european conference on computer systems | 2009

SnowFlock: rapid virtual machine cloning for cloud computing

Horacio Andres Lagar-cavilla; Joseph Andrew Whitney; Adin Scannell; Philip Patchin; Stephen M. Rumble; Eyal de Lara; Michael Brudno; Mahadev Satyanarayanan

Virtual Machine (VM) fork is a new cloud computing abstraction that instantaneously clones a VM into multiple replicas running on different hosts. All replicas share the same initial state, matching the intuitive semantics of stateful worker creation. VM fork thus enables the straightforward creation and efficient deployment of many tasks demanding swift instantiation of stateful workers in a cloud environment, e.g. excess load handling, opportunistic job placement, or parallel computing. Lack of instantaneous stateful cloning forces users of cloud computing into ad hoc practices to manage application state and cycle provisioning. We present SnowFlock, our implementation of the VM fork abstraction. To evaluate SnowFlock, we focus on the demanding scenario of services requiring on-the-fly creation of hundreds of parallel workers in order to solve computationally-intensive queries in seconds. These services are prominent in fields such as bioinformatics, finance, and rendering. SnowFlock provides sub-second VM cloning, scales to hundreds of workers, consumes few cloud I/O resources, and has negligible runtime overhead.

symposium on operating systems principles | 2011

Fast crash recovery in RAMCloud

Diego Ongaro; Stephen M. Rumble; Ryan Stutsman; John K. Ousterhout; Mendel Rosenblum

RAMCloud is a DRAM-based storage system that provides inexpensive durability and availability by recovering quickly after crashes, rather than storing replicas in DRAM. RAMCloud scatters backup data across hundreds or thousands of disks, and it harnesses hundreds of servers in parallel to reconstruct lost data. The system uses a log-structured approach for all its data, in DRAM as well as on disk: this provides high performance both during normal operation and during recovery. RAMCloud employs randomized techniques to manage the system in a scalable and decentralized fashion. In a 60-node cluster, RAMCloud recovers 35 GB of data from a failed server in 1.6 seconds. Our measurements suggest that the approach will scale to recover larger memory sizes (64 GB or more) in less time with larger clusters.

european conference on computer systems | 2011

Energy management in mobile devices with the cinder operating system

Arjun Roy; Stephen M. Rumble; Ryan Stutsman; Philip Levis; David Mazières; Nickolai Zeldovich

We argue that controlling energy allocation is an increasingly useful and important feature for operating systems, especially on mobile devices. We present two new low-level abstractions in the Cinder operating system, reserves and taps, which store and distribute energy for application use. We identify three key properties of control -- isolation, delegation, and subdivision -- and show how using these abstractions can achieve them. We also show how the architecture of the HiStar information-flow control kernel lends itself well to energy control. We prototype and evaluate Cinder on a popular smartphone, the Android G1.

Communications of The ACM | 2011

The case for RAMCloud

John K. Ousterhout; Parag Agrawal; David Erickson; Christos Kozyrakis; Jacob Leverich; David Mazières; Subhasish Mitra; Aravind Narayanan; Diego Ongaro; Guru M. Parulkar; Mendel Rosenblum; Stephen M. Rumble; Eric Stratmann; Ryan Stutsman

With scalable high-performance storage entirely in DRAM, RAMCloud will enable a new breed of data-intensive applications.

ACM Transactions on Computer Systems | 2015

The RAMCloud Storage System

John K. Ousterhout; Arjun Gopalan; Ashish Gupta; Ankita Kejriwal; Collin Lee; B. Montazeri; Diego Ongaro; Seo Jin Park; Henry Qin; Mendel Rosenblum; Stephen M. Rumble; Ryan Stutsman; Stephen Yang

RAMCloud is a storage system that provides low-latency access to large-scale datasets. To achieve low latency, RAMCloud stores all data in DRAM at all times. To support large capacities (1PB or more), it aggregates the memories of thousands of servers into a single coherent key-value store. RAMCloud ensures the durability of DRAM-based data by keeping backup copies on secondary storage. It uses a uniform log-structured mechanism to manage both DRAM and secondary storage, which results in high performance and efficient memory usage. RAMCloud uses a polling-based approach to communication, bypassing the kernel to communicate directly with NICs; with this approach, client applications can read small objects from any RAMCloud storage server in less than 5μs, durable writes of small objects take about 13.5μs. RAMCloud does not keep multiple copies of data online; instead, it provides high availability by recovering from crashes very quickly (1 to 2 seconds). RAMCloud’s crash recovery mechanism harnesses the resources of the entire cluster working concurrently so that recovery performance scales with cluster size.

ACM Transactions on Computer Systems | 2011

SnowFlock: Virtual Machine Cloning as a First-Class Cloud Primitive

H. Andrés Lagar-Cavilla; Joseph Andrew Whitney; Roy Bryant; Philip Patchin; Michael Brudno; Eyal de Lara; Stephen M. Rumble; Mahadev Satyanarayanan; Adin Scannell

A basic building block of cloud computing is virtualization. Virtual machines (VMs) encapsulate a user’s computing environment and efficiently isolate it from that of other users. VMs, however, are large entities, and no clear APIs exist yet to provide users with programatic, fine-grained control on short time scales. We present SnowFlock, a paradigm and system for cloud computing that introduces VM cloning as a first-class cloud abstraction. VM cloning exploits the well-understood and effective semantics of UNIX fork. We demonstrate multiple usage models of VM cloning: users can incorporate the primitive in their code, can wrap around existing toolchains via scripting, can encapsulate the API within a parallel programming framework, or can use it to load-balance and self-scale clustered servers. VM cloning needs to be efficient to be usable. It must efficiently transmit VM state in order to avoid cloud I/O bottlenecks. We demonstrate how the semantics of cloning aid us in realizing its efficiency: state is propagated in parallel to multiple VM clones, and is transmitted during runtime, allowing for optimizations that substantially reduce the I/O load. We show detailed microbenchmark results highlighting the efficiency of our optimizations, and macrobenchmark numbers demonstrating the effectiveness of the different usage models of SnowFlock.

networking systems and applications for mobile handhelds | 2009

Apprehending joule thieves with cinder

Stephen M. Rumble; Ryan Stutsman; Philip Levis; David Mazières; Nickolai Zeldovich

Energy is the critical limiting resource to mobile computing devices. Correspondingly, an operating system must track, provision, and ration how applications consume energy. The emergence of third-party application stores and marketplaces makes this concern even more pressing. A third-party application must not deny service through excessive, unforeseen energy expenditure, whether accidental or malicious. Previous research has shown promise in tracking energy usage and rationing it to meet device lifetime goals, but such mechanisms and policies are still nascent, especially regarding user interaction. We argue for a new operating system, called Cinder, which builds on top of the HiStar OS. Cinders energy awareness is based on hierarchical capacitors and task profiles. We introduce and explore these abstractions, paying particular attention to the ways in which policies could be generated and enforced in a dynamic system.

workshop on algorithms in bioinformatics | 2008

Read Mapping Algorithms for Single Molecule Sequencing Data

Vladimir Yanovsky; Stephen M. Rumble; Michael Brudno

Single Molecule Sequencing technologies such as the Heliscope simplify the preparation of DNA for sequencing, while sampling millions of reads in a day. Simultaneously, the technology suffers from a significantly higher error rate, ameliorated by the ability to sample multiple reads from the same location. In this paper we develop novel rapid alignment algorithms for two-pass Single Molecule Sequencing methods. We combine the Weighted Sequence Graph (WSG) representation of all optimal and near optimal alignments between the two reads sampled from a piece of DNA with k-mer filtering methods and spaced seeds to quickly generate candidate locations for the reads on the reference genome. We also propose a fast implementation of the Smith-Waterman algorithm using vectorized instructions that significantly speeds up the matching process. Our method combines these approaches in order to build an algorithm that is both fast and accurate, since it is able to take complete advantage of both of the reads sampled during two pass sequencing.

Explore More