Alistair Veitch | Researchain

Publication

Featured research published by Alistair Veitch.

Symposium on Operating Systems Principles | 1999

Deciding when to forget in the Elephant file system

Douglas S. Santry; Michael J. Feeley; Norman C. Hutchinson; Alistair Veitch; Ross W. Carton; Jacob Ofir

Modern file systems associate the deletion of a file with the immediate release of storage, and file writes with the irrevocable change of file contents. We argue that this behavior is a relic of the past, when disk storage was a scarce resource. Today, large cheap disks make it possible for the file system to protect valuable data from accidental delete or overwrite. This paper describes the design, implementation, and performance of the Elephant file system, which automatically retains all important versions of user files. Users name previous file versions by combining a traditional pathname with a time when the desired version of a file or directory existed. Storage in Elephant is managed by the system using file-grain user-specified retention policies. This approach contrasts with checkpointing file systems such as Plan-9, AFS, and WAFL that periodically generate efficient checkpoints of entire file systems and thus restrict retention to be guided by a single policy for all files within that file system. Elephant is implemented as a new Virtual File System in the FreeBSD kernel.
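
The time-qualified naming described above can be sketched as a lookup over per-file version histories. This is a minimal illustration, not Elephant's implementation: the `VersionedStore` class and its methods are hypothetical names, timestamps are plain integers, and retention policies are left out entirely.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy store that keeps every written version of each path (hypothetical API)."""

    def __init__(self):
        # path -> parallel lists of write times and contents, kept in time order
        self._times = defaultdict(list)
        self._data = defaultdict(list)

    def write(self, path, time, content):
        """Record a new version; writes must arrive in increasing time order."""
        times = self._times[path]
        if times and time <= times[-1]:
            raise ValueError("writes must have increasing timestamps")
        times.append(time)
        self._data[path].append(content)

    def read(self, path, time):
        """Return the version of `path` that existed at `time`, or None."""
        times = self._times.get(path, [])
        # Index of the last version written at or before `time`.
        i = bisect.bisect_right(times, time) - 1
        return self._data[path][i] if i >= 0 else None


store = VersionedStore()
store.write("/home/u/notes.txt", 10, "draft")
store.write("/home/u/notes.txt", 20, "final")
old = store.read("/home/u/notes.txt", 15)  # the version current at time 15
```

Keeping the write times sorted lets a binary search find the version that was current at any requested time.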

Symposium on Operating Systems Principles | 2007

Sinfonia: a new paradigm for building scalable distributed systems

Marcos Kawazoe Aguilera; Arif Merchant; Mehul A. Shah; Alistair Veitch; Christos Karamanolis

We propose a new paradigm for building scalable distributed systems. Our approach does not require dealing with message-passing protocols -- a major complication in existing distributed systems. Instead, developers just design and manipulate data structures within our service called Sinfonia. Sinfonia keeps data for applications on a set of memory nodes, each exporting a linear address space. At the core of Sinfonia is a novel minitransaction primitive that enables efficient and consistent access to data, while hiding the complexities that arise from concurrency and failures. Using Sinfonia, we implemented two very different and complex applications in a few months: a cluster file system and a group communication service. Our implementations perform well and scale to hundreds of machines.
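
To make the minitransaction idea concrete, here is a single-node sketch. It assumes a minitransaction bundles compare, read, and write items over a byte-addressed space and applies the writes only if every comparison matches; the abstract does not spell out this structure, and the `MemoryNode` class and `minitransaction` method are hypothetical names. The real service spans many memory nodes and tolerates failures, both of which this sketch omits.

```python
import threading


class MemoryNode:
    """Toy memory node exporting a linear byte address space (hypothetical)."""

    def __init__(self, size):
        self._mem = bytearray(size)
        self._lock = threading.Lock()

    def minitransaction(self, compares=(), reads=(), writes=()):
        """Atomically check, read, and conditionally write.

        compares: iterable of (addr, expected_bytes)
        reads:    iterable of (addr, length)
        writes:   iterable of (addr, new_bytes)
        Returns (committed, list_of_read_results).
        """
        with self._lock:  # one lock stands in for real concurrency control
            for addr, expected in compares:
                if bytes(self._mem[addr:addr + len(expected)]) != expected:
                    return False, []  # abort: nothing is read or written
            results = [bytes(self._mem[addr:addr + n]) for addr, n in reads]
            for addr, data in writes:
                self._mem[addr:addr + len(data)] = data
            return True, results


node = MemoryNode(16)
# Compare-and-swap expressed as a minitransaction: set byte 0 to 1 if it is 0.
ok, _ = node.minitransaction(compares=[(0, b"\x00")], writes=[(0, b"\x01")])
# A second attempt fails its comparison and leaves memory unchanged.
retry_ok, _ = node.minitransaction(compares=[(0, b"\x00")], writes=[(0, b"\x02")])
```

Because the comparison and the write happen under one atomic step, application code can build higher-level structures without writing its own message-passing protocol.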

ACM Transactions on Computer Systems | 2001

Minerva: An automated resource provisioning tool for large-scale storage systems

Guillermo A. Alvarez; Elizabeth Lynn Borowsky; Susie Go; Theodore H. Romer; Ralph Becker-Szendy; Richard A. Golding; Arif Merchant; Mirjana Spasojevic; Alistair Veitch; John Wilkes

Enterprise-scale storage systems, which can contain hundreds of host computers and storage devices and up to tens of thousands of disks and logical volumes, are difficult to design. The volume of choices that need to be made is massive, and many choices have unforeseen interactions. Storage system design is tedious and complicated to do by hand, usually leading to solutions that are grossly over-provisioned, substantially under-performing or, in the worst case, both. To solve the configuration nightmare, we present Minerva: a suite of tools for designing storage systems automatically. Minerva uses declarative specifications of application requirements and device capabilities; constraint-based formulations of the various sub-problems; and optimization techniques to explore the search space of possible solutions. This paper also explores and evaluates the design decisions that went into Minerva, using specialized micro- and macro-benchmarks. We show that Minerva can successfully handle a workload with substantial complexity (a decision-support database benchmark). Minerva created a 16-disk design in only a few minutes that achieved the same performance as a 30-disk system manually designed by human experts. Of equal importance, Minerva was able to predict the resulting system's performance before it was built.

Architectural Support for Programming Languages and Operating Systems | 2004

FAB: building distributed enterprise disk arrays from commodity components

Yasushi Saito; Svend Frolund; Alistair Veitch; Arif Merchant; Susan Spence

This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability. FAB is built from a collection of bricks, small storage appliances containing commodity disks, CPU, NVRAM, and network interface cards. FAB deploys a new majority-voting-based algorithm to replicate or erasure-code logical blocks across bricks and a reconfiguration algorithm to move data in the background when bricks are added or decommissioned. We argue that voting is practical and necessary for reliable, high-throughput storage systems such as FAB. We have implemented a FAB prototype on a 22-node Linux cluster. This prototype sustains 85MB/second of throughput for a database workload, and 270MB/second for a bulk-read workload. In addition, it can outperform traditional master-slave replication through performance decoupling and can handle brick failures and recoveries smoothly without disturbing client requests.

Dependable Systems and Networks | 2004

A decentralized algorithm for erasure-coded virtual disks

Svend Frolund; Arif Merchant; Yasushi Saito; Susan Spence; Alistair Veitch

A federated array of bricks is a scalable distributed storage system composed from inexpensive storage bricks. It achieves high reliability with low cost by using erasure coding across the bricks to maintain data reliability in the face of brick failures. Erasure coding generates n encoded blocks from m data blocks (n > m) and permits the data blocks to be reconstructed from any m of these encoded blocks. We present a new fully decentralized erasure-coding algorithm for an asynchronous distributed system. Our algorithm provides fully linearizable read-write access to erasure-coded data and supports concurrent I/O controllers that may crash and recover. Our algorithm relies on a novel quorum construction where any two quorums intersect in m processes.
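
The intersection requirement can be illustrated with a simple size-based (threshold) quorum. This is only a worked example of the arithmetic, assuming every quorum is any set of q out of n processes; the paper's own quorum construction may differ, and the function names are hypothetical. Two q-subsets of n processes share at least 2q - n members, so requiring 2q - n >= m gives q >= ceil((n + m) / 2).

```python
from itertools import combinations


def min_quorum_size(n, m):
    """Smallest q such that any two q-subsets of n processes share >= m members."""
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    # Two sets of size q overlap in at least 2q - n elements, so require 2q - n >= m.
    return (n + m + 1) // 2  # integer form of ceil((n + m) / 2)


def min_overlap(n, q):
    """Brute-force the smallest intersection over all pairs of q-subsets."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


n, m = 5, 3  # 3 data blocks encoded into 5 blocks
q = min_quorum_size(n, m)
```

With m = 1 the formula reduces to an ordinary majority quorum, which is why erasure-coded data needs larger quorums than replicated data.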

European Conference on Computer Systems | 2012

LazyBase: trading freshness for performance in a scalable database

James Cipar; Gregory R. Ganger; Kimberly Keeton; Charles B. Morrey; Craig A. N. Soules; Alistair Veitch

The LazyBase scalable database system is specialized for the growing class of data analysis applications that extract knowledge from large, rapidly changing data sets. It provides the scalability of popular NoSQL systems without the query-time complexity associated with their eventual consistency models, offering a clear consistency model and explicit per-query control over the trade-off between latency and result freshness. With an architecture designed around batching and pipelining of updates, LazyBase simultaneously ingests atomic batches of updates at a very high throughput and offers quick read queries to a stale-but-consistent version of the data. Although slightly stale results are sufficient for many analysis queries, fully up-to-date results can be obtained when necessary by also scanning updates still in the pipeline. Compared to the Cassandra NoSQL system, LazyBase provides 4X to 5X faster update throughput and 4X faster read query throughput for range queries while remaining competitive for point queries. We demonstrate LazyBase's trade-off between query latency and result freshness as well as the benefits of its consistency model. We also demonstrate specific cases where Cassandra's consistency model is weaker than LazyBase's.
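
The per-query freshness choice can be sketched with a toy key-value store. This is an illustration under simplifying assumptions, not LazyBase's design: the `LazyStore` class, its methods, and the `fresh` flag are hypothetical, batches are plain dictionaries, and there is no real pipelining or concurrency.

```python
class LazyStore:
    """Toy key-value store with a pipeline of not-yet-applied update batches."""

    def __init__(self):
        self._snapshot = {}  # consistent, possibly stale view
        self._pending = []   # batches ingested but not yet merged

    def ingest(self, batch):
        """Accept an atomic batch of updates (a dict) without applying it yet."""
        self._pending.append(dict(batch))

    def apply_pending(self):
        """Merge all queued batches into the snapshot, oldest first."""
        for batch in self._pending:
            self._snapshot.update(batch)
        self._pending.clear()

    def get(self, key, fresh=False):
        """Read `key`; with fresh=True, also scan batches still in the pipeline."""
        if fresh:
            for batch in reversed(self._pending):  # newest batch wins
                if key in batch:
                    return batch[key]
        return self._snapshot.get(key)


db = LazyStore()
db.ingest({"hits": 1})
db.apply_pending()
db.ingest({"hits": 2})
stale = db.get("hits")               # fast path: snapshot only
latest = db.get("hits", fresh=True)  # slower path: snapshot plus pipeline
```

The default read touches only the merged snapshot, while a fresh read pays the extra cost of scanning every queued batch.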

Operating Systems Review | 2009

DataSeries: an efficient, flexible data format for structured serial data

Eric Anderson; Martin F. Arlitt; Charles B. Morrey; Alistair Veitch

Structured serial data is used in many scientific fields; such data sets consist of a series of records, and are typically written once, read many times, chronologically ordered, and read sequentially. In this paper we introduce DataSeries, an on-disk format, run-time library and set of tools for storing and analyzing structured serial data. We identify six key properties of a system to store and analyze this type of data, and describe how DataSeries was designed to provide these properties. We quantify the benefits of DataSeries through several experiments. In particular, we demonstrate that DataSeries exceeds the performance of common trace formats by at least a factor of two.

Workshop on Hot Topics in Operating Systems | 2001

Towards global storage management and data placement

Alistair Veitch; Erik Riedel; Simon Towers; John Wilkes

As users and companies increasingly depend on shared, networked information services, we continue to see growth in data centers and service providers. This happens as services and servers are consolidated (for ease of management and reduced duplication), while also being distributed (for fault tolerance and to accommodate the global reach of customers). Since access to data is the lifeblood of any organization, a global storage system is a core element in such an infrastructure. Based on success in automatically managing local storage, we believe that the key attribute of such a system is the ability to flexibly adapt to a variety of application semantics and requirements as they arise and as they change over time. Our work has shown that it is possible to automatically design and configure a storage system of one or more disk arrays to meet a set of application requirements and to dynamically reconfigure as needs change, all without human intervention. Work on global data placement expands the scope of this system to a world of distributed data centers.

Information Interaction in Context | 2008

Activity put in context: identifying implicit task context within the user's document interaction

Karl Gyllstrom; Craig A. N. Soules; Alistair Veitch

Modern desktop search is ill-fitted to our personal document workspace. On one hand, many of the methods which render web search effective cannot be applied on the desktop. On the other, desktop search does not take full advantage of attributes that are unique to our personal documents. In this work, we present Confluence, a desktop search system that addresses this problem by capturing the task context within which a user interacts with their documents. This context is then integrated with traditional desktop search techniques to enable task-based document retrieval. Building upon Connections, a system that identifies task context by passively monitoring the user's interaction with their documents within the file system, Confluence also traces user activity within the user interface and incorporates methods to analyze and integrate this new stream of information. We show that this approach significantly improves the accuracy of task identification, achieving 25% to 30% better recall.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Confluence: enhancing contextual desktop search

Karl Gyllstrom; Craig A. N. Soules; Alistair Veitch

We present Confluence, an enhancement to a desktop file search tool called Connections which extracts conceptual relationships between files by their temporal access patterns in the file system. A limitation of a purely file-based approach is that as file operations are increasingly abstracted by applications, their correlation to a user's activity weakens and thereby reduces the applicability of their temporal patterns. To deal with this problem, we augment the file event stream with a stream of window focus events from the UI layer. We present three algorithms that analyze this new stream, extracting the user's task information which informs the existing Connections algorithms. We present results and conclusions from a preliminary user study on Confluence.
