Publication


Featured research published by Ethan L. Miller.


high performance distributed computing | 2004

Evaluation of distributed recovery in large-scale storage systems

Qin Xin; Ethan L. Miller; Thomas J. E. Schwarz, S.J.

Storage clusters consisting of thousands of disk drives are now being used both for their large capacity and high throughput. However, their reliability is far worse than that of smaller storage systems due to the increased number of storage nodes. RAID technology is no longer sufficient to guarantee the necessary high data reliability for such systems, because disk rebuild time lengthens as disk capacity grows. We present the fast recovery mechanism (FARM), a distributed recovery approach that exploits excess disk capacity and reduces data recovery time. FARM works in concert with replication and erasure-coding redundancy schemes to dramatically lower the probability of data loss in large-scale storage systems. We have examined essential factors that influence system reliability, performance, and cost, such as failure detection, disk bandwidth usage for recovery, disk space utilization, disk drive replacement, and system scale, by simulating system behavior under disk failures. Our results show the reliability improvement from FARM and demonstrate the impact of various factors on system reliability. Using our techniques, system designers will be better able to build multipetabyte storage systems with much higher reliability at lower cost than previously possible.
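
The core claim, that shorter recovery time sharply lowers the probability of losing data, can be illustrated with a back-of-the-envelope estimate. The sketch below is not the paper's simulator; the disk count, MTBF, and recovery windows are illustrative assumptions, and it models only mirrored pairs.

```python
import math

# Illustrative assumptions (not the paper's parameters):
# a 10,000-disk system of mirrored pairs, per-disk MTBF of 100,000 hours.
N_DISKS = 10_000
MTBF_HOURS = 100_000.0
FAIL_RATE = 1.0 / MTBF_HOURS          # per-disk failures per hour
HOURS_PER_YEAR = 24 * 365

def annual_loss_events(recovery_hours: float) -> float:
    """Expected data-loss events per year for mirrored pairs.

    A loss occurs when a disk fails and its mirror also fails before
    recovery completes: P(second failure) = 1 - exp(-lambda * T).
    """
    p_second_failure = 1.0 - math.exp(-FAIL_RATE * recovery_hours)
    failures_per_year = N_DISKS * FAIL_RATE * HOURS_PER_YEAR
    return failures_per_year * p_second_failure

# Compare a slow whole-disk rebuild against fast distributed recovery.
for label, window in [("24 h RAID-style rebuild", 24.0),
                      ("1 h distributed recovery", 1.0)]:
    print(f"{label}: ~{annual_loss_events(window):.3f} expected loss events/year")
```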


international conference on distributed computing systems | 2006

Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage

Thomas S. J. Schwarz; Ethan L. Miller

The emerging use of the Internet for remote storage and backup has led to the problem of verifying that storage sites in a distributed system indeed store the data; this must often be done in the absence of knowledge of what the data should be. We use an m/n erasure-correcting code to safeguard the stored data and use algebraic signatures (hash functions with algebraic properties) for verification. Our scheme primarily utilizes one such algebraic property: taking a signature of the parity gives the same result as taking the parity of the signatures. To make our scheme collusion-resistant, we blind data and parity by XORing them with a pseudo-random stream. Our scheme has three advantages over existing techniques. First, it uses only small messages for verification, an attractive property in a P2P setting where the storing peers often have only a small upstream pipe. Second, it allows verification of challenges across random data without the need for the challenger to compare against the original data. Third, it is highly resistant to coordinated attempts to undetectably modify data. These signature techniques are very fast, running at tens to hundreds of megabytes per second. Because of these properties, the use of algebraic signatures will permit the construction of large-scale distributed storage systems in which large amounts of storage can be verified with minimal network bandwidth.
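
The algebraic property the scheme relies on, that the signature of the parity equals the parity of the signatures, follows from the linearity of the signature over a Galois field. A minimal sketch, assuming GF(2^8) with the primitive polynomial 0x11D and plain XOR parity; the paper's actual signatures, blinding, and m/n erasure coding are not reproduced here.

```python
import os
from functools import reduce

# GF(2^8) arithmetic with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1
# (0x11D). Addition in the field is XOR; multiplication is carry-less with reduction.
def gf_mul(a: int, b: int) -> int:
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p

ALPHA = 0x02  # a primitive element of this field

def signature(block: bytes) -> int:
    """Algebraic signature: sum of block[i] * ALPHA**i over GF(2^8)."""
    sig, power = 0, 1
    for byte in block:
        sig ^= gf_mul(byte, power)
        power = gf_mul(power, ALPHA)
    return sig

# Toy demonstration of the property used for remote verification:
# the signature of the XOR parity equals the XOR of the data signatures.
data_blocks = [os.urandom(64) for _ in range(4)]
parity = bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*data_blocks))

sig_of_parity = signature(parity)
parity_of_sigs = reduce(lambda x, y: x ^ y, (signature(b) for b in data_blocks))
assert sig_of_parity == parity_of_sigs
print(f"sig(parity) = {sig_of_parity:#04x} = XOR of block signatures")
```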


conference on high performance computing (supercomputing) | 2006

CRUSH: controlled, scalable, decentralized placement of replicated data

Sage A. Weil; Scott A. Brandt; Ethan L. Miller; Carlos Maltzahn

Emerging large-scale distributed storage systems are faced with the task of distributing petabytes of data among tens or hundreds of thousands of storage devices. Such systems must evenly distribute data and workload to efficiently utilize available resources and maximize system performance, while facilitating system growth and managing hardware failures. We have developed CRUSH, a scalable pseudorandom data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. Because large systems are inherently dynamic, CRUSH is designed to facilitate the addition and removal of storage while minimizing unnecessary data movement. The algorithm accommodates a wide variety of data replication and reliability mechanisms and distributes data in terms of user-defined policies that enforce separation of replicas across failure domains.
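
A toy illustration of the general idea, deterministic placement computed independently by any client from a shared cluster description, with replicas separated across failure domains, is sketched below. This is not the CRUSH algorithm itself (no bucket types, weights, or reweighting rules), and the cluster map and device names are made up.

```python
import hashlib

# Made-up cluster description shared by all clients: racks are the failure domains.
CLUSTER = {
    "rack1": ["osd.0", "osd.1", "osd.2"],
    "rack2": ["osd.3", "osd.4", "osd.5"],
    "rack3": ["osd.6", "osd.7", "osd.8"],
}

def _score(key: str, item: str) -> int:
    """Deterministic pseudo-random score for an (object, item) pair."""
    return int.from_bytes(hashlib.sha256(f"{key}/{item}".encode()).digest()[:8], "big")

def place(obj: str, replicas: int = 3) -> list[str]:
    """Pick one device in each of `replicas` distinct racks, highest score first,
    with no central directory: any client computes the same answer."""
    chosen_racks = sorted(CLUSTER, key=lambda r: _score(obj, r), reverse=True)[:replicas]
    return [max(CLUSTER[r], key=lambda d: _score(obj, d)) for r in chosen_racks]

print(place("volume42/object:0001"))   # three devices, one per rack
```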


workshop on storage security and survivability | 2008

Secure data deduplication

Mark W. Storer; Kevin M. Greenan; Darrell D. E. Long; Ethan L. Miller

As the world moves to digital storage for archival purposes, there is an increasing demand for systems that can provide secure data storage in a cost-effective manner. By identifying common chunks of data both within and between files and storing them only once, deduplication can yield cost savings by increasing the utility of a given amount of storage. Unfortunately, deduplication exploits identical content, while encryption attempts to make all content appear random; the same content encrypted with two different keys results in very different ciphertext. Thus, combining the space efficiency of deduplication with the secrecy aspects of encryption is problematic. We have developed a solution that provides both data security and space efficiency in single-server storage and distributed storage systems. Encryption keys are generated in a consistent manner from the chunk data; thus, identical chunks will always encrypt to the same ciphertext. Furthermore, the keys cannot be deduced from the encrypted chunk data. Since the information each user needs to access and decrypt the chunks that make up a file is encrypted using a key known only to the user, even a full compromise of the system cannot reveal which chunks are used by which users.
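
The key idea, deriving each chunk's encryption key deterministically from the chunk's contents so that identical chunks always produce identical ciphertext, can be sketched as follows. The keystream below is a toy stand-in for a real cipher, and the paper's chunking, key management, and per-user key maps are omitted.

```python
import hashlib

def chunk_key(chunk: bytes) -> bytes:
    """Derive the encryption key from the chunk contents (convergent encryption)."""
    return hashlib.sha256(b"chunk-key|" + chunk).digest()

def keystream_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy deterministic keystream cipher for illustration only -- not secure.
    A real system would use a proper cipher keyed by chunk_key()."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))

chunk = b"the same chunk stored by two different users"
ct_user_a = keystream_encrypt(chunk_key(chunk), chunk)
ct_user_b = keystream_encrypt(chunk_key(chunk), chunk)
assert ct_user_a == ct_user_b   # identical chunks -> identical ciphertext,
                                # so the store can deduplicate them
assert keystream_encrypt(chunk_key(chunk), ct_user_a) == chunk  # decrypts back
```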


ieee conference on mass storage systems and technologies | 2003

Reliability mechanisms for very large storage systems

Qin Xin; Ethan L. Miller; Thomas J. E. Schwarz; Darrell D. E. Long; Scott A. Brandt; Witold Litwin

Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Large systems must survive the failure of individual components; in systems with thousands of disks, even infrequent failures are likely in some device. We focus on two types of errors: nonrecoverable read errors and drive failures. We discuss mechanisms for detecting and recovering from such errors, introducing improved techniques for detecting errors in disk reads and fast recovery from disk failure. We show that simple RAID cannot guarantee sufficient reliability; our analysis examines the tradeoffs among other schemes between system availability and storage efficiency. Based on our data, we believe that two-way mirroring should be sufficient for most large storage systems. For those that need very high reliability, we recommend either three-way mirroring or mirroring combined with RAID.
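
The mirroring-versus-RAID tradeoff can be illustrated with standard MTTDL-style approximations; the formulas and parameters below are textbook estimates for a single redundancy group, not the paper's models or numbers.

```python
# Textbook mean-time-to-data-loss approximations; illustrative parameters.
# MTBF and MTTR are per-disk, in hours.
MTBF = 100_000.0   # roughly 11.4 years
MTTR = 24.0        # one-day repair/rebuild window

def mttdl_raid5(n_data: int) -> float:
    """RAID-5 group with n_data data disks plus one parity disk."""
    return MTBF**2 / (n_data * (n_data + 1) * MTTR)

def mttdl_mirror2() -> float:
    """Two-way mirrored pair."""
    return MTBF**2 / (2 * MTTR)

def mttdl_mirror3() -> float:
    """Three-way mirror: needs two further failures during repair windows."""
    return MTBF**3 / (6 * MTTR**2)

YEAR = 24 * 365
print(f"RAID-5 (8+1):    {mttdl_raid5(8) / YEAR:12,.0f} years")
print(f"2-way mirroring: {mttdl_mirror2() / YEAR:12,.0f} years")
print(f"3-way mirroring: {mttdl_mirror3() / YEAR:12,.0f} years")
```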


conference on high performance computing (supercomputing) | 2004

Dynamic Metadata Management for Petabyte-Scale File Systems

Sage A. Weil; Kristal T. Pollack; Scott A. Brandt; Ethan L. Miller

In petabyte-scale distributed file systems that decouple read and write from metadata operations, behavior of the metadata server cluster will be critical to overall system performance and scalability. We present a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time. We examine the relative merits of our approach in the context of traditional workload partitioning strategies, and demonstrate the performance, scalability and adaptability advantages in a simulation environment.
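
A toy version of subtree partitioning is sketched below: authority over metadata follows directory subtrees, and a hot subtree can be re-delegated to another metadata server at runtime. The paper's system additionally measures popularity and migrates adaptively; the code here shows only the lookup and delegation mechanics, with made-up paths.

```python
# Toy dynamic subtree partitioning: the deepest delegated ancestor of a path
# decides which metadata server (MDS) is authoritative for it.
delegations = {"/": 0}          # subtree prefix -> MDS id

def mds_for(path: str) -> int:
    """Walk from the full path up toward "/" and use the deepest delegation found."""
    parts = path.rstrip("/").split("/")
    for depth in range(len(parts), 0, -1):
        prefix = "/".join(parts[:depth]) or "/"
        if prefix in delegations:
            return delegations[prefix]
    return delegations["/"]

def delegate(subtree: str, mds: int) -> None:
    """Hand authority for a (hot) subtree to another MDS."""
    delegations[subtree] = mds

print(mds_for("/home/alice/thesis.tex"))   # 0: the root MDS handles everything
delegate("/home/alice", 1)                 # /home/alice became hot: move it
print(mds_for("/home/alice/thesis.tex"))   # 1
print(mds_for("/home/bob/notes.txt"))      # still 0
```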


ieee conference on mass storage systems and technologies | 2010

Design issues for a shingled write disk system

Ahmed Amer; Darrell D. E. Long; Ethan L. Miller; Jehan-Francois Paris; S. J. Thomas Schwarz

If the data density of magnetic disks is to continue its current 30–50% annual growth, new recording techniques are required. Among the actively considered options, shingled writing is currently the most attractive one because it is the easiest to implement at the device level. Shingled write recording trades the inconvenience of the inability to update in place for a much higher data density, using a different write technique that overlaps the currently written track with the previous track. Random reads are still possible on such devices, but writes must be done largely sequentially. In this paper, we discuss possible changes to disk-based data structures that the adoption of shingled writing will require. We first explore disk structures that are optimized for large sequential writes, with little or no random writing even of metadata structures, while providing acceptable read performance. We also examine the usefulness of non-volatile RAM and the benefits of object-based interfaces in the context of shingled disks. Finally, through the analysis of recent device traces, we demonstrate the surprising stability of written device blocks, with general-purpose workloads showing that more than 93% of device blocks remain unchanged over a day, and that for more specialized workloads less than 0.5% of a shingled write disk's capacity would be needed to hold randomly updated blocks.
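
One layout consistent with these observations is to treat shingled bands as append-only logs and redirect the small fraction of randomly updated blocks to an unshingled exception region (or NVRAM). The sketch below is a speculative illustration, not a design from the paper; the class, sizes, and the absence of band cleaning are all invented simplifications.

```python
# Toy block layer for a shingled disk: new blocks are appended to shingled
# bands (append-only), while in-place updates of already-written blocks are
# redirected to a small unshingled exception region.
class ShingledStore:
    def __init__(self, band_blocks: int = 65536, exception_blocks: int = 1024):
        self.band_blocks = band_blocks            # blocks per shingled band
        self.exception_blocks = exception_blocks  # capacity of unshingled region
        self.bands: list[list[bytes]] = [[]]      # append-only shingled bands
        self.exceptions: dict[int, bytes] = {}    # lba -> data in unshingled region
        self.map: dict[int, tuple[str, int, int]] = {}  # lba -> (kind, band, offset)

    def write(self, lba: int, data: bytes) -> None:
        if self.map.get(lba, ("", 0, 0))[0] == "band":
            # Updating a block that already lives in a shingled band:
            # redirect the new version to the exception region.
            if len(self.exceptions) >= self.exception_blocks:
                raise RuntimeError("exception region full: a band must be cleaned")
            self.exceptions[lba] = data
            self.map[lba] = ("exc", 0, 0)
        else:
            self.exceptions.pop(lba, None)        # drop any stale exception copy
            band = self.bands[-1]
            if len(band) >= self.band_blocks:     # current band full: open a new one
                self.bands.append([])
                band = self.bands[-1]
            band.append(data)
            self.map[lba] = ("band", len(self.bands) - 1, len(band) - 1)

    def read(self, lba: int) -> bytes:
        kind, band, offset = self.map[lba]
        return self.exceptions[lba] if kind == "exc" else self.bands[band][offset]

# First write of an LBA appends to a band; rewriting it goes to the exception area.
store = ShingledStore()
store.write(7, b"v1")
store.write(7, b"v2")
assert store.read(7) == b"v2"
```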


conference on high performance computing (supercomputing) | 1991

Input/output behavior of supercomputing applications

Ethan L. Miller; Randy H. Katz

No abstract available


ieee conference on mass storage systems and technologies | 2003

Efficient metadata management in large distributed storage systems

Scott A. Brandt; Ethan L. Miller; Darrell D. E. Long; Lan Xue

Efficient metadata management is a critical aspect of overall system performance in large distributed storage systems. Directory subtree partitioning and pure hashing are two common techniques used for managing metadata in such systems, but both suffer from bottlenecks at very high concurrent access rates. We present a new approach called lazy hybrid (LH) metadata management that combines the best aspects of these two approaches while avoiding their shortcomings.
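
The pure-hashing half of the design can be sketched in a few lines: hashing the full pathname spreads metadata evenly across servers, but a directory rename or permission change then touches many paths, which is the cost LH's lazy policies defer and amortize. The server count and function name below are illustrative, not the paper's.

```python
import hashlib

NUM_MDS = 16   # illustrative metadata-server count

def mds_for_path(path: str) -> int:
    """Pure pathname hashing: spreads metadata evenly, but renaming a directory
    changes the hash of every descendant path, forcing metadata to move."""
    digest = hashlib.sha256(path.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_MDS

print(mds_for_path("/home/alice/thesis.tex"))
print(mds_for_path("/home/alice/figures/fig1.pdf"))   # likely a different server
```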


international parallel and distributed processing symposium | 2004

Replication under scalable hashing: a family of algorithms for scalable decentralized data distribution

R. J. Honicky; Ethan L. Miller

Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (replication under scalable hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.
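
A toy placement with similar goals is weighted highest-random-weight (rendezvous) hashing, shown below: replicas land on distinct servers, weights bias placement, and adding a server moves only the objects that the new server "wins". This is not one of the RUSH variants; all names and weights are illustrative.

```python
import hashlib
import math

def _unit(obj: str, server: str) -> float:
    """Deterministic pseudo-random value in (0, 1) for an (object, server) pair."""
    h = int.from_bytes(hashlib.sha256(f"{obj}|{server}".encode()).digest()[:8], "big")
    return (h + 1) / (2**64 + 2)

def place(obj: str, servers: dict[str, float], replicas: int = 3) -> list[str]:
    """Weighted highest-random-weight placement: the top `replicas` scores win,
    so replicas always land on distinct servers, and scores for existing
    servers never change when a new server is added."""
    scores = {s: -w / math.log(_unit(obj, s)) for s, w in servers.items()}
    return sorted(scores, key=scores.get, reverse=True)[:replicas]

servers = {f"s{i}": 1.0 for i in range(8)}
servers["s7"] = 2.0                      # a newer, larger server gets weight 2
before = {f"obj{i}": place(f"obj{i}", servers) for i in range(1000)}

servers["s8"] = 1.0                      # add a server: only objects it "wins" move
after = {o: place(o, servers) for o in before}
moved = sum(before[o] != after[o] for o in before)
print(f"{moved} of {len(before)} objects had a replica move after adding s8")
```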

Collaboration


Dive into Ethan L. Miller's collaborations.


Top Co-Authors

Mark W. Storer, University of California
Randy H. Katz, University of California
Ian F. Adams, University of California
Avani Wildani, University of California