Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Carlos Maltzahn is active.

Publication


Featured research published by Carlos Maltzahn.


conference on high performance computing (supercomputing) | 2006

CRUSH: controlled, scalable, decentralized placement of replicated data

Sage A. Weil; Scott A. Brandt; Ethan L. Miller; Carlos Maltzahn

Emerging large-scale distributed storage systems are faced with the task of distributing petabytes of data among tens or hundreds of thousands of storage devices. Such systems must evenly distribute data and workload to efficiently utilize available resources and maximize system performance, while facilitating system growth and managing hardware failures. We have developed CRUSH, a scalable pseudorandom data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. Because large systems are inherently dynamic, CRUSH is designed to facilitate the addition and removal of storage while minimizing unnecessary data movement. The algorithm accommodates a wide variety of data replication and reliability mechanisms and distributes data in terms of user-defined policies that enforce separation of replicas across failure domains.
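
The core mechanism, a pseudorandom function that any client can evaluate to map an object and replica number to a storage device without consulting a central directory, can be sketched as follows. This is a simplified weighted-hash illustration, not the published CRUSH algorithm; the device names, weights, and helper functions are invented for the example.

# Simplified hash-based replica placement (not the real CRUSH algorithm):
# each (object, replica) pair draws a deterministic pseudorandom "straw" per
# device, so any client computes the same mapping with no central directory.
import hashlib, math

def straw(obj_id: str, replica: int, device: str, weight: float) -> float:
    # Deterministic draw per (object, replica, device), scaled so that
    # higher-weight devices win proportionally more often.
    digest = hashlib.sha1(f"{obj_id}:{replica}:{device}".encode()).digest()
    r = (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)   # uniform in (0, 1)
    return -weight / math.log(r)

def place(obj_id: str, devices: dict, replicas: int = 3) -> list:
    # Each replica picks the device with the largest straw, skipping devices
    # already chosen so replicas land on distinct devices (a crude stand-in
    # for CRUSH's failure-domain separation).
    chosen = []
    for rep in range(replicas):
        candidates = {d: w for d, w in devices.items() if d not in chosen}
        chosen.append(max(candidates, key=lambda d: straw(obj_id, rep, d, candidates[d])))
    return chosen

devices = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
print(place("object-42", devices))   # every client derives the same placement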


ieee conference on mass storage systems and technologies | 2012

On the role of burst buffers in leadership-class storage systems

Ning Liu; Jason Cope; Philip H. Carns; Christopher D. Carothers; Robert B. Ross; Gary Grider; Adam Crume; Carlos Maltzahn

The largest-scale high-performance computing (HPC) systems are stretching parallel file systems to their limits in terms of aggregate bandwidth and numbers of clients. To further sustain the scalability of these file systems, researchers and HPC storage architects are exploring various storage system designs. One proposed storage system design integrates a tier of solid-state burst buffers into the storage system to absorb application I/O requests. In this paper, we simulate and explore this storage system design for use by large-scale HPC systems. First, we examine application I/O patterns on an existing large-scale HPC system to identify common burst patterns. Next, we describe enhancements to the CODES storage system simulator to enable our burst buffer simulations. These enhancements include the integration of a burst buffer model into the I/O forwarding layer of the simulator, the development of an I/O kernel description language and interpreter, the development of a suite of I/O kernels that are derived from observed I/O patterns, and fidelity improvements to the CODES models. We evaluate the I/O performance for a set of multiapplication I/O workloads and burst buffer configurations. We show that burst buffers can accelerate the application-perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application-perceived throughput goal.
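
To make the role of the buffer concrete, the toy discrete-time model below (not the CODES simulator used in the paper) shows how a fast tier absorbs a write burst at SSD ingest speed while draining to slower external storage in the background; all sizes and bandwidths are invented for illustration.

# Toy discrete-time burst buffer model: the application finishes its burst at
# the buffer's ingest rate, while the data trickles out to external storage at
# a much lower drain rate afterwards.
def simulate(burst_gib, bb_ingest_gibs, ext_drain_gibs, dt=0.1):
    buffered = 0.0          # data currently held in the burst buffer (GiB)
    remaining = burst_gib   # data the application still has to write (GiB)
    t = 0.0
    app_done = None
    while remaining > 0 or buffered > 0:
        if remaining > 0:
            absorbed = min(remaining, bb_ingest_gibs * dt)
            remaining -= absorbed
            buffered += absorbed
            if remaining == 0:
                app_done = t + dt   # application-perceived completion time
        buffered -= min(buffered, ext_drain_gibs * dt)
        t += dt
    return app_done, t   # (app-perceived finish, time data reaches external storage)

app_done, flushed = simulate(burst_gib=100, bb_ingest_gibs=50, ext_drain_gibs=5)
print(f"application sees the burst finish at {app_done:.1f}s; "
      f"external storage is fully written at {flushed:.1f}s")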


ieee international conference on high performance computing data and analytics | 2011

SciHadoop: array-based query processing in Hadoop

Joe B. Buck; Noah Watkins; Jeff LeFevre; Kleoni Ioannidou; Carlos Maltzahn; Neoklis Polyzotis; Scott A. Brandt

Hadoop has become the de facto platform for large-scale data analysis in commercial applications, and increasingly so in scientific applications. However, Hadoops byte stream data model causes inefficiencies when used to process scientific data that is commonly stored in highly-structured, array-based binary file formats resulting in limited scalability of Hadoop applications in science. We introduce Sci- Hadoop, a Hadoop plugin allowing scientists to specify logical queries over array-based data models. Sci-Hadoop executes queries as map/reduce programs defined over the logical data model. We describe the implementation of a Sci-Hadoop prototype for NetCDF data sets and quantify the performance of five separate optimizations that address the following goals for several representative aggregate queries: reduce total data transfers, reduce remote reads, and reduce unnecessary reads. Two optimizations allow holistic aggregate queries to be evaluated opportunistically during the map phase; two additional optimizations intelligently partition input data to increase read locality, and one optimization avoids block scans by examining the data dependencies of an executing query to prune input partitions. Experiments involving a holistic function show run-time improvements of up to 8x, with drastic reductions of IO, both locally and over the network.
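
The sketch below (not the SciHadoop plugin itself) illustrates the underlying idea of querying the logical coordinate space of an array rather than a byte stream: partition the array into sub-array blocks, compute a partial result per block in a "map" step, and combine them in a "reduce" step. NumPy, the partition shape, and the max aggregate are assumptions for this example; max is chosen for simplicity, whereas the paper's optimizations also target holistic functions such as median.

# Toy array-based map/reduce aggregate over logical sub-array partitions.
import numpy as np

def partitions(shape, chunk):
    # Yield slice tuples that tile the 2-D logical array space with chunk-shaped blocks.
    for i in range(0, shape[0], chunk[0]):
        for j in range(0, shape[1], chunk[1]):
            yield (slice(i, min(i + chunk[0], shape[0])),
                   slice(j, min(j + chunk[1], shape[1])))

def map_phase(data, part):
    return data[part].max()          # partial aggregate per partition

def reduce_phase(partials):
    return max(partials)             # combine partial results

data = np.random.rand(1000, 1000)    # stand-in for an array read from a NetCDF file
result = reduce_phase([map_phase(data, p) for p in partitions(data.shape, (250, 250))])
assert result == data.max()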


measurement and modeling of computer systems | 1997

Performance issues of enterprise level web proxies

Carlos Maltzahn; Kathy J. Richardson; Dirk Grunwald

Enterprise-level web proxies relay world-wide web traffic between private networks and the Internet. They improve security, save network bandwidth, and reduce network latency. While the performance of web proxies has been analyzed based on synthetic workloads, little is known about their performance on real workloads. In this paper we present a study of two web proxies (CERN and Squid) executing real workloads on Digital's Palo Alto Gateway. We demonstrate that the simple CERN proxy architecture outperforms all but the latest version of Squid and continues to outperform cacheless configurations. For the measured load levels the Squid proxy used at least as many CPU, memory, and disk resources as CERN, and in some configurations significantly more. At higher load levels the resource utilization requirements will cross, and Squid will be the one using fewer resources. Lastly, we found that cache hit rates of around 30% had very little effect on request service time.
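
A back-of-the-envelope calculation shows why a moderate hit rate can barely move the mean service time when cache hits are not dramatically cheaper than misses; the numbers below are invented for illustration and are not measurements from the paper.

# Expected request service time under an assumed hit/miss cost split.
def mean_service_time(hit_rate, hit_ms, miss_ms):
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

no_cache = mean_service_time(0.0, 0.0, 900)        # every request goes to the origin
with_cache = mean_service_time(0.3, 600, 900)      # hits still pay proxy and disk overhead
print(f"{no_cache:.0f} ms without caching vs {with_cache:.0f} ms with a 30% hit rate")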


petascale data storage workshop | 2007

RADOS: a scalable, reliable storage service for petabyte-scale storage clusters

Sage A. Weil; Andrew W. Leung; Scott A. Brandt; Carlos Maltzahn

Brick and object-based storage architectures have emerged as a means of improving the scalability of storage clusters. However, existing systems continue to treat storage nodes as passive devices, despite their ability to exhibit significant intelligence and autonomy. We present the design and implementation of RADOS, a reliable object storage service that can scale to many thousands of devices by leveraging the intelligence present in individual storage nodes. RADOS preserves consistent data access and strong safety semantics while allowing nodes to act semi-autonomously to self-manage replication, failure detection, and failure recovery through the use of a small cluster map. Our implementation offers excellent performance, reliability, and scalability while providing clients with the illusion of a single logical object store.
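
The "small cluster map" idea can be sketched as below: a compact, versioned description of the cluster that every client and storage node shares, from which object placement is computed locally. This is a minimal illustration, not the RADOS implementation; the hash-based placement, field names, and placement-group count are simplifications assumed for the example.

# Minimal sketch of a shared, versioned cluster map used to compute placement.
import hashlib
from dataclasses import dataclass, field

@dataclass
class ClusterMap:
    epoch: int                                  # version; bumped on every membership change
    osds: dict = field(default_factory=dict)    # osd id -> "up" / "down"
    num_pgs: int = 128                          # number of placement groups

    def pg_for(self, object_name: str) -> int:
        # Hash objects into placement groups so the map stays small.
        h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
        return h % self.num_pgs

    def acting_set(self, pg: int, replicas: int = 3) -> list:
        # Derive the replica set for a PG from the map alone, skipping OSDs
        # marked down; every holder of the same epoch agrees on the result.
        up = sorted(osd for osd, state in self.osds.items() if state == "up")
        return [up[(pg + i) % len(up)] for i in range(min(replicas, len(up)))]

cmap = ClusterMap(epoch=7, osds={0: "up", 1: "up", 2: "down", 3: "up", 4: "up"})
pg = cmap.pg_for("rbd_data.1234")
print(f"epoch {cmap.epoch}: object maps to PG {pg}, acting set {cmap.acting_set(pg)}")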


hawaii international conference on system sciences | 1997

The Chautauqua workflow system

Clarence A. Ellis; Carlos Maltzahn

Chautauqua is an exploratory workflow management system designed and implemented within the Collaboration Technology Research Group (CTRG) at the University of Colorado. This system represents a tightly knit merger of workflow technology and groupware technology. Chautauqua has been in test usage at the University of Colorado since 1995. The article discusses Chautauqua: its motivation, its design, and its implementation. The emphasis is on its novel features and the techniques used to implement them.


european conference on computer systems | 2008

Efficient guaranteed disk request scheduling with fahrrad

Anna Povzner; Tim Kaldewey; Scott A. Brandt; Richard A. Golding; Theodore M. Wong; Carlos Maltzahn

Guaranteed I/O performance is needed for a variety of applications ranging from real-time data collection to desktop multimedia to large-scale scientific simulations. Reservations on throughput, the standard measure of disk performance, fail to effectively manage disk performance due to the orders of magnitude difference between best-, average-, and worst-case response times, allowing reservation of less than 0.01% of the achievable bandwidth. We show that by reserving disk resources in terms of utilization it is possible to create a disk scheduler that supports reservation of nearly 100% of the disk resources, provides arbitrarily hard or soft guarantees depending upon application needs, and yields efficiency as good or better than best-effort disk schedulers tuned for performance. We present the architecture of our scheduler, prove the correctness of its algorithms, and provide results demonstrating its effectiveness.
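
The reservation model can be illustrated with a short admission-control sketch: each I/O stream reserves a fraction of disk time rather than a throughput figure, and a new stream is admitted only if the reserved fractions still sum to at most 100%. This is an illustration of the utilization-reservation idea, not the Fahrrad scheduler itself; the class, stream names, and fractions are invented for the example.

# Utilization-based admission control: reserve shares of disk *time*, not IOPS.
class UtilizationReservations:
    def __init__(self):
        self.reserved = {}          # stream name -> reserved fraction of disk time

    def admit(self, stream: str, utilization: float) -> bool:
        # Admit only if total reserved utilization stays within the whole disk.
        if sum(self.reserved.values()) + utilization <= 1.0:
            self.reserved[stream] = utilization
            return True
        return False                # would over-commit the disk

res = UtilizationReservations()
print(res.admit("telemetry-capture", 0.30))   # True
print(res.admit("video-playback", 0.50))      # True
print(res.admit("backup-scan", 0.25))         # False: 1.05 > 1.0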


ieee conference on mass storage systems and technologies | 2005

Richer file system metadata using links and attributes

Alexander K. Ames; Nikhil Bobb; Scott A. Brandt; Adam Hiatt; Carlos Maltzahn; Ethan L. Miller; Alisa Neeman; Deepa Tuteja

Traditional file systems provide a weak and inadequate structure for meaningful representations of file interrelationships and other context-providing metadata. Existing designs, which store additional file-oriented metadata either in a database, on disk, or both, are limited by the technologies upon which they depend. Moreover, they do not provide for user-defined relationships among files. To address these issues, we created the linking file system (LiFS), a file system design in which files may have both arbitrary user- or application-specified attributes and attributed links between files. In order to assure performance when accessing links and attributes, the system is designed to store metadata in non-volatile memory. This paper discusses several use cases that take advantage of this approach and describes the user-space prototype we developed to test the concepts presented.
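
The data model can be pictured with the toy in-memory sketch below: files carry arbitrary attributes, and links between files are first-class objects that carry attributes of their own. This is not the LiFS prototype; the classes, paths, and attribute names are invented for illustration.

# Toy model of files with arbitrary attributes and attributed links.
class File:
    def __init__(self, path, **attrs):
        self.path = path
        self.attrs = dict(attrs)          # arbitrary user/application metadata
        self.links = []                   # outgoing attributed links

    def link_to(self, target, **attrs):
        self.links.append({"target": target, "attrs": dict(attrs)})

paper = File("/home/cm/crush.pdf", topic="storage", year=2006)
slides = File("/home/cm/crush-talk.pdf", topic="storage")
paper.link_to(slides, relation="presented_as", venue="SC'06")

# Query example: follow links by relationship type instead of by directory layout.
for link in paper.links:
    if link["attrs"].get("relation") == "presented_as":
        print(paper.path, "->", link["target"].path)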


Software Engineering Journal | 1991

A decision-based configuration process environment

Thomas Rose; Matthias Jarke; Michael Gocek; Carlos Maltzahn; Hans W. Nissen

In the context of the ESPRIT project DAIDA, we have developed an experimental environment intended to achieve consistency-in-the-large in a multi-person setting. Our conceptual model of configuration processes, the CAD° model, centres around decisions that work on configured objects and are subject to structured conversations. The environment, extending the knowledge-based software information system ConceptBase, supports co-operation within development teams by integrating models and tools for argumentation and co-ordination with those for versioning and configuration. Versioning decisions are discussed and decided on within an argument editor, and executed by specialised tools for programming-in-the-small. Tasks are assigned and monitored through a contract tool, and carried out within co-ordinated workspaces under a conflict-tolerant transaction protocol. Consistent configuration and reconfiguration of local results is supported by a logic-based configuration assistant.


high performance distributed computing | 2013

I/O acceleration with pattern detection

Jun He; John M. Bent; Aaron Torres; Gary Grider; Garth A. Gibson; Carlos Maltzahn; Xian-He Sun

The I/O bottleneck in high-performance computing is becoming worse as application data continues to grow. In this work, we explore how patterns of I/O within these applications can significantly affect the effectiveness of the underlying storage systems and how these same patterns can be utilized to improve many aspects of the I/O stack and mitigate the I/O bottleneck. We offer three main contributions in this paper. First, we develop and evaluate algorithms by which I/O patterns can be efficiently discovered and described. Second, we implement one such algorithm to reduce the metadata quantity in a virtual parallel file system by up to several orders of magnitude, thereby increasing the performance of writes and reads by up to 40 and 480 percent respectively. Third, we build a prototype file system with pattern-aware prefetching and evaluate it to show a 46 percent reduction in I/O latency. Finally, we believe that efficient pattern discovery and description, coupled with the observed predictability of complex patterns within many high-performance applications, offers significant potential to enable many additional I/O optimizations.
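
A small sketch shows why pattern description shrinks I/O metadata: a long list of per-write index entries whose offsets advance by a constant stride can be replaced by a single compact pattern record. This is an illustrative detector for one simple pattern type, not the paper's algorithm; the record format and sizes are assumptions for the example.

# Collapse a fixed-stride, fixed-size write trace into one pattern record.
def detect_strided(offsets, lengths):
    # Return a compact pattern if the trace is a single fixed-stride,
    # fixed-size sequence; otherwise return None.
    if len(offsets) < 2 or len(set(lengths)) != 1:
        return None
    stride = offsets[1] - offsets[0]
    if all(offsets[i + 1] - offsets[i] == stride for i in range(len(offsets) - 1)):
        return {"start": offsets[0], "stride": stride,
                "length": lengths[0], "count": len(offsets)}
    return None

# 10,000 strided writes of 64 KiB collapse from 10,000 index entries to one record.
offsets = [i * 1_048_576 for i in range(10_000)]
lengths = [65_536] * 10_000
print(detect_strided(offsets, lengths))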

Collaboration


Dive into Carlos Maltzahn's collaboration.

Top Co-Authors

Ivo Jimenez
University of California

Noah Watkins
University of California

Jay F. Lofstead
Sandia National Laboratories

Adam Crume
University of California

Andrea C. Arpaci-Dusseau
University of Wisconsin-Madison

Kathryn Mohror
Lawrence Livermore National Laboratory

Remzi H. Arpaci-Dusseau
University of Wisconsin-Madison