Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dean Hildebrand is active.

Publication


Featured research published by Dean Hildebrand.


IEEE International Conference on Cloud Engineering | 2015

Finding the Big Data Sweet Spot: Towards Automatically Recommending Configurations for Hadoop Clusters on Docker Containers

Rui Zhang; Min Li; Dean Hildebrand

The complexity of cloud-based analytics environments threatens to undermine their otherwise tremendous value. In particular, configuring such environments presents a great challenge. We propose to alleviate this issue with an engine that recommends configurations for a newly submitted analytics job in an intelligent and timely manner. The engine is rooted in a modified k-nearest-neighbor algorithm, which finds desirable configurations from similar past jobs that have performed well. We apply the method to configuring an important class of analytics environments: Hadoop on container-driven clouds. Preliminary evaluation suggests that up to a 28% performance gain could result from our method.
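The core idea of the recommendation engine can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the feature names, configuration knobs, and the plain (unmodified) k-NN with averaged neighbor configurations are all assumptions for demonstration.

```python
# Hypothetical sketch of the k-nearest-neighbor recommendation idea:
# pick a configuration for a new job by averaging the configurations
# of the k most similar past jobs that performed well. Feature names
# and knob names below are illustrative, not from the paper.
import math

def knn_recommend(new_job, history, k=3):
    """history: list of (job_features, good_config) pairs from past runs."""
    def dist(a, b):
        # Euclidean distance over shared numeric job features.
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in a))
    nearest = sorted(history, key=lambda h: dist(new_job, h[0]))[:k]
    # Average each configuration knob over the k neighbors.
    knobs = nearest[0][1].keys()
    return {knob: sum(cfg[knob] for _, cfg in nearest) / k for knob in knobs}

past_jobs = [
    ({"input_gb": 10, "cpu_shares": 2}, {"map_tasks": 8,  "container_mem_mb": 2048}),
    ({"input_gb": 50, "cpu_shares": 4}, {"map_tasks": 32, "container_mem_mb": 4096}),
    ({"input_gb": 12, "cpu_shares": 2}, {"map_tasks": 10, "container_mem_mb": 2048}),
    ({"input_gb": 90, "cpu_shares": 8}, {"map_tasks": 64, "container_mem_mb": 8192}),
]
rec = knn_recommend({"input_gb": 15, "cpu_shares": 2}, past_jobs, k=3)
print(rec)
```

A production engine would also need to weight features, filter out poorly performing historical runs, and fall back to defaults when no similar job exists.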


Measurement and Modeling of Computer Systems | 2015

Newer Is Sometimes Better: An Evaluation of NFSv4.1

Ming Chen; Dean Hildebrand; Geoff Kuenning; Soujanya Shankaranarayana; Bharat Singh; Erez Zadok

The popular Network File System (NFS) protocol is 30 years old. The latest version, NFSv4, is more than ten years old but has only recently gained stability and acceptance. NFSv4 is vastly different from its predecessors: it offers a stateful server, strong security, scalability/WAN features, and callbacks, among other things. Yet NFSv4's efficacy and ability to meet its stated design goals had not been thoroughly studied until now. This paper compares NFSv4.1's performance with NFSv3 using a wide range of micro- and macro-benchmarks on a testbed configured to exercise the core protocol features. We (1) tested NFSv4's unique features, such as delegations and statefulness; (2) evaluated performance comprehensively with different numbers of threads and clients, and different network latencies and TCP/IP features; (3) found, fixed, and reported several problems in Linux's NFSv4.1 implementation, which helped improve performance by up to 11X; and (4) discovered, analyzed, and explained several counter-intuitive results. Depending on the workload, NFSv4.1 was up to 67% slower than NFSv3 in a low-latency network, but exceeded NFSv3's performance by up to 2.9X in a high-latency environment. Moreover, NFSv4.1 outperformed NFSv3 by up to 172X when delegations were used.


Symposium on Operating Systems Principles | 2015

A fast and slippery slope for file systems

Ricardo Santana; Raju Rangaswami; Vasily Tarasov; Dean Hildebrand

There is a vast number and variety of file systems currently available, each optimizing for an ever-growing number of storage devices and workloads. Users have an unprecedented, and somewhat overwhelming, number of data management options. At the same time, the fastest storage devices are only getting faster, and it is unclear how well the existing file systems will adapt. Using emulation techniques, we evaluate five popular Linux file systems across a range of storage device latencies typical of low-end hard drives, the latest high-performance persistent memory block devices, and everything in between. Our findings are often surprising. Depending on the workload, we find that some file systems can clearly scale with faster storage devices much better than others. Further, as storage device latency decreases, we find unexpected performance inversions across file systems. Finally, file system scalability in the higher device latency range is not representative of scalability in the lower, sub-millisecond, latency range. We then focus on Nilfs2 as an especially alarming example of unexpectedly poor scalability and present detailed instructions for identifying bottlenecks in the I/O stack.
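The performance inversions the abstract mentions follow from a simple cost model. The numbers and the two hypothetical file systems below are illustrative assumptions, not the paper's measurements: "fs_a" spends little CPU per operation but issues one device I/O each time, while "fs_b" spends more CPU to batch I/Os. Which one is faster flips as device latency shrinks.

```python
# Illustrative model (not the paper's data) of why file-system rankings
# can invert as storage device latency drops. Per-operation latency is
# modeled as fixed software (CPU) cost plus device I/Os times device latency.
def op_latency_us(cpu_us, ios_per_op, device_latency_us):
    return cpu_us + ios_per_op * device_latency_us

fs_a = dict(cpu_us=2.0, ios_per_op=1.0)   # lean software path, more device I/O
fs_b = dict(cpu_us=10.0, ios_per_op=0.5)  # heavier software path, batched I/O

for dev_us in (5.0, 5000.0):  # persistent-memory-like vs hard-drive-like latency
    a = op_latency_us(device_latency_us=dev_us, **fs_a)
    b = op_latency_us(device_latency_us=dev_us, **fs_b)
    print(f"device latency {dev_us:>7.0f} us: fs_a={a:.1f} us, fs_b={b:.1f} us")
```

On the slow device the batching file system wins; on the fast device its software overhead dominates and the ranking inverts, which is why scalability measured at high device latency says little about the sub-millisecond range.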


Petascale Data Storage Workshop | 2009

pNFS, POSIX, and MPI-IO: a tale of three semantics

Dean Hildebrand; Arifa Nisar; Roger L. Haskin

MPI-IO is emerging as the standard mechanism for file I/O within HPC applications. While pNFS demonstrates high-performance I/O for bulk data transfers, its performance and scalability with MPI-IO is unproven. To attain success, the consistency semantics and interfaces of pNFS, POSIX, and MPI-IO must all be reconciled and efficiently translated. This paper investigates and discusses the challenges of using pNFS to support the consistency semantics of HPC applications.


Operating Systems Review | 2016

A Fast and Slippery Slope for File Systems

Ricardo Santana; Raju Rangaswami; Vasily Tarasov; Dean Hildebrand

There is a vast number and variety of file systems currently available, each optimizing for an ever-growing number of storage devices and workloads. Users have an unprecedented, and somewhat overwhelming, number of data management options. At the same time, the fastest storage devices are only getting faster, and it is unclear how well the existing file systems will adapt. Using emulation techniques, we evaluate five popular Linux file systems across a range of storage device latencies typical of low-end hard drives, the latest high-performance persistent memory block devices, and everything in between. Our findings are often surprising. Depending on the workload, we find that some file systems can clearly scale with faster storage devices much better than others. Further, as storage device latency decreases, we find unexpected performance inversions across file systems. Finally, file system scalability in the higher device latency range is not representative of scalability in the lower, sub-millisecond, latency range. We then focus on Nilfs2 as an especially alarming example of unexpectedly poor scalability and present detailed instructions for identifying bottlenecks in the I/O stack.


ACM Transactions on Storage | 2017

vNFS: Maximizing NFS Performance with Compounds and Vectorized I/O

Ming Chen; Geetika Babu Bangera; Dean Hildebrand; Farhaan Jalia; Geoff Kuenning; Henry Nelson; Erez Zadok

Modern systems use networks extensively, accessing both services and storage across local and remote networks. Latency is a key performance challenge, and packing multiple small operations into fewer large ones is an effective way to amortize that cost, especially after years of significant improvement in bandwidth but not latency. To this end, the NFSv4 protocol supports a compounding feature to combine multiple operations. Yet compounding has been underused since its conception because the synchronous POSIX file-system API issues only one (small) request at a time. We propose vNFS, an NFSv4.1-compliant client that exposes a vectorized high-level API and leverages NFS compound procedures to maximize performance. We designed and implemented vNFS as a user-space RPC library that supports an assortment of bulk operations on multiple files and directories. We found it easy to modify several UNIX utilities, an HTTP/2 server, and Filebench to use vNFS. We evaluated vNFS under a wide range of workloads and network latency conditions, showing that vNFS improves performance even for low-latency networks. On high-latency networks, vNFS can improve performance by as much as two orders of magnitude.
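The latency-amortization argument behind compounding can be made concrete with a round-trip count. This is a back-of-the-envelope sketch, not vNFS's API (which is a user-space RPC library); the function name and all timing parameters are hypothetical.

```python
# Sketch of why packing operations into NFSv4 compounds helps on
# high-latency links: N serial operations cost N network round trips,
# while a compound of size k costs ceil(N/k). Numbers are illustrative.
def total_time_ms(num_ops, rtt_ms, per_op_server_ms, ops_per_compound):
    round_trips = -(-num_ops // ops_per_compound)  # ceiling division
    return round_trips * rtt_ms + num_ops * per_op_server_ms

# Reading 1000 small files over a 50 ms WAN link:
serial = total_time_ms(1000, rtt_ms=50, per_op_server_ms=0.1, ops_per_compound=1)
batched = total_time_ms(1000, rtt_ms=50, per_op_server_ms=0.1, ops_per_compound=100)
print(serial, batched)  # 50100.0 vs 600.0: ~84x faster in this model
```

In this model the gap grows linearly with round-trip time, consistent with the abstract's observation that the largest gains (up to two orders of magnitude) appear on high-latency networks, while low-latency networks still see some benefit.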


International Conference on Big Data | 2014

In unity there is strength: Showcasing a unified big data platform with MapReduce over both object and file storage

Rui Zhang; Dean Hildebrand; Renu Tewari

Big Data platforms often need to support emerging data sources and applications while accommodating existing ones. Since different data and applications have varying requirements, multiple types of data stores (e.g., file-based and object-based) frequently co-exist in the same solution today without proper integration. Hence cross-store data access, key to effective data analytics, cannot be achieved without laborious application re-programming, prohibitively expensive data migration, and/or costly maintenance of multiple data copies. We address this vital issue by introducing the first unified big data platform over heterogeneous storage. In particular, we present a prototype joining Apache Hadoop MapReduce with OpenStack's open-source object store Swift and IBM's cluster file system GPFS™. A sentiment analysis application using 3 months of real Twitter data is employed to test and showcase our prototype. We have found that our prototype achieves 50% data capacity savings, eliminates data migration overhead, and offers stronger reliability and enterprise support. Through our case study, we have learned important theoretical lessons concerning performance and reliability, as well as practical ones related to platform configuration. We have also identified several potentially high-impact research directions.


IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W) | 2017

In Search of the Ideal Storage Configuration for Docker Containers

Vasily Tarasov; Lukas Rupprecht; Dimitris Skourtis; Amit Warke; Dean Hildebrand; Mohamed Mohamed; Nagapramod Mandagere; Wenji Li; Raju Rangaswami; Ming Zhao

Containers are a widely successful technology today, popularized by Docker. Containers improve system utilization by increasing workload density. Docker containers enable seamless deployment of workloads across development, test, and production environments. Docker's unique approach to data management, which involves frequent snapshot creation and removal, presents a new set of exciting challenges for storage systems. At the same time, storage management for Docker containers has remained largely unexplored, with a dizzying array of solution choices and configuration options. In this paper we unravel the multi-faceted nature of Docker storage and demonstrate its impact on system and workload performance. As we uncover new properties of the popular Docker storage drivers, this is a sobering reminder that widespread use of new technologies can often precede their careful evaluation.


Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware | 2015

Towards a Big Data Benchmarking and Demonstration Suite for the Online Social Network Era with Realistic Workloads and Live Data

Rui Zhang; Irene Manotas; Min Li; Dean Hildebrand

The growing popularity of online social networks has taken big data analytics into uncharted territories. Newly developed platforms and analytics in these environments are in dire need of customized frameworks for evaluation and demonstration. This paper presents the first big data benchmark centering on online social network analytics and their underlying distributed platforms. The benchmark comprises a novel data generator rooted in live online social network feeds, a uniquely comprehensive set of online social network analytics workloads, and evaluation metrics that are both system-aware and analytics-aware. In addition, the benchmark also provides application plug-ins that allow for compelling demonstration of big data solutions. We describe the benchmark design challenges, an early prototype, and three use cases.


IBM Journal of Research and Development | 2014

GPFS-based implementation of a hyperconverged system for software defined infrastructure

Alain Azagury; Robert Haas; Dean Hildebrand; Steven W. Hunter; Todd Neville; Sven Oehme; Anees Shaikh

The need for an increasingly dynamic and more cost-efficient data-center infrastructure has led to the adoption of a software defined model that is characterized by: the creation of a federated control plane to judiciously allocate and control appropriate heterogeneous infrastructure resources in an automated fashion; the ability for applications to specify criteria, such as performance, capacity, and service levels, without detailed knowledge of the underlying infrastructure; and the migration of data-plane capabilities previously embodied as purpose-built devices or firmware into software running on standard operating systems in commercial off-the-shelf servers. This last trend of hardware-based capabilities migrating to software is enabling yet another shift to hyperconvergence, which refers to the merger of traditionally separate networking, compute, and storage capabilities in integrated system software. This paper examines the convergence of the software defined infrastructure stack and introduces a hyperconverged compute and storage architecture, in which the IBM General Parallel File System (GPFS®) implements the software defined data plane that dynamically supports workloads ranging from high-I/O virtual desktop infrastructure applications to more compute-oriented analytics applications. The performance and scalability characteristics of this architecture are evaluated with a prototype implementation.

Collaboration


Dive into Dean Hildebrand's collaborations.
