Chris Douglas | Researchain

Publication

Featured research published by Chris Douglas.

symposium on cloud computing | 2014

Reservation-based Scheduling: If You're Late Don't Blame Us!

Carlo Curino; Djellel Eddine Difallah; Chris Douglas; Subru Krishnan; Raghu Ramakrishnan; Sriram Rao

The continuous shift towards data-driven approaches to business and growing attention to improving return on investment (ROI) for cluster infrastructures are generating new challenges for big-data frameworks. Systems originally designed for big batch jobs now handle an increasingly complex mix of computations. Moreover, they are expected to guarantee stringent SLAs for production jobs and minimize latency for best-effort jobs. In this paper, we introduce reservation-based scheduling, a new approach to this problem. We develop our solution around four key contributions: 1) we propose a reservation definition language (RDL) that allows users to declaratively reserve access to cluster resources, 2) we formalize planning of current and future cluster resources as a Mixed-Integer Linear Programming (MILP) problem, and propose scalable heuristics, 3) we adaptively distribute resources between production jobs and best-effort jobs, and 4) we integrate all of this in a scalable system named Rayon, which builds upon Hadoop / YARN. We evaluate Rayon on a 256-node cluster against workloads derived from Microsoft, Yahoo!, Facebook, and Cloudera's clusters. To enable practical use of Rayon, we open-sourced our implementation as part of Apache Hadoop 2.6.
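As a rough illustration of what planning reservations against future cluster capacity can look like, the sketch below places each request at the earliest time window where it fits and leaves the remainder for best-effort jobs. It is a generic first-fit heuristic written for this page, not Rayon's RDL, MILP formulation, or published heuristics; the `Reservation` and `Plan` names, their fields, and the discrete time steps are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reservation:
    """Hypothetical request: `demand` containers for `duration` consecutive
    time steps, scheduled somewhere inside [arrival, deadline)."""
    name: str
    demand: int
    duration: int
    arrival: int
    deadline: int


class Plan:
    """Tracks reserved capacity per time step over a fixed horizon."""

    def __init__(self, capacity: int, horizon: int) -> None:
        self.capacity = capacity
        self.used = [0] * horizon

    def place(self, r: Reservation) -> Optional[int]:
        """First-fit: return the earliest feasible start step, or None."""
        last_start = min(r.deadline, len(self.used)) - r.duration
        for start in range(r.arrival, last_start + 1):
            window = range(start, start + r.duration)
            if all(self.used[t] + r.demand <= self.capacity for t in window):
                for t in window:
                    self.used[t] += r.demand  # commit the reservation
                return start
        return None  # no window fits before the deadline: reject

    def best_effort_capacity(self) -> List[int]:
        """Capacity left over for best-effort jobs at each time step."""
        return [self.capacity - u for u in self.used]


plan = Plan(capacity=10, horizon=6)
start_a = plan.place(Reservation("a", demand=6, duration=3, arrival=0, deadline=6))
start_b = plan.place(Reservation("b", demand=6, duration=2, arrival=0, deadline=6))
start_c = plan.place(Reservation("c", demand=6, duration=2, arrival=0, deadline=5))
print(start_a, start_b, start_c, plan.best_effort_capacity())
```

In this toy run, request "b" is pushed later because "a" already occupies the early steps, and "c" is rejected because no window fits before its deadline. Rejecting at admission time, rather than missing a deadline later, is the kind of guarantee a reservation system aims to give.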

international conference on management of data | 2012

Walnut: a unified cloud object store

Jianjun Chen; Chris Douglas; Michi Mutsuzaki; Patrick Quaid; Raghu Ramakrishnan; Sriram Rao; Russell Sears

Walnut is an object-store being developed at Yahoo! with the goal of serving as a common low-level storage layer for a variety of cloud data management systems including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value serving system). Thus, a key performance challenge is to meet the latency and throughput requirements of the wide range of workloads commonly observed across these diverse systems. The motivation for Walnut is to leverage a carefully optimized low-level storage system, with support for elasticity and high-availability, across all of Yahoo!'s data clouds. This would enable sharing of hardware resources across hitherto siloed clouds of different types, offering greater potential for intelligent load balancing and efficient elastic operation, and simplify the operational tasks related to data storage. In this paper, we discuss the motivation for unifying different storage clouds, describe the requirements of a common storage layer, and present the Walnut design, which uses a quorum-based replication protocol and one-hop direct client access to the data in most regular operations. A unique contribution of Walnut is its hybrid object strategy, which efficiently supports both small and large objects. We present experiments based on both synthetic and real data traces, showing that Walnut works well over a wide range of workloads, and can indeed serve as a common low-level storage layer across a range of cloud systems.

international conference on management of data | 2015

REEF: Retainable Evaluator Execution Framework

Markus Weimer; Yingda Chen; Byung-Gon Chun; Tyson Condie; Carlo Curino; Chris Douglas; Yunseong Lee; Tony Majestro; Dahlia Malkhi; Sergiy Matusevych; Brandon Myers; Shravan M. Narayanamurthy; Raghu Ramakrishnan; Sriram Rao; Russell Sears; Beysim Sezgin; Julia Wang

Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and re-implement common mechanisms (e.g., caching, bulk-data transfers). This paper presents REEF, a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource re-use for data caching, and state management abstractions that greatly ease the development of elastic data processing work-flows on cloud platforms that support a Resource Manager service. REEF is being used to develop several commercial offerings such as the Azure Stream Analytics service. Furthermore, we demonstrate REEF development of a distributed shell application, a machine learning algorithm, and a port of the CORFU [4] system. REEF is also currently an Apache Incubator project (http://reef.incubator.apache.org) that has attracted contributors from several institutions.

international conference on management of data | 2017

Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics

Raghu Ramakrishnan; Baskar Sridharan; John R. Douceur; Pavan Kasturi; Balaji Krishnamachari-Sampath; Karthick Krishnamoorthy; Peng Li; Mitica Manu; Spiro Michaylov; Rogerio Ramos; Neil Sharman; Zee Xu; Youssef Barakat; Chris Douglas; Richard P. Draves; Shrikant S. Naidu; Shankar Shastry; Atul Sikaria; Simon Sun; Ramarathnam Venkatesan

Azure Data Lake Store (ADLS) is a fully-managed, elastic, scalable, and secure file system that supports Hadoop distributed file system (HDFS) and Cosmos semantics. It is specifically designed and optimized for a broad spectrum of Big Data analytics that depend on a very high degree of parallel reads and writes, as well as collocation of compute and data for high bandwidth and low-latency access. It brings together key components and features of Microsoft's Cosmos file system (long used by internal customers at Microsoft) and HDFS, and is a unified file storage solution for analytics on Azure. Internal and external workloads run on this unified platform. Distinguishing aspects of ADLS include its design for handling multiple storage tiers, exabyte scale, and comprehensive security and data sharing features. We present an overview of ADLS architecture, design points, and performance.

ACM Transactions on Computer Systems | 2017

Apache REEF: Retainable Evaluator Execution Framework

Byung-Gon Chun; Tyson Condie; Yingda Chen; Brian Cho; Andrew Chung; Carlo Curino; Chris Douglas; Matteo Interlandi; Beomyeol Jeon; Joo Seong Jeong; Gyewon Lee; Yunseong Lee; Tony Majestro; Dahlia Malkhi; Sergiy Matusevych; Brandon Myers; Mariia Mykhailova; Shravan M. Narayanamurthy; Joseph Noor; Raghu Ramakrishnan; Sriram Rao; Russell Sears; Beysim Sezgin; Taegeon Um; Julia Wang; Markus Weimer; Youngseok Yang

Resource Managers like YARN and Mesos have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault tolerance, task scheduling and coordination) and reimplement common mechanisms (e.g., caching, bulk-data transfers). This article presents REEF, a development framework that provides a control plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource reuse for data caching and state management abstractions that greatly ease the development of elastic data processing pipelines on cloud platforms that support a Resource Manager service. We illustrate the power of REEF by showing applications built atop it: a distributed shell application, a machine-learning framework, a distributed in-memory caching system, and a port of the CORFU system. REEF is currently an Apache top-level project that has attracted contributors from several institutions, and it is being used to develop several commercial offerings such as the Azure Stream Analytics service.

symposium on cloud computing | 2018

Netco: Cache and I/O Management for Analytics over Disaggregated Stores

Virajith Jalaparti; Chris Douglas; Mainak Ghosh; Ashvin Agrawal; Avrilia Floratou; Srikanth Kandula; Ishai Menache; Joseph Naor; Sriram Rao

We consider a common setting where storage is disaggregated from the compute in data-parallel systems. Colocating caching tiers with the compute machines can reduce load on the interconnect, but doing so leads to new resource management challenges. We design a system, Netco, which prefetches data into the cache (based on workload predictability) and appropriately divides the cache space and network bandwidth between the prefetches and serving ongoing jobs. Netco makes various decisions (what content to cache, when to cache, and how to apportion bandwidth) to support end-to-end optimization goals such as maximizing the number of jobs that meet their service-level objectives (e.g., deadlines). Our implementation of these ideas is available within the open-source Apache HDFS project. Experiments on a public cloud, with production-trace-inspired workloads, show that Netco uses up to 5x less remote I/O compared to existing techniques and increases the number of jobs that meet their deadlines by up to 80%.

international conference on data engineering | 2015

Blind men and an elephant: coalescing open-source, academic, and industrial perspectives on BigData

Chris Douglas; Carlo Curino

This tutorial is organized in two parts. In the first half, we will present an overview of applications and services in the BigData ecosystem. We will use known distributed database and systems literature as landmarks to orient the attendees in this fast-evolving space. Throughout, we will contrast models of resource management, performance, and the constraints that shape the architectures of prominent systems. We will also discuss the role of academia and industry in the development of open-source infrastructure, with an emphasis on open problems and strategies for collaboration. We assume only basic familiarity with distributed systems. In the second half, we will delve into Apache Hadoop YARN. YARN (Yet Another Resource Negotiator) transformed Hadoop from a MapReduce engine to a general-purpose cluster scheduler. Since its introduction, it has been deployed in production and extended to support use cases beyond large-scale batch processing. The tutorial will present the active research and development supporting such heterogeneous workloads, with particular attention to multi-tenant scheduling. Topics include security, resource isolation, protocols, and preemption. This portion will be detailed, but accessible to anyone with a background in distributed systems and all attendees of the first half of the tutorial.

symposium on cloud computing | 2013

Apache Hadoop YARN: yet another resource negotiator

Vinod Kumar Vavilapalli; Arun C. Murthy; Chris Douglas; Sharad Agarwal; Mahadev Konar; Robert Evans; Thomas Graves; Jason Lowe; Hitesh Shah; Siddharth Seth; Bikas Saha; Carlo Curino; Owen O'Malley; Sanjay Radia; Benjamin Reed; Eric Baldeschwieler

usenix annual technical conference | 2015

Mercury: hybrid centralized and distributed scheduling in large shared clusters

Konstantinos Karanasos; Sriram Rao; Carlo Curino; Chris Douglas; Kishore R. Chaliparambil; Giovanni Matteo Fumarola; Solom Heddaya; Raghu Ramakrishnan; Sarvesh Sakalanaga

very large data bases | 2013

REEF: retainable evaluator execution framework

Byung-Gon Chun; Tyson Condie; Carlo Curino; Chris Douglas; Sergiy Matusevych; Brandon Myers; Shravan M. Narayanamurthy; Raghu Ramakrishnan; Sriram Rao; Josh Rosen; Russell Sears; Markus Weimer