Russell Sears | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Russell Sears is active.

Explore More

Publication

Featured researches published by Russell Sears.

international conference on datalog in academia and industry | 2010

Dedalus: datalog in time and space

Peter Alvaro; William R. Marczak; Neil Conway; Joseph M. Hellerstein; David Maier; Russell Sears

Recent research has explored using Datalog-based languages to express a distributed system as a set of logical invariants. Two properties of distributed systems proved difficult to model in Datalog. First, the state of any such system evolves with its execution. Second, deductions in these systems may be arbitrarily delayed, dropped, or reordered by the unreliable network links they must traverse. Previous efforts addressed the former by extending Datalog to include updates, key constraints, persistence and events, and the latter by assuming ordered and reliable delivery while ignoring delay. These details have a semantics outside Datalog, which increases the complexity of the language and its interpretation, and forces programmers to think operationally. We argue that the missing component from these previous languages is a notion of time. In this paper we present Dedalus, a foundation language for programming and reasoning about distributed systems. Dedalus reduces to a subset of Datalog with negation, aggregate functions, successor and choice, and adds an explicit notion of logical time to the language. We show that Dedalus provides a declarative foundation for the two signature features of distributed systems: mutable state, and asynchronous processing and communication. Given these two features, we address two important properties of programs in a domain-specific manner: a notion of safety appropriate to non-terminating computations, and stratified monotonic reasoning with negation over time. We also provide conservative syntactic checks for our temporal notions of safety and stratification. Our experience implementing full-featured systems in variants of Datalog suggests that Dedalus is well-suited to the specification of rich distributed services and protocols, and provides both cleaner semantics and richer tests of correctness.

international conference on management of data | 2012

Walnut: a unified cloud object store

Jianjun Chen; Chris Douglas; Michi Mutsuzaki; Patrick Quaid; Raghu Ramakrishnan; Sriram Rao; Russell Sears

Walnut is an object-store being developed at Yahoo! with the goal of serving as a common low-level storage layer for a variety of cloud data management systems including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value serving system). Thus, a key performance challenge is to meet the latency and throughput requirements of the wide range of workloads commonly observed across these diverse systems. The motivation for Walnut is to leverage a carefully optimized low-level storage system, with support for elasticity and high-availability, across all of Yahoo!s data clouds. This would enable sharing of hardware resources across hitherto siloed clouds of different types, offering greater potential for intelligent load balancing and efficient elastic operation, and simplify the operational tasks related to data storage. In this paper, we discuss the motivation for unifying different storage clouds, describe the requirements of a common storage layer, and present the Walnut design, which uses a quorum-based replication protocol and one-hop direct client access to the data in most regular operations. A unique contribution of Walnut is its hybrid object strategy, which efficiently supports both small and large objects. We present experiments based on both synthetic and real data traces, showing that Walnut works well over a wide range of workloads, and can indeed serve as a common low-level storage layer across a range of cloud systems.

Operating Systems Review | 2010

I do declare: consensus in a logic language

Peter Alvaro; Tyson Condie; Neil Conway; Joseph M. Hellerstein; Russell Sears

The Paxos consensus protocol can be specified concisely, but is notoriously difficult to implement in practice. We recount our experience building Paxos in Overlog, a distributed declarative programming language. We found that the Paxos algorithm is easily translated to declarative logic, in large part because the primitives used in consensus protocol specifications map directly to simple Overlog constructs such as aggregation and selection. We discuss the programming idioms that appear frequently in our implementation, and the applicability of declarative programming to related application domains.

international conference on management of data | 2015

REEF: Retainable Evaluator Execution Framework

Markus Weimer; Yingda Chen; Byung-Gon Chun; Tyson Condie; Carlo Curino; Chris Douglas; Yunseong Lee; Tony Majestro; Dahlia Malkhi; Sergiy Matusevych; Brandon Myers; Shravan M. Narayanamurthy; Raghu Ramakrishnan; Sriram Rao; Russell Sears; Beysim Sezgin; Julia Wang

Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and re-implement common mechanisms (e.g., caching, bulk-data transfers). This paper presents REEF, a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource re-use for data caching, and state management abstractions that greatly ease the development of elastic data processing work-flows on cloud platforms that support a Resource Manager service. REEF is being used to develop several commercial offerings such as the Azure Stream Analytics service. Furthermore, we demonstrate REEF development of a distributed shell application, a machine learning algorithm, and a port of the CORFU [4] system. REEF is also currently an Apache Incubator project that has attracted contributors from several instititutions.1 http://reef.incubator.apache.org

international conference on management of data | 2011

A batch of PNUTS: experiences connecting cloud batch and serving systems

Adam Silberstein; Russell Sears; Wenchao Zhou; Brian F. Cooper

Cloud data management systems are growing in prominence, particularly at large Internet companies like Google, Yahoo!, and Amazon, which prize them for their scalability and elasticity. Each of these systems trades off between low-latency serving performance and batch processing throughput. In this paper, we discuss our experience running batch-oriented Hadoop on top of Yahoos serving-oriented PNUTS system instead of the standard HDFS file system. Though PNUTS is optimized for and primarily used for serving, a number of applications at Yahoo! must run batch-oriented jobs that read or write data that is stored in PNUTS. Combining these systems reveals several key areas where the fundamental properties of each system are mismatched. We discuss our approaches to accommodating these mismatches, by either bending the batch and serving abstractions, or inventing new ones. Batch systems like Hadoop provide coarse task-level recovery, while serving systems like PNUTS provide finer record or transaction-level recovery. We combine both types to log record-level errors, while detecting and recovering from large-scale errors. Batch systems optimize for read and write throughput of large requests, while serving systems use indexing to provide low latency access to individual records. To improve latency-insensitive write throughput to PNUTS, we introduce a batch write path. The systems provide conflicting consistency models, and we discuss techniques to isolate them from one another.

ACM Transactions on Computer Systems | 2017

Apache REEF: Retainable Evaluator Execution Framework

Byung-Gon Chun; Tyson Condie; Yingda Chen; Brian Cho; Andrew Chung; Carlo Curino; Chris Douglas; Matteo Interlandi; Beomyeol Jeon; Joo Seong Jeong; Gyewon Lee; Yunseong Lee; Tony Majestro; Dahlia Malkhi; Sergiy Matusevych; Brandon Myers; Mariia Mykhailova; Shravan M. Narayanamurthy; Joseph Noor; Raghu Ramakrishnan; Sriram Rao; Russell Sears; Beysim Sezgin; Taegeon Um; Julia Wang; Markus Weimer; Youngseok Yang

Resource Managers like YARN and Mesos have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault tolerance, task scheduling and coordination) and reimplement common mechanisms (e.g., caching, bulk-data transfers). This article presents REEF, a development framework that provides a control plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource reuse for data caching and state management abstractions that greatly ease the development of elastic data processing pipelines on cloud platforms that support a Resource Manager service. We illustrate the power of REEF by showing applications built atop: a distributed shell application, a machine-learning framework, a distributed in-memory caching system, and a port of the CORFU system. REEF is currently an Apache top-level project that has attracted contributors from several institutions and it is being used to develop several commercial offerings such as the Azure Stream Analytics service.

ACM Transactions on Storage | 2018

Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems

Haryadi S. Gunawi; Riza O. Suminto; Russell Sears; Casey Golliher; Swaminathan Sundararaman; Xing Lin; Tim Emami; Weiguang Sheng; Nematollah Bidokhti; Caitie McCaffrey; Deepthi Srinivasan; Biswaranjan Panda; Andrew Baptist; Gary Grider; Parks Fields; Kevin Harms; Robert B. Ross; Andree Jacobson; Robert Ricci; Kirk Webb; Peter Alvaro; H. Birali Runesha; Mingzhe Hao; Huaicheng Li

Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 14 institutions. We show that all hardware types such as disk, SSD, CPU, memory, and network components can exhibit performance faults. We made several important observations such as faults convert from one form to another, the cascading root causes and impacts can be long, and fail-slow faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.

symposium on cloud computing | 2010