Adam Silberstein | Researchain

Publication

Featured research published by Adam Silberstein.

very large data bases | 2008

PNUTS: Yahoo!'s hosted data serving platform

Brian F. Cooper; Raghu Ramakrishnan; Utkarsh Srivastava; Adam Silberstein; Philip Bohannon; Hans-Arno Jacobsen; Nick Puz; Daniel Weaver; Ramana Yerneni

We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.

international conference on management of data | 2009

Asynchronous view maintenance for VLSD databases

Parag Agrawal; Adam Silberstein; Brian F. Cooper; Utkarsh Srivastava; Raghu Ramakrishnan

The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g., BigTable, Dynamo, and Cassandra), are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive scale distributed database. For web applications, minimizing update latencies is critical, so we advocate deferring the work of maintaining views and indexes as much as possible. We examine the design space, and conclude that two types of view implementations, called remote view tables (RVTs) and local view tables (LVTs), provide a good tradeoff between system throughput and minimizing view staleness. We describe how to construct and maintain such view tables, and how they can be used to implement indexes, group-by-aggregate views, equijoin views and selection views. We also introduce and analyze a consistency model that makes it easier for application developers to cope with the impact of deferred view maintenance. An empirical evaluation quantifies the maintenance costs of our views, and shows that they can significantly reduce the cost of evaluating complex queries.
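As a rough illustration of the deferred maintenance idea in this abstract, the following single-process sketch queues index updates at write time and applies them later, so index reads can be briefly stale. It is not the RVT or LVT design from the paper; the class `DeferredIndex` and its methods are hypothetical names chosen for this example, and every record is assumed to contain the indexed field.

```python
from collections import deque


class DeferredIndex:
    """Toy secondary index whose maintenance is deferred (hypothetical API)."""

    def __init__(self, field):
        self.field = field      # name of the indexed record field
        self.base = {}          # primary key -> record
        self.index = {}         # field value -> set of primary keys
        self.pending = deque()  # queued maintenance tasks

    def put(self, key, record):
        # The base-table write returns immediately; index work is only queued.
        old = self.base.get(key)
        self.base[key] = dict(record)
        self.pending.append((key, old, dict(record)))

    def maintain(self, max_tasks=None):
        # Apply queued tasks in order; returns how many were applied.
        done = 0
        while self.pending and (max_tasks is None or done < max_tasks):
            key, old, new = self.pending.popleft()
            if old is not None:
                self.index.get(old[self.field], set()).discard(key)
            self.index.setdefault(new[self.field], set()).add(key)
            done += 1
        return done

    def lookup(self, value):
        # May return stale results while maintenance tasks are still pending.
        return set(self.index.get(value, set()))
```

A lookup issued between `put` and `maintain` does not yet see the new record, which is the kind of staleness a consistency model for deferred views has to make tolerable for applications.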

international conference on management of data | 2008

Efficient bulk insertion into a distributed ordered table

Adam Silberstein; Brian F. Cooper; Utkarsh Srivastava; Erik Vee; Ramana Yerneni; Raghu Ramakrishnan

We study the problem of bulk-inserting records into tables in a system that horizontally range-partitions data over a large cluster of shared-nothing machines. Each table partition contains a contiguous portion of the table's key range, and must accept all records inserted into that range. Examples of such systems include BigTable [8] at Google and PNUTS [15] at Yahoo!. During bulk inserts into an existing table, if most of the inserted records end up going into a small number of data partitions, the obtained throughput may be very poor due to ineffective use of cluster parallelism. We propose a novel approach in which a planning phase is invoked before the actual insertions. By creating new partitions and intelligently distributing partitions across machines, the planning phase ensures that the insertion load will be well-balanced. Since there is a tradeoff between the cost of moving partitions and the resulting throughput gain, the planning phase must minimize the sum of partition movement time and insertion time. We show that this problem is a variation of NP-hard bin-packing, reduce it to a problem of packing vectors, and then give a solution with provable approximation guarantees. We evaluate our approach on a prototype system deployed on a cluster of 50 machines, and show that it yields significant improvements over more naïve techniques.
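To make the load-balancing goal of the planning phase concrete, here is a minimal sketch that assigns partitions to machines with a simple greedy heuristic (heaviest partition first, onto the currently lightest machine). This is a stand-in technique, not the vector-packing algorithm from the paper, and it ignores partition movement cost entirely. The function names and the input format (a mapping from partition id to expected insert count) are assumptions for illustration.

```python
import heapq


def plan_assignment(partition_load, num_machines):
    """Greedy placement: heaviest partition goes to the lightest machine."""
    # Heap of (current total load, machine id); ties break on machine id.
    heap = [(0, m) for m in range(num_machines)]
    heapq.heapify(heap)
    assignment = {}
    # Visit partitions from heaviest to lightest (ties broken by id).
    ordered = sorted(partition_load.items(), key=lambda kv: (-kv[1], kv[0]))
    for pid, load in ordered:
        total, machine = heapq.heappop(heap)
        assignment[pid] = machine
        heapq.heappush(heap, (total + load, machine))
    return assignment


def machine_loads(partition_load, assignment, num_machines):
    """Sum the expected insert load that lands on each machine."""
    loads = [0] * num_machines
    for pid, machine in assignment.items():
        loads[machine] += partition_load[pid]
    return loads


expected = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
plan = plan_assignment(expected, 2)
print(machine_loads(expected, plan, 2))  # prints [17, 13]
```

A planner closer to the one described in the abstract would also weigh the time spent moving partitions against the insertion time saved, which this sketch leaves out.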

international conference on management of data | 2011

A batch of PNUTS: experiences connecting cloud batch and serving systems

Adam Silberstein; Russell Sears; Wenchao Zhou; Brian F. Cooper

Cloud data management systems are growing in prominence, particularly at large Internet companies like Google, Yahoo!, and Amazon, which prize them for their scalability and elasticity. Each of these systems trades off between low-latency serving performance and batch processing throughput. In this paper, we discuss our experience running batch-oriented Hadoop on top of Yahoo!'s serving-oriented PNUTS system instead of the standard HDFS file system. Though PNUTS is optimized for and primarily used for serving, a number of applications at Yahoo! must run batch-oriented jobs that read or write data that is stored in PNUTS. Combining these systems reveals several key areas where the fundamental properties of each system are mismatched. We discuss our approaches to accommodating these mismatches, by either bending the batch and serving abstractions, or inventing new ones. Batch systems like Hadoop provide coarse task-level recovery, while serving systems like PNUTS provide finer record or transaction-level recovery. We combine both types to log record-level errors, while detecting and recovering from large-scale errors. Batch systems optimize for read and write throughput of large requests, while serving systems use indexing to provide low latency access to individual records. To improve latency-insensitive write throughput to PNUTS, we introduce a batch write path. The systems provide conflicting consistency models, and we discuss techniques to isolate them from one another.

very large data bases | 2009

Adaptively parallelizing distributed range queries

Ymir Vigfusson; Adam Silberstein; Brian F. Cooper; Rodrigo Fonseca

We consider the problem of how to best parallelize range queries in a massive scale distributed database. In traditional systems the focus has been on maximizing parallelism, for example by laying out data to achieve the highest throughput. However, in a massive scale database such as our PNUTS system [11] or BigTable [10], maximizing parallelism is not necessarily the best strategy: the system has more than enough servers to saturate a single client by returning results faster than the client can consume them, and when there are multiple concurrent queries, maximizing parallelism for all of them will cause disk contention, reducing everybody's performance. How can we find the right parallelism level for each query in order to achieve high, consistent throughput for all queries? We propose an adaptive approach with two aspects. First, we adaptively determine the ideal parallelism for a single query execution, which is the minimum number of parallel scanning servers needed to satisfy the client, depending on query selectivity, client load, client-server bandwidth, and so on. Second, we adaptively schedule which servers will be assigned to different query executions, to minimize disk contention on servers and ensure that all queries receive good performance. Our scheduler can be tuned based on different policies, such as favoring short versus long queries or high versus low priority queries. An experimental study demonstrates the effectiveness of our techniques in the PNUTS system.
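The notion of ideal parallelism in this abstract (the minimum number of scanning servers needed to keep the client busy) can be sketched as a one-line calculation. This assumes, as a simplification not taken from the paper, that each server delivers results at a fixed known rate and the client consumes at a fixed known rate; the function name and parameters are hypothetical.

```python
import math


def ideal_parallelism(client_rate, per_server_rate, max_servers):
    """Smallest server count whose combined rate meets the client's rate.

    client_rate:      records/sec the client can consume (assumed known)
    per_server_rate:  records/sec one scanning server returns (assumed known)
    max_servers:      upper bound on servers available to this query
    """
    if per_server_rate <= 0:
        raise ValueError("per_server_rate must be positive")
    needed = math.ceil(client_rate / per_server_rate)
    # Always use at least one server, never more than are available.
    return max(1, min(needed, max_servers))


print(ideal_parallelism(100.0, 30.0, 16))  # prints 4
```

In an adaptive setting both rates would be measured while the query runs and the result recomputed as they change; that feedback loop is omitted here.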

symposium on cloud computing | 2010

Benchmarking cloud serving systems with YCSB

Brian F. Cooper; Adam Silberstein; Erwin Tam; Raghu Ramakrishnan; Russell Sears

international conference on management of data | 2010

Feeding frenzy: selectively materializing users' event feeds

Adam Silberstein; Jeff Terrace; Brian F. Cooper; Raghu Ramakrishnan

usenix annual technical conference | 2008

Automatic optimization of parallel dataflow programs

Christopher Olston; Benjamin Reed; Adam Silberstein; Utkarsh Srivastava

IEEE Data(base) Engineering Bulletin | 2009

Building a Cloud for Yahoo!

Brian F. Cooper; Eric Baldeschwieler; Rodrigo Fonseca; James J. Kistler; P. P. S. Narayan; Chuck Neerdaels; Toby Negrin; Raghu Ramakrishnan; Adam Silberstein; Utkarsh Srivastava; Raymie Stata

Proceedings of the VLDB Endowment | 2011