Ryan Huebsch
University of California, Berkeley
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Ryan Huebsch.
very large data bases | 2003
Ryan Huebsch; Joseph M. Hellerstein; Nick Lanham; Boon Thau Loo; Scott Shenker; Ion Stoica
The database research community prides itself on scalable technologies. Yet database systems traditionally do not excel on one important scalability dimension: the degree of distribution. This limitation has hampered the impact of database technologies on massively distributed systems like the Internet. In this paper, we present the initial design of PIER, a massively distributed query engine based on overlay networks, which is intended to bring database query processing facilities to new, widely distributed environments. We motivate the need for massively distributed queries, and argue for a relaxation of certain traditional database research goals in the pursuit of scalability and widespread adoption. We present simulation results showing PIER gracefully running relational queries across thousands of machines, and show results from the same software base in actual deployment on a large experimental cluster.
international workshop on peer to peer systems | 2002
Matthew Harren; Joseph M. Hellerstein; Ryan Huebsch; Boon Thau Loo; Scott Shenker; Ion Stoica
Recently a new generation of P2P systems, offering distributed hash table (DHT) functionality, have been proposed. These systems greatly improve the scalability and exact-match accuracy of P2P systems, but offer only the exact-match query facility. This paper outlines a research agenda for building complex query facilities on top of these DHT-based P2P systems. We describe the issues involved and outline our research plan and current status.
international workshop on peer to peer systems | 2004
Boon Thau Loo; Ryan Huebsch; Ion Stoica; Joseph M. Hellerstein
Popular P2P file-sharing systems like Gnutella and Kazaa use unstructured network designs. These networks typically adopt flooding-based search techniques to locate files. While flooding-based techniques are effective for locating highly replicated items, they are poorly suited for locating rare items. As an alternative, a wide variety of structured P2P networks such as distributed hash tables (DHTs) have been recently proposed. Structured networks can efficiently locate rare items, but they incur significantly higher overheads than unstructured P2P networks for popular files. Through extensive measurements of the Gnutella network from multiple vantage points, we argue for a hybrid search solution, where structured search techniques are used to index and locate rare items, and flooding techniques are used for locating highly replicated content. To illustrate, we present experimental results of a prototype implementation that runs at multiple sites on PlanetLab and participates live on the Gnutella network.
very large data bases | 2004
Boon Thau Loo; Joseph M. Hellerstein; Ryan Huebsch; Scott Shenker; Ion Stoica
In this paper, we address the problem of designing a scalable, accurate query processor for peer-to-peer filesharing and similar distributed keyword search systems. Using a globally-distributed monitoring infrastructure, we perform an extensive study of the Gnutella filesharing network, characterizing its topology, data and query workloads. We observe that Gnutellas query processing approach performs well for popular content, but quite poorly for rare items with few replicas. We then consider an alternate approach based on Distributed Hash Tables (DHTs). We describe our implementation of PIERSearch, a DHT-based system, and propose a hybrid system where Gnutella is used to locate popular items, and PIERSearch for handling rare items. We develop an analytical model of the two approaches, and use it in concert with our Gnutella traces to study the trade-off between query recall and system overhead of the hybrid system. We evaluate a variety of localized schemes for identifying items that are rare and worth handling via the DHT. Lastly, we show in a live deployment on fifty nodes on two continents that it nicely complements Gnutella in its ability to handle rare items.
international conference on management of data | 2007
Ryan Huebsch; Minos N. Garofalakis; Joseph M. Hellerstein; Ion Stoica
An emerging challenge in modern distributed querying is to efficiently process multiple continuous aggregation queries simultaneously. Processing each query independently may be infeasible, so multi-query optimizations are critical for sharing work across queries. The challenge is to identify overlapping computations that may not be obvious in the queries themselves. In this paper, we reveal new opportunities for sharing work in the context of distributed aggregation queries that vary in their selection predicates. We identify settings in which a large set of q such queries can be answered by executing k << q different queries. The k queries are revealed by analyzing a boolean matrix capturing the connection between data and the queries that they satisfy, in a manner akin to familiar techniques like Gaussian elimination. Indeed, we identify a class of linear aggregate functions (including SUM, COUNT and AVERAGE), and show that the sharing potential for such queries can be optimally recovered using standard matrix decompositions from computational linear algebra. For some other typical aggregation functions (including MIN and MAX) we find that optimal sharing maps to the NP-hard set basis problem. However, for those scenarios, we present a family of heuristic algorithms and demonstrate that they perform well for moderate-sized matrices. We also present a dynamic distributed system architecture to exploit sharing opportunities, and experimentally evaluate the benefits of our techniques via a novel, flexible random workload generator we develop for this setting.
international conference on management of data | 2004
Brent N. Chun; Joseph M. Hellerstein; Ryan Huebsch; Shawn R. Jeffery; Boon Thau Loo; Sam Mardanbeigi; Timothy Roscoe; Sean Rhea; Scott Shenker; Ion Stoica
We are developing a distributed query processor called PIER, which is designed to run on the scale of the entire Internet. PIER utilizes a Distributed Hash Table (DHT) as its communication substrate in order to achieve scalability, reliability, decentralized control, and load balancing. PIER enhances DHTs with declarative and algebraic query interfaces, and underneath those interfaces implements multihop, in-network versions of joins, aggregation, recursion, and query/result dissemination. PIER is currently being used for diverse applications, including network monitoring, keyword-based filesharing search, and network topology mapping. We will demonstrate PIERs functionality by showing system monitoring queries running on PlanetLab, a testbed of over 300 machines distributed across the globe.
conference on innovative data systems research | 2005
Ryan Huebsch; Brent N. Chun; Joseph M. Hellerstein; Boon Thau Loo; Petros Maniatis; Timothy Roscoe; Scott Shenker; Ion Stoica; Aydan R. Yumerefendi
Archive | 2010
Prashanth Pappu; Asad K. Awan; Aditya Ganjam; Ryan Huebsch
Archive | 2003
Boon Thau Loo; Ryan Huebsch; Joseph M. Hellerstein; Timothy Roscoe; Ion Stoica
Archive | 2003
Ryan Huebsch; Shawn R. Jeffery