Tyson Condie

University of California, Los Angeles

Publication

Featured research published by Tyson Condie.

symposium on operating systems principles | 2005

Implementing declarative overlays

Boon Thau Loo; Tyson Condie; Joseph M. Hellerstein; Petros Maniatis; Timothy Roscoe; Ion Stoica

Overlay networks are used today in a variety of distributed systems ranging from file-sharing and storage systems to communication infrastructures. However, designing, building and adapting these overlays to the intended application and the target environment is a difficult and time-consuming process. To ease the development and the deployment of such overlay networks we have implemented P2, a system that uses a declarative logic language to express overlay networks in a highly compact and reusable form. P2 can express a Narada-style mesh network in 16 rules, and the Chord structured overlay in only 47 rules. P2 directly parses and executes such specifications using a dataflow architecture to construct and maintain overlay networks. We describe the P2 approach, how our implementation works, and show by experiment its promising trade-off point between specification complexity and performance.

acm special interest group on data communication | 2006

ROFL: routing on flat labels

Matthew Caesar; Tyson Condie; Jayanthkumar Kannan; Karthik Lakshminarayanan; Ion Stoica; Scott Shenker

It is accepted wisdom that the current Internet architecture conflates network locations and host identities, but there is no agreement on how a future architecture should distinguish the two. One could sidestep this quandary by routing directly on host identities themselves, and eliminating the need for network-layer protocols to include any mention of network location. The key to achieving this is the ability to route on flat labels. In this paper we take an initial stab at this challenge, proposing and analyzing our ROFL routing algorithm. While its scaling and efficiency properties are far from ideal, our results suggest that the idea of routing on flat labels cannot be immediately dismissed.

international conference on management of data | 2006

Declarative networking: language, execution and optimization

Boon Thau Loo; Tyson Condie; Minos N. Garofalakis; Joseph M. Hellerstein; Petros Maniatis; Raghu Ramakrishnan; Timothy Roscoe; Ion Stoica

The networking and distributed systems communities have recently explored a variety of new network architectures, both for application-level overlay networks, and as prototypes for a next-generation Internet architecture. In this context, we have investigated declarative networking: the use of a distributed recursive query engine as a powerful vehicle for accelerating innovation in network architectures [23, 24, 33]. Declarative networking represents a significant new application area for database research on recursive query processing. In this paper, we address fundamental database issues in this domain. First, we motivate and formally define the Network Datalog (NDlog) language for declarative network specifications. Second, we introduce and prove correct relaxed versions of the traditional semi-naïve query evaluation technique, to overcome fundamental problems of the traditional technique in an asynchronous distributed setting. Third, we consider the dynamics of network state, and formalize the "eventual consistency" of our programs even when bursts of updates can arrive in the midst of query execution. Fourth, we present a number of query optimization opportunities that arise in the declarative networking context, including applications of traditional techniques as well as new optimizations. Last, we present evaluation results of the above ideas implemented in our P2 declarative networking system, running on 100 machines over the Emulab network testbed.
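
For readers unfamiliar with the semi-naïve evaluation mentioned in this abstract, the following is a minimal single-machine sketch of the traditional technique, applied to a reachability query. It is not the relaxed, distributed variant the paper introduces, and it is not NDlog or P2 code; the function name `semi_naive_reachable` and the sample `links` relation are hypothetical and chosen only for illustration.

```python
def semi_naive_reachable(links):
    """Compute path(x, z) from link(x, y) facts with semi-naive evaluation.

    Rules being evaluated (Datalog-style, illustrative only):
        path(x, y) :- link(x, y).
        path(x, z) :- link(x, y), path(y, z).
    """
    path = set(links)    # every fact derived so far
    delta = set(links)   # facts that were new in the previous round
    while delta:
        # Join the base relation only with the *new* facts, so work done in
        # earlier rounds is not repeated.
        derived = {(x, z) for (x, y) in links for (y2, z) in delta if y == y2}
        delta = derived - path   # keep only facts not seen before
        path |= delta
    return path


# Hypothetical three-hop chain a -> b -> c -> d.
links = {("a", "b"), ("b", "c"), ("c", "d")}
paths = semi_naive_reachable(links)
```

The loop stops once a round produces no new facts. This sketch assumes rounds are synchronized, which is the assumption the abstract says is problematic in an asynchronous distributed setting.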

international world wide web conferences | 2005

LSH forest: self-tuning indexes for similarity search

Mayank Bawa; Tyson Condie; Prasanna Ganesan

We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of settings: Web search engines desire fast, parallel, main-memory-based indexes for similarity search on text data; database systems desire disk-based similarity indexes for high-dimensional data, including text and images; peer-to-peer systems desire distributed similarity indexes with low communication cost. We propose an indexing scheme called LSH Forest which is applicable in all the above contexts. Our index uses the well-known technique of locality-sensitive hashing (LSH), but improves upon previous designs by (a) eliminating the different data-dependent parameters for which LSH must be constantly hand-tuned, and (b) improving on LSH's performance guarantees for skewed data distributions while retaining the same storage and query overhead. We show how to construct this index in main memory, on disk, in parallel systems, and in peer-to-peer systems. We evaluate the design with experiments on multiple text corpora and demonstrate both the self-tuning nature and the superior performance of LSH Forest.
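
As a rough illustration of how an LSH-based index might avoid a hand-tuned hash length, the sketch below gives every item a long bit label per hash function and lets the query shorten the matched prefix until enough candidates are found. This is a toy under stated assumptions, not the paper's construction: the class `ToyLSHForest`, its parameters, the random-hyperplane hash, and the flat-list scan are all hypothetical simplifications.

```python
import random


def make_hash(dim, max_bits, rng):
    """Build one hash: max_bits random hyperplanes, one bit per hyperplane."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(max_bits)]

    def label(v):
        # Bit is "1" when v lies on the non-negative side of the hyperplane.
        return "".join(
            "1" if sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 else "0"
            for p in planes
        )

    return label


class ToyLSHForest:
    """Toy index: one flat list of (label, id) pairs per hash function."""

    def __init__(self, dim, num_trees=5, max_bits=16, seed=0):
        rng = random.Random(seed)
        self.max_bits = max_bits
        self.hashes = [make_hash(dim, max_bits, rng) for _ in range(num_trees)]
        self.trees = [[] for _ in range(num_trees)]

    def add(self, item_id, v):
        for h, tree in zip(self.hashes, self.trees):
            tree.append((h(v), item_id))

    def query(self, v, m):
        """Return candidate ids, shortening the prefix until at least m match."""
        labels = [h(v) for h in self.hashes]
        found = set()
        for length in range(self.max_bits, -1, -1):
            found = set()
            for q, tree in zip(labels, self.trees):
                found.update(i for lab, i in tree if lab[:length] == q[:length])
            if len(found) >= m:
                break
        return found


forest = ToyLSHForest(dim=2)
forest.add("a", [1.0, 0.0])
forest.add("b", [0.0, 1.0])
forest.add("c", [-1.0, 0.0])
nearest = forest.query([1.0, 0.0], m=1)
everything = forest.query([1.0, 0.0], m=10)
```

The prefix length is chosen per query rather than fixed when the index is built. A real index would replace the linear scans with a prefix-ordered structure and rank the candidates by true similarity afterwards.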

Communications of The ACM | 2009

Declarative networking

Boon Thau Loo; Tyson Condie; Minos N. Garofalakis; Joseph M. Hellerstein; Petros Maniatis; Raghu Ramakrishnan; Timothy Roscoe; Ion Stoica

Declarative Networking is a programming methodology that enables developers to concisely specify network protocols and services, which are directly compiled to a dataflow framework that executes the specifications. This paper provides an introduction to basic issues in declarative networking, including language design, optimization, and dataflow execution. We present the intuition behind declarative programming of networks, including roots in Datalog, extensions for networked environments, and the semantics of long-running queries over network state. We focus on a sublanguage we call Network Datalog (NDlog), including execution strategies that provide crisp eventual consistency semantics with significant flexibility in execution. We also describe a more general language called Overlog, which makes some compromises between expressive richness and semantic guarantees. We provide an overview of declarative network protocols, with a focus on routing protocols and overlay networks. Finally, we highlight related work in declarative networking, and new declarative approaches to related problems.

international conference on peer-to-peer computing | 2004

Adaptive peer-to-peer topologies

Tyson Condie; Sepandar D. Kamvar; Hector Garcia-Molina

We present a peer-level protocol for forming adaptive, self-organizing topologies for data-sharing P2P networks. This protocol is based on the idea that a peer should directly connect to those peers from which it is most likely to download satisfactory content. We show that the resulting topologies are more efficient than standard Gnutella topologies. Furthermore, we show that these adaptive topologies have the added benefits of increased resistance to certain types of attacks, intrinsic rewards for active peers and punishments for malicious peers and free riders.

very large data bases | 2008

Evita raced: metacompilation for declarative networks

Tyson Condie; David Chu; Joseph M. Hellerstein; Petros Maniatis

Declarative languages have recently been proposed for many new applications outside of traditional data management. Since these are relatively early research efforts, it is important that the architectures of these declarative systems be extensible, in order to accommodate unforeseen needs in these new domains. In this paper, we apply the lessons of declarative systems to the internals of a declarative engine. Specifically, we describe our design and implementation of Evita Raced, an extensible compiler for the OverLog language used in our declarative networking system, P2. Evita Raced is a metacompiler: an OverLog compiler written in OverLog. We describe the minimalist architecture of Evita Raced, including its extensibility interfaces and its reuse of P2's data model and runtime engine. We demonstrate that a declarative language like OverLog is well-suited to expressing traditional and novel query optimizations as well as other query manipulations, in a compact and natural fashion. Finally, we present initial results of Evita Raced extended with various optimization programs, running on both Internet overlay networks and wireless sensor networks.

very large data bases | 2014

Pregelix: Big(ger) graph analytics on a dataflow engine

Yingyi Bu; Vinayak R. Borkar; Jianfeng Jia; Michael J. Carey; Tyson Condie

There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by process-centric, message passing designs that many graph processing systems follow. Pregelix is a new open source distributed graph processing system that is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15X speedup compared to Apache Giraph and up to 35X speedup compared to distributed GraphLab), and more effective use of available machine resources to support Big(ger) Graph Analytics.

very large data bases | 2015

Titian: data provenance support in Spark

Matteo Interlandi; Kshitij Shah; Sai Deep Tetali; Muhammad Ali Gulzar; Seunghyun Yoo; Miryung Kim; Todd D. Millstein; Tyson Condie

Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time consuming effort. Today’s DISC systems offer very little tooling for debugging programs, and as a result programmers spend countless hours collecting evidence (e.g., from log files) and performing trial and error debugging. To aid this effort, we built Titian, a library that enables data provenance—tracking data through transformations—in Apache Spark. Data scientists using the Titian Spark extension will be able to quickly identify the input data at the root cause of a potential bug or outlier result. Titian is built directly into the Spark platform and offers data provenance support at interactive speeds—orders-of-magnitude faster than alternative solutions—while minimally impacting Spark job performance; observed overheads for capturing data lineage rarely exceed 30% above the baseline job execution time.
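
The idea of data provenance described in this abstract, tracing an output back to the inputs that produced it, can be sketched in a few lines of plain Python. The example below does not use Spark or Titian's API; the helpers `tag`, `flat_map`, and `count_by_key` and the sample `lines` are hypothetical, and the sketch ignores the performance concerns the abstract emphasizes.

```python
from collections import defaultdict


def tag(records):
    """Attach a lineage set (the input position) to every input record."""
    return [(value, frozenset([pos])) for pos, value in enumerate(records)]


def flat_map(fn, tagged):
    """Apply fn to each value; every output inherits its input's lineage."""
    return [(out, lineage) for value, lineage in tagged for out in fn(value)]


def count_by_key(tagged):
    """Count occurrences per key and union the lineage of all contributors."""
    counts = defaultdict(int)
    lineage = defaultdict(frozenset)
    for key, src in tagged:
        counts[key] += 1
        lineage[key] |= src
    return {key: (counts[key], lineage[key]) for key in counts}


# Hypothetical word count over three input lines.
lines = ["spark titian", "titian lineage", "spark spark"]
result = count_by_key(flat_map(str.split, tag(lines)))

# Trace the count for "spark" back to the input lines that contributed to it.
spark_count, spark_sources = result["spark"]
suspect_lines = [lines[pos] for pos in sorted(spark_sources)]
```

Here `suspect_lines` holds the input lines behind the `"spark"` count, which is the kind of backward trace a provenance-aware debugger would offer for an outlier result.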

international conference on management of data | 2013

Machine learning for big data

Tyson Condie; Paul Mineiro; Neoklis Polyzotis; Markus Weimer

Statistical Machine Learning has undergone a phase transition from a pure academic endeavor to being one of the main drivers of modern commerce and science. Even more so, recent results such as those on tera-scale learning [1] and on very large neural networks [2] suggest that scale is an important ingredient in quality modeling. This tutorial introduces current applications, techniques and systems with the aim of cross-fertilizing research between the database and machine learning communities. The tutorial covers current large scale applications of Machine Learning, their computational model and the workflow behind building those. Based on this foundation, we present the current state-of-the-art in systems support in the bulk of the tutorial. We also identify critical gaps in the state-of-the-art. This leads to the closing of the seminar, where we introduce two sets of open research questions: Better systems support for the already established use cases of Machine Learning and support for recent advances in Machine Learning research.