Shawn R. Jeffery
University of California, Berkeley
Publication
Featured research published by Shawn R. Jeffery.
international conference on pervasive computing | 2006
Shawn R. Jeffery; Gustavo Alonso; Michael J. Franklin; Wei Hong; Jennifer Widom
Pervasive applications rely on data captured from the physical world through sensor devices. Data provided by these devices, however, tend to be unreliable. The data must, therefore, be cleaned before an application can make use of them, leading to additional complexity for application development and deployment. Here we present Extensible Sensor stream Processing (ESP), a framework for building sensor data cleaning infrastructures for use in pervasive applications. ESP is designed as a pipeline using declarative cleaning mechanisms based on spatial and temporal characteristics of sensor data. We demonstrate ESP's effectiveness and ease of use through three real-world scenarios.
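As a rough illustration of cleaning based on temporal characteristics, the sketch below smooths a stream of readings with a trailing-window median so that an isolated spike is suppressed. It is only a minimal sketch: ESP itself is described as declarative and query-based, and the abstract does not specify its operators, so the function name `smooth_stream`, the window size, and the choice of a median are all assumptions made for illustration.

```python
from collections import deque
from statistics import median


def smooth_stream(readings, window=3):
    """Replace each reading with the median of the trailing window.

    Hypothetical helper for illustration; not an ESP operator.
    """
    buffer = deque(maxlen=window)  # keeps only the most recent `window` readings
    cleaned = []
    for value in readings:
        buffer.append(value)
        cleaned.append(median(buffer))
    return cleaned


# A temperature-like stream with one spurious spike (95.0).
raw = [20.0, 20.5, 95.0, 21.0, 20.8]
cleaned = smooth_stream(raw, window=3)
```

A median is used here instead of a mean because a single outlier inside the window cannot pull the output far from the neighbouring values.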
international conference on management of data | 2008
Shawn R. Jeffery; Michael J. Franklin; Alon Y. Halevy
A primary challenge to large-scale data integration is creating semantic equivalences between elements from different data sources that correspond to the same real-world entity or concept. Dataspaces propose a pay-as-you-go approach: automated mechanisms such as schema matching and reference reconciliation provide initial correspondences, termed candidate matches, and then user feedback is used to incrementally confirm these matches. The key to this approach is to determine in what order to solicit user feedback for confirming candidate matches. In this paper, we develop a decision-theoretic framework for ordering candidate matches for user confirmation using the concept of the value of perfect information (VPI). At the core of this concept is a utility function that quantifies the desirability of a given state; thus, we devise a utility function for dataspaces based on query result quality. We show in practice how to efficiently apply VPI in concert with this utility function to order user confirmations. A detailed experimental evaluation on both real and synthetic datasets shows that the ordering of user feedback produced by this VPI-based approach yields a dataspace with a significantly higher utility than a wide range of other ordering strategies. Finally, we outline the design of Roomba, a system that utilizes this decision-theoretic framework to guide a dataspace in soliciting user feedback in a pay-as-you-go manner.
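The ordering idea can be sketched with a toy utility function. The example assumes each candidate match has a confidence (the estimated probability that it is correct) and a hypothetical `query_weight` standing in for how much of the query workload it affects; using a correct match gains `query_weight`, using an incorrect one loses it, and ignoring a match yields zero. None of these names or the utility model come from the paper, whose utility function is based on query result quality. Under these assumptions, the value of perfect information for a match is the expected utility after the user confirms or rejects it minus the best expected utility achievable without asking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMatch:
    name: str
    confidence: float    # estimated probability that the match is correct
    query_weight: float  # hypothetical: how much query quality depends on it


def utility_without_feedback(m: CandidateMatch) -> float:
    # Best of two actions: use the match (gain if right, loss if wrong) or ignore it.
    use_match = m.query_weight * (2 * m.confidence - 1)
    return max(0.0, use_match)


def utility_with_feedback(m: CandidateMatch) -> float:
    # After confirmation the match is used only when it is actually correct.
    return m.confidence * m.query_weight


def vpi(m: CandidateMatch) -> float:
    # Value of perfect information under the toy utility above.
    return utility_with_feedback(m) - utility_without_feedback(m)


def order_for_confirmation(candidates):
    # Ask the user about the highest-VPI candidates first.
    return sorted(candidates, key=vpi, reverse=True)


candidates = [
    CandidateMatch("author ~ writer", confidence=0.9, query_weight=10.0),
    CandidateMatch("title ~ name", confidence=0.5, query_weight=4.0),
    CandidateMatch("venue ~ location", confidence=0.2, query_weight=20.0),
]
ordered = [m.name for m in order_for_confirmation(candidates)]
```

With this toy model the VPI reduces to `query_weight * min(confidence, 1 - confidence)`, so feedback is requested first for matches that are both uncertain and heavily used.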
international conference on data engineering | 2006
Shawn R. Jeffery; Gustavo Alonso; Michael J. Franklin; Wei Hong; Jennifer Widom
Data captured from the physical world through sensor devices tends to be noisy and unreliable. The data cleaning process for such data is not easily handled by standard data warehouse-oriented techniques, which do not take into account the strong temporal and spatial components of receptor data. We present Extensible receptor Stream Processing (ESP), a declarative query-based framework designed to clean the data streams produced by sensor devices.
very large data bases | 2003
Leonidas Galanis; Yuan Wang; Shawn R. Jeffery; David J. DeWitt
Querying large numbers of data sources is gaining importance due to increasing numbers of independent data providers. One of the key challenges is executing queries on all relevant information sources in a scalable fashion and retrieving fresh results. The key to scalability is to send queries only to the relevant servers and avoid wasting resources on data sources which will not provide any results. Thus, a catalog service, which would determine the relevant data sources given a query, is an essential component in efficiently processing queries in a distributed environment. This paper proposes a catalog framework which is distributed across the data sources themselves and does not require any central infrastructure. As new data sources become available, they automatically become part of the catalog service infrastructure, which allows scalability to large numbers of nodes. Furthermore, we propose techniques for workload adaptability. Using simulation and real-world data we show that our approach is valid and can scale to thousands of data sources.
very large data bases | 2008
Shawn R. Jeffery; Michael J. Franklin; Minos N. Garofalakis
Sensor devices produce data that are unreliable, low-level, and seldom able to be used directly by applications. In this paper, we propose metaphysical data independence (MDI), a layer of independence that shields applications from the challenges that arise when interacting directly with sensor devices. The key philosophy behind MDI is that applications do not deal with any aspect of physical device data, but rather interface with a high-level reconstruction of the physical world created by a sensor infrastructure. As a concrete instantiation of MDI in such a sensor infrastructure, we detail MDI-SMURF, a Radio Frequency Identification (RFID) middleware system that alleviates issues associated with using RFID data through adaptive techniques based on a novel statistical framework.
international conference on management of data | 2005
Shariq Rizvi; Shawn R. Jeffery; Sailesh Krishnamurthy; Michael J. Franklin; Nathan Burkhart; Anil Edakkunni; Linus Liang
The emergence of large-scale receptor-based systems has enabled applications to execute complex business logic over data generated from monitoring the physical world. An important functionality required by these applications is the detection of and response to complex events, often in real-time. Bridging the gap between low-level receptor technology and such high-level needs of applications remains a significant challenge. We demonstrate our solution to this problem in the context of HiFi, a system we are building to solve the data management problems of large-scale receptor-based systems. Specifically, we show how HiFi generates simple events out of receptor data at its edges and provides high-functionality complex event processing mechanisms for sophisticated event detection using a real-world library scenario.
conference on advanced information systems engineering | 2003
Leonidas Galanis; Yuan Wang; Shawn R. Jeffery; David J. DeWitt
While current search engines seem to easily handle the size of the data available on the Internet, they cannot provide fresh results. The most up-to-date data always resides on the data sources. Efficiently interconnecting data providers, however, is not an easy problem. Peer-to-peer computing is the latest technology to address this problem. However, efficient query processing in peer-to-peer networks remains an open research area. In this paper, we present a performance study of a system that facilitates efficient searches of large numbers of independent data providers on the Internet. In our scenario, each data provider becomes an autonomous node in a large peer-to-peer system. Using small indices on each node, we can efficiently direct queries submitted on any node to the relevant sources. Experiments with a large peer-to-peer network demonstrate the feasibility of our approach.
very large data bases | 2004
Owen Cooper; Anil Edakkunni; Michael J. Franklin; Wei Hong; Shawn R. Jeffery; Sailesh Krishnamurthy; Fredrick Reiss; Shariq Rizvi; Eugene Wu
Advances in data acquisition and sensor technologies are leading towards the development of “High Fan-in” architectures: widely distributed systems whose edges consist of numerous receptors such as sensor networks and RFID readers and whose interior nodes consist of traditional host computers organized using the principle of successive aggregation. Such architectures pose significant new data management challenges. The HiFi system, under development at UC Berkeley, is aimed at addressing these challenges. We demonstrate an initial prototype of HiFi that uses data stream query processing to acquire, filter, and aggregate data from multiple devices including sensor motes, RFID readers, and low power gateways organized as a High Fan-in system.
international conference on management of data | 2004
Brent N. Chun; Joseph M. Hellerstein; Ryan Huebsch; Shawn R. Jeffery; Boon Thau Loo; Sam Mardanbeigi; Timothy Roscoe; Sean Rhea; Scott Shenker; Ion Stoica
We are developing a distributed query processor called PIER, which is designed to run on the scale of the entire Internet. PIER utilizes a Distributed Hash Table (DHT) as its communication substrate in order to achieve scalability, reliability, decentralized control, and load balancing. PIER enhances DHTs with declarative and algebraic query interfaces, and underneath those interfaces implements multihop, in-network versions of joins, aggregation, recursion, and query/result dissemination. PIER is currently being used for diverse applications, including network monitoring, keyword-based filesharing search, and network topology mapping. We will demonstrate PIER's functionality by showing system monitoring queries running on PlanetLab, a testbed of over 300 machines distributed across the globe.
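To give a feel for how a hash-based substrate can support a join, the sketch below routes tuples from two relations to nodes by hashing the join key, then joins locally at each node. It is a single-process simulation under stated assumptions: the node names, `node_for`, and `rehash_join` are hypothetical, and simple modulo placement stands in for a real DHT's routing. It is not PIER's implementation.

```python
import hashlib
from collections import defaultdict


def node_for(key, nodes):
    """Pick the node responsible for a key (modulo placement, not a real DHT)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def rehash_join(left, right, nodes):
    """Equi-join two lists of (key, value) tuples by rehashing on the key."""
    # Step 1: every tuple is sent to the node that owns its join key.
    buckets = {n: ([], []) for n in nodes}
    for key, value in left:
        buckets[node_for(key, nodes)][0].append((key, value))
    for key, value in right:
        buckets[node_for(key, nodes)][1].append((key, value))

    # Step 2: each node joins only the tuples it received.
    results = []
    for n in nodes:
        local_left, local_right = buckets[n]
        index = defaultdict(list)
        for key, value in local_left:
            index[key].append(value)
        for key, value in local_right:
            for left_value in index.get(key, []):
                results.append((key, left_value, value))
    return results


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
left = [("a", 1), ("b", 2)]
right = [("a", "x"), ("c", "y"), ("a", "z")]
joined = sorted(rehash_join(left, right, nodes))
```

Because matching keys always hash to the same node, no node needs to see the whole of either relation.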
very large data bases | 2006
Shawn R. Jeffery; Minos N. Garofalakis; Michael J. Franklin