Julie Letchner | Researchain

Publication

Featured researches published by Julie Letchner.

international conference on management of data | 2008

Event queries on correlated probabilistic streams

Christopher Ré; Julie Letchner; Magdalena Balazinska; Dan Suciu

A major problem in detecting events in streams of data is that the data can be imprecise (e.g., RFID data). However, current state-of-the-art event detection systems such as Cayuga [14], SASE [46], or SnoopIB [1] assume the data is precise. Noise in the data can be captured using techniques such as hidden Markov models. Inference on these models creates streams of probabilistic events, which cannot be directly queried by existing systems. To address this challenge we propose Lahar, an event processing system for probabilistic event streams. By exploiting the probabilistic nature of the data, Lahar yields a much higher recall and precision than deterministic techniques operating over only the most probable tuples. By using a novel static analysis and novel algorithms, Lahar processes data orders of magnitude more efficiently than a naïve approach based on sampling. In this paper, we present Lahar's static analysis and core algorithms. We demonstrate the quality and performance of our approach through experiments with our prototype implementation and comparisons with alternate methods.

SAE World Congress & Exhibition | 2007

Map Matching with Travel Time Constraints

John Krumm; Eric Horvitz; Julie Letchner

Map matching determines which road a vehicle is on based on inaccurate measured locations, such as GPS points. Simple algorithms, such as nearest road matching, fail often. We introduce a new algorithm that finds a sequence of road segments which simultaneously match the measured locations and which are traversable in the time intervals associated with the measurements. The time constraint, implemented with a hidden Markov model, greatly reduces the errors made by nearest road matching. We trained and tested the new algorithm on data taken from a large pool of real drivers.
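As a rough illustration of the idea in this abstract, the sketch below runs a Viterbi search in which a transition between two road segments is allowed only if it can be driven within the time elapsed between measurements. It is a simplified stand-in under stated assumptions, not the authors' algorithm: the segment names, distances, travel times, the Gaussian emission score, and the hard feasibility rule are all hypothetical.

```python
import math

def nearest_road(observations):
    """Baseline: pick the closest segment for each measurement independently."""
    return [min(dists, key=dists.get) for _, dists in observations]

def match_with_time_constraint(observations, travel_time, sigma=10.0):
    """Viterbi search over segments with a hard travel-time feasibility check.

    observations: list of (timestamp_seconds, {segment: distance_metres}).
    travel_time:  {(from_segment, to_segment): minimum_seconds}; missing
                  pairs are treated as unreachable (an assumption).
    """
    segments = list(observations[0][1])

    def log_emit(distance):
        # Gaussian log-likelihood of the measurement error, up to a constant.
        return -0.5 * (distance / sigma) ** 2

    t_prev, dists = observations[0]
    score = {s: log_emit(dists[s]) for s in segments}
    back = []
    for t, dists in observations[1:]:
        dt = t - t_prev
        new_score, pointers = {}, {}
        for s in segments:
            # Only predecessors reachable within the elapsed time are allowed.
            feasible = [p for p in segments
                        if travel_time.get((p, s), math.inf) <= dt]
            best = max(feasible, key=lambda p: score[p], default=None)
            if best is None:
                new_score[s], pointers[s] = -math.inf, None
            else:
                new_score[s] = score[best] + log_emit(dists[s])
                pointers[s] = best
        back.append(pointers)
        score, t_prev = new_score, t

    last = max(score, key=score.get)
    if score[last] == -math.inf:
        raise ValueError("no segment sequence satisfies the time constraints")
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical data: segment C is 60 s away from A, but the noisy second
# measurement happens to fall closest to C.
travel_time = {
    ("A", "A"): 0, ("B", "B"): 0, ("C", "C"): 0,
    ("A", "B"): 5, ("B", "A"): 5, ("B", "C"): 5, ("C", "B"): 5,
    ("A", "C"): 60, ("C", "A"): 60,
}
observations = [
    (0,  {"A": 2,  "B": 30, "C": 50}),
    (5,  {"A": 20, "B": 15, "C": 10}),
    (10, {"A": 30, "B": 3,  "C": 25}),
]
print(nearest_road(observations))                             # ['A', 'C', 'B']
print(match_with_time_constraint(observations, travel_time))  # ['A', 'B', 'B']
```

In this toy trace, nearest-road matching jumps to segment C at the second measurement, while the time-constrained search rejects that jump because C cannot be reached from A in 5 seconds.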

international conference on mobile systems, applications, and services | 2008

Cascadia: A System for Specifying, Detecting, and Managing RFID Events

Evan Welbourne; Nodira Khoussainova; Julie Letchner; Yang Li; Magdalena Balazinska; Gaetano Borriello; Dan Suciu

Cascadia is a system that provides RFID-based pervasive computing applications with an infrastructure for specifying, extracting and managing meaningful high-level events from raw RFID data. Cascadia provides three important services. First, it allows application developers and even users to specify events using either a declarative query language or an intuitive visual language based on direct manipulation. Second, it provides an API that facilitates the development of applications which rely on RFID-based events. Third, it automatically detects the specified events, forwards them to registered applications and stores them for later use (e.g., for historical queries). We present the design and implementation of Cascadia along with an evaluation that includes both a user study and measurements on traces collected in a building-wide RFID deployment. To demonstrate how Cascadia facilitates application development, we built a simple digital diary application in the form of a calendar that populates itself with RFID-based events. Cascadia copes with ambiguous RFID data and limitations in an RFID deployment by transforming RFID readings into probabilistic events. We show that this approach outperforms deterministic event detection techniques while avoiding the need to specify and train sophisticated models.

international conference on data engineering | 2009

Access Methods for Markovian Streams

Julie Letchner; Christopher Ré; Magdalena Balazinska; Matthai Philipose

Model-based views have recently been proposed as an effective method for querying noisy sensor data. Commonly used models from the AI literature (e.g., the hidden Markov model) expose to applications a stream of probabilistic and correlated state estimates computed from the sensor data. Many applications want to detect sophisticated patterns of states from these Markovian streams. Such queries are called event queries. In this paper, we present a new Markovian stream storage manager, Caldera. We develop and evaluate Caldera as a component of Lahar, a Markovian stream event query processing system developed in previous work. At the heart of Caldera is a set of access methods for Markovian streams that can improve event query performance by orders of magnitude compared to existing techniques, which must scan the entire stream. Our access methods use new adaptations of traditional B+ tree indexes, and a new index, called the Markov-chain index. They efficiently extract only the relevant timesteps from a stream, while retaining the stream's Markovian properties. We have implemented our prototype system on BDB and demonstrate its effectiveness on both synthetic data and real data from a building-wide RFID deployment.

international conference on data engineering | 2010

Approximation trade-offs in Markovian stream processing: An empirical study

Julie Letchner; Christopher Ré; Magdalena Balazinska; Matthai Philipose

A large amount of the world's data is both sequential and imprecise. Such data is commonly modeled as Markovian streams; examples include words/sentences inferred from raw audio signals, or discrete location sequences inferred from RFID or GPS data. The rich semantics and large volumes of these streams make them difficult to query efficiently. In this paper, we study the effects, on both efficiency and accuracy, of two common stream approximations. Through experiments on a real-world RFID data set, we identify conditions under which these approximations can improve performance by several orders of magnitude, with only minimal effects on query results. We also identify cases when the full rich semantics are necessary.

very large data bases | 2009

Lahar demonstration: warehousing Markovian streams

Julie Letchner; Christopher Ré; Magdalena Balazinska; Matthai Philipose

Lahar is a warehousing system for Markovian streams, a common class of uncertain data streams produced via inference on probabilistic models. Example Markovian streams include text inferred from speech, location streams inferred from GPS or RFID readings, and human activity streams inferred from sensor data. Lahar supports OLAP-style queries on Markovian stream archives by leveraging novel approximation and indexing techniques that efficiently manipulate stream probabilities. This demonstration allows users to interactively query a warehouse of imprecise text streams inferred automatically from audio podcasts. Through this interaction, the demo introduces users to the challenges of Markovian stream processing as well as technical contributions developed to address these challenges.

IEEE Internet Computing | 2008

Challenges for Event Queries over Markovian Streams

Julie Letchner; Christopher Ré; Magdalena Balazinska; Matthai Philipose

Building applications on top of sensor data streams is challenging because sensor data is noisy. A model-based view can reduce noise by transforming raw sensor streams into streams of probabilistic state estimates, which smooth out errors and gaps. The authors propose a novel model-based view, the Markovian stream, to represent correlated probabilistic sequences. Applications interested in evaluating event queries (extracting sophisticated state sequences) can improve robustness by querying a Markovian stream view instead of querying raw data directly. The primary challenge is to properly handle the Markovian stream's correlations.

Information Systems | 2014

Approximation trade-offs in a Markovian stream warehouse: An empirical study

Julie Letchner; Magdalena Balazinska; Christopher Ré; Matthai Philipose

A large amount of the world's data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these imprecise streams are difficult to query efficiently because of their rich semantics and large volumes, forcing applications to sacrifice either performance or accuracy. There exists little work, however, that characterizes this trade-off space and helps applications make an appropriate choice. In this paper, we study the effects, on both efficiency and accuracy, of various stream approximations such as ignoring correlations, ignoring low-probability states, or retaining only the single most likely sequence of events. Through experiments on a real-world RFID data set, we identify conditions under which various approximations can improve performance by several orders of magnitude, with only minimal effects on query results. We also identify cases when the full rich semantics are necessary. This study is the first to evaluate the cost vs. quality trade-off of imprecise stream models. We perform this study using Lahar, a prototype Markovian stream warehouse. A secondary contribution of this paper is the development of query semantics and algorithms for processing aggregation queries on the output of pattern queries; we develop these queries in order to more fully understand the effects of approximation on a wider set of imprecise stream queries.
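To make the "ignoring correlations" approximation concrete, the sketch below compares the exact probability of a short state pattern on a tiny Markovian stream with the value obtained by multiplying per-timestep marginals. The stream representation (an initial distribution plus one conditional probability table per step), the state names, and the numbers are all assumptions chosen for illustration; this is not Lahar's data model or query algorithm.

```python
def exact_probability(initial, transitions, pattern):
    """P(X_0 = pattern[0], X_1 = pattern[1], ...) using the stream's correlations."""
    p = initial.get(pattern[0], 0.0)
    for cpt, prev, cur in zip(transitions, pattern, pattern[1:]):
        p *= cpt[prev].get(cur, 0.0)
    return p

def marginals(initial, transitions):
    """Per-timestep state distributions, obtained by pushing the initial
    distribution through each conditional probability table."""
    dists = [dict(initial)]
    for cpt in transitions:
        nxt = {}
        for prev, p_prev in dists[-1].items():
            for cur, p in cpt[prev].items():
                nxt[cur] = nxt.get(cur, 0.0) + p_prev * p
        dists.append(nxt)
    return dists

def independent_probability(initial, transitions, pattern):
    """Approximation that ignores correlations: multiply the marginals."""
    p = 1.0
    for dist, state in zip(marginals(initial, transitions), pattern):
        p *= dist.get(state, 0.0)
    return p

# Hypothetical two-timestep stream in which the tracked person almost
# always changes rooms between timesteps.
initial = {"hall": 0.5, "office": 0.5}
transitions = [{
    "hall":   {"hall": 0.1, "office": 0.9},
    "office": {"hall": 0.9, "office": 0.1},
}]
pattern = ["hall", "office"]

exact = exact_probability(initial, transitions, pattern)         # about 0.45
approx = independent_probability(initial, transitions, pattern)  # about 0.25
print(exact, approx)
```

With these made-up numbers the independence approximation underestimates the pattern probability (about 0.25 versus 0.45), which is the kind of accuracy loss the study weighs against the performance gained.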

international conference on management of data | 2008

A demonstration of Cascadia through a digital diary application

Nodira Khoussainova; Evan Welbourne; Magdalena Balazinska; Gaetano Borriello; Garrett Cole; Julie Letchner; Yang Li; Christopher Ré; Dan Suciu; Jordan Walke

The Cascadia system provides RFID-based pervasive computing applications with an infrastructure for specifying, extracting and managing meaningful high-level events from raw RFID data. Cascadia allows application developers and even users to specify events of interest using either a declarative query language or a graphical interface with an intuitive visual language. Cascadia then effectively extracts these events from data in spite of the unreliability of RFID technology and the inherent ambiguity in event extraction. We demonstrate Cascadia's technique through a digital diary application in the form of a calendar. Cascadia automatically populates the calendar with meaningful events for the user. We use data collected in a building-wide RFID deployment.

data engineering for wireless and mobile access | 2011

Lineage for Markovian stream event queries

Julie Letchner; Magdalena Balazinska

Imprecise, sequential data, such as location sequences inferred from RFID/GPS, are often represented as Markovian (probabilistic, temporally-correlated) streams. Event queries, which detect instances of specific patterns in these streams, have become the standard tool for analysis of these streams; however, many data mining applications require richer information such as how a pattern is matched, how long the match is, or what stream elements matched specific pattern predicates. Such queries can dramatically increase the power of applications, but they cannot be answered by existing tools. In this paper, we present novel techniques for processing the above queries on Markovian streams. Central to our approach are algorithms for computing and manipulating the lineage of Markovian stream event queries. We provide formal definitions and linear-time algorithms for computing lineage, which may be exponentially-sized in the length of the input stream. We additionally demonstrate the importance of flexible lineage projections, and provide definitions of, and two efficient algorithms for, these projections. We evaluate all algorithms on two real-world data sets (location from RFID and words from spoken audio), and demonstrate that lineage can greatly increase the analytical power of applications while incurring small processing overhead.
