Kristin Tufte | Researchain

Publication

Featured research published by Kristin Tufte.

international conference on management of data | 2005

Semantics and evaluation techniques for window aggregates in data streams

Jin Li; David Maier; Kristin Tufte; Vassilis Papadimos; Peter A. Tucker

A windowed query operator breaks a data stream into possibly overlapping subsets of data and computes a result over each. Many stream systems can evaluate window aggregate queries. However, current stream systems suffer from a lack of an explicit definition of window semantics. As a result, their implementations unnecessarily confuse window definition with physical stream properties. This confusion complicates the stream system, and even worse, can hurt performance both in terms of memory usage and execution time. To address this problem, we propose a framework for defining window semantics, which can be used to express almost all types of windows of which we are aware, and which is easily extensible to other types of windows that may occur in the future. Based on this definition, we explore a one-pass query evaluation strategy, the Window-ID (WID) approach, for various types of window aggregate queries. WID significantly reduces both required memory space and execution time for a large class of window definitions. In addition, WID can leverage punctuations to gracefully handle disorder. Our experimental study shows that WID has better execution-time performance than existing window aggregate query evaluation options that retain and reprocess tuples, and has better latency-accuracy tradeoffs for disordered input streams compared to using a fixed delay for handling disorder.
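The window-ID idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes integer timestamps, time-based sliding windows where window `w` covers `[w * slide, w * slide + range_)`, a sum aggregate, and punctuations that truthfully promise no later tuple has a smaller timestamp. The names `window_ids` and `WidSum` are hypothetical.

```python
from collections import defaultdict


def window_ids(ts, range_, slide):
    """Ids of all windows [w*slide, w*slide + range_) that contain ts (assumed layout)."""
    first = max(0, (ts - range_) // slide + 1)
    last = ts // slide
    return range(first, last + 1)


class WidSum:
    """One-pass sum per window id; tuples are never retained or reprocessed."""

    def __init__(self, range_, slide):
        self.range_ = range_
        self.slide = slide
        self.partial = defaultdict(int)  # window id -> running sum

    def on_tuple(self, ts, value):
        # Tag the tuple with every window id it belongs to and update
        # those partial aggregates; arrival order does not matter.
        for w in window_ids(ts, self.range_, self.slide):
            self.partial[w] += value

    def on_punctuation(self, ts):
        """Punctuation: no later tuple has timestamp < ts. Emit closed windows."""
        closed = sorted(
            w for w in self.partial if w * self.slide + self.range_ <= ts
        )
        return [(w, self.partial.pop(w)) for w in closed]


agg = WidSum(range_=4, slide=2)
for ts, value in [(1, 10), (5, 1), (3, 100)]:  # deliberately out of order
    agg.on_tuple(ts, value)
first_batch = agg.on_punctuation(6)
second_batch = agg.on_punctuation(8)
```

Because results are released only when a punctuation closes a window, the disordered tuple at timestamp 3 is still counted in the windows it belongs to.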

very large data bases | 2008

Out-of-order processing: a new architecture for high-performance stream systems

Jin Li; Kristin Tufte; Vladislav Shkapenyuk; Vassilis Papadimos; Theodore Johnson; David Maier

Many stream-processing systems enforce an order on data streams during query evaluation to help unblock blocking operators and purge state from stateful operators. Such in-order processing (IOP) systems not only must enforce order on input streams, but also require that query operators preserve order. This order-preserving requirement constrains the implementation of stream systems and incurs significant performance penalties, particularly for memory consumption. Especially for high-performance, potentially distributed stream systems, the cost of enforcing order can be prohibitive. We introduce a new architecture for stream systems, out-of-order processing (OOP), that avoids ordering constraints. The OOP architecture frees stream systems from the burden of order maintenance by using explicit stream progress indicators, such as punctuation or heartbeats, to unblock and purge operators. We describe the implementation of OOP stream systems and discuss the benefits of this architecture in depth. For example, the OOP approach has proven useful for smoothing workload bursts caused by expensive end-of-window operations, which can overwhelm internal communication paths in IOP approaches. We have implemented OOP in two stream systems, Gigascope and NiagaraST. Our experimental study shows that the OOP approach can significantly outperform IOP in a number of aspects, including memory, throughput and latency.

international conference on management of data | 1997

Building a scaleable geo-spatial DBMS: technology, implementation, and evaluation

Jignesh M. Patel; Jie-Bing Yu; Navin Kabra; Kristin Tufte; Biswadeep Nag; Josef Burger; Nancy Hall; Karthikeyan Ramasamy; Roger Lueder; Curt J. Ellmann; Jim Kupsch; Shelly Guo; Johan Larson; David J. DeWitt; Jeffrey F. Naughton

This paper presents a number of new techniques for parallelizing geo-spatial database systems and discusses their implementation in the Paradise object-relational database system. The effectiveness of these techniques is demonstrated using a variety of complex geo-spatial queries over a 120 GB global geo-spatial data set.

international workshop on the web and databases | 2000

Architecting a Network Query Engine for Producing Partial Results

Jayavel Shanmugasundaram; Kristin Tufte; David J. DeWitt; David Maier; Jeffrey F. Naughton

The growth of the Internet has made it possible to query data in all corners of the globe. This trend is being abetted by the emergence of standards for data representation, such as XML. In the face of this exciting opportunity, however, existing query engines need to be changed in order to use them to effectively query the Internet. One of the challenges is providing partial results of query computation, based on the initial portion of the input, because it may be undesirable to wait for all of the input. This situation is due to (a) limited data transfer bandwidth, (b) temporary unavailability of sites, and (c) intrinsically long-running queries (e.g., continual queries or triggers). A major issue in providing partial results is dealing with non-monotonic operators, such as sort, average, negation and nest, because these operators need to see all of their input before they can produce the correct output. While previous work on producing partial results has looked at a limited set of non-monotonic operators, emerging hierarchical standards such as XML, which are heavily nested, and sophisticated queries require more general solutions to the problem. In this paper, we define the semantics of partial results and outline mechanisms for ensuring these semantics for queries with arbitrary non-monotonic operators. Re-architecting a query engine to produce partial results requires modifications to the implementations of operators. We explore implementation alternatives and quantitatively compare their effectiveness using the Niagara prototype system.

international conference on database theory | 2005

Semantics of data streams and operators

David Maier; Jin Li; Peter A. Tucker; Kristin Tufte; Vassilis Papadimos

What does a data stream mean? Much of the extensive work on query operators and query processing for data streams has proceeded without the benefit of an answer to this question. While such imprecision may be tolerable when dealing with simple cases, such as flat data, guaranteed physical order and element-wise operations, it can lead to ambiguities when dealing with nested data, disordered streams and windowed operators. We propose reconstitution functions to make the denotation and representation of data streams more precise, and use these functions to investigate the connection between monotonicity and non-blocking behavior of stream operators. We also touch on a reconstitution function for XML data. Other aspects of data stream semantics we consider are the use of punctuation to delineate finite subsets of a stream, adequacy of descriptions of stream disorder, and the formal specification of windowed operators.

international conference on management of data | 2007

Travel time estimation using NiagaraST and latte

Kristin Tufte; Jin Li; David Maier; Vassilis Papadimos; Robert L. Bertini; James Rucker

To address increasing traffic congestion and its associated consequences, traffic managers are turning to intelligent transportation management. The latte project is extending data stream technology to handle queries that combine live streams with large data archives, motivated by needs in the Intelligent Transportation Systems (ITS) domain. In particular, we focus on queries that combine live data streams with large data archives. We demonstrate such stream-archive queries via the travel-time estimation problem. The demonstration uses the new latte system which has been developed using the NiagaraST stream processing system and the PORTAL transportation data archive.

international conference on data engineering | 1998

Array-based evaluation of multi-dimensional queries in object-relational database systems

Yihong Zhao; Karthikeyan Ramasamy; Kristin Tufte; Jeffrey F. Naughton

Since multi-dimensional arrays are a natural data structure for supporting multi-dimensional queries, and object-relational (O/R) database systems support multi-dimensional array ADTs (abstract data types), it is natural to ask if a multi-dimensional array-based ADT can be used to improve O/R DBMS performance on multi-dimensional queries. As an initial step toward answering this question, we have implemented a multi-dimensional array in the Paradise O/R DBMS. In this paper, we describe the implementation of this compressed-array ADT and explore its performance for queries including star-join consolidations and selections. We show that, in many cases, the array ADT can provide significantly higher performance than can be obtained by applying techniques such as bitmap indices and star-join algorithms to relational tables.

Transportation Research Record | 2008

Toward Understanding and Reducing Errors in Real-Time Estimation of Travel Times

Sirisha Kothuri; Kristin Tufte; Enas Fayed; Robert L. Bertini

In recent years, the increased deployment of the infrastructure of intelligent transportation systems has enabled the provision of real-time traveler information to the public. Many states as well as private contractors are providing real-time travel-time estimates to commuters to help improve the quality and efficiency of their trips. Accuracy of travel-time estimates is important: inaccurate estimates can be detrimental to travelers, particularly when such estimates are less accurate than a person's ability to predict traffic on the basis of experience. Improving the accuracy of real-time estimates involves identifying and understanding the sources of error. The errors found during the evaluation of real-time travel-time estimates in Portland, Oregon, were explored and solutions are provided for reducing estimation error. The midpoint algorithm used by the Oregon Department of Transportation was used to estimate travel times from speeds obtained from loop detectors. The estimates were assessed for accuracy by comparisons with ground truth probe vehicle runs. The findings from the study indicate that 85% of the travel-time runs had errors less than 20% and, further, that accuracy varied widely between segments. The evaluation of high-error runs revealed the main causes of errors as transition traffic conditions, failure of detectors, and detector spacing. Potential solutions were identified for each source of error. In addition, a method was tested for evaluating the benefits of additional detectors by simulation of virtual detectors. The results indicated that additional detection helps in reducing the mean average percentage error in most cases, but the location of detectors is critical to error reduction.
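A midpoint-style travel-time estimate can be sketched as follows. The abstract does not spell out the algorithm, so this assumes the common formulation in which each detector's speed applies to an influence zone reaching halfway to its neighboring detectors, and the zone travel times are summed. The function name, parameters, and sample numbers are hypothetical and are not taken from the paper.

```python
def midpoint_travel_time(mileposts, speeds_mph, start, end):
    """Estimated travel time in minutes from milepost `start` to `end`.

    Assumes detectors are sorted by milepost, lie within [start, end],
    and report one positive speed each.
    """
    # Zone boundaries: corridor start, midpoints between neighbors, corridor end.
    midpoints = [(a + b) / 2 for a, b in zip(mileposts, mileposts[1:])]
    bounds = [start] + midpoints + [end]
    hours = sum(
        (hi - lo) / speed
        for lo, hi, speed in zip(bounds, bounds[1:], speeds_mph)
    )
    return hours * 60


# Three detectors on a 6-mile corridor; the middle one reports congestion.
estimate = midpoint_travel_time([1, 3, 5], [60, 30, 60], start=0, end=6)
```

In this formulation a failed detector or a wide gap between detectors stretches one speed reading over a long zone, which is consistent with the error sources the abstract lists.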

distributed event-based systems | 2012

Capturing episodes: may the frame be with you

David Maier; Michael Grossniklaus; Sharmadha Moorthy; Kristin Tufte

We are interested in detecting episodes in a data stream that are characterized by a period of time over which a condition holds, usually with a minimum duration. For example, we might want to know whenever any router has a packet-drop rate above 0.3% continuously for more than two minutes. Such episodes can be interesting in their own right for monitoring purposes, but they can also specify target regions for examination over the original or other stream. For instance, for each router-drop episode we detect, we might want to count the number of control messages the router received. We assert the key requirements are to detect the episodes, detect them accurately, and detect them promptly. Current capabilities for data-stream management systems (DSMSs) include functionality, such as pattern-matching and windowed aggregates, that can help with detecting some kinds of episodes. We offer a third alternative, frames, which generalizes the other two. Frames are intervals that segment a data stream into regions of interest. In contrast to windows, frame boundaries can be data dependent, such as when a predicate holds for a given duration, or the maximum and minimum values of an attribute diverge more than a certain amount. We introduce frames and their theory, plus their implementation in the NiagaraST DSMS. We then demonstrate some advantages of frames versus windows, such as better characterization of episodes, on real data sets and explore an extension, fragments, to deal with long episodes.
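The router example in the abstract above can be sketched as a simple threshold frame: a maximal run of readings over which a predicate holds, reported only if it lasts at least a minimum duration. This is an illustration under stated assumptions, not the NiagaraST implementation. It assumes in-order `(timestamp_seconds, value)` pairs from a single router, and the name `threshold_frames` is hypothetical.

```python
def threshold_frames(stream, predicate, min_duration):
    """Yield (start_ts, end_ts) for each maximal run where predicate holds
    and the run spans at least min_duration."""
    start = last = None
    for ts, value in stream:
        if predicate(value):
            if start is None:
                start = ts  # a candidate frame opens here
            last = ts
        else:
            if start is not None and last - start >= min_duration:
                yield (start, last)
            start = None  # the condition broke; discard or close the frame
    # Close a frame that is still open when the input ends.
    if start is not None and last - start >= min_duration:
        yield (start, last)


readings = [
    (0, 0.1), (30, 0.4), (60, 0.5), (90, 0.4), (120, 0.6),
    (150, 0.35), (180, 0.1), (210, 0.5), (240, 0.2),
]
episodes = list(threshold_frames(readings, lambda rate: rate > 0.3, 120))
```

Unlike a fixed two-minute window, the frame boundaries here follow the data: the episode starts at the first reading above the threshold and ends at the last one.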

distributed event-based systems | 2016

Frames: data-driven windows

Michael Grossniklaus; David Maier; James Roger Miller; Sharmadha Moorthy; Kristin Tufte

Traditional Data Stream Management Systems (DSMS) segment data streams using windows that are defined either by a time interval or a number of tuples. Such windows are fixed (the definition is unvarying over the course of a stream) and are defined based on external properties unrelated to the data content of the stream. However, streams and their content do vary over time: the rate of a data stream may vary or the data distribution of the content may vary. The mismatch between a fixed stream segmentation and a variable stream motivates the need for a more flexible, expressive and physically independent stream segmentation. We introduce a new stream segmentation technique, called frames. Frames segment streams based on data content. We present a theory and implementation of frames and show the utility of frames for a variety of applications.
