
Publication


Featured research published by Mitch Cherniack.


Very Large Data Bases | 2003

Aurora: a new model and architecture for data stream management

Daniel J. Abadi; Donald Carney; Ugur Çetintemel; Mitch Cherniack; Christian Convey; Sangdon Lee; Michael Stonebraker; Nesime Tatbul; Stanley B. Zdonik

This paper describes the basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications. Monitoring applications differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS currently under construction at Brandeis University, Brown University, and M.I.T. We first provide an overview of the basic Aurora model and architecture and then describe in detail a stream-oriented set of operators.
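As a rough illustration of the boxes-and-arcs processing model this abstract describes, the sketch below wires two toy stream operators together. The Box class, its API, and the sensor tuples are invented for illustration and are not Aurora's actual interfaces.

```python
from collections import deque

class Box:
    """A stream operator ("box"): consumes queued tuples, emits to downstream boxes.
    A hypothetical simplification, not Aurora's actual operator interface."""
    def __init__(self, fn, outputs=()):
        self.fn = fn              # per-tuple function: tuple -> iterable of output tuples
        self.queue = deque()      # input arc
        self.outputs = list(outputs)

    def run_once(self):
        """Process one queued tuple, forwarding results along outgoing arcs."""
        if self.queue:
            for out in self.fn(self.queue.popleft()):
                for box in self.outputs:
                    box.queue.append(out)

# A trivial two-box network: filter hot readings, then tag them as alerts.
alerts = []
tag = Box(lambda t: alerts.append({**t, "alert": True}) or [])
hot = Box(lambda t: [t] if t["temp"] > 100 else [], outputs=[tag])

for reading in [{"sensor": "s1", "temp": 97}, {"sensor": "s2", "temp": 105}]:
    hot.queue.append(reading)
    hot.run_once()
    tag.run_once()

print(alerts)  # [{'sensor': 's2', 'temp': 105, 'alert': True}]
```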


Very Large Data Bases | 2002

Monitoring streams: a new class of data management applications

Donald Carney; Ugur Çetintemel; Mitch Cherniack; Christian Convey; Sangdon Lee; Greg Seidman; Michael Stonebraker; Nesime Tatbul; Stanley B. Zdonik

This paper introduces monitoring applications, which we will show differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS that is currently under construction at Brandeis University, Brown University, and M.I.T. We describe the basic system architecture, a stream-oriented set of operators, optimization tactics, and support for real-time operation.
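The push-based architecture contrasted here with human-driven query processing can be suggested with a minimal callback sketch; SensorSource and its methods are hypothetical names, not part of Aurora.

```python
from typing import Callable

class SensorSource:
    """A data source that pushes readings at the system as they occur
    (illustrative sketch of push-based, rather than pull-based, processing)."""
    def __init__(self):
        self.subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, on_tuple: Callable[[dict], None]):
        self.subscribers.append(on_tuple)

    def emit(self, reading: dict):
        for callback in self.subscribers:
            callback(reading)  # the system reacts immediately; nobody polls

source = SensorSource()
source.subscribe(lambda r: print("ALERT:", r) if r["temp"] > 100 else None)
source.emit({"sensor": "s7", "temp": 103})  # prints ALERT: {'sensor': 's7', ...}
```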


Very Large Data Bases | 2003

Load shedding in a data stream manager

Nesime Tatbul; Ugur Çetintemel; Stanley B. Zdonik; Mitch Cherniack; Michael Stonebraker

A Data Stream Manager accepts push-based inputs from a set of data sources, processes these inputs with respect to a set of standing queries, and produces outputs based on Quality-of-Service (QoS) specifications. When input rates exceed system capacity, the system will become overloaded and latency will deteriorate. Under these conditions, the system will shed load, thus degrading the answer, in order to improve the observed latency of the results. This paper examines a technique for dynamically inserting drop operators into, and removing them from, query plans as required by the current load. We examine two types of drops: the first drops a fraction of the tuples in a randomized fashion, and the second drops tuples based on the importance of their content. We address the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan. We describe efficient solutions and present experimental evidence that they can bring the system back into the useful operating range with minimal degradation in answer quality.
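The two drop flavors described above, randomized drops and content-based (semantic) drops, can be sketched as simple operators; the utility function and threshold below are illustrative stand-ins for the QoS-driven choices the paper studies.

```python
import random

def random_drop(tuples, fraction):
    """Randomly discard roughly `fraction` of the tuples (illustrative random drop)."""
    return [t for t in tuples if random.random() >= fraction]

def semantic_drop(tuples, utility, threshold):
    """Discard the tuples whose content is least important (illustrative semantic drop)."""
    return [t for t in tuples if utility(t) >= threshold]

readings = [{"sensor": i, "temp": 90 + i} for i in range(10)]
print(len(random_drop(readings, 0.5)))                   # roughly half survive
print(semantic_drop(readings, lambda t: t["temp"], 95))  # only high-utility readings survive
```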


Very Large Data Bases | 2003

Operator scheduling in a data stream manager

Donald Carney; Ugur Çetintemel; Alex Rasin; Stanley B. Zdonik; Mitch Cherniack; Michael Stonebraker

Many stream-based applications have sophisticated data processing requirements and real-time performance expectations that need to be met under high-volume, time-varying data streams. In order to address these challenges, we propose novel operator scheduling approaches that specify (1) which operators to schedule, (2) in which order to schedule them, and (3) how many tuples to process at each execution step. We study our approaches in the context of the Aurora data stream manager. We argue that a fine-grained scheduling approach in combination with various scheduling techniques (such as batching of operators and tuples) can significantly improve system efficiency by reducing various system overheads. We also discuss application-aware extensions that make scheduling decisions according to per-application Quality of Service (QoS) specifications. Finally, we present prototype-based experimental results that characterize the efficiency and effectiveness of our approaches under various stream workloads and processing scenarios.
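The three scheduling decisions enumerated above can be sketched with a toy scheduler; the longest-queue policy and fixed batch size below are assumptions for illustration, not the scheduling algorithms proposed in the paper.

```python
from collections import deque

class Op:
    """A stream operator with an input queue of pending tuples."""
    def __init__(self, name, fn):
        self.name, self.fn, self.queue = name, fn, deque()

def schedule_step(ops, batch_size=4):
    """One scheduling decision: (1) pick which operator to run (here, the one
    with the longest queue, a toy policy) and (3) process up to `batch_size`
    tuples at once to amortize per-invocation overhead. Repeated calls
    determine (2) the order in which operators execute."""
    ready = [op for op in ops if op.queue]
    if not ready:
        return None
    op = max(ready, key=lambda o: len(o.queue))
    batch = [op.queue.popleft() for _ in range(min(batch_size, len(op.queue)))]
    return op.name, [op.fn(t) for t in batch]

f = Op("filter", lambda t: t)
m = Op("map", lambda t: t * 2)
f.queue.extend([1, 2, 3])
m.queue.extend([4, 5, 6, 7, 8])
print(schedule_step([f, m]))  # ('map', [8, 10, 12, 14]) under this toy policy
```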


International Conference on Management of Data | 2003

Aurora: a data stream management system

Daniel J. Abadi; Donald Carney; Ugur Çetintemel; Mitch Cherniack; Christian Convey; C. Erwin; Eduardo F. Galvez; M. Hatoun; Anurag S. Maskey; Alex Rasin; A. Singer; Michael Stonebraker; Nesime Tatbul; Ying Xing; R. Yan; Stanley B. Zdonik

The Aurora system [1] is an experimental data stream management system with a fully functional prototype. It includes both a graphical development environment and a runtime system. We propose to demonstrate the Aurora system with its development environment and runtime system, using several example monitoring applications developed in consultation with the defense, financial, and natural science communities. We will also demonstrate the effect of various system alternatives on various workloads. For example, we will show how different scheduling algorithms affect tuple latency and internal queue lengths, using some of our visualization tools to accomplish this.

Data Stream Management

Aurora is a data stream management system for monitoring applications. Streams are continuous data feeds from such sources as sensors, satellites, and stock feeds. Monitoring applications track the data from numerous streams, filtering them for signs of abnormal activity and processing them for purposes of aggregation, reduction, and correlation. The management requirements for monitoring applications differ profoundly from those satisfied by a traditional DBMS:

- A traditional DBMS assumes a passive model where most data processing results from humans issuing transactions and queries. Data stream management requires a more active approach, monitoring data feeds from unpredictable external sources (e.g., sensors) and alerting humans when abnormal activity is detected.
- A traditional DBMS manages data that is currently in its tables. Data stream management often requires processing data that is bounded by some finite window of values, and not over an unbounded past.
- A traditional DBMS provides exact answers to exact queries, and is blind to real-time deadlines. Data stream management often must respond to real-time deadlines (e.g., military applications monitoring positions of enemy platforms) and therefore must often provide reasonable approximations to queries.
- A traditional query processor optimizes all queries in the same way (typically focusing on response time). A stream data manager benefits from application-specific optimization criteria (QoS).
- A traditional DBMS assumes pull-based queries to be the norm. Push-based data processing is the norm for a data stream management system.

A Brief Summary of Aurora

Aurora has been designed to deal with very large numbers of data streams. Users build queries out of a small set of operators (a.k.a. boxes). The current implementation provides a user interface for tapping into pre-existing inputs and network flows and for wiring boxes together to produce answers at the outputs. While it is certainly possible to accept input as declarative queries, we feel that for a very large number of such queries, the process of common sub-expression elimination is too difficult. An example of an Aurora network is given in Screen Shot 1.

A simple stream is a potentially infinite sequence of tuples that all have the same stream ID. An arc carries multiple simple streams; this is important so that simple streams can be added to and deleted from the system without modifying the basic network. A query, then, is a sub-network that ends at a single output and includes an arbitrary number of inputs. Boxes can connect to multiple downstream boxes; all such path splits carry identical tuples. Multiple streams can be merged, since some box types accept more than one input (e.g., Join, Union). We do not allow any cycles in an operator network.

Each output is supplied with a Quality of Service (QoS) specification. Currently, QoS is captured by three functions: (1) a latency graph, (2) a value-based graph, and (3) a loss-tolerance graph. The latency graph indicates how utility drops as an answer is delayed. The value-based graph shows which values of the output space are most important. The loss-tolerance graph is a simple way to describe how averse the application is to approximate answers.

Tuples arrive at the input and are queued for processing. A scheduler selects a box with waiting tuples and executes that box on one or more of the input tuples. The output tuples of a box are queued at the input of the next box in sequence. In this way, tuples make their way from the inputs to the outputs. If the system is overloaded, QoS is adversely affected; in this case, we invoke a load shedder to strategically eliminate tuples and reduce the load.

Aurora supports persistent storage in two different ways. First, when box queues consume more storage than available RAM, the system will spill tuples that are less likely to be needed soon to secondary storage. Second, ad hoc queries can be connected to (and disconnected from) any arc for which a connection point has been defined. A connection point stores a historical portion of the stream that has flowed on the arc; for example, one could define a connection point as the last hour's worth of data seen on a given arc. Any ad hoc query that connects to a connection point has access to the full stored history as well as any additional data that flows past while the query is connected.
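The connection-point mechanism described at the end of this abstract can be sketched as a time-bounded history buffer; the class name, API, and one-hour window below are illustrative assumptions, not Aurora's implementation.

```python
import time
from collections import deque

class ConnectionPoint:
    """Keeps a rolling window of history on an arc (illustrative sketch); an
    ad hoc query attached here sees stored history plus any new arrivals."""
    def __init__(self, window_seconds=3600):   # e.g., the last hour of data
        self.window, self.buffer = window_seconds, deque()

    def push(self, tup):
        now = time.time()
        self.buffer.append((now, tup))
        while self.buffer and self.buffer[0][0] < now - self.window:
            self.buffer.popleft()              # expire history outside the window

    def history(self):
        return [t for _, t in self.buffer]

cp = ConnectionPoint(window_seconds=3600)
cp.push({"symbol": "IBM", "price": 101.5})
print(cp.history())  # an ad hoc query connecting now sees the stored tuple
```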


Very Large Data Bases | 2004

Retrospective on Aurora

Hari Balakrishnan; Magdalena Balazinska; Donald Carney; Ugur Çetintemel; Mitch Cherniack; Christian Convey; Eduardo F. Galvez; Jon Salz; Michael Stonebraker; Nesime Tatbul; Richard Tibbetts; Stanley B. Zdonik

This experience paper summarizes the key lessons we learned throughout the design and implementation of the Aurora stream-processing engine. For the past 2 years, we have built five stream-based applications using Aurora. We first describe in detail these applications and their implementation in Aurora. We then reflect on the design of Aurora based on this experience. Finally, we discuss our initial ideas on a follow-on project, called Borealis, whose goal is to eliminate the limitations of Aurora as well as to address new key challenges and applications in the stream-processing domain.


Very Large Data Bases | 2008

Towards a streaming SQL standard

Namit Jain; Shailendra Mishra; Anand Srinivasan; Johannes Gehrke; Jennifer Widom; Hari Balakrishnan; Ugur Çetintemel; Mitch Cherniack; Richard Tibbetts; Stanley B. Zdonik

This paper describes a unification of two different SQL extensions for streams and its associated semantics. We use the data models from Oracle and StreamBase as our examples. Oracle uses a time-based execution model while StreamBase uses a tuple-based execution model. Time-based execution provides a way to model simultaneity while tuple-based execution provides a way to react to primitive events as soon as they are seen by the system. The result is a new model that gives the user control over the granularity at which one can express simultaneity. Of course, it is possible to ignore simultaneity altogether. The proposed model captures ordering and simultaneity through partial orders on batches of tuples. The batching and the ordering are encapsulated in and can be modified by means of a powerful new operator that we call SPREAD. This paper describes the semantics of SPREAD and gives several examples of its use.
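The difference between the two execution models being unified, time-based batches versus tuple-at-a-time batches, can be sketched as follows; this is a loose illustration of batching granularity, not the actual semantics of the SPREAD operator.

```python
from itertools import groupby

stream = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]  # (timestamp, payload)

def time_batches(tuples):
    """Time-based execution (Oracle-style): tuples with equal timestamps
    are treated as simultaneous and processed as one batch."""
    return [list(g) for _, g in groupby(tuples, key=lambda t: t[0])]

def tuple_batches(tuples):
    """Tuple-based execution (StreamBase-style): every tuple is its own
    batch, reacted to as soon as it is seen."""
    return [[t] for t in tuples]

print(time_batches(stream))   # [[(1,'a'),(1,'b')], [(2,'c')], [(3,'d'),(3,'e')]]
print(tuple_batches(stream))  # five singleton batches
```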


IEEE Personal Communications | 2001

Expressing user profiles for data recharging

Mitch Cherniack; Michael J. Franklin; Stanley B. Zdonik

Mobile devices need two basic renewable resources - power and data. Power recharging is easy; data recharging is a much more problematic activity. It requires complex interaction between a user and a collection of data sources. We provide an automatic data recharging capability based on user profiles written in an expressive profile language. A profile identifies relevant information and orders it by its usefulness. We discuss the issues involved in designing a profile language for data recharging.
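A profile that identifies relevant information and orders it by usefulness might look like the following sketch; the fields and scoring are invented for illustration and do not reflect the paper's actual profile language.

```python
# Hypothetical data-recharging profile: a relevance predicate plus a
# usefulness ranking (not the paper's profile language).
profile = {
    "relevant": lambda item: item["topic"] in {"traffic", "weather"},
    "utility":  lambda item: {"traffic": 2, "weather": 1}[item["topic"]],
}

items = [
    {"topic": "sports",  "body": "..."},
    {"topic": "weather", "body": "..."},
    {"topic": "traffic", "body": "..."},
]

# Recharge: keep only relevant items, deliver the most useful first.
to_deliver = sorted(filter(profile["relevant"], items),
                    key=profile["utility"], reverse=True)
print([i["topic"] for i in to_deliver])  # ['traffic', 'weather']
```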


Very Large Data Bases | 2003

Avoiding sorting and grouping in processing queries

Xiaoyu Wang; Mitch Cherniack

Sorting and grouping are amongst the most costly operations performed during query evaluation. System R [6] used simple inference strategies to determine the orderings that held of intermediate relations, both to avoid unnecessary sorting and to influence join plan selection. Since then, others have proposed using integrity constraint information to infer orderings of intermediate query results. However, these proposals do not consider how to avoid grouping operations by inferring groupings, nor do they consider secondary orderings (where records in the same group satisfy some ordering). In this paper, we introduce a formalism for expressing and reasoning about order properties: ordering and grouping constraints that hold of physical representations of relations. In so doing, we can reason about how the relation is ordered or grouped, both in terms of primary and secondary orders. After formally defining order properties, we introduce a plan refinement algorithm that infers order properties for intermediate and final query results on the basis of those known to hold of query inputs, and then exploits these inferences to avoid unnecessary sorting and grouping. We then show empirical results demonstrating the benefits of plan refinement, and show that the overhead that our algorithm adds to query optimization is low.
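The central idea, skipping a sort when a required ordering is already known to hold, can be sketched with a simple prefix test over order properties; this representation is a deliberate simplification of the paper's formalism.

```python
def satisfies(known_order, required_order):
    """True if a relation already ordered by `known_order` also satisfies
    `required_order` (the required ordering must be a prefix of the known one).
    A simplification of the paper's order-property reasoning."""
    return known_order[:len(required_order)] == required_order

def maybe_sort(rows, known_order, required_order):
    if satisfies(known_order, required_order):
        return rows  # sort avoided: the order property already holds
    return sorted(rows, key=lambda r: tuple(r[c] for c in required_order))

rows = [{"dept": 1, "name": "a"}, {"dept": 1, "name": "b"}, {"dept": 2, "name": "c"}]
# Sorted by (dept, name) implies sorted by (dept), so no sort is performed.
print(maybe_sort(rows, known_order=["dept", "name"], required_order=["dept"]))
```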


International Conference on Management of Data | 1996

Rule languages and internal algebras for rule-based optimizers

Mitch Cherniack; Stanley B. Zdonik

Rule-based optimizers and optimizer generators use rules to specify query transformations. Rules act directly on query representations, which typically are based on query algebras. But most algebras complicate rule formulation, and rules over these algebras must often resort to calls to externally defined bodies of code. Code makes rules difficult to formulate, prove correct, and reason about, and therefore compromises the effectiveness of rule-based systems. In this paper we present KOLA: a combinator-based algebra designed to simplify rule formulation. KOLA is not a user language, and KOLA's variable-free queries are difficult for humans to read. But KOLA is an effective internal algebra because its combinator style makes queries manipulable and structurally revealing. As a result, rules over KOLA queries are easily expressed without the need for supplemental code. We illustrate this point, first by showing some transformations that, despite their simplicity, require head and body routines when expressed over algebras that include variables. We show that these transformations are expressible without supplemental routines in KOLA. We then show complex transformations of a class of nested queries expressed over KOLA. Nested query optimization, while having been studied before, has seriously challenged the rule-based paradigm.
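KOLA's variable-free combinator style can be suggested with a toy query AST and a single rewrite rule that fuses two composed filters; the constructors and the rule below are invented for illustration and are not KOLA's actual combinators.

```python
# Toy variable-free query AST and one rewrite rule (illustrative, not KOLA).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Filter:
    pred: Callable

@dataclass
class Compose:
    outer: object
    inner: object

def fuse_filters(q):
    """Rewrite rule: Compose(Filter(p), Filter(r)) => Filter(r and p).
    Because queries are variable-free structures, the rule matches on
    shape alone and needs no supplemental code to inspect bindings."""
    if (isinstance(q, Compose) and isinstance(q.outer, Filter)
            and isinstance(q.inner, Filter)):
        p, r = q.outer.pred, q.inner.pred
        return Filter(lambda x: r(x) and p(x))
    return q

q = Compose(Filter(lambda x: x > 0), Filter(lambda x: x % 2 == 0))
fused = fuse_filters(q)
print([x for x in [-2, -1, 2, 3, 4] if fused.pred(x)])  # [2, 4]
```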

Collaboration


Dive into Mitch Cherniack's collaborations.

Top Co-Authors

Michael Stonebraker
Massachusetts Institute of Technology

Hari Balakrishnan
Massachusetts Institute of Technology