Is this you? Create Your Porfile

Michael Stonebraker

Massachusetts Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michael Stonebraker is active.

Explore More

Publication

Featured researches published by Michael Stonebraker.

very large data bases | 2003

Aurora: a new model and architecture for data stream management

Daniel J. Abadi; Donald Carney; Ugur Çetintemel; Mitch Cherniack; Christian Convey; Sangdon Lee; Michael Stonebraker; Nesime Tatbul; Stanley B. Zdonik

Abstract.This paper describes the basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications. Monitoring applications differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS currently under construction at Brandeis University, Brown University, and M.I.T. We first provide an overview of the basic Aurora model and architecture and then describe in detail a stream-oriented set of operators.

international conference on management of data | 2009

A comparison of approaches to large-scale data analysis

Andrew Pavlo; Erik Paulson; Alexander Rasin; Daniel J. Abadi; David J. DeWitt; Samuel Madden; Michael Stonebraker

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each systems performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.

ACM Transactions on Database Systems | 1976

The design and implementation of INGRES

Michael Stonebraker; Gerald Held; Eugene Wong; Peter Kreps

The currently operational (March 1976) version of the INGRES database management system is described. This multiuser system gives a relational view of data, supports two high level nonprocedural data sublanguages, and runs as a collection of user processes on top of the UNIX operating system for Digital Equipment Corporation PDP 11/40, 11/45, and 11/70 computers. Emphasis is on the design decisions and tradeoffs related to (1) structuring the system into processes, (2) embedding one command language in a general purpose programming language, (3) the algorithms implemented to process interactions, (4) the access methods implemented, (5) the concurrency and recovery control currently provided, and (6) the data structures used for system catalogs and the role of the database administrator. Also discussed are (1) support for integrity constraints (which is only partly operational), (2) the not yet supported features concerning views and protection, and (3) future plans concerning the system.

very large data bases | 2002

Monitoring streams: a new class of data management applications

Donald Carney; Ugur Çetintemel; Mitch Cherniack; Christian Convey; Sangdon Lee; Greg Seidman; Michael Stonebraker; Nesime Tatbul; Stanley B. Zdonik

This paper introduces monitoring applications, which we will show differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS that is currently under construction at Brandeis University, Brown University, and M.I.T. We describe the basic system architecture, a stream-oriented set of operators, optimization tactics, and support for real-time operation.

international conference on management of data | 1984

Implementation techniques for main memory database systems

David J. DeWitt; Randy H. Katz; Frank Olken; Leonard D. Shapiro; Michael Stonebraker; David A. Wood

With the availability of very large, relatively inexpensive main memories, it is becoming possible keep large databases resident in main memory In this paper we consider the changes necessary to permit a relational database system to take advantage of large amounts of main memory We evaluate AVL vs B+-tree access methods for main memory databases, hash-based query processing strategies vs sort-merge, and study recovery issues when most or all of the database fits in main memory As expected, B+-trees are the preferred storage mechanism unless more than 80--90% of the database fits in main memory A somewhat surprising result is that hash based query processing strategies are advantageous for large memory situations

international conference on management of data | 1986

The design of POSTGRES

Michael Stonebraker; Lawrence A. Rowe

This paper presents the preliminary design of a new database management system, called POSTGRES, that is the successor to the INGRES relational database system. The main design goals of the new system are toprovide better support for complex objects, provide user extendibility for data types, operators and access methods, provide facilities for active databases (i.e., alerters and triggers) and inferencing including forward- and backward-chaining, simplify the DBMS code for crash recovery, produce a design that can take advantage of optical disks, workstations composed of multiple tightly-coupled processors, and custom designed VLSI chips, and make as few changes as possible (preferably none) to the relational model. The paper describes the query language, programming language interface, system architecture, query processing strategy, and storage system for the new system.

IEEE Computer | 1995

Chabot: retrieval from a relational database of images

Virginia E. Ogle; Michael Stonebraker

Selecting from a large, expanding collection of images requires carefully chosen search criteria. We present an approach that integrates a relational database retrieval system with a color analysis technique. The Chabot project was initiated at our university to study storage and retrieval of a vast collection of digitized images. These images are from the State of California Department of Water Resources. The goal was to integrate a relational database retrieval system with content analysis techniques that would give our querying system a better method for handling images. Our simple color analysis method, if used in conjunction with other search criteria, improves our ability to retrieve images efficiently. The best result is obtained when text-based search criteria are combined with content-based criteria and when a coarse granularity is used for content analysis. >

very large data bases | 2003

Load shedding in a data stream manager

Nesime Tatbul; Ugur Çetintemel; Stanley B. Zdonik; Mitch Cherniack; Michael Stonebraker

A Data Stream Manager accepts push-based inputs from a set of data sources, processes these inputs with respect to a set of standing queries, and produces outputs based on Quality-of-Service (QoS) specifications. When input rates exceed system capacity, the system will become overloaded and latency will deteriorate. Under these conditions, the system will shed load, thus degrading the answer, in order to improve the observed latency of the results. This paper examines a technique for dynamically inserting and removing drop operators into query plans as required by the current load. We examine two types of drops: the first drops a fraction of the tuples in a randomized fashion, and the second drops tuples based on the importance of their content. We address the problems of determining when load shedding is needed, where in the query plan to insert drops, and how much of the load should be shed at that point in the plan. We describe efficient solutions and present experimental evidence that they can bring the system back into the useful operating range with minimal degradation in answer quality.

IEEE Transactions on Knowledge and Data Engineering | 1990

The implementation of POSTGRES

Michael Stonebraker; Lawrence A. Rowe; Michael Hirohama

The design and implementation decisions made for the three-dimensional data manager POSTGRES are discussed. Attention is restricted to the DBMS backend functions. The POSTGRES data model and query language, the rules system, the storage system, the POSTGRES implementation and the current status and performance are discussed. >

Communications of The ACM | 2010