Publications


Featured research published by Mirek Riedewald.


International Conference on Management of Data | 2003

Approximate join processing over data streams

Abhinandan Das; Johannes Gehrke; Mirek Riedewald

We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We first discuss alternate architectural models for data stream join processing, and we survey suitable measures for the quality of an approximation of a set-valued query result. We then consider the number of generated result tuples as the quality measure, and we give optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data we show the efficacy of our solutions. For applications with demand for exact results we introduce a new Archive-metric which captures the amount of work needed to complete the join in case the streams are archived for later processing.
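
A minimal sketch of the general idea, assuming a symmetric sliding-window equi-join over two merged, time-ordered streams and simple random load shedding when a per-stream memory budget is exceeded; the paper's optimal offline and fast online shedding strategies are more sophisticated, and all names and parameters here are illustrative.

    import random
    from collections import deque

    def approximate_window_join(events, window=100, capacity=50, seed=0):
        """Symmetric sliding-window equi-join over two streams with random
        load shedding. `events` is a time-ordered iterable of
        (timestamp, source, key) tuples, where source is 'A' or 'B'.
        Returns the number of join result tuples produced."""
        random.seed(seed)
        buffers = {"A": deque(), "B": deque()}   # (timestamp, key) tuples per stream
        results = 0
        for ts, source, key in events:
            other = "B" if source == "A" else "A"
            # Expire tuples that have fallen out of the sliding window.
            for buf in buffers.values():
                while buf and buf[0][0] <= ts - window:
                    buf.popleft()
            # Probe the other stream's window for matching keys.
            results += sum(1 for _, k in buffers[other] if k == key)
            # Insert into own window; shed load if the memory budget is exceeded.
            buf = buffers[source]
            if len(buf) >= capacity:
                del buf[random.randrange(len(buf))]   # dropped tuples cause missed results
            buf.append((ts, key))
        return results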


Extending Database Technology | 2006

Towards expressive publish/subscribe systems

Alan J. Demers; Johannes Gehrke; Mingsheng Hong; Mirek Riedewald; Walker M. White

Traditional content based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions evaluated on individual events. However, many applications such as monitoring RSS streams, stock tickers, or management of RFID data streams require the ability to handle stateful subscriptions. In this paper, we introduce Cayuga, a stateful pub/sub system based on nondeterministic finite state automata (NFA). Cayuga allows users to express subscriptions that span multiple events, and it supports powerful language features such as parameterization and aggregation, which significantly extend the expressive power of standard pub/sub systems. Based on a set of formally defined language operators, the subscription language of Cayuga provides non-ambiguous subscription semantics as well as unique opportunities for optimizations. We experimentally demonstrate that common optimization techniques used in NFA-based systems such as state merging have only limited effectiveness, and we propose novel efficient indexing methods to speed up subscription processing. In a thorough experimental evaluation we show the efficacy of our approach.
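
A toy illustration of the underlying mechanism, NFA runs that carry parameter bindings across multiple events; this is not Cayuga's query language or evaluation algorithm, and the step/predicate interface below is an assumption made for the sketch.

    def run_subscriptions(events, steps):
        """Minimal NFA-style matcher for subscriptions spanning multiple events.
        `steps` is a list of predicates; each takes (event, bindings) and returns
        an updated bindings dict if the transition fires, else None. A run that
        passes every step is a completed match; irrelevant events are skipped."""
        runs, matches = [], []                    # runs: (next_step_index, bindings)
        for event in events:
            next_runs = []
            for step_idx, bindings in runs + [(0, {})]:   # any event may start a run
                new_bindings = steps[step_idx](event, bindings)
                if new_bindings is not None:
                    if step_idx + 1 == len(steps):
                        matches.append(new_bindings)      # subscription fired
                    else:
                        next_runs.append((step_idx + 1, new_bindings))
                if step_idx > 0:
                    next_runs.append((step_idx, bindings))  # keep waiting (skip this event)
            runs = next_runs
        return matches

    # Hypothetical stateful subscription: a quote followed by a >10% drop for the same symbol.
    def step_quote(event, b):
        return {**b, "sym": event["sym"], "hi": event["price"]} if event["type"] == "quote" else None

    def step_drop(event, b):
        if event["type"] == "quote" and event["sym"] == b["sym"] and event["price"] < 0.9 * b["hi"]:
            return {**b, "lo": event["price"]}
        return None

    quotes = [{"type": "quote", "sym": "XYZ", "price": 100},
              {"type": "quote", "sym": "ABC", "price": 50},
              {"type": "quote", "sym": "XYZ", "price": 85}]
    print(run_subscriptions(quotes, [step_quote, step_drop]))   # [{'sym': 'XYZ', 'hi': 100, 'lo': 85}]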


International Conference on Management of Data | 2007

Cayuga: a high-performance event processing engine

Lars Brenna; Alan J. Demers; Johannes Gehrke; Mingsheng Hong; Joel Ossher; Biswanath Panda; Mirek Riedewald; Mohit Thatte; Walker M. White

We propose a demonstration of Cayuga, a complex event monitoring system for high speed data streams. Our demonstration will show Cayuga applied to monitoring Web feeds; the demo will illustrate the expressiveness of the Cayuga query language, the scalability of its query processing engine to high stream rates, and a visualization of the internals of the query processing engine.


BioScience | 2009

Data-intensive Science: A New Paradigm for Biodiversity Studies

Steve Kelling; Wesley M. Hochachka; Daniel Fink; Mirek Riedewald; Rich Caruana; Grant Ballard; Giles Hooker

The increasing availability of massive volumes of scientific data requires new synthetic analysis techniques to explore and identify interesting patterns that are otherwise not apparent. For biodiversity studies, a “data-driven” approach is necessary because of the complexity of ecological systems, particularly when viewed at large spatial and temporal scales. Data-intensive science organizes large volumes of data from multiple sources and fields and then analyzes them using techniques tailored to the discovery of complex patterns in high-dimensional data through visualizations, simulations, and various types of model building. Through interpreting and analyzing these models, truly novel and surprising patterns that are “born from the data” can be discovered. These patterns provide valuable insight for concrete hypotheses about the underlying ecological processes that created the observed data. Data-intensive science allows scientists to analyze bigger and more complex systems efficiently, and complements more traditional scientific processes of hypothesis generation and experimental testing to refine our understanding of the natural world.


Journal of Wildlife Management | 2007

Data-mining discovery of pattern and process in ecological systems

Wesley M. Hochachka; Rich Caruana; Daniel Fink; Art Munson; Mirek Riedewald; Daria Sorokina; Steve Kelling

Most ecologists use statistical methods as their main analytical tools when analyzing data to identify relationships between a response and a set of predictors; thus, they treat all analyses as hypothesis tests or exercises in parameter estimation. However, little or no prior knowledge about a system can lead to creation of a statistical model or models that do not accurately describe major sources of variation in the response variable. We suggest that under such circumstances data mining is more appropriate for analysis. In this paper we 1) present the distinctions between data-mining (usually exploratory) analyses and parametric statistical (confirmatory) analyses, 2) illustrate 3 strengths of data-mining tools for generating hypotheses from data, and 3) suggest useful ways in which data mining and statistical analyses can be integrated into a thorough analysis of data to facilitate rapid creation of accurate models and to guide further research.
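
As a hedged illustration of the exploratory-versus-confirmatory distinction (not an example from the paper), the sketch below fits a pre-specified linear model and an off-the-shelf tree ensemble to synthetic data containing an unanticipated interaction; the ensemble's much better fit is the kind of cue that points toward a new hypothesis. The data, covariates, and model choices are invented for the sketch.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(2000, 3))                              # three hypothetical covariates
    y = X[:, 0] + 2.0 * X[:, 1] * X[:, 2] + rng.normal(0, 0.1, 2000)    # hidden interaction term

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    linear = LinearRegression().fit(X_tr, y_tr)                         # confirmatory, additive-only model
    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    print("linear R^2:", round(linear.score(X_te, y_te), 2))            # misses the interaction
    print("forest R^2:", round(forest.score(X_te, y_te), 2))            # recovers it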


International Conference on Data Engineering | 2006

Hilda: A High-Level Language for Data-Driven Web Applications

Fan Yang; Jayavel Shanmugasundaram; Mirek Riedewald; Johannes Gehrke

We propose Hilda, a high-level language for developing data-driven web applications. The primary benefits of Hilda over existing development platforms are: (a) it uses a unified data model for all layers of the application, (b) it is declarative, (c) it models both application queries and updates, (d) it supports structured programming for web sites, and (e) it enables conflict detection for concurrent updates. We also describe the implementation of a simple proof-of-concept Hilda compiler, which translates a Hilda application program into Java Servlet code.


International Conference on Database Theory | 2001

Flexible Data Cubes for Online Aggregation

Mirek Riedewald; Divyakant Agrawal; Amr El Abbadi

Applications like Online Analytical Processing depend heavily on the ability to quickly summarize large amounts of information. Techniques were proposed recently that speed up aggregate range queries on MOLAP data cubes by storing pre-computed aggregates. These approaches try to handle data cubes of any dimensionality by dealing with all dimensions at the same time and treat the different dimensions uniformly. The algorithms are typically complex, and it is difficult to prove their correctness and to analyze their performance. We present a new technique to generate Iterative Data Cubes (IDC) that addresses these problems. The proposed approach provides a modular framework for combining one-dimensional aggregation techniques to create space-optimal high-dimensional data cubes. A large variety of cost tradeoffs for high-dimensional IDC can be generated, making it easy to find the right configuration based on the application requirements.
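
A minimal sketch of the modular idea, using the simplest per-dimension choice, a one-dimensional prefix-sum transform, along every dimension; the IDC framework itself allows a different 1D technique per dimension with different cost trade-offs, so this is an illustration rather than the paper's construction.

    import numpy as np

    def prefix_sum_cube(cube):
        """Pre-aggregate by applying a 1D prefix-sum transform along each dimension."""
        pre = cube.astype(float)
        for axis in range(pre.ndim):
            pre = np.cumsum(pre, axis=axis)
        return pre

    def range_sum(pre, lo, hi):
        """Sum over the box cube[lo[0]:hi[0]+1, ..., lo[d-1]:hi[d-1]+1] by
        inclusion-exclusion: a constant number of cell look-ups per query."""
        d, total = pre.ndim, 0.0
        for corner in range(1 << d):
            idx, sign = [], 1
            for axis in range(d):
                if corner & (1 << axis):          # use the cell just below the range
                    if lo[axis] == 0:
                        break                     # prefix before index 0 is zero
                    idx.append(lo[axis] - 1)
                    sign = -sign
                else:
                    idx.append(hi[axis])
            else:
                total += sign * pre[tuple(idx)]
        return total

    cube = np.arange(24).reshape(4, 6)
    pre = prefix_sum_cube(cube)
    print(range_sum(pre, (1, 2), (3, 4)), cube[1:4, 2:5].sum())   # both equal 135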


International Conference on Management of Data | 2007

Massively multi-query join processing in publish/subscribe systems

Mingsheng Hong; Alan J. Demers; Johannes Gehrke; Christoph Koch; Mirek Riedewald; Walker M. White

There has been much recent interest in XML publish/subscribe systems. Some systems scale to thousands of concurrent queries, but support a limited query language (usually a fragment of XPath 1.0). Other systems support more expressive languages, but do not scale well with the number of concurrent queries. In this paper, we propose a set of novel query processing techniques, referred to as Massively Multi-Query Join Processing techniques, for processing a large number of XML stream queries involving value joins over multiple XML streams and documents. These techniques enable the sharing of representations of inputs to multiple joins, and the sharing of join computation. Our techniques are also applicable to relational event processing systems and publish/subscribe systems that support join queries. We present experimental results to demonstrate the effectiveness of our techniques. We are able to process thousands of XML messages with hundreds of thousands of join queries on real RSS feed streams. Our techniques gain more than two orders of magnitude speedup compared to the naive approach of evaluating such join queries.
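
A rough sketch of the sharing idea under strong simplifying assumptions: each query is reduced to a single equi-join attribute plus a residual filter, stored documents are indexed once per distinct join attribute, and one hash probe per attribute serves every query registered on it. The query representation and field names are hypothetical, not the paper's techniques.

    from collections import defaultdict

    def build_shared_state(queries, documents):
        """Group queries by join attribute and index documents once per attribute."""
        by_attr = defaultdict(list)
        for q in queries:
            by_attr[q["join_attr"]].append(q)
        index = {attr: defaultdict(list) for attr in by_attr}
        for doc in documents:                            # single pass over the documents
            for attr in by_attr:
                if attr in doc:
                    index[attr][doc[attr]].append(doc)
        return by_attr, index

    def process_message(msg, by_attr, index):
        """One shared hash probe per join attribute, then per-query residual filters."""
        output = []
        for attr, queries in by_attr.items():
            if attr not in msg:
                continue
            for doc in index[attr].get(msg[attr], []):   # shared probe result
                for q in queries:
                    if q["filter"](msg, doc):
                        output.append((q["id"], msg, doc))
        return output

    # Two queries sharing the join attribute "ticker" are served by one probe.
    docs = [{"ticker": "XYZ", "sector": "tech"}, {"ticker": "ABC", "sector": "energy"}]
    queries = [
        {"id": 1, "join_attr": "ticker", "filter": lambda m, d: d["sector"] == "tech"},
        {"id": 2, "join_attr": "ticker", "filter": lambda m, d: m["price"] > 100},
    ]
    state = build_shared_state(queries, docs)
    print(process_message({"ticker": "XYZ", "price": 120}, *state))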


Statistical and Scientific Database Management | 2000

pCube: Update-efficient online aggregation with progressive feedback and error bounds

Mirek Riedewald; Divyakant Agrawal; Amr El Abbadi

Multidimensional data cubes are used in large data warehouses as a tool for online aggregation of information. As the number of dimensions increases, supporting efficient queries as well as updates to the data cube becomes difficult. Another problem that arises with increased dimensionality is the sparseness of the data space. In this paper we develop a new data structure referred to as the pCube (data cube for progressive querying), to support efficient querying and updating of multidimensional data cubes in large data warehouses. While the pCube concept is very general and can be applied to any type of query, we mainly focus on range queries that summarize the contents of regions of the data cube. pCube provides intermediate results with absolute error bounds (to allow trading accuracy for fast response time), efficient updates, scalability with increasing dimensionality, and pre-aggregation to support summarization of large ranges. We present both a general solution and an implementation of pCube and report the results of experimental evaluations.
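
A small illustration of the progressive-feedback idea for one-dimensional range sums, assuming each node of a binary hierarchy can report its (sum, min, max); the bounds tighten as partially covered nodes are split. A real pCube precomputes and incrementally maintains such aggregates for multidimensional data, whereas this sketch recomputes them from the raw array for brevity.

    import numpy as np

    class ProgressiveRangeSum:
        def __init__(self, values):
            self.values = np.asarray(values, dtype=float)

        def _node_stats(self, lo, hi):
            chunk = self.values[lo:hi + 1]            # stand-in for a stored node aggregate
            return chunk.sum(), chunk.min(), chunk.max()

        def query(self, q_lo, q_hi):
            """Yield successively tighter (lower, upper) bounds for sum(values[q_lo:q_hi+1])."""
            frontier = [(0, len(self.values) - 1)]    # nodes not yet fully resolved
            exact = 0.0                               # contribution known exactly so far
            while frontier:
                lower, upper, next_frontier = exact, exact, []
                for lo, hi in frontier:
                    s, mn, mx = self._node_stats(lo, hi)
                    if q_lo <= lo and hi <= q_hi:     # fully inside the range: exact
                        exact += s
                        lower += s
                        upper += s
                    elif hi < q_lo or lo > q_hi:      # disjoint: contributes nothing
                        continue
                    else:                             # partial overlap: bound it, then refine
                        overlap = min(hi, q_hi) - max(lo, q_lo) + 1
                        lower += overlap * mn
                        upper += overlap * mx
                        mid = (lo + hi) // 2
                        next_frontier += [(lo, mid), (mid + 1, hi)]
                yield lower, upper
                frontier = next_frontier

    for lower, upper in ProgressiveRangeSum(np.arange(16)).query(3, 12):
        print(lower, upper)                           # bounds converge to the exact answer, 75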


European Conference on Machine Learning | 2007

Additive Groves of Regression Trees

Daria Sorokina; Rich Caruana; Mirek Riedewald

We present a new regression algorithm called Groves of trees and show empirically that it is superior in performance to a number of other established regression methods. A Grove is an additive model usually containing a small number of large trees. Trees added to the Grove are trained on the residual error of other trees already in the Grove. We begin the training process with a single small tree in the Grove and gradually increase both the number of trees in the Grove and their size. This procedure ensures that the resulting model captures the additive structure of the response. A single Grove may still overfit to the training set, so we further decrease the variance of the final predictions with bagging. We show that in addition to exhibiting superior performance on a suite of regression test problems, bagged Groves of trees are very resistant to overfitting.
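
A simplified sketch of the Grove idea (not the authors' full training schedule, which also grows tree size gradually from a single small tree): a fixed set of trees is fitted additively, each tree repeatedly re-trained on the residual left by the others, and several Groves trained on bootstrap samples are averaged to reduce variance. Tree counts, sizes, and the synthetic data are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def train_grove(X, y, n_trees=4, max_leaf_nodes=16, cycles=3, seed=0):
        """Fit an additive model of trees, cycling so each tree is re-trained
        on the residual error of all the other trees in the Grove."""
        trees = [DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=seed)
                 for _ in range(n_trees)]
        preds = np.zeros((n_trees, len(y)))
        for _ in range(cycles):
            for i, tree in enumerate(trees):
                residual = y - (preds.sum(axis=0) - preds[i])    # leave tree i out
                tree.fit(X, residual)
                preds[i] = tree.predict(X)
        return trees

    def predict_grove(trees, X):
        return sum(t.predict(X) for t in trees)

    def bagged_groves(X, y, n_groves=10, seed=0, **grove_kwargs):
        """Bag several Groves trained on bootstrap samples to reduce variance."""
        rng = np.random.default_rng(seed)
        groves = []
        for _ in range(n_groves):
            idx = rng.integers(0, len(y), size=len(y))           # bootstrap sample
            groves.append(train_grove(X[idx], y[idx], **grove_kwargs))
        return groves

    def predict_bagged(groves, X):
        return np.mean([predict_grove(g, X) for g in groves], axis=0)

    # Tiny usage example on synthetic additive data.
    rng = np.random.default_rng(1)
    X = rng.uniform(-2, 2, size=(1000, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.05, 1000)
    model = bagged_groves(X, y, n_groves=10, n_trees=4, max_leaf_nodes=16)
    print("training MSE:", np.mean((predict_bagged(model, X) - y) ** 2))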

Collaboration


Top co-author: Amr El Abbadi (University of California).