Publication


Featured research published by Dan Olteanu.


International Conference on Extending Database Technology (EDBT) | 2002

XPath: Looking Forward

Dan Olteanu; Holger Meuss; Tim Furche; François Bry

The location path language XPath is of particular importance for XML applications since it is a core component of many XML processing standards such as XSLT or XQuery. In this paper, based on axis symmetry of XPath, equivalences of XPath 1.0 location paths involving reverse axes, such as ancestor and preceding, are established. These equivalences are used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones. Location paths without reverse axes, as generated by the presented rewriting algorithm, enable efficient SAX-like streamed data processing of XPath.
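
To illustrate the kind of equivalence such rewriting rules rest on, here is a minimal sketch, assuming lxml is available; the document, the context loop, and the helper loc are my own illustration and not taken from the paper. It checks one reverse-axis rewrite, child::a/parent::b ≡ self::b[child::a], at every context node of a small document.

```python
# Minimal sketch (assumes lxml); checks one reverse-axis equivalence of the
# kind used as a rewriting rule:
#     child::a/parent::b   ==   self::b[child::a]
from lxml import etree

doc = etree.fromstring("<r><b><a/></b><b><c/></b><c><a/></c></r>")
tree = doc.getroottree()
loc = lambda nodes: {tree.getpath(n) for n in nodes}  # identify nodes by their paths

for ctx in doc.iter():
    with_reverse = ctx.xpath("child::a/parent::b")   # uses the reverse axis parent
    forward_only = ctx.xpath("self::b[child::a]")    # reverse-axis-free rewrite
    assert loc(with_reverse) == loc(forward_only)
print("equivalence holds at every context node of this document")
```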


International Conference on Data Engineering (ICDE) | 2008

Fast and Simple Relational Processing of Uncertain Data

Lyublena Antova; Thomas Jansen; Christoph Koch; Dan Olteanu

This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answers, a query on the logical level can be translated into, and evaluated as, a single relational algebra query on the U-relational representation. The translation scheme essentially preserves the size of the query in terms of number of operations and, in particular, number of joins. Standard techniques employed in off-the-shelf relational database management systems are effective for optimizing and processing queries on U-relations. In our experiments we show that query evaluation on U-relations scales to large amounts of data with high degrees of uncertainty.
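
As a toy rendering of the representation (the fragment layout, descriptor encoding, and values below are my own, not the paper's exact schema): each vertical fragment stores a world-set descriptor next to a tuple id and one attribute value, and a positive-algebra join merges the descriptors of consistent rows, so the whole query remains a single relational query over the representation.

```python
# Toy sketch of the U-relation idea (names and layout are my own): each
# vertical fragment stores (world-set descriptor, tuple id, attribute value);
# a descriptor maps discrete variables to chosen alternatives.

def consistent(d1, d2):
    """Two descriptors are consistent if they agree on shared variables."""
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

# Fragment for attribute 'name' and fragment for attribute 'city' of a relation,
# with attribute-level uncertainty: variable x chooses the name of tuple t1.
R_name = [({"x": 1}, "t1", "Alice"), ({"x": 2}, "t1", "Alicia")]
R_city = [({}, "t1", "Oxford")]      # the city of t1 is certain

# Reassembling tuples (a join on the tuple id) merges descriptors of consistent
# rows; positive relational algebra on U-relations works the same way.
tuples = [
    ({**d1, **d2}, tid1, name, city)
    for (d1, tid1, name) in R_name
    for (d2, tid2, city) in R_city
    if tid1 == tid2 and consistent(d1, d2)
]
print(tuples)
# [({'x': 1}, 't1', 'Alice', 'Oxford'), ({'x': 2}, 't1', 'Alicia', 'Oxford')]
```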


International Conference on Data Engineering (ICDE) | 2007

10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information

Lyublena Antova; Christoph Koch; Dan Olteanu

We present a decomposition-based approach to managing incomplete information. We introduce world-set decompositions (WSDs), a space-efficient and complete representation system for finite sets of worlds. We study the problem of efficiently evaluating relational algebra queries on world-sets represented by WSDs. We also evaluate our technique experimentally in a large census data scenario and show that it is both scalable and efficient.
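
A back-of-the-envelope sketch (my own toy encoding, not the paper's construction) of why decomposition saves space: when fields vary independently, the world-set is stored as per-field components, and n independent binary choices take 2n component rows instead of 2^n explicitly stored worlds.

```python
# Sketch of the world-set decomposition idea (my own toy encoding): a set of
# worlds that factorizes into independent components is stored as the
# components themselves; the full world-set is their relational product.
from itertools import product

# Census-style record with two independently uncertain fields: 2 + 2 component
# rows represent 2 * 2 = 4 worlds without enumerating them.
components = [
    [{"marital": "single"}, {"marital": "married"}],
    [{"children": 0}, {"children": 1}],
]

def worlds(comps):
    """Enumerate the represented world-set (done here only for illustration)."""
    for choice in product(*comps):
        world = {}
        for part in choice:
            world.update(part)
        yield world

for w in worlds(components):
    print(w)
# 4 worlds printed; with n such fields, 2n component rows stand for 2**n worlds.
```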


ACM SIGMOD International Conference on Management of Data | 2009

MayBMS: a probabilistic database management system

Jiewen Huang; Lyublena Antova; Christoph Koch; Dan Olteanu

MayBMS is a state-of-the-art probabilistic database management system which leverages the strengths of previous database research for achieving scalability. As a proof of concept for its ease of use, we have built on top of MayBMS a Web-based application that offers NBA-related information based on what-if analysis of team dynamics using data available at www.nba.com.


Very Large Data Bases (VLDB) | 2008

Conditioning probabilistic databases

Christoph Koch; Dan Olteanu

Past research on probabilistic databases has studied the problem of answering queries on a static database. Application scenarios of probabilistic databases however often involve the conditioning of a database using additional information in the form of new evidence. The conditioning problem is thus to transform a probabilistic database of priors into a posterior probabilistic database which is materialized for subsequent query processing or further refinement. It turns out that the conditioning problem is closely related to the problem of computing exact tuple confidence values. It is known that exact confidence computation is an NP-hard problem. This has led researchers to consider approximation techniques for confidence computation. However, neither conditioning nor exact confidence computation can be solved using such techniques. In this paper we present efficient techniques for both problems. We study several problem decomposition methods and heuristics that are based on the most successful search techniques from constraint satisfaction, such as the Davis-Putnam algorithm. We complement this with a thorough experimental evaluation of the algorithms proposed. Our experiments show that our exact algorithms scale well to realistic database sizes and can in some scenarios compete with the most efficient previous approximation algorithms.
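
To make the link to Davis-Putnam-style search concrete, here is a hedged, self-contained sketch (not the paper's algorithm or decomposition methods): exact tuple confidence as weighted model counting over the tuple's lineage, computed by recursively conditioning on one Boolean variable at a time. The positive-DNF encoding and the variable probabilities are illustrative assumptions.

```python
# Hedged sketch (not the paper's algorithm): exact tuple confidence as weighted
# model counting over the tuple's lineage, splitting on one variable at a time.
# The lineage is in positive DNF: a set of frozensets of independent Boolean
# variables; prob[v] is the marginal probability of variable v.

def confidence(dnf, prob):
    if any(len(clause) == 0 for clause in dnf):   # an empty clause is already satisfied
        return 1.0
    if not dnf:                                   # no clause can be satisfied
        return 0.0
    v = next(iter(next(iter(dnf))))               # pick a variable to split on
    dnf_true = {clause - {v} for clause in dnf}   # condition on v = true
    dnf_false = {clause for clause in dnf if v not in clause}   # v = false
    p = prob[v]
    return p * confidence(dnf_true, prob) + (1 - p) * confidence(dnf_false, prob)

# P(x1 x2  OR  x1 x3) with independent variables:
lineage = {frozenset({"x1", "x2"}), frozenset({"x1", "x3"})}
print(confidence(lineage, {"x1": 0.5, "x2": 0.4, "x3": 0.6}))
# 0.5 * (1 - 0.6 * 0.4) = 0.38
```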


International Conference on Data Engineering (ICDE) | 2007

MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions

Lyublena Antova; Christoph Koch; Dan Olteanu

Managing incomplete information is important in many real world applications. In this demonstration we present MayBMS - a system for representing and managing finite sets of possible worlds - that successfully combines expressiveness and efficiency. Some features of MayBMS are: completeness of the representation system for finite world-sets; space-efficient representation of large world-sets; scalable evaluation and support for full relational algebra queries; and probabilistic extension of the representation system and the query language. MayBMS is implemented on top of PostgreSQL. It models incomplete data using the so-called world-set decompositions (WSDs) (Ruggles et al., 2004). For this demonstration, we introduce a probabilistic extension of world-sets and WSDs, where worlds or correlations between worlds have probabilities. The main idea underlying probabilistic WSDs is to use relational factorization combined with probabilistic independence in order to efficiently decompose large world-sets into a set of independent smaller relations. Queries in MayBMS can be expressed in an SQL-like language with special constructs that deal with incompleteness and probabilities. MayBMS rewrites and optimizes user queries into a sequence of relational queries on world-set decompositions.
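
A compact, hedged illustration of the probabilistic extension (again my own encoding, not MayBMS syntax): each independent component carries a local distribution over its alternatives, and the probability of a world is the product of the probabilities of its component choices.

```python
# Toy probabilistic world-set decomposition (my own encoding): independent
# components, each a local distribution over alternatives for some fields.
import math
from itertools import product

components = [
    [({"marital": "single"}, 0.7), ({"marital": "married"}, 0.3)],
    [({"children": 0}, 0.6), ({"children": 1}, 0.4)],
]

for choice in product(*components):
    world = {k: v for fields, _ in choice for k, v in fields.items()}
    p = math.prod(prob for _, prob in choice)   # independence across components
    print(world, round(p, 2))
# the four world probabilities (0.42, 0.28, 0.18, 0.12) sum to 1
```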


ACM SIGMOD International Conference on Management of Data | 2007

From complete to incomplete information and back

Lyublena Antova; Christoph Koch; Dan Olteanu

Incomplete information arises naturally in numerous data management applications. Recently, several researchers have studied query processing in the context of incomplete information. Most work has combined the syntax of a traditional query language like relational algebra with a nonstandard semantics such as certain or ranked possible answers. There are now also languages with special features to deal with uncertainty. However, by the standards of the data management community, to date no language proposal has been made that can be considered a natural analog to SQL or relational algebra for the case of incomplete information. In this paper we propose such a language, World-set Algebra, which satisfies the robustness criteria and analogies to relational algebra that we expect. The language supports the contemplation of alternatives and can thus map from a complete database to an incomplete one comprising several possible worlds. We show that World-set Algebra is conservative over relational algebra in the sense that any query that maps from a complete database to a complete database (a complete-to-complete query) is equivalent to a relational algebra query. Moreover, we give an efficient algorithm for effecting this translation. We then study algebraic query optimization of such queries. We argue that query languages with explicit constructs for handling uncertainty allow for the more natural and simple expression of many real-world decision support queries. The results of this paper not only suggest a language for specifying queries in this way, but also allow for their efficient evaluation in any relational database management system.
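
A hedged sketch of the complete-to-incomplete-and-back idea (the operator name choice_of and the data are mine, not the paper's syntax): a complete relation is mapped to several possible worlds by choosing one alternative per group, the same query is asked in every world, and the certain answers map the result back to a complete relation.

```python
# Hedged sketch (operator name and data are mine, not the paper's syntax):
# map a complete relation to possible worlds, query each world, and return to
# a complete relation by keeping the certain answers.
from itertools import product

R = {("alice", "dept1"), ("bob", "dept1"), ("carol", "dept2")}

def choice_of(rel, group_key):
    """One world per way of picking a single representative tuple per group."""
    groups = {}
    for t in rel:
        groups.setdefault(group_key(t), []).append(t)
    return [set(pick) for pick in product(*groups.values())]

worlds = choice_of(R, lambda t: t[1])                 # pick one employee per department
answers = [{dept for _, dept in w} for w in worlds]   # same query in every world
certain = set.intersection(*answers)                  # certain answers: back to complete
print(len(worlds), certain)                           # 2 worlds; {'dept1', 'dept2'}
```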


International Conference on Data Engineering (ICDE) | 2009

SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases

Dan Olteanu; Jiewen Huang; Christoph Koch

A paramount challenge in probabilistic databases is the scalable computation of confidences of tuples in query results. This paper introduces an efficient secondary-storage operator for exact computation of queries on tuple-independent probabilistic databases. We consider the conjunctive queries without self-joins that are known to be tractable on any tuple-independent database, and queries that are not tractable in general but become tractable on probabilistic databases restricted by functional dependencies. Our operator is semantically equivalent to a sequence of aggregations and can be naturally integrated into existing relational query plans. As a proof of concept, we developed an extension of the PostgreSQL 8.3.3 query engine called SPROUT. We study optimizations that push or pull our operator or parts thereof past joins. The operator employs static information, such as the query structure and functional dependencies, to decide which constituent aggregations can be evaluated together in one scan and how many scans are needed for the overall confidence computation task. A case study on the TPC-H benchmark reveals that most TPC-H queries obtained by removing aggregations can be evaluated efficiently using our operator. Experimental evaluation on probabilistic TPC-H data shows substantial efficiency improvements when compared to the state of the art.
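
As a hedged illustration of "a sequence of aggregations" (not the SPROUT operator itself; the relations and probabilities below are made up), here is confidence computation for the tractable Boolean query Q :- R(a), S(a, b) on a tuple-independent database: an independent-or aggregation within each join group, a product across the join, and a final independent-or across groups.

```python
# Hedged sketch (not the SPROUT operator itself): exact confidence of the
# Boolean query Q :- R(a), S(a, b) on a tuple-independent database, written
# as a sequence of aggregations.
import math

R = {"a1": 0.8, "a2": 0.5}              # P(R(a)) for each key a
S = {"a1": [0.4, 0.7], "a2": [0.9]}     # P(S(a, b)) for each key a, one entry per b

def ior(ps):
    """Probability that at least one of several independent events holds."""
    return 1.0 - math.prod(1.0 - p for p in ps)

per_key = {a: R[a] * ior(S.get(a, [])) for a in R}   # aggregate S per key, join with R
print(round(ior(per_key.values()), 4))               # final aggregation over keys: 0.8108
```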


International Conference on Data Engineering (ICDE) | 2003

An evaluation of regular path expressions with qualifiers against XML streams

Dan Olteanu; Tobias Kiesling; François Bry

We present SPEX, a streamed and progressive evaluation of regular path expressions with XPath-like qualifiers against XML streams. SPEX proceeds as follows. An expression is translated in linear time into a network of transducers, most of them having 1-DPDT equivalents. Every stream message is then processed once by the entire network and result fragments are output on the fly. In most practical cases, SPEX needs time linear in the stream size and, for the transducer stacks, memory quadratic in the stream depth. Experiments with a prototype implementation indicate that the SPEX approach is very efficient.
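
In the same spirit, though far simpler than the transducer network described in the paper, here is a hedged one-pass sketch: evaluating //a//b over a SAX-like event stream with memory proportional to the stream depth. The event encoding is an assumption of mine.

```python
# Hedged sketch in the spirit of streamed path evaluation (much simpler than
# the SPEX transducer network): evaluate //a//b over a SAX-like event stream
# in one pass, with memory proportional to the stream (tree) depth.

events = [                      # flattened XML stream: <r><a><c><b/></c></a><b/></r>
    ("start", "r"), ("start", "a"), ("start", "c"), ("start", "b"),
    ("end", "b"), ("end", "c"), ("end", "a"), ("start", "b"), ("end", "b"),
    ("end", "r"),
]

stack, inside_a, matches = [], 0, []
for kind, tag in events:
    if kind == "start":
        if tag == "b" and inside_a > 0:
            matches.append(len(stack))   # report a match (here: its depth)
        stack.append(tag)
        inside_a += tag == "a"
    else:
        inside_a -= stack.pop() == "a"
print(matches)                           # [3]: only the <b> below an <a> matches
```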


IEEE Transactions on Knowledge and Data Engineering | 2007

SPEX: Streamed and Progressive Evaluation of XPath

Dan Olteanu

Streams are preferable to data stored in memory in contexts where data is too large or volatile, or a standard approach based on storing the data is too time- or space-consuming. Emerging applications such as publish-subscribe systems, data monitoring in sensor networks, financial and traffic monitoring, and routing of MPEG-7 data call for querying streams. In many such applications, XML streams are arguably more appropriate than flat streams, for they convey (possibly unbounded) unranked ordered trees with labeled nodes. However, the flexibility enabled by XML streams in data modeling makes query evaluation different from traditional settings and challenging. This paper describes SPEX, a streamed and progressive evaluation of XML Path Language (XPath). SPEX compiles queries into networks of simple and independent transducers and processes XML streams with polynomial combined complexity. This makes SPEX especially suitable for implementation on devices with low memory and simple logic as used, for example, in mobile computing.
