Andreas Koeller | Researchain


Publication

Featured research published by Andreas Koeller.

Communications of The ACM | 2000

Maintaining data warehouses over changing information sources

Elke A. Rundensteiner; Andreas Koeller; Xin Zhang

In recent years, the number of digital information storage and retrieval systems has increased immensely. These information sources are generally interconnected via some network, and hence the task of integrating data from different sources to serve it up to users is an increasingly important one [10]. Applications that could benefit from this wealth of digital information are thus experiencing a pressing need for suitable integration tools that allow them to make effective use of such distributed and diverse data sets. In contrast to the on-demand approach to information integration, the approach of tailored information repository construction, commonly referred to as data warehousing, is characterized by the following properties:

data warehousing and olap | 1998

Integrating the rewriting and ranking phases of view synchronization

Andreas Koeller; Elke A. Rundensteiner; Nabil I. Hachem

Materialized views (data warehouses) are becoming increasingly important in the context of distributed modern environments such as the World Wide Web. Information sources (ISs) in such an environment may change their capabilities (schema), causing a data warehouse to become undefined. The process of evolving (rewriting) view queries after capability changes of ISs is referred to as view synchronization. Current view synchronization algorithms generate a potentially large number of valid solutions for the rewriting of a view query and, according to our analysis in this paper, have high complexity (in O(n!)). We propose to reduce this complexity by representing the synchronization problem as a graph traversal problem. Once this mapping has been applied, the problem can be reduced to a single-source shortest-path problem in graphs, which can be solved with O(n^3) complexity using the Bellman-Ford algorithm.
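As a rough illustration of the single-source shortest-path step mentioned in this abstract, the following is a minimal sketch of the textbook Bellman-Ford algorithm. It does not reproduce the paper's mapping from view synchronization to a graph; the node numbering, edge weights, and the function name `bellman_ford` are assumptions made only for this example.

```python
from math import inf


def bellman_ford(num_nodes, edges, source):
    """Return shortest distances from source.

    edges is a list of (u, v, weight) triples over nodes 0..num_nodes-1.
    This is the generic algorithm, not the paper's view-synchronization graph.
    """
    dist = [inf] * num_nodes
    dist[source] = 0
    # Relax every edge up to num_nodes - 1 times.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances are stable, so we can stop early
    # One extra pass detects a negative cycle reachable from the source.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist


# Hypothetical toy graph: weights stand in for rewriting costs.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
distances = bellman_ford(4, edges, 0)
print(distances)  # [0, 3, 1, 8]
```

Bellman-Ford runs in O(|V|·|E|) time; on a dense graph with n nodes and on the order of n^2 edges this gives the cubic bound quoted above.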

information quality in information systems | 2005

Approximate matching of textual domain attributes for information source integration

Andreas Koeller; Vinay Keelara

A key problem in the integration of information sources is the identification of related attributes or objects across independent sources. Inferring such meta-information from source data (rather than a-priori available meta-data, such as attribute names) is sometimes possible. For example, existing algorithms attempt to integrate information sources by finding patterns such as Inclusion Dependencies (INDs) across them. However, INDs are based on exact set inclusion and are thus very strict patterns that rarely hold across independent real-world databases. We propose two error-tolerant measures, termed Similarity Score and Distribution Score, that help identify related attributes across two independent databases, based on similarities in their data. Those measures specifically address the problem of identifying semantic relationships between textual attributes of databases that have few or no equal values. We also present implementations of those measures and some experimental results.

international conference on management of data | 1999

Evolvable view environment (EVE): non-equivalent view maintenance under schema changes

Elke A. Rundensteiner; Andreas Koeller; Xin Zhang; A. J. Lee; Anisoara Nica; A. Van Wyk; Y. Lee

Supporting independent ISs and integrating them in distributed data warehouses (materialized views) is becoming more important with the growth of the WWW. However, views defined over autonomous ISs are susceptible to schema changes. In the EVE project we are developing techniques to support the maintenance of data warehouses defined over distributed dynamic ISs [5, 6, 7]. The EVE system is the first to allow views to survive schema changes of their underlying ISs while also adapting to changing data in those sources. EVE achieves this in two steps: applying view query rewriting algorithms that exploit information about alternative ISs and the information they contain, and incrementally adapting the view extent to the view definition changes. Those processes are referred to as view synchronization and view adaption, respectively. They increase the survivability of materialized views in changing environments and reduce the necessity of human interaction in system maintenance.

international conference on data engineering | 2003

Discovery of high-dimensional inclusion dependencies

Andreas Koeller; Elke A. Rundensteiner

Determining relationships such as functional or inclusion dependencies within and across databases is important for many applications in information integration. When such information is not available as explicit metadata, it is possible to discover potential dependencies from the source database extents. However, the complexity of such discovery problems is typically exponential in the number of attributes. We have developed an algorithm for the discovery of inclusion dependencies across high-dimensional relations in the order of 100 attributes. This algorithm is the first to efficiently solve the inclusion-dependency discovery problem. This is achieved by mapping it into a progressive series of clique-finding problems in k-uniform hypergraphs and solving those. Extensive experimental studies confirm the algorithm's efficiency on a variety of real-world data sets.

extending database technology | 2002

Incremental Maintenance of Schema-Restructuring Views

Andreas Koeller; Elke A. Rundensteiner

An important issue in data integration is the integration of semantically equivalent but schematically heterogeneous data sources. Declarative mechanisms supporting powerful source restructuring for such databases have been proposed in the literature, such as the SQL extension SchemaSQL. However, the issue of incremental maintenance of views defined in such languages remains an open problem. We present an incremental view maintenance algorithm for schema-restructuring views. Our algorithm transforms a source update into an incremental view update, by propagating updates through the operators of a SchemaSQL algebra tree. We observe that schema-restructuring view maintenance requires transformation of data into schema changes and vice versa. Our maintenance algorithm handles any combination of data updates or schema changes and produces a correct sequence of data updates, schema changes, or both as output. In experiments performed on our prototype implementation, we find that incremental view maintenance in SchemaSQL is significantly faster than recomputation in many cases.

international conference on data engineering | 1999

Data warehouses evolution: trade-offs between quality and cost of query rewritings

Amy Jyh-Liang Lee; Andreas Koeller; Anisoara Nica; Elke A. Rundensteiner

Query rewriting with relaxed semantics has been proposed as a means of retaining the validity of a data warehouse (i.e., materialized queries) in a changing environment. Attributes in the query interface can be classified as essential or dispensable (if they cannot be retained) according to the query definer's preferences. Similarly, preferences for query extent can be specified, for example, to indicate whether a subset of the original result is acceptable or not. The paper discusses the trade-off between quality and cost of query rewriting.

IEEE Transactions on Knowledge and Data Engineering | 2004

Incremental maintenance of schema-restructuring views in SchemaSQL

Andreas Koeller; Elke A. Rundensteiner

The integration of data, especially from heterogeneous sources, is a hard and widely studied problem. One particularly challenging issue is the integration of sources that are semantically equivalent but schematically heterogeneous. While two such data sources may represent the same information, one may store the information inside tuples (data) while the other may store it in attribute or relation names (schema). The SchemaSQL query language is a recent solution to this problem, powerful enough to restructure such sources into each other without loss of information. We propose the first incremental view maintenance strategy for such schema-restructuring views. Our strategy, based on an algebraic representation of the view query, correctly transforms a data update or a schema change to a source into sequences of schema and data updates to be applied to the view. We also introduce an optimization of incremental maintenance using batching. We present a proof of correctness of the propagation approach. We also describe the implementation of our SchemaSQL Query Processor and View Maintainer. Last, our experimental results demonstrate that, in many cases, incremental SchemaSQL view maintenance is significantly faster than complete view recomputation.
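To make the distinction between information stored in tuples and information stored in attribute names concrete, here is a small sketch in plain Python. It is not SchemaSQL syntax and not the paper's maintenance algorithm; the relation contents and the helper names `pivot_to_schema` and `unpivot_to_data` are hypothetical and chosen only for illustration.

```python
def pivot_to_schema(rows, key, name_col, value_col):
    """Turn the values of name_col into attribute names, one output row per key."""
    out = {}
    for row in rows:
        target = out.setdefault(row[key], {key: row[key]})
        target[row[name_col]] = row[value_col]
    return list(out.values())


def unpivot_to_data(rows, key, name_col, value_col):
    """Inverse direction: turn attribute names back into data values."""
    return [
        {key: row[key], name_col: attr, value_col: val}
        for row in rows
        for attr, val in row.items()
        if attr != key
    ]


# Hypothetical source: the ticker symbol is stored as data.
tall = [
    {"date": "d1", "ticker": "A", "price": 10},
    {"date": "d1", "ticker": "B", "price": 20},
    {"date": "d2", "ticker": "A", "price": 11},
]

# Restructured view: each ticker symbol becomes an attribute name.
wide = pivot_to_schema(tall, "date", "ticker", "price")
print(wide)

# Restructuring back recovers the original tuples.
roundtrip = unpivot_to_data(wide, "date", "ticker", "price")
```

In this toy setting, inserting a tuple with a previously unseen ticker into `tall` would add a new attribute to `wide`, which is the kind of data-to-schema transformation the abstract refers to. The sketch assumes no ticker value collides with the key attribute name.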

cooperative information systems | 2004

Heuristic Strategies for Inclusion Dependency Discovery

Andreas Koeller; Elke A. Rundensteiner

Lecture Notes in Computer Science | 2006

Heuristic strategies for the discovery of inclusion dependencies and other patterns

Andreas Koeller; Elke A. Rundensteiner

Inclusion dependencies (INDs) between databases are assertions of subset-relationships between sets of attributes (dimensions) in two relations. Such dependencies are useful for a number of purposes related to information integration, such as database similarity discovery and foreign key discovery. An exhaustive approach to discovering INDs between two relations suffers from the dimensionality curse, since the number of potential mappings between the attributes of two relations is exponential in the number of attributes. For this reason, levelwise (Apriori-like) approaches to discovery do not scale beyond relations with 8 to 10 attributes. Approaches modeling the similarity space as graphs or hypergraphs are promising, but also do not scale very well. This paper discusses approaches to scale discovery algorithms for INDs and some other similarity patterns in databases. The major obstacle to scalability is the exponentially growing size of the data structure representing potential INDs. Therefore, the focus of our solution is on heuristic techniques that reduce the number of IND candidates considered by the algorithm. Despite the use of heuristics, the accuracy of the results is good for real-world data. Experiments are presented assessing the quality of the discovery results versus the runtime savings. We conclude that the heuristic approach is useful and improves scalability significantly. It is particularly applicable for relations that have attributes with few distinct values.
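The definition of an IND as a subset relationship between attribute sets can be sketched as a direct validity check for one candidate mapping. This is only the definition made executable, not the heuristic discovery algorithm the paper describes; the sample relations and the function name `holds_ind` are assumptions for this example.

```python
def holds_ind(r, s, mapping):
    """Check whether R[X] is a subset of S[Y].

    r and s are lists of dicts (tuples of two relations); mapping is a list
    of (attribute_of_r, attribute_of_s) pairs defining X and Y in order.
    """
    r_attrs = [a for a, _ in mapping]
    s_attrs = [b for _, b in mapping]
    # Project S onto Y once, then test every projected tuple of R against it.
    s_proj = {tuple(row[b] for b in s_attrs) for row in s}
    return all(tuple(row[a] for a in r_attrs) in s_proj for row in r)


# Hypothetical relations with differently named but related attributes.
orders = [{"cust": 1, "city": "Ulm"}, {"cust": 2, "city": "Bonn"}]
customers = [
    {"id": 1, "town": "Ulm"},
    {"id": 2, "town": "Bonn"},
    {"id": 3, "town": "Kiel"},
]

forward = holds_ind(orders, customers, [("cust", "id"), ("city", "town")])
backward = holds_ind(customers, orders, [("id", "cust"), ("town", "city")])
print(forward, backward)  # True False
```

Checking a single candidate is cheap; the difficulty the abstract points to is that the number of candidate mappings to check grows exponentially with the number of attributes.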
