Carl-Christian Kanne | Researchain

Publication

Featured research published by Carl-Christian Kanne.

international conference on data engineering | 2000

Efficient Storage of XML Data

Carl-Christian Kanne; Guido Moerkotte

We introduce NATIX, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim.
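
The idea of keeping connected tree nodes together in records bounded by a page can be illustrated with a small sketch. This is not the NATIX split algorithm, which the abstract does not specify; it is a simplified greedy bottom-up packing, and the names `TreeNode`, `pack` and `capacity` are hypothetical. It assumes every single node fits into one record.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TreeNode:
    label: str
    size: int  # bytes this node occupies (assumed <= capacity)
    children: List["TreeNode"] = field(default_factory=list)


def pack(root: TreeNode, capacity: int) -> List[List[str]]:
    """Greedily group connected nodes into records of at most `capacity` bytes."""
    records: List[List[str]] = []

    def visit(node: TreeNode) -> Tuple[List[str], int]:
        # The open fragment always contains `node`, so it stays connected.
        fragment = [node.label]
        total = node.size
        for child in node.children:
            child_fragment, child_size = visit(child)
            if total + child_size <= capacity:
                # The child's fragment fits: cluster it with its parent.
                fragment.extend(child_fragment)
                total += child_size
            else:
                # It does not fit: close it as a record of its own.
                records.append(child_fragment)
        return fragment, total

    root_fragment, _ = visit(root)
    records.append(root_fragment)
    return records


# Hypothetical document: node sizes are made up for illustration.
doc = TreeNode("root", 2, [
    TreeNode("a", 3, [TreeNode("a1", 3), TreeNode("a2", 3)]),
    TreeNode("b", 2),
])
records = pack(doc, capacity=8)
```

With a capacity of 8, the subtree fragment `root, a, a1` shares one record, while `a2` and `b` are stored in records of their own. A flat-file approach would correspond to one unbounded record, and a node-per-record mapping to a capacity that only ever fits one node.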

very large data bases | 2002

Anatomy of a native XML base management system

Thorsten Fiebig; Sven Helmer; Carl-Christian Kanne; Guido Moerkotte; Julia Neumann; Robert Schiele; Till Westmann

Several alternatives to manage large XML document collections exist, ranging from file systems over relational or other database systems to specifically tailored XML base management systems. In this paper we give a tour of Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases like relational systems, we illustrate how almost every component in a database system is affected in terms of adequacy and performance. We show how to design and optimize areas such as storage, transaction management - comprising recovery and multi-user synchronization - as well as query processing for XML.

international conference on data engineering | 2005

Full-fledged algebraic XPath processing in Natix

Matthias Brantner; Sven Helmer; Carl-Christian Kanne; Guido Moerkotte

We present the first complete translation of XPath into an algebra, paving the way for a comprehensive, state-of-the-art XPath (and later on, XQuery) compiler based on algebraic optimization techniques. Our translation includes all XPath features such as nested expressions, position-based predicates and node-set functions. The translated algebraic expressions can be executed using the proven, scalable, iterator-based approach, as we demonstrate in form of a corresponding physical algebra in our native XML DBMS Natix. A first glance at performance results shows that even without further optimization of the expressions, we provide a competitive evaluation technique for XPath queries.

international conference on management of data | 2004

Evaluating lock-based protocols for cooperation on XML documents

Sven Helmer; Carl-Christian Kanne; Guido Moerkotte

We discuss four different core protocols for synchronizing access to and modifications of XML document collections. These core protocols synchronize structure traversals and modifications. They are meant to be integrated into a native XML base management system (XBMS) and are based on two-phase locking. We also demonstrate the different degrees of cooperation that are possible with these protocols by various experimental results. Furthermore, we discuss extensions of these core protocols to full-fledged protocols. Finally, we show how to achieve a higher degree of concurrency by exploiting the semantics expressed in Document Type Definitions (DTDs).

web information systems engineering | 2002

Optimized translation of XPath into algebraic expressions parameterized by programs containing navigational primitives

Sven Helmer; Carl-Christian Kanne

We propose a new approach for the efficient evaluation of XPath expressions. This is important, since XPath is not only used as a simple, stand-alone query language, but is also an essential ingredient of XQuery and XSLT. The main idea of our approach is to translate XPath into algebraic expressions parameterized with programs. These programs are mainly built from navigational primitives like accessing the first child or the next sibling. The goals of the approach are: 1) to enable pipelined evaluation, 2) to avoid producing duplicate (intermediate) result nodes, 3) to visit as few document nodes as possible, and 4) to avoid visiting nodes more than once. This improves on existing approaches, because our method is highly efficient.
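
As a rough illustration of evaluation built from navigational primitives, the following sketch implements the child and descendant axes using only first-child and next-sibling pointers, with generators standing in for pipelined operators. It is not the translation proposed in the paper; `Node`, `make`, `children`, `descendants` and `step` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Node:
    name: str
    first_child: Optional["Node"] = None
    next_sibling: Optional["Node"] = None


def make(name: str, *kids: Node) -> Node:
    """Build a node and link its children via first_child / next_sibling."""
    node = Node(name)
    prev = None
    for kid in kids:
        if prev is None:
            node.first_child = kid
        else:
            prev.next_sibling = kid
        prev = kid
    return node


def children(node: Node) -> Iterator[Node]:
    # Child axis: follow first_child once, then next_sibling repeatedly.
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling


def descendants(node: Node) -> Iterator[Node]:
    # Descendant axis in document order; each node is visited exactly once.
    for child in children(node):
        yield child
        yield from descendants(child)


def step(context: Iterable[Node],
         axis: Callable[[Node], Iterator[Node]],
         name: str) -> Iterator[Node]:
    # One location step with a name test; results are produced lazily,
    # so a following step can consume them without materializing a list.
    for node in context:
        for candidate in axis(node):
            if candidate.name == name:
                yield candidate


doc = make("book",
           make("title"),
           make("chapter", make("title"), make("section", make("title"))))

# Roughly /descendant::chapter/child::title, evaluated as a pipeline.
chapter_titles = list(step(step([doc], descendants, "chapter"), children, "title"))
```

This sketch does not address duplicate avoidance for arbitrary paths: chaining axes over overlapping context nodes can yield the same node more than once, which is one of the problems the abstract names as a goal.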

Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems | 2002

Natix: A Technology Overview

Thorsten Fiebig; Sven Helmer; Carl-Christian Kanne; Guido Moerkotte; Julia Neumann; Robert Schiele; Till Westmann

Several alternatives to manage large XML document collections exist, ranging from file systems over relational or other database systems to specifically tailored XML base management systems. In this paper we review Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases like relational systems, we indicate how almost every component in a database system is affected in terms of adequacy and performance. We show what kind of problems have to be tackled when designing and optimizing areas such as storage, transaction management comprising recovery and multi-user synchronization as well as query processing for XML.

international conference on management of data | 2005

Cost-sensitive reordering of navigational primitives

Carl-Christian Kanne; Matthias Brantner; Guido Moerkotte

We present a method to evaluate path queries based on the novel concept of partial path instances. Our method (1) maximizes performance by means of sequential scans or asynchronous I/O, (2) does not require a special storage format, (3) relies on simple navigational primitives on trees, and (4) can be complemented by existing logical and physical optimizations such as duplicate elimination, duplicate prevention and path rewriting. We use a physical algebra which separates those navigation operations that require I/O from those that do not. All I/O operations necessary for the evaluation of a path are isolated in a single operator, which may employ efficient I/O scheduling strategies such as sequential scans or asynchronous I/O. Performance results for queries from the XMark benchmark show that reordering the navigation operations can increase performance up to a factor of four.

database and expert systems applications | 2003

Lock-based protocols for cooperation on XML documents

Sven Helmer; Carl-Christian Kanne; Guido Moerkotte

The Extensible Markup Language (XML) is well accepted in several different Web application areas. As soon as many users and applications work concurrently on the same collection of XML documents - e.g. on an XML database via a Web interface - isolating accesses and modifications of different transactions becomes an important issue. We discuss four different core protocols for synchronizing access to and modifications of XML document collections. These core protocols synchronize structure traversals and modifications. They are meant to be integrated into a native XML base management system (XBMS) and are based on two-phase locking. We also demonstrate the different degrees of cooperation that are possible with these protocols.

international conference on management of data | 2010

Histograms reloaded: the merits of bucket diversity

Carl-Christian Kanne; Guido Moerkotte

Virtually all histograms store for each bucket the number of distinct values it contains and their average frequency. In this paper, we question this paradigm. We start out by investigating the estimation precision of three commercial database systems which also follow the above paradigm. It turns out that huge errors are quite common. We then introduce new bucket types and investigate their accuracy when building optimal histograms with them. The results are ambiguous. There is no clear winner among the bucket types. At this point, we (1) switch to heterogeneous histograms, where different buckets of the same histogram possibly are of different types, and (2) design more bucket types. The nice consequence of introducing heterogeneous histograms is that we can guarantee decent upper error bounds while at the same time heterogeneous histograms require far less space than homogeneous histograms.
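
The classic bucket layout described in the abstract above, and why it can produce large errors, can be sketched as follows. The data and the names `Bucket`, `build_bucket` and `estimate_equal` are made up for illustration; the new bucket types and heterogeneous histograms from the paper are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Bucket:
    lo: int          # inclusive lower bound of the value range
    hi: int          # inclusive upper bound of the value range
    distinct: int    # number of distinct values in the bucket
    avg_freq: float  # average frequency of those values


def build_bucket(freqs: Dict[int, int], lo: int, hi: int) -> Bucket:
    """Summarize all values in [lo, hi] by distinct count and average frequency."""
    inside = [f for v, f in freqs.items() if lo <= v <= hi]
    distinct = len(inside)
    avg = sum(inside) / distinct if distinct else 0.0
    return Bucket(lo, hi, distinct, avg)


def estimate_equal(bucket: Bucket, value: int) -> float:
    # Every value in the bucket is assumed to occur avg_freq times.
    return bucket.avg_freq if bucket.lo <= value <= bucket.hi else 0.0


# Skewed, made-up frequencies: value 1 dominates the bucket.
freqs = {1: 100, 2: 1, 3: 1, 4: 2}
bucket = build_bucket(freqs, 1, 4)

estimate = estimate_equal(bucket, 2)  # 26.0, while the true frequency is 1
error_factor = estimate / freqs[2]    # estimate is off by a factor of 26
```

In this example the bucket stores 4 distinct values with an average frequency of 26, so the frequent value 1 is underestimated (26 instead of 100) and the rare values are overestimated by up to a factor of 26.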

international xml database symposium | 2006

Index vs. navigation in XPath evaluation

Norman May; Matthias Brantner; Alexander Böhm; Carl-Christian Kanne; Guido Moerkotte

A well-known rule of thumb claims that it is better to scan than to use an index when more than 10% of the data are accessed. This rule was formulated for relational databases. But is it still valid for XML queries? In this paper we develop similar rules of thumb for XML queries by experimentally comparing different execution strategies, e.g. using navigation or indices. These rules can be used immediately for heuristic optimization of XML queries, and in the long run, they may serve as a foundation for cost-based query optimization in XQuery.
