Kai-Uwe Sattler | Researchain

Publication

Featured research published by Kai-Uwe Sattler.

international world wide web conferences | 2010

Data summaries for on-demand queries over linked data

Andreas Harth; Katja Hose; Marcel Karnstedt; Axel Polleres; Kai-Uwe Sattler; Jürgen Umbrich

Typical approaches for querying structured Web Data collect (crawl) and pre-process (index) large amounts of data in a central data repository before allowing for query answering. However, this time-consuming pre-processing phase leverages the benefits of Linked Data -- where structured data is accessible live and up-to-date at distributed Web resources that may change constantly -- only to a limited degree, as query results can never be current. An ideal query answering system for Linked Data should return current answers in a reasonable amount of time, even on corpora as large as the Web. Query processors evaluating queries directly on the live sources require knowledge of the contents of data sources. In this paper, we develop and evaluate an approximate index structure summarising graph-structured content of sources adhering to Linked Data principles, provide an algorithm for answering conjunctive queries over Linked Data on the Web exploiting the source summary, and evaluate the system using synthetically generated queries. The experimental results show that our lightweight index structure enables complete and up-to-date query results over Linked Data, while keeping the overhead for querying low and providing a satisfying source ranking at no additional cost.
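
The idea of selecting live sources through an approximate summary can be illustrated with a minimal Python sketch. It is not the index structure from the paper, which the abstract does not describe; plain hash buckets are swapped in here, and the names `SourceSummary`, `bucket`, and `NUM_BUCKETS` are hypothetical. The summary may report sources that turn out not to match (false positives), but it never misses a source that was indexed with a matching triple.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 64  # assumed summary size; fewer buckets means more false positives


def bucket(term: str) -> int:
    """Map an RDF term to a bucket using a deterministic hash."""
    return zlib.crc32(term.encode("utf-8")) % NUM_BUCKETS


class SourceSummary:
    """Approximate index: for each (position, bucket), the sources seen there."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, source, triple):
        """Register every term of a (subject, predicate, object) triple."""
        for pos, term in enumerate(triple):
            self.index[(pos, bucket(term))].add(source)

    def candidates(self, pattern):
        """Return sources that may match a triple pattern; None marks a variable."""
        result = None
        for pos, term in enumerate(pattern):
            if term is None:
                continue
            hits = self.index.get((pos, bucket(term)), set())
            result = set(hits) if result is None else result & hits
        if result is None:
            # Pattern with only variables: every known source is a candidate.
            result = set().union(*self.index.values())
        return result


summary = SourceSummary()
summary.add("http://a.example/doc", ("ex:alice", "foaf:knows", "ex:bob"))
summary.add("http://b.example/doc", ("ex:carol", "foaf:name", "Carol"))
to_fetch = summary.candidates((None, "foaf:knows", None))
```

A query processor would then dereference only the sources in `to_fetch` and evaluate the pattern on the data it retrieves, which is where false positives are filtered out.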

international conference on management of data | 2004

Report on the Dagstuhl Seminar

Michael Gertz; M. Tamer Özsu; Gunter Saake; Kai-Uwe Sattler

Over the past few years, techniques for managing, querying, and integrating data on the Web have significantly matured. Well-founded and practical approaches to assess or even guarantee a required degree of quality of the data in these frameworks, however, are still missing. This can be attributed to the lack of well-defined data quality metrics and assessment techniques, and the difficulty of handling information about data quality during data integration and query processing. Data quality problems arise in many settings, such as the integration of business data, in Web mining, data dissemination, and in querying the Web using search engines. Data quality (DQ) addresses various forms of data, including structured and semistructured data, text documents, multimedia, and streaming data. Different forms of metadata describing the quality of data are becoming increasingly important since they provide applications and users with information about the value and reliability of (integrated) data on the Web. The Dagstuhl Seminar “Data Quality on the Web”, organized by Michael Gertz, Tamer Özsu, Gunter Saake, and Kai-Uwe Sattler, took place between August 31st and September 5th 2003 at Schloss Dagstuhl, Germany. The objective of the seminar was to (1) foster collaboration among researchers that deal with DQ in different areas, (2) assess existing results in managing the quality of data, and (3) establish a framework for future research in the area of DQ. The application contexts considered during the seminar included in particular (Web-based) data integration and information retrieval scenarios, scientific databases, and application domains in the computational sciences and Bioinformatics. In all these areas, data quality plays a crucial role and therefore different, tailored solutions have been developed. Sharing and exchanging this knowledge could result in significant synergy effects.

international workshop on testing database systems | 2011

The mixed workload CH-benCHmark

Richard L. Cole; Florian Funke; Leo Giakoumakis; Wey Guy; Alfons Kemper; Stefan Krompass; Harumi A. Kuno; Raghunath Nambiar; Thomas Neumann; Meikel Poess; Kai-Uwe Sattler; Michael Seibold; Eric Simon; Florian Waas

While standardized and widely used benchmarks address either operational or real-time Business Intelligence (BI) workloads, the lack of a hybrid benchmark led us to the definition of a new, complex, mixed workload benchmark, called mixed workload CH-benCHmark. This benchmark bridges the gap between the established single-workload suites of TPC-C for OLTP and TPC-H for OLAP, and executes a complex mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite run in parallel on the same tables in a single database system. As it is derived from these two most widely used TPC benchmarks, the CH-benCHmark produces results highly relevant to both hybrid and classic single-workload systems.

conference on information and knowledge management | 2006

Processing relaxed skylines in PDMS using distributed data summaries

Katja Hose; Christian Lemke; Kai-Uwe Sattler

Peer Data Management Systems (PDMS) are a natural extension of heterogeneous database systems. One of the main tasks in such systems is efficient query processing. Insisting on complete answers, however, leads to asking almost every peer in the network. Relaxing these completeness requirements by applying approximate query answering techniques can significantly reduce costs. Since most users are not interested in the exact answers to their queries, rank-aware query operators like top-k or skyline play an important role in query processing. In this paper, we present the novel concept of relaxed skylines that combines the advantages of both rank-aware query operators and approximate query processing techniques. Furthermore, we propose a strategy for processing relaxed skylines in distributed environments that allows for giving guarantees for the completeness of the result using distributed data summaries as routing indexes.
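
For readers unfamiliar with the skyline operator, here is a minimal Python sketch of the classic (exact, centralized) skyline. The relaxed variant and the routing over distributed data summaries proposed in the paper are not shown, since the abstract does not define them. The function names and the hotel data are illustrative assumptions, and smaller values are assumed to be better in every dimension.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that no other point dominates (quadratic nested loop)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach); lower is better for both.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 4)]
best = skyline(hotels)
```

In this example `(70, 5)` and `(90, 4)` drop out because `(60, 3)` is both cheaper and closer, while the remaining three hotels each represent a different trade-off.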

conference on information and knowledge management | 2001

Advanced grouping and aggregation for data integration

Eike Schallehn; Kai-Uwe Sattler; Gunter Saake

New applications from the areas of analytical data processing and data integration require powerful features to condense and reconcile available data. As outlined in [1], the general concept of grouping and aggregation appears to be a fitting paradigm for a number of these issues. However, in its common form of equality-based groups, or with current extensions such as simple user-defined functions that derive group-by values on a per-tuple basis and restricted aggregate functions, a number of problems remain unsolved. We describe two extensions to the grouping mechanism: a generic one to support holistic user-defined grouping functions, and a higher-level construct that provides similarity-based grouping suitable for a number of applications such as duplicate detection and elimination.

conference on information and knowledge management | 2001

SQL database primitives for decision tree classifiers

Kai-Uwe Sattler; Oliver Dunemann

Scalable data mining in large databases is one of today's challenges to database technologies. Thus, substantial effort is dedicated to a tight coupling of database and data mining systems, leading to database primitives supporting data mining tasks. In order to support a wide range of tasks and to be of general use, these primitives should be building blocks rather than implementations of specific algorithms. In this paper, we describe primitives for building and applying decision tree classifiers. Based on the analysis of available algorithms and previous work in this area, we have identified operations which are useful for a number of classification algorithms. We discuss the implementation of these primitives on top of a commercial DBMS and present experimental results demonstrating the performance benefit.
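
One operation that many decision tree algorithms share is counting class labels per attribute value, which maps naturally onto a SQL `GROUP BY`. The Python sketch below pushes that counting into the database and computes information gain from the result. It is an illustration under assumptions, not the set of primitives defined in the paper: SQLite stands in for the commercial DBMS, and the function names, the `weather` table, and its columns are hypothetical.

```python
import math
import sqlite3
from collections import defaultdict


def class_counts(conn, table, attribute, label):
    """Count rows per (attribute value, class) with a single GROUP BY query."""
    # Identifiers are interpolated directly; only pass trusted table/column names.
    sql = (f"SELECT {attribute}, {label}, COUNT(*) "
           f"FROM {table} GROUP BY {attribute}, {label}")
    counts = defaultdict(dict)
    for value, cls, n in conn.execute(sql):
        counts[value][cls] = n
    return counts


def entropy(dist):
    """Shannon entropy (in bits) of a class -> count mapping."""
    total = sum(dist.values())
    return -sum(n / total * math.log2(n / total) for n in dist.values() if n)


def information_gain(counts):
    """Entropy reduction achieved by splitting on the counted attribute."""
    overall = defaultdict(int)
    for dist in counts.values():
        for cls, n in dist.items():
            overall[cls] += n
    total = sum(overall.values())
    remainder = sum(sum(d.values()) / total * entropy(d) for d in counts.values())
    return entropy(overall) - remainder


def demo_connection():
    """Build a tiny in-memory training table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (outlook TEXT, windy INTEGER, play TEXT)")
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?, ?)",
        [("sunny", 0, "no"), ("sunny", 1, "no"), ("rain", 0, "yes"), ("rain", 1, "yes")],
    )
    return conn


conn = demo_connection()
gains = {a: information_gain(class_counts(conn, "weather", a, "play"))
         for a in ("outlook", "windy")}
```

Only the aggregated counts leave the database, so the client-side work depends on the number of distinct attribute values rather than on the number of training rows.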

data and knowledge engineering | 2004

Efficient similarity-based operations for data integration

Eike Schallehn; Kai-Uwe Sattler; Gunter Saake

Dealing with discrepancies in data is still a big challenge in data integration systems. The problem occurs both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Though using SQL operations like grouping and join seems to be a viable way, they fail if the attribute values of the potential duplicates or related tuples are not equal but only similar by certain criteria. As a solution to this problem, we present in this paper similarity-based variants of grouping and join operators. The extended grouping operator produces groups of similar tuples; the extended join combines tuples satisfying a given similarity condition. We describe the semantics of these operators, discuss efficient implementations for the edit distance similarity, and present evaluation results. Finally, we give examples of application from the context of a data reconciliation project for looted art.
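
A similarity join under edit distance can be sketched in a few lines of Python. This is a naive nested-loop version meant only to show the semantics; the efficient implementations discussed in the paper are not reproduced here. The function names, the `artist` key, and the sample rows are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_join(left, right, key, max_distance):
    """Pair rows whose key values are within max_distance edits of each other."""
    return [(l, r) for l in left for r in right
            if edit_distance(l[key], r[key]) <= max_distance]


# Hypothetical records from two catalogues with inconsistent spelling.
catalogue_a = [{"artist": "Rembrandt", "title": "Self-Portrait"}]
catalogue_b = [{"artist": "Rembrant", "owner": "unknown"},
               {"artist": "Vermeer", "owner": "unknown"}]
matches = similarity_join(catalogue_a, catalogue_b, "artist", max_distance=1)
```

An ordinary equi-join on `artist` would return nothing here, whereas the similarity join pairs the two spellings of the same name.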

international conference on data engineering | 2007

UniStore: Querying a DHT-based Universal Storage

Marcel Karnstedt; Kai-Uwe Sattler; Martin Richtarsky; Jessica Muller; Manfred Hauswirth; Roman Schmidt; Renault John

The idea of collecting and combining large public data sets and services has become more and more popular. The special characteristics of such systems and the requirements of the participants demand strictly decentralized solutions. However, this brings several ambitious challenges that such a system has to overcome. In this demonstration paper, we present a lightweight distributed universal storage capable of dealing with those challenges, and providing a powerful and flexible way of building Internet-scale public data management systems. We introduce our approach based on a triple storage on top of a distributed hash table (DHT) overlay system, based on the ideas of a universal relation model and the resource description framework (RDF), and outline solved challenges as well as open issues.
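
To make the notion of triple storage on top of a DHT more concrete, the following Python sketch hashes each triple onto a fixed set of in-process "nodes". It is a toy model, not UniStore's design: indexing every triple under its subject, predicate, and object is an assumption, as are the class name `TripleDHT` and the modulo-based placement, which stands in for a real overlay's routing.

```python
import hashlib


class TripleDHT:
    """Toy DHT: each triple is stored three times, keyed by subject, predicate, and object."""

    def __init__(self, num_nodes):
        # Each node is modelled as a local dict from key to a set of triples.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        """Pick the responsible node from a SHA-1 hash of the key."""
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def put(self, triple):
        """Insert a (subject, predicate, object) triple under all three keys."""
        for role, term in zip("spo", triple):
            key = f"{role}:{term}"
            self._node_for(key).setdefault(key, set()).add(triple)

    def lookup(self, role, term):
        """Fetch all triples whose given role ('s', 'p', or 'o') equals term."""
        key = f"{role}:{term}"
        return self._node_for(key).get(key, set())


dht = TripleDHT(num_nodes=8)
dht.put(("paper1", "author", "Sattler"))
dht.put(("paper2", "author", "Karnstedt"))
authored = dht.lookup("p", "author")
```

Storing each triple under several keys trades storage for the ability to answer a lookup on any single position with one hash computation and one node contact.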

international conference on management of data | 2008

Distributed databases and peer-to-peer databases: past and present

Angela Bonifati; Panos K. Chrysanthis; Aris M. Ouksel; Kai-Uwe Sattler

The need for large-scale data sharing between autonomous and possibly heterogeneous decentralized systems on the Web gave rise to the concept of P2P database systems. Decentralized databases are, however, not new. Whereas a definition for a P2P database system can be readily provided, a comparison with the more established decentralized models, commonly referred to as distributed, federated and multi-databases, is more likely to provide a better insight into this new P2P data management technology. Thus, in this paper, by distinguishing between db-centric and P2P-centric features, we examine features common to these database systems as well as other ad-hoc features that solely characterize P2P databases. We also provide a non-exhaustive taxonomy of the most prominent research efforts toward the realization of full-fledged P2P databases.

international conference on data engineering | 2007

Autonomous Management of Soft Indexes

M. Lühring; Kai-Uwe Sattler; Karsten Schmidt; Eike Schallehn

In recent years the support for index tuning as part of physical database design has gained focus in research and product development, which resulted in index and design advisors. Nevertheless, these tools provide a one-off solution for a continuous task and are not deeply integrated with the DBMS functionality: they only apply the query optimizer for index recommendation and profit estimation, and they decouple the decision about and execution of index configuration changes from the core system functionality. In this paper we propose an approach that continuously collects statistics for recommended indexes and, based on this, repeatedly solves the Index Selection Problem (ISP). A key novelty is the on-the-fly index generation during query processing, implemented by the new query plan operators IndexBuildScan and SwitchPlan. Finally, we present the implementation and evaluation of the introduced concepts as part of the PostgreSQL system.
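
The Index Selection Problem can be read as a knapsack-style choice: pick the set of candidate indexes with the highest total benefit that fits a space budget. The abstract does not say how the paper solves it, so the Python sketch below uses a simple greedy heuristic as an assumption. The function name, the candidate tuples, and the benefit and size figures are hypothetical; in a real system they would come from the continuously collected statistics.

```python
def select_indexes(candidates, budget):
    """Greedy heuristic: take indexes by benefit per unit of size while they fit.

    candidates: iterable of (name, estimated_benefit, size) with size > 0.
    budget: total space available for indexes, in the same unit as size.
    """
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    for name, benefit, size in ranked:
        if benefit > 0 and used + size <= budget:
            chosen.append(name)
            used += size
    return chosen


# Hypothetical statistics gathered for recommended indexes.
stats = [("idx_orders_date", 100, 50),
         ("idx_customer_name", 60, 20),
         ("idx_item_price", 30, 40)]
configuration = select_indexes(stats, budget=70)
```

Because the statistics change as the workload shifts, such a selection would be re-run periodically, which matches the abstract's point that index tuning is a continuous task rather than a one-off decision. A greedy choice is not guaranteed to be optimal; it is used here only because it is short and easy to follow.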