Henning Köhler
University of Queensland
Publications
Featured research published by Henning Köhler.
international conference on management of data | 2011
Henning Köhler; Jing Yang; Xiaofang Zhou
The skyline of a set of multi-dimensional points (tuples) consists of those points for which no clearly better point exists in the given set, using component-wise comparison on domains of interest. Skyline queries, i.e., queries that involve computation of a skyline, can be computationally expensive, so it is natural to consider parallelized approaches which make good use of multiple processors. We approach this problem by using hyperplane projections to obtain useful partitions of the data set for parallel processing. These partitions not only ensure small local skyline sets, but enable efficient merging of results as well. Our experiments show that our method consistently outperforms similar approaches for parallel skyline computation, regardless of data distribution, and provide insights into the impact of different optimization strategies.
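The component-wise dominance relation underlying skylines can be sketched in a few lines of Python. This is a naive quadratic baseline, not the parallel hyperplane-projection method of the paper, and the hotel data are purely illustrative:

```python
def dominates(p, q):
    """p dominates q if p is at least as good in every dimension and
    strictly better in at least one (assuming smaller values are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive O(n^2) skyline: keep the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hotels as (price, distance) pairs; lower is better in both dimensions.
hotels = [(50, 8), (70, 3), (60, 5), (90, 2), (80, 6)]
print(skyline(hotels))  # [(50, 8), (70, 3), (60, 5), (90, 2)]
```

Only (80, 6) is dropped, since (70, 3) is cheaper and closer; a parallel approach partitions the input so that such comparisons stay local and the partial skylines merge cheaply.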
very large data bases | 2015
Henning Köhler; Sebastian Link; Xiaofang Zhou
Driven by the dominance of the relational model, the requirements of modern applications, and the veracity of data, we revisit the fundamental notion of a key in relational databases with NULLs. In SQL database systems primary key columns are NOT NULL by default. NULL columns may occur in unique constraints which only guarantee uniqueness for tuples which do not feature null markers in any of the columns involved, and therefore serve a different function than primary keys. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that can originate from an SQL table, respectively. Possible keys coincide with the unique constraint of SQL, and thus provide a semantics for their syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns, and thus form a sufficient and necessary condition to identify tuples uniquely, while primary keys are only sufficient for that purpose. In addition to basic characterization, axiomatization, and simple discovery approaches for possible and certain keys, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs do occur in real-world databases, and that related computational problems can be solved efficiently. Certain keys are therefore semantically well-founded and able to maintain data quality in the form of Codd's entity integrity rule while handling the requirements of modern applications, that is, higher volumes of incomplete data from different formats.
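The possible-worlds reading of the two key notions can be sketched by checking tuple pairs directly: a pair can only violate a possible key when both tuples are total and equal on the key, whereas a certain key is already violated if some null could be filled to make the pair collide. The column names and data below are illustrative, not from the paper:

```python
from itertools import combinations

NULL = None  # null marker

def agree_everywhere(t1, t2, key):
    """Violation of a possible key: both tuples non-null and equal on
    every key column (this matches the SQL UNIQUE constraint)."""
    return all(t1[a] is not NULL and t1[a] == t2[a] for a in key)

def can_collide(t1, t2, key):
    """Violation of a certain key: some possible world makes the tuples
    equal on the key, i.e. each column is equal or fillable via a null."""
    return all(t1[a] is NULL or t2[a] is NULL or t1[a] == t2[a] for a in key)

def is_possible_key(table, key):
    return not any(agree_everywhere(t1, t2, key) for t1, t2 in combinations(table, 2))

def is_certain_key(table, key):
    return not any(can_collide(t1, t2, key) for t1, t2 in combinations(table, 2))

table = [{'id': 1,    'email': 'c@x.org'},
         {'id': 2,    'email': 'a@x.org'},
         {'id': NULL, 'email': 'b@x.org'}]
print(is_possible_key(table, ['id']))         # True: no two non-null ids coincide
print(is_certain_key(table, ['id']))          # False: the null id could be filled with 1 or 2
print(is_certain_key(table, ['id', 'email'])) # True: the distinct emails separate every pair
```

Note that every certain key is a possible key, but not vice versa, mirroring the relationship between primary keys and unique constraints described above.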
semantics in data and knowledge bases | 2008
Sven Hartmann; Henning Köhler; Sebastian Link; Thu Trinh; Jing Wang
Ongoing efforts in academia and industry to advance the management of XML data have created an increasing interest in research on integrity constraints for XML. In particular keys have recently gained much attention. Keys help to discover and capture relevant semantics of XML data, and are crucial for developing better methods and tools for storing, querying and manipulating XML data. Various notions of keys have been proposed and investigated over the past few years. Due to the different ways of picking and comparing data items involved, these proposals give rise to constraint classes that differ in their expressive power and tractability of the associated decision problems. This paper provides an overview of XML key proposals that enjoy popularity in the research literature.
very large data bases | 2016
Henning Köhler; Uwe Leck; Sebastian Link; Xiaofang Zhou
Driven by the dominance of the relational model and the requirements of modern applications, we revisit the fundamental notion of a key in relational databases with NULL. In SQL, primary key columns are NOT NULL, and UNIQUE constraints guarantee uniqueness only for tuples without NULL. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that originate from an SQL table, respectively. Possible keys coincide with UNIQUE, thus providing a semantics for their syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns and can uniquely identify entities whenever feasible, while primary keys may not. In addition to basic characterization, axiomatization, discovery, and extremal combinatorics problems, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs occur in real-world data, and related computational problems can be solved efficiently. Certain keys are therefore semantically well founded and able to meet Codd’s entity integrity rule while handling high volumes of incomplete data from different formats.
international conference on management of data | 2016
Henning Köhler; Sebastian Link
Normalization helps us find a database schema at design time that can process the most frequent updates efficiently at run time. Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are not present. On one hand, these features occur frequently in real-world data compliant with the industry standard SQL, and especially in modern application domains. On the other hand, the features impose challenges that have made it impossible so far to extend the existing forty-year-old normalization framework to SQL. We introduce a new class of functional dependencies and show that they provide the right notion for SQL schema design. Axiomatic and linear-time algorithmic characterizations of the associated implication problem are established. These foundations enable us to propose a Boyce-Codd normal form for SQL. Indeed, we justify the normal form by showing that it permits precisely those SQL instances which are free from data redundancy. Unlike the relational case, there are SQL schemata that cannot be converted into Boyce-Codd normal form. Nevertheless, for an expressive sub-class of our functional dependencies we establish a normalization algorithm that always produces a schema in Value-Redundancy free normal form. This normal form permits precisely those instances which are free from any redundant data value occurrences other than the null marker. Experiments show that our functional dependencies occur frequently in real-world data and that they are effective in eliminating redundant values from these data sets without loss of information.
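The classical machinery this paper extends can be sketched as follows: attribute-set closure decides FD implication, and a schema is in Boyce-Codd normal form exactly when every nontrivial FD has a superkey on its left-hand side. This is the textbook relational version, not the paper's new FD class for SQL, and the movie-style schema is illustrative:

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) tuple pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(schema, fds):
    """Classical BCNF: every nontrivial FD X -> Y must have X as a superkey."""
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) == set(schema)
               for lhs, rhs in fds)

schema = {'Title', 'Director', 'Cinema', 'Snack'}
fds = [(('Title',), ('Director',)), (('Cinema',), ('Snack',))]
print(closure(('Title',), fds))  # {'Title', 'Director'}
print(is_bcnf(schema, fds))      # False: neither Title nor Cinema is a superkey
```

In the idealized relational case such a schema is decomposed losslessly; the point of the paper is that duplicates and null markers break this picture, motivating the Value-Redundancy free normal form.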
Annals of Mathematics and Artificial Intelligence | 2015
Sven Hartmann; Henning Köhler; Uwe Leck; Sebastian Link; Bernhard Thalheim; Jing Wang
Integrity constraints capture relevant requirements of an application that should be satisfied by every state of the database. The theory of integrity constraints is largely a theory over relations. To make data processing more efficient, SQL permits database states to be partial bags that can accommodate incomplete and duplicate information. Integrity constraints, however, interact differently on partial bags than on the idealized special case of relations. In this paper, we study the implication problem of the combined class of general cardinality constraints and not-null constraints on partial bags. We investigate structural properties of Armstrong tables for general cardinality constraints and not-null constraints, and prove exact conditions for their existence. For the fragment of general max-cardinality constraints, unary min-cardinality constraints and not-null constraints we show that the effort for constructing Armstrong tables is precisely exponential. For the same fragment we provide an axiomatic characterization of the implication problem. The major tool for establishing our results is the Hajnal–Szemerédi theorem on equitable colorings of graphs.
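A max-cardinality constraint card(X) ≤ b bounds how many tuples of a bag may agree on the columns X. A minimal check can be sketched as below; the convention of exempting tuples with a null in a constrained column is an assumption for illustration, and the data are invented:

```python
from collections import Counter

NULL = None

def satisfies_max_card(table, cols, b):
    """Check card(cols) <= b: at most b tuples agree on the listed columns.
    Tuples with a null in any constrained column are not counted here
    (a simplifying convention for partial data, assumed for this sketch)."""
    counts = Counter(tuple(t[c] for c in cols) for t in table
                     if all(t[c] is not NULL for c in cols))
    return all(n <= b for n in counts.values())

table = [{'dept': 'CS',   'role': 'tutor'},
         {'dept': 'CS',   'role': 'tutor'},   # duplicate row: bags allow this
         {'dept': 'CS',   'role': NULL},
         {'dept': 'Math', 'role': 'tutor'}]
print(satisfies_max_card(table, ['dept', 'role'], 2))  # True
print(satisfies_max_card(table, ['dept', 'role'], 1))  # False: ('CS', 'tutor') occurs twice
```

With b = 1 the constraint degenerates to a key, which is why keys and cardinality constraints share so much of their implication theory.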
conference on information and knowledge management | 2015
Henning Köhler; Sebastian Link
Inclusion dependencies form one of the most fundamental classes of integrity constraints. Their importance in classical data management is reinforced by modern applications such as data cleaning and profiling, entity resolution and schema matching. Surprisingly, the implication problem of inclusion dependencies has not been investigated in the context of SQL, the de-facto industry standard. Codd's relational model of data represents the idealized special case of SQL in which all attributes are declared NOT NULL. Driven by the SQL standard recommendation, we investigate inclusion dependencies and NOT NULL constraints under simple and partial semantics. Partial semantics is not natively supported by any SQL implementation but we show how classical results on the implication problem carry over into this context. Interestingly, simple semantics is natively supported by every SQL implementation, but we show that the implication problem is not finitely axiomatizable in this context. Resolving this conundrum we establish an optimal solution by identifying the desirable class of not-null inclusion dependencies (NNINDs) that subsumes simple and partial semantics as special cases, and whose associated implication problem has the same computational properties as inclusion dependencies in the relational model. That is, NNIND implication is 2-ary axiomatizable and PSPACE-complete to decide. Our proof techniques also bring forward a chase procedure for deciding NNIND implication, the NP-hard subclass of typed acyclic NNINDs, and the tractable subclasses of NNINDs whose arity is bounded.
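The simple semantics mentioned above can be sketched as follows: a tuple of R with a null in a projected column satisfies the inclusion dependency R[X] ⊆ S[Y] vacuously, and only total projections must reappear in S. This is a simplified reading for illustration, with invented table and column names:

```python
NULL = None

def satisfies_ind_simple(r, r_cols, s, s_cols):
    """R[r_cols] is included in S[s_cols] under simple semantics: a tuple of
    R with a null in any r_cols column counts as satisfied vacuously;
    otherwise its projection must occur among the total projections of S."""
    targets = {tuple(t[c] for c in s_cols) for t in s
               if all(t[c] is not NULL for c in s_cols)}
    for t in r:
        proj = tuple(t[c] for c in r_cols)
        if any(v is NULL for v in proj):
            continue  # null on the left-hand side: vacuously satisfied
        if proj not in targets:
            return False
    return True

orders = [{'cust': 'alice'}, {'cust': NULL}, {'cust': 'carol'}]
customers = [{'name': 'alice'}, {'name': 'bob'}]
print(satisfies_ind_simple(orders, ['cust'], customers, ['name']))  # False: 'carol' has no match
```

The null-valued order passes without a matching customer, which is precisely the permissiveness that makes simple semantics natively enforceable by SQL foreign keys yet hard to axiomatize.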
workshop on logic language information and computation | 2011
Flavio Ferrarotti; Sven Hartmann; Henning Köhler; Sebastian Link; Millist W. Vincent
In the relational model of data the Boyce-Codd-Heath normal form, commonly just known as Boyce-Codd normal form, guarantees the elimination of data redundancy in terms of functional dependencies. For efficient data processing, the industry standard SQL permits partial data and duplicate rows of data to occur in database systems. Consequently, the combined class of uniqueness constraints and functional dependencies is more expressive than the class of functional dependencies itself. Hence, the Boyce-Codd-Heath normal form is not suitable for SQL databases. We characterize the associated implication problem of the combined class in the presence of NOT NULL constraints axiomatically, algorithmically and logically. Based on these results we are able to establish a suitable normal form for SQL.
foundations of information and knowledge systems | 2012
Sven Hartmann; Henning Köhler; Sebastian Link; Bernhard Thalheim
Data dependencies capture meaningful information about an application domain within the target database. The theory of data dependencies is largely a theory over relations. To make data processing more efficient in practice, partial bags are permitted as database instances to accommodate partial and duplicate information. However, data dependencies interact differently over partial bags than over the idealized special case of relations. In this paper, we study the implication problem of the combined class of functional dependencies and cardinality constraints over partial bags. We establish an axiomatic and an algorithmic characterization of the implication problem. These findings have important applications in database design and data processing. Finally, we investigate structural and computational properties of Armstrong databases for the class of data dependencies under consideration. These results can be utilized to consolidate and communicate the understanding of the application domain between different stakeholders of a database.
international conference on conceptual modeling | 2016
Pieta Brown; Jeeva Ganesan; Henning Köhler; Sebastian Link
Probabilistic databases are well suited to the requirements of modern applications that produce large volumes of uncertain data from a variety of sources. We propose an expressive class of probabilistic keys which empowers users to specify lower and upper bounds on the marginal probabilities by which keys should hold in a data set of acceptable quality. Indeed, the bounds help organizations balance the consistency and completeness targets for their data quality. For this purpose, algorithms are established for an agile schema- and data-driven acquisition of the right lower and upper bounds in a given application domain, and for reasoning about these keys. The efficiency of our acquisition framework is demonstrated theoretically and experimentally.
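The marginal probability of a key can be sketched over an explicit list of possible worlds, each carrying a probability; a probabilistic key then asks that this marginal fall within given lower and upper bounds. Representing the probabilistic database by enumerated worlds is a simplification, and the RFID-style data are invented for illustration:

```python
def key_holds(table, key):
    """A key holds in a world iff all tuples are distinct on the key columns."""
    projs = [tuple(t[c] for c in key) for t in table]
    return len(projs) == len(set(projs))

def marginal_key_probability(worlds, key):
    """worlds: list of (probability, table) pairs whose probabilities sum to 1.
    Returns the marginal probability that the key holds."""
    return sum(p for p, table in worlds if key_holds(table, key))

def within_bounds(worlds, key, lower, upper):
    """A probabilistic key in the spirit of the paper: lower <= P(key) <= upper."""
    return lower <= marginal_key_probability(worlds, key) <= upper

w1 = [{'rfid': 'a', 'zone': 1}, {'rfid': 'b', 'zone': 1}]  # rfid is a key here
w2 = [{'rfid': 'a', 'zone': 1}, {'rfid': 'a', 'zone': 2}]  # rfid repeats here
worlds = [(0.7, w1), (0.3, w2)]
print(marginal_key_probability(worlds, ['rfid']))  # 0.7
print(within_bounds(worlds, ['rfid'], 0.6, 1.0))   # True
```

A lower bound of 0.6 accepts this data set while a lower bound of 0.8 rejects it, which is how the bounds trade off consistency against completeness targets.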