Henrik Nottelmann | Researchain

Publication

Featured research published by Henrik Nottelmann.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Evaluating different methods of estimating retrieval quality for resource selection

Henrik Nottelmann; Norbert Fuhr

In a federated digital library system, it is too expensive to query every accessible library. Resource selection is the task of deciding to which libraries a query should be routed. Most existing resource selection algorithms compute a library ranking heuristically. In contrast, the decision-theoretic framework (DTF) follows a different approach with a sounder theoretical foundation: it computes a selection that minimises the overall costs (e.g. retrieval quality, time, money) of the distributed retrieval. For estimating retrieval quality, the recall-precision function is proposed. In this paper, we introduce two new methods. The first computes the empirical distribution of the probabilities of relevance from a small library sample and assumes it to be representative of the whole library. The second assumes that the indexing weights follow a normal distribution, which leads to a normal distribution for the document scores. Furthermore, we present the first evaluation of DTF, comparing this theoretical approach with the heuristic state-of-the-art system CORI; here we find that DTF outperforms CORI in most cases.
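
The cost-minimising selection can be illustrated with a small sketch in the spirit of DTF. It assumes (our simplification, not necessarily the paper's exact formulation) that each library's relevance probabilities are estimated with the first method above, by expanding a small sample's probabilities to the whole library, and that retrieving a relevant document costs c_rel and a non-relevant one c_nonrel, with c_rel < c_nonrel; time and money costs are ignored. All function and parameter names are hypothetical.

```python
import heapq
from typing import Dict, List


def expand_sample(sample_probs: List[float], library_size: int) -> List[float]:
    """Treat a sample's relevance probabilities as representative of the whole library.

    Sampled probabilities are sorted in descending order and each one is replicated
    so that the copies add up to exactly library_size documents.
    """
    ordered = sorted(sample_probs, reverse=True)
    k = len(ordered)
    expanded: List[float] = []
    for j, p in enumerate(ordered):
        copies = library_size * (j + 1) // k - library_size * j // k
        expanded.extend([p] * copies)
    return expanded


def select(libraries: Dict[str, List[float]], n: int,
           c_rel: float, c_nonrel: float) -> Dict[str, int]:
    """Decide how many documents to request from each library for n results in total.

    The expected cost of the next document from a library is
    c_rel * p + c_nonrel * (1 - p). Because each list is sorted by descending p and
    c_rel < c_nonrel, marginal costs never decrease within a library, so repeatedly
    taking the cheapest next document minimises the total expected cost.
    """
    def marginal(p: float) -> float:
        return c_rel * p + c_nonrel * (1.0 - p)

    heap = [(marginal(probs[0]), name, 0) for name, probs in libraries.items() if probs]
    heapq.heapify(heap)
    allocation = {name: 0 for name in libraries}
    for _ in range(n):
        if not heap:
            break  # fewer than n documents are available overall
        _, name, idx = heapq.heappop(heap)
        allocation[name] += 1
        probs = libraries[name]
        if idx + 1 < len(probs):
            heapq.heappush(heap, (marginal(probs[idx + 1]), name, idx + 1))
    return allocation


libraries = {
    "A": expand_sample([0.1, 0.9], library_size=4),
    "B": expand_sample([0.5], library_size=2),
}
print(libraries["A"])
print(select(libraries, n=3, c_rel=0.0, c_nonrel=1.0))
```

If fixed per-library costs (e.g. a charge for contacting a library) were added, the greedy argument would no longer guarantee an optimum, and a search over library subsets would be needed.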

Information Processing and Management | 2007

Information retrieval and machine learning for probabilistic schema matching

Henrik Nottelmann; Umberto Straccia

Schema matching is the problem of finding correspondences (mapping rules, e.g. logical formulae) between heterogeneous schemas, e.g. in the data exchange domain or for distributed IR in federated digital libraries. This paper introduces a probabilistic framework, called sPLMap, for automatically learning schema mapping rules from given instances of both schemas. Different techniques, mostly from the IR and machine learning fields, are combined to find suitable mapping candidates. Our approach gives a probabilistic interpretation of the candidates' prediction weights, selects the rule set with the highest matching probability, and outputs probabilistic rules that are capable of dealing with the intrinsic uncertainty of the mapping process. Different variants of our approach have been evaluated on several test sets.

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 2006

Adding Probabilities and Rules to OWL Lite Subsets Based on Probabilistic Datalog

Henrik Nottelmann; Norbert Fuhr

This paper proposes two probabilistic extensions of variants of the OWL Lite description language, which are essential for advanced applications like information retrieval. The first step follows the axiomatic approach of combining description logics and Horn clauses: subsets of OWL Lite are mapped in a sound and complete way onto Horn predicate logics (Datalog variants). Compared to earlier approaches, a larger fraction of OWL Lite can be transformed by switching to Datalog with equality in the head; however, some OWL Lite constructs cannot be transformed completely into Datalog. Using probabilistic Datalog, the new probabilistic OWL Lite subsets (both with support for Horn rules) are defined, and their semantics are given by the semantics of the corresponding probabilistic Datalog program. As inference engines for probabilistic Datalog are available, description logics and information retrieval systems can easily be combined.
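
To make the axiomatic mapping concrete, the sketch below translates a handful of OWL Lite axiom types (subclass, subproperty, domain, range, inverse and transitive properties) into Datalog rules, optionally prefixing a probability in the style of probabilistic Datalog rules such as "0.7 animal(X) :- cat(X).". The tuple encoding of axioms, the rule syntax, and the function name are illustrative assumptions; a particular pDatalog engine may expect different concrete syntax, and the paper's mapping covers further constructs (including equality in the head) that are not shown.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding, e.g. ("subClassOf", "cat", "animal"); names are assumed to be
# lowercase identifiers, since uppercase names denote variables in Datalog.
Axiom = Tuple[str, ...]


def to_datalog(axiom: Axiom, prob: Optional[float] = None) -> List[str]:
    kind, *args = axiom
    if kind == "subClassOf":            # C subClassOf D  ->  d(X) :- c(X).
        c, d = args
        rules = [f"{d}(X) :- {c}(X)."]
    elif kind == "subPropertyOf":       # P subPropertyOf Q  ->  q(X,Y) :- p(X,Y).
        p, q = args
        rules = [f"{q}(X,Y) :- {p}(X,Y)."]
    elif kind == "domain":              # domain of P is C  ->  c(X) :- p(X,Y).
        p, c = args
        rules = [f"{c}(X) :- {p}(X,Y)."]
    elif kind == "range":               # range of P is C  ->  c(Y) :- p(X,Y).
        p, c = args
        rules = [f"{c}(Y) :- {p}(X,Y)."]
    elif kind == "inverseOf":           # P inverseOf Q  ->  one rule per direction
        p, q = args
        rules = [f"{q}(Y,X) :- {p}(X,Y).", f"{p}(X,Y) :- {q}(Y,X)."]
    elif kind == "TransitiveProperty":  # P transitive  ->  p(X,Z) :- p(X,Y), p(Y,Z).
        (p,) = args
        rules = [f"{p}(X,Z) :- {p}(X,Y), {p}(Y,Z)."]
    else:
        raise ValueError(f"unsupported axiom type: {kind}")
    if prob is not None:
        # Attach the probability to every generated rule.
        rules = [f"{prob:g} {rule}" for rule in rules]
    return rules


print(to_datalog(("subClassOf", "cat", "animal"), prob=0.7))
print(to_datalog(("inverseOf", "partOf", "hasPart")))
```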

European Conference on Information Retrieval | 2005

sPLMap: a probabilistic approach to schema matching

Henrik Nottelmann; Umberto Straccia

This paper introduces the first formal framework for learning mappings between heterogeneous schemas that is based on logics and probability theory. This task, also called “schema matching”, is a crucial step in integrating heterogeneous collections. As schemas may have different granularities, and as schema attributes do not always match precisely, a general-purpose schema mapping approach requires support for uncertain mappings, and mappings have to be learned automatically. The framework combines different classifiers for finding suitable mapping candidates (together with their weights), and selects the most likely set of mapping rules. Finally, different variants of the framework have been evaluated on two data sets.
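
The two steps named above, combining classifiers and selecting the most likely rule set, can be sketched as follows. The sketch makes two assumptions that are ours, not necessarily sPLMap's: classifier outputs are combined by a weighted average, and candidate rules are treated as independent. Under independence, the probability of a rule set S is the product of p over the rules in S and (1 - p) over the rules outside it, which is maximised by including exactly the rules with p > 0.5. The attribute names and classifier outputs are hypothetical.

```python
from typing import Dict, List, Tuple

# (target attribute, source attribute), standing for a candidate rule target(X) :- source(X)
Pair = Tuple[str, str]


def combine(predictions: List[Dict[Pair, float]], weights: List[float]) -> Dict[Pair, float]:
    """Weighted average of the probabilities that several classifiers assign to each candidate.

    A candidate missing from one classifier's output counts as probability 0 for that classifier.
    """
    total = sum(weights)
    combined: Dict[Pair, float] = {}
    for pred, w in zip(predictions, weights):
        for pair, p in pred.items():
            combined[pair] = combined.get(pair, 0.0) + w * p / total
    return combined


def most_likely_rule_set(probs: Dict[Pair, float]) -> List[Pair]:
    """Most likely rule set when rules are independent: keep every rule with p > 0.5."""
    return sorted(pair for pair, p in probs.items() if p > 0.5)


name_matcher = {("title", "name"): 0.9, ("title", "description"): 0.4}
value_matcher = {("title", "name"): 0.7, ("title", "description"): 0.2}
probs = combine([name_matcher, value_matcher], weights=[1.0, 1.0])
print(most_likely_rule_set(probs))
```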

Information Retrieval | 2003

From Retrieval Status Values to Probabilities of Relevance for Advanced IR Applications

Henrik Nottelmann; Norbert Fuhr

Information Retrieval systems typically sort the results by document retrieval status values (RSVs). According to the Probability Ranking Principle, this ranking ensures optimum retrieval quality if the RSVs increase monotonically with the probabilities of relevance (as e.g. for probabilistic IR models). However, advanced applications like filtering or distributed retrieval require estimates of the actual probability of relevance. The relationship between the RSV of a document and its probability of relevance can be described by a “normalisation” function (“mapping function”) which maps the retrieval status value onto the probability of relevance. In this paper, we explore the use of linear and logistic mapping functions for different retrieval methods. In a series of upper-bound experiments, we compare the approximation quality of the different mapping functions. We also investigate the effect on the resulting retrieval quality in distributed retrieval (only merging, without resource selection). These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. Retrieval quality for distributed retrieval is only slightly improved by using the logistic function.
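
The two kinds of mapping function can be compared on toy data. The sketch fits a linear function by least squares and a logistic function P(R | x) = 1 / (1 + exp(-(b0 + b1 x))) by maximum likelihood using Newton's method, where x is the RSV. The RSVs and relevance judgements are made up for illustration, and these fitting procedures are standard choices, not necessarily those used in the paper.

```python
import math
from typing import List, Tuple


def fit_linear(x: List[float], y: List[int]) -> Tuple[float, float]:
    """Least-squares line y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x: List[float], y: List[int], iterations: int = 25) -> Tuple[float, float]:
    """Maximum-likelihood logistic regression with two parameters, via Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negated Hessian (X^T W X with W = p * (1 - p)).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] * delta = [g0, g1].
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


rsv = [i / 10 for i in range(1, 11)]   # toy RSVs 0.1 ... 1.0
rel = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]   # toy relevance judgements
a, b = fit_linear(rsv, rel)
b0, b1 = fit_logistic(rsv, rel)
for x in (0.1, 0.5, 1.0):
    print(f"RSV {x:.1f}: linear {a + b * x:.3f}, logistic {sigmoid(b0 + b1 * x):.3f}")
```

On this toy data the fitted line gives a negative value (-0.2) at the lowest RSV, whereas the logistic function always stays within (0, 1), which makes it a natural shape for a probability.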

European Conference on Information Retrieval | 2003

From uncertain inference to probability of relevance for advanced IR applications

Henrik Nottelmann; Norbert Fuhr

Uncertain inference is a probabilistic generalisation of the logical view on databases, ranking documents according to the probability that they logically imply the query. For tasks other than ad-hoc retrieval, estimates of the actual probability of relevance are required. In this paper, we investigate mapping functions between these two types of probability. For this purpose, we consider linear and logistic functions. The former have been proposed before, whereas we give a new theoretical justification for the latter. In a series of upper-bound experiments, we compare the goodness of fit of the two models. A second series of experiments investigates the effect on the resulting retrieval quality in the fusion step of distributed retrieval. These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. However, retrieval quality for distributed retrieval (only merging, without resource selection) is only slightly improved by using the logistic function.
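
In the fusion (merging) step, each library's scores can be mapped onto probabilities of relevance with that library's own mapping function, after which the merged list is sorted by these probabilities. The sketch assumes logistic mapping functions whose parameters (b0, b1) have already been estimated per library; the library names, document IDs, scores, and parameter values are hypothetical. The two libraries deliberately use different score scales, so sorting by raw score would rank e1 first.

```python
import math
from typing import Dict, List, Tuple


def logistic(b0: float, b1: float, score: float) -> float:
    """Logistic mapping from a library-specific score to a probability of relevance."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))


def merge(results: Dict[str, List[Tuple[str, float]]],
          params: Dict[str, Tuple[float, float]], k: int) -> List[Tuple[str, str, float]]:
    """Merge per-library (doc_id, score) lists into one top-k ranking by estimated probability."""
    merged = []
    for library, docs in results.items():
        b0, b1 = params[library]
        for doc_id, score in docs:
            merged.append((library, doc_id, logistic(b0, b1, score)))
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged[:k]


results = {
    "lib1": [("d1", 0.9), ("d2", 0.4)],   # scores roughly in [0, 1]
    "lib2": [("e1", 12.0), ("e2", 3.0)],  # scores on a much larger scale
}
params = {"lib1": (-4.0, 6.0), "lib2": (-3.0, 0.2)}
for library, doc, p in merge(results, params, k=3):
    print(library, doc, round(p, 3))
```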

European Conference on Information Retrieval | 2006

Comparing different architectures for query routing in peer-to-peer networks

Henrik Nottelmann; Norbert Fuhr

Efficient and effective routing of content-based queries is an emerging problem in peer-to-peer networks, and can be seen as an extension of the traditional “resource selection” problem. Although some approaches have been proposed, finding the best architecture (defined by the network topology, the underlying selection method, and its integration into peer-to-peer networks) is still an open problem. This paper investigates different building blocks of such architectures, among them the decision-theoretic framework, CORI, hierarchical networks, distributed hash tables and HyperCubes. The evaluation on a large test-bed shows that the decision-theoretic framework can be applied effectively and cost-efficiently to peer-to-peer networks.

Archive | 2006

A Probabilistic, Logic-Based Framework for Automated Web Directory Alignment

Henrik Nottelmann; Umberto Straccia

We introduce oPLMap, a formal framework for automatically learning mapping rules between heterogeneous Web directories, a crucial step towards integrating ontologies and their instances in the Semantic Web. This approach is based on Horn predicate logics and probability theory, which allows for dealing with uncertain mappings (for cases where there is no exact correspondence between classes), and can be extended towards complex ontology models. Different components are combined for finding suitable mapping candidates (together with their weights), and the set of rules with maximum matching probability is selected. Different variants of our system oPLMap have been evaluated on a large test set.

European Conference on Information Retrieval | 2004

Combining CORI and the Decision-Theoretic Approach for Advanced Resource Selection

Henrik Nottelmann; Norbert Fuhr

In this paper, we combine two existing resource selection approaches, CORI and the decision-theoretic framework (DTF). The state-of-the-art system CORI belongs to the large group of heuristic resource ranking methods which select a fixed number of libraries with respect to their similarity to the query. In contrast, DTF computes an optimum resource selection with respect to overall costs (from different sources, e.g. retrieval quality, time, money). We improve CORI by integrating it with DTF: the number of relevant documents is approximated by applying a linear or a logistic function to the CORI library scores. Based on this value, one of the existing DTF variants (employing a recall-precision function) estimates the number of relevant documents in the result set. Our evaluation shows that this technique achieves higher precision in the top ranks than the existing resource selection methods for long queries, but lower precision for short queries; on average, the combined approach outperforms CORI and the other DTF variants.
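
The two-stage estimate described above can be sketched as follows. It assumes a linear mapping R = a + b * s from a library's CORI score s to its number of relevant documents R (with hypothetical, previously fitted a and b), and a linearly decreasing recall-precision function P(r) = p0 * (1 - r); this particular recall-precision function is our assumption for the DTF variant. If x relevant documents appear among the top n retrieved, precision is x / n and recall is x / R, so x / n = p0 * (1 - x / R), which gives x = p0 * n * R / (R + p0 * n). Function names are hypothetical.

```python
def relevant_from_cori(cori_score: float, a: float, b: float) -> float:
    """Hypothetical linear mapping from a CORI library score to its number of relevant documents."""
    return max(0.0, a + b * cori_score)


def expected_relevant_in_top(n: int, relevant: float, p0: float) -> float:
    """Expected relevant documents among the top n, assuming precision P(r) = p0 * (1 - r).

    Solves x / n = p0 * (1 - x / relevant) for x.
    """
    if relevant <= 0 or n <= 0:
        return 0.0
    return p0 * n * relevant / (relevant + p0 * n)


R = relevant_from_cori(0.5, a=0.0, b=20.0)
x = expected_relevant_in_top(10, R, p0=0.5)
print(R, x)
```

Under this assumption the estimate grows with n but never exceeds R, approaching it as n becomes large.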

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

MIND: resource selection and data fusion in multimedia distributed digital libraries

James P. Callan; Fabio Crestani; Henrik Nottelmann; Pietro Pala; Xiao Mang Shou

Jamie Callan, Fabio Crestani, Henrik Nottelmann, Pietro Pala, Xiao Mang Shou. School of Computer Studies, Carnegie Mellon University, Pittsburgh, PA, USA; Dept. Computer and Information Sciences, University of Strathclyde, Glasgow, UK; Institute of Informatics and Interactive Systems, University of Duisburg-Essen, Duisburg, Germany; Dip. Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy; Dept. of Information Studies, University of Sheffield, Sheffield, UK.
