Norbert Fuhr
University of Duisburg-Essen
Publications
Featured research published by Norbert Fuhr.
ACM Transactions on Information Systems | 1997
Norbert Fuhr; Thomas Rölleke
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. In PRA, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modeled. We introduce the concept of vague predicates which yield probabilistic weights instead of Boolean values, thus allowing for queries with vague selection conditions. With these features, PRA implements uncertainty and vagueness in combination with the relational model.
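The extensional evaluation mentioned above can be illustrated with a small sketch. All names and the join convention here are illustrative; the sketch assumes independent tuple events, which is exactly the situation where extensional semantics coincides with the intensional model described in the paper.

```python
# Relations as dicts mapping tuples to probabilistic weights
# (the probability that the tuple belongs to the relation).

def pra_select(rel, pred):
    # Selection keeps a tuple's weight unchanged when the predicate holds.
    return {t: p for t, p in rel.items() if pred(t)}

def pra_join(r, s):
    # Under independence, a joined tuple's weight is the product
    # of its constituents' weights. Join convention (illustrative):
    # last attribute of r matches first attribute of s.
    out = {}
    for t, p in r.items():
        for u, q in s.items():
            if t[-1] == u[0]:
                out[t + u[1:]] = p * q
    return out

def pra_project(rel, idx):
    # Projection may merge tuples; under independence the merged
    # weight is the probabilistic OR: 1 - prod(1 - p_i).
    out = {}
    for t, p in rel.items():
        key = tuple(t[i] for i in idx)
        out[key] = 1 - (1 - out.get(key, 0.0)) * (1 - p)
    return out

# Probabilistic document indexing joined with a weighted query term:
docs = {("d1", "ir"): 0.8, ("d2", "ir"): 0.5, ("d2", "db"): 0.6}
terms = {("ir",): 0.9}
ranked = pra_join(docs, terms)
```

Note that the independence assumption is what makes the product and probabilistic-OR rules valid; the paper's intensional semantics tracks event expressions so that the result weights remain correct even when it does not hold.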
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001
Norbert Fuhr; Kai Großjohann
Based on the document-centric view of XML, we present the query language XIRQL. Current proposals for XML query languages lack most IR-related features: weighting and ranking, relevance-oriented search, datatypes with vague predicates, and semantic relativism. XIRQL integrates these features by combining ideas from logic-based probabilistic IR models with concepts from the database area. For processing XIRQL queries, a path algebra is presented that also serves as a starting point for query optimization.
The Computer Journal | 1992
Norbert Fuhr
In this paper, an introduction to and survey of probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability-ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR along with the corresponding event space clarifies the interpretation of the probabilistic parameters involved. For the estimation of these parameters, three different learning strategies are distinguished, namely query-related, document-related and description-related learning. As a representative of each of these strategies, a specific model is described. A newer approach regards IR as uncertain inference; here, imaging is used as a new technique for estimating the probabilistic parameters, and probabilistic inference networks support more complex forms of inference. Finally, the more general problems of parameter estimation, query expansion and the development of models for advanced document representations are discussed.
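As a concrete instance of the probability-ranking principle, here is a minimal sketch of ranking under the binary independence model, one of the classical probabilistic models such a survey covers. The estimates p and q are assumed inputs, for example obtained through query-related learning from relevance feedback.

```python
import math

def bir_score(doc_terms, p, q):
    # Retrieval status value: sum of log-odds term weights over the
    # query terms present in the document. p[t] = P(term t occurs |
    # relevant), q[t] = P(term t occurs | nonrelevant).
    score = 0.0
    for t in doc_terms:
        if t in p:
            score += math.log((p[t] * (1 - q[t])) / (q[t] * (1 - p[t])))
    return score

# Illustrative parameter estimates (assumed, e.g. from feedback data):
p = {"prob": 0.8, "retrieval": 0.6}
q = {"prob": 0.3, "retrieval": 0.4}

docs = {"d1": {"prob", "retrieval"}, "d2": {"retrieval"}}
# The probability-ranking principle: present documents in descending
# order of their estimated probability (here, log-odds) of relevance.
ranking = sorted(docs, key=lambda d: bir_score(docs[d], p, q), reverse=True)
```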
ACM Transactions on Information Systems | 1999
Norbert Fuhr
In networked IR, a client submits a query to a broker, which is in contact with a large number of databases. In order to yield a maximum number of documents at minimum cost, the broker has to make estimates about the retrieval cost of each database, and then decide for each database whether or not to use it for the current query, and, if so, how many documents to retrieve from it. For this purpose, we develop a general decision-theoretic model and discuss different cost structures. Besides cost for retrieving relevant versus nonrelevant documents, we consider the following parameters for each database: expected retrieval quality, expected number of relevant documents in the database and cost factors for query processing and document delivery. For computing the overall optimum, a divide-and-conquer algorithm is given. If there are several brokers knowing different databases, a preselection of brokers can only be performed heuristically, but the computation of the optimum can be done similarly to the single-broker case. In addition, we derive a formula which estimates the number of relevant documents in a database based on dictionary information.
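The cost-based selection idea can be sketched as follows. This is a hedged illustration, not the paper's divide-and-conquer algorithm: it allocates documents greedily across databases, and the geometric precision decay is an assumed stand-in for the per-database expected-quality estimates.

```python
def allocate(databases, total_docs, decay=0.9):
    """databases: name -> (initial_precision, cost_per_doc).
    Greedily fetch each next document from the database whose next
    document offers the most expected relevant documents per unit
    cost, assuming precision decays geometrically with rank
    (an illustrative quality model, not the paper's)."""
    alloc = {name: 0 for name in databases}
    for _ in range(total_docs):
        best, best_ratio = None, -1.0
        for name, (prec, cost) in databases.items():
            ratio = prec * decay ** alloc[name] / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        alloc[best] += 1
    return alloc

# Database A: higher precision but expensive delivery; B: cheaper.
plan = allocate({"A": (0.8, 1.0), "B": (0.5, 0.5)}, total_docs=4)
```

The greedy allocation shows how cost factors can outweigh raw retrieval quality: the cheaper database receives most of the budget until its declining marginal quality makes the expensive one competitive.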
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1991
Norbert Fuhr; Chris Buckley
We describe a method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries. Our approach is based on three new concepts: (1) Abstraction from specific terms and documents, which overcomes the restriction of limited relevance information for parameter estimation. (2) Flexibility of the representation, which allows the integration of new text analysis and knowledge-based methods in our approach as well as the consideration of document structures or different types of terms. (3) Probabilistic learning or classification methods for the estimation of the indexing weights, making better use of the available relevance information. Our approach can be applied under restrictions that hold for real applications. We give experimental results for five test collections which show improvements over other methods.
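The abstraction idea, pooling relevance feedback across specific terms and documents and estimating weights for abstract feature descriptions instead, can be sketched in miniature. The single term-frequency bucket used as the feature here is purely illustrative; the actual approach uses richer relevance descriptions.

```python
from collections import defaultdict

def learn_weights(feedback):
    """feedback: iterable of (feature_bucket, relevant) observations
    pooled across many query/document pairs. Returns a map from
    feature bucket to an estimated indexing weight, i.e. the relative
    frequency of relevance observed for that abstract description."""
    rel = defaultdict(int)
    tot = defaultdict(int)
    for bucket, is_rel in feedback:
        tot[bucket] += 1
        rel[bucket] += int(is_rel)
    return {b: rel[b] / tot[b] for b in tot}

# Toy pooled feedback: (term-frequency bucket, was the doc relevant?)
weights = learn_weights([(1, False), (1, False),
                         (2, True), (2, False),
                         (3, True)])
```

Because the estimates attach to feature buckets rather than to individual term/document pairs, even sparse feedback yields usable weights for unseen pairs, which is the point of the abstraction step.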
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003
James Allan; Jay Aslam; Nicholas J. Belkin; Chris Buckley; James P. Callan; W. Bruce Croft; Susan T. Dumais; Norbert Fuhr; Donna Harman; David J. Harper; Djoerd Hiemstra; Thomas Hofmann; Eduard H. Hovy; Wessel Kraaij; John D. Lafferty; Victor Lavrenko; David Lewis; Liz Liddy; R. Manmatha; Andrew McCallum; Jay M. Ponte; John M. Prager; Dragomir R. Radev; Philip Resnik; Stephen E. Robertson; Ron G. Rosenfeld; Salim Roukos; Mark Sanderson; Richard M. Schwartz; Amit Singhal
Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took place at a recent workshop. The attendees of the workshop considered information retrieval research in a range of areas chosen to give broad coverage of topic areas that engage information retrieval researchers. Those areas are retrieval models, cross-lingual retrieval, Web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information extraction, as well as testbed requirements for future work. The potential use of language modeling techniques in these areas was also discussed. The workshop identified major challenges within each of those areas. The following are recurring themes that ran throughout:
• User and context sensitive retrieval
• Multi-lingual and multi-media issues
• Better target tasks
• Improved objective evaluations
• Substantially more labeled data
• Greater variety of data sources
• Improved formal models
Contextual retrieval and global information access were identified as particularly important long-term challenges.
Information Processing and Management | 1989
Norbert Fuhr
Abstract In this article three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing (BII) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the probability of relevance of this document with respect to queries using this descriptor. Second is the retrieval-with-probabilistic-indexing (RPI) model, which is suited to different kinds of probabilistic indexing. For this, we assume that each indexing scheme has its own concept of “correctness” to which the probabilities relate. In addition to the probabilistic indexing weights, the RPI model provides the possibility of relevance weighting of search terms. A third, similar model was proposed by Croft some years ago as an extension of the binary independence retrieval model, but it can be shown that this model is not based on the probabilistic ranking principle. The probabilistic indexing weights required for any of these models can be provided by an application of the Darmstadt indexing approach (DIA) for indexing with descriptors from a controlled vocabulary. The experimental results show significant improvements over retrieval with binary indexing. Finally, suggestions are made regarding how the DIA can be applied to probabilistic indexing with free text terms.
International Journal on Digital Libraries | 2007
Norbert Fuhr; Giannis Tsakonas; Trond Aalberg; Maristella Agosti; Preben Hansen; Sarantos Kapidakis; Claus-Peter Klas; László Kovács; Monica Landoni; András Micsik; Christos Papatheodorou; Carol Peters; Ingeborg Sølvberg
Digital libraries (DLs) are new and innovative information systems, under constant development and change, and therefore evaluation is of critical importance to ensure not only their correct evolution but also their acceptance by the user and application communities. The Evaluation activity of the DELOS Network of Excellence has performed a large-scale survey of current DL evaluation activities. This study has resulted in a description of the state of the art in the field, which is presented in this paper. The paper also proposes a new framework for the evaluation of DLs, as well as for recording, describing and analyzing the related research field. The framework includes a methodology for the classification of current evaluation procedures. The objective is to provide a set of flexible and adaptable guidelines for DL evaluation.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1995
Norbert Fuhr
Probabilistic Datalog — a Logic for Powerful Retrieval Methods
Journal of the Association for Information Science and Technology | 2000
Norbert Fuhr
In the logical approach to information retrieval (IR), retrieval is considered as uncertain inference. Whereas classical IR models are based on propositional logic, we combine Datalog (function-free Horn clause predicate logic) with probability theory. Therefore, probabilistic weights may be attached to both facts and rules. The underlying semantics extends the well-founded semantics of modularly stratified Datalog to a possible worlds semantics. By using default independence assumptions with explicit specification of disjoint events, the inference process always yields point probabilities. We describe an evaluation method and present an implementation. This approach allows for easy formulation of specific retrieval models for arbitrary applications, and classical probabilistic IR models can be implemented by specifying the appropriate rules. In comparison to other approaches, the possibility of recursive rules allows for more powerful inferences, and predicate logic gives the expressiveness required for multimedia retrieval. Furthermore, probabilistic Datalog can be used as a query language for integrated information retrieval and database systems.
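A toy evaluation in the spirit of probabilistic Datalog can illustrate the inference step. This is a sketch only: it handles one-level rules with the default independence assumption, combining alternative derivations by noisy-OR, and does not implement the paper's well-founded, possible-worlds semantics or disjoint-event specifications.

```python
# Probabilistic facts: ground atoms with attached weights.
facts = {("indexed", "d1", "ir"): 0.8,
         ("indexed", "d1", "db"): 0.4,
         ("about", "ir", "search"): 0.9,
         ("about", "db", "search"): 0.5}

def prob_relevant(doc, topic):
    # Rule (illustrative): relevant(D,T) :- indexed(D,X) & about(X,T).
    # Each derivation's probability is the product of its facts'
    # weights (conjunction under independence); alternative
    # derivations are combined by noisy-OR.
    p_no_derivation = 1.0
    for (pred, d, x), p in facts.items():
        if pred == "indexed" and d == doc:
            q = facts.get(("about", x, topic), 0.0)
            p_no_derivation *= 1 - p * q
    return 1 - p_no_derivation

# Two derivations for relevant(d1, search): via "ir" and via "db".
score = prob_relevant("d1", "search")
```

With the default independence assumption the two derivations yield 1 − (1 − 0.8·0.9)(1 − 0.4·0.5); the paper's possible-worlds semantics is what guarantees such point probabilities remain consistent once rules are recursive or events overlap.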