Norbert Gövert
University of Duisburg-Essen
Publication
Featured research published by Norbert Gövert.
Conference on Information and Knowledge Management | 1999
Norbert Gövert; Mounia Lalmas; Norbert Fuhr
The automatic categorisation of web documents is becoming crucial for organising the huge amount of information available on the Internet. We are facing a new challenge due to the fact that web documents have a rich structure and are highly heterogeneous. Two ways to respond to this challenge are (1) using a representation of the content of web documents that captures these two characteristics and (2) using more effective classifiers. Our categorisation approach is based on a probabilistic description-oriented representation of web documents, and a probabilistic interpretation of the k-nearest neighbour classifier. With the former, we provide an enhanced document representation that incorporates the structural and heterogeneous nature of web documents. With the latter, we provide a theoretically sound justification for the various parameters of the k-nearest neighbour classifier. Experimental results show that (1) using an enhanced representation of web documents is crucial for an effective categorisation of web documents, and (2) a theoretical interpretation of the k-nearest neighbour classifier yields an improvement over the standard k-nearest neighbour classifier.
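As a rough illustration of the classifier family this abstract refers to, the following is a minimal sketch of a similarity-weighted k-nearest neighbour categoriser. It is a generic textbook variant, not the probabilistic interpretation developed in the paper: the cosine similarity, the normalisation of neighbour weights, the function names, and the toy term vectors are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_category_scores(query, training, k=3):
    """Score categories by the normalised similarity of the k nearest documents.

    `training` is a list of (term_vector, set_of_categories) pairs.
    Each neighbour contributes its similarity, divided by the summed
    similarity of all k neighbours, to every category it belongs to.
    """
    ranked = sorted(
        ((cosine(query, vec), cats) for vec, cats in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    if total == 0:
        return {}
    scores = defaultdict(float)
    for sim, cats in ranked:
        for cat in cats:
            scores[cat] += sim / total
    return dict(scores)


# Hypothetical toy data: three labelled documents and one query document.
training = [
    ({"xml": 1.0, "retrieval": 1.0}, {"ir"}),
    ({"xml": 1.0}, {"ir", "markup"}),
    ({"football": 1.0}, {"sport"}),
]
scores = knn_category_scores({"xml": 1.0}, training, k=2)
```

With this normalisation the score of a category lies between 0 and 1 and can be thresholded per category, which is one simple way to handle documents that belong to several categories.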
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1998
Norbert Fuhr; Norbert Gövert; Thomas Rölleke
We describe the design and implementation of a system for logic-based multimedia retrieval. As high-level logic for retrieval of hypermedia documents, we have developed a probabilistic object-oriented logic (POOL) which supports aggregated objects, different kinds of propositions (terms, classifications and attributes) and even rules as being contained in objects. Based on a probabilistic four-valued logic, POOL uses an implicit open world assumption, allows for closed world assumptions and is able to deal with inconsistent knowledge. POOL programs and queries are translated into probabilistic Datalog programs which can be interpreted by the HySpirit inference engine. For storing the multimedia data, we have developed a new basic IR engine which yields physical data abstraction. The overall architecture and the flexibility of each layer support logic-based methods for multimedia information retrieval.
Lecture Notes in Computer Science | 2003
Gabriella Kazai; Norbert Gövert; Mounia Lalmas; Norbert Fuhr
The widespread use of the eXtensible Markup Language (XML) on the Web and in Digital Libraries brought about an explosion in the development of XML tools, including systems to store and access XML content. As the number of these systems increases, so does the need to assess their benefit to users. The benefit to a given user depends largely on which aspects of the user’s interaction with the system are being considered. These aspects, among others, include response time, required user effort, usability, and the system’s ability to present the user with the desired information. Users then base their decision whether they are more satisfied with one system or another on a prioritised combination of these factors.
Journal of the Association for Information Science and Technology | 2004
Gabriella Kazai; Mounia Lalmas; Norbert Fuhr; Norbert Gövert
The INitiative for the Evaluation of XML retrieval (INEX) aims at providing an infrastructure to evaluate the effectiveness of content-oriented XML retrieval systems. To this end, in the first round of INEX in 2002, a test collection of real-world XML documents, along with a set of topics and respective relevance assessments, was created with the collaboration of 36 participating organizations. In this article, we provide an overview of the first round of the INEX initiative.
Information Retrieval | 2006
Norbert Gövert; Norbert Fuhr; Mounia Lalmas; Gabriella Kazai
Content-oriented XML retrieval approaches aim at a more focused retrieval strategy: instead of retrieving whole documents, they should retrieve document components that are exhaustive to the information need while at the same time being as specific as possible. In this article, we show that the evaluation methods developed for standard retrieval must be modified in order to deal with the structure of XML documents. More precisely, the size and overlap of document components must be taken into account. For this purpose, we propose a new effectiveness metric based on a concept space defined upon the notions of exhaustiveness and specificity of a search result. We compare the results of this new metric with the results obtained with the official metric used in INEX, the evaluation initiative for content-oriented XML retrieval.
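To make the notion of overlap between retrieved components concrete, here is a small sketch that flags results in a ranked list whose element is an ancestor or descendant of an element ranked earlier. It does not reproduce the metric proposed in the article; the path notation, the function names, and the assumption that all paths come from the same document are hypothetical choices for illustration only.

```python
def overlaps(path_a, path_b):
    """True if one element path is an ancestor of (or equal to) the other.

    Paths are slash-separated element steps such as '/article[1]/sec[2]'.
    Two components overlap when the shorter path is a step-wise prefix
    of the longer one.
    """
    steps_a = path_a.strip("/").split("/")
    steps_b = path_b.strip("/").split("/")
    shorter = min(len(steps_a), len(steps_b))
    return steps_a[:shorter] == steps_b[:shorter]


def mark_overlap(ranked_paths):
    """For each ranked component, report whether it overlaps an earlier one."""
    seen, flags = [], []
    for path in ranked_paths:
        flags.append(any(overlaps(path, earlier) for earlier in seen))
        seen.append(path)
    return flags


# Hypothetical ranked result list from a single document.
ranked = [
    "/article[1]/sec[2]",
    "/article[1]/sec[2]/p[1]",
    "/article[1]/sec[3]",
]
flags = mark_overlap(ranked)
```

An evaluation measure that ignores such flags would reward a system twice for returning both a section and a paragraph inside it, which is the kind of effect the abstract says must be accounted for.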
Lecture Notes in Computer Science | 2003
Mohammad Abolhassani; Norbert Fuhr; Norbert Gövert
As XML is going to become the standard document format, there is still the legacy problem of large amounts of text (written in the past as well as today) that are not available in this format. In order to exploit the benefits of XML, these legacy texts must be converted into XML. In this chapter, we discuss the issues of automatic XML markup of documents. We give a survey of existing approaches, and we describe a specific system in some detail.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002
Norbert Fuhr; Norbert Gövert; Kai Großjohann
XML is the emerging standard for representing knowledge in almost arbitrary applications. At least almost every kind of knowledge can be represented in XML. For exploring such knowledge, one needs a search engine which is able to let users benefit from all of the concepts with which XML blesses the world. HyREX is the Hypermedia Retrieval Engine for XML. The HyREX project is an ongoing effort (funded as part of other projects such as CARMEN, CYCLADES, and CLASSIX) for developing an information retrieval engine for XML documents. HyREX’s main characteristics can be derived from the constituents of its name:
European Conference on Research and Advanced Technology for Digital Libraries | 2000
Norbert Gövert; Norbert Fuhr; Claus-Peter Klas
The Internet makes searching for literature in Digital Libraries (DLs) feasible. However, often a user has to contact several DLs to satisfy a given information need. This leads to usability problems due to the heterogeneity of the DLs. One aspect is that the information structures of the systems differ. In fact, relevant information may be spread across several DLs. The other aspect of heterogeneity is differing browsing and searching functionality, of course presented to the user through different user interfaces and query languages.
Conference on Information and Knowledge Management | 2002
Norbert Fuhr; Norbert Gövert
Query languages for retrieval of XML documents allow for conditions referring both to the content and the structure of documents. In this paper, we investigate two different approaches for reducing index space of inverted files for XML documents. First, we consider methods for compressing index entries. Second, we develop the new XS tree data structure which contains the structural description of a document in a rather compact form, such that these descriptions can be kept in main memory. Experimental results on two large XML document collections show that very high compression rates for indexes can be achieved, but any compression increases retrieval time. On the other hand, highly compressed indexes may be feasible for applications where storage is limited, such as in PDAs or E-book devices.
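The abstract does not say which compression methods were studied. As one common example of compressing inverted-file entries, the sketch below stores a sorted posting list as gaps between document identifiers and encodes each gap with a variable-byte code. The function names and the sample posting list are assumptions for illustration, and the XS tree structure from the paper is not modelled here.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers with 7 payload bits per byte.

    Low-order groups come first; the high bit marks the last byte of a number.
    """
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # final byte of the current number
            n |= (byte & 0x7F) << shift
            numbers.append(n)
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_postings(doc_ids):
    """Store a sorted list of document ids as variable-byte encoded gaps."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild the document ids by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


# Hypothetical posting list for one term.
postings = [3, 7, 150, 100000]
packed = compress_postings(postings)
```

Decoding has to walk the bytes and re-accumulate the gaps before any document identifier can be used, which illustrates the trade-off between index size and retrieval time that the abstract reports.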
Cross-Language Evaluation Forum | 2000
Norbert Gövert
HyREX is the Hypermedia Retrieval Engine for XML. Its extensibility is based on the implementation of physical data independence; its query interface on the conceptual level consists of data types with respective vague search predicates. This concept enabled us to add search predicates for the data type text to do bilingual text retrieval. Our implementation uses free Internet resources for translating topics in English to German and vice versa.