Donald H. Kraft
Louisiana State University
Publication
Featured research published by Donald H. Kraft.
Information Processing and Management | 1979
W.G. Waller; Donald H. Kraft
The use of weights to denote a query representation and/or the indexing of a document is analysed as a generalization of a Boolean retrieval system. Criteria are given for the functions used to evaluate the relevance of the records to a specific query, including self-consistency. Various mechanisms suggested in the literature for evaluating the relevance of records with regard to a given query are tested and found to be less than satisfactory. A new approach is suggested to avoid some of the perils of a weighted Boolean retrieval system.
International Journal of Human-Computer Studies / International Journal of Man-Machine Studies | 1983
Donald H. Kraft; Duncan A. Buell
Substantial work has been done on the application of fuzzy subset theory to information retrieval. Boolean query processing has been generalized to allow for weights to be attached to individual terms, in either the document indexing or the query representation, or both. Problems with the generalized Boolean lattice structure have been noted, and an alternative approach using query thresholds and appropriate document evaluation functions has been suggested. Problems remain unsolved, however. Criteria generated for the query processing mechanism are inconsistent. The exact functional form and appropriate parameters for the query processing mechanism must be specified. Moreover, the generalized Boolean query model must be reconciled with the vector space approach, suggested new lattice structures for weighted retrieval, and probabilistic retrieval models. Finally, proper retrieval evaluation mechanisms reflecting the fuzzy nature of retrieval are needed.
Information Processing and Management | 1981
Duncan A. Buell; Donald H. Kraft
Several papers have appeared that have analyzed recent developments in the problem of processing, in a document retrieval system, queries expressed as Boolean expressions. The purpose of this paper is to continue that analysis. We shall show that the concept of threshold values resolves the problems inherent with relevance weights. Moreover, we shall explore possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations.
Journal of the Association for Information Science and Technology | 1981
Duncan A. Buell; Donald H. Kraft
There has been a good deal of work on information retrieval systems that have continuous weights assigned to the index terms that describe the records in the database, and/or to the query terms that describe the user queries. Recent articles have analyzed retrieval systems with continuous weights of either type and/or with a Boolean structure for the queries. They have also suggested criteria which such systems ought to satisfy and record evaluation mechanisms which partially satisfy these criteria. We offer a more careful analysis, based on a generalization of the discrete weights. We also look at the weights from an entirely different approach involving thresholds, and we generate an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms. This new mechanism allows the user to attach a “threshold” to the query term.
Information Processing and Management | 2001
Padmini Srinivasan; Miguel E. Ruiz; Donald H. Kraft; Jianhua Chen
Vocabulary mining in information retrieval refers to the utilization of the domain vocabulary towards improving the user’s query. Most often queries posed to information retrieval systems are not optimal for retrieval purposes. Vocabulary mining allows one to generalize, specialize or perform other kinds of vocabulary-based transformations on the query in order to improve retrieval performance. This paper investigates a new framework for vocabulary mining that derives from the combination of rough sets and fuzzy sets. The framework allows one to use rough set-based approximations even when the documents and queries are described using weighted, i.e., fuzzy representations. The paper also explores the application of generalized rough sets and the variable precision models. The problem of coordination between multiple vocabulary views is also examined. Finally, a preliminary analysis of issues that arise when applying the proposed vocabulary mining framework to the Unified Medical Language System (a state-of-the-art vocabulary system) is presented. The proposed framework supports the systematic study and application of different vocabulary views in information retrieval.
Java Card Workshop | 1999
Donald H. Kraft; Gloria Bordogna; Gabriella Pasi
In this chapter, an overview of the application of fuzzy set theory to soften Information Retrieval Systems (IRSs) is presented. It starts with a description of the main functionalities of an Information Retrieval System; then the main information retrieval models defined in the literature are reviewed and a classification of fuzzy information retrieval models is proposed. Further, the fuzzy indexing procedures based on the computation of the significance of the document’s descriptors are illustrated, and the introduction of soft requirements into queries, based on both numeric and linguistic weights and soft aggregation operators, is discussed. The chapter also presents the fuzzy associative retrieval models based on thesauri, pseudothesauri, document clustering, and relevance feedback techniques. Finally, in the last section, some evaluation issues of IRSs are introduced.
Information Sciences - Applications | 1994
Donald H. Kraft; Gloria Bordogna; Gabriella Pasi
The generalization of Boolean information retrieval systems is still of interest to scholars. In spite of the fact that commercial systems use Boolean retrieval mechanisms, such systems still have some limitations. One of the main problems is that such systems lack the ability to deal well with imprecision and subjectivity. Previous efforts have led to the introduction of numeric weights to improve both document representations (term weights) and query languages (query weights). However, the use of weights requires a clear knowledge of the semantics of the query in order to translate a fuzzy concept into a precise numeric value. Moreover, it is difficult to model the matching of queries to documents in a way that will preserve the semantics of user queries. A linguistic extension has been generated, starting from an existing Boolean weighted retrieval model and formalized within fuzzy set theory, in which numeric query weights are replaced by linguistic descriptors that specify the degree of importance of the terms. In the past, query weights were seen as measures of the importance of a specific term in representing the query or as a threshold to aid in matching a specific document to the query. The linguistic extension was originally modeled to view the query weights as a description of the ideal document, so that deviations would be rejected whether a given document had term weights that were too high or too low. This paper looks at an extension to the linguistic model that is not symmetric in that documents with a term weight below the query weight are treated differently than documents with a term weight above the query weight.
Fuzzy Sets and Systems | 2005
Patrick Bosc; Donald H. Kraft; Frederick E. Petry
Fuzzy set approaches have been applied in the database and information retrieval areas for nearly 30 years. Here we give consideration to aspects of these areas that seem to afford the greatest potential for further development. This includes, among others, database design, preferences for flexible queries, and fuzzy functional dependencies and redundancy. Applications to areas such as data mining and geographical information systems are described. Fuzzy information retrieval topics such as multi-media, digital libraries and web retrieval are also discussed.
Journal of the Association for Information Science and Technology | 1978
Donald H. Kraft; Abraham Bookstein
The Swets model of information retrieval, based on a decision theory approach, is discussed, with the overall performance measure being the crucial element reexamined in this paper. The Neyman-Pearson criterion from statistical decision theory, and based on likelihood ratios, is used to determine an optimal range of Z, the variable assigned to each document by the retrieval system in an attempt to discriminate between relevant and nonrelevant documents. This criterion is shown to be directly related to both precision and recall, and is equivalent to the maximization of the expected value of the retrieval decision for a specific query and a given document under certain conditions. Thus, a compromise can be reached between those who advocate precision as a measure, due partially to its ability to be easily measurable empirically, and those who advocate consideration of recall. Several cases of the normal and Poisson distributions for the variable Z are discussed in terms of their implications for the Neyman-Pearson decision rule. It is seen that when the variances are unequal, the Swets rule of retrieving a document if its Z value is large enough is not optimal. Finally, the situation of precision and recall not being inversely related is shown to be possible under certain conditions. Thus, this paper attempts to extend the understanding of the theoretical foundations of the decision theory approach to information retrieval.
Information Processing and Management | 1979
Donald H. Kraft; T. Lee
An information retrieval system is modeled from the point of view of a user linearly scanning the output list for relevant records of citations. Expected search length, a measure of retrieval system performance, is shown to be affected by the stopping rule employed by the user to determine when to terminate the search. Three stopping rules are considered: the satiation rule, the disgust rule, and the combination rule. The effects of these various stopping rules on expected search length are examined and discussed in detail.