Abraham Bookstein | Researchain

Publication

Featured researches published by Abraham Bookstein.

Journal of the Association for Information Science and Technology | 1980

Fuzzy Requests: An Approach to Weighted Boolean Searches.

Abraham Bookstein

This article concerns the problem of how to permit a patron to represent the relative importance of various index terms in a Boolean request while retaining the desirable properties of a Boolean system. The character of classical Boolean systems is reviewed and related to the notion of fuzzy sets. The fuzzy set concept then forms the basis of the concept of a fuzzy request in which weights are assigned to index terms. The properties of such a system are discussed, and it is shown that such systems retain the manipulability of traditional Boolean requests.
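The link between Boolean requests and fuzzy sets can be illustrated with a minimal sketch. It assumes the standard fuzzy set operators (minimum for AND, maximum for OR, complement for NOT), which the abstract does not spell out, and it does not reproduce the article's own weighting scheme. The function names and sample grades are hypothetical.

```python
# Standard fuzzy set operators (an assumption; the abstract does not name them).
def fuzzy_and(*grades: float) -> float:
    return min(grades)

def fuzzy_or(*grades: float) -> float:
    return max(grades)

def fuzzy_not(grade: float) -> float:
    return 1.0 - grade

# Hypothetical membership grades of one document in three index-term sets.
a, b, c = 0.9, 0.4, 0.7

# Two forms of the same Boolean request: a AND (b OR c)
left = fuzzy_and(a, fuzzy_or(b, c))
right = fuzzy_or(fuzzy_and(a, b), fuzzy_and(a, c))

# Both forms give the same grade, so the request can still be
# rearranged by the distributive law, as in a classical Boolean system.
print(left, right)
```

With grades restricted to 0 and 1, these operators reduce to ordinary Boolean AND, OR, and NOT.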

Journal of the Association for Information Science and Technology | 1983

Information Retrieval: A Sequential Learning Process.

Abraham Bookstein

The fundamental problem of information retrieval is how to decide, on the basis of clues, each of which is an imperfect indicator of document relevance, which documents to retrieve and the order in which to present them. The most satisfying conceptual approaches have been based on probabilistic decision theoretic models. However, those previously used make a decision about a single document at a time, and extend this to retrieve multiple documents by ignoring interdocument interaction. The purpose of this article is to present decision‐theoretic models which intrinsically include the multiple retrieval case. In particular, we argue that information retrieval should be envisioned as a process, in which the information retrieval system responds to a request by presenting documents to the patron in a sequence, gathering feedback as the process proceeds, and using this information to modify future retrieval. A retrieval strategy that naturally results from this model is described. Two examples are examined in detail.

Journal of the Association for Information Science and Technology | 1975

A Decision Theoretic Foundation for Indexing.

Abraham Bookstein; Don R. Swanson

The indexing of a document is among the most crucial steps in preparing that document for retrieval. The adequacy of the indexing determines the ability of the system to respond to patron requests. This paper discusses this process, and document retrieval in general, on the basis of formal decision theory. The basic theoretical approach taken is illustrated by means of a model of word occurrences in documents in the context of a model information system; both models are fully defined in this paper. Though the main purpose of this paper is to provide insights into a very complex process, formulae are developed that may prove to be of value for an automated operating system. The paper concludes with an interpretation of recall and precision curves as seen from the point of view of decision theory.

Journal of the Association for Information Science and Technology | 1990

Informetric Distributions, Part I: Unified Overview.

Abraham Bookstein

This article is the first of a two‐part series on the informetric distributions, a family of regularities found to describe a wide range of phenomena both within and outside of the information sciences. This article introduces the basic forms these regularities take. A model is proposed that makes plausible the possibility that, in spite of marked differences in their appearance, these distributions are variants of a single distribution; heuristic arguments are then given that this is indeed the case. That a single distribution should describe such a wide range of phenomena, often in areas where the existence of any simple description is surprising, suggests that one should look for explanations not in terms of causal models, but in terms of the properties of the single informetric distribution. Some of the consequences of this conclusion are broached in this article, and explored more carefully in Part II.

The Library Quarterly | 1976

A General Mathematical Model for Information Retrieval Systems

Abraham Bookstein; William S. Cooper

This paper presents a mathematical model of an information retrieval system thought to be general enough to serve as an abstract representation of most document and reference retrieval systems. The model is made up of four components that, in one form or another, appear in every functioning system. It is proved that the basic organization of documents that the system provides for a user on receipt of a request follows from the properties and interrelations of the four components. Each component is then discussed in turn and it is seen that much of the existing theory regarding information systems can be viewed as an elaboration of this model.

Journal of the Association for Information Science and Technology | 1998

Clumping properties of content-bearing words

Abraham Bookstein; Shmuel T. Klein; Timo Raita

Information retrieval systems identify content-bearing words, and possibly also assign weights, as part of the process of formulating requests. For optimal retrieval efficiency, it is desirable that this be done automatically. This article defines the notion of serial clustering of words in text, and explores the value of such clustering as an indicator of a word's bearing content. This approach is flexible in the sense that it is sensitive to context: a term may be assessed as content-bearing within one collection, but not another. Our approach, being numerical, may also be of value in assigning weights to terms in requests. Experimental support is obtained from natural text databases in three different languages.

Journal of the Association for Information Science and Technology | 1978

Evaluation of information retrieval systems: A decision theory approach

Donald H. Kraft; Abraham Bookstein

The Swets model of information retrieval, based on a decision theory approach, is discussed, with the overall performance measure being the crucial element reexamined in this paper. The Neyman-Pearson criterion from statistical decision theory, and based on likelihood ratios, is used to determine an optimal range of Z, the variable assigned to each document by the retrieval system in an attempt to discriminate between relevant and nonrelevant documents. This criterion is shown to be directly related to both precision and recall, and is equivalent to the maximization of the expected value of the retrieval decision for a specific query and a given document under certain conditions. Thus, a compromise can be reached between those who advocate precision as a measure, due partially to its ability to be easily measurable empirically, and those who advocate consideration of recall. Several cases of the normal and Poisson distributions for the variable Z are discussed in terms of their implications for the Neyman-Pearson decision rule. It is seen that when the variances are unequal, the Swets rule of retrieving a document if its Z value is large enough is not optimal. Finally, the situation of precision and recall not being inversely related is shown to be possible under certain conditions. Thus, this paper attempts to extend the understanding of the theoretical foundations of the decision theory approach to information retrieval.
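The remark about unequal variances can be checked numerically. The sketch below is not taken from the paper: it assumes Z is normally distributed for both relevant and nonrelevant documents, with hypothetical means and standard deviations, and compares the log likelihood ratio at several Z values.

```python
import math

def log_likelihood_ratio(z, mu_rel, sd_rel, mu_non, sd_non):
    """Log of p(z | relevant) / p(z | nonrelevant) for two normal densities."""
    def log_pdf(x, mu, sd):
        return -math.log(sd * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sd ** 2)
    return log_pdf(z, mu_rel, sd_rel) - log_pdf(z, mu_non, sd_non)

z_values = (0, 1, 5, 10)

# Equal variances: the ratio grows with z, so a simple cutoff on z
# selects the same documents as a cutoff on the likelihood ratio.
equal = [log_likelihood_ratio(z, 1.0, 1.0, 0.0, 1.0) for z in z_values]

# Unequal variances (hypothetical values): the log ratio is quadratic in z
# and eventually falls, so very large z values are no longer the best evidence.
unequal = [log_likelihood_ratio(z, 1.0, 1.0, 0.0, 2.0) for z in z_values]

print(equal)
print(unequal)
```

In the second case, a likelihood-ratio rule retrieves documents whose Z falls within an interval rather than above a single threshold.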

Journal of the Association for Information Science and Technology | 1990

Informetric distributions, part II: Resilience to ambiguity

Abraham Bookstein

This article continues the discussion of the informetric distributions begun in a companion paper. In the earlier paper, the informetric distributions were introduced and found to be variants of a single distribution. It was suggested that this might be explained in terms of that distribution being unusually resilient to ambiguity. In this paper the notion of resilience to ambiguity is made precise. By way of introduction, a number of simple examples of resilience, taken from the social sciences, are discussed. This approach is then applied to the informetric distributions themselves. It is argued that the form taken by the informetric regularities does indeed make them insensitive to the wide range of ambiguities that occur when measuring the output of social activity, and that this ubiquitous form is unusual in having this property.

Information Retrieval | 2002

Generalized Hamming Distance

Abraham Bookstein; Vladimir A. Kulyukin; Timo Raita

Many problems in information retrieval and related fields depend on a reliable measure of the distance or similarity between objects that, most frequently, are represented as vectors. This paper considers vectors of bits. Such data structures implement entities as diverse as bitmaps that indicate the occurrences of terms and bitstrings indicating the presence of edges in images. For such applications, a popular distance measure is the Hamming distance. The value of the Hamming distance for information retrieval applications is limited by the fact that it counts only exact matches, whereas in information retrieval, corresponding bits that are close by can still be considered to be almost identical. We define a “Generalized Hamming distance” that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure. In this paper we define and prove some basic properties of the “Generalized Hamming distance”, and illustrate its use in the area of object recognition. We evaluate our implementation in a series of experiments, using autonomous robots to test the measure's effectiveness in relating similar bitstrings.
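The limitation described above can be seen with the classical Hamming distance alone. The sketch below implements only that classical measure, not the paper's generalized one; the bitstrings are hypothetical examples.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length bitstrings differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have the same length")
    return sum(x != y for x, y in zip(a, b))

reference = "00100000"
near_miss = "00010000"  # the single 1 bit is shifted by one position
far_miss = "00000001"   # the single 1 bit is shifted by five positions

# Both comparisons give 2: the classical measure cannot tell
# a near miss from a distant one.
print(hamming_distance(reference, near_miss))  # 2
print(hamming_distance(reference, far_miss))   # 2
```

A measure that gives partial credit for near misses would rank near_miss as closer to reference than far_miss.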

Journal of the Association for Information Science and Technology | 1978

On the perils of merging boolean and weighted retrieval systems

Abraham Bookstein

Attempts have been made to rank the output of a Boolean search in terms of the overlap of index terms in the request and document. As attractive as the merger of the two approaches may seem, a closer examination reveals disturbing ambiguities, with equivalent Boolean expressions yielding different weights for the retrieved documents. An alternative approach is suggested.
