Harold Borko
University of California, Los Angeles
Publications
Featured research published by Harold Borko.
Journal of the ACM | 1963
Harold Borko; Myrna Bernick
Abstract : Starting with a collection of 405 document abstracts dealing with computers, the experiment in automatic document classification proceeds to construct an empirically based mathematically derived classification system by use of a factor analysis technique. The documents are then classified into these derived categories by five subjects, and the resulting classification serves as a criterion against which the automatic classification is to be evaluated. Of the ninety documents in the Validation Group which contained two or more clue words, and which therefore could be automatically classified, 44 documents, or 48.9%, were placed into their correct categories by use of a computer formula. These results are almost identical to the results obtained by Maron in a previous experiment using the same data but with a different set of classification categories and a different computational formula. The experimental evidence supports the conclusion that automatic document classification is possible. Additional experiments are described which when executed should improve the accuracy of the automatic classification technique. (Author)
national computer conference | 1962
Harold Borko
This study describes a method for developing an empirically based, computer-derived classification system. 618 psychological abstracts were coded in machine language for computer processing. The total text consisted of approximately 50,000 words, of which nearly 6,800 were unique words. The computer program arranged these words in order of frequency of occurrence. From the list of words which occurred 20 or more times, excluding syntactical terms such as and, but, of, etc., the investigator selected 90 words for use as index terms. These were arranged in a data matrix with the terms on the horizontal axis and the document numbers on the vertical axis. The cells contained the number of times each term was used in each document. Based on these data, a correlation matrix, 90×90 in size, was computed which showed the relationship of each term to every other term. The matrix was factor analyzed and the first 10 eigenvectors were selected as factors. These were rotated for meaning and interpreted as major categories in a classification system. These factors were compared with, and shown to be compatible but not identical to, the classification system used by the American Psychological Association. The results demonstrate the feasibility of an empirically derived classification system and establish the value of factor analysis as a technique in language data processing.
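The pipeline described here, term-document counts, a term-term correlation matrix, and extraction of leading factors, can be sketched in a few lines of NumPy. Everything below is synthetic: a small random count matrix stands in for the study's 618×90 data, and a plain eigendecomposition stands in for the rotated factor analysis (the rotation-for-meaning step is omitted).

```python
import numpy as np

# Hypothetical term-document count matrix: rows = documents, columns = terms.
# A tiny random stand-in for the study's 618 documents x 90 index terms.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(40, 8)).astype(float)

# Correlate each term with every other term across documents
# (the 90x90 matrix in the study; 8x8 here).
corr = np.corrcoef(counts, rowvar=False)

# Factor extraction via eigendecomposition of the correlation matrix:
# the eigenvectors with the largest eigenvalues serve as the first factors.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
factors = eigvecs[:, order[:3]]          # keep the 3 largest factors

# Each factor is a weighted bundle of terms; its most heavily loaded
# terms would be read off and interpreted as a classification category.
for k in range(factors.shape[1]):
    top_terms = np.argsort(np.abs(factors[:, k]))[::-1][:3]
    print(f"factor {k}: top term indices {top_terms.tolist()}")
```

In the study proper, the retained factors were then rotated and interpreted by hand before being used as category definitions.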
Journal of the ACM | 1964
Harold Borko; Myrna Bernick
This study reports the results of a series of experiments in the techniques of automatic document classification. Two different classification schedules are compared along with two methods of automatically classifying documents into categories. It is concluded that, while there is no significant difference in the predictive efficiency between the Bayesian and the Factor Score methods, automatic document classification is enhanced by the use of a factor-analytically-derived classification schedule. Approximately 55 percent of the documents were automatically and correctly classified.
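As a rough illustration of the Bayesian method mentioned above, here is a minimal naive Bayes classifier over clue words. The categories, documents, and add-one smoothing are illustrative assumptions, not the paper's actual schedule or computational formula.

```python
import math
from collections import Counter, defaultdict

# Toy training set: (clue words, category). All data here is invented.
train = [
    (["circuit", "logic", "gate"], "hardware"),
    (["compiler", "syntax", "code"], "programming"),
    (["logic", "code", "syntax"], "programming"),
    (["gate", "circuit", "relay"], "hardware"),
]

# Estimate P(word | category) with add-one smoothing, and P(category).
word_counts = defaultdict(Counter)
cat_counts = Counter()
vocab = set()
for words, cat in train:
    cat_counts[cat] += 1
    word_counts[cat].update(words)
    vocab.update(words)

def classify(words):
    """Pick the category maximizing log P(cat) + sum of log P(word | cat)."""
    best, best_score = None, float("-inf")
    for cat in cat_counts:
        total = sum(word_counts[cat].values())
        score = math.log(cat_counts[cat] / len(train))
        for w in words:
            score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = cat, score
    return best

print(classify(["compiler", "code"]))   # → programming
```

A document with no clue words at all cannot be scored, which mirrors the papers' restriction to documents containing two or more clue words.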
Information Processing and Management | 1977
Harold Borko
Abstract A theory of indexing helps explain the nature of indexing, the structure of the vocabulary, and the quality of the index. Indexing theories formulated by Jonker, Heilprin, Landry and Salton are described. Each formulation has a different focus. Jonker, by means of the Terminological and Connective Continua, provided a basis for understanding the relationships between the size of the vocabulary, the hierarchical organization, and the specificity by which concepts can be described. Heilprin introduced the idea of a search path which leads from query to document. He also added a third dimension to Jonker's model; the three variables are diffuseness, permutivity and hierarchical connectedness. Landry made an ambitious and well-conceived attempt to build a comprehensive theory of indexing predicated upon sets of documents, sets of attributes, and sets of relationships between the two. It is expressed in theorems and by formal notation. Salton provided both a notational definition of indexing and procedures for improving the ability of index terms to discriminate between relevant and nonrelevant documents. These separate theories need to be tested experimentally and eventually combined into a unified comprehensive theory of indexing.
Information Processing and Management | 1987
Harold Borko
Abstract In the 1960s, information science researchers pioneered in the design of computer-based document storage and retrieval systems. These efforts were crowned with success, and online systems are now in common use as reference tools. Today the new information science frontier is to design, develop, and test expert systems for use in libraries and other information centers. At UCLA we are exploring the applicability of artificial intelligence and expert systems for modeling the cognitive processes involved in cataloging. Specifically, Zorana Ercegovac, a doctoral student, is designing a prototype expert system in the limited domain of map cataloging that will seek to employ the reasoning used by expert catalogers in applying AACR2 rules. It is anticipated that the research results will shed some light on the way catalogers reason and conceptualize the structure of a catalog entry. The project is still in its initial stages, and in this presentation one can only indicate the design choices that need to be made, the reasons for the decisions made, and the problems encountered.
Information Storage and Retrieval | 1970
Harold Borko
Abstract The most challenging task in preparing an index to a book is to select all and only those terms that are related to the text and are useful for reference purposes. While a knowledgeable human can make the selection on an intuitive basis, automatic indexing requires a precise operational criterion for defining and selecting good and useful index terms. Two principles of selection are proposed: specification and selection of useful terms, and specification and exclusion of useless terms. Because of the nebulous nature and meaning of “good index terms”, and the difficulties involved in devising machine algorithms for their selection, this research in automatic indexing is based on the principle of excluding useless terms. Even so, fully automatic indexing was not achieved in this study. Single words proved to be of little value as index terms. Multiple word terms were generated by the computer, but no algorithm could successfully eliminate the useless phrases. Final selection had to be made by the experimenter. A comprehensive and useful book index was achieved by using machine-aided rather than fully automated indexing techniques.
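The exclusion principle described here can be sketched as follows: generate every adjacent word pair as a candidate phrase, then discard any pair containing a stoplisted word. The stoplist and sample sentence are invented for illustration; as the abstract itself notes, no such rule eliminates all the useless phrases.

```python
import re

# Illustrative stoplist of "useless" function words (not the study's list).
STOPLIST = {"the", "of", "and", "a", "is", "to", "in", "by"}

def candidate_phrases(text):
    """Two-word index-term candidates, minus any phrase with a stoplisted word."""
    words = re.findall(r"[a-z]+", text.lower())
    pairs = zip(words, words[1:])
    # Exclusion rule: keep a phrase only if neither word is on the stoplist.
    return [f"{a} {b}" for a, b in pairs
            if a not in STOPLIST and b not in STOPLIST]

text = "The index terms are selected by excluding useless words from the text."
print(candidate_phrases(text))
```

Note that junk candidates such as "terms are" survive the filter alongside "index terms", which is precisely why the study fell back on machine-aided rather than fully automatic selection.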
Journal of the Association for Information Science and Technology | 1984
Harold Borko
Educational programs for library and information science will be changing during the next decade in order to meet society's need for trained information professionals capable of working in libraries, in governmental and industrial information centers, and as information consultants and entrepreneurs. In seeking to discern educational trends, present practices are analyzed in terms of the influences of educational externalities over which the institution has relatively little control, and the educational internalities over which the school has jurisdiction. Based upon this analysis, possible future trends and developments in library and information science education are presented not as a blueprint for the future but as a stimulus for discussion and planning.
Information Processing and Management | 1983
Robert M. Hayes; Harold Borko
Abstract This report presents the results from a study of mathematical models relating to the usage of information systems. For each of four models, the papers developed during the study provide three types of analyses: reviews of the literature relevant to the model, analytical studies, and tests of the models with data drawn from specific operational situations.
(1) The Cobb-Douglas model: x_0 = a·x_1^b·x_2^(1−b). This classic production model, normally interpreted as applying to the relationship between production, labor, and capital, is applied to a number of information-related contexts. These include specifically the performance of libraries, both public and academic, and the use of information resources by the nation's industry. The results confirm not only the utility of the Cobb-Douglas model in evaluation of the use of information resources, but demonstrate the extent to which those resources currently are being used at significantly less than optimum levels.
(2) Mixture of Poissons: x_0 = Σ_{i=0..n} i · Σ_{j=0..p} n_j · e^(−m_j) · (m_j)^i / i!, where x_0 is the usage and (n_j, m_j), j = 0 to p, are the p+1 components of the distribution. This model of heterogeneity is applied to the usage of library materials and of thesaurus terms. In each case, both the applicability and the analytical value of the model are demonstrated.
(3) Inverse effects of distance: x = a·e^(−m·d) if c(d) = r·d; x = a·d^(−m) if c(d) = r·log(d). These two models reflect different inverse effects of distance, the choice depending upon the cost of transportation. If the cost, c(d), is linear, the usage is inverse exponential; if logarithmic, the usage is inverse power. The literature that discusses the relationship between usage of facilities and the distance from them is reviewed. The models are tested with data from the usage of the Los Angeles Public Library, both the Central Library and branches, based on a survey of 3662 users.
(4) Weighted entropy: S(x_1, x_2, …, x_n) = −Σ_{i=1..n} r(x_i)·p(x_i)·log(p(x_i)).
This generalization of the “entropy measure of information” is designed to accommodate the effects of “relevancy”, as measured by r(x), upon the performance of information retrieval systems. The relevant literature is reviewed and the application to retrieval systems is considered.
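The four models described in this abstract translate directly into small functions. The sketch below uses invented parameter values throughout; they are for demonstration only and are not fitted values from the study.

```python
import math

# (1) Cobb-Douglas production model: x0 = a * x1**b * x2**(1 - b).
def cobb_douglas(x1, x2, a=2.0, b=0.6):
    return a * x1**b * x2**(1 - b)

# (2) Mixture of Poissons: the probability of i uses is a weighted sum of
# Poisson terms n_j * exp(-m_j) * m_j**i / i! over the p+1 components.
def poisson_mixture_pmf(i, components):
    return sum(n * math.exp(-m) * m**i / math.factorial(i)
               for n, m in components)

# (3) Inverse effects of distance: exponential decay when travel cost is
# linear, c(d) = r*d; power-law decay when it is logarithmic, c(d) = r*log(d).
def usage_linear_cost(d, a=1.0, m=0.5):
    return a * math.exp(-m * d)

def usage_log_cost(d, a=1.0, m=0.5):
    return a * d**-m

# (4) Weighted entropy: -sum of r(x_i) * p(x_i) * log(p(x_i)).
def weighted_entropy(probs, relevance):
    return -sum(r * p * math.log(p) for r, p in zip(relevance, probs) if p > 0)

# Sanity check: with component weights summing to 1, the mixture pmf sums to 1.
components = [(0.7, 1.0), (0.3, 5.0)]
total = sum(poisson_mixture_pmf(i, components) for i in range(100))
print(round(total, 6))   # → 1.0
```

With all relevance weights r(x_i) set to 1, the weighted entropy reduces to the ordinary Shannon entropy, which is the sense in which the report calls it a generalization.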
Perceptual and Motor Skills | 1965
Harold Borko
A sample of approximately 1,000 abstracts was obtained from Psychological Abstracts and keypunched for computer processing. Based upon a frequency distribution of words in these abstracts, 150 tag-terms were selected. These terms were intercorrelated on the basis of their co-occurrence in documents, and the resulting matrix was factor analyzed. The factors were interpreted as representing classification categories. These were compared with, and shown to be similar to, the APA classification system. The study demonstrates that it is possible to determine the basic dimensions of a collection of documents by an analysis of the words used in their abstracts.
Information Processing and Management | 1978
Harold Borko
Abstract All library school students should be provided with an opportunity to obtain hands-on experience in using such bibliographic retrieval systems as ORBIT, DIALOG, OCLC, etc. Yet, such training is both costly and time consuming. Two key issues that must be resolved in order to make the training more efficient and more effective are: (1) the integration of training in the curriculum as a module in cataloging and reference courses or as a separate course; and (2) the method of training, which may include the use of videotapes, demonstrations, training manuals, etc. The teaching program at UCLA provides for discussion and demonstration of on-line retrieval techniques in the basic courses and advanced search training in a separate course using a specially prepared training manual.