Ray R. Larson
University of California, Berkeley
Publication
Featured research published by Ray R. Larson.
Journal of the Association for Information Science and Technology | 1991
Ray R. Larson
Search index usage in a large university online catalog system over a six-year period (representing about 15.3 million searches) was investigated using transaction monitor data. Mathematical models of trends and patterns in the data were developed and tested using regression techniques. The results of the analyses show a consistent decline in the frequency of subject index use by online catalog users, with a corresponding increase in the frequency of title keyword searching. Significant annual patterns in index usage were also identified. Analysis of the transaction data, and related previous studies of online catalog users, suggest a number of factors contributing to the decline in subject search frequency. Chief among these factors are user difficulties in formulating subject queries with Library of Congress Subject Headings, leading to search failure, and the problem of “information overload” as database size increases. This article presents the models and results of the transaction log analysis, discusses the underlying problems with subject searching contributing to the observed decline, and reviews some proposed improvements to online catalog systems to aid in overcoming these problems.
Journal of the Association for Information Science and Technology | 1992
Ray R. Larson
This article presents the results of research into the automatic selection of Library of Congress Classification numbers based on the titles and subject headings in MARC records. The method used in this study was based on partial match retrieval techniques using various elements of new records (i.e., those to be classified) as “queries,” and a test database of classification clusters generated from previously classified MARC records. Sixty individual methods for automatic classification were tested on a set of 283 new records, using all combinations of four different partial match methods, five query types, and three representations of search terms. The results indicate that if the best method for a particular case can be determined, then up to 86% of the new records may be correctly classified. The single method with the best accuracy was able to select the correct classification for about 46% of the new records.
international conference theory and practice digital libraries | 2004
Ray R. Larson; Patricia Frontiera
This paper presents results from an evaluation of algorithms for ranking results by probability of relevance for Geographic Information Retrieval (GIR) applications. We review the work done on GIR and especially on ranking algorithms for GIR. We evaluate these algorithms using a test collection of 2500 metadata records from a geographic digital library. We present an algorithm for GIR ranking based on logistic regression from samples of the test collection. We also examine the effects of different representations of the geographic regions being searched, including minimum bounding rectangles, and convex hulls.
Journal of the Association for Information Science and Technology | 1992
Ray R. Larson
Research on the use and users of online catalogs conducted in the early 1980s found that subject searches were the most common form of online catalog search. At the same time, many of the problems experienced by online catalog users have been traced to difficulties with the subject access mechanisms of the online catalog. Numerous proposals have been made for methods intended to improve subject access to online catalog records. These commonly involve enhancing the catalog's bibliographic records with additional terms, or incorporating subject authority files or additional thesauri in the database. Another stream of research has concentrated on applying retrieval techniques derived from information retrieval (IR) research to replace the Boolean search methods of conventional online catalog systems. This study describes the results of retrieval tests using a variety of these search methods in the CHESHIRE experimental online catalog system.
Journal of the Association for Information Science and Technology | 1996
Ray R. Larson; Jerome McDonough; Paul O'Leary; Lucy Kuntz; Ralph Moon
The Cheshire II online catalog system was designed to provide a bridge between the realms of purely bibliographical information and the rapidly expanding full-text and multimedia collections available online. It is based on a number of national and international standards for data description, communication, and interface technology. The system uses a client-server architecture with an X Window client communicating with an SGML-based probabilistic search engine using the Z39.50 information retrieval protocol.
international acm sigir conference on research and development in information retrieval | 2002
Ray R. Larson
This poster session examines a probabilistic approach to distributed information retrieval using a Logistic Regression algorithm for estimation of collection relevance. The algorithm is compared to other methods for distributed search using test collections developed for distributed search evaluation.
Information Retrieval | 2005
Ray R. Larson
In this paper we evaluate the application of data fusion or meta-search methods, combining different algorithms and XML elements, to content-oriented retrieval of XML structured data. The primary approach is the combination of probabilistic methods using logistic regression and the Okapi BM-25 algorithm for estimation of document relevance or XML element relevance, in conjunction with Boolean approaches for some query elements. In the evaluation we use the INEX XML test collection to examine the relative performance of individual algorithms and elements and compare these to the performance of the data fusion approaches.
acm/ieee joint conference on digital libraries | 2002
Ray R. Larson; Fredric C. Gey; Aitao Chen
This paper presents a method of information harvesting and consolidation to support the multilingual information requirements for cross-language information retrieval within digital library systems. We describe a way to create both customized bilingual dictionaries and multilingual query mappings from a source language to many target languages. We will describe a multilingual conceptual mapping resource with broad coverage (over 100 written languages can be supported) that is truly multilingual as opposed to bilingual pairings usually derived from machine translation. This resource is derived from the 10+ million title online library catalog of the University of California. It is created statistically via maximum likelihood associations from words and phrases in book titles of many languages to human assigned subject headings in English. The 150,000 subject headings can form interlingua mappings between pairs of languages or from one language to several languages. While our current demonstration prototype maps between ten languages (English, Arabic, Chinese, French, German, Italian, Japanese, Portuguese, Russian, Spanish), extensions to additional languages are straightforward. We also describe how this resource is being expanded for languages where linguistic coverage is limited in our initial database, by automatically harvesting new information from international online library catalogs using the Z39.50 networked library search protocol.
international conference theory and practice digital libraries | 2003
Ray R. Larson
This paper examines technology developed to support large-scale distributed digital libraries. We describe the method used for harvesting collection information using standard information retrieval protocols and how this information is used in collection ranking and retrieval. The system that we have developed takes a probabilistic approach to distributed information retrieval using a logistic regression algorithm for estimation of distributed collection relevance and fusion techniques to combine multiple sources of evidence. We discuss the harvesting method used and how it can be employed in building collection representatives using features of the Z39.50 protocol. The extracted collection representatives are ranked using a fusion of probabilistic retrieval methods. The effectiveness of our algorithm is compared to other distributed search methods using test collections developed for distributed search evaluation. We also describe how this system is currently being applied to operational systems in the U.K.
international acm sigir conference on research and development in information retrieval | 2004
Ray R. Larson; Patricia Frontiera
Information retrieval systems in operation today for applications ranging from Digital Libraries to Web Search make very little use of two major dimensions of the data being searched: location and time. For many applications these dimensions can provide an intuitive and understandable visualization of search constraints and results. However, current metadata representations of geographic characteristics and the uses of these metadata are problematic. Most retrieval and database systems, even those tailored to geographic information, provide only nascent approaches to the technologically and conceptually difficult challenges of Geographic Information Retrieval (GIR). Much of the problem is rooted in the geospatial metadata used by these systems to index and access geographic data. The primary issues are: