Carolyn J. Crouch | Researchain

Publication

Featured research published by Carolyn J. Crouch.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1992

Experiments in automatic statistical thesaurus construction

Carolyn J. Crouch; Bokyung Yang

A well-constructed thesaurus has long been recognized as a valuable tool in the effective operation of an information retrieval system. This paper reports the results of experiments designed to determine the validity of an approach to the automatic construction of global thesauri (described originally by Crouch in [1] and [2]) based on a clustering of the document collection. The authors validate the approach by showing that the use of thesauri generated by this method results in substantial improvements in retrieval effectiveness in four test collections. The term discrimination value theory, used in the thesaurus generation algorithm to determine a term's membership in a particular thesaurus class, is found not to be useful in distinguishing a “good” from an “indifferent” or “poor” thesaurus class. In conclusion, the authors suggest an alternate approach to automatic thesaurus construction which greatly simplifies the work of producing viable thesaurus classes. Experimental results show that the alternate approach described herein in some cases produces thesauri which are comparable in retrieval effectiveness to those produced by the first method at much lower cost.

Information Processing and Management | 1990

An approach to the automatic construction of global thesauri

Carolyn J. Crouch

The benefits of a well-constructed thesaurus to an information retrieval system have long been recognized by both researchers and practitioners in the field. Previous experiments have investigated the construction of thesauri by manual, semiautomatic, and automatic means. Automatic thesaurus generation in particular has proven to be an especially difficult problem. This paper examines both early and current approaches to automatic thesaurus construction and describes an approach to the automatic generation of global thesauri based on the term discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm. This method has been implemented and applied to two document collections. Preliminary results indicate that this method, which produces improvements in retrieval performance in excess of 10 and 15 percent in the test collections, is viable and worthy of continued investigation.

ACM Conference on Hypertext | 1989

The use of cluster hierarchies in hypertext information retrieval

Donald B. Crouch; Carolyn J. Crouch; Glenn Andreas

The graph-traversal approach to hypertext information retrieval is a conceptualization of hypertext in which the structural aspects of the nodes are emphasized. A user navigates through such hypertext systems by evaluating the semantics associated with links between nodes as well as the information contained in nodes [Fris88]. In this paper we describe a hierarchical structure which effectively supports the graphical traversal of a document collection in a hypertext system. We provide an overview of an interactive browser based on cluster hierarchies. Initial results obtained from the use of the browser in an experimental hypertext retrieval system are presented.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1988

A cluster-based approach to thesaurus construction

Carolyn J. Crouch

The importance of a thesaurus in the successful operation of an information retrieval system is well recognized. Yet techniques which support the automatic generation of thesauri remain largely undiscovered. This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm. This method has been implemented and applied to two document collections. Preliminary results indicate that this method, which produces improvements in retrieval performance in excess of 10 and 15 percent in the test collections, is viable and worthy of continued investigation.

Information Processing and Management | 2002

Improving the retrieval effectiveness of very short queries

Carolyn J. Crouch; Donald B. Crouch; Qingyan Chen; Steven J. Holtz

This paper describes an automatic approach designed to improve the retrieval effectiveness of very short queries such as those used in web searching. The method is based on the observation that stemming, which is designed to maximize recall, often results in depressed precision. Our approach is based on pseudo-feedback and attempts to increase the number of relevant documents in the pseudo-relevant set by reranking those documents based on the presence of unstemmed query terms in the document text. The original experiments underlying this work were carried out using Smart 11.0 and the lnc.ltc weighting scheme on three sets of documents from the TREC collection with corresponding TREC (title only) topics as queries. (The average length of these queries after stoplisting ranges from 2.4 to 4.5 terms.) Results, evaluated in terms of P@20 and non-interpolated average precision, showed clearly that pseudo-feedback (PF) based on this approach was effective in increasing the number of relevant documents in the top ranks. Subsequent experiments, performed on the same data sets using Smart 13.0 and the improved Lnu.ltu weighting scheme, indicate that these results hold up even over the much higher baseline provided by the new weights. Query drift analysis presents a more detailed picture of the improvements produced by this process.
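
The reranking step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rerank_by_unstemmed_terms`, the tokenizer, the `top_n` cutoff, and the scoring rule (count of distinct unstemmed query terms found in the text, ties kept in original rank order) are all assumptions made for illustration. The abstract does not specify how the presence of unstemmed terms is scored.

```python
import re

def rerank_by_unstemmed_terms(ranked_docs, query_terms, top_n=20):
    """Rerank the top_n documents of an initial retrieval run.

    ranked_docs: list of (doc_id, text) pairs in initial rank order
                 (hypothetical input format).
    query_terms: the original, unstemmed query words.
    Documents below top_n keep their original positions.
    """
    head, tail = ranked_docs[:top_n], ranked_docs[top_n:]
    wanted = {term.lower() for term in query_terms}

    def matches(text):
        # Count distinct unstemmed query terms that occur verbatim in the text.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        return len(wanted & tokens)

    # sorted() is stable, so documents with equal counts keep their initial order.
    reranked = sorted(head, key=lambda doc: matches(doc[1]), reverse=True)
    return reranked + tail

# Toy run: a stemmer may conflate "organization" with "organic",
# which can push an off-topic document to the top of the initial ranking.
docs = [
    ("d1", "the organization organizes organs"),
    ("d2", "organic farming methods"),
    ("d3", "organic food and farming"),
]
result = [doc_id for doc_id, _ in rerank_by_unstemmed_terms(docs, ["organic", "farming"], top_n=3)]
```

In a pseudo-feedback setting, the reranked head of the list would then serve as the pseudo-relevant set for query expansion; that step is outside this sketch.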

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1989

The automatic generation of extended queries

Carolyn J. Crouch; Donald B. Crouch; Krishna R. Nareddy

In the extended vector space model, each document vector consists of a set of subvectors representing the multiple concepts or concept classes present in the document. Typical information concepts, in addition to the usual content terms or descriptors, include author names, bibliographic links, etc. The extended vector space model is known to improve retrieval effectiveness. However, a major impediment to the use of the extended model is the construction of an extended query. In this paper, we describe a method for automatically extending a query containing only content terms (a single concept class) to a representation containing multiple concept classes. No relevance feedback is involved. Experiments using the CACM collection resulted in an average precision 34% better than that obtained using the standard single-concept term vector model.
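
As a rough illustration of the extended vector representation mentioned in this abstract, the sketch below stores one sparse subvector per concept class and combines per-class cosine similarities with a weighted sum. The weighted-sum combination, the class names (`terms`, `authors`), and the weights are assumptions chosen for the example. The sketch does not show the paper's method for extending a query automatically, which the abstract does not detail.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def extended_similarity(query, doc, class_weights):
    """Weighted sum of per-class cosine similarities (assumed combination rule).

    query, doc: dicts mapping a concept class name to a sparse subvector.
    class_weights: dict mapping a concept class name to its weight.
    """
    return sum(
        weight * cosine(query.get(cls, {}), doc.get(cls, {}))
        for cls, weight in class_weights.items()
    )

# Hypothetical document with two concept classes.
doc = {
    "terms": {"retrieval": 1.0, "thesaurus": 1.0},
    "authors": {"crouch": 1.0},
}
weights = {"terms": 0.7, "authors": 0.3}

# A query with content terms only, and the same query extended with an author class.
query_single = {"terms": {"retrieval": 1.0}}
query_extended = {"terms": {"retrieval": 1.0}, "authors": {"crouch": 1.0}}

sim_single = extended_similarity(query_single, doc, weights)
sim_extended = extended_similarity(query_extended, doc, weights)
```

A query that carries only content terms contributes nothing for the other classes, which is why an extended query is needed before the additional subvectors can affect the ranking.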

Information Processing and Management | 1988

An analysis of approximate versus exact discrimination values

Carolyn J. Crouch

Term discrimination values have been used to characterize and select potential index terms for use during automatic indexing. Two basic approaches to the calculation of discrimination values have been suggested. These approaches differ in their calculation of space density; one method uses the average document-pair similarity for the collection and the other constructs an artificial, “average” document, the centroid, and computes the sum of the similarities of each document with the centroid. The former method has been said to produce “exact” discrimination values and the latter “approximate” values. This article investigates the differences between the algorithms associated with these two approaches (as well as several modified versions of the algorithms) in terms of their impact on the discrimination value model by determining the differences that exist between the rankings of the exact and approximate discrimination values. The experimental results show that the rankings produced by the exact approach and by a centroid-based algorithm suggested by the author are highly compatible. These results indicate that a previously suggested method involving the calculation of exact discrimination values cannot be recommended in view of the excessive cost associated with such an approach; the approximate (i.e., “exact centroid”) approach discussed in this article yields a comparable result at a cost that makes its use feasible for any of the experimental document collections currently in use.
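
The two space-density definitions contrasted in this abstract can be sketched in Python as follows. This is a toy illustration, not the author's code and not the modified "exact centroid" algorithm the article proposes. It assumes cosine similarity, dense list vectors, and the usual definition of a term's discrimination value as the density with the term removed minus the density with all terms present. All function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def density_pairwise(docs):
    """Space density as the average similarity over all document pairs."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def density_centroid(docs):
    """Space density as the sum of each document's similarity to the centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(d, centroid) for d in docs)

def discrimination_values(docs, density):
    """For each term k: density without term k minus density with all terms."""
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [d[:k] + d[k + 1:] for d in docs]
        values.append(density(reduced) - base)
    return values

def ranking(values):
    """Term indices ordered from best to worst discriminator."""
    return sorted(range(len(values)), key=lambda k: values[k], reverse=True)

# Toy collection: term 0 occurs in every document, terms 1 to 3 in one document each.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
dv_pairwise = discrimination_values(docs, density_pairwise)
dv_centroid = discrimination_values(docs, density_centroid)
```

The pairwise density needs a number of similarity computations that grows quadratically with the collection size for every term removed, while the centroid density grows linearly, which is the cost difference the abstract refers to. On this toy collection both definitions place the term shared by all documents last in the ranking.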

ACM Transactions on Information Systems | 2006

Dynamic element retrieval in a structured environment

Carolyn J. Crouch

This research examines the feasibility of dynamic element retrieval in a structured environment. Structured documents and queries are represented in extended vector form, based on a modification of the basic vector space model suggested by Fox [1983]. A method for the dynamic retrieval of XML elements, which requires only a single indexing of the documents at the level of the basic indexing node, is presented. This method, which we refer to as flexible retrieval, produces a rank ordered list of retrieved elements that is equivalent to the result produced by the same retrieval against an all-element index of the collection. Flexible retrieval obviates the need for storing either an all-element index or multiple indices of the collection.

INEX'09: Proceedings of the Focused Retrieval and Evaluation, and 8th International Conference on Initiative for the Evaluation of XML Retrieval | 2009

A methodology for producing improved focused elements

Carolyn J. Crouch; Donald B. Crouch; Dinesh Bhirud; Pavan Poluri; Chaitanya Polumetla; Varun Sudhakar

This paper reports the results of our experiments to consistently produce highly ranked focused elements in response to the Focused Task of the INEX Ad Hoc Track. The results of these experiments, performed using the 2008 INEX collection, confirm that our current methodology (described herein) produces such elements for this collection. Our goal for 2009 is to apply this methodology to the new, extended 2009 INEX collection to determine its viability in this environment. (These experiments are currently underway.) Our system uses our method for dynamic element retrieval [4], working with the semi-structured text of Wikipedia [5], to produce a rank-ordered list of elements in the context of focused retrieval. It is based on the Vector Space Model [15]; basic functions are performed using the Smart experimental retrieval system [14]. Experimental results are reported for the Focused Task of both the 2008 and 2009 INEX Ad Hoc Tracks.

Focused Access to XML Documents | 2008

Dynamic Element Retrieval in the Wikipedia Collection

Carolyn J. Crouch; Donald B. Crouch; Nachiket Kamat; Vikram Malik; Aditya Mone

This paper describes the successful adaptation of our methodology for the dynamic retrieval of XML elements to a semi-structured environment. Working with text that contains both tagged and untagged elements presents particular challenges in this context. Our system is based on the Vector Space Model; basic functions are performed using the Smart experimental retrieval system. Dynamic element retrieval requires only a single indexing of the document collection at the level of the basic indexing node (i.e., the paragraph). It returns a rank-ordered list of elements identical to that produced by the same query against an all-element index of the collection. Experimental results are reported for both the 2006 and 2007 Ad-hoc tasks.
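
The idea of indexing only at the paragraph level and deriving larger elements on demand can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' system: it uses raw term frequencies rather than Smart's weighting schemes, represents an element as a nested list of paragraph ids, and assumes an element's text is exactly the concatenation of its paragraphs. The names `index_paragraph` and `element_vector` are hypothetical.

```python
from collections import Counter

def index_paragraph(text):
    """Term-frequency vector for one paragraph (whitespace tokenization)."""
    return Counter(text.lower().split())

def element_vector(element, paragraph_index):
    """Build an element's term-frequency vector bottom-up from its paragraphs.

    element: a paragraph id (str) or a list of child elements.
    paragraph_index: dict mapping paragraph id to its Counter.
    """
    if isinstance(element, str):
        return Counter(paragraph_index[element])
    total = Counter()
    for child in element:
        total += element_vector(child, paragraph_index)
    return total

# The collection is indexed once, at the paragraph level only.
paragraphs = {
    "p1": "xml retrieval",
    "p2": "element retrieval",
    "p3": "vector space",
}
paragraph_index = {pid: index_paragraph(text) for pid, text in paragraphs.items()}

# Hypothetical structure: an article holding one section (p1, p2) and a loose paragraph (p3).
section = ["p1", "p2"]
article = [section, "p3"]

section_dynamic = element_vector(section, paragraph_index)
section_direct = index_paragraph(paragraphs["p1"] + " " + paragraphs["p2"])
article_dynamic = element_vector(article, paragraph_index)
```

Because the dynamically built vector for the section equals the vector obtained by indexing the section's text directly, any ranking computed from it would match the ranking from an all-element index in this simplified setting.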
