Allison L. Powell | Researchain

Publication

Featured research published by Allison L. Powell.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999

Comparing the performance of database selection algorithms

James C. French; Allison L. Powell; James P. Callan; Charles L. Viles; Travis Emmitt; Kevin J. Prey; Yun Mou

We compare the performance of two database selection algorithms reported in the literature. Their performance is compared using a common testbed designed specifically for database selection techniques. The testbed is a decomposition of the TREC/TIPSTER data into 236 subcollections. We present results of a recent investigation of the performance of the CORI algorithm and compare the performance with earlier work that examined the performance of gGlOSS. The databases from our testbed were ranked using both the gGlOSS and CORI techniques and compared to the RBR baseline, a baseline derived from TREC relevance judgements. We examined the degree to which CORI and gGlOSS approximate this baseline. Our results confirm our earlier observation that the gGlOSS Ideal(l) ranks do not estimate relevance-based ranks well. We also find that CORI is a uniformly better estimator of relevance-based ranks than gGlOSS for the test environment used in this study. Part of the advantage of the CORI algorithm can be explained by a strong correlation between gGlOSS and a size-based baseline (SBR). We also find that CORI produces consistently accurate rankings on testbeds ranging from 100 to 921 sites. However, for a given level of recall, search effort appears to scale linearly with the number of databases.
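One way to quantify how closely an estimated database ranking approximates a relevance-based baseline is a recall-style ratio: the number of relevant documents held by the top n databases of the estimated ranking, divided by the number held by the top n databases of the baseline. The sketch below illustrates that idea only. The function name, the toy counts, and the choice of measure are assumptions made for illustration and are not taken from the abstract.

```python
def recall_at_n(estimated_ranking, relevant_counts, n):
    """Relevant documents covered by the top-n estimated databases, relative
    to those covered by the top-n databases of a relevance-based ranking."""
    # Relevance-based baseline: databases sorted by relevant-document count.
    baseline = sorted(relevant_counts, key=relevant_counts.get, reverse=True)
    covered = sum(relevant_counts.get(db, 0) for db in estimated_ranking[:n])
    best = sum(relevant_counts[db] for db in baseline[:n])
    return covered / best if best else 0.0


# Hypothetical toy data: relevant documents per database for one query.
relevant_counts = {"db_a": 12, "db_b": 0, "db_c": 5, "db_d": 3}
# Hypothetical ranking produced by some selection algorithm.
estimated = ["db_c", "db_a", "db_b", "db_d"]

scores = [recall_at_n(estimated, relevant_counts, n) for n in range(1, 5)]
```

A value of 1.0 at a given n means the estimated ranking reaches as many relevant documents as the baseline does after searching n databases.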

International Conference on Data Engineering | 1999

Clustering large datasets in arbitrary metric spaces

Venkatesh Ganti; Raghu Ramakrishnan; Johannes Gehrke; Allison L. Powell; James C. French

Clustering partitions a collection of objects into groups called clusters, such that similar objects fall into the same group. Similarity between objects is defined by a distance function satisfying the triangle inequality; this distance function along with the collection of objects describes a distance space. In a distance space, the only operation possible on data objects is the computation of distance between them. All scalable algorithms in the literature assume a special type of distance space, namely a k-dimensional vector space, which allows vector operations on objects. We present two scalable algorithms designed for clustering very large datasets in distance spaces. Our first algorithm, BUBBLE, is, to our knowledge, the first scalable clustering algorithm for data in a distance space. Our second algorithm, BUBBLE-FM, improves upon BUBBLE by reducing the number of calls to the distance function, which may be computationally very expensive. Both algorithms make only a single scan over the database while producing high clustering quality. In a detailed experimental evaluation, we study both algorithms in terms of scalability and quality of clustering. We also show results of applying the algorithms to a real-life dataset.
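To make the "distance space" setting concrete, the sketch below clusters strings in a single scan using nothing but calls to a distance function. It is a simple leader-style procedure chosen for illustration, not BUBBLE or BUBBLE-FM; the function names, the edit-distance metric, the threshold, and the sample data are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (a metric: it satisfies the
    triangle inequality)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def leader_cluster(objects, distance, threshold):
    """Single scan: put each object in the cluster of its nearest leader if
    that leader is within `threshold`; otherwise start a new cluster."""
    clusters = []  # list of (leader, members) pairs
    for obj in objects:
        best_members, best_dist = None, None
        for leader, members in clusters:
            d = distance(obj, leader)  # the only operation used on objects
            if d <= threshold and (best_dist is None or d < best_dist):
                best_members, best_dist = members, d
        if best_members is None:
            clusters.append((obj, [obj]))
        else:
            best_members.append(obj)
    return clusters


names = ["powell", "powel", "french", "frensh", "powell", "callan"]
clusters = leader_cluster(names, levenshtein, threshold=1)
```

Every object is compared against every leader here, so the number of distance calls grows with the number of clusters; reducing such calls is the motivation the abstract gives for BUBBLE-FM.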

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000

The impact of database selection on distributed searching

Allison L. Powell; James C. French; James P. Callan; Margaret E. Connell; Charles L. Viles

The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First, we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second, we find that good performance can be achieved when only a few sites are selected and that the performance generally increases as more sites are selected. Finally, we find that when database selection is employed, it is not necessary to maintain collection-wide information (CWI), e.g., global idf. Local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e., single-database) systems. Given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1998

Evaluating database selection techniques: a testbed and experiment

James C. French; Allison L. Powell; Charles L. Viles; Travis Emmitt; Kevin J. Prey

We describe a testbed for database selection techniques and an experiment conducted using this testbed. The testbed is a decomposition of the TREC/TIPSTER data that allows analysis of the data along multiple dimensions, including collection-based and temporal-based analysis. We characterize the subcollections in this testbed in terms of number of documents, queries against which the documents have been evaluated for relevance, and distribution of relevant documents. We then present initial results from a study conducted using this testbed that examines the effectiveness of the gGlOSS approach to database selection. The databases from our testbed were ranked using the gGlOSS techniques and compared to the gGlOSS Ideal(l) baseline and a baseline derived from TREC relevance judgements. We have examined the degree to which several gGlOSS estimate functions approximate these baselines. Our initial results confirm that the gGlOSS estimators are excellent predictors of the Ideal(l) ranks but that the Ideal(l) ranks do not estimate relevance-based ranks well.

ACM Transactions on Information Systems | 2003

Comparing the performance of collection selection algorithms

Allison L. Powell; James C. French

The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multicollection environment. Multicollection searching is cast in three parts: collection selection (also referred to as database selection), query processing and results merging. In this work, we focus our attention on the evaluation of the first step, collection selection. In this article, we present a detailed discussion of the methodology that we used to evaluate and compare collection selection approaches, covering both test environments and evaluation measures. We compare the CORI, CVV and gGLOSS collection selection approaches using six test environments utilizing three document testbeds. We note similar trends in performance among the collection selection approaches, but the CORI approach consistently outperforms the other approaches, suggesting that effective collection selection can be achieved using limited information about each collection. The contributions of this work are both the assembled evaluation methodology as well as the application of that methodology to compare collection selection approaches in a standardized environment.

Conference on Information and Knowledge Management | 1997

Applications of approximate word matching in information retrieval

James C. French; Allison L. Powell; Eric Schulman

As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. The need to discover and reconcile variant forms of strings in bibliographic entries, i.e., authority work, will become more difficult. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. Approximate string matching has traditionally been used to help with this problem. In this paper we introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms.

Journal of the Association for Information Science and Technology | 2000

Using clustering strategies for creating authority files

James C. French; Allison L. Powell; Eric Schulman

As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographic entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files.
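The contrast between approximate string matching and approximate word matching can be sketched as follows. This is one possible reading, assumed for illustration: strings are compared as sequences of words, and two words count as the same when their character-level edit distance is within a small bound. The function names, the bound, and the sample strings are hypothetical and are not the authors' definitions.

```python
def edit_distance(a, b, same=lambda x, y: x == y):
    """Edit distance between two sequences; `same` decides whether two
    elements match (exact equality by default)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if same(x, y) else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_distance(s, t, max_char_edits=1):
    """Word-level edit distance in which two words match if they differ by
    at most `max_char_edits` character edits."""
    words_s, words_t = s.lower().split(), t.lower().split()

    def close(u, v):
        return edit_distance(u, v) <= max_char_edits

    return edit_distance(words_s, words_t, same=close)


a, b = "Univ of Virginia", "Univ. of Virgina"
char_level = edit_distance(a.lower(), b.lower())   # character-by-character
word_level = word_distance(a, b)                   # word-by-word, tolerant
unrelated = word_distance("Harvard Univ", "Univ of Virginia")
```

In this toy case the two variant forms differ at the character level but are identical at the tolerant word level, while the unrelated pair stays far apart.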

Hawaii International Conference on System Sciences | 2002

Obtaining language models of web collections using query-based sampling techniques

Gary A. Monroe; James C. French; Allison L. Powell

In the context of information retrieval, traditional collection selection algorithms have been widely studied. These algorithms utilize language models, a representation of the contents of each text collection over which selection is to be performed, but these language models cannot always be easily acquired. Query-based sampling is a technique by which these language models are discovered by interacting with a collection and observing the results. Previous work has shown query-based sampling to be a viable solution to the problem of discovering the contents of text collections when the information cannot be otherwise obtained. However, the characteristics of language models of WWW collections created using query-based sampling have not yet been studied. This work evaluates two query-based sampling techniques for building language models of three World Wide Web collections. Experimental results support the effectiveness of query-based sampling as a solution for building language models of web collections. This work also proposes a metric by which it may be possible to determine the point at which further sampling of a given web collection can cease. This metric is used along with other metrics used in previous work to determine the fidelity of these language models.
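The query-based sampling loop described above can be sketched in a few lines: issue a query, add the returned documents to a sample, update a term-frequency model from them, and pick the next query term from the model learned so far. Everything named here is hypothetical, including the `search` callable standing in for a collection's search interface, the toy documents, and the stopping rule based on a fixed document budget; the stopping metric proposed in the paper is not reproduced.

```python
import random
from collections import Counter


def query_based_sample(search, seed_term, max_docs, docs_per_query=2, rng=None):
    """Build a term-frequency model of a collection by querying it.

    `search(term)` is assumed to return a list of (doc_id, text) pairs.
    """
    rng = rng or random.Random(0)
    model = Counter()   # learned term frequencies
    seen = set()        # ids of documents already sampled
    tried = set()       # query terms already issued
    term = seed_term
    while len(seen) < max_docs and term is not None:
        tried.add(term)
        for doc_id, text in search(term)[:docs_per_query]:
            if doc_id not in seen and len(seen) < max_docs:
                seen.add(doc_id)
                model.update(text.lower().split())
        # Next query term: a random untried term from the learned model.
        untried = sorted(t for t in model if t not in tried)
        term = rng.choice(untried) if untried else None
    return model, seen


# Hypothetical toy collection and search interface.
DOCS = {
    1: "distributed search selects databases",
    2: "databases hold text collections",
    3: "language models describe collections",
    4: "sampling queries discover language models",
}


def toy_search(term):
    return [(i, t) for i, t in sorted(DOCS.items()) if term in t.split()]


model, sampled = query_based_sample(toy_search, "databases", max_docs=3)
```

The loop ends when the document budget is reached or when no untried terms remain, so it always terminates on a finite collection.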

ACM Conference on Hypertext | 1997

Applying hypertext structures to software documentation

James C. French; John C. Knight; Allison L. Powell

Software documentation represents a critical resource to the successful functioning of many enterprises. However, because it is static, documentation often fails to meet the needs of the many diverse users who are required to consult it on a regular basis in the course of their daily work. Software documentation is a rich resource that has not been fully exploited. Treatment of software documentation presents a number of interesting problems that require a blend of information retrieval and hypertext techniques for their successful solution. The evolving nature of a software project and the diverse demands on its documentation present an especially challenging environment. This is made even more challenging by the variety of information resources, ranging from formal specification languages to source code, that must be integrated into a coherent whole for the purpose of querying. In this paper we discuss work in progress at the University of Virginia. We consider the issues involved with automating the management of software documentation to increase its utility. We describe a prototype system, SLEUTH, currently under investigation as a vehicle for software documentation management. The prototype maintains software documentation as a hypertext with typed links for the purpose of browsing by users with varying needs. These links are generated mechanically by the system and kept accurate under update. Appropriate authoring tools provide the system with the information it needs for this maintenance function. Ad hoc querying is provided over the documentation, and hypertext documents are synthesized in response to these queries.

ACM International Conference on Digital Libraries | 2000

Growth and server availability of the NCSTRL digital library

Allison L. Powell; James C. French

This paper reports on measurements of the NCSTRL digital library taken over a two-year period. We report the growth of the system along two dimensions: number of participating institutions and number of documents indexed by the system. We also report an aspect of reliability for this distributed digital library system.
