Margaret E. Connell | Researchain

Publication

Featured research published by Margaret E. Connell.

ACM Transactions on Information Systems | 2001

Query-based sampling of text databases

James P. Callan; Margaret E. Connell

The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGLOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are created, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis

Leah S. Larkey; Lisa Ballesteros; Margaret E. Connell

Arabic, a highly inflected language, requires good stemming for effective information retrieval, yet no standard approach to stemming has emerged. We developed several light stemmers based on heuristics and a statistical stemmer based on co-occurrence for Arabic retrieval. We compared the retrieval effectiveness of our stemmers and of a morphological analyzer on the TREC-2001 data. The best light stemmer was more effective for cross-language retrieval than a morphological stemmer which tried to find the root for each word. A repartitioning process consisting of vowel removal followed by clustering using co-occurrence analysis produced stem classes which were better than no stemming or very light stemming, but still inferior to good light stemming or morphological analysis.

International Conference on Management of Data | 1999

Automatic discovery of language models for text databases

James P. Callan; Margaret E. Connell; Aiqun Du

The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
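
The sampling loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `search(term, n)` callable, the parameter names, the whitespace tokenizer, and the rule of drawing the next query term uniformly from the vocabulary learned so far are all assumptions made for the example.

```python
import random
from collections import Counter


def query_based_sample(search, seed_term, max_docs=300, docs_per_query=4, rng=None):
    """Learn a term-frequency model of a database through its search interface only.

    `search(term, n)` is a hypothetical callable that returns up to `n`
    (doc_id, text) pairs matching a one-term query.
    """
    rng = rng or random.Random(0)
    model = Counter()   # term frequencies learned from sampled documents
    seen = set()        # ids of documents already sampled
    tried = set()       # query terms already issued
    query = seed_term
    while len(seen) < max_docs:
        tried.add(query)
        for doc_id, text in search(query, docs_per_query):
            if doc_id not in seen:
                seen.add(doc_id)
                model.update(text.lower().split())
        # Assumption: the next query is a random, not-yet-tried learned term.
        candidates = sorted(set(model) - tried)
        if not candidates:
            break
        query = rng.choice(candidates)
    return model, seen


# Toy stand-in for a remote database (hypothetical data).
DOCS = {1: "apple banana", 2: "banana cherry", 3: "cherry date"}


def toy_search(term, n):
    hits = [(i, t) for i, t in sorted(DOCS.items()) if term in t.split()]
    return hits[:n]


model, seen = query_based_sample(toy_search, "apple", max_docs=3)
print(sorted(seen), dict(model))
```

The loop stops either when enough documents have been sampled or when every learned term has already been used as a query, so it terminates even on a small or poorly connected database.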

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000

The impact of database selection on distributed searching

Allison L. Powell; James C. French; James P. Callan; Margaret E. Connell; Charles L. Viles

The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts — database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second we find that good performance can be achieved when only a few sites are selected and that the performance generally increases as more sites are selected. Finally we find that when database selection is employed, it is not necessary to maintain collection wide information (CWI), e.g. global idf. Local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e. single database) systems. Given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.

Archive | 2007

Light Stemming for Arabic Information Retrieval

Leah S. Larkey; Lisa Ballesteros; Margaret E. Connell

Computational Morphology is an urgent problem for Arabic Natural Language Processing, because Arabic is a highly inflected language. We have found, however, that a full solution to this problem is not required for effective information retrieval. Light stemming allows remarkably good information retrieval without providing correct morphological analyses. We developed several light stemmers for Arabic, and assessed their effectiveness for information retrieval using standard TREC data. We have also compared light stemming with several stemmers based on morphological analysis. The light stemmer, light10, outperformed the other approaches. It has been included in the Lemur toolkit, and is becoming widely used in Arabic information retrieval.
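
As a rough illustration of what light stemming means in practice, the sketch below strips one common prefix and any matching suffixes without attempting a morphological analysis. The affix lists, the minimum remaining length, and the function name `light_stem` are assumptions chosen for the example; they should not be read as the exact rules of light10 or of the Lemur toolkit.

```python
# Illustrative affix lists (assumed for this sketch, not taken from the abstract).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=2):
    """Strip at most one prefix, then any listed suffixes, keeping >= min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break  # only one prefix is removed
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[: -len(suffix)]
    return word


print(light_stem("والكتاب"))   # "and the book"
print(light_stem("المعلمون"))  # "the teachers"
```

The point of the design is that no dictionary or root lookup is involved: words that share a stem after affix removal are simply conflated at indexing and query time.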

Conference on Information and Knowledge Management | 2000

Collection selection and results merging with topically organized U.S. patents and TREC data

Leah S. Larkey; Margaret E. Connell; James P. Callan

We investigate three issues in distributed information retrieval, considering both TREC data and U.S. Patents: (1) topical organization of large text collections, (2) collection ranking and selection with topically organized collections, and (3) results merging, particularly document score normalization, with topically organized collections. We find that it is better to organize collections topically, and that topical collections can be well ranked using either INQUERY’s CORI algorithm, or the Kullback-Leibler divergence (KL), but KL is far worse than CORI for non-topically organized collections. For results merging, collections organized by topic require global idfs for the best performance. Contrary to results found elsewhere, normalized scores are not as good as global idfs for merging when the collections are topically organized.
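
For readers unfamiliar with KL-based collection ranking, a minimal sketch follows. It scores each collection by the negative Kullback-Leibler divergence between the query's term distribution and a smoothed collection language model. The linear smoothing against a global model, the weight `lam`, and the function names are assumptions made for this example; the abstract does not specify how the paper estimated these models.

```python
import math
from collections import Counter


def kl_score(query_terms, coll_counts, global_counts, lam=0.5):
    """Return -KL(query || collection) over the query terms; higher is better."""
    q = Counter(query_terms)
    q_len = sum(q.values())
    c_len = sum(coll_counts.values())
    g_len = sum(global_counts.values())
    score = 0.0
    for term, n in q.items():
        p_q = n / q_len
        # Assumed smoothing: mix the collection model with a global model.
        p_c = (lam * coll_counts.get(term, 0) / c_len
               + (1 - lam) * global_counts.get(term, 0) / g_len)
        if p_c == 0:
            continue  # term unseen in every collection; it cannot discriminate
        score -= p_q * math.log(p_q / p_c)
    return score


def rank_collections(query_terms, collections):
    """Order collection names from best to worst match for the query."""
    global_counts = sum(collections.values(), Counter())
    return sorted(
        collections,
        key=lambda name: kl_score(query_terms, collections[name], global_counts),
        reverse=True,
    )


# Hypothetical topically organized collections.
collections = {
    "physics": Counter({"quantum": 5, "laser": 5}),
    "cooking": Counter({"recipe": 8, "oven": 2}),
}
print(rank_collections(["laser", "quantum"], collections))
```

With topically organized collections, as in this toy data, query terms concentrate in one collection and the divergence separates the candidates clearly.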

Computational Intelligence | 1987

Learning to control a dynamic physical system

Margaret E. Connell; Paul E. Utgoff

This paper presents an approach to learning to control a dynamic physical system. The approach has been implemented in a program named CART, and applied to a simple physical system studied previously by several researchers. Experiments illustrate that a control method is learned in about 16 trials, an improvement over previous learning programs.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Language-specific models in multilingual topic tracking

Leah S. Larkey; Fangfang Feng; Margaret E. Connell; Victor Lavrenko

Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training stories have been provided in English. In tracking, non-English test stories are then machine translated into English to compare them with the topic models. We propose a native language hypothesis stating that comparisons would be more effective in the original language of the story. We first test and support the hypothesis for story link detection. For topic tracking the hypothesis implies that it should be preferable to build separate language-specific topic models for each language in the stream. We compare different methods of incrementally building such native language topic models.

ACM Transactions on Asian Language Information Processing | 2003

Hindi CLIR in thirty days

Leah S. Larkey; Margaret E. Connell; Nasreen AbdulJaleel

As participants in the TIDES Surprise language exercise, researchers at the University of Massachusetts helped collect Hindi-English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi. The biggest stumbling blocks were collection of parallel English and Hindi text and dealing with numerous proprietary encodings.

Archive | 1991

Conflict Resolution Strategies for Cooperating Expert Agents

Susan E. Lander; Victor R. Lesser; Margaret E. Connell

Problem-solving approaches which incorporate specialized cooperating expert agents seem intuitively appropriate for many complex problems. However, integrating diverse expertise requires that the experts have some mechanism for dealing with conflicts that occur during problem-solving. We describe the Cooperating Experts Framework (CEF), a framework developed to support cooperative problem-solving among sets of knowledge-based systems with limited information about each other’s local states. The systems solve subproblems relevant to their specific expertise and integrate their efforts using conflict resolution strategies that are appropriate to the problem solving context. In choosing a strategy CEF makes tradeoffs between the potential quality of a solution, the amount of processing required to apply a strategy, and the effect of local changes on the global solution. We also describe TEAM, a system implemented in the CEF framework, that designs steam condensers.
