
Publication


Featured research published by Thomas R. Lynam.


ACM Transactions on Information Systems | 2007

Online supervised spam filter evaluation

Gordon V. Cormack; Thomas R. Lynam

Eleven variants of six widely used open-source spam filters are tested on a chronological sequence of 49,086 e-mail messages received by an individual from August 2003 through March 2004. Our approach differs from those previously reported in that the test set is large, comprises uncensored raw messages, and is presented to each filter sequentially with incremental feedback. Misclassification rates and Receiver Operating Characteristic Curve measurements are reported, with statistical confidence intervals. Quantitative results indicate that content-based filters can eliminate 98% of spam while incurring 0.1% legitimate email loss. Qualitative results indicate that the risk of loss depends on the nature of the message, and that messages likely to be lost may be those that are less critical. More generally, our methodology has been encapsulated in a free software toolkit, which may be used to conduct similar experiments.
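The sequential protocol described above can be sketched as a loop that classifies each message before revealing its gold label to the filter. The `KeywordFilter` below is a hypothetical toy stand-in for illustration, not one of the filters evaluated in the paper:

```python
def online_evaluate(messages, spam_filter):
    """On-line evaluation: classify each message in order, then
    reveal its gold label to the filter as incremental feedback."""
    false_pos = false_neg = 0
    for text, is_spam in messages:
        if spam_filter.classify(text):
            if not is_spam:
                false_pos += 1   # legitimate mail misfiled
        elif is_spam:
            false_neg += 1       # spam delivered
        spam_filter.train(text, is_spam)
    return false_pos, false_neg

class KeywordFilter:
    """Toy stand-in: flags any word previously seen in spam."""
    def __init__(self):
        self.spam_words = set()
    def classify(self, text):
        return any(w in self.spam_words for w in text.split())
    def train(self, text, is_spam):
        if is_spam:
            self.spam_words.update(text.split())

stream = [("buy pills now", True),
          ("meeting at noon", False),
          ("cheap pills offer", True)]
fp, fn = online_evaluate(stream, KeywordFilter())
```

The key property is that the filter never sees a label before committing to a classification, which mirrors real deployment.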


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Statistical precision of information retrieval evaluation

Gordon V. Cormack; Thomas R. Lynam

We introduce and validate bootstrap techniques to compute confidence intervals that quantify the effect of test-collection variability on average precision (AP) and mean average precision (MAP) IR effectiveness measures. We consider the test collection in IR evaluation to be a representative of a population of materially similar collections, whose documents are drawn from an infinite pool with similar characteristics. Our model accurately predicts the degree of concordance between system results on randomly selected halves of the TREC-6 ad hoc corpus. We advance a framework for statistical evaluation that uses the same general model to capture other sources of chance variation, providing input for meta-analysis techniques.
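A percentile bootstrap over topics is one standard way to realize such an interval; the sketch below, with hypothetical per-topic AP values, resamples topics with replacement and reads the interval off the empirical quantiles of the resampled means (a minimal illustration, not the paper's exact procedure):

```python
import random

def bootstrap_map_ci(ap_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for MAP: resample
    topics with replacement and take empirical quantiles of the
    resampled means."""
    rng = random.Random(seed)
    n = len(ap_scores)
    means = sorted(
        sum(rng.choice(ap_scores) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    return (means[int(n_boot * alpha / 2)],
            means[int(n_boot * (1 - alpha / 2)) - 1])

# Hypothetical per-topic average-precision scores for one system:
ap = [0.31, 0.12, 0.55, 0.48, 0.07, 0.62, 0.29, 0.41, 0.18, 0.36]
low, high = bootstrap_map_ci(ap)
```

Resampling topics (rather than documents) treats the topic set as the random sample drawn from a larger population, matching the collection-variability view taken above.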


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

On-line spam filter fusion

Thomas R. Lynam; Gordon V. Cormack

We show that a set of independently developed spam filters may be combined in simple ways to provide substantially better filtering than any of the individual filters. The results of fifty-three spam filters evaluated at the TREC 2005 Spam Track were combined post-hoc so as to simulate the parallel on-line operation of the filters. The combined results were evaluated using the TREC methodology, yielding more than a factor of two improvement over the best filter. The simplest method -- averaging the binary classifications returned by the individual filters -- yields a remarkably good result. A new method -- averaging log-odds estimates based on the scores returned by the individual filters -- yields a somewhat better result, and provides input to SVM- and logistic-regression-based stacking methods. The stacking methods appear to provide further improvement, but only for very large corpora. Of the stacking methods, logistic regression yields the better result. Finally, we show that it is possible to select a priori small subsets of the filters that, when combined, still outperform the best individual filter by a substantial margin.
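The log-odds averaging step can be sketched directly: each filter's spamminess score is clipped away from 0 and 1, mapped to log-odds, and the estimates are averaged. This is a minimal illustration with hypothetical scores, not the paper's tuned implementation:

```python
import math

def log_odds(p, eps=1e-6):
    """Log-odds of a clipped spamminess score (clipping avoids
    infinite values at exactly 0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def fuse_log_odds(scores):
    """Average the filters' log-odds estimates; > 0 reads as spam."""
    return sum(log_odds(p) for p in scores) / len(scores)

# Hypothetical per-filter spamminess scores for two messages:
spammy = fuse_log_odds([0.90, 0.70, 0.95])
hammy = fuse_log_odds([0.10, 0.20, 0.05])
```

Averaging in log-odds space rather than probability space keeps a single confident filter from being drowned out by several lukewarm ones, and the averaged value is itself a usable score for downstream stacking.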


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

The impact of corpus size on question answering performance

Charles L. A. Clarke; Gordon V. Cormack; M. Laszlo; Thomas R. Lynam; Egidio L. Terra

Using our question answering system, questions from the TREC 2001 evaluation were executed over a series of Web data collections, with the sizes of the collections increasing from 25 gigabytes up to nearly a terabyte.


Text Retrieval Conference | 2008

Question Answering By Passage Selection

Charles L. A. Clarke; Gordon V. Cormack; Thomas R. Lynam; Egidio L. Terra

The MultiText QA System performs question answering using a two-step passage selection method. In the first step, an arbitrary passage retrieval algorithm efficiently identifies hotspots in a large target corpus where the answer might be located. In the second step, an answer selection algorithm analyzes these hotspots, considering such factors as answer type and candidate redundancy, to extract short answer snippets. This chapter describes both steps in detail, with the goal of providing sufficient information to allow independent implementation. The method is evaluated using the test collection developed for the TREC 2001 question answering track.


Conference on Information and Knowledge Management | 2004

A multi-system analysis of document and term selection for blind feedback

Thomas R. Lynam; Chris Buckley; Charles L. A. Clarke; Gordon V. Cormack

Experiments were conducted to explore the impact of combining various components of eight leading information retrieval systems. Each system demonstrated improved effectiveness with the use of blind feedback, in which the results of a preliminary retrieval step were used to augment the efficacy of a secondary retrieval step. The hybrid combination of primary and secondary retrieval steps from different systems in a number of cases yielded better effectiveness than either of the constituent systems alone. This positive combining effect was observed when entire documents were passed between the two retrieval steps, but not when only the expansion terms were passed. Several combinations of primary and secondary retrieval steps were fused using the CombMNZ algorithm; all yielded significant effectiveness improvement over the individual systems, with the best yielding an improvement of 13% (p = 10^-6) over the best individual system and an improvement of 4% (p = 10^-5) over a simple fusion of the eight systems.
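CombMNZ itself is simple to state: normalize each run's scores, sum the normalized scores per document, and multiply by the number of runs that retrieved the document. The sketch below uses min-max normalization and hypothetical runs; it illustrates the fusion rule, not the paper's full experimental pipeline:

```python
def comb_mnz(runs):
    """CombMNZ fusion: min-max normalize each run's scores, then
    score each document by (sum of normalized scores) x (number
    of runs that retrieved it)."""
    pooled = {}
    for scores in runs:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        for doc, s in scores.items():
            total, hits = pooled.get(doc, (0.0, 0))
            pooled[doc] = (total + (s - lo) / span, hits + 1)
    return sorted(((total * hits, doc)
                   for doc, (total, hits) in pooled.items()),
                  reverse=True)

# Hypothetical retrieval runs (document -> raw score):
run_a = {"d1": 10.0, "d2": 6.0, "d3": 2.0}
run_b = {"d1": 0.9, "d4": 0.5}
run_c = {"d2": 3.0, "d1": 1.0}
fused = comb_mnz([run_a, run_b, run_c])
```

The multiplication by the hit count is what rewards documents retrieved by many systems, which is the same intuition behind the fusion gains reported above.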


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Validity and power of t-test for comparing MAP and GMAP

Gordon V. Cormack; Thomas R. Lynam

We examine the validity and power of the t-test, Wilcoxon test, and sign test in determining whether or not the difference in performance between two IR systems is significant. Empirical tests conducted on subsets of the TREC 2004 Robust Retrieval collection indicate that the p-values computed by these tests for the difference in mean average precision (MAP) between two systems are very accurate for a wide range of sample sizes and significance estimates. Similarly, these tests have good power, with the t-test proving superior overall. The t-test is also valid for comparing geometric mean average precision (GMAP), exhibiting slightly superior accuracy and slightly inferior power than for MAP comparison.
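The test in question is the ordinary paired t-test on per-topic AP differences; a minimal stdlib-only sketch with hypothetical scores (the statistic is then compared against Student's t with n - 1 degrees of freedom):

```python
import math
import statistics

def paired_t(ap_a, ap_b):
    """t statistic for the paired per-topic AP difference; compare
    |t| against Student's t with len(ap_a) - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    return statistics.mean(diffs) / (
        statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical per-topic AP for two systems on the same five topics:
sys_a = [0.50, 0.60, 0.40, 0.70, 0.55]
sys_b = [0.40, 0.50, 0.35, 0.60, 0.50]
t = paired_t(sys_a, sys_b)
```

Since GMAP is the arithmetic mean of log-transformed AP values, a GMAP comparison is the same paired test applied to log AP rather than raw AP.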


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Power and bias of subset pooling strategies

Gordon V. Cormack; Thomas R. Lynam

We define a method to estimate the random and systematic errors resulting from incomplete relevance assessments. Mean Average Precision (MAP) computed over a large number of topics with a shallow assessment pool substantially outperforms -- for the same adjudication effort -- MAP computed over fewer topics with deeper pools, and P@k computed with pools of the same depth. Move-to-front pooling, previously reported to yield substantially better rank correlation, yields similar power, and lower bias, compared to fixed-depth pooling.


International Conference on Human Language Technology Research | 2001

Information extraction with term frequencies

Thomas R. Lynam; Charles L. A. Clarke; Gordon V. Cormack

Every day, millions of people use the internet to answer questions. Unfortunately, at present, there is no simple and successful means to consistently accomplish this goal. One common approach is to enter a few terms from a question into a Web search system and scan the resulting pages for the answer, a laborious process. To address this need, a question answering (QA) system was created to find and extract answers from a corpus. This system contains three parts: a parser for generating question queries and categories, a passage retrieval element, and an information extraction (IE) component. The extraction method was designed to elicit answers from passages collected by the information retrieval engine. The subject of this paper is the information extraction component. It is based on the premise that information related to the answer will be found many times in a large corpus like the Web.
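The redundancy premise lends itself to a simple voting scheme: count, for each candidate term, how many retrieved passages it occurs in, and rank candidates by that count. The sketch below is an illustrative toy with hypothetical passages and stopword list, far simpler than the paper's extraction component:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "on"}

def redundant_terms(passages):
    """Vote for candidate answer terms by the number of retrieved
    passages each term occurs in (redundancy counting)."""
    votes = Counter()
    for passage in passages:
        seen = set()
        for raw in passage.lower().split():
            term = raw.strip(".,;:?!")
            if term and term not in STOPWORDS and term not in seen:
                seen.add(term)        # one vote per passage
                votes[term] += 1
    return votes.most_common()

# Hypothetical passages retrieved for "What is the tallest mountain?":
passages = ["Mount Everest is the tallest mountain.",
            "The tallest peak on Earth is Everest.",
            "Everest stands in Nepal."]
ranked = redundant_terms(passages)
```

Counting each passage at most once per term keeps a single verbose passage from dominating the vote; the answer is expected to surface because it recurs across independently retrieved passages.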


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Exploiting redundancy in question answering

Charles L. A. Clarke; Gordon V. Cormack; Thomas R. Lynam

Collaboration


Dive into Thomas R. Lynam's collaboration.

Top Co-Authors

M. Laszlo

University of Waterloo


Blaž Zupan

Baylor College of Medicine
