Charles L. A. Clarke | Researchain

Publication

Featured research published by Charles L. A. Clarke.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Novelty and diversity in information retrieval evaluation

Charles L. A. Clarke; Maheedhar Kolla; Gordon V. Cormack; Olga Vechtomova; Azin Ashkan; Stefan Büttcher; Ian MacKinnon

Evaluation measures act as objective functions to be optimized by information retrieval systems. Such objective functions must accurately reflect user requirements, particularly when tuning IR systems and learning ranking functions. Ambiguity in queries and redundancy in retrieved documents are poorly reflected by current evaluation measures. In this paper, we present a framework for evaluation that systematically rewards novelty and diversity. We develop this framework into a specific evaluation measure, based on cumulative gain. We demonstrate the feasibility of our approach using a test collection based on the TREC question answering track.
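
The idea of a cumulative-gain measure that rewards novelty can be sketched as follows. This is a minimal illustration, not the paper's exact definition: the function name, the `nuggets` mapping (document to the set of information nuggets it covers), the `alpha` redundancy penalty, and the log2 rank discount are all assumptions made for the example, and normalization against an ideal ranking is omitted.

```python
import math

def alpha_dcg(ranking, nuggets, alpha=0.5, depth=None):
    """Discounted cumulative gain in which repeated nuggets earn less.

    ranking: list of document ids, best first.
    nuggets: dict mapping document id -> set of nugget ids it covers
             (hypothetical judgment format for this sketch).
    alpha:   penalty applied each time a nugget has already been seen.
    """
    seen = {}      # nugget id -> number of earlier documents covering it
    total = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        covered = nuggets.get(doc, set())
        # Each nugget contributes (1 - alpha)^(times already seen).
        gain = sum((1 - alpha) ** seen.get(n, 0) for n in covered)
        for n in covered:
            seen[n] = seen.get(n, 0) + 1
        total += gain / math.log2(1 + rank)
    return total

judgments = {"d1": {"n1"}, "d2": {"n1"}, "d3": {"n2"}}
redundant = alpha_dcg(["d1", "d2", "d3"], judgments)
diverse = alpha_dcg(["d1", "d3", "d2"], judgments)
```

In this toy case the ranking that surfaces the second nugget earlier (`diverse`) scores higher than the one that repeats the first nugget (`redundant`); with `alpha=0` the two rankings score the same, since redundancy is no longer penalized.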

Information Retrieval | 2011

Efficient and effective spam filtering and re-ranking for large web datasets

Gordon V. Cormack; Mark D. Smucker; Charles L. A. Clarke

The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam—pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset. We show that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR-Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of “honeypot” queries the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering—from among the worst to among the best.

North American Chapter of the Association for Computational Linguistics | 2003

Frequency estimates for statistical word similarity measures

Egidio L. Terra; Charles L. A. Clarke

Statistical measures of word similarity have application in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative study of two methods for estimating word co-occurrence frequencies required by word similarity measures. Our frequency estimates are generated from a terabyte-sized corpus of Web data, and we study the impact of corpus size on the effectiveness of the measures. We base the evaluation on one TOEFL question set and two practice question sets, each consisting of a number of multiple-choice questions seeking the best synonym for a given target word. For two question sets, a context for the target word is provided, and we examine a number of word similarity measures that exploit this context. Our best combination of similarity measure and frequency estimation method answers 6-8% more questions than the best results previously reported for the same question sets.
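
To make the synonym-selection setup concrete, here is a small sketch that answers a multiple-choice question from co-occurrence counts. Pointwise mutual information (PMI) is used only as one common similarity measure of this kind; the abstract does not say which measures were studied, and the function names, the count dictionaries, and the toy numbers are all hypothetical.

```python
import math

def pmi(cooc, count_a, count_b, total):
    """Pointwise mutual information from raw frequency estimates."""
    if cooc == 0:
        return float("-inf")  # never observed together
    return math.log2(cooc * total / (count_a * count_b))

def best_synonym(target, candidates, counts, cooc, total):
    """Pick the candidate with the highest PMI against the target word."""
    def score(candidate):
        joint = cooc.get(frozenset((target, candidate)), 0)
        return pmi(joint, counts[target], counts[candidate], total)
    return max(candidates, key=score)

# Toy frequency estimates (invented for illustration).
counts = {"big": 1000, "large": 800, "red": 900, "fast": 700}
cooc = {
    frozenset(("big", "large")): 120,
    frozenset(("big", "red")): 30,
    frozenset(("big", "fast")): 10,
}
answer = best_synonym("big", ["large", "red", "fast"], counts, cooc, total=1_000_000)
```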

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1998

Efficient construction of large test collections

Gordon V. Cormack; Christopher Palmer; Charles L. A. Clarke

Test collections with a million or more documents are needed for the evaluation of modern information retrieval systems. Yet their construction requires a great deal of effort. Judgements must be rendered as to whether or not documents are relevant to each of a set of queries. Exhaustive judging, in which every document is examined and a judgement rendered, is infeasible for collections of this size. Current practice is represented by the “pooling method”, as used in the TREC conference series, in which only the first k documents from each of a number of sources are judged. We propose two methods, Interactive Searching and Judging and Move-to-Front Pooling, that yield effective test collections while requiring many fewer judgements. Interactive Searching and Judging selects documents to be judged using an interactive search system, and may be used by a small research team to develop an effective test collection using minimal resources. Move-to-Front Pooling directly improves on the standard pooling method by using a variable number of documents from each source depending on its retrieval performance. Move-to-Front Pooling would be an appropriate replacement for the standard pooling method in future collection development efforts involving many independent groups.

The Computer Journal | 1995

An algebra for structured text search and a framework for its implementation

Charles L. A. Clarke; Gordon V. Cormack; Forbes J. Burkowski

A query algebra is presented that expresses searches on structured text. In addition to traditional full-text boolean queries that search a pre-defined collection of documents, the algebra permits queries that harness document structure. The algebra manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup. The algebra has seven operators, which combine intervals to yield new ones: containing, not containing, contained in, not contained in, one of, both of, followed by. The ultimate result of a query is the set of intervals that satisfy it. An implementation framework is given based on four primitive access functions. Each access function finds the solution to a query nearest to a given position in the database. Recursive definitions for the seven operators are given in terms of these access functions. Search time is at worst proportional to the time required to evaluate the access functions for occurrences of the elementary terms in a query.
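
As a rough illustration of what operating on intervals means, the sketch below implements four of the containment operators by brute force over lists of `(start, end)` positions. It is an assumption-laden simplification: the paper's framework evaluates operators through its access functions rather than by scanning lists, and the function names and tuple representation here are invented for the example.

```python
def covers(outer, inner):
    """True if interval `outer` fully encloses interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def containing(a, b):
    """Intervals of `a` that contain at least one interval of `b`."""
    return [x for x in a if any(covers(x, y) for y in b)]

def not_containing(a, b):
    """Intervals of `a` that contain no interval of `b`."""
    return [x for x in a if not any(covers(x, y) for y in b)]

def contained_in(a, b):
    """Intervals of `a` that lie inside at least one interval of `b`."""
    return [x for x in a if any(covers(y, x) for y in b)]

def not_contained_in(a, b):
    """Intervals of `a` that lie inside no interval of `b`."""
    return [x for x in a if not any(covers(y, x) for y in b)]

# Two marked-up sections and two occurrences of a search term.
sections = [(0, 9), (10, 19)]
term = [(3, 3), (25, 25)]

hits = containing(sections, term)        # sections mentioning the term
inside = contained_in(term, sections)    # term occurrences within a section
```

Because every operator returns a list of intervals, results can be fed back in as operands, which is what lets such queries nest.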

Information Processing and Management | 2000

Relevance ranking for one to three term queries

Charles L. A. Clarke; Gordon V. Cormack; Elizabeth A. Tudhope

We investigate the application of a novel relevance ranking technique, cover density ranking, to the requirements of Web-based information retrieval, where a typical query consists of a few search terms and a typical result consists of a page indicating several potentially relevant documents. Traditional ranking methods for information retrieval, based on term and inverse document frequencies, have been found to work poorly in this context. Under the cover density measure, ranking is based on term proximity and co-occurrence. Experimental comparisons show performance that compares favorably with previous work.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Reciprocal rank fusion outperforms Condorcet and individual rank learning methods

Gordon V. Cormack; Charles L. A. Clarke; Stefan Buettcher

Reciprocal Rank Fusion (RRF), a simple method for combining the document rankings from multiple IR systems, consistently yields better results than any individual system, and better results than the standard method Condorcet Fuse. This result is demonstrated by using RRF to combine the results of several TREC experiments, and to build a meta-learner that ranks the LETOR 3 dataset better than any previously reported method.
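
A minimal sketch of RRF follows. Each document's fused score is the sum, over the input rankings, of 1 / (k + rank). The abstract does not state the constant; `k=60` is the value commonly associated with the method and is treated here as an adjustable assumption. The function name and the alphabetical tie-break are choices made for this example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    rankings: iterable of lists, each ordered best first.
    k:        smoothing constant that dampens the influence of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

runs = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = reciprocal_rank_fusion(runs)
```

A document missing from a run simply contributes nothing for that run, so the input lists need not cover the same documents.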

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Term proximity scoring for ad-hoc retrieval on very large text collections

Stefan Büttcher; Charles L. A. Clarke; Brad Lushman

We propose an integration of term proximity scoring into Okapi BM25. The relative retrieval effectiveness of our retrieval method, compared to pure BM25, varies from collection to collection. We present an experimental evaluation of our method and show that the gains achieved over BM25 grow as the size of the underlying text collection increases. We also show that for stemmed queries the impact of term proximity scoring is larger than for unstemmed queries.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Time-based calibration of effectiveness measures

Mark D. Smucker; Charles L. A. Clarke

Many current effectiveness measures incorporate simplifying assumptions about user behavior. These assumptions prevent the measures from reflecting aspects of the search process that directly impact the quality of retrieval results as experienced by the user. In particular, these measures implicitly model users as working down a list of retrieval results, spending equal time assessing each document. In reality, even a careful user, intending to identify as much relevant material as possible, must spend longer on some documents than on others. Aspects such as document length, duplicates and summaries all influence the time required. In this paper, we introduce a time-biased gain measure, which explicitly accommodates such aspects of the search process. By conducting an appropriate user study, we calibrate and validate the measure against the TREC 2005 Robust Track test collection. We examine properties of the measure, contrasting it to traditional effectiveness measures, and exploring its extension to other aspects and environments. As its primary benefit, the measure allows us to evaluate system performance in human terms, while maintaining the simplicity and repeatability of system-oriented tests. Overall, we aim to achieve a clearer connection between user-oriented studies and system-oriented tests, allowing us to better transfer insights and outcomes from one to the other.
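
The contrast with rank-based discounting can be sketched as below: gain is discounted by the time the user has already spent, not by the rank position. Exponential decay with a half-life is one plausible form assumed for this example; the `half_life` default of 200 seconds is a placeholder, and the per-document reading times are hypothetical inputs rather than values from the study.

```python
import math

def time_biased_gain(gains, seconds_per_doc, half_life=200.0):
    """Sum of per-document gains, each decayed by time elapsed before it.

    gains:           gain credited for each ranked document, best first.
    seconds_per_doc: estimated time the user spends on each document.
    half_life:       seconds after which a gain is worth half as much
                     (placeholder value for this sketch).
    """
    elapsed = 0.0
    total = 0.0
    for gain, cost in zip(gains, seconds_per_doc):
        # Probability-like weight that the user is still reading at `elapsed`.
        total += gain * math.exp(-elapsed * math.log(2) / half_life)
        elapsed += cost
    return total

long_first = time_biased_gain([1, 1], [200, 50])
short_first = time_biased_gain([1, 1], [50, 200])
```

Two rankings with identical relevance at each rank can therefore score differently: placing the long document first delays the second relevant document and lowers the total.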

Web Search and Data Mining | 2011

A comparative analysis of cascade measures for novelty and diversity

Charles L. A. Clarke; Nick Craswell; Ian Soboroff; Azin Ashkan

Traditional editorial effectiveness measures, such as nDCG, remain standard for Web search evaluation. Unfortunately, these traditional measures can inappropriately reward redundant information and can fail to reflect the broad range of user needs that can underlie a Web query. To address these deficiencies, several researchers have recently proposed effectiveness measures for novelty and diversity. Many of these measures are based on simple cascade models of user behavior, which operate by considering the relationship between successive elements of a result list. The properties of these measures are still poorly understood, and it is not clear from prior research that they work as intended. In this paper we examine the properties and performance of cascade measures with the goal of validating them as tools for measuring effectiveness. We explore their commonalities and differences, placing them in a unified framework; we discuss their theoretical difficulties and limitations, and compare the measures experimentally, contrasting them against traditional measures and against other approaches to measuring novelty. Data collected by the TREC 2009 Web Track is used as the basis for our experimental comparison. Our results indicate that these measures reward systems that achieve a balance between novelty and overall precision in their result lists, as intended. Nonetheless, other measures provide insights not captured by the cascade measures, and we suggest that future evaluation efforts continue to report a variety of measures.
