Publication


Featured research published by Ian Soboroff.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Ranking retrieval systems without relevance judgments

Ian Soboroff; Charles K. Nicholas; Patrick Cahan

The most prevalent experimental methodology for comparing the effectiveness of information retrieval systems requires a test collection, composed of a set of documents, a set of query topics, and a set of relevance judgments indicating which documents are relevant to which topics. It is well known that relevance judgments are not infallible, but recent retrospective investigation into results from the Text REtrieval Conference (TREC) has shown that differences in human judgments of relevance do not affect the relative measured performance of retrieval systems. Based on this result, we propose and describe the initial results of a new evaluation methodology which replaces human relevance judgments with a randomly selected mapping of documents to topics, which we refer to as pseudo-relevance judgments. Rankings of systems with our methodology correlate positively with official TREC rankings, although the performance of the top systems is not predicted well. The correlations are stable over a variety of pool depths and sampling techniques. With improvements, such a methodology could be useful in evaluating systems such as World-Wide Web search engines, where the set of documents changes too often to make traditional collection construction techniques practical.
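The core of the methodology can be sketched as follows, assuming binary judgments, per-topic document pools, and mean average precision as the system score; the sampling rate, the scoring measure, and the Kendall's tau comparison are illustrative choices rather than the exact protocol used in the paper.

```python
import random
from scipy.stats import kendalltau

def pseudo_qrels(pool, sample_rate=0.1, seed=0):
    """Randomly mark a fraction of each topic's pooled documents as relevant."""
    rng = random.Random(seed)
    return {topic: set(rng.sample(sorted(docs), max(1, int(sample_rate * len(docs)))))
            for topic, docs in pool.items()}

def average_precision(ranking, relevant):
    """Uninterpolated average precision for a single topic."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def rank_systems(runs, qrels):
    """Order systems by mean average precision.  runs: {system: {topic: [docid, ...]}}."""
    map_score = {sys: sum(average_precision(r, qrels.get(t, set()))
                          for t, r in topics.items()) / len(topics)
                 for sys, topics in runs.items()}
    return sorted(map_score, key=map_score.get, reverse=True)

def rank_correlation(official_order, pseudo_order):
    """Kendall's tau between the official ranking and the pseudo-judgment ranking."""
    position = {sys: i for i, sys in enumerate(pseudo_order)}
    tau, _ = kendalltau(range(len(official_order)),
                        [position[sys] for sys in official_order])
    return tau
```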


Web Search and Data Mining | 2011

A comparative analysis of cascade measures for novelty and diversity

Charles L. A. Clarke; Nick Craswell; Ian Soboroff; Azin Ashkan

Traditional editorial effectiveness measures, such as nDCG, remain standard for Web search evaluation. Unfortunately, these traditional measures can inappropriately reward redundant information and can fail to reflect the broad range of user needs that can underlie a Web query. To address these deficiencies, several researchers have recently proposed effectiveness measures for novelty and diversity. Many of these measures are based on simple cascade models of user behavior, which operate by considering the relationship between successive elements of a result list. The properties of these measures are still poorly understood, and it is not clear from prior research that they work as intended. In this paper we examine the properties and performance of cascade measures with the goal of validating them as tools for measuring effectiveness. We explore their commonalities and differences, placing them in a unified framework; we discuss their theoretical difficulties and limitations, and compare the measures experimentally, contrasting them against traditional measures and against other approaches to measuring novelty. Data collected by the TREC 2009 Web Track is used as the basis for our experimental comparison. Our results indicate that these measures reward systems that achieve a balance between novelty and overall precision in their result lists, as intended. Nonetheless, other measures provide insights not captured by the cascade measures, and we suggest that future evaluation efforts continue to report a variety of measures.
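As an illustration of the cascade family, here is a sketch of expected reciprocal rank (ERR) and a simple intent-aware variant; the 2^g grade-to-stopping-probability mapping and the intent weighting are the usual textbook choices, not necessarily the exact formulations compared in the paper.

```python
def err(grades, max_grade=1):
    """Expected Reciprocal Rank: a simple cascade measure.  The user scans
    the list top-down and stops with a probability derived from each
    document's relevance grade."""
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / (2 ** max_grade)   # probability the user is satisfied here
        score += p_continue * p_stop / rank
        p_continue *= (1.0 - p_stop)
    return score

def err_ia(grades_by_intent, intent_probs):
    """Intent-aware ERR: weight per-intent cascade scores by intent probability,
    so lists that cover several intents early score higher while redundant
    results for an already-satisfied intent contribute little."""
    return sum(p * err(grades_by_intent[i]) for i, p in intent_probs.items())

# Toy example: two intents; the second result is redundant for intent 'a'.
grades = {"a": [1, 1, 0], "b": [0, 0, 1]}
print(err_ia(grades, {"a": 0.6, "b": 0.4}))
```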


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Reliable information retrieval evaluation with incomplete and biased judgements

Stefan Büttcher; Charles L. A. Clarke; Peter C. K. Yeung; Ian Soboroff

Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique. We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially if the set of manual judgements is rich in documents, but highly biased against some systems.
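For reference, here is a sketch of bpref, one of the existing approaches to incomplete judgments that the paper compares against; it follows the standard Buckley and Voorhees formulation, with binary judgments assumed and illustrative variable names.

```python
def bpref(ranking, relevant, nonrelevant):
    """bpref: an evaluation measure that only counts judged documents, so
    unjudged documents are ignored rather than treated as nonrelevant.
    ranking: system output, best first; relevant/nonrelevant: judged doc sets."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N) or 1
    nonrel_seen, total = 0, 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            total += 1.0 - min(nonrel_seen, denom) / denom
    return total / R
```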


Empirical Methods in Natural Language Processing | 2005

Novelty Detection: The TREC Experience

Ian Soboroff; Donna Harman

A challenge for search systems is to detect not only when an item is relevant to the user's information need, but also when it contains something new which the user has not seen before. In the TREC novelty track, the task was to highlight sentences containing relevant and new information in a short, topical document stream. This is analogous to highlighting key parts of a document for another person to read, and this kind of output can be useful as input to a summarization system. Search topics involved both news events and reported opinions on hot-button subjects. When people performed this task, they tended to select small blocks of consecutive sentences, whereas current systems identified many relevant and novel passages. We also found that opinions are much harder to track than events.
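A minimal sketch of the sentence-level task as described: select sentences that are relevant to the topic and not too similar to anything already selected. The bag-of-words cosine and the two thresholds are illustrative stand-ins for the actual track systems.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def novel_sentences(sentences, topic_terms, relevance_threshold=0.1, novelty_threshold=0.8):
    """Flag sentences that (a) overlap enough with the topic and
    (b) are sufficiently dissimilar from everything already selected."""
    topic_vec = Counter(topic_terms)
    selected = []
    for sent in sentences:
        vec = Counter(sent.lower().split())
        if cosine(vec, topic_vec) < relevance_threshold:
            continue                      # not relevant to the topic
        if all(cosine(vec, Counter(s.lower().split())) < novelty_threshold
               for s in selected):
            selected.append(sent)         # nothing seen before is this similar
    return selected
```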


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

The effect of assessor error on IR system evaluation

Ben Carterette; Ian Soboroff

Recent efforts in test collection building have focused on scaling back the number of necessary relevance judgments and then scaling up the number of search topics. Since the largest source of variation in a Cranfield-style experiment comes from the topics, this is a reasonable approach. However, as topic set sizes grow, and researchers look to crowdsourcing and Amazon's Mechanical Turk to collect relevance judgments, we are faced with issues of quality control. This paper examines the robustness of the TREC Million Query track methods when some assessors make significant and systematic errors. We find that while averages are robust, assessor errors can have a large effect on system rankings.
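A sketch of the kind of simulation this analysis suggests, assuming binary qrels stored per topic: judgments are flipped with some error probability, systems are re-scored against the corrupted qrels, and the resulting ranking is compared with the error-free one (for example with the Kendall's tau helper sketched earlier). The uniform random flip here is a simplification of the systematic error models examined in the paper.

```python
import random

def corrupt_qrels(qrels, error_rate, seed=0):
    """Simulate assessor error by flipping each binary judgment with
    probability error_rate.  qrels: {topic: {docid: 0 or 1}}."""
    rng = random.Random(seed)
    return {topic: {doc: (1 - rel) if rng.random() < error_rate else rel
                    for doc, rel in judgments.items()}
            for topic, judgments in qrels.items()}
```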


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

On building a reusable Twitter corpus

Richard McCreadie; Ian Soboroff; Jimmy J. Lin; Craig Macdonald; Iadh Ounis; Dean McCullough

The Twitter real-time information network is the subject of research for information retrieval tasks such as real-time search. However, so far, reproducible experimentation on Twitter data has been impeded by restrictions imposed by the Twitter terms of service. In this paper, we detail a new methodology for legally building and distributing Twitter corpora, developed through collaboration between the Text REtrieval Conference (TREC) and Twitter. In particular, we detail how the first publicly available Twitter corpus - referred to as Tweets2011 - was distributed via lists of tweet identifiers and specialist tweet crawling software. Furthermore, we analyse whether this distribution approach remains robust over time, as tweets in the corpus are removed either by users or Twitter itself. Tweets2011 was successfully used by 58 participating groups for the TREC 2011 Microblog track, while our results attest to the robustness of the crawling methodology over time.
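A trivial sketch of the robustness analysis described above, assuming a list of distributed tweet identifiers and a per-identifier status reported by the crawling software; the status labels are hypothetical, not the actual Tweets2011 tool output.

```python
def corpus_availability(tweet_ids, crawl_status):
    """Fraction of the distributed identifier list that is still retrievable.
    crawl_status: {tweet_id: status string from the crawler, e.g. 'ok',
    'deleted', 'protected'} (labels are hypothetical)."""
    available = sum(1 for tid in tweet_ids if crawl_status.get(tid) == "ok")
    return available / len(tweet_ids) if tweet_ids else 0.0
```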


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Bias and the limits of pooling

Chris Buckley; Darrin L. Dimmick; Ian Soboroff; Ellen M. Voorhees

Modern retrieval test collections are built through a process called pooling in which only a sample of the entire document set is judged for each topic. The idea behind pooling is to find enough relevant documents such that when unjudged documents are assumed to be nonrelevant the resulting judgment set is sufficiently complete and unbiased. As document sets grow larger, a constant-size pool represents an increasingly small percentage of the document set, and at some point the assumption of approximately complete judgments must become invalid. This paper demonstrates that the AQUAINT 2005 test collection exhibits bias caused by pools that were too shallow for the document set size, despite having many diverse runs contribute to the pools. The existing judgment set favors relevant documents that contain topic title words even though relevant documents containing few topic title words are known to exist in the document set. The paper concludes with suggested modifications to traditional pooling and evaluation methodology that may allow very large reusable test collections to be built.
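The specific bias described (judged-relevant documents tending to contain topic title words) can be quantified with a check along these lines; the tokenization and the doc_text lookup table are assumptions made for illustration.

```python
def title_word_rate(relevant_docs, doc_text, topic_title):
    """Fraction of judged-relevant documents containing at least one
    topic-title word; values near 1.0 across topics would be consistent
    with the title-word bias described above."""
    title_terms = set(topic_title.lower().split())
    hits = sum(1 for d in relevant_docs
               if title_terms & set(doc_text[d].lower().split()))
    return hits / len(relevant_docs) if relevant_docs else 0.0
```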


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Blog track research at TREC

Craig Macdonald; Rodrygo L. T. Santos; Iadh Ounis; Ian Soboroff

The TREC Blog track aims to explore information seeking behaviour in the blogosphere by building reusable test collections for blog-related search tasks. Since its advent in TREC 2006, the Blog track has led to much research in this growing field and fostered cross-pollination with natural language processing research. This paper recaps the tasks addressed by the TREC Blog track thus far, covering the period 2006-2009. In particular, we describe the corpora used, the tasks addressed within the track, and the resulting published research.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

The CSIRO enterprise search test collection

Peter Bailey; Nick Craswell; Ian Soboroff; Arjen P. de Vries

This article describes a new TREC Enterprise Track search test collection -- CERC. The collection is designed to represent some real-world search activity within the enterprise, using as a specific example the Commonwealth Scientific and Industrial Research Organisation (CSIRO). It has a deep crawl of CSIRO's public-facing information, which is very similar to the crawl of a real-world search service provided by CSIRO. The search tasks are based on the activities of CSIRO Science Communicators, who are CSIRO employees that deal with public-facing information. Topics and judgments are tied to the Science Communicators in various ways, for example by involving them in the topic development process. The overall approach is to enhance the validity of the test collection as a model of enterprise search by tying it to real-world examples.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000

Collaborative filtering and the generalized vector space model (poster session)

Ian Soboroff; Charles K. Nicholas

Collaborative filtering is a technique for recommending documents to users based on how similar their tastes are to other users. If two users tend to agree on what they like, the system will recommend the same documents to them. The generalized vector space model of information retrieval represents a document by a vector of its similarities to all other documents. The process of collaborative filtering is nearly identical to the process of retrieval using GVSM in a matrix of user ratings. Using this observation, a model for filtering collaboratively using document content is possible.
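The analogy can be made concrete with a few lines of linear algebra. The sketch below assumes a terms-by-documents matrix D, an items-by-users ratings matrix R, and NumPy arrays; the neighbour-weighting step in recommend is an illustrative way to turn the GVSM-style user scores into item recommendations, not necessarily the paper's exact formulation.

```python
import numpy as np

def gvsm_scores(D, q):
    """Generalized vector space model: each document (a column of the
    terms-by-documents matrix D) is represented by its similarities to all
    documents (D.T @ D); the query q is mapped into that same space and
    scored against every document."""
    return (D.T @ D) @ (D.T @ q)

def recommend(R, active_ratings):
    """The same computation with an items-by-users ratings matrix R in place
    of D: the active user's ratings act as the query, the GVSM product scores
    every user by taste similarity, and items are then weighted by those
    neighbour scores."""
    user_weights = (R.T @ R) @ (R.T @ active_ratings)   # GVSM "retrieval" over users
    return R @ user_weights                             # aggregate neighbours' ratings per item

# Toy example: three items rated by four users.
R = np.array([[5., 4., 0., 0.],
              [0., 5., 4., 0.],
              [0., 0., 2., 5.]])
print(recommend(R, np.array([5., 0., 0.])))   # active user liked only item 0
```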

Collaboration


Dive into Ian Soboroff's collaborations.

Top Co-Authors

Ellen M. Voorhees (National Institute of Standards and Technology)
Tim Finin (University of Maryland)
Arjen P. de Vries (Radboud University Nijmegen)
Daniel A. Roberts (Goddard Space Flight Center)