Mark D. Smucker
University of Waterloo
Publications
Featured research published by Mark D. Smucker.
Conference on Information and Knowledge Management | 2007
Mark D. Smucker; James Allan; Ben Carterette
Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher's randomization (permutation) test as non-parametric significance tests for IR, but these tests have seen little use. For each of these five tests, we took the ad hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average precision. We discovered that there is little practical difference between the randomization, bootstrap, and t-tests. Both the Wilcoxon and sign tests have a poor ability to detect significance and have the potential to lead to false detections of significance. The Wilcoxon and sign tests are simplified variants of the randomization test, and their use should be discontinued for measuring the significance of a difference between means.
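As a rough illustration of how such a comparison can be run, the following is a minimal sketch of a Fisher's randomization (permutation) test on paired per-topic average precision scores. The scores and trial count are illustrative stand-ins, not the TREC runs used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def randomization_test(scores_a, scores_b, trials=100_000):
    """Fisher's randomization (permutation) test on a difference in means.

    Under the null hypothesis the run labels are exchangeable within each
    topic, so we flip the sign of each per-topic difference at random and
    count how often the permuted mean difference is at least as extreme as
    the observed one (two-sided p-value).
    """
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(trials, diffs.size))
    permuted = (signs * diffs).mean(axis=1)
    return (np.abs(permuted) >= observed).mean()

# Illustrative per-topic average precision for two runs (not real TREC data).
run_a = rng.uniform(0.1, 0.6, size=50)
run_b = run_a + rng.normal(0.02, 0.05, size=50)
print(f"p-value: {randomization_test(run_a, run_b):.4f}")
```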
Information Retrieval | 2011
Gordon V. Cormack; Mark D. Smucker; Charles L. A. Clarke
The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam—pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of the TREC 2009 web ad hoc and relevance feedback tasks, which used the ClueWeb09 dataset. We show that a simple content-based classifier with minimal training is efficient enough to rank the “spamminess” of every page in the dataset using a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as rank measures (estR-Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, using a set of “honeypot” queries the labeling of training data may be reduced to an entirely automatic process. The results of classical information retrieval methods are particularly enhanced by filtering—from among the worst to among the best.
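A minimal sketch of a content-based "spamminess" scorer in the spirit of the approach described, assuming hashed character 4-gram features and logistic regression trained by stochastic gradient descent. The feature settings, training pages, and labels below are illustrative assumptions, not the authors' exact classifier or training data.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hashed character 4-gram features keep memory constant regardless of corpus
# size, which is what makes a single pass over a billion-page crawl on one
# machine plausible.
vectorizer = HashingVectorizer(analyzer="char", ngram_range=(4, 4),
                               n_features=2**20)
classifier = SGDClassifier(loss="log_loss", random_state=0)

# Illustrative labelled pages; a real run would stream training examples
# (e.g., pages surfaced by "honeypot" queries labelled as spam) in batches.
train_pages = ["buy cheap meds online best price click here",
               "the department of computer science offers graduate courses"]
train_labels = [1, 0]  # 1 = spam, 0 = ham
classifier.fit(vectorizer.transform(train_pages), train_labels)

# "Spamminess" of an unseen page: the spam-class probability, which can be
# converted to a percentile rank across the whole collection.
page = ["cheap cheap cheap pills best best price"]
print(classifier.predict_proba(vectorizer.transform(page))[0, 1])
```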
BioSystems | 1996
Daniel Ashlock; Mark D. Smucker; E. Ann Stanley; Leigh Tesfatsion
Partner selection is an important process in many social interactions, permitting individuals to decrease the risks associated with cooperation. In large populations, defectors may escape punishment by roving from partner to partner, but defectors in smaller populations risk social isolation. We investigate these possibilities for an evolutionary Prisoner's Dilemma in which agents use expected payoffs to choose and refuse partners. In comparison to random or round-robin partner matching, we find that the average payoffs attained with preferential partner selection tend to be more narrowly confined to a few isolated payoff regions. Most ecologies evolve to essentially full cooperative behavior, but when agents are intolerant of defections, or when the costs of refusal and social isolation are small, we also see the emergence of wallflower ecologies in which all agents are socially isolated. Between these two extremes, we see the emergence of ecologies whose agents tend to engage in a small number of defections followed by cooperation thereafter. The latter ecologies exhibit a plethora of interesting social interaction patterns.
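The following is a minimal sketch of the expected-payoff bookkeeping behind choice and refusal, assuming a memory-weighted payoff update, an optimistic initial expectation, and a refusal tolerance. The parameter values and exact update form are illustrative assumptions rather than the paper's calibrated model.

```python
class Agent:
    """Tracks an expected payoff for each potential partner.

    Each agent starts with an optimistic initial expectation and updates it
    as a memory-weighted average of realized payoffs; partners whose expected
    payoff falls below a tolerance threshold are refused.
    """

    def __init__(self, initial_expectation=3.0, memory_weight=0.7,
                 tolerance=1.6):
        self.initial = initial_expectation
        self.w = memory_weight
        self.tolerance = tolerance
        self.expected = {}  # partner id -> expected payoff

    def update(self, partner, payoff):
        prior = self.expected.get(partner, self.initial)
        self.expected[partner] = (1 - self.w) * prior + self.w * payoff

    def choose(self, candidates):
        """Offer play to the most promising tolerable partner, if any."""
        tolerable = [p for p in candidates
                     if self.expected.get(p, self.initial) >= self.tolerance]
        if not tolerable:
            return None  # social isolation: every partner has been refused
        return max(tolerable,
                   key=lambda p: self.expected.get(p, self.initial))
```

Under this bookkeeping a defector's expected payoff in the eyes of its victims decays after each defection, so a rover in a small population quickly exhausts tolerable partners, which is the isolation effect the abstract describes.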
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012
Mark D. Smucker; Charles L. A. Clarke
Many current effectiveness measures incorporate simplifying assumptions about user behavior. These assumptions prevent the measures from reflecting aspects of the search process that directly impact the quality of retrieval results as experienced by the user. In particular, these measures implicitly model users as working down a list of retrieval results, spending equal time assessing each document. In reality, even a careful user, intending to identify as much relevant material as possible, must spend longer on some documents than on others. Aspects such as document length, duplicates and summaries all influence the time required. In this paper, we introduce a time-biased gain measure, which explicitly accommodates such aspects of the search process. By conducting an appropriate user study, we calibrate and validate the measure against the TREC 2005 Robust Track test collection. We examine properties of the measure, contrasting it to traditional effectiveness measures, and exploring its extension to other aspects and environments. As its primary benefit, the measure allows us to evaluate system performance in human terms, while maintaining the simplicity and repeatability of system-oriented tests. Overall, we aim to achieve a clearer connection between user-oriented studies and system-oriented tests, allowing us to better transfer insights and outcomes from one to the other.
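A simplified form of such a measure can be written down directly: gain from each relevant document, discounted exponentially in the time the user needs to reach it. The sketch below omits the paper's click and save probabilities, and the half-life value follows the calibration reported in the paper but should be treated as calibration data, not a constant.

```python
import math

def time_biased_gain(relevant, time_to_rank, half_life=224.0):
    """Simplified time-biased gain.

    relevant     -- booleans, one per rank, True if that document is relevant
    time_to_rank -- cumulative seconds the simulated user has spent on
                    arrival at each rank (e.g., summed per-document reading
                    times, so long documents and duplicates slow progress)
    half_life    -- time at which the exponential decay reaches 0.5
    """
    decay = lambda t: math.exp(-t * math.log(2) / half_life)
    return sum(decay(t) for rel, t in zip(relevant, time_to_rank) if rel)

# Same relevant documents, but the second ranking buries them behind slower
# reading, so it earns less gain.
print(time_biased_gain([True, False, True], [10.0, 40.0, 70.0]))
print(time_biased_gain([True, False, True], [10.0, 120.0, 300.0]))
```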
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006
Mark D. Smucker; James Allan
Search systems have for some time provided users with the ability to request documents similar to a given document. Interfaces provide this feature via a link or button for each document in the search results. We call this feature find-similar or similarity browsing. We examined find-similar as a search tool, like relevance feedback, for improving retrieval performance. Our investigation focused on find-similar's document-to-document similarity, the reexamination of documents during a search, and the user's browsing pattern. Find-similar with a query-biased similarity, avoiding the reexamination of documents, and a breadth-like browsing pattern achieved a 23% increase in the arithmetic mean average precision and a 66% increase in the geometric mean average precision over our baseline retrieval. This performance matched that of a more traditionally styled iterative relevance feedback technique.
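A sketch of the breadth-like browsing pattern: documents are examined first-in-first-out, relevant documents spawn find-similar requests, and a visited set prevents reexamination. The `find_similar` and `judge` callables are stand-ins for a query-biased similarity search and the user's relevance judgments; they are assumptions of this sketch, not interfaces defined in the paper.

```python
from collections import deque

def breadth_like_browse(initial_results, find_similar, judge, budget=50):
    """Simulate breadth-like find-similar browsing.

    initial_results -- ranked document ids from the baseline retrieval
    find_similar    -- callable returning documents similar to a given one
    judge           -- callable returning True if the simulated user finds
                       the document relevant
    """
    queue = deque(initial_results)
    visited, examined = set(), []
    while queue and len(examined) < budget:
        doc = queue.popleft()
        if doc in visited:
            continue  # avoid reexamining documents
        visited.add(doc)
        examined.append(doc)
        if judge(doc):
            # Breadth-like: similar documents join the back of the queue
            # rather than being explored depth-first.
            queue.extend(d for d in find_similar(doc) if d not in visited)
    return examined
```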
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2011
Leif Azzopardi; Kalervo Järvelin; Jaap Kamps; Mark D. Smucker
All search in the real world is inherently interactive. Information retrieval (IR) has a firm tradition of using simulation to evaluate IR systems, as embodied by the Cranfield paradigm. However, to a large extent, such system evaluations ignore user interaction. Simulations provide a way to go beyond this limitation. With an increasing number of researchers using simulation to evaluate interactive IR systems, it is now timely to discuss, develop and advance this powerful methodology within the field of IR. During the SimInt 2010 workshop, around 40 participants discussed and presented their views on the simulation of interaction. The main conclusion and general consensus was that simulation offers great potential for the field of IR, and that simulations of user interaction can make the user and the user interface explicit while maintaining the advantages of the Cranfield paradigm.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008
Jimmy J. Lin; Mark D. Smucker
In the context of document retrieval in the biomedical domain, this paper explores the complex relationship between the quality of initial query results and the overall utility of an interactive retrieval system. We demonstrate that a content-similarity browsing tool can compensate for poor retrieval results, and that the relationship between retrieval performance and overall utility is non-linear. Arguments are advanced with user simulations, which characterize the relevance of documents that a user might encounter with different browsing strategies. With broader implications to IR, this work provides a case study of how user simulations can be exploited as a formative tool for automatic utility evaluation. Simulation-based studies provide researchers with an additional evaluation tool to complement interactive and Cranfield-style experiments.
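One way such a simulation feeds evaluation is by turning the sequence of documents a browsing strategy visits into a cumulative utility curve. The sketch below is illustrative of the idea, not the paper's exact measure.

```python
def utility_curve(examined_docs, qrels):
    """Cumulative count of relevant documents as the simulated user reads.

    examined_docs -- document ids in the order a browsing strategy visits them
    qrels         -- set of relevant document ids (TREC-style judgments)

    Comparing the curves of two strategies (e.g., linear reading of the
    ranked list vs. content-similarity browsing) shows whether browsing
    compensates for a weak initial ranking.
    """
    found, curve = 0, []
    for doc in examined_docs:
        found += doc in qrels
        curve.append(found)
    return curve

# Toy example: two relevant documents found within the first three visits.
print(utility_curve(["d3", "d7", "d1", "d9"], {"d1", "d7"}))  # [0, 1, 2, 2]
```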
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010
Mark D. Smucker; Chandra Prakash Jethani
Several studies have found that the Cranfield approach to evaluation can report significant performance differences between retrieval systems for which little to no performance difference is found for humans completing tasks with these systems. We revisit the relationship between precision and performance by measuring human performance on tightly controlled search tasks and with user interfaces offering limited interaction. We find that human performance and retrieval precision are strongly related. We also find that users change their relevance judging behavior based on the precision of the results. This change in behavior coupled with the well-known lack of perfect inter-assessor agreement can reduce the measured performance gains predicted by increased precision.
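The compression of measured gains can be illustrated with a small arithmetic model of imperfect judging; the hit and false-alarm rates below are invented parameters for illustration, not values measured in the study.

```python
def saved_set_precision(list_precision, hit_rate, false_alarm_rate):
    """Precision of the documents a simulated user saves.

    The user saves relevant documents at hit_rate and non-relevant ones at
    false_alarm_rate, so the precision of the saved set is a damped
    function of the precision of the result list.
    """
    saved_relevant = list_precision * hit_rate
    saved_nonrelevant = (1 - list_precision) * false_alarm_rate
    return saved_relevant / (saved_relevant + saved_nonrelevant)

# Doubling list precision from 0.3 to 0.6 yields a smaller measured gain
# once imperfect judging is applied (about 0.63 -> 0.86 here).
for p in (0.3, 0.6):
    print(p, round(saved_set_precision(p, hit_rate=0.8,
                                       false_alarm_rate=0.2), 3))
```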
European Conference on Artificial Life | 1995
E. Ann Stanley; Daniel Ashlock; Mark D. Smucker
In a series of papers we have examined what happens when individuals make very calculated choices of partners, based on past interaction histories [17, 1, 16]. In the Iterated Prisoner's Dilemma with Choice and Refusal (IPD/CR), players use expected payoffs, which are based on the play history between the players plus an initial expectation, to assess the relative desirability of potential partners and refuse play with those judged to be intolerable. We have primarily studied this model using evolved populations of finite state machines. In each generation, individual behaviors generate social networks of interacting players. Here we provide an overview of our previous evolutionary results, and include some preliminary results on the impact of increasing the population size and introducing more randomness into the partner selection procedure.
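For concreteness, a finite state machine strategy can be written as a transition table from (state, opponent's last move) to (action, next state). The hand-coded machine below happens to encode Tit-for-Tat; the machines in the paper are evolved by a genetic algorithm rather than written by hand, so this is only a sketch of the representation.

```python
# state -> {opponent's last move: (my action, next state)}
TIT_FOR_TAT = {
    "start":     {"C": ("C", "start"), "D": ("D", "retaliate")},
    "retaliate": {"C": ("C", "start"), "D": ("D", "retaliate")},
}

def play_round(machine, state, opponent_move, initial_action="C"):
    """Advance the machine one round, returning (action, next state)."""
    if opponent_move is None:  # first round: no history yet
        return initial_action, state
    return machine[state][opponent_move]

# Cooperates until defected against, then retaliates once and forgives.
state = "start"
for opp in [None, "C", "D", "C"]:
    action, state = play_round(TIT_FOR_TAT, state, opp)
    print(action, state)
```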
Conference on Information and Knowledge Management | 2007
Ben Carterette; Mark D. Smucker
Information retrieval experimentation generally proceeds in a cycle of development, evaluation, and hypothesis testing. Ideally, the evaluation and testing phases should be short and easy, so as to maximize the amount of time spent in development. There has been recent work on reducing the amount of assessor effort needed to evaluate retrieval systems, but it has not, for the most part, investigated the effects of these methods on tests of significance. In this work, we explore in detail the effects of reduced sets of judgments on the sign test. We demonstrate both analytically and empirically the relationship between the power of the test, the number of topics evaluated, and the number of judgments available. Using these relationships, we can determine the number of topics and judgments needed for the least-cost but highest-confidence significance evaluation. Specifically, testing pairwise significance over 192 topics with fewer than 5 judgments each is as good as testing significance over 25 topics with an average of 166 judgments each: 85% less effort with no additional errors.
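The power relationship for the sign test can be computed directly from the binomial distribution: find the two-sided rejection region under the null and evaluate its probability at a true per-topic win rate. The alpha level and win probability below are illustrative parameters, not figures from the paper.

```python
from scipy.stats import binom

def sign_test_power(n_topics, p_better, alpha=0.05):
    """Power of a two-sided sign test over n_topics paired comparisons.

    p_better is the true probability that one system beats the other on a
    topic (ties discarded).  Under H0 the win count is Binomial(n, 0.5);
    we find the rejection region of size at most alpha, then compute the
    probability of landing in it when the true win rate is p_better.
    """
    lo = binom.ppf(alpha / 2, n_topics, 0.5) - 1   # reject if wins <= lo
    hi = binom.isf(alpha / 2, n_topics, 0.5) + 1   # reject if wins >= hi
    return (binom.cdf(lo, n_topics, p_better)
            + binom.sf(hi - 1, n_topics, p_better))

# More topics buy power, even when each topic carries few judgments.
for n in (25, 50, 192):
    print(n, round(sign_test_power(n, p_better=0.65), 3))
```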