Maura R. Grossman | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Maura R. Grossman is active.

Explore More

Publication

Featured researches published by Maura R. Grossman.

international acm sigir conference on research and development in information retrieval | 2015

Multi-Faceted Recall of Continuous Active Learning for Technology-Assisted Review

Gordon V. Cormack; Maura R. Grossman

Continuous active learning achieves high recall for technology-assisted review, not only for an overall information need, but also for various facets of that information need, whether explicit or implicit. Through simulations using Cormack and Grossmans TAR Evaluation Toolkit (SIGIR 2014), we show that continuous active learning, applied to a multi-faceted topic, efficiently achieves high recall for each facet of the topic. Our results assuage the concern that continuous active learning may achieve high overall recall at the expense of excluding identifiable categories of relevant information.

international acm sigir conference on research and development in information retrieval | 2015

Impact of Surrogate Assessments on High-Recall Retrieval

Adam Roegiest; Gordon V. Cormack; Charles L. A. Clarke; Maura R. Grossman

We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch) supervised-learning method to rank documents for subsequent review, where the effectiveness of the ranking will be evaluated using a different assessor deemed to be authoritative. Previous studies suggest that surrogate assessments may be a reasonable proxy for authoritative assessments for this task. Nonetheless, concern persists in some application domains---such as electronic discovery---that errors in surrogate training assessments will be amplified by the learning method, materially degrading performance. We demonstrate, through a re-analysis of data used in previous studies, that, with passive supervised-learning methods, using surrogate assessments for training can substantially impair classifier performance, relative to using the same deemed-authoritative assessor for both training and assessment. In particular, using a single surrogate to replace the authoritative assessor for training often yields a ranking that must be traversed much lower to achieve the same level of recall as the ranking that would have resulted had the authoritative assessor been used for training. We also show that steps can be taken to mitigate, and sometimes overcome, the impact of surrogate assessments for training: relevance assessments may be diversified through the use of multiple surrogates; and, a more liberal view of relevance can be adopted by having the surrogate label borderline documents as relevant. By taking these steps, rankings derived from surrogate assessments can match, and sometimes exceed, the performance of the ranking that would have been achieved, had the authority been used for training. Finally, we show that our results still hold when the role of surrogate and authority are interchanged, indicating that the results may simply reflect differing conceptions of relevance between surrogate and authority, as opposed to the authority having special skill or knowledge lacked by the surrogate.

international acm sigir conference on research and development in information retrieval | 2018

Beyond Pooling

Gordon V. Cormack; Maura R. Grossman

Dynamic Sampling is a novel, non-uniform, statistical sampling strategy in which documents are selected for relevance assessment based on the results of prior assessments. Unlike static and dynamic pooling methods that are commonly used to compile relevance assessments for the creation of information retrieval test collections, Dynamic Sampling yields a statistical sample from which substantially unbiased estimates of effectiveness measures may be derived. In contrast to static sampling strategies, which make no use of relevance assessments, Dynamic Sampling is able to select documents from a much larger universe, yielding superior test collections for a given budget of relevance assessments. These assertions are supported by simulation studies using secondary data from the TREC 2017 Common Core Track.

international acm sigir conference on research and development in information retrieval | 2018

A System for Efficient High-Recall Retrieval

Mustafa Abualsaud; Nimesh Ghelani; Haotian Zhang; Mark D. Smucker; Gordon V. Cormack; Maura R. Grossman

The goal of high-recall information retrieval (HRIR) is to find all or nearly all relevant documents for a search topic. In this paper, we present the design of our system that affords efficient high-recall retrieval. HRIR systems commonly rely on iterative relevance feedback. Our system uses a state-of-the-art implementation of continuous active learning (CAL), and is designed to allow other feedback systems to be attached with little work. Our system allows users to judge documents as fast as possible with no perceptible interface lag. We also support the integration of a search engine for users who would like to interactively search and judge documents. In addition to detailing the design of our system, we report on user feedback collected as part of a 50 participants user study. While we have found that users find the most relevant documents when we restrict user interaction, a majority of participants prefer having flexibility in user interaction. Our work has implications on how to build effective assessment systems and what features of the system are believed to be useful by users.

conference on information and knowledge management | 2018

Effective User Interaction for High-Recall Retrieval: Less is More

Haotian Zhang; Mustafa Abualsaud; Nimesh Ghelani; Mark D. Smucker; Gordon V. Cormack; Maura R. Grossman

High-recall retrieval --- finding all or nearly all relevant documents --- is critical to applications such as electronic discovery, systematic review, and the construction of test collections for information retrieval tasks. The effectiveness of current methods for high-recall information retrieval is limited by their reliance on human input, either to generate queries, or to assess the relevance of documents. Past research has shown that humans can assess the relevance of documents faster and with little loss in accuracy by judging shorter document surrogates, e.g.\ extractive summaries, in place of full documents. To test the hypothesis that short document surrogates can reduce assessment time and effort for high-recall retrieval, we conducted a 50-person, controlled, user study. We designed a high-recall retrieval system using continuous active learning (CAL) that could display either full documents or short document excerpts for relevance assessment. In addition, we tested the value of integrating a search engine with CAL. In the experiment, we asked participants to try to find as many relevant documents as possible within one hour. We observed that our study participants were able to find significantly more relevant documents when they used the system with document excerpts as opposed to full documents. We also found that allowing participants to compose and execute their own search queries did not improve their ability to find relevant documents and, by some measures, impaired performance. These results suggest that for high-recall systems to maximize performance, system designers should think carefully about the amount and nature of user interaction incorporated into the system.

international acm sigir conference on research and development in information retrieval | 2014