
Publication


Featured research published by Kevyn Collins-Thompson.


Conference on Information and Knowledge Management | 2005

Query expansion using random walk models

Kevyn Collins-Thompson; Jamie Callan

It has long been recognized that capturing term relationships is an important aspect of information retrieval. Even with large amounts of data, we usually only have significant evidence for a fraction of all potential term pairs. It is therefore important to consider whether multiple sources of evidence may be combined to predict term relations more accurately. This is particularly important when trying to predict the probability of relevance of a set of terms given a query, which may involve both lexical and semantic relations between the terms. We describe a Markov chain framework that combines multiple sources of knowledge on term associations. The stationary distribution of the model is used to obtain probability estimates that a potential expansion term reflects aspects of the original query. We use this model for query expansion and evaluate the effectiveness of the model by examining the accuracy and robustness of the expansion methods, and investigate the relative effectiveness of various sources of term evidence. Statistically significant differences in accuracy were observed depending on the weighting of evidence in the random walk. For example, using co-occurrence data later in the walk was generally better than using it early, suggesting further improvements in effectiveness may be possible by learning walk behaviors.
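
To make the mechanism concrete, here is a minimal sketch of a random walk with restart over a term-association graph, scoring expansion candidates by their stationary probability. The toy association matrix, restart probability, and function name are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def expansion_scores(assoc, restart=0.15, iters=100):
        """Score candidate expansion terms via a random walk with restart.

        assoc: (n, n) nonnegative term-association matrix combining
               multiple evidence sources (e.g., co-occurrence, synonymy).
        restart: probability of jumping back to the query terms.
        Returns the walk's stationary distribution over terms.
        """
        n = assoc.shape[0]
        P = assoc / assoc.sum(axis=1, keepdims=True)  # transition matrix
        r = np.zeros(n)
        r[0] = 1.0  # restart at the query term (term 0 for illustration)
        pi = r.copy()
        for _ in range(iters):
            pi = restart * r + (1 - restart) * pi @ P
        return pi  # higher mass = stronger expansion candidate

    # Toy example: three terms, term 0 is the query term.
    assoc = np.array([[1.0, 0.8, 0.1],
                      [0.8, 1.0, 0.4],
                      [0.1, 0.4, 1.0]])
    print(expansion_scores(assoc))

In the paper's framework, the transition structure itself encodes the different knowledge sources and the point in the walk at which each is consulted, which is what the co-occurrence timing result above refers to.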


Web Search and Data Mining | 2013

Pairwise ranking aggregation in a crowdsourced setting

Xi Chen; Paul N. Bennett; Kevyn Collins-Thompson; Eric Horvitz

Inferring rankings over elements of a set of objects, such as documents or images, is a key learning problem for such important applications as Web search and recommender systems. Crowdsourcing services provide an inexpensive and efficient means to acquire preferences over objects via labeling by sets of annotators. We propose a new model to predict a gold-standard ranking that hinges on combining pairwise comparisons via crowdsourcing. In contrast to traditional ranking aggregation methods, the approach learns about and folds into consideration the quality of contributions of each annotator. In addition, we minimize the cost of assessment by introducing a generalization of the traditional active learning scenario to jointly select the annotator and pair to assess while taking into account the annotator quality, the uncertainty over ordering of the pair, and the current model uncertainty. We formalize this as an active learning strategy that incorporates an exploration-exploitation tradeoff and implement it using an efficient online Bayesian updating scheme. Using simulated and real-world data, we demonstrate that the active learning strategy achieves significant reductions in labeling cost while maintaining accuracy.
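
The sketch below illustrates the joint idea of estimating item scores and annotator quality from pairwise labels, using a simplified alternating scheme over a reliability-weighted Bradley-Terry model rather than the paper's full Bayesian treatment; the active annotator/pair selection step is omitted, and all names and constants are assumptions.

    import numpy as np

    def aggregate(pairs, n_items, n_workers, iters=50, lr=0.1):
        """Alternately estimate item scores and annotator reliabilities
        from crowdsourced pairwise comparisons (a simplified sketch).

        pairs: list of (worker, winner, loser) index triples.
        """
        score = np.zeros(n_items)       # latent item quality
        acc = np.full(n_workers, 0.8)   # per-annotator reliability
        for _ in range(iters):
            # Gradient step on a reliability-weighted Bradley-Terry likelihood.
            grad = np.zeros(n_items)
            for w, i, j in pairs:
                p_win = 1.0 / (1.0 + np.exp(score[j] - score[i]))
                grad[i] += acc[w] * (1.0 - p_win)
                grad[j] -= acc[w] * (1.0 - p_win)
            score += lr * grad
            # Reliability = fraction of a worker's labels that agree
            # with the current score ordering.
            agree = np.zeros(n_workers)
            total = np.zeros(n_workers)
            for w, i, j in pairs:
                agree[w] += float(score[i] > score[j])
                total[w] += 1.0
            acc = np.where(total > 0, agree / np.maximum(total, 1.0), acc)
        return score, acc

    pairs = [(0, 0, 1), (0, 1, 2), (1, 0, 2), (1, 2, 1)]
    scores, reliabilities = aggregate(pairs, n_items=3, n_workers=2)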


Conference on Information and Knowledge Management | 2011

Personalizing web search results by reading level

Kevyn Collins-Thompson; Paul N. Bennett; Ryen W. White; Sebastian de la Chica; David Sontag

Traditionally, search engines have ignored the reading difficulty of documents and the reading proficiency of users in computing a document ranking. This is one reason why Web search engines do a poor job of serving an important segment of the population: children. While there are many important problems in interface design, content filtering, and results presentation related to addressing children's search needs, perhaps the most fundamental challenge is simply that of providing relevant results at the right level of reading difficulty. At the opposite end of the proficiency spectrum, it may also be valuable for technical users to find more advanced material or to filter out material at lower levels of difficulty, such as tutorials and introductory texts. We show how reading level can provide a valuable new relevance signal for both general and personalized Web search. We describe models and algorithms to address the three key problems in improving relevance for search using reading difficulty: estimating user proficiency, estimating result difficulty, and re-ranking based on the difference between user and result reading level profiles. We evaluate our methods on a large volume of Web query traffic and provide a large-scale log analysis that highlights the importance of finding results at an appropriate reading level for the user.
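
A minimal sketch of the re-ranking step: penalize each result by the gap between its estimated difficulty and the user's estimated proficiency. The dictionary fields, linear penalty, and weight alpha are illustrative assumptions; the paper derives its re-ranking from learned user and result profiles rather than a hand-tuned penalty.

    def rerank_by_reading_level(results, user_level, alpha=0.5):
        """Re-rank results by relevance minus a reading-level mismatch
        penalty (an illustrative linear combination, not the paper's
        learned model).

        results: dicts with 'score' (relevance) and 'level' (estimated
                 reading difficulty, e.g., on a grade 1-12 scale).
        user_level: estimated user proficiency on the same scale.
        """
        def adjusted(r):
            return r['score'] - alpha * abs(r['level'] - user_level)
        return sorted(results, key=adjusted, reverse=True)

    results = [{'url': 'a', 'score': 2.1, 'level': 11},
               {'url': 'b', 'score': 1.9, 'level': 5}]
    print(rerank_by_reading_level(results, user_level=4))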


Journal of the Association for Information Science and Technology | 2005

Predicting reading difficulty with statistical language models

Kevyn Collins-Thompson; James P. Callan

A potentially useful feature of information retrieval systems for students is the ability to identify documents that not only are relevant to the query but also match the student's reading level. Manually obtaining an estimate of reading difficulty for each document is not feasible for very large collections, so we require an automated technique. Traditional readability measures, such as the widely used Flesch-Kincaid measure, are simple to apply but perform poorly on Web pages and other non-traditional documents. This work focuses on building a broadly applicable statistical model of text for different reading levels that works for a wide range of documents. To do this, we recast the well-studied problem of readability in terms of text categorization and use straightforward techniques from statistical language modeling. We show that with a modified form of text categorization, it is possible to build generally applicable classifiers with relatively little training data. We apply this method to the problem of classifying Web pages according to their reading difficulty level and show that by using a mixture model to interpolate evidence of a word's frequency across grades, it is possible to build a classifier that achieves an average root mean squared error of between one and two grade levels for 9 of 12 grades. Such classifiers have very efficient implementations and can be applied in many different scenarios. The models can be varied to focus on smaller or larger grade ranges or easily retrained for a variety of tasks or populations.
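
The classification idea reduces to one smoothed unigram language model per grade and a maximum-likelihood decision. The sketch below shows that reduction with add-alpha smoothing; it omits the paper's cross-grade mixture interpolation, and the class and variable names are assumptions.

    import math
    from collections import Counter

    class GradeLevelLM:
        """One smoothed unigram language model per grade level;
        classify a document by the grade whose model assigns it the
        highest likelihood (a minimal sketch)."""

        def __init__(self, texts_by_grade, alpha=1.0):
            self.alpha = alpha
            self.models = {g: Counter(t.split())
                           for g, t in texts_by_grade.items()}
            vocab = {w for t in texts_by_grade.values() for w in t.split()}
            self.V = len(vocab)

        def log_prob(self, grade, doc):
            counts = self.models[grade]
            total = sum(counts.values())
            return sum(math.log((counts[w] + self.alpha) /
                                (total + self.alpha * self.V))
                       for w in doc.split())

        def predict(self, doc):
            return max(self.models, key=lambda g: self.log_prob(g, doc))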


Conference on Information and Knowledge Management | 2009

Reducing the risk of query expansion via robust constrained optimization

Kevyn Collins-Thompson

We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expansion problems, by allowing multiple sources of domain knowledge or evidence to be encoded as simultaneous optimization constraints. Our robust optimization approach provides a clean theoretical way to model not only expansion benefit, but also expansion risk, by optimizing over uncertainty sets for the data. In addition, we introduce risk-reward curves to visualize expansion algorithm performance and analyze parameter sensitivity. We show that a robust approach significantly reduces the number and magnitude of expansion failures for a strong baseline algorithm, with no loss in average gain. Our approach is implemented as a highly efficient post-processing step that assumes little about the baseline expansion method used as input, making it easy to apply to existing expansion methods. We provide analysis showing that this approach is a natural and effective way to do selective expansion, automatically reducing or avoiding expansion in risky scenarios, and successfully attenuating noise in poor baseline methods.
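
As a rough intuition for trading off expansion benefit against expansion risk, the toy sketch below uses a mean-variance penalty over per-term weight estimates; this is a deliberately simplified stand-in for the paper's robust constrained optimization over uncertainty sets, and every name and constant here is an assumption.

    import numpy as np

    def risk_adjusted_weights(mu, sigma, gamma=1.0):
        """Choose expansion-term weights that trade expected benefit
        (mu) against uncertainty (sigma), mean-variance style.
        Returns nonnegative weights summing to 1, or all zeros when
        every candidate term is too risky."""
        w = np.maximum(mu - gamma * sigma, 0.0)  # penalize uncertain terms
        if w.sum() == 0:
            return np.zeros_like(w)              # skip expansion entirely
        return w / w.sum()

The all-zeros branch mirrors the selective-expansion behavior the abstract describes: when the risk penalty wipes out every candidate, the safest move is not to expand at all.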


Workshop on Innovative Use of NLP for Building Educational Applications | 2008

An Analysis of Statistical Models and Features for Reading Difficulty Prediction

Michael Heilman; Kevyn Collins-Thompson; Maxine Eskenazi

A reading difficulty measure can be described as a function or model that maps a text to a numerical value corresponding to a difficulty or grade level. We describe a measure of readability that uses a combination of lexical features and grammatical features that are derived from subtrees of syntactic parses. We also tested statistical models for nominal, ordinal, and interval scales of measurement. The results indicate that a model for ordinal regression, such as the proportional odds model, using a combination of grammatical and lexical features is most effective at predicting reading difficulty.
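
For reference, a proportional odds (cumulative logit) model can be fit in a few lines by maximizing the likelihood of P(Y <= j | x) = sigmoid(theta_j - w.x) with ordered thresholds. The sketch below uses generic numeric features; the paper's actual features were lexical statistics and parse-subtree counts, and this particular parametrization is one common choice, not necessarily theirs.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit

    def fit_proportional_odds(X, y, n_levels):
        """Fit P(Y <= j | x) = sigmoid(theta_j - w.x) by maximum
        likelihood, with thresholds kept ordered via a cumulative
        reparametrization. y holds labels in {0, ..., n_levels-1}."""
        n, d = X.shape
        k = n_levels - 1  # number of thresholds

        def unpack(params):
            w = params[:d]
            # First threshold free; later ones forced upward by exp().
            theta = np.cumsum(np.concatenate(
                [params[d:d + 1], np.exp(params[d + 1:])]))
            return w, theta

        def nll(params):
            w, theta = unpack(params)
            z = X @ w
            # Cumulative probabilities, padded with 0 and 1.
            cum = np.concatenate([np.zeros((n, 1)),
                                  expit(theta[None, :] - z[:, None]),
                                  np.ones((n, 1))], axis=1)
            p = cum[np.arange(n), y + 1] - cum[np.arange(n), y]
            return -np.sum(np.log(np.clip(p, 1e-12, None)))

        res = minimize(nll, np.zeros(d + k), method='L-BFGS-B')
        return unpack(res.x)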


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Estimation and use of uncertainty in pseudo-relevance feedback

Kevyn Collins-Thompson; Jamie Callan

Existing pseudo-relevance feedback methods typically perform averaging over the top-retrieved documents, but ignore an important statistical dimension: the risk or variance associated with either the individual document models, or their combination. Treating the baseline feedback method as a black box, and the output feedback model as a random variable, we estimate a posterior distribution for the feedback model by resampling a given query's top-retrieved documents, using the posterior mean or mode as the enhanced feedback model. We then perform model combination over several enhanced models, each based on a slightly modified query sampled from the original query. We find that resampling documents helps increase individual feedback model precision by removing noise terms, while sampling from the query improves robustness (worst-case performance) by emphasizing terms related to multiple query aspects. The result is a meta-feedback algorithm that is both more robust and more precise than the original strong baseline method.
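
The document-resampling half of the method is easy to sketch: bootstrap the top-retrieved documents, build a feedback term distribution from each bootstrap sample, and average the distributions so that terms sensitive to particular documents get downweighted. The query-sampling half (small perturbations of the original query) would wrap another loop around this; the names below are assumptions.

    import random
    from collections import Counter

    def resampled_feedback_model(top_docs, n_samples=20, top_terms=10):
        """Average per-sample term distributions over bootstrap
        resamples of the top-retrieved documents (a minimal sketch
        of the resampling idea).

        top_docs: list of documents, each a list of terms.
        """
        totals = Counter()
        for _ in range(n_samples):
            # Resample documents with replacement.
            sample = random.choices(top_docs, k=len(top_docs))
            counts = Counter(t for doc in sample for t in doc)
            norm = sum(counts.values())
            for term, c in counts.items():
                totals[term] += (c / norm) / n_samples
        return totals.most_common(top_terms)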


Web Search and Data Mining | 2012

Probabilistic models for personalizing web search

David Sontag; Kevyn Collins-Thompson; Paul N. Bennett; Ryen W. White; Susan T. Dumais; Bodo Billerbeck

We present a new approach for personalizing Web search results to a specific user. Ranking functions for Web search engines are typically trained by machine learning algorithms using either direct human relevance judgments or indirect judgments obtained from click-through data from millions of users. The rankings are thus optimized to this generic population of users, not to any specific user. We propose a generative model of relevance which can be used to infer the relevance of a document to a specific user for a search query. The user-specific parameters of this generative model constitute a compact user profile. We show how to learn these profiles from a user's long-term search history. Our algorithm for computing the personalized ranking is simple and has little computational overhead. We evaluate our personalization approach using historical search data from thousands of users of a major Web search engine. Our findings demonstrate gains in retrieval performance for queries with high ambiguity, with particularly large improvements for acronym queries.
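
One way to picture a compact user profile acting on ranking is as a user-specific prior folded into the document score. The sketch below adds a log-prior over document topics to a generic relevance score; this additive combination and the topic representation are illustrative simplifications, not the paper's generative model.

    import math

    def personalized_score(base_score, doc_topics, user_profile, beta=1.0):
        """Blend a generic relevance score with a user-specific topic
        prior learned from long-term history (illustrative sketch).

        doc_topics: dict topic -> P(topic | doc)
        user_profile: dict topic -> P(topic | user)
        """
        prior = sum(p * user_profile.get(t, 1e-6)
                    for t, p in doc_topics.items())
        return base_score + beta * math.log(max(prior, 1e-12))

    profile = {'medicine': 0.7, 'sports': 0.3}
    print(personalized_score(1.2, {'medicine': 0.9, 'sports': 0.1}, profile))

Because the profile is just a handful of per-user parameters, re-scoring of this kind adds little work at query time, consistent with the abstract's low-overhead claim.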


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Toward whole-session relevance: exploring intrinsic diversity in web search

Karthik Raman; Paul N. Bennett; Kevyn Collins-Thompson

Current research on web search has focused on optimizing and evaluating single queries. However, a significant fraction of user queries are part of more complex tasks [20] which span multiple queries across one or more search sessions [26,24]. An ideal search engine would not only retrieve relevant results for a user's particular query but also be able to identify when the user is engaged in a more complex task and aid the user in completing that task [29,1]. Toward optimizing whole-session or task relevance, we characterize and address the problem of intrinsic diversity (ID) in retrieval [30], a type of complex task that requires multiple interactions with current search engines. Unlike existing work on extrinsic diversity [30] that deals with ambiguity in intent across multiple users, ID queries often have little ambiguity in intent but seek content covering a variety of aspects on a shared theme. In such scenarios, the underlying needs are typically exploratory, comparative, or breadth-oriented in nature. We identify and address three key problems for ID retrieval: identifying authentic examples of ID tasks from post-hoc analysis of behavioral signals in search logs; learning to identify initiator queries that mark the start of an ID search task; and given an initiator query, predicting which content to prefetch and rank.
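
For the final problem (ranking content to cover many aspects of one theme), a greedy aspect-coverage heuristic conveys the flavor: repeatedly pick the document with the best blend of relevance and aspects not yet covered. This MMR-style sketch is only an illustration of intrinsic diversity, not the paper's learned approach, and its inputs are assumptions.

    def greedy_diverse_ranking(candidates, k=10, trade_off=0.5):
        """Greedily rank documents by relevance plus novel aspect
        coverage (an illustrative MMR-style heuristic).

        candidates: list of (relevance, set_of_aspects) pairs.
        """
        selected, covered = [], set()
        pool = list(candidates)
        while pool and len(selected) < k:
            def gain(c):
                rel, aspects = c
                return ((1 - trade_off) * rel
                        + trade_off * len(aspects - covered))
            best = max(pool, key=gain)
            pool.remove(best)
            selected.append(best)
            covered |= best[1]
        return selected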


Developmental Neuropsychology | 2010

Lexical Quality in the Brain: ERP Evidence for Robust Word Learning From Context

Gwen A. Frishkoff; Charles A. Perfetti; Kevyn Collins-Thompson

We examined event-related potentials (ERPs) before and after word learning, using training contexts that differed in their level of contextual support for meaning acquisition. Novel words appeared either in contexts that were semantically constraining, providing strong cues to meaning, or in contexts that were weakly constraining, that is, uninformative. After each sentence, participants were shown the word in isolation and were asked to generate a close synonym. Immediately after training, words trained in high-constraint contexts elicited a smaller left temporal negativity (N300, at electrode FT7) compared with words trained in low-constraint contexts, and both types of trained words elicited a stronger medial frontal negativity (N350, at Fz) relative to familiar words. Two days after training, the N300 effect disappeared and was replaced by a later, left parietal effect (P600, at Pz). To examine robust learning, we administered a semantic priming test two days after training. Familiar words and words trained in high-constraint contexts elicited strong N400 effects. By contrast, words trained in low-constraint contexts elicited a weak N400 effect, and novel (untrained, rare) words elicited no semantic priming. These findings suggest that supportive contexts and the use of an active meaning-generation task may lead to robust word learning. The effects of this training can be observed as changes in an early left frontal component, as well as the classical N400 effect. We discuss implications for theories of “partial” semantic knowledge and for robust word learning and instruction.

Collaboration


Dive into Kevyn Collins-Thompson's collaborations.

Top Co-Authors

Jamie Callan (Carnegie Mellon University)

Rohail Syed (University of Michigan)

Maxine Eskenazi (Carnegie Mellon University)

James P. Callan (Carnegie Mellon University)