Publications


Featured research published by Katja Hofmann.


Conference on Information and Knowledge Management | 2011

A probabilistic method for inferring preferences from clicks

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

Evaluating rankers using implicit feedback, such as clicks on documents in a result list, is an increasingly popular alternative to traditional evaluation methods based on explicit relevance judgments. Previous work has shown that so-called interleaved comparison methods can utilize click data to detect small differences between rankers and can be applied to learn ranking functions online. In this paper, we analyze three existing interleaved comparison methods and find that they are all either biased or insensitive to some differences between rankers. To address these problems, we present a new method based on a probabilistic interleaving process. We derive an unbiased estimator of comparison outcomes and show how marginalizing over possible comparison outcomes given the observed click data can make this estimator even more effective. We validate our approach using a recently developed simulation framework based on a learning to rank dataset and a model of click behavior. Our experiments confirm the results of our analysis and show that our method is both more accurate and more robust to noise than existing methods.
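
A rough sketch of the probabilistic interleaving idea described above, assuming rank-based softmax distributions over each ranker's list; the function names and the naive credit rule are simplified illustrations, not the paper's exact estimator:

```python
import random

def softmax_over_ranks(ranking, tau=3.0):
    # P(d) proportional to 1/rank^tau: a rank-based softmax over the list.
    weights = {d: 1.0 / (r + 1) ** tau for r, d in enumerate(ranking)}
    z = sum(weights.values())
    return {d: w / z for d, w in weights.items()}

def probabilistic_interleave(ranking_a, ranking_b, length=10):
    """Draw an interleaved list, remembering which ranker filled each slot."""
    dists = (softmax_over_ranks(ranking_a), softmax_over_ranks(ranking_b))
    interleaved, assignment, used = [], [], set()
    while len(interleaved) < length:
        team = random.randint(0, 1)  # fair coin: which ranker contributes
        # renormalize the chosen ranker's distribution over unused documents
        remaining = {d: p for d, p in dists[team].items() if d not in used}
        if not remaining:
            break
        docs = list(remaining)
        total = sum(remaining.values())
        doc = random.choices(docs, weights=[remaining[d] / total for d in docs])[0]
        interleaved.append(doc)
        assignment.append(team)
        used.add(doc)
    return interleaved, assignment

def naive_outcome(assignment, clicked_slots):
    # Credit each click to the ranker whose draw filled that slot; the
    # paper's estimator additionally marginalizes over possible assignments.
    credit = [0, 0]
    for slot in clicked_slots:
        credit[assignment[slot]] += 1
    return credit[0] - credit[1]  # > 0 favors ranker A
```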


Information Retrieval | 2013

Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank, retrieval systems can learn directly from implicit feedback inferred from user interactions. In such an online setting, algorithms must obtain feedback for effective learning while simultaneously utilizing what has already been learned to produce high quality results. We formulate this challenge as an exploration–exploitation dilemma and propose two methods for addressing it. By adding mechanisms for balancing exploration and exploitation during learning, each method extends a state-of-the-art learning to rank method, one based on listwise learning and the other on pairwise learning. Using a recently developed simulation framework that allows assessment of online performance, we empirically evaluate both methods. Our results show that balancing exploration and exploitation can substantially and significantly improve the online retrieval performance of both listwise and pairwise approaches. In addition, the results demonstrate that such a balance affects the two approaches in different ways, especially when user feedback is noisy, yielding new insights relevant to making online learning to rank effective in practice.
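
The balancing mechanism can be pictured as mixing an exploitative ranking (the current best) with an exploratory one when constructing the result list shown to the user. Below is a minimal sketch assuming a single per-slot mixing probability; the parameter name epsilon is ours, not the paper's:

```python
import random

def mixed_result_list(exploit_list, explore_list, epsilon=0.5, length=10):
    """Per result slot, take the next unused document from the exploratory
    ranking with probability epsilon, otherwise from the exploitative one.
    epsilon=0 is pure exploitation; epsilon=1 is pure exploration."""
    result, used = [], set()
    pos = {"exploit": 0, "explore": 0}
    lists = {"exploit": exploit_list, "explore": explore_list}
    while len(result) < length:
        pick = "explore" if random.random() < epsilon else "exploit"
        other = "exploit" if pick == "explore" else "explore"
        for source in (pick, other):  # fall back if one list is exhausted
            lst = lists[source]
            while pos[source] < len(lst) and lst[pos[source]] in used:
                pos[source] += 1
            if pos[source] < len(lst):
                doc = lst[pos[source]]
                result.append(doc)
                used.add(doc)
                break
        else:
            return result  # both lists exhausted
    return result
```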


European Conference on Information Retrieval | 2011

Balancing Exploration and Exploitation in Learning to Rank Online

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback, while they are running. In such an online setting, algorithms need to both explore new solutions to obtain feedback for effective learning, and exploit what has already been learned to produce results that are acceptable to users. We formulate this challenge as an exploration-exploitation dilemma and present the first online learning to rank algorithm that works with implicit feedback and balances exploration and exploitation. We leverage existing learning to rank data sets and recently developed click models to evaluate the proposed algorithm. Our results show that finding a balance between exploration and exploitation can substantially improve online retrieval performance, bringing us one step closer to making online learning to rank work in practice.
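
The style of listwise learner this line of work builds on, dueling bandit gradient descent, can be summarized as follows; the step sizes and the click-based comparison callback are simplified placeholders:

```python
import numpy as np

def dbgd_step(w, interleaved_comparison, delta=1.0, alpha=0.1, rng=None):
    """One simplified dueling-bandit gradient descent step.
    w: current ranker weight vector.
    interleaved_comparison(w, w_candidate) -> True if clicks on an
    interleaved result list favored the candidate ranker."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=w.shape)
    u /= np.linalg.norm(u)            # random unit exploration direction
    w_candidate = w + delta * u       # exploratory ranker
    if interleaved_comparison(w, w_candidate):
        w = w + alpha * u             # move toward the winning ranker
    return w
```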


Meeting of the Association for Computational Linguistics | 2009

Generating a Non-English Subjectivity Lexicon: Relations That Matter

Valentin Jijkoun; Katja Hofmann

We describe a method for creating a non-English subjectivity lexicon based on an English lexicon, an online translation service and a general purpose thesaurus: Wordnet. We use a PageRank-like algorithm to bootstrap from the translation of the English lexicon and rank the words in the thesaurus by polarity using the network of lexical relations in Wordnet. We apply our method to the Dutch language. The best results are achieved when using synonymy and antonymy relations only, and ranking positive and negative words simultaneously. Our method achieves an accuracy of 0.82 at the top 3,000 negative words, and 0.62 at the top 3,000 positive words.
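
A compact sketch of the PageRank-style propagation, under the assumption that synonym edges carry weight +1 and antonym edges -1 so that positive and negative words can be ranked simultaneously; the damping value and helper name are illustrative:

```python
def rank_polarity(edges, seeds, damping=0.85, iters=50):
    """Propagate polarity over a lexical graph.
    edges: dict word -> list of (neighbor, sign), sign=+1 for synonymy,
           -1 for antonymy.
    seeds: dict word -> +1.0 / -1.0 from the translated English lexicon.
    Returns word -> polarity score; sort descending for positive words,
    ascending for negative words."""
    words = set(edges) | {n for nbrs in edges.values() for n, _ in nbrs} | set(seeds)
    score = {w: seeds.get(w, 0.0) for w in words}
    for _ in range(iters):
        new = {}
        for w in words:
            nbrs = edges.get(w, [])
            spread = sum(sign * score[n] for n, sign in nbrs) / max(len(nbrs), 1)
            new[w] = (1 - damping) * seeds.get(w, 0.0) + damping * spread
        score = new
    return score
```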


ACM Transactions on Information Systems | 2013

Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

Ranker evaluation is central to the research into search engines, be it to compare rankers or to provide feedback for learning to rank. Traditional evaluation approaches do not scale well because they require explicit relevance judgments of document-query pairs, which are expensive to obtain. A promising alternative is the use of interleaved comparison methods, which compare rankers using click data obtained when interleaving their rankings. In this article, we propose a framework for analyzing interleaved comparison methods. An interleaved comparison method has fidelity if the expected outcome of ranker comparisons properly corresponds to the true relevance of the ranked documents. It is sound if its estimates of that expected outcome are unbiased and consistent. It is efficient if those estimates are accurate with only little data. We analyze existing interleaved comparison methods and find that, while sound, none meet our criteria for fidelity. We propose a probabilistic interleave method, which is sound and has fidelity. We show empirically that, by marginalizing out variables that are known, it is more efficient than existing interleaved comparison methods. Using importance sampling we derive a sound extension that is able to reuse historical data collected in previous comparisons of other ranker pairs.
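
The "marginalizing out variables that are known" step can be illustrated roughly as follows: instead of crediting a click to the one observed ranker assignment, credit it fractionally by how likely each ranker was to have produced the clicked document. This is a simplification (the paper's estimator conditions on the entire interleaved list):

```python
def marginalized_outcome(clicked_docs, p_a, p_b):
    """Expected comparison outcome, marginalizing over which ranker's draw
    produced each clicked document.
    p_a, p_b: dict doc -> probability under each ranker's softmax
    distribution.  A positive result favors ranker A."""
    outcome = 0.0
    for doc in clicked_docs:
        pa, pb = p_a.get(doc, 0.0), p_b.get(doc, 0.0)
        if pa + pb > 0:
            outcome += (pa - pb) / (pa + pb)  # fractional credit in [-1, 1]
    return outcome
```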


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014

On user interactions with query auto-completion

Bhaskar Mitra; Milad Shokouhi; Filip Radlinski; Katja Hofmann

Query Auto-Completion (QAC) is a popular feature of web search engines that aims to assist users to formulate queries faster and avoid spelling mistakes by presenting them with possible completions as soon as they start typing. However, despite the wide adoption of auto-completion in search systems, there is little published on how users interact with such services. In this paper, we present the first large-scale study of user interactions with auto-completion based on query logs of Bing, a commercial search engine. Our results confirm that lower-ranked auto-completion suggestions receive substantially lower engagement than those ranked higher. We also observe that users are most likely to engage with auto-completion after typing about half of the query, and in particular at word boundaries. Interestingly, we also noticed that the likelihood of using auto-completion varies with the distance of query characters on the keyboard. Overall, we believe that the results reported in our study provide valuable insights for understanding user engagement with auto-completion, and are likely to inform the design of more effective QAC systems.
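
The rank-engagement analysis reported here could be reproduced on any QAC log along these lines; the event layout below is hypothetical, not Bing's actual log schema:

```python
from collections import Counter

def qac_engagement_by_rank(log):
    """log: iterable of (shown_suggestions, clicked_rank_or_None) events,
    one per auto-completion impression.
    Returns click-through rate per suggestion rank."""
    shown, clicked = Counter(), Counter()
    for suggestions, clicked_rank in log:
        for rank in range(len(suggestions)):
            shown[rank] += 1
        if clicked_rank is not None:
            clicked[clicked_rank] += 1
    return {r: clicked[r] / shown[r] for r in sorted(shown)}
```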


Knowledge Discovery and Data Mining | 2016

Towards Conversational Recommender Systems

Konstantina Christakopoulou; Filip Radlinski; Katja Hofmann

People often ask others for restaurant recommendations as a way to discover new dining experiences. This makes restaurant recommendation an exciting scenario for recommender systems and has led to substantial research in this area. However, most such systems behave very differently from a human when asked for a recommendation. The goal of this paper is to begin to reduce this gap. In particular, humans can quickly establish preferences when asked to make a recommendation for someone they do not know. We address this cold-start recommendation problem in an online learning setting. We develop a preference elicitation framework to identify which questions to ask a new user to quickly learn their preferences. Taking advantage of latent structure in the recommendation space using a probabilistic latent factor model, our experiments with both synthetic and real world data compare different types of feedback and question selection strategies. We find that our framework can make very effective use of online user feedback, improving personalized recommendations over a static model by 25% after asking only 2 questions. Our results demonstrate dramatic benefits of starting from offline embeddings, and highlight the benefit of bandit-based explore-exploit strategies in this setting.
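
One way to realize the bandit-based explore-exploit question selection over an offline embedding, sketched as Thompson sampling with a Bayesian linear model over the user's latent vector; the class and parameter names are ours, not the paper's:

```python
import numpy as np

class PreferenceElicitor:
    """Item embeddings V (items x d) are assumed to come from an offline
    probabilistic factorization; the user's latent vector is learned online."""

    def __init__(self, V, noise=0.5):
        self.V = V
        d = V.shape[1]
        self.A = np.eye(d)        # posterior precision of the user vector
        self.b = np.zeros(d)
        self.noise = noise

    def next_question(self, rng=None):
        """Thompson sampling: draw a user vector from the posterior and ask
        about the item it scores highest (an explore/exploit trade-off)."""
        rng = rng or np.random.default_rng()
        cov = np.linalg.inv(self.A)
        mu = cov @ self.b
        sample = rng.multivariate_normal(mu, cov)
        return int(np.argmax(self.V @ sample))

    def update(self, item, rating):
        """Standard Bayesian linear regression update from the answer."""
        v = self.V[item]
        self.A += np.outer(v, v) / self.noise ** 2
        self.b += rating * v / self.noise ** 2
```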


Conference on Information and Knowledge Management | 2012

Estimating interleaved comparison outcomes from historical click data

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

Interleaved comparison methods, which compare rankers using click data, are a promising alternative to traditional information retrieval evaluation methods that require expensive explicit judgments. A major limitation of these methods is that they assume access to live data, meaning that new data must be collected for every pair of rankers compared. We investigate the use of previously collected click data (i.e., historical data) for interleaved comparisons. We start by analyzing to what degree existing interleaved comparison methods can be applied and find that a recent probabilistic method allows such data reuse, even though it is biased when applied to historical data. We then propose an interleaved comparison method that is based on the probabilistic approach but uses importance sampling to compensate for bias. We experimentally confirm that probabilistic methods make the use of historical data for interleaved comparisons possible and effective.
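
The importance-sampling correction can be sketched as reweighting each logged session by how likely its interleaved list is under the target comparison versus under the logging policy. Shown below is a self-normalized variant for stability (the paper's unbiased estimator uses plain importance sampling):

```python
def historical_outcome(historical_sessions, p_target, p_logging):
    """Importance-sampled comparison outcome from sessions logged under a
    different ranker pair.
    Each session is (shown_list, credit), where credit is the per-session
    click-based outcome; p_target(list) and p_logging(list) give the
    probability of showing that list under each interleaving distribution."""
    total, weight_sum = 0.0, 0.0
    for shown_list, credit in historical_sessions:
        w = p_target(shown_list) / p_logging(shown_list)  # importance weight
        total += w * credit
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```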


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015

Predicting Search Satisfaction Metrics with Interleaved Comparisons

Anne Schuth; Katja Hofmann; Filip Radlinski

The gold standard for online retrieval evaluation is AB testing. Rooted in the idea of a controlled experiment, AB tests compare the performance of an experimental system (treatment) on one sample of the user population, to that of a baseline system (control) on another sample. Given an online evaluation metric that accurately reflects user satisfaction, these tests enjoy high validity. However, due to the high variance across users, these comparisons often have low sensitivity, requiring millions of queries to detect statistically significant differences between systems. Interleaving is an alternative online evaluation approach, where each user is presented with a combination of results from both the control and treatment systems. Compared to AB tests, interleaving has been shown to be substantially more sensitive. However, interleaving methods have so far focused on user clicks only, and lack support for more sophisticated user satisfaction metrics as used in AB testing. In this paper we present the first method for integrating user satisfaction metrics with interleaving. We show how interleaving can be extended to (1) directly match user signals and parameters of AB metrics, and (2) how parameterized interleaving credit functions can be automatically calibrated to predict AB outcomes. We also develop a new method for estimating the relative sensitivity of interleaving and AB metrics, and show that our interleaving credit functions improve agreement with AB metrics without sacrificing sensitivity. Our results, using 38 large-scale online experiments encompassing over 3 billion clicks in a web search setting, demonstrate up to a 22% improvement in agreement with AB metrics (constituting over a 50% error reduction), while maintaining sensitivity of one to two orders of magnitude above the AB tests. This paves the way towards more sensitive and accurate online evaluation.
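
A parameterized credit function and its calibration against AB outcomes might look roughly like this; the click signals (rank discount, a dwell-time satisfaction proxy) and the grid-search calibration are illustrative assumptions, not the paper's exact formulation:

```python
import itertools

def credit(session, params):
    """Parameterized interleaving credit: weight each click by a rank
    discount and a satisfaction proxy (here, dwell time over 30 seconds).
    session: list of (rank, dwell_seconds, team) clicks,
    team = +1 for treatment, -1 for control."""
    w_rank, w_dwell = params
    score = 0.0
    for rank, dwell, team in session:
        score += team * (w_rank / (rank + 1) + w_dwell * (dwell > 30))
    return score

def calibrate(experiments, grid=(0.0, 0.5, 1.0)):
    """Pick credit parameters maximizing sign agreement with AB outcomes.
    experiments: list of (sessions, ab_outcome_sign) pairs."""
    best, best_params = -1.0, None
    for params in itertools.product(grid, repeat=2):
        agree = sum(
            (sum(credit(s, params) for s in sessions) > 0) == (ab > 0)
            for sessions, ab in experiments
        ) / len(experiments)
        if agree > best:
            best, best_params = agree, params
    return best_params, best
```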


Multimedia Information Retrieval | 2008

Assessing concept selection for video retrieval

Bouke Huurnink; Katja Hofmann; Maarten de Rijke

We explore the use of benchmarks to address the problem of assessing concept selection in video retrieval systems. Two benchmarks are presented, one created by human association of queries to concepts, the other generated from an extensively tagged collection. They are compared in terms of reliability, captured semantics, and retrieval performance. Recommendations are given for using the benchmarks to assess concept selection algorithms; the assessment is demonstrated on two existing algorithms. The benchmarks are released to the research community.
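
Assessment against such a benchmark reduces, in its simplest form, to comparing an algorithm's selected concepts with the benchmark's query-to-concept mapping; this precision-at-k sketch is our illustration (the paper also compares captured semantics and retrieval performance):

```python
def assess_concept_selection(select_fn, benchmark, k=5):
    """Average overlap between an algorithm's top-k selected concepts and
    the benchmark's gold concepts per query.
    select_fn: query -> ranked list of concepts.
    benchmark: dict query -> set of gold concepts."""
    scores = []
    for query, gold_concepts in benchmark.items():
        selected = select_fn(query)[:k]
        hits = sum(c in gold_concepts for c in selected)
        scores.append(hits / k)
    return sum(scores) / len(scores)
```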

Collaboration


Dive into Katja Hofmann's collaboration.

Top Co-Authors

M. de Rijke

University of Amsterdam

Anne Schuth

University of Amsterdam
