Publication


Featured research published by Tobias Schnabel.


Empirical Methods in Natural Language Processing (EMNLP) | 2015

Evaluation methods for unsupervised word embeddings

Tobias Schnabel; Igor Labutov; David M. Mimno

We present a comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text. Different evaluations result in different orderings of embedding methods, calling into question the common assumption that there is one single optimal vector representation. We present new evaluation techniques that directly compare embeddings with respect to specific queries. These methods reduce bias, provide greater insight, and allow us to solicit data-driven relevance judgments rapidly and accurately through crowdsourcing.
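As an illustration of the query-centric idea, here is a minimal sketch (not the paper's code) of retrieving two embeddings' nearest neighbors for the same query word so that judges can compare them side by side; the dict-of-vectors format and function names are assumptions.

```python
# Hypothetical sketch of a query-centric comparison: for one query
# word, retrieve each embedding's nearest neighbors under cosine
# similarity so they can be judged side by side.
import numpy as np

def nearest_neighbors(embedding, query, k=5):
    """k words closest to `query`; `embedding` maps word -> vector."""
    q = embedding[query] / np.linalg.norm(embedding[query])
    sims = {w: float(v @ q) / np.linalg.norm(v)
            for w, v in embedding.items() if w != query}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def compare_embeddings(emb_a, emb_b, query, k=5):
    """Pair up both embeddings' neighbor lists for direct comparison."""
    return {"A": nearest_neighbors(emb_a, query, k),
            "B": nearest_neighbors(emb_b, query, k)}
```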


Web Search and Data Mining (WSDM) | 2017

Unbiased Learning-to-Rank with Biased Feedback

Adith Swaminathan; Tobias Schnabel

Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data. Using this framework, we derive a Propensity-Weighted Ranking SVM for discriminative learning from implicit feedback, where click models take the role of the propensity estimator. In contrast to most conventional approaches to de-biasing the data using click models, this allows training of ranking functions even in settings where queries do not repeat. Beyond the theoretical support, we show empirically that the proposed learning method is highly effective in dealing with biases, that it is robust to noise and propensity model misspecification, and that it scales efficiently. We also demonstrate the real-world applicability of our approach on an operational search engine, where it substantially improves retrieval performance.
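The propensity-weighting idea can be sketched in a few lines. The position-bias model below (examination probability decaying as (1/rank)^eta) and all names are illustrative assumptions, not the paper's implementation.

```python
# Sketch of inverse-propensity weighting for click data: each click
# is up-weighted by 1 / P(its rank was examined), so that training on
# the weighted clicks is unbiased under the position-bias model.
# The (1/rank)**eta propensity curve is an assumed example model.

def examination_propensity(rank, eta=1.0):
    return (1.0 / rank) ** eta

def ips_weighted_clicks(clicks):
    """clicks: iterable of (query, doc, rank_at_click) tuples.
    Returns (query, doc, weight) training examples for, e.g., a
    pairwise ranker such as a Ranking SVM."""
    return [(q, d, 1.0 / examination_propensity(r))
            for (q, d, r) in clicks]
```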


International World Wide Web Conference (WWW) | 2016

Using Shortlists to Support Decision Making and Improve Recommender System Performance

Tobias Schnabel; Paul N. Bennett; Susan T. Dumais

In this paper, we study shortlists as an interface component for recommender systems with the dual goal of supporting the user's decision process and improving implicit feedback elicitation for increased recommendation quality. A shortlist is a temporary list of candidates that the user is currently considering, e.g., a few movies the user is deciding between for viewing. From a cognitive perspective, shortlists serve as digital short-term memory where users can off-load the items under consideration -- thereby decreasing their cognitive load. From a machine learning perspective, adding items to the shortlist generates a new implicit feedback signal as a by-product of exploration and decision making, which can improve recommendation quality. Shortlisting therefore provides additional data for training recommendation systems without the increase in cognitive load that requesting explicit feedback would incur. We perform a user study with a movie recommendation setup to compare interfaces that offer shortlist support with those that do not. From the user study we conclude: (i) users make better decisions with a shortlist; (ii) users prefer an interface with shortlist support; and (iii) the additional implicit feedback from sessions with a shortlist improves the quality of recommendations by nearly a factor of two.
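As a purely illustrative sketch of the by-product feedback signal (not the paper's pipeline), shortlist additions can be logged alongside clicks and folded into per-user-item implicit feedback; the event names and weights below are assumptions.

```python
# Illustrative only: treating shortlist additions as an additional
# implicit feedback event next to clicks. Event names and weights
# are assumed for the sketch.
from collections import defaultdict

EVENT_WEIGHT = {"click": 1.0, "shortlist_add": 2.0}

def aggregate_feedback(events):
    """events: iterable of (user, item, event_type).
    Returns per-(user, item) implicit feedback scores for training."""
    scores = defaultdict(float)
    for user, item, kind in events:
        scores[(user, item)] += EVENT_WEIGHT.get(kind, 0.0)
    return dict(scores)
```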


International Conference on the Theory of Information Retrieval (ICTIR) | 2016

Unbiased Comparative Evaluation of Ranking Functions

Tobias Schnabel; Adith Swaminathan; Peter I. Frazier

Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing k systems against a baseline, and ranking k systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they often cut the number of required relevance judgments at least in half.
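Under the Monte Carlo view, a minimal unbiased estimator can be sketched as follows, assuming the metric decomposes into per-document gains g(d) and judged documents are drawn i.i.d. from a known sampling distribution p; this is a sketch of the general idea, not the paper's estimators.

```python
# Minimal sketch of the Monte Carlo view: if a metric is a sum of
# per-document gains g(d) and judged documents are drawn i.i.d. from
# a known distribution p, importance weighting gives an unbiased
# estimate of the full sum, since E[g(d)/p(d)] = sum_d g(d).
import random

def estimate_metric(docs, p, gain, n_judgments):
    """Unbiased estimate of sum(gain(d) for d in docs) using only
    n_judgments sampled relevance judgments."""
    sampled = random.choices(docs, weights=[p[d] for d in docs],
                             k=n_judgments)
    return sum(gain(d) / p[d] for d in sampled) / n_judgments
```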


Empirical Methods in Natural Language Processing (EMNLP) | 2015

Online Updating of Word Representations for Part-of-Speech Tagging

Wenpeng Yin; Tobias Schnabel; Hinrich Schütze

We propose online unsupervised domain adaptation (DA), which is performed incrementally as data comes in and is applicable when batch DA is not possible. In a part-of-speech (POS) tagging evaluation, we find that online unsupervised DA performs as well as batch DA.
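Since this work builds on FLORS-style distributional features, the online update can be sketched as incrementally growing neighbor-count vectors as unlabeled target-domain text streams in. This is a sketch under that assumption, not the authors' code.

```python
# Sketch under the assumption of FLORS-style features: a word's
# representation is a count vector over its left/right neighbors, so
# online DA just folds each incoming sentence into the counts.
from collections import Counter, defaultdict

left_counts = defaultdict(Counter)   # word -> left-neighbor counts
right_counts = defaultdict(Counter)  # word -> right-neighbor counts

def update_online(sentence):
    """Incrementally update representations from one new unlabeled
    target-domain sentence; no batch re-processing is needed."""
    tokens = ["<s>"] + sentence + ["</s>"]
    for i in range(1, len(tokens) - 1):
        left_counts[tokens[i]][tokens[i - 1]] += 1
        right_counts[tokens[i]][tokens[i + 1]] += 1
```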


Knowledge Discovery and Data Mining (KDD) | 2017

Effective Evaluation Using Logged Bandit Feedback from Multiple Loggers

Aman Agarwal; Soumya Basu; Tobias Schnabel

Accurately evaluating new policies (e.g. ad-placement models, ranking functions, recommendation functions) is one of the key prerequisites for improving interactive systems. While the conventional approach to evaluation relies on online A/B tests, recent work has shown that counterfactual estimators can provide an inexpensive and fast alternative, since they can be applied offline using log data that was collected from a different policy fielded in the past. In this paper, we address the question of how to estimate the performance of a new target policy when we have log data from multiple historic policies. This question is of great relevance in practice, since policies get updated frequently in most online systems. We show that naively combining data from multiple logging policies can be highly suboptimal. In particular, we find that the standard Inverse Propensity Score (IPS) estimator suffers especially when logging and target policies diverge -- to a point where throwing away data improves the variance of the estimator. We therefore propose two alternative estimators which we characterize theoretically and compare experimentally. We find that the new estimators can provide substantially improved estimation accuracy.
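The naive combination the paper warns about is just the standard IPS estimator run over the concatenated logs; here is a minimal sketch with assumed variable names.

```python
# Minimal sketch of naively pooled IPS over logs from several historic
# policies: every entry is weighted by target / its own logger,
# regardless of which logger produced it. The paper shows this can be
# highly suboptimal when logging and target policies diverge.

def pooled_ips(logs, target_prob):
    """logs: list of (context, action, reward, logging_prob) tuples,
    possibly collected under different logging policies.
    target_prob(x, a): the target policy's probability of action a
    in context x."""
    return sum(r * target_prob(x, a) / p_log
               for (x, a, r, p_log) in logs) / len(logs)
```

The estimators proposed in the paper replace this uniform pooling with non-uniform weighting of each logger's contribution; the specifics are in the paper itself.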


International World Wide Web Conference (WWW) | 2015

Unbiased Ranking Evaluation on a Budget

Tobias Schnabel; Adith Swaminathan

We address the problem of assessing the quality of a ranking system (e.g., search engine, recommender system, review ranker) given a fixed budget for collecting expert judgments. In particular, we propose a method that selects which items to judge in order to optimize the accuracy of the quality estimate. Our method is not only efficient, but also provides estimates that are unbiased --- unlike common approaches that tend to underestimate performance or that have a bias against new systems that are evaluated re-using previous relevance scores.
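One way to make a fixed judging budget count, consistent with the sampling view above, is to set inclusion probabilities proportional to a surrogate for each item's gain, which is the classic variance-minimizing choice in importance sampling. The sketch below rests on that standard result; the surrogate and names are assumptions, not the paper's method.

```python
# Sketch of a variance-aware sampling design: for the estimator
# (1/n) * sum(g(d)/p(d)), variance is minimized when p(d) is
# proportional to |g(d)|. True gains are unknown before judging, so
# an assumed surrogate (e.g., a predicted relevance score) stands in,
# with a probability floor so every item stays sampleable (required
# for unbiasedness).

def budget_sampling_distribution(docs, predicted_gain, floor=1e-3):
    raw = {d: max(predicted_gain(d), floor) for d in docs}
    total = sum(raw.values())
    return {d: v / total for d, v in raw.items()}
```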


Web Search and Data Mining (WSDM) | 2018

Short-Term Satisfaction and Long-Term Coverage: Understanding How Users Tolerate Algorithmic Exploration

Tobias Schnabel; Paul N. Bennett; Susan T. Dumais

Any learning algorithm for recommendation faces a fundamental trade-off between exploiting partial knowledge of a user's interests to maximize satisfaction in the short term and discovering additional user interests to maximize satisfaction in the long term. To enable discovery, a machine learning algorithm typically elicits feedback on items it is uncertain about, which is termed algorithmic exploration in machine learning. This exploration comes with a cost to the user, since the items an algorithm chooses for exploration frequently turn out to not match the user's interests. In this paper, we study how users tolerate such exploration and how presentation strategies can mitigate the exploration cost. To this end, we conduct a behavioral study with over 600 people, where we vary how algorithmic exploration is mixed into the set of recommendations. We find that users respond non-linearly to the amount of exploration, where some exploration mixed into the set of recommendations has little effect on short-term satisfaction and behavior. For long-term satisfaction, the overall goal is to learn via exploration about the items presented. We therefore also analyze the quantity and quality of implicit feedback signals such as clicks and hovers, and how they vary with different amounts of mix-in exploration. Our findings provide insights into how to design presentation strategies for algorithmic exploration in interactive recommender systems, mitigating the short-term costs of algorithmic exploration while aiming to elicit informative feedback data for learning.
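The manipulated variable in the study, i.e., how much exploration is mixed into the slate, can be sketched as follows; the parameter names are assumptions for illustration.

```python
# Illustrative sketch of mix-in exploration: fill a fraction of the
# k recommendation slots with exploration items and interleave them
# at random positions. `explore_frac` is the knob being varied.
import random

def build_slate(exploit_items, explore_items, k=10, explore_frac=0.2):
    n_explore = round(k * explore_frac)
    slate = exploit_items[:k - n_explore] + explore_items[:n_explore]
    random.shuffle(slate)  # interleave rather than segregate
    return slate
```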


International Conference on Machine Learning (ICML) | 2016

Recommendations as Treatments: Debiasing Learning and Evaluation

Tobias Schnabel; Adith Swaminathan; Ashudeep Singh; Navin Chandak


Transactions of the Association for Computational Linguistics (TACL) | 2014

FLORS: Fast and Simple Domain Adaptation for Part-of-Speech Tagging

Tobias Schnabel; Hinrich Schütze
