Adith Swaminathan | Researchain

Publication

Featured research published by Adith Swaminathan.

web search and data mining | 2017

Unbiased Learning-to-Rank with Biased Feedback

Adith Swaminathan; Tobias Schnabel

Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data. Using this framework, we derive a Propensity-Weighted Ranking SVM for discriminative learning from implicit feedback, where click models take the role of the propensity estimator. In contrast to most conventional approaches to de-biasing the data using click models, this allows training of ranking functions even in settings where queries do not repeat. Beyond the theoretical support, we show empirically that the proposed learning method is highly effective in dealing with biases, that it is robust to noise and propensity model misspecification, and that it scales efficiently. We also demonstrate the real-world applicability of our approach on an operational search engine, where it substantially improves retrieval performance.
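
To make the propensity-weighting idea in this abstract concrete, the following is a minimal sketch of an inverse-propensity-weighted rank loss. It assumes that each clicked result comes with a known examination propensity (for example, from a position-based click model). The function name `ips_rank_loss` and its arguments are hypothetical, and this is not necessarily the exact objective optimized by the Propensity-Weighted Ranking SVM.

```python
def ips_rank_loss(ranks, clicked, propensities):
    """Sum of ranks of clicked results, each weighted by 1 / propensity.

    ranks        -- rank assigned to each result by the ranker being evaluated
    clicked      -- True where a click was observed in the logs
    propensities -- assumed probability that the result was examined when logged
    """
    total = 0.0
    for rank, click, propensity in zip(ranks, clicked, propensities):
        if not click:
            continue  # unclicked results contribute nothing to the estimate
        if propensity <= 0:
            raise ValueError("propensities of clicked results must be positive")
        # Clicks on rarely examined results are up-weighted to offset position bias.
        total += rank / propensity
    return total


if __name__ == "__main__":
    print(ips_rank_loss([1, 2, 3], [True, False, True], [1.0, 0.5, 0.25]))
```

Under these assumptions, a click at a rarely examined position counts for more than a click at the top position, which is the intuition behind correcting for position bias.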

conference on information and knowledge management | 2012

Temporal corpus summarization using submodular word coverage

Ruben Sipos; Adith Swaminathan; Pannaga Shivaswamy

In many areas of life, we now have almost complete electronic archives reaching back for well over two decades. This includes, for example, the body of research papers in computer science, all news articles written in the US, and most people's personal email. However, we have only rather limited methods for analyzing and understanding these collections. While keyword-based retrieval systems allow efficient access to individual documents in archives, we still lack methods for understanding a corpus as a whole. In this paper, we explore methods that provide a temporal summary of such corpora in terms of landmark documents, authors, and topics. In particular, we explicitly model the temporal nature of influence between documents and re-interpret summarization as a coverage problem over words anchored in time. The resulting models provide monotone submodular objectives for computing informative and non-redundant summaries over time, which can be efficiently optimized with greedy algorithms. Our empirical study shows the effectiveness of our approach over several baselines.
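
The greedy optimization mentioned in this abstract can be illustrated with plain set coverage, which is a simple monotone submodular objective. The sketch below assumes each document is represented only by its set of words and ignores the temporal anchoring the paper describes. The name `greedy_coverage` and the toy data are hypothetical.

```python
def greedy_coverage(docs, k):
    """Greedily pick up to k documents that maximize the number of covered words.

    docs -- dict mapping a document id to its set of words
    Returns the chosen ids (in selection order) and the set of covered words.
    """
    covered = set()
    chosen = []
    remaining = dict(docs)
    for _ in range(min(k, len(docs))):
        # Select the document with the largest marginal gain in coverage.
        best = max(remaining, key=lambda d: len(remaining[d] - covered))
        if not remaining[best] - covered:
            break  # no remaining document adds a new word
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


if __name__ == "__main__":
    toy = {"a": {"x", "y", "z"}, "b": {"x", "y"}, "c": {"w"}}
    print(greedy_coverage(toy, 2))
```

In the toy example, document "b" is never chosen because every word it contains is already covered by "a", which shows how a coverage objective discourages redundant summaries.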

international world wide web conferences | 2015

Counterfactual Risk Minimization

Adith Swaminathan

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on presented ads). We first address the counterfactual nature of the learning problem through propensity scoring. Next, we derive generalization error bounds that account for the variance of the propensity-weighted empirical risk estimator. These constructive bounds give rise to the Counterfactual Risk Minimization (CRM) principle. Using the CRM principle, we derive a new learning algorithm, Policy Optimizer for Exponential Models (POEM), for structured output prediction. We evaluate POEM on several multi-label classification problems and verify that its empirical performance supports the theory.
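
As a rough illustration of a variance-aware objective of the kind this abstract describes, the sketch below computes a propensity-weighted empirical risk and adds a penalty based on its sample variance. The function name `crm_objective`, the clipping constant `clip`, and the trade-off parameter `lam` are assumptions made for this example. It only evaluates the objective for fixed action probabilities and is not the POEM algorithm itself.

```python
import math


def crm_objective(losses, new_probs, logged_probs, lam=1.0, clip=100.0):
    """Propensity-weighted empirical risk plus a sample-variance penalty.

    losses       -- observed loss for each logged action
    new_probs    -- probability the candidate policy assigns to that action
    logged_probs -- probability the logging policy assigned to it (propensity)
    lam          -- assumed weight of the variance penalty
    clip         -- assumed cap on the importance weights
    """
    n = len(losses)
    weighted = [
        loss * min(clip, p_new / p_log)
        for loss, p_new, p_log in zip(losses, new_probs, logged_probs)
    ]
    mean = sum(weighted) / n
    # Unbiased sample variance of the per-example weighted losses.
    var = sum((w - mean) ** 2 for w in weighted) / (n - 1) if n > 1 else 0.0
    return mean + lam * math.sqrt(var / n)


if __name__ == "__main__":
    print(crm_objective([0.0, 2.0], [0.5, 0.5], [0.5, 0.5]))
```

With `lam=0` the function reduces to the plain propensity-weighted risk estimate, so the penalty term is what favors policies whose estimates are less variable.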

international conference on formal concept analysis | 2014

Mining Videos from the Web for Electronic Textbooks

Rakesh Agrawal; Maria Christoforaki; Sreenivas Gollapudi; Anitha Kannan; Krishnaram Kenthapadi; Adith Swaminathan

We propose a system for mining videos from the web for supplementing the content of electronic textbooks in order to enhance their utility. Textbooks are generally organized into sections such that each section explains very few concepts and every concept is primarily explained in one section. Building upon these principles from the education literature and drawing upon the theory of Formal Concept Analysis, we define the focus of a section in terms of a few indicia, which themselves are combinations of concept phrases uniquely present in the section. We identify videos relevant for a section by ensuring that at least one of the indicia for the section is present in the video and measuring the extent to which the video contains the concept phrases occurring in different indicia for the section. Our user study employing two corpora of textbooks on different subjects from two countries demonstrates that our system is able to find useful videos relevant to individual sections.

international conference on the theory of information retrieval | 2016

Unbiased Comparative Evaluation of Ranking Functions

Tobias Schnabel; Adith Swaminathan; Peter I. Frazier

Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge. Unlike traditional approaches that make this selection deterministically, probabilistic sampling enables the design of estimators that are provably unbiased even when reusing data with missing judgments. In this paper, we first unify and extend these sampling approaches by viewing the evaluation problem as a Monte Carlo estimation task that applies to a large number of common IR metrics. Drawing on the theoretical clarity that this view offers, we tackle three practical evaluation scenarios: comparing two systems, comparing k systems against a baseline, and ranking k systems. For each scenario, we derive an estimator and a variance-optimizing sampling distribution while retaining the strengths of sampling-based evaluation, including unbiasedness, reusability despite missing data, and ease of use in practice. In addition to the theoretical contribution, we empirically evaluate our methods against previously used sampling heuristics and find that they often cut the number of required relevance judgments at least in half.
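
The Monte Carlo view of evaluation in this abstract can be sketched for a metric that is a weighted sum of per-document relevance. The example assumes documents are sampled with replacement from a known distribution and that each sampled judgment is reweighted by the inverse of its sampling probability. The name `sampled_metric_estimate` and its arguments are hypothetical, and the sampling distribution here is not the variance-optimal one derived in the paper.

```python
import random


def sampled_metric_estimate(weights, relevance, sampling_probs, n_samples, rng):
    """Importance-sampling estimate of sum_i weights[i] * relevance[i].

    Only the sampled documents need a relevance judgment; dividing by the
    sampling probability keeps the estimate unbiased.
    """
    docs = list(range(len(weights)))
    draws = rng.choices(docs, weights=sampling_probs, k=n_samples)
    total = sum(weights[d] * relevance[d] / sampling_probs[d] for d in draws)
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # True value of the metric is 0.5 * 1 + 0.5 * 1 = 1.0.
    print(sampled_metric_estimate([0.5, 0.5], [1, 1], [0.5, 0.5], 10, rng))
```

When the sampling probabilities are proportional to each document's contribution to the metric, as in the toy call, every draw yields the same value and the estimate has zero variance. That is the intuition behind choosing the sampling distribution to minimize variance.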

international acm sigir conference on research and development in information retrieval | 2016

Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement

Adith Swaminathan

Online metrics measured through A/B tests have become the gold standard for many evaluation questions. But can we get the same results as A/B tests without actually fielding a new system? And can we train systems to optimize online metrics without subjecting users to an online learning algorithm? This tutorial summarizes and unifies the emerging body of methods on counterfactual evaluation and learning. These counterfactual techniques provide a well-founded way to evaluate and optimize online metrics by exploiting logs of past user interactions. In particular, the tutorial unifies the causal inference, information retrieval, and machine learning view of this problem, providing the basis for future research in this emerging area of great potential impact. Supplementary material and resources are available online at http://www.cs.cornell.edu/~adith/CfactSIGIR2016.

knowledge discovery and data mining | 2013

Beyond myopic inference in big data pipelines

Karthik Raman; Adith Swaminathan; Johannes Gehrke

Big Data Pipelines decompose complex analyses of large data sets into a series of simpler tasks, with independently tuned components for each task. This modular setup allows re-use of components across several different pipelines. However, the interaction of independently tuned pipeline components yields poor end-to-end performance as errors introduced by one component cascade through the whole pipeline, affecting overall accuracy. We propose a novel model for reasoning across components of Big Data Pipelines in a probabilistically well-founded manner. Our key idea is to view the interaction of components as dependencies on an underlying graphical model. Different message passing schemes on this graphical model provide various inference algorithms to trade-off end-to-end performance and computational cost. We instantiate our framework with an efficient beam search algorithm, and demonstrate its efficiency on two Big Data Pipelines: parsing and relation extraction.
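
To illustrate why keeping several candidates per stage can help end-to-end accuracy, here is a minimal beam search over a toy pipeline. It assumes each stage is a function that returns candidate outputs with probabilities and that the joint score is the product of stage probabilities. The name `beam_pipeline` and the toy stages are hypothetical and do not reproduce the graphical-model formulation in the paper.

```python
def beam_pipeline(initial, stages, beam_width):
    """Run an input through the stages, keeping the top beam_width partial results.

    stages -- list of functions; each maps an input to [(output, probability), ...]
    Returns a list of (joint_score, final_output), best first.
    """
    beam = [(1.0, initial)]
    for stage in stages:
        expanded = [
            (score * prob, output)
            for score, current in beam
            for output, prob in stage(current)
        ]
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return beam


if __name__ == "__main__":
    def first(text):
        return [(text + "a", 0.6), (text + "b", 0.4)]

    def second(text):
        # The locally best first-stage output turns out worse downstream.
        return [(text + "!", 0.5 if text.endswith("a") else 1.0)]

    print(beam_pipeline("", [first, second], 1))
    print(beam_pipeline("", [first, second], 2))
```

With a beam width of 1 the pipeline commits to the first stage's top output and ends with "a!", while a width of 2 recovers the better joint result "b!". The extra work is one more downstream call per stage.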

international world wide web conferences | 2015

Unbiased Ranking Evaluation on a Budget

Tobias Schnabel; Adith Swaminathan

We address the problem of assessing the quality of a ranking system (e.g., search engine, recommender system, review ranker) given a fixed budget for collecting expert judgments. In particular, we propose a method that selects which items to judge in order to optimize the accuracy of the quality estimate. Our method is not only efficient, but also provides estimates that are unbiased, unlike common approaches that tend to underestimate performance or that have a bias against new systems that are evaluated re-using previous relevance scores.

conference on recommender systems | 2018

REVEAL 2018: offline evaluation for recommender systems

Adith Swaminathan; Yves Raimond; Olivier Koch; Flavian Vasile

The inaugural REVEAL workshop focuses on revisiting the offline evaluation problem for recommender systems. Being able to perform offline experiments is key to rapid innovation; however, practitioners often observe significant differences between offline results and the outcome of an online experiment, where users are actually exposed to the resulting recommendations. This is unfortunate because online experiments take time, can be costly, and require access to a live recommender system, whereas offline experiments are inherently scalable. How can we bridge that gap between offline and online experiments?

international conference on machine learning | 2015