John Langford | Researchain

Publication

Featured research published by John Langford.

International World Wide Web Conferences | 2010

A contextual-bandit approach to personalized news article recommendation

Lihong Li; Wei Chu; John Langford; Robert E. Schapire

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are fast in both learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data becomes scarcer.
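The offline evaluation idea in this abstract (scoring a bandit algorithm on previously recorded random traffic) can be illustrated with a small replay loop. The following is only a sketch, assuming the logged actions were chosen uniformly at random; the event layout, `FixedPolicy`, and `replay_estimate` are hypothetical names introduced for illustration and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical log layout: (context, logged_action, reward).
Event = Tuple[int, int, float]


class FixedPolicy:
    """Illustrative stand-in policy: maps each context to a fixed action."""

    def __init__(self, table: Dict[int, int]) -> None:
        self.table = table

    def choose(self, context: int) -> int:
        return self.table[context]

    def update(self, context: int, action: int, reward: float) -> None:
        # A learning policy would adapt here; this stand-in never changes.
        pass


def replay_estimate(policy, events: Iterable[Event]) -> float:
    """Average reward over logged events whose action matches the policy's choice.

    Assumes the logged actions were drawn uniformly at random, so the
    retained events form an unbiased sample of what the policy would see.
    """
    matched = 0
    total = 0.0
    for context, logged_action, reward in events:
        if policy.choose(context) != logged_action:
            continue  # discard events the policy would not have produced
        policy.update(context, logged_action, reward)
        matched += 1
        total += reward
    if matched == 0:
        raise ValueError("no logged event matched the policy's choices")
    return total / matched
```

Only matched events are fed back to the policy, so a learning policy sees the same kind of history it would have seen online, at the cost of discarding the unmatched part of the log.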

Machine Learning | 2009

Search-based structured prediction

Harold C. Daumé; John Langford; Daniel Marcu

We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.

International Conference on Machine Learning | 2009

Importance weighted active learning

Alina Beygelzimer; Sanjoy Dasgupta; John Langford

We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.
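The role of importance weighting can be shown with a short sketch: a label that is queried with probability p is kept with weight 1/p, so the weighted loss stays unbiased for the loss over all examples seen. How the query probabilities are chosen is the paper's contribution and is not reproduced here; `query_prob`, `sample_with_importance_weights`, and `weighted_error_rate` are hypothetical names used only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Each kept item: (index, label, importance weight).
Kept = Tuple[int, int, float]


def sample_with_importance_weights(
    labels: Sequence[int],
    query_prob: Callable[[int], float],
    rng: random.Random,
) -> List[Kept]:
    """Query label t with probability query_prob(t); weight kept labels by 1/p."""
    kept: List[Kept] = []
    for t, y in enumerate(labels):
        p = query_prob(t)
        if not 0.0 < p <= 1.0:
            raise ValueError("query probability must lie in (0, 1]")
        if rng.random() < p:
            kept.append((t, y, 1.0 / p))
    return kept


def weighted_error_rate(
    kept: Sequence[Kept],
    predict: Callable[[int], int],
    n_seen: int,
) -> float:
    """Importance-weighted 0/1 error, normalised by all examples seen."""
    return sum(w for t, y, w in kept if predict(t) != y) / n_seen
```

With every query probability equal to 1 this reduces to the ordinary error rate; smaller probabilities save labels but increase the variance of the estimate, which is the quantity the abstract says must be controlled.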

International Conference on Machine Learning | 2006

PAC model-free reinforcement learning

Alexander L. Strehl; Lihong Li; Eric Wiewiora; John Langford; Michael L. Littman

For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm, Delayed Q-learning. We prove it is PAC, achieving near optimal performance except for Õ(SA) timesteps using O(SA) space, improving on the Õ(S²A) bounds of best previous algorithms. This result proves efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience: neither resets nor parallel sampling is used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.

Algorithmic Learning Theory | 2009

Error-correcting tournaments

Alina Beygelzimer; John Langford; Pradeep Ravikumar

We present a family of pairwise tournaments reducing k-class classification to binary classification. These reductions are provably robust against a constant fraction of binary errors, and match the best possible computation and regret up to a constant.
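To make the reduction concrete, here is a sketch of the simplest pairwise tournament: plain single elimination, in which a binary classifier decides each match and the surviving label is the k-class prediction. This is a deliberately simplified stand-in, not the paper's construction; single elimination has none of the error-correcting redundancy the abstract describes. `tournament_predict` and `classify` are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")


def tournament_predict(
    x: X,
    labels: Sequence[int],
    classify: Callable[[X, int, int], int],
) -> int:
    """Predict one of `labels` for input x via a single-elimination tournament.

    classify(x, a, b) stands in for a learned binary classifier and must
    return either a or b. With k labels it is called k - 1 times.
    """
    if not labels:
        raise ValueError("at least one label is required")
    current: List[int] = list(labels)
    while len(current) > 1:
        next_round: List[int] = []
        # Pair up adjacent labels; the winner of each match advances.
        for i in range(0, len(current) - 1, 2):
            next_round.append(classify(x, current[i], current[i + 1]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # odd label out gets a bye
        current = next_round
    return current[0]
```

A single wrong match on the true label's path eliminates it here, which is the weakness that motivates tournaments robust to a constant fraction of binary errors.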

Algorithmica | 2010

Maintaining Equilibria During Exploration in Sponsored Search Auctions

John Langford; Lihong Li; Yevgeniy Vorobeychik; Jennifer Wortman

We introduce an exploration scheme aimed at learning advertiser click-through rates in sponsored search auctions with minimal effect on advertiser incentives. The scheme preserves both the current ranking and pricing policies of the search engine and only introduces one set of parameters which control the rate of exploration. These parameters can be set so as to allow enough exploration to learn advertiser click-through rates over time, but also eliminate incentives for advertisers to alter their currently submitted bids. When advertisers have much more information than the search engine, we show that although this goal is not achievable, incentives to deviate can be made arbitrarily small by appropriately setting the exploration rate. Given that advertisers do not alter their bids, we bound revenue loss due to exploration.

Electronic Commerce | 2008

Self-financed wagering mechanisms for forecasting

Nicolas S. Lambert; John Langford; Jennifer Wortman; Yiling Chen; Daniel M. Reeves; Yoav Shoham; David M. Pennock

We examine a class of wagering mechanisms designed to elicit truthful predictions from a group of people without requiring any outside subsidy. We propose a number of desirable properties for wagering mechanisms, identifying one mechanism - weighted-score wagering - that satisfies all of the properties. Moreover, we show that a single-parameter generalization of weighted-score wagering is the only mechanism that satisfies these properties. We explore some variants of the core mechanism based on practical considerations.
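The abstract names weighted-score wagering without stating its payoff rule. The sketch below uses an assumed form: each player receives their wager times one plus the difference between their own score and the wager-weighted average score, with a quadratic (Brier-style) score scaled to [0, 1]. Both the formula and the function names are assumptions for illustration and should be checked against the paper before being relied on.

```python
from typing import List, Sequence


def brier_score(report: float, outcome: int) -> float:
    """Quadratic score scaled to [0, 1]; higher is better (assumed scoring rule)."""
    return 1.0 - (outcome - report) ** 2


def weighted_score_payoffs(
    wagers: Sequence[float],
    reports: Sequence[float],
    outcome: int,
) -> List[float]:
    """Assumed payoff rule: wager * (1 + own score - wager-weighted mean score)."""
    if len(wagers) != len(reports):
        raise ValueError("wagers and reports must have the same length")
    total_wager = sum(wagers)
    if total_wager <= 0:
        raise ValueError("total wager must be positive")
    scores = [brier_score(p, outcome) for p in reports]
    mean_score = sum(m * s for m, s in zip(wagers, scores)) / total_wager
    return [m * (1.0 + s - mean_score) for m, s in zip(wagers, scores)]
```

Under this form the payoffs always sum to the total amount wagered, which is the "no outside subsidy" property the abstract emphasises, and no player can lose more than their own wager because scores lie in [0, 1].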

Statistical Science | 2014

Doubly Robust Policy Evaluation and Optimization

Miroslav Dudík; Dumitru Erhan; John Langford; Lihong Li

We study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the action chosen by the decision maker. This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance. In this work, we leverage the strengths and overcome the weaknesses of the two approaches by applying the doubly robust estimation technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that doubly robust estimation uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice in policy evaluation and optimization.
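The way doubly robust estimation combines the two earlier approaches can be sketched for a deterministic target policy: start from the reward model's prediction for the policy's action, then add an importance-weighted correction whenever the logged action agrees with the policy. The log layout and the names `doubly_robust_value`, `policy`, and `reward_model` are hypothetical, chosen only for this illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical log layout: (context, logged_action, reward, propensity),
# where propensity is the past policy's probability of the logged action.
LogEntry = Tuple[int, int, float, float]


def doubly_robust_value(
    logs: Sequence[LogEntry],
    policy: Callable[[int], int],
    reward_model: Callable[[int, int], float],
) -> float:
    """Doubly robust estimate of the average reward of a deterministic policy."""
    if not logs:
        raise ValueError("at least one logged entry is required")
    total = 0.0
    for context, action, reward, propensity in logs:
        chosen = policy(context)
        # Model-based term: predicted reward of the action the policy picks.
        estimate = reward_model(context, chosen)
        if chosen == action:
            # Correction term: importance-weighted residual of the model.
            estimate += (reward - reward_model(context, action)) / propensity
        total += estimate
    return total / len(logs)
```

With a reward model that always returns zero this reduces to the inverse-propensity estimate, and with a perfect reward model the correction term vanishes, which mirrors the "either model is good" condition in the abstract.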

Workshop on Internet and Network Economics | 2007

Maintaining equilibria during exploration in sponsored search auctions

Jennifer Wortman; Yevgeniy Vorobeychik; Lihong Li; John Langford

We introduce an exploration scheme aimed at learning advertiser click-through rates in sponsored search auctions with minimal effect on advertiser incentives. The scheme preserves both the current ranking and pricing policies of the search engine and only introduces one parameter which controls the rate of exploration. This parameter can be set so as to allow enough exploration to learn advertiser click-through rates over time, but also eliminate incentives for advertisers to alter their currently submitted bids. When advertisers have much more information than the search engine, we show that although this goal is not achievable, incentives to deviate can be made arbitrarily small by appropriately setting the exploration rate. Given that advertisers do not alter their bids, we bound revenue loss due to exploration.

Journal of Economic Theory | 2015

An axiomatic characterization of wagering mechanisms

Nicolas S. Lambert; John Langford; Jennifer Wortman Vaughan; Yiling Chen; Daniel M. Reeves; Yoav Shoham; David M. Pennock

We construct a budget-balanced wagering mechanism that flexibly extracts information about event probabilities, as well as the mean, median, and other statistics from a group of individuals whose beliefs are immutable to the actions of others. We show how our mechanism, called the Brier betting mechanism, arises naturally from a modified parimutuel betting market. We prove that it is essentially the unique wagering mechanism that is anonymous, proportional, sybilproof, and homogeneous. While the Brier betting mechanism is designed for individuals with immutable beliefs, we find that it continues to perform well even for Bayesian individuals who learn from the actions of others.
