
Publication


Featured research published by Martin Zinkevich.


neural information processing systems | 2009

Monte Carlo Sampling for Regret Minimization in Extensive Games

Marc Lanctot; Kevin Waugh; Martin Zinkevich; Michael H. Bowling

Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent, sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new, tighter bound on the regret of the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
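The building block shared by CFR and MCCFR is the regret-matching update: play each action in proportion to its accumulated positive regret. A minimal sketch follows, using a hypothetical toy setting (one player best-responding to a fixed biased-coin opponent), not the paper's implementation:

```python
def regret_matching(cumulative_regret):
    """Map cumulative positive regrets to a strategy (uniform if none positive)."""
    positives = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    return [1.0 / len(cumulative_regret)] * len(cumulative_regret)

# Toy example: matching pennies against a fixed opponent who plays
# heads with probability 0.7; our actions are (heads, tails).
cumulative_regret = [0.0, 0.0]
avg_strategy = [0.0, 0.0]
T = 10000
for t in range(T):
    strategy = regret_matching(cumulative_regret)
    opp_heads = 0.7
    # Expected utility of each pure action against the opponent's mix.
    action_utils = [opp_heads * 1 + (1 - opp_heads) * -1,
                    opp_heads * -1 + (1 - opp_heads) * 1]
    expected = sum(p * u for p, u in zip(strategy, action_utils))
    for a in range(2):
        cumulative_regret[a] += action_utils[a] - expected
        avg_strategy[a] += strategy[a]

avg_strategy = [s / T for s in avg_strategy]
# The time-averaged strategy converges toward the best response (heads).
```

In self-play the same update runs for both players at every information set, with counterfactual values in place of the fixed `action_utils`; MCCFR replaces the exact values with sampled estimates.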


knowledge discovery and data mining | 2011

Unbiased online active learning in data streams

Wei Chu; Martin Zinkevich; Lihong Li; Achint Thomas; Belle L. Tseng

Unlabeled samples can be intelligently selected for labeling to minimize classification error. In many real-world applications, a large number of unlabeled samples arrive in a streaming manner, making it impossible to maintain all the data in a candidate pool. In this work, we focus on binary classification problems and study selective labeling in data streams where a decision is required on each sample sequentially. We consider the unbiasedness property in the sampling process, and design optimal instrumental distributions to minimize the variance in the stochastic process. Meanwhile, Bayesian linear classifiers with weighted maximum likelihood are optimized online to estimate parameters. In empirical evaluation, we collect a data stream of user-generated comments on a commercial news portal in 30 consecutive days, and carry out offline evaluation to compare various sampling strategies, including unbiased active learning, biased variants, and random sampling. Experimental results verify the usefulness of online active learning, especially in the non-stationary situation with concept drift.
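The unbiasedness idea — query each streaming sample's label with some instance-dependent probability and reweight queried samples by the inverse of that probability — can be sketched as follows. The uncertainty-based sampling rule and all names here are hypothetical illustrations, not the paper's optimal instrumental distribution:

```python
import random

def query_probability(score, floor=0.1):
    """Query labels more often near the decision boundary (score ~ 0).
    The probability floor keeps every sample's query chance nonzero,
    which is what keeps the inverse-probability estimator unbiased."""
    return max(floor, 1.0 - min(abs(score), 1.0))

random.seed(0)
weighted_examples = []  # (x, y, importance weight) for weighted ML training
for _ in range(1000):
    x = random.uniform(-1, 1)                       # streaming sample
    y = 1 if x + random.gauss(0, 0.3) > 0 else 0    # true label, revealed only if queried
    score = x                                       # hypothetical classifier margin
    p = query_probability(score)
    if random.random() < p:
        # Weight 1/p makes the weighted loss an unbiased estimate of the
        # loss over the full stream, despite the biased querying.
        weighted_examples.append((x, y, 1.0 / p))
```

A weighted-maximum-likelihood classifier would then be updated online from `weighted_examples` as they arrive, with no candidate pool retained.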


international joint conference on artificial intelligence | 2011

Accelerating best response calculation in large extensive games

Michael Johanson; Kevin Waugh; Michael H. Bowling; Martin Zinkevich

One fundamental evaluation criterion of an AI technique is its worst-case performance. For static strategies in extensive games, this can be computed using a best response computation. Conventionally, this requires a full game tree traversal. For very large games, such as poker, that traversal is infeasible to perform on modern hardware. In this paper, we detail a general technique for best response computations that can often avoid a full game tree traversal. Additionally, our method is particularly well-suited for parallel environments. We apply this approach to computing the worst-case performance of a number of strategies in heads-up limit Texas hold'em, which, prior to this work, was not possible. We explore these results thoroughly as they provide insight into the effects of abstraction on worst-case performance in large imperfect information games. This is a topic that has received much attention, but could not previously be examined outside of toy domains.
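The conventional full-traversal computation this paper sets out to accelerate can be sketched on a tiny hypothetical tree: maximize at the best responder's decision nodes, take the expectation under the fixed opponent strategy at the opponent's nodes. The tree encoding and names below are illustrative assumptions:

```python
def best_response_value(node, opponent_strategy):
    """Worst-case evaluation of a fixed opponent strategy by exhaustive
    traversal. The cost is linear in the tree size, which is exactly what
    makes this infeasible for games as large as poker."""
    kind, payload = node
    if kind == "leaf":
        return payload  # payoff to the best responder
    if kind == "us":
        # Best responder: pick the child maximizing our value.
        return max(best_response_value(c, opponent_strategy) for c in payload)
    # Opponent node: payload is (information-set name, children).
    infoset, children = payload
    probs = opponent_strategy[infoset]
    return sum(p * best_response_value(c, opponent_strategy)
               for p, c in zip(probs, children))

# Tiny hypothetical game: the opponent moves first, then we respond.
tree = ("opp", ("root", [
    ("us", [("leaf", 1.0), ("leaf", -2.0)]),
    ("us", [("leaf", 0.0), ("leaf", 3.0)]),
]))
strategy = {"root": [0.5, 0.5]}
value = best_response_value(tree, strategy)  # 0.5*max(1,-2) + 0.5*max(0,3) = 2.0
```

In an imperfect-information game the maximization must be taken per information set rather than per node, which is where the paper's pruning and parallelization come in.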


international world wide web conferences | 2009

Adaptive bidding for display advertising

Arpita Ghosh; Benjamin I. P. Rubinstein; Sergei Vassilvitskii; Martin Zinkevich

Motivated by the emergence of auction-based marketplaces for display ads such as the Right Media Exchange, we study the design of a bidding agent that implements a display advertising campaign by bidding in such a marketplace. The bidding agent must acquire a given number of impressions with a given target spend, when the highest external bid in the marketplace is drawn from an unknown distribution P. The quantity and spend constraints arise from the fact that display ads are usually sold on a CPM basis. We consider both the full information setting, where the winning price in each auction is announced publicly, and the partially observable setting where only the winner obtains information about the distribution; these differ in the penalty incurred by the agent while attempting to learn the distribution. We provide algorithms for both settings, and prove performance guarantees using bounds on uniform closeness from statistics, and techniques from online learning. We experimentally evaluate these algorithms: both algorithms perform very well with respect to both target quantity and spend; further, our algorithm for the partially observable case performs nearly as well as that for the fully observable setting despite the higher penalty incurred during learning.
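In the full information setting, the agent observes every winning price and can estimate the unknown distribution P empirically. A minimal sketch of that idea (hypothetical names and rule, not the paper's algorithm): choose the cheapest bid whose empirical win rate meets the rate needed to hit the impression target.

```python
def choose_bid(observed_prices, target_win_rate):
    """Pick the smallest observed price level whose empirical win
    probability (fraction of past highest external bids we would beat)
    meets the win rate required to hit the impression goal."""
    prices = sorted(observed_prices)
    n = len(prices)
    for bid in prices:
        # Rough sketch: count past prices <= bid as wins (ties included).
        win_rate = sum(1 for p in prices if p <= bid) / n
        if win_rate >= target_win_rate:
            return bid
    return prices[-1]

history = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
bid = choose_bid(history, target_win_rate=0.5)  # 2.0: beats half of past prices
```

Bidding at this level keeps expected spend low subject to the quantity constraint; in the partially observable setting the agent only learns prices on auctions it wins, so estimating the distribution incurs the extra exploration penalty the abstract describes.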


conference on information and knowledge management | 2011

Learning to target: what works for behavioral targeting

Sandeep Pandey; Mohamed Aly; Abraham Bagherjeiran; Andrew O. Hatch; Peter Ciccolo; Adwait Ratnaparkhi; Martin Zinkevich

Understanding what interests and delights users is critical to effective behavioral targeting, especially in information-poor contexts. As users interact with content and advertising, their passive behavior can reveal their interests towards advertising. Two issues are critical for building effective targeting methods: what metric to optimize for and how to optimize it. More specifically, we first attempt to understand what the learning objective should be for behavioral targeting so as to maximize advertisers' performance. While most popular advertising methods optimize for user clicks, as we will show, maximizing clicks does not necessarily imply maximizing purchase activities or transactions, called conversions, which directly translate to advertisers' revenue. In this work we focus on conversions, which make for a more relevant but also more challenging metric. Second is the issue of how to represent and combine the plethora of user activities, such as search queries, page views, and ad clicks, to perform the targeting. We investigate several sources of user activities as well as methods for inferring conversion likelihood given the activities. We also explore the role played by the temporal aspect of user activities for targeting, e.g., how recent activities compare to older ones. Based on a rigorous offline empirical evaluation over 200 individual advertising campaigns, we arrive at what we believe are best practices for behavioral targeting. We deploy our approach over live user traffic to demonstrate its superiority over existing state-of-the-art targeting methods.


Sigecom Exchanges | 2011

The lemonade stand game competition: solving unsolvable games

Martin Zinkevich; Michael H. Bowling; Michael Wunder

In December 2009 and November 2010, the first and second Lemonade Stand game competitions were held. In each competition, 9 teams competed, from the University of Southampton, University College London, Yahoo!, Rutgers, Carnegie Mellon, Brown, Princeton, and elsewhere. The competition, in the spirit of Axelrod's iterated prisoner's dilemma competition, which addressed whether or not you should cooperate, asks the questions: how should you cooperate, and with whom? The third competition (whose results will be announced at IJCAI 2011) is open for submissions until July 1st, 2011.


international conference on machine learning | 2012

On Local Regret

Michael H. Bowling; Martin Zinkevich

Online learning aims to perform nearly as well as the best hypothesis in hindsight. For some hypothesis classes, though, even finding the best hypothesis offline is challenging. In such offline cases, local search techniques are often employed and only local optimality is guaranteed. For online decision-making with such hypothesis classes, we introduce local regret, a generalization of regret that aims to perform nearly as well as only nearby hypotheses. We then present a general algorithm to minimize local regret with arbitrary locality graphs. We also show how the graph structure can be exploited to drastically speed learning. These algorithms are then demonstrated on a diverse set of online problems: online disjunct learning, online Max-SAT, and online decision tree learning.


neural information processing systems | 2010

Parallelized Stochastic Gradient Descent

Martin Zinkevich; Markus Weimer; Lihong Li; Alexander J. Smola
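No abstract appears above, but the scheme the title refers to can be sketched as data-parallel SGD with a single round of communication: each worker runs plain SGD on its own partition of the data, and only the final parameters are averaged. The single-process simulation below, with a hypothetical 1-D least-squares objective, is an illustrative assumption rather than the paper's experiment:

```python
import random

def sgd(data, steps, lr=0.05, w0=0.0):
    """Plain SGD on the 1-D least-squares objective (w - y)^2 / 2."""
    w = w0
    for _ in range(steps):
        y = random.choice(data)
        w -= lr * (w - y)  # gradient of the squared error at the sample
    return w

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(10000)]
k = 4                              # number of simulated workers
shards = [data[i::k] for i in range(k)]
# Each worker runs SGD independently on its shard; the only communication
# is averaging the k final parameter values at the end.
w_avg = sum(sgd(shard, steps=2000) for shard in shards) / k
```

The appeal of this layout is that it needs no locking and no per-step network traffic, at the cost of each worker seeing only a slice of the data.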


neural information processing systems | 2009

Slow Learners are Fast

Martin Zinkevich; John Langford; Alexander J. Smola


symposium on abstraction, reformulation and approximation | 2009

A Practical Use of Imperfect Recall

Kevin Waugh; Martin Zinkevich; Michael Johanson; Morgan Kan; David Schnizlein; Michael H. Bowling

Collaboration


Dive into Martin Zinkevich's collaborations.

Top Co-Authors


Kevin Waugh

Carnegie Mellon University
