
Publication


Featured research published by Eyal Even-Dar.


European Conference on Computational Learning Theory | 2004

Learning Rates for Q-learning

Eyal Even-Dar; Yishay Mansour

In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one that is 1/t^ω at time t where ω ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1−γ), where γ is the discount factor. In contrast, we show that for a linear learning rate, one that is 1/t at time t, the convergence rate has an exponential dependence on 1/(1−γ). In addition, we give a simple example showing that this exponential behavior is inherent for linear learning rates.
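
To see the two learning-rate regimes concretely, the following is a minimal tabular Q-learning sketch, not code from the paper; the Gymnasium-style env.reset()/env.step() interface over a discrete state and action space, the ε-greedy exploration, and the particular ω are illustrative assumptions.

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500, gamma=0.9, omega=0.8):
        # Tabular Q-learning with a polynomial learning rate alpha_t = 1 / t^omega,
        # where t counts visits to the (state, action) pair; omega in (1/2, 1) is the
        # polynomial-rate regime discussed above, while 1/t would be the linear rate.
        Q = np.zeros((n_states, n_actions))
        visits = np.zeros((n_states, n_actions))

        for _ in range(episodes):
            s, _ = env.reset()
            done = False
            while not done:
                # epsilon-greedy action choice (illustrative exploration scheme)
                if np.random.rand() < 0.1:
                    a = env.action_space.sample()
                else:
                    a = int(np.argmax(Q[s]))
                s_next, r, terminated, truncated, _ = env.step(a)
                done = terminated or truncated

                visits[s, a] += 1
                alpha = 1.0 / (visits[s, a] ** omega)   # polynomial learning rate
                # alpha = 1.0 / visits[s, a]            # linear rate: exponentially slower in 1/(1 - gamma)

                target = r + (0.0 if done else gamma * np.max(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
        return Q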


Symposium on Discrete Algorithms | 2006

On nash equilibria for a network creation game

Susanne Albers; Stefan Eilts; Eyal Even-Dar; Yishay Mansour; Liam Roditty

We study a network creation game recently proposed by Fabrikant, Luthra, Maneva, Papadimitriou and Shenker. In this game, each player (vertex) can create links (edges) to other players at a cost of α per edge. The goal of every player is to minimize the sum consisting of (a) the cost of the links he has created and (b) the sum of the distances to all other players. Fabrikant et al. conjectured that there exists a constant A such that, for any α > A, all non-transient Nash equilibria graphs are trees. They showed that if a Nash equilibrium is a tree, the price of anarchy is constant. In this paper we disprove the tree conjecture. More precisely, we show that for any positive integer n₀, there exists a graph built by n ≥ n₀ players which contains cycles and forms a non-transient Nash equilibrium, for any α with 1 < α ≤ √n/2. Our construction makes use of some interesting results on finite affine planes. On the other hand, we show that, for α ≥ 12n⌈log n⌉, every Nash equilibrium forms a tree. Without relying on the tree conjecture, Fabrikant et al. proved an upper bound on the price of anarchy of O(√α), where α ∈ [2, n²]. We improve this bound. Specifically, we derive a constant upper bound for α ∈ O(√n) and for α ≥ 12n⌈log n⌉. For the intermediate values we derive an improved bound of O(1 + (min{α²/n, n²/α})^(1/3)). Additionally, we develop characterizations of Nash equilibria and extend our results to a weighted network creation game as well as to scenarios with cost sharing.
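
As a concrete illustration of the cost a player minimizes in this game, here is a small sketch using networkx; the bought_edges bookkeeping and the star example are illustrative assumptions, not taken from the paper.

    import networkx as nx

    def player_cost(G, bought_edges, v, alpha):
        # Cost of player v: alpha per edge that v created, plus the sum of graph
        # distances from v to all other players. bought_edges[v] is the set of
        # neighbours whose connecting edge v pays for (a bookkeeping assumption).
        creation_cost = alpha * len(bought_edges.get(v, ()))
        dist = nx.single_source_shortest_path_length(G, v)
        usage_cost = sum(dist.get(u, float("inf")) for u in G.nodes if u != v)
        return creation_cost + usage_cost

    # Example: a star with centre 0, where the centre pays for all edges.
    # G = nx.star_graph(4)
    # bought = {0: {1, 2, 3, 4}}
    # print(player_cost(G, bought, 0, alpha=2.0), player_cost(G, bought, 1, alpha=2.0))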


International Colloquium on Automata, Languages and Programming | 2003

Convergence time to Nash equilibria

Eyal Even-Dar; Alexander Kesselman; Yishay Mansour

We study the number of steps required to reach a pure Nash equilibrium in a load balancing scenario where each job behaves selfishly and attempts to migrate to a machine which will minimize its cost. We consider a variety of load balancing models, including identical, restricted, related and unrelated machines. Our results have a crucial dependence on the weights assigned to jobs. We consider arbitrary weights, integer weights, K distinct weights and identical (unit) weights. We look both at arbitrary schedules (where the only restriction is that a job migrates to a machine which lowers its cost) and at specific efficient schedulers (such as allowing the largest-weight job to move first).
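
The migration process described above can be sketched as simple best-response dynamics on identical machines; the all-on-one-machine starting schedule and the largest-weight-first ordering below are illustrative choices, not the paper's exact setup.

    def best_response_dynamics(weights, num_machines):
        # Selfish rescheduling on identical machines: repeatedly let a job move to a
        # machine that strictly lowers its cost (the load of the machine it sits on)
        # until no job wants to move, i.e. a pure Nash equilibrium is reached.
        assignment = {j: 0 for j in range(len(weights))}   # arbitrary initial schedule
        load = [0.0] * num_machines
        load[0] = sum(weights)

        steps, moved = 0, True
        while moved:
            moved = False
            # largest-weight job considered first (one of the schedulers mentioned above)
            for j in sorted(range(len(weights)), key=lambda i: -weights[i]):
                cur = assignment[j]
                # cost of job j on machine m is the resulting load of that machine
                best = min(range(num_machines),
                           key=lambda m: load[m] + (0 if m == cur else weights[j]))
                if best != cur and load[best] + weights[j] < load[cur]:
                    load[cur] -= weights[j]
                    load[best] += weights[j]
                    assignment[j] = best
                    moved = True
                    steps += 1
        return assignment, steps

    # Example: 5 jobs on 2 identical machines.
    # assignment, steps = best_response_dynamics([3.0, 2.0, 2.0, 1.0, 1.0], 2)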


Conference on Learning Theory | 2002

PAC Bounds for Multi-armed Bandit and Markov Decision Processes

Eyal Even-Dar; Shie Mannor; Yishay Mansour

The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O((n/ε²) log(n/δ)). We derive another algorithm whose complexity depends on the specific setting of the rewards, rather than the worst case setting. We also provide a matching lower bound. We show how, given an algorithm for the multi-armed bandit problem in the PAC model, one can derive a batch learning algorithm for Markov decision processes. This is done essentially by simulating Value Iteration, and in each iteration invoking the multi-armed bandit algorithm. Using our PAC algorithm for the multi-armed bandit problem we improve the dependence on the number of actions.
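
The flavor of the elimination-based approach behind the O((n/ε²) log(1/δ)) bound can be sketched as follows; the constants and the pull(arm) interface are illustrative assumptions, not taken verbatim from the paper.

    import math
    import random

    def median_elimination(pull, n_arms, eps, delta):
        # Return an eps-optimal arm with probability at least 1 - delta, using on the
        # order of (n / eps^2) * log(1 / delta) pulls in total. pull(arm) is an assumed
        # interface returning a reward in [0, 1]; the constants below are illustrative.
        arms = list(range(n_arms))
        eps_l, delta_l = eps / 4.0, delta / 2.0
        while len(arms) > 1:
            samples = int(math.ceil((4.0 / eps_l ** 2) * math.log(3.0 / delta_l)))
            means = {a: sum(pull(a) for _ in range(samples)) / samples for a in arms}
            # keep the better half of the surviving arms (drop those below the median)
            arms = sorted(arms, key=lambda a: means[a], reverse=True)[:(len(arms) + 1) // 2]
            eps_l, delta_l = 0.75 * eps_l, delta_l / 2.0
        return arms[0]

    # Example with Bernoulli arms (illustrative):
    # probs = [0.2, 0.5, 0.45, 0.3]
    # best = median_elimination(lambda a: float(random.random() < probs[a]),
    #                           len(probs), eps=0.1, delta=0.05)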


Mathematics of Operations Research | 2009

Online Markov Decision Processes

Eyal Even-Dar; Sham M. Kakade; Yishay Mansour

We consider a Markov decision process (MDP) setting in which the reward function is allowed to change after each time step (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well an agent can do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions.
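
The setting can be illustrated by keeping one experts (Hedge) learner per state while the dynamics stay fixed and the rewards change. This is a heavily simplified sketch under assumed inputs: the paper's algorithm feeds each learner Q-value-style feedback rather than the raw instantaneous rewards used here.

    import numpy as np

    def per_state_hedge(P, reward_seq, eta=0.1, start_state=0):
        # P[s][a] is a probability vector over next states (fixed dynamics, assumed given);
        # reward_seq[t][s][a] is the reward of (s, a) at time t (may change adversarially).
        n_states, n_actions = len(P), len(P[0])
        weights = np.ones((n_states, n_actions))
        s, total = start_state, 0.0

        for r_t in reward_seq:
            probs = weights[s] / weights[s].sum()
            a = int(np.random.choice(n_actions, p=probs))
            total += r_t[s][a]
            weights[s] *= np.exp(eta * np.asarray(r_t[s]))   # Hedge update in state s
            s = int(np.random.choice(n_states, p=np.asarray(P[s][a])))
        return total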


Machine Learning | 2008

Regret to the best vs. regret to the average

Eyal Even-Dar; Michael J. Kearns; Yishay Mansour; Jennifer Wortman

We study online regret minimization algorithms in an experts setting. In this setting, the algorithm chooses a distribution over experts at each time step and receives a gain that is a weighted average of the experts’ instantaneous gains. We consider a bicriteria setting, examining not only the standard notion of regret to the best expert, but also the regret to the average of all experts, the regret to any given fixed mixture of experts, or the regret to the worst expert. This study leads both to new understanding of the limitations of existing no-regret algorithms, and to new algorithms with novel performance guarantees. More specifically, we show that any algorithm that achieves only O(√T) cumulative regret to the best expert on a sequence of T trials must, in the worst case, suffer regret Ω(√T) to the average, and that for a wide class of update rules that includes many existing no-regret algorithms (such as Exponential Weights and Follow the Perturbed Leader), the product of the regret to the best and the regret to the average is, in the worst case, Ω(T). We then describe and analyze two alternate new algorithms that both achieve cumulative regret only O(√T log T) to the best while simultaneously achieving constant regret to the average.
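
To make the two regret notions concrete, here is a small sketch that runs the standard Exponential Weights update on a gain matrix and reports both the regret to the best expert and the regret to the average; the learning rate and the [0, 1] gain range are illustrative assumptions.

    import numpy as np

    def exp_weights_regrets(gains, eta=0.1):
        # Run Exponential Weights on a T x n matrix of per-expert gains in [0, 1] and
        # report the regret to the best expert and the regret to the average of experts.
        T, n = gains.shape
        w = np.ones(n)
        alg_gain = 0.0
        for t in range(T):
            p = w / w.sum()
            alg_gain += float(p @ gains[t])     # expected gain of the mixture played
            w *= np.exp(eta * gains[t])         # multiplicative weight update
        regret_to_best = gains.sum(axis=0).max() - alg_gain
        regret_to_average = gains.sum(axis=0).mean() - alg_gain
        return regret_to_best, regret_to_average

    # Example on random gains (illustrative):
    # rng = np.random.default_rng(0)
    # print(exp_weights_regrets(rng.random((1000, 5))))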


Algorithmic Learning Theory | 2006

Risk-Sensitive online learning

Eyal Even-Dar; Michael J. Kearns; Jennifer Wortman


Electronic Commerce | 2014

On Nash Equilibria for a Network Creation Game

Susanne Albers; Stefan Eilts; Eyal Even-Dar; Yishay Mansour; Liam Roditty


Conference on Learning Theory | 2003

Approximate Equivalence of Markov Decision Processes

Eyal Even-Dar; Yishay Mansour


SIGACT News | 2008

Theory research at Google

Gagan Aggarwal; Nir Ailon; Florin Constantin; Eyal Even-Dar; Jon Feldman; Gereon Frahling; Monika Henzinger; S. Muthukrishnan; Noam Nisan; Martin Pál; Mark Sandler; Anastasios Sidiropoulos


Collaboration


Dive into Eyal Even-Dar's collaborations.

Top Co-Authors

Michael J. Kearns
University of Pennsylvania

Sham M. Kakade
University of Washington

Shie Mannor
Technion – Israel Institute of Technology

Jennifer Wortman
University of Pennsylvania

Sudipto Guha
University of Pennsylvania

Avrim Blum
Carnegie Mellon University