Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Lex Weaver is active.

Publication


Featured research published by Lex Weaver.


Journal of Artificial Intelligence Research | 2001

Experiments with infinite-horizon, policy-gradient estimation

Jonathan Baxter; Peter L. Bartlett; Lex Weaver

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter β ∈ [0, 1], which has a natural interpretation in terms of the bias-variance trade-off, that it requires no knowledge of the underlying state, and that it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of Baxter and Bartlett (2001) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
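The estimator the abstract describes can be sketched as follows. This is a minimal illustration of a GPOMDP-style gradient estimate with the single free parameter β, not the authors' implementation; the trajectory format, scalar gradients, and the function name are assumptions for the sketch.

```python
def gpomdp_gradient(trajectory, beta):
    """Sketch of a GPOMDP-style performance-gradient estimate from one sample path.

    trajectory : list of (grad_log_pi, reward) pairs, where grad_log_pi is
                 the derivative of log pi(action | observation) at that step
                 (scalar here for simplicity).
    beta       : discount in [0, 1] trading bias against variance.
    Returns the running average of reward times the eligibility trace.
    """
    z = 0.0      # eligibility trace: z_t = beta * z_{t-1} + grad log pi
    delta = 0.0  # running gradient estimate
    for t, (grad_log_pi, reward) in enumerate(trajectory, start=1):
        z = beta * z + grad_log_pi
        delta += (reward * z - delta) / t  # incremental mean of r_t * z_t
    return delta
```

Such an estimate can then drive either ordinary stochastic gradient ascent on the policy parameters or, as in the paper, a conjugate-gradient method whose line searches bracket maxima using gradient information.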


Machine Learning | 2000

Learning to Play Chess Using Temporal Differences

Jonathan Baxter; Andrew Tridgell; Lex Weaver

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
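The core TDLEAF(λ) idea — adjusting the evaluation weights toward the temporal differences between successive principal-variation leaf values, rather than between raw position values as in plain TD(λ) — can be sketched roughly as below. This is an illustrative reconstruction under simplifying assumptions (a linear list of positions from one game, gradients supplied per position), not KnightCap's code; all names are hypothetical.

```python
def tdleaf_update(weights, leaf_grads, leaf_values, alpha, lam):
    """One TDLeaf(lambda)-style weight update over the positions of a game.

    leaf_values[t] : minimax value of the principal-variation leaf at time t
    leaf_grads[t]  : gradient of that leaf's evaluation w.r.t. the weights
    alpha          : learning rate; lam is the decay parameter lambda.
    Temporal differences d_t = leaf_values[t+1] - leaf_values[t] are
    propagated back to earlier positions with decay lambda.
    """
    n = len(leaf_values)
    diffs = [leaf_values[t + 1] - leaf_values[t] for t in range(n - 1)]
    new_w = list(weights)
    for t in range(n - 1):
        # discounted sum of temporal differences from time t onward
        s = sum(lam ** (j - t) * diffs[j] for j in range(t, n - 1))
        for i, g in enumerate(leaf_grads[t]):
            new_w[i] += alpha * g * s
    return new_w
```

With λ = 0 each position is nudged only toward the next leaf value; with λ = 1 it is nudged toward the final outcome, recovering the usual TD(λ) extremes.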


Uncertainty in Artificial Intelligence | 2001

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

Lex Weaver; Nigel Tao


International Conference on Machine Learning | 2001

A Multi-Agent Policy-Gradient Approach to Network Routing

Nigel Tao; Jonathan Baxter; Lex Weaver


International Conference on Machine Learning | 1998

KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search

Jonathan Baxter; Andrew Tridgell; Lex Weaver


ICGA Journal | 1998

Experiments in Parameter Learning Using Temporal Differences

Jonathan Baxter; Andrew Tridgell; Lex Weaver


arXiv: Learning | 1999

TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search

Jonathan Baxter; Andrew Tridgell; Lex Weaver


Archive | 1999

Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments

Jonathan Baxter; Lex Weaver; Peter L. Bartlett


International Conference on Machine Learning | 1997

KnightCap: A chess program that learns by combining TD(λ) with minimax search

Jonathan Baxter; Andrew Tridgell; Lex Weaver


Archive | 1999

Reinforcement Learning From State and Temporal Differences

Lex Weaver; Jonathan Baxter

Collaboration


Dive into Lex Weaver's collaborations.

Top Co-Authors


Jonathan Baxter

Australian National University


Andrew Tridgell

Australian National University


Nigel Tao

Australian National University
