Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Lex Weaver is active.

Publication


Featured research published by Lex Weaver.


Journal of Artificial Intelligence Research | 2001

Experiments with infinite-horizon, policy-gradient estimation

Jonathan Baxter; Peter L. Bartlett; Lex Weaver

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter β ∈ [0, 1], which has a natural interpretation in terms of the bias-variance trade-off, that it requires no knowledge of the underlying state, and that it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of Baxter and Bartlett (2001) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
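The estimator the abstract describes can be sketched as follows. This is a minimal illustration of a GPOMDP-style gradient estimate with the single free parameter β, not the authors' implementation; the trajectory format, scalar gradients, and the function name are assumptions for the sketch.

```python
def gpomdp_gradient(trajectory, beta):
    """Sketch of a GPOMDP-style performance-gradient estimate from one sample path.

    trajectory : list of (grad_log_pi, reward) pairs, where grad_log_pi is
                 the derivative of log pi(action | observation) at that step
                 (scalar here for simplicity).
    beta       : discount in [0, 1] trading bias against variance.
    Returns the running average of reward times the eligibility trace.
    """
    z = 0.0      # eligibility trace: z_t = beta * z_{t-1} + grad log pi
    delta = 0.0  # running gradient estimate
    for t, (grad_log_pi, reward) in enumerate(trajectory, start=1):
        z = beta * z + grad_log_pi
        delta += (reward * z - delta) / t  # incremental mean of r_t * z_t
    return delta
```

Such an estimate can then drive either ordinary stochastic gradient ascent on the policy parameters or, as in the paper, a conjugate-gradient method whose line searches bracket maxima using gradient information.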


Machine Learning | 2000

Learning to Play Chess Using Temporal Differences

Jonathan Baxter; Andrew Tridgell; Lex Weaver

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
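The core TDLEAF(λ) idea — adjusting the evaluation weights toward the temporal differences between successive principal-variation leaf values, rather than between raw position values as in plain TD(λ) — can be sketched roughly as below. This is an illustrative reconstruction under simplifying assumptions (a linear list of positions from one game, gradients supplied per position), not KnightCap's code; all names are hypothetical.

```python
def tdleaf_update(weights, leaf_grads, leaf_values, alpha, lam):
    """One TDLeaf(lambda)-style weight update over the positions of a game.

    leaf_values[t] : minimax value of the principal-variation leaf at time t
    leaf_grads[t]  : gradient of that leaf's evaluation w.r.t. the weights
    alpha          : learning rate; lam is the decay parameter lambda.
    Temporal differences d_t = leaf_values[t+1] - leaf_values[t] are
    propagated back to earlier positions with decay lambda.
    """
    n = len(leaf_values)
    diffs = [leaf_values[t + 1] - leaf_values[t] for t in range(n - 1)]
    new_w = list(weights)
    for t in range(n - 1):
        # discounted sum of temporal differences from time t onward
        s = sum(lam ** (j - t) * diffs[j] for j in range(t, n - 1))
        for i, g in enumerate(leaf_grads[t]):
            new_w[i] += alpha * g * s
    return new_w
```

With λ = 0 each position is nudged only toward the next leaf value; with λ = 1 it is nudged toward the final outcome, recovering the usual TD(λ) extremes.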


Uncertainty in Artificial Intelligence | 2001

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

Lex Weaver; Nigel Tao


International Conference on Machine Learning | 2001

A Multi-Agent Policy-Gradient Approach to Network Routing

Nigel Tao; Jonathan Baxter; Lex Weaver


International Conference on Machine Learning | 1998

KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search

Jonathan Baxter; Andrew Tridgell; Lex Weaver


ICGA Journal | 1998

Experiments in Parameter Learning Using Temporal Differences

Jonathan Baxter; Andrew Tridgell; Lex Weaver


arXiv: Learning | 1999

TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search

Jonathan Baxter; Andrew Tridgell; Lex Weaver


Archive | 1999

Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments

Jonathan Baxter; Lex Weaver; Peter L. Bartlett


International Conference on Machine Learning | 1997

KnightCap: A chess program that learns by combining TD(λ) with minimax search

Jonathan Baxter; Andrew Tridgell; Lex Weaver


Archive | 1999

Reinforcement Learning From State and Temporal Differences

Lex Weaver; Jonathan Baxter

Collaboration


Dive into Lex Weaver's collaborations.

Top Co-Authors


Jonathan Baxter

Australian National University


Andrew Tridgell

Australian National University


Nigel Tao

Australian National University
