Lex Weaver
Australian National University
Publications
Featured research published by Lex Weaver.
Journal of Artificial Intelligence Research | 2001
Jonathan Baxter; Peter L. Bartlett; Lex Weaver
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter β ∈ [0, 1), which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm and with an algorithm based on conjugate gradients that uses gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of Baxter and Bartlett (2001) on a toy problem and practical aspects of the algorithms on a number of more realistic problems.
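The GPOMDP recursion summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the two-state toy problem with noisy observations, the softmax policy, and every name (`gpomdp`, `gradient_ascent`, `average_reward`, `OBS_ACCURACY`, `MOVE_PROB`) are hypothetical. The estimator keeps an eligibility trace z ← βz + ∇μ/μ and a running average Δ of r·z; the ascent loop uses a plain fixed step size rather than the conjugate-gradient line search described in the abstract.

```python
import math
import random

# Hypothetical toy POMDP (not from the paper): two hidden states, observed
# correctly with probability OBS_ACCURACY. Taking action a moves the system
# to state a with probability MOVE_PROB; reward is 1 in state 1, else 0.
OBS_ACCURACY = 0.8
MOVE_PROB = 0.9


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def observe(state, rng):
    """Noisy observation of the hidden state."""
    return state if rng.random() < OBS_ACCURACY else 1 - state


def step(action, rng):
    """Sample the next hidden state and the reward received in it."""
    nxt = action if rng.random() < MOVE_PROB else 1 - action
    return nxt, (1.0 if nxt == 1 else 0.0)


def act(theta, y, rng):
    """Sample an action from the softmax policy mu(. | theta, y)."""
    probs = softmax(theta[y])
    return (0 if rng.random() < probs[0] else 1), probs


def gpomdp(theta, beta, T, rng):
    """GPOMDP-style biased estimate of the average-reward gradient.

    theta[y][a] is the logit of action a after observation y; the returned
    estimate has the same 2x2 shape.
    """
    z = [[0.0, 0.0], [0.0, 0.0]]      # eligibility trace z_t
    delta = [[0.0, 0.0], [0.0, 0.0]]  # running average Delta_t of r * z
    state = rng.randrange(2)
    for t in range(T):
        y = observe(state, rng)
        action, probs = act(theta, y, rng)
        # z <- beta * z + grad(mu)/mu. For a softmax policy the gradient of
        # log mu w.r.t. theta[y][a] is 1{a == action} - probs[a], and zero
        # for the other observation's logits.
        for yy in range(2):
            for a in range(2):
                z[yy][a] *= beta
        for a in range(2):
            z[y][a] += (1.0 if a == action else 0.0) - probs[a]
        state, reward = step(action, rng)
        # Delta <- Delta + (r * z - Delta) / (t + 1)
        for yy in range(2):
            for a in range(2):
                delta[yy][a] += (reward * z[yy][a] - delta[yy][a]) / (t + 1)
    return delta


def average_reward(theta, T, rng):
    """Monte Carlo estimate of the average reward of the policy theta."""
    state, total = rng.randrange(2), 0.0
    for _ in range(T):
        action, _ = act(theta, observe(state, rng), rng)
        state, reward = step(action, rng)
        total += reward
    return total / T


def gradient_ascent(iterations, step_size, beta, T, seed=0):
    """Fixed-step gradient ascent driven by GPOMDP estimates."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iterations):
        grad = gpomdp(theta, beta, T, rng)
        for y in range(2):
            for a in range(2):
                theta[y][a] += step_size * grad[y][a]
    return theta


theta = gradient_ascent(iterations=20, step_size=5.0, beta=0.5, T=2000)
print(theta, average_reward(theta, 20000, random.Random(2)))
```

In this toy problem the best policy always picks action 1, so the ascent should push each `theta[y][1]` above `theta[y][0]`. Raising β toward 1 reduces the bias of the estimate at the cost of higher variance, which is the trade-off the abstract refers to.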
Machine Learning | 2000
Jonathan Baxter; Andrew Tridgell; Lex Weaver
In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
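A minimal sketch of the TDLEAF(λ) idea, assuming a linear evaluation function and a hypothetical toy game tree; KnightCap's actual evaluation, search, and score handling are not reproduced here. A depth-limited minimax search returns the principal-variation leaf, and the TD(λ) update w ← w + α Σ_t ∇J(leaf_t) Σ_{j≥t} λ^(j−t) d_j is applied to the evaluations of those leaves rather than to the root positions. Taking the last temporal difference against the game outcome is an assumption of this sketch, and the names `minimax_leaf`, `tdleaf_update`, `TOY_TREE`, and `TOY_FEATURES` are illustrative only.

```python
def evaluate(weights, feats):
    """Hypothetical linear evaluation J(x, w) = w . phi(x)."""
    return sum(w * f for w, f in zip(weights, feats))


def minimax_leaf(node, depth, maximizing, children, features, weights):
    """Depth-limited minimax returning (value, principal-variation leaf)."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(weights, features(node)), node
    best_value, best_leaf = None, None
    for child in kids:
        value, leaf = minimax_leaf(child, depth - 1, not maximizing,
                                   children, features, weights)
        if best_value is None or (value > best_value if maximizing
                                  else value < best_value):
            best_value, best_leaf = value, leaf
    return best_value, best_leaf


def tdleaf_update(weights, leaf_features, outcome, alpha, lam):
    """One TDLEAF(lambda) weight update from a finished game.

    leaf_features[t] holds phi of the principal-variation leaf found when
    searching from the t-th position; the game outcome stands in for the
    evaluation after the last searched position.
    """
    values = [evaluate(weights, f) for f in leaf_features] + [outcome]
    diffs = [values[t + 1] - values[t] for t in range(len(leaf_features))]
    new_weights = list(weights)
    trace = 0.0
    # Walk backwards so that trace = sum_{j >= t} lam**(j - t) * diffs[j].
    for t in reversed(range(len(leaf_features))):
        trace = diffs[t] + lam * trace
        # For a linear evaluation, grad_w J(leaf, w) is just phi(leaf).
        for i, f in enumerate(leaf_features[t]):
            new_weights[i] += alpha * f * trace
    return new_weights


# Hypothetical two-ply tree: the maximizer moves at the root, the minimizer
# replies, and each leaf has a single feature.
TOY_TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
TOY_FEATURES = {"a1": [1.0], "a2": [3.0], "b1": [2.0], "b2": [5.0]}

value, leaf = minimax_leaf("root", 2, True,
                           lambda n: TOY_TREE.get(n, []),
                           lambda n: TOY_FEATURES[n], [1.0])
print(value, leaf)
print(tdleaf_update([0.0], [[1.0], [1.0]], outcome=1.0, alpha=0.1, lam=0.7))
```

With λ = 1 the update reduces to moving every leaf evaluation directly toward the final game outcome; smaller λ weights each leaf's correction mainly by the temporal differences that follow it most closely.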
Uncertainty in Artificial Intelligence | 2001
Lex Weaver; Nigel Tao
International Conference on Machine Learning | 2001
Nigel Tao; Jonathan Baxter; Lex Weaver
International Conference on Machine Learning | 1998
Jonathan Baxter; Andrew Tridgell; Lex Weaver
ICGA Journal | 1998
Jonathan Baxter; Andrew Tridgell; Lex Weaver
arXiv: Learning | 1999
Jonathan Baxter; Andrew Tridgell; Lex Weaver
Archive | 1999
Jonathan Baxter; Lex Weaver; Peter L. Bartlett
International Conference on Machine Learning | 1997
Jonathan Baxter; Andrew Tridgell; Lex Weaver
Archive | 1999
Lex Weaver; Jonathan Baxter