Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ronald J. Williams is active.

Publication


Featured research published by Ronald J. Williams.


Neural Computation | 1989

A learning algorithm for continually running fully recurrent neural networks

Ronald J. Williams; David Zipser

The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks. These algorithms have (1) the advantage that they do not require a precisely defined training interval, operating while the network runs; and (2) the disadvantage that they require nonlocal communication in the network being trained and are computationally expensive. These algorithms allow networks having recurrent connections to learn complex tasks that require the retention of information over time periods having either fixed or indefinite length.
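
The update at the core of this algorithm can be summarized in a few lines. The sketch below is a minimal NumPy rendering of the real-time recurrent learning recursion, assuming a network of n sigmoid units driven by m external inputs through a single weight matrix acting on the concatenated input-and-output vector; all names, sizes, and constants are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2                       # fully recurrent units, external inputs
W = rng.normal(0.0, 0.1, (n, m + n))
y = np.zeros(n)                   # current unit outputs
p = np.zeros((n, n, m + n))       # sensitivities p[k, i, j] = d y_k / d w_ij
alpha = 0.1

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def rtrl_step(x, target, mask):
    """One continually running time step: advance the network, update the
    sensitivities, and make a gradient-following weight change for whichever
    units have a target on this cycle (selected by `mask`)."""
    global y, p, W
    z = np.concatenate([x, y])            # concatenated input and output vector
    s = W @ z
    y_new = sigmoid(s)
    fprime = y_new * (1.0 - y_new)

    # Sensitivity recursion:
    #   p_new[k, i, j] = f'(s_k) * (sum_l W_rec[k, l] * p[l, i, j] + delta_ki * z_j)
    p_new = np.einsum('kl,lij->kij', W[:, m:], p)
    p_new[np.arange(n), np.arange(n), :] += z
    p_new *= fprime[:, None, None]

    # Instantaneous error and weight change: dw_ij = alpha * sum_k e_k * p[k, i, j]
    e = mask * (target - y_new)
    W += alpha * np.einsum('k,kij->ij', e, p_new)

    y, p = y_new, p_new
    return y
```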


Machine Learning | 1992

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

Ronald J. Williams

This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
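
For a concrete picture of how simple these weight adjustments are, here is a minimal sketch of a REINFORCE step for one layer of Bernoulli-logistic stochastic units in an immediate-reinforcement setting; the toy reward function and the running-average baseline are assumptions made for illustration, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 3, 2
W = np.zeros((n_out, n_in))                  # weights of the stochastic output units
alpha, baseline = 0.05, 0.0

def reward(actions):
    return actions.sum()                     # hypothetical immediate-reinforcement task

for step in range(1000):
    x = rng.integers(0, 2, n_in).astype(float)         # random binary input pattern
    p = 1.0 / (1.0 + np.exp(-W @ x))                    # firing probabilities
    a = (rng.random(n_out) < p).astype(float)           # sampled unit outputs
    r = reward(a)

    # Characteristic eligibility of a Bernoulli-logistic unit:
    #   d ln g / d w_ij = (a_i - p_i) * x_j
    elig = np.outer(a - p, x)

    # REINFORCE weight change: delta_w = alpha * (r - baseline) * eligibility
    W += alpha * (r - baseline) * elig

    # Reinforcement-comparison baseline (a running average of r), one common choice.
    baseline += 0.05 * (r - baseline)
```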


Neural Computation | 1990

An efficient gradient-based algorithm for on-line training of recurrent network trajectories

Ronald J. Williams; Jing Peng

A novel variant of the familiar backpropagation-through-time approach to training recurrent networks is described. This algorithm is intended to be used on arbitrary recurrent networks that run continually without ever being reset to an initial state, and it is specifically designed for computationally efficient computer implementation. This algorithm can be viewed as a cross between epochwise backpropagation through time, which is not appropriate for continually running networks, and the widely used on-line gradient approximation technique of truncated backpropagation through time.
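
A rough rendering of the scheduling idea, often written BPTT(h; h'), is sketched below: the network runs continually, the last h states are buffered, and every h' steps a truncated backward pass is run over that buffer with errors injected only for the newest h' steps. The tanh recurrence, placeholder targets, and constants are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3                                  # hidden units, inputs
W_in = rng.normal(0.0, 0.1, (n, m))
W_rec = rng.normal(0.0, 0.1, (n, n))
alpha, h, h_prime = 0.01, 8, 4               # learning rate, buffer length, update period

hidden = np.zeros(n)
history = []                                 # buffered (x_t, h_{t-1}, h_t, target_t) tuples

for t in range(1, 101):
    x = rng.normal(size=m)
    target = np.zeros(n)                     # placeholder target, purely illustrative
    h_prev = hidden
    hidden = np.tanh(W_in @ x + W_rec @ h_prev)
    history.append((x, h_prev, hidden, target))
    history = history[-h:]                   # keep only the last h steps

    if t % h_prime == 0:
        # Truncated backward pass over the buffer; errors are injected only for
        # the newest h_prime steps, then propagated through the whole window.
        dW_in, dW_rec = np.zeros_like(W_in), np.zeros_like(W_rec)
        delta = np.zeros(n)
        for idx, (x_s, hp_s, h_s, tgt_s) in enumerate(reversed(history)):
            if idx < h_prime:
                delta = delta + (h_s - tgt_s)
            delta = delta * (1.0 - h_s ** 2)         # back through the tanh
            dW_in += np.outer(delta, x_s)
            dW_rec += np.outer(delta, hp_s)
            delta = W_rec.T @ delta                  # carry error to the prior step
        W_in -= alpha * dW_in
        W_rec -= alpha * dW_rec
```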


Connection Science | 1989

Experimental Analysis of the Real-time Recurrent Learning Algorithm

Ronald J. Williams; David Zipser

The real-time recurrent learning algorithm is a gradient-following learning algorithm for completely recurrent networks running in continually sampled time. Here we use a series of simulation experiments to investigate the power and properties of this algorithm. In the recurrent networks studied here, any unit can be connected to any other, and any unit can receive external input. These networks run continually in the sense that they sample their inputs on every update cycle, and any unit can have a training target on any cycle. The storage required and computation time on each step are independent of time and are completely determined by the size of the network, so no prior knowledge of the temporal structure of the task being learned is required. The algorithm is nonlocal in the sense that each unit must have knowledge of the complete recurrent weight matrix and error vector. The algorithm is computationally intensive in sequential computers, requiring a storage capacity of the order of the thi...
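
A quick back-of-the-envelope illustration of that computational burden, using the usual accounting for a fully recurrent network of n units (rough counts up to constant factors, not measurements):

```python
# For n fully recurrent units the RTRL sensitivity table holds on the order of
# n^3 values, and each step touches every entry once per weight, giving roughly
# n^4 arithmetic operations per update, up to constant factors.
for n in (10, 30, 100):
    storage = n ** 3          # sensitivity-table entries, order of magnitude
    per_step = n ** 4         # multiply-adds per time step, order of magnitude
    print(f"n={n:4d}  sensitivity entries ~{storage:,}  ops/step ~{per_step:,}")
```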


Machine Learning | 1996

Incremental multi-step Q-learning

Jing Peng; Ronald J. Williams

This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
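
As a reference point, here is a minimal tabular sketch of combining one-step Q-learning with TD(λ)-style eligibility traces. It follows the commonly used Watkins-style formulation (traces cut after exploratory actions) rather than reproducing the paper's exact incremental Q(λ) update, and the toy chain environment is an assumption.

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma, lam, eps = 0.1, 0.95, 0.8, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def env_step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; reward at the far end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for episode in range(200):
    e = np.zeros_like(Q)                      # eligibility traces
    s = 0
    for t in range(500):                      # cap episode length for the toy task
        # epsilon-greedy action with random tie-breaking
        if rng.random() < eps or np.all(Q[s] == Q[s].max()):
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        was_greedy = Q[s, a] == Q[s].max()

        s2, r, done = env_step(s, a)
        # TD error uses the greedy successor value, as in one-step Q-learning.
        delta = r + gamma * np.max(Q[s2]) * (not done) - Q[s, a]
        e[s, a] += 1.0                        # accumulate trace for the visited pair
        Q += alpha * delta * e                # credit spread back along the trace

        # Decay the traces, cutting them after exploratory (non-greedy) actions.
        e = e * (gamma * lam) if was_greedy else np.zeros_like(Q)

        s = s2
        if done:
            break
```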


IEEE Control Systems Magazine | 1992

Reinforcement learning is direct adaptive optimal control

Richard S. Sutton; Andrew G. Barto; Ronald J. Williams

Control problems can be divided into two classes: 1) regulation and tracking problems, in which the objective is to follow a reference trajectory, and 2) optimal control problems, in which the objective is to extremize a functional of the controlled system's behavior that is not necessarily defined in terms of a reference trajectory. Adaptive methods for problems of the first kind are well known, and include self-tuning regulators and model-reference methods, whereas adaptive methods for optimal-control problems have received relatively little attention. Moreover, the adaptive optimal-control methods that have been studied are almost all indirect methods, in which controls are recomputed from an estimated system model at each step. This computation is inherently complex, making adaptive methods in which the optimal controls are estimated directly more attractive. Here we present reinforcement learning methods as a computationally simple, direct approach to the adaptive optimal control of nonlinear systems.


Simulation of Adaptive Behavior | 1993

Efficient learning and planning within the Dyna framework

Jing Peng; Ronald J. Williams

Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks.
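
For orientation, the sketch below shows a plain Dyna-Q loop: each real step is followed by k simulated backups replayed from a learned deterministic model. The efficiency strategies examined in the paper concern which backups to perform; here the planning samples are drawn uniformly at random, so this is only a baseline sketch, and the toy chain world is an assumption.

```python
import numpy as np
import random

n_states, n_actions, k = 10, 2, 5
alpha, gamma, eps = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))
model = {}                                   # learned model: (s, a) -> (r, s')
rng = random.Random(0)

def env_step(s, a):
    """Toy chain world used only for illustration: reward at the far right end."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == n_states - 1 else 0.0), s2

s = 0
for step in range(2000):
    # epsilon-greedy action with random tie-breaking
    if rng.random() < eps or Q[s, 0] == Q[s, 1]:
        a = rng.randrange(n_actions)
    else:
        a = int(np.argmax(Q[s]))
    r, s2 = env_step(s, a)

    # Direct reinforcement learning from the real experience.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])

    # Model learning, then k planning backups replayed from the model.
    model[(s, a)] = (r, s2)
    for _ in range(k):
        (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps2]) - Q[ps, pa])

    s = 0 if s2 == n_states - 1 else s2      # restart at the goal state
```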


International Symposium on Neural Networks | 1992

Training recurrent networks using the extended Kalman filter

Ronald J. Williams

The author describes some relationships between the extended Kalman filter (EKF) as applied to recurrent net learning and some simpler techniques that are more widely used. In particular, making certain simplifications to the EKF gives rise to an algorithm essentially identical to the real-time recurrent learning (RTRL) algorithm. Since the EKF involves adjusting unit activity in the network, it also provides a principled generalization of the teacher forcing technique. Preliminary simulation experiments on simple finite-state Boolean tasks indicated that the EKF can provide substantial speed-up in number of time steps required for training on such problems when compared with simpler online gradient algorithms. The computational requirements of the EKF are steep, but scale with network size at the same rate as RTRL.
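
The filter algebra involved can be stated compactly. The sketch below treats the weight vector as the EKF state and the network output as the measurement; for a recurrent network the output Jacobian would itself come from RTRL-style recursions, so the Jacobian routine is left as a caller-supplied placeholder, and the linear usage example at the end is purely illustrative.

```python
import numpy as np

def ekf_step(w, P, x, target, output_and_jacobian, R, Q):
    """One EKF weight update.
    w: (n_w,) weight vector        P: (n_w, n_w) weight covariance
    R: output-noise covariance     Q: small process-noise covariance added to P
    output_and_jacobian(w, x) -> (y, H) with y: (n_y,) and H: (n_y, n_w)."""
    y, H = output_and_jacobian(w, x)
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    w = w + K @ (target - y)                 # weights move along the innovation
    P = P - K @ H @ P + Q                    # covariance update
    return w, P

# Tiny usage check with a linear "network" y = w . x, whose Jacobian is just x.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w, P = np.zeros(3), np.eye(3)
R, Qn = np.array([[0.01]]), 1e-6 * np.eye(3)
linear = lambda w, x: (np.array([w @ x]), x[None, :])
for _ in range(100):
    x = rng.normal(size=3)
    w, P = ekf_step(w, P, x, np.array([true_w @ x]), linear, R, Qn)
```

Note that the per-step cost of maintaining P grows quadratically in the number of weights, which is why the requirements are described as steep.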


Connection Science | 1991

Function optimization using connectionist reinforcement learning algorithms

Ronald J. Williams; Jing Peng

Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley were sought using variants of REINFORCE algorithms. Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.
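
A minimal sketch of the function-optimization setting is given below: independent Bernoulli-logistic units sample candidate bit strings, and their logits move along the (reward minus baseline) times eligibility direction, with an explicit entropy-gradient term gesturing at the REINFORCE/MENT idea of keeping the search distribution from collapsing too early. The objective, baseline, and bonus weight are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, alpha, beta = 20, 0.1, 0.01
theta = np.zeros(n_bits)                     # logits of the Bernoulli units
baseline = 0.0

def f(bits):
    return bits.sum()                        # toy objective (ONEMAX), an assumption

for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))         # unit firing probabilities
    bits = (rng.random(n_bits) < p).astype(float)
    r = f(bits)

    elig = bits - p                          # d ln g / d theta for a Bernoulli-logistic unit
    entropy_grad = -theta * p * (1.0 - p)    # gradient of the units' entropy w.r.t. theta
    theta += alpha * (r - baseline) * elig + beta * entropy_grad
    baseline += 0.05 * (r - baseline)        # running-average reinforcement baseline
```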


PLOS Computational Biology | 2009

Partial Order Optimum Likelihood (POOL): Maximum Likelihood Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties

Wenxu Tong; Ying Wei; Leonel F. Murga; Mary Jo Ondrechen; Ronald J. Williams

A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
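
As a one-dimensional illustration of the monotonicity constraint at the heart of the method, the sketch below fits non-decreasing probabilities to binary labels with the pool-adjacent-violators algorithm. The actual POOL estimator works with multidimensional partial orders over several THEMATICS and geometric features; this 1D version and its toy data are only meant to show the shape of the estimator.

```python
import numpy as np

def pava(y, w=None):
    """Weighted least-squares fit of a non-decreasing sequence to y (PAVA)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    level, weight, size = list(y), list(w), [1] * n
    i = 0
    while i < len(level) - 1:
        if level[i] > level[i + 1]:          # adjacent violator: pool the two blocks
            tot = weight[i] + weight[i + 1]
            level[i] = (weight[i] * level[i] + weight[i + 1] * level[i + 1]) / tot
            weight[i] = tot
            size[i] += size[i + 1]
            del level[i + 1], weight[i + 1], size[i + 1]
            i = max(i - 1, 0)                # the pooled block may now violate backwards
        else:
            i += 1
    return np.repeat(level, size)

# Toy use: one feature sorted ascending, binary labels (1 = annotated active-site residue).
feature = np.array([0.10, 0.20, 0.40, 0.50, 0.70, 0.90])
labels = np.array([0, 0, 1, 0, 1, 1])
probs = pava(labels)   # non-decreasing probability estimates along the feature
```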

Collaboration


Dive into Ronald J. Williams's collaborations.

Top Co-Authors

David Zipser (University of California)
Jing Peng (Montclair State University)
Huyuan Yang (Northeastern University)
Jaeju Ko (Northeastern University)
Wenxu Tong (Northeastern University)
Ying Wei (Northeastern University)
Andrew G. Barto (University of Massachusetts Amherst)