Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Edmund J. Collins is active.

Publication


Featured research published by Edmund J. Collins.


SIAM Journal on Control and Optimization | 2005

Individual Q-Learning in Normal Form Games

David S. Leslie; Edmund J. Collins

The single-agent multi-armed bandit problem can be solved by an agent that learns the values of each action using reinforcement learning. However, the multi-agent version of the problem, the iterated normal form game, presents a more complex challenge, since the rewards available to each agent depend on the strategies of the others. We consider the behavior of value-based learning agents in this situation, and show that such agents cannot generally play at a Nash equilibrium, although if smooth best responses are used, a Nash distribution can be reached. We introduce a particular value-based learning algorithm, which we call individual Q-learning, and use stochastic approximation to study its asymptotic behavior, showing that strategies will converge to the Nash distribution almost surely in 2-player zero-sum games and 2-player partnership games. Player-dependent learning rates are then considered, and it is shown that this extension converges in some games for which many algorithms, including the basic algorithm initially considered, fail to converge.
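
A rough sketch of this kind of learner is given below: two players repeatedly play matching pennies, each Q-learning the value of its own actions from its own rewards only and choosing actions via a smooth (Boltzmann) best response. The game, temperature and step-size schedule are illustrative choices for a simplified variant, not the algorithm exactly as specified in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Matching pennies, a 2-player zero-sum game: entry [i, j] is player 1's
    # payoff when player 1 plays i and player 2 plays j; player 2 gets the negative.
    payoff = np.array([[1.0, -1.0],
                       [-1.0, 1.0]])

    def boltzmann(q, temperature=0.1):
        """Smooth best response: softmax over a player's own value estimates."""
        z = np.exp(q / temperature)
        return z / z.sum()

    q1 = np.zeros(2)   # player 1's value estimates for its two actions
    q2 = np.zeros(2)   # player 2's value estimates

    for t in range(1, 50001):
        step = 1.0 / t ** 0.6                 # decreasing learning rate
        p1, p2 = boltzmann(q1), boltzmann(q2)
        a1 = rng.choice(2, p=p1)
        a2 = rng.choice(2, p=p2)
        r1 = payoff[a1, a2]
        r2 = -r1
        # Each player updates only the action it actually played, using only its
        # own reward; no knowledge of the opponent's actions or payoffs is needed.
        q1[a1] += step * (r1 - q1[a1])
        q2[a2] += step * (r2 - q2[a2])

    # Both smoothed strategies should end up near (0.5, 0.5), the Nash
    # distribution of this game.
    print("player 1:", boltzmann(q1))
    print("player 2:", boltzmann(q2))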


SIAM Review | 2001

Optimality Models in Behavioral Biology

John M. McNamara; Alasdair I. Houston; Edmund J. Collins

The action of natural selection results in organisms that are good at surviving and reproducing. We show how this intuitive idea can be given a formal definition in terms of fitness and reproductive value. An optimal strategy maximizes fitness, and reproductive value provides a common currency for comparing different actions. We provide a broad review of models and methods that have been used in this area, stressing the conceptual issues and exposing the logic of evolutionary explanations.


Games and Economic Behavior | 2006

Generalised weakened fictitious play

David S. Leslie; Edmund J. Collins

A general class of adaptive processes in games is developed, which significantly generalises weakened fictitious play [Van der Genugten, B., 2000. A weakened form of fictitious play in two-person zero-sum games. Int. Game Theory Rev. 2, 307–328] and includes several interesting fictitious-play-like processes as special cases. The general model is rigorously analysed using the best response differential inclusion, and shown to converge in games with the fictitious play property. Furthermore, a new actor–critic process is introduced, in which the only information given to a player is the reward received as a result of selecting an action—a player need not even know they are playing a game. It is shown that this results in a generalised weakened fictitious play process, and can therefore be considered as a first step towards explaining how players might learn to play Nash equilibrium strategies without having any knowledge of the game, or even that they are playing a game.
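
The sketch below gives a minimal process of this weakened-fictitious-play type for a two-player zero-sum game: each player tracks the other's empirical mixed strategy and takes a small step towards an epsilon-perturbed best response to it, with the perturbation and step size vanishing over time. The payoffs and schedules are illustrative assumptions, not the paper's construction.

    import numpy as np

    # Matching pennies payoffs for player 1; player 2 receives the negative.
    A = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])

    def eps_best_response(my_payoffs, opponent_mix, eps):
        """A mixed strategy within eps of a best response to the opponent's mix."""
        expected = my_payoffs @ opponent_mix
        br = np.zeros(len(expected))
        br[np.argmax(expected)] = 1.0
        # Perturb by mixing in a little uniform play; eps shrinks to zero below.
        return (1 - eps) * br + eps / len(expected)

    sigma1 = np.array([1.0, 0.0])   # player 1's empirical mixed strategy
    sigma2 = np.array([1.0, 0.0])   # player 2's empirical mixed strategy

    for t in range(1, 100001):
        alpha = 1.0 / (t + 1)         # classical fictitious-play step size
        eps = 1.0 / (t + 1) ** 0.5    # vanishing perturbation
        br1 = eps_best_response(A, sigma2, eps)      # player 1 responds to sigma2
        br2 = eps_best_response(-A.T, sigma1, eps)   # player 2 responds to sigma1
        sigma1 = (1 - alpha) * sigma1 + alpha * br1
        sigma2 = (1 - alpha) * sigma2 + alpha * br2

    # Both empirical strategies drift towards (0.5, 0.5), the equilibrium of
    # this zero-sum game, which has the fictitious play property.
    print(sigma1, sigma2)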


Proceedings of the Royal Society of London B: Biological Sciences | 2005

The hidden cost of information in collective foraging

François Xavier Dechaume-Moncharmont; Anna Dornhaus; Alasdair I. Houston; John M. McNamara; Edmund J. Collins; Nigel R. Franks

Many animals nest or roost colonially. At the start of a potential foraging period, they may set out independently or await information from returning foragers. When should such individuals act independently and when should they wait for information? In a social insect colony, for example, information transfer may greatly increase a recruit's probability of finding food, and it is commonly assumed that this will always increase the colony's net energy gain. We test this assumption with a mathematical model. Energy gain by a colony is a function both of the probability of finding food sources and of the duration of their availability. A key factor is the ratio of pro-active foragers to re-active foragers. When leaving the nest, pro-active foragers search for food independently, whereas re-active foragers rely on information from successful foragers to find food. Under certain conditions, the optimum strategy is totally independent (pro-active) foraging because potentially valuable information that re-active foragers may gain from successful foragers is not worth waiting for. This counter-intuitive outcome is remarkably robust over a wide range of parameters. It occurs because food sources are only available for a limited period. Our study emphasizes the importance of time constraints and the analysis of dynamics, not just steady states, in understanding social insect foraging.
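
The toy calculation below (with invented numbers, not the authors' model) illustrates the core trade-off: re-active foragers only start once a successful pro-active forager has returned, so when the recruitment delay is long relative to how long the food source stays available, the colony's expected gain is maximised by making every forager pro-active.

    import numpy as np

    n_foragers = 100        # colony size
    find_prob = 0.3         # chance a pro-active forager locates the source itself
    availability = 2.0      # hours the food source remains available
    recruit_delay = 1.5     # hours before recruited re-active foragers arrive
    gain_rate = 1.0         # energy collected per forager per hour at the source

    def expected_colony_gain(prop_proactive):
        n_pro = prop_proactive * n_foragers
        n_re = n_foragers - n_pro
        p_informed = 1.0 - (1.0 - find_prob) ** n_pro   # someone found the source
        pro_gain = n_pro * find_prob * availability * gain_rate
        re_gain = n_re * p_informed * max(0.0, availability - recruit_delay) * gain_rate
        return pro_gain + re_gain

    for p in np.linspace(0.0, 1.0, 6):
        print(f"proportion pro-active {p:.1f}: expected gain {expected_colony_gain(p):6.1f}")
    # With these numbers the expected gain rises monotonically with the
    # proportion of pro-active foragers: the information is not worth waiting for.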


Proceedings of the Royal Society of London B: Biological Sciences | 1995

Dynamic Optimization in Fluctuating Environments

John M. McNamara; James N. Webb; Edmund J. Collins

We consider the problem of finding an optimal strategy for an organism in a large population when the environment fluctuates from year to year and cannot be predicted beforehand. In fluctuating environments, geometric mean fitness is the appropriate measure and individual optimization fails. Consequently, optimal strategies cannot be found by stochastic dynamic programming alone. We consider a simplified model in which each year is divided into two non-overlapping time intervals. In the first interval, environmental conditions are the same each year; in the second, they fluctuate from year to year. During the first interval, which ends at time of year T, population members do not reproduce. The state- and time-dependent strategy employed during the interval determines the probability of survival until T and the probability distribution of possible states at T given survival. In the interval following T, population members reproduce. The state of an individual at T and the ensuing environmental conditions determine the number of surviving descendants left by the individual next year. In this paper, we give a general characterization of optimal dynamic strategies over the first time interval. We show that an optimal strategy is the equilibrium solution of a (non-fluctuating environment) dynamic game. As a consequence, the behaviour of an optimal individual over the first time interval maximizes the expected value of a reward R* obtained at the end of the interval. However, R* cannot be specified in advance and can only be found once an optimal strategy has been determined. We illustrate this procedure with an example based on the foraging decisions of a parasitoid.
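
The short numerical example below (illustrative numbers only) shows why geometric mean fitness, rather than arithmetic mean fitness, is the right currency when conditions fluctuate between years: a strategy with the higher arithmetic mean number of descendants can still decline in the long run.

    import numpy as np

    rng = np.random.default_rng(1)

    # Per-capita yearly multipliers (descendants per individual) in good and bad years.
    risky = np.array([3.0, 0.25])     # arithmetic mean 1.625, geometric mean ~0.87
    cautious = np.array([1.4, 0.8])   # arithmetic mean 1.10,  geometric mean ~1.06

    years = rng.integers(0, 2, size=2000)   # each year is good (0) or bad (1)

    for name, w in (("risky", risky), ("cautious", cautious)):
        growth = np.exp(np.log(w[years]).mean())   # realised long-run growth factor
        print(f"{name:8s} arithmetic mean {w.mean():.2f}   "
              f"geometric mean {np.exp(np.log(w).mean()):.2f}   "
              f"realised growth {growth:.2f}")
    # The risky strategy wins on arithmetic mean yet its lineage shrinks over
    # time; the cautious strategy, with the higher geometric mean, grows.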


Behavioral Ecology and Sociobiology | 2006

Paying for information: partial loads in central place foragers

Anna Dornhaus; Edmund J. Collins; F-X Dechaume-Moncharmont; Alasdair I. Houston; Nigel R. Franks; John M. McNamara

Information about food sources can be crucial to the success of a foraging animal. We predict that this will influence foraging decisions by group-living foragers, which may sacrifice short-term foraging efficiency to collect information more frequently. This result emerges from a model of a central-place forager that can potentially receive information on newly available superior food sources at the central place. Such foragers are expected to return early from food sources, even with just partial loads, if information about the presence of sufficiently valuable food sources is likely to become available. Returning with an incomplete load implies that the forager is at that point not achieving the maximum possible food delivery rate. However, such partial loading can be more than compensated for by an earlier exploitation of a superior food source. Our model does not assume cooperative foraging and could thus be used to investigate this effect for any social central-place forager. We illustrate the approach using numerical calculations for honeybees and leafcutter ants, which do forage cooperatively. For these examples, however, our results indicate that reducing load confers minimal benefits in terms of receiving information. Moreover, the hypothesis that foragers reduce load to give information more quickly (rather than to receive it) fits empirical data from social insects better. Thus, we can conclude that in these two cases of social-insect foraging, efficient distribution of information by successful foragers may be more important than efficient collection of information by unsuccessful ones.
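
As a back-of-the-envelope illustration (the travel and loading times below are invented, not taken from the model), the sketch compares delivery rates with full and partial loads; partial loading lowers the instantaneous rate, and the model asks whether the chance of receiving news of a better source sooner can more than make up the difference.

    travel_time = 5.0        # minutes each way between nest and food source
    full_load_time = 10.0    # minutes to fill up completely at the source
    full_load = 1.0          # energy delivered per full load

    rate_full = full_load / (2 * travel_time + full_load_time)
    # Returning at half load: half the loading time, half the energy per trip.
    rate_half = (0.5 * full_load) / (2 * travel_time + 0.5 * full_load_time)

    print(f"delivery rate with full loads: {rate_full:.3f} energy/minute")
    print(f"delivery rate with half loads: {rate_half:.3f} energy/minute")
    # 0.050 vs 0.033 energy per minute here: the partial loader sacrifices
    # short-term efficiency, which only pays off if a sufficiently valuable
    # new food source is likely to be advertised at the nest.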


Journal of Theoretical Biology | 2008

Should females prefer to mate with low-quality males?

Codina Cotar; John M. McNamara; Edmund J. Collins; Alasdair I. Houston

We analyse a model of mate choice when males differ in reproductive quality and provide care for their offspring. Females choose males on the basis of the success they will obtain from breeding with them and a male chooses his care time on the basis of his quality so as to maximise his long-term rate of reproductive success. We use this model to establish whether high-quality males should devote a longer period of care to their broods than low-quality males and whether females obtain greater reproductive success from mating with higher quality males. We give sufficient conditions for optimal care times to decrease with increasing male quality. When care times decrease, this does not necessarily mean that high-quality males are less valuable to the female because quality may more than compensate for the lack of care. We give a necessary and sufficient condition for high-quality males to be less valuable mates, and hence for females to prefer low-quality males. Females can prefer low-quality males if offspring produced and cared for by high-quality males do well even if care is short, and do not significantly benefit from additional care, while offspring produced and cared for by low-quality males do well only if they receive a long period of care.


Neural Computation | 2010

Posterior weighted reinforcement learning with state uncertainty

Tobias Larsen; David S. Leslie; Edmund J. Collins; Rafal Bogacz

Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn from a distribution that depends on that state. However, in any natural environment, the stimulus is noisy. When there is state uncertainty, it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a state of the environment. This letter addresses the problem of incorporating state uncertainty in reinforcement learning models. We show that simply ignoring the uncertainty and allocating the reward to the most likely state of the environment results in incorrect value estimates. Furthermore, using only the information that is available before observing the reward also results in incorrect estimates. We therefore introduce a new technique, posterior weighted reinforcement learning, in which the estimates of state probabilities are updated according to the observed rewards (e.g., if a learner observes a reward usually associated with a particular state, this state becomes more likely). We show analytically that this modified algorithm can converge to correct reward estimates and confirm this with numerical experiments. The algorithm is shown to be a variant of the expectation-maximization algorithm, allowing rigorous convergence analyses to be carried out. A possible neural implementation of the algorithm in the cortico-basal-ganglia-thalamic network is presented, and experimental predictions of our model are discussed.
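
One minimal reading of such a posterior-weighted update is sketched below (the Gaussian reward model and all parameters are assumptions for illustration, not the authors' implementation): the reward prediction error for each candidate state is weighted by a posterior over states that takes the observed reward itself into account, rather than allocating the reward to the single most likely state.

    import numpy as np

    rng = np.random.default_rng(2)

    true_reward = np.array([1.0, 0.0])   # true mean reward in each of two states
    reward_sd = 0.3                      # noise on the delivered reward
    cue_reliability = 0.7                # P(noisy stimulus indicates the true state)
    alpha = 0.05                         # learning rate

    q = np.array([0.5, 0.5])             # learned reward estimate for each state

    def gaussian(x, mean, sd):
        return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

    for trial in range(5000):
        state = rng.integers(2)
        # Noisy stimulus gives a prior over states before the reward is seen.
        cue = state if rng.random() < cue_reliability else 1 - state
        prior = np.where(np.arange(2) == cue, cue_reliability, 1 - cue_reliability)
        reward = rng.normal(true_reward[state], reward_sd)
        # Posterior over states: prior reweighted by how well each state's
        # current reward estimate explains the observed reward.
        posterior = prior * gaussian(reward, q, reward_sd)
        posterior /= posterior.sum()
        # Posterior-weighted prediction-error update applied to every state.
        q += alpha * posterior * (reward - q)

    print("learned reward estimates:", q)   # should lie close to (1.0, 0.0)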


Journal of Theoretical Biology | 1997

A General Technique for Computing Evolutionarily Stable Strategies Based on Errors in Decision-making

John M. McNamara; James N. Webb; Edmund J. Collins; Tamás Székely; Alasdair I. Houston


Annals of Applied Probability | 2003

Convergent multiple-timescales reinforcement learning algorithms in normal form games

David S. Leslie; Edmund J. Collins

Collaboration


Dive into Edmund J. Collins's collaborations.

Top Co-Authors

Codina Cotar

University College London
