Publication


Featured research published by Matthew E. Taylor.


International Conference on Machine Learning | 2007

Cross-domain transfer for reinforcement learning

Matthew E. Taylor; Peter Stone

A typical goal for transfer learning algorithms is to utilize knowledge gained in a source task to learn a target task faster. Recently introduced transfer methods in reinforcement learning settings have shown considerable promise, but they typically transfer between pairs of very similar tasks. This work introduces Rule Transfer, a transfer algorithm that first learns rules to summarize a source task policy and then leverages those rules to learn faster in a target task. This paper demonstrates that Rule Transfer can effectively speed up learning in Keepaway, a benchmark RL problem in the robot soccer domain, based on experience from source tasks in the gridworld domain. We empirically show, through the use of three distinct transfer metrics, that Rule Transfer is effective across these domains.
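
As a rough illustration of the idea (not the authors' implementation), the sketch below summarizes a learned source policy as simple interval rules over state features and uses them to suggest actions early in the target task. The stump-style rule extraction, function names, and data layout are all assumptions made for this example.

```python
def extract_rules(source_policy, states, feature_names):
    """Summarize a learned source policy as interval rules: 'if feature i lies
    in [lo, hi], suggest this action'. A real rule learner (as in the paper)
    would be far more selective; this is only a stand-in for the idea."""
    rules = []
    for action in {source_policy(s) for s in states}:
        matching = [s for s in states if source_policy(s) == action]
        for i, name in enumerate(feature_names):
            vals = [s[i] for s in matching]
            rules.append((name, i, min(vals), max(vals), action))
    return rules


def rule_guided_action(rules, state, own_action):
    """Return the first rule-suggested action whose interval matches the
    current state; otherwise fall back to the target learner's own choice."""
    for _, i, lo, hi, action in rules:
        if lo <= state[i] <= hi:
            return action
    return own_action
```

In a Rule Transfer-style setup, the target learner would follow rule_guided_action frequently at first and then gradually shift toward its own learned policy.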


Adaptive Agents and Multi-Agent Systems | 2007

Transfer via inter-task mappings in policy search reinforcement learning

Matthew E. Taylor; Shimon Whiteson; Peter Stone

The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.
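
A minimal sketch of what a transfer functional can look like, assuming a linear policy for brevity (the paper transfers neural network policies trained by policy search): weights are copied from source to target through hand-coded inter-task mappings between state variables and actions. The mapping dictionaries and shapes below are illustrative assumptions.

```python
import numpy as np

def transfer_policy(W_src, b_src, state_map, action_map, n_states_tgt, n_actions_tgt):
    """Copy source-policy weights into a target policy via inter-task mappings.
    state_map[i_tgt] = i_src and action_map[a_tgt] = a_src say which source
    variable/action each target variable/action is 'similar' to."""
    W_tgt = np.zeros((n_actions_tgt, n_states_tgt))
    b_tgt = np.zeros(n_actions_tgt)
    for a_tgt, a_src in action_map.items():
        b_tgt[a_tgt] = b_src[a_src]
        for i_tgt, i_src in state_map.items():
            W_tgt[a_tgt, i_tgt] = W_src[a_src, i_src]
    return W_tgt, b_tgt  # used to seed the target-task population before policy search

# Hypothetical example: 3-state/2-action source task, 5-state/3-action target task.
W_src, b_src = np.random.randn(2, 3), np.zeros(2)
W_tgt, b_tgt = transfer_policy(W_src, b_src,
                               state_map={0: 0, 1: 1, 2: 2, 3: 1, 4: 2},
                               action_map={0: 0, 1: 1, 2: 1},
                               n_states_tgt=5, n_actions_tgt=3)
```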


Adaptive Agents and Multi-Agent Systems | 2005

Behavior transfer for value-function-based reinforcement learning

Matthew E. Taylor; Peter Stone

Temporal difference (TD) learning methods [22] have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found very slow in practice. A key feature of TD methods is that they represent policies in terms of value functions. In this paper we introduce behavior transfer, a novel approach to speeding up TD learning by transferring the learned value function from one task to a second related task. We present experimental results showing that autonomous learners are able to learn one multiagent task and then use behavior transfer to markedly reduce the total training time for a more complex task.
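
A minimal sketch of the value-function transfer step, assuming a tabular representation for clarity (the paper works with function approximation in Keepaway): the target task's Q-values are initialized from the source task's learned values through inter-task state and action mappings, and TD learning then continues in the target task. The mapping functions here are placeholders.

```python
from collections import defaultdict

def transfer_value_function(q_source, target_states, target_actions,
                            map_state, map_action):
    """Initialize target Q-values from the analogous source state-action pairs."""
    q_target = defaultdict(float)
    for s in target_states:
        for a in target_actions:
            q_target[(s, a)] = q_source.get((map_state(s), map_action(a)), 0.0)
    return q_target  # starting point for further TD updates in the target task
```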


European Conference on Machine Learning | 2008

Transferring instances for model-based reinforcement learning

Matthew E. Taylor; Nicholas K. Jong; Peter Stone

Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have primarily been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample efficiency and asymptotic performance of a model-based algorithm when learning in a continuous state space. Additionally, we conduct experiments to test the limits of TIMBREL's effectiveness.
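
The sketch below shows one plausible reading of instance transfer for a model-based learner: recorded source-task transitions are translated through inter-task mappings and added to the target learner's dataset only where target experience is still sparse. The mapping functions and the sparsity test are assumptions, not the paper's exact procedure.

```python
def transfer_instances(source_transitions, target_transitions,
                       map_state, map_action, needs_more_data):
    """Translate (s, a, r, s') tuples from the source task and append them to
    the target dataset wherever the model still lacks target-task experience."""
    augmented = list(target_transitions)
    for s, a, r, s_next in source_transitions:
        s_t, a_t, s_next_t = map_state(s), map_action(a), map_state(s_next)
        if needs_more_data(s_t, a_t, augmented):  # only fill gaps in target experience
            augmented.append((s_t, a_t, r, s_next_t))
    return augmented  # fed to the model learner and planner of the model-based algorithm
```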


Adaptive Behavior | 2007

Empirical Studies in Action Selection with Reinforcement Learning

Shimon Whiteson; Matthew E. Taylor; Peter Stone

To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. In this article we aim to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together. First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to the performance of each method. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.
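
For the second study, one way to borrow a TD-style exploration mechanism into on-line evolution is sketched below: instead of evaluating every individual equally often, the next policy to evaluate is chosen epsilon-greedily from current fitness estimates. The exact selection rule and parameters are assumptions made for illustration.

```python
import random

def choose_individual_to_evaluate(total_reward, eval_counts, epsilon=0.1):
    """Pick the index of the next population member to run an episode with:
    usually the one with the best estimated fitness, occasionally a random one."""
    if random.random() < epsilon:
        return random.randrange(len(total_reward))                    # explore
    means = [r / max(c, 1) for r, c in zip(total_reward, eval_counts)]
    return max(range(len(means)), key=means.__getitem__)              # exploit
```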


Robot Soccer World Cup | 2006

Keepaway soccer: from machine learning testbed to benchmark

Peter Stone; Gregory Kuhlmann; Matthew E. Taylor; Yaxin Liu

Keepaway soccer has been previously put forth as a testbed for machine learning. Although multiple researchers have used it successfully for machine learning experiments, doing so has required a good deal of domain expertise. This paper introduces a set of programs, tools, and resources designed to make the domain easily usable for experimentation without any prior knowledge of RoboCup or the Soccer Server. In addition, we report on new experiments in the Keepaway domain, along with performance results designed to be directly comparable with future experimental results. Combined, the new infrastructure and our concrete demonstration of its use in comparative experiments elevate the domain to a machine learning benchmark, suitable for use by researchers across the field.


AI Magazine | 2011

An Introduction to Intertask Transfer for Reinforcement Learning

Matthew E. Taylor; Peter Stone

Transfer learning has recently gained popularity due to the development of algorithms that can successfully generalize information across multiple tasks. This article focuses on transfer in the context of reinforcement learning domains, a general learning framework where an agent acts in an environment to maximize a reward signal. The goals of this article are to (1) familiarize readers with the transfer learning problem in reinforcement learning domains, (2) explain why the problem is both interesting and difficult, (3) present a selection of existing techniques that demonstrate different solutions, and (4) provide representative open problems in the hope of encouraging additional research in this exciting area.
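
For readers new to the underlying setting, here is a minimal tabular Q-learning loop illustrating the reinforcement learning framework the article assumes (an agent acting in an environment to maximize reward); transfer methods then reuse knowledge such as the learned Q table across tasks. The Gym-style env interface (reset/step/actions) is hypothetical.

```python
from collections import defaultdict
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Plain tabular Q-learning; env.reset() -> state, env.step(a) -> (state, reward, done)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (random.choice(env.actions) if random.random() < epsilon
                 else max(env.actions, key=lambda act: Q[(s, act)]))
            s_next, r, done = env.step(a)
            best_next = max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```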


Autonomous Agents and Multi-Agent Systems | 2010

Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning

Shimon Whiteson; Matthew E. Taylor; Peter Stone

Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address this shortcoming by presenting results of empirical comparisons between Sarsa and NEAT, two representative methods, in mountain car and keepaway, two benchmark reinforcement learning tasks. In each task, the methods are evaluated in combination with both linear and nonlinear representations to determine their best configurations. In addition, this article tests two specific hypotheses about the critical factors contributing to these methods’ relative performance: (1) that sensor noise reduces the final performance of Sarsa more than that of NEAT, because Sarsa’s learning updates are not reliable in the absence of the Markov property and (2) that stochasticity, by introducing noise in fitness estimates, reduces the learning speed of NEAT more than that of Sarsa. Experiments in variations of mountain car and keepaway designed to isolate these factors confirm both these hypotheses.
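
For reference, the TD side of the comparison boils down to updates like the Sarsa(0) rule with a linear function approximator sketched below; the feature extractor (e.g., tile coding in some configurations) is abstracted behind features(s, a), and its exact form here is an assumption.

```python
import numpy as np

def sarsa_update(w, features, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One Sarsa(0) step with linear value approximation Q(s, a) = w . features(s, a)."""
    td_error = r + gamma * (w @ features(s_next, a_next)) - (w @ features(s, a))
    return w + alpha * td_error * features(s, a)
```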


International Symposium on Neural Networks | 2014

Multi-objectivization of reinforcement learning problems by reward shaping

Tim Brys; Anna Harutyunyan; Peter Vrancx; Matthew E. Taylor; Daniel Kudenko; Ann Nowé

Multi-objectivization is the process of transforming a single objective problem into a multi-objective problem. Research in evolutionary optimization has demonstrated that the addition of objectives that are correlated with the original objective can make the resulting problem easier to solve compared to the original single-objective problem. In this paper we investigate the multi-objectivization of reinforcement learning problems. We propose a novel method for the multi-objectivization of Markov Decision problems through the use of multiple reward shaping functions. Reward shaping is a technique to speed up reinforcement learning by including additional heuristic knowledge in the reward signal. The resulting composite reward signal is expected to be more informative during learning, leading the learner to identify good actions more quickly. Good reward shaping functions are by definition correlated with the target value function for the base reward signal, and we show in this paper that adding several correlated signals can help to solve the basic single objective problem faster and better. We prove that the total ordering of solutions, and by consequence the optimality of solutions, is preserved in this process, and empirically demonstrate the usefulness of this approach on two reinforcement learning tasks: a pathfinding problem and the Mario domain.
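
The core construction can be made concrete with standard potential-based shaping: each heuristic potential phi_i induces a shaping term gamma * phi_i(s') - phi_i(s) added to the base reward, yielding several correlated reward signals that a multi-objective learner can keep separate or scalarize. The potentials and weights in this sketch are placeholders; the specific shaping functions used in the paper are not reproduced here.

```python
def shaped_rewards(r, s, s_next, potentials, gamma=0.99):
    """One shaped reward per heuristic potential function phi_i (potential-based
    shaping, which is what preserves the ordering of solutions)."""
    return [r + gamma * phi(s_next) - phi(s) for phi in potentials]

def scalarized_reward(r, s, s_next, potentials, weights, gamma=0.99):
    """A simple linear scalarization of the multi-objectivized reward vector."""
    return sum(w * sr for w, sr in
               zip(weights, shaped_rewards(r, s, s_next, potentials, gamma)))
```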


Adaptive and Learning Agents | 2014

Reinforcement learning agents providing advice in complex video games

Matthew E. Taylor; Nicholas Carboni; Anestis Fachantidis; Ioannis P. Vlahavas; Lisa Torrey

This article introduces a teacher–student framework for reinforcement learning, synthesising and extending material that appeared in a conference paper [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
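
One of the budgeted advising heuristics described in the earlier conference paper, importance advising, can be sketched roughly as follows: the teacher spends advice only in states it considers important, measured by the spread of its own Q-values, and only while its budget lasts. The threshold value and interface here are illustrative assumptions.

```python
class BudgetedTeacher:
    def __init__(self, q_teacher, actions, budget=100, threshold=0.5):
        self.q = q_teacher          # teacher's own Q-function: q(state, action) -> float
        self.actions = actions
        self.budget = budget        # how many pieces of advice may still be given
        self.threshold = threshold  # minimum Q-value spread for a state to count as important

    def advise(self, state):
        """Return the teacher's preferred action, or None to let the student act alone."""
        if self.budget <= 0:
            return None
        q_vals = [self.q(state, a) for a in self.actions]
        if max(q_vals) - min(q_vals) < self.threshold:
            return None             # state judged unimportant; save the budget
        self.budget -= 1
        return max(self.actions, key=lambda a: self.q(state, a))
```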

Collaboration


Dive into Matthew E. Taylor's collaborations.

Top Co-Authors

Peter Stone, University of Texas at Austin
Milind Tambe, University of Southern California
Haitham Bou Ammar, University of Pennsylvania
Karl Tuyls, University of Liverpool
Tim Brys, Vrije Universiteit Brussel
Jun-young Kwak, Carnegie Mellon University
Bei Peng, Washington State University
Yusen Zhan, Washington State University
David L. Roberts, North Carolina State University