Publication


Featured research published by Shimon Whiteson.


Journal of Artificial Intelligence Research | 2013

A survey of multi-objective sequential decision-making

Diederik M. Roijers; Peter Vamplew; Shimon Whiteson; Richard Dazeley

Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
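
The scalarization function is easiest to see in its linear form. Below is a minimal sketch (the policy values and weights are invented for illustration) of how a weight vector collapses multi-objective values to scalars, and why the three solution concepts named in the abstract arise:

```python
import numpy as np

def linear_scalarize(values, weights):
    """Project a multi-objective value vector onto a single scalar return."""
    return float(np.dot(weights, values))

# Three candidate policies evaluated on two objectives (invented numbers).
policy_values = {
    "A": np.array([10.0, 2.0]),
    "B": np.array([7.0, 6.0]),
    "C": np.array([1.0, 9.0]),
}

# If the weights are known before planning, the problem collapses to a
# single-objective one and a single policy suffices.
w = np.array([0.5, 0.5])
best = max(policy_values, key=lambda p: linear_scalarize(policy_values[p], w))
print(best)  # -> "B" under equal weights

# If the weights are unknown until execution time, every policy that is
# optimal for some weight vector must be retained (a convex hull); with a
# nonlinear monotonic scalarization the required set grows to a Pareto front.
```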


Adaptive Agents and Multi-Agent Systems | 2007

Transfer via inter-task mappings in policy search reinforcement learning

Matthew E. Taylor; Shimon Whiteson; Peter Stone

The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.
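
The core weight-copying idea can be sketched as follows; the single-layer network and hand-coded index mappings here are a simplification for illustration, not the paper's actual TVITM-PS functional:

```python
import numpy as np

# Hypothetical inter-task mappings: each target-task input/output is mapped
# to the source-task input/output whose role it most resembles (e.g. a newly
# added third teammate reuses the weights learned for the second one).
input_map  = {0: 0, 1: 1, 2: 1}   # target input index  -> source input index
output_map = {0: 0, 1: 1, 2: 1}   # target output index -> source output index

def transfer_weights(W_src):
    """Seed a target-task network by copying mapped source-task weights.

    W_src: (n_src_outputs, n_src_inputs) weight matrix of a trained source
    policy; returns an (n_tgt_outputs, n_tgt_inputs) target matrix.
    """
    W_tgt = np.zeros((len(output_map), len(input_map)))
    for o_tgt, o_src in output_map.items():
        for i_tgt, i_src in input_map.items():
            W_tgt[o_tgt, i_tgt] = W_src[o_src, i_src]
    return W_tgt

# Applied to every member of the source population, this seeds policy
# search in the target task instead of starting from random networks.
print(transfer_weights(np.random.randn(2, 2)).shape)  # (3, 3)
```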


Genetic and Evolutionary Computation Conference | 2005

Automatic feature selection in neuroevolution

Shimon Whiteson; Peter Stone; Kenneth O. Stanley; Risto Miikkulainen; Nate Kohl

Feature selection is the process of finding the set of inputs to a machine learning algorithm that will yield the best performance. Developing a way to solve this problem automatically would make current machine learning methods much more useful. Previous efforts to automate feature selection rely on expensive meta-learning or are applicable only when labeled training data is available. This paper presents a novel method called FS-NEAT which extends the NEAT neuroevolution method to automatically determine an appropriate set of inputs for the networks it evolves. By learning the network's inputs, topology, and weights simultaneously, FS-NEAT addresses the feature selection problem without relying on meta-learning or labeled data. Initial experiments in an autonomous car racing simulation demonstrate that FS-NEAT can learn better and faster than regular NEAT. In addition, the networks it evolves are smaller and require fewer inputs. Furthermore, FS-NEAT's performance remains robust even as the feature selection task it faces is made increasingly difficult.
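
The mechanism is easiest to see in how networks are initialized. A simplified sketch (the genome encoding below is ours, much coarser than real NEAT genomes):

```python
import random

def fs_neat_initial_genome(n_inputs, n_outputs):
    """Standard NEAT starts with every input connected to every output;
    an FS-NEAT-style start has a single link from one randomly chosen
    input, so a feature only influences the network if evolution later
    adds a connection for it. Unhelpful features tend never to be wired in.
    A genome here is just a list of (source, sink, weight) links."""
    i = random.randrange(n_inputs)
    o = random.randrange(n_outputs)
    return [(f"in{i}", f"out{o}", random.uniform(-1.0, 1.0))]

print(fs_neat_initial_genome(n_inputs=20, n_outputs=2))
```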


Conference on Information and Knowledge Management | 2011

A probabilistic method for inferring preferences from clicks

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

Evaluating rankers using implicit feedback, such as clicks on documents in a result list, is an increasingly popular alternative to traditional evaluation methods based on explicit relevance judgments. Previous work has shown that so-called interleaved comparison methods can utilize click data to detect small differences between rankers and can be applied to learn ranking functions online. In this paper, we analyze three existing interleaved comparison methods and find that they are all either biased or insensitive to some differences between rankers. To address these problems, we present a new method based on a probabilistic interleaving process. We derive an unbiased estimator of comparison outcomes and show how marginalizing over possible comparison outcomes given the observed click data can make this estimator even more effective. We validate our approach using a recently developed simulation framework based on a learning to rank dataset and a model of click behavior. Our experiments confirm the results of our analysis and show that our method is both more accurate and more robust to noise than existing methods.
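
The interleaving step itself can be sketched as below (tau and the helper names are illustrative choices; the paper's unbiased outcome estimator and the marginalization over comparison outcomes are omitted):

```python
import random

def rank_distribution(ranking, tau=3.0):
    """Softmax over ranks: P(doc) proportional to 1 / rank^tau, so highly
    ranked documents are drawn most often."""
    weights = {doc: (r + 1) ** -tau for r, doc in enumerate(ranking)}
    z = sum(weights.values())
    return {doc: w / z for doc, w in weights.items()}

def probabilistic_interleave(ranking_a, ranking_b, length):
    """At each position, pick one of the two rankers at random and sample a
    not-yet-shown document from its distribution. Assumes length does not
    exceed the number of distinct documents available."""
    dists = (rank_distribution(ranking_a), rank_distribution(ranking_b))
    result = []
    while len(result) < length:
        remaining = {d: p for d, p in random.choice(dists).items()
                     if d not in result}
        if not remaining:
            continue  # chosen ranker exhausted; retry with the other
        docs, probs = zip(*remaining.items())
        result.append(random.choices(docs, weights=probs)[0])
    return result
```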


Machine Learning | 2005

Evolving Soccer Keepaway Players Through Task Decomposition

Shimon Whiteson; Nate Kohl; Risto Miikkulainen; Peter Stone

Complex control tasks can often be solved by decomposing them into hierarchies of manageable subtasks. Such decompositions require designers to decide how much human knowledge should be used to help learn the resulting components. On one hand, encoding human knowledge requires manual effort and may incorrectly constrain the learner’s hypothesis space or guide it away from the best solutions. On the other hand, it may make learning easier and enable the learner to tackle more complex tasks. This article examines the impact of this trade-off in tasks of varying difficulty. A space laid out by two dimensions is explored: (1) how much human assistance is given and (2) how difficult the task is. In particular, the neuroevolution learning algorithm is enhanced with three different methods for learning the components that result from a task decomposition. The first method, coevolution, is mostly unassisted by human knowledge. The second method, layered learning, is highly assisted. The third method, concurrent layered learning, is a novel combination of the first two that attempts to exploit human knowledge while retaining some of coevolution’s flexibility. Detailed empirical results are presented comparing and contrasting these three approaches on two versions of a complex task, namely robot soccer keepaway, that differ in difficulty of learning. These results confirm that, given a suitable task decomposition, neuroevolution can master difficult tasks. Furthermore, they demonstrate that the appropriate level of human assistance depends critically on the difficulty of the problem.


Information Retrieval | 2013

Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank, retrieval systems can learn directly from implicit feedback inferred from user interactions. In such an online setting, algorithms must obtain feedback for effective learning while simultaneously utilizing what has already been learned to produce high quality results. We formulate this challenge as an exploration–exploitation dilemma and propose two methods for addressing it. By adding mechanisms for balancing exploration and exploitation during learning, each method extends a state-of-the-art learning to rank method, one based on listwise learning and the other on pairwise learning. Using a recently developed simulation framework that allows assessment of online performance, we empirically evaluate both methods. Our results show that balancing exploration and exploitation can substantially and significantly improve the online retrieval performance of both listwise and pairwise approaches. In addition, the results demonstrate that such a balance affects the two approaches in different ways, especially when user feedback is noisy, yielding new insights relevant to making online learning to rank effective in practice.
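
One way to picture the balance (p_explore and the two-list framing are illustrative, not the exact mechanism of either method):

```python
import random

def biased_interleave(exploit_list, explore_list, p_explore=0.5):
    """Fill each result position from the exploratory ranker with
    probability p_explore, otherwise from the current-best ranker.
    Lowering p_explore protects short-term result quality; raising it
    yields more feedback about candidate rankers."""
    a, b = list(exploit_list), list(explore_list)
    result = []
    while a or b:
        use_explore = b and (not a or random.random() < p_explore)
        doc = (b if use_explore else a).pop(0)
        if doc not in result:
            result.append(doc)
    return result
```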


Adaptive Agents and Multi-Agent Systems | 2003

Concurrent layered learning

Shimon Whiteson; Peter Stone

Hierarchies are powerful tools for decomposing complex control tasks into manageable subtasks. Several hierarchical approaches have been proposed for creating agents that can execute these tasks. Layered learning is such a hierarchical paradigm that relies on learning the various subtasks necessary for achieving the complete high-level goal. Layered learning prescribes training low-level behaviors (those closer to the environmental inputs) prior to high-level behaviors. In past implementations these lower-level behaviors were always frozen before advancing to the next layer. In this paper, we hypothesize that there are situations where layered learning would work better were the lower layers allowed to keep learning concurrently with the training of subsequent layers, an approach we call concurrent layered learning. We identify a situation where concurrent layered learning is beneficial and present detailed empirical results verifying our hypothesis. In particular, we use neuro-evolution to concurrently learn two layers of a layered learning approach to a simulated robotic soccer keepaway task. The main contribution of this paper is evidence that there exist situations where concurrent layered learning outperforms traditional layered learning. Thus, we establish that, when using layered learning, the concurrent training of layers can be an effective option.
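
Schematically, the difference between the two regimes is whether the lower layer keeps evolving (the evolve/fitness helpers below are hypothetical placeholders):

```python
def layered_learning(evolve, fitness_low, fitness_high, gens):
    """Traditional layered learning: train the low-level behavior,
    freeze it, then train the high layer on top of it."""
    low = evolve(fitness=fitness_low, generations=gens)
    high = evolve(fitness=lambda h: fitness_high(low, h), generations=gens)
    return low, high

def concurrent_layered_learning(evolve_joint, fitness_high, gens):
    """Concurrent variant: the low layer keeps evolving while the high
    layer trains, with both genomes scored jointly on the high-level
    task, so earlier layers can still adapt to the needs of later ones."""
    return evolve_joint(fitness=fitness_high, generations=gens)
```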


Adaptive Behavior | 2007

Empirical Studies in Action Selection with Reinforcement Learning

Shimon Whiteson; Matthew E. Taylor; Peter Stone

To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. In this article we aim to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together. First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to the performance of each method. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.
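
The second study's idea of borrowing TD-style exploration can be sketched as follows (epsilon and the helper names are ours, for illustration only):

```python
import random

def select_member(population, avg_fitness, epsilon=0.1):
    """Epsilon-greedy choice of which population member gets the next
    evaluation episode: spend most episodes on the policy with the best
    fitness estimate so far, and a fraction epsilon on random others,
    rather than evaluating every member equally often."""
    if random.random() < epsilon:
        return random.choice(population)     # explore
    return max(population, key=avg_fitness)  # exploit

# Each selected policy runs one episode; the observed return updates its
# fitness estimate, and evolution proceeds on these online estimates.
```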


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning | 2009

A theoretical and empirical analysis of Expected Sarsa

Harm van Seijen; Hado van Hasselt; Shimon Whiteson; Marco Wiering

This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledge about stochasticity in the behavior policy to perform updates with lower variance. Doing so allows for higher learning rates and thus faster learning. In deterministic environments, Expected Sarsa's updates have zero variance, enabling a learning rate of 1. We prove that Expected Sarsa converges under the same conditions as Sarsa and formulate specific hypotheses about when Expected Sarsa will outperform Sarsa and Q-learning. Experiments in multiple domains confirm these hypotheses and demonstrate that Expected Sarsa has significant advantages over these more commonly used methods.
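
For reference, the update itself (a sketch with our variable names; Q is a state-by-action table and policy_probs gives the behavior policy's action probabilities):

```python
import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, policy_probs, alpha, gamma):
    """Sarsa bootstraps on the single sampled next action, so its target
    inherits the behavior policy's randomness. Expected Sarsa bootstraps on
    the expectation over next actions,
        target = r + gamma * sum_a' pi(a' | s_next) * Q[s_next, a'],
    removing that source of variance; in a deterministic environment the
    update has zero variance, which is what permits a learning rate of 1."""
    expected_q = float(np.dot(policy_probs(s_next), Q[s_next]))
    Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
```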


European Conference on Information Retrieval | 2011

Balancing Exploration and Exploitation in Learning to Rank Online

Katja Hofmann; Shimon Whiteson; Maarten de Rijke

As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank approaches, retrieval systems can learn directly from implicit feedback, while they are running. In such an online setting, algorithms need to both explore new solutions to obtain feedback for effective learning, and exploit what has already been learned to produce results that are acceptable to users. We formulate this challenge as an exploration-exploitation dilemma and present the first online learning to rank algorithm that works with implicit feedback and balances exploration and exploitation. We leverage existing learning to rank data sets and recently developed click models to evaluate the proposed algorithm. Our results show that finding a balance between exploration and exploitation can substantially improve online retrieval performance, bringing us one step closer to making online learning to rank work in practice.

Collaboration


Dive into Shimon Whiteson's collaborations.

Top Co-Authors

Peter Stone (University of Texas at Austin)
Matthijs T. J. Spaan (Delft University of Technology)
Hayley Hung (Delft University of Technology)
Anne Schuth (University of Amsterdam)