Sam Devlin | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sam Devlin is active.

Explore More

Publication

Featured researches published by Sam Devlin.

Advances in Complex Systems | 2011

An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems

Sam Devlin; Daniel Kudenko; Marek Grześ

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context of RoboCup KeepAway by designing three reward shaping schemes, encouraging specific behaviour such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.

european conference on modelling foundations and applications | 2015

Type Inference in Flexible Model-Driven Engineering

Athanasios Zolotas; Nicholas Drivalos Matragkas; Sam Devlin; Dimitrios S. Kolovos; Richard F. Paige

In Model-Driven Engineering (MDE), models conform to metamodels. In flexible modelling, engineers construct example models with free-form drawing tools; these examples may later need to conform to a metamodel. Flexible modelling can lead to errors: drawn elements that should represent the same domain concept could instantiate different types; other drawn elements could be left untyped. We propose a novel type inference approach to calculating types from example models, based on the Classification and Regression Trees (CART) algorithm. We describe the approach and evaluate it on a number of randomly generated models, considering the accuracy and precision of the resultant classifications. Experimental results suggest that on average 80% of element types are correctly identified. In addition, the results reveal a correlation between the accuracy and the ratio of known-to-unknown types in a model.

working conference on virtual enterprises | 2014

A Phylogenetic Classification of the Video-game Industry’s Business Model Ecosystem

Nikolaos Goumagias; Ignazio Cabras; Kiran Jude Fernandes; Feng Li; Alberto Nucciarelli; Peter I. Cowling; Sam Devlin; Daniel Kudenko

Since 1990, Business Models emerged as a new unit of interest among both academics and practitioners. An emerging theme in the growing academic literature is focused on developing a system that employs business models as a focal point of enterprise classification. In this paper we attempt a historical analysis of the video game industry business model evolution and examine the process through the prism of two-sided market economics. Based on the biological school of phylogenetic classification, we develop a cladogram that captures the evolution process and classifies the industry’s business models. The classification system is regarded as a first attempt to provide an exploratory and descriptive research of the video game industry, before attempting an explanatory and predictive analysis, and introduces a system that is not governed by the industry’s specific characteristics and can be universally applied, providing a map for researchers and practitioners to test organisational differences and contribute further to the business model knowledge.

Neurocomputing | 2017

Policy invariance under reward transformations for multi-objective reinforcement learning

Patrick Mannion; Sam Devlin; Karl Mason; Jim Duggan; Enda Howley

Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL) the reward signal is a vector, where each component represents the performance on a different objective. Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of RL agents in single-objective problems. The basic premise of reward shaping is to add an additional shaping reward to the reward naturally received from the environment, to incorporate domain knowledge and guide an agent’s exploration. Potential-Based Reward Shaping (PBRS) is a specific form of reward shaping that offers additional guarantees. In this paper, we extend the theoretical guarantees of PBRS to MORL problems. Specifically, we provide theoretical proof that PBRS does not alter the true Pareto front in both single- and multi-agent MORL. We also contribute the first published empirical studies of the effect of PBRS in single- and multi-agent MORL problems.

Knowledge Engineering Review | 2016

Plan-based reward shaping for multi-agent reinforcement learning

Sam Devlin; Daniel Kudenko

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.

IEEE Transactions on Computational Intelligence and Ai in Games | 2015

Player Preference and Style in a Leading Mobile Card Game

Peter I. Cowling; Sam Devlin; Edward Jack Powley; Daniel Whitehouse; Jeff Rollason

Tuning game difficulty prior to release requires careful consideration. Players can quickly lose interest in a game if it is too hard or too easy. Assessing how players will cope prior to release is often inaccurate. However, modern games can now collect sufficient data to perform large scale analysis post deployment and update the product based on these insights. AI Factory Spades is currently the top rated Spades game in the Google Play store. In collaboration with the developers, we have collected gameplay data from 27 592 games and statistics regarding wins/losses for 99 866 games using Google Analytics. Using the data collected, this study analyses the difficulty and behavior of an Information Set Monte Carlo Tree Search player we developed and deployed in the game previously. The methods of data collection and analysis presented in this study are generally applicable. The same workflow could be used to analyze the difficulty and typical player or opponent behavior in any game. Furthermore, addressing issues of difficulty or nonhuman-like opponents postdeployment can positively affect player retention.

Autonomous Agents and Multi-Agent Systems | 2016

Potential-based reward shaping for finite horizon online POMDP planning

Adam Eck; Leen Kiat Soh; Sam Devlin; Daniel Kudenko

In this paper, we address the problem of suboptimal behavior during online partially observable Markov decision process (POMDP) planning caused by time constraints on planning. Taking inspiration from the related field of reinforcement learning (RL), our solution is to shape the agent’s reward function in order to lead the agent to large future rewards without having to spend as much time explicitly estimating cumulative future rewards, enabling the agent to save time to improve the breadth planning and build higher quality plans. Specifically, we extend potential-based reward shaping (PBRS) from RL to online POMDP planning. In our extension, information about belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards beyond its planning horizon, and thus achieve greater cumulative rewards. We develop novel potential functions measuring information useful to agent metareasoning in POMDPs (reflecting on agent knowledge and/or histories of experience with the environment), theoretically prove several important properties and benefits of using PBRS for online POMDP planning, and empirically demonstrate these results in a range of classic benchmark POMDP planning problems.

adaptive and learning agents | 2015

Distributed reinforcement learning for adaptive and robust network intrusion response

Kleanthis Malialis; Sam Devlin; Daniel Kudenko

Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.

web intelligence | 2009

Reinforcement Learning in RoboCup KeepAway with Partial Observability

Sam Devlin; Marek Grzes; Daniel Kudenko

Partially observable environments pose a major challenge to the application of reinforcement learning algorithms. In such environments, due to the Markov property frequently being violated in the system state representation, situations can occur where an agent has insufficient information to decide on the optimal action. In such cases, it is necessary to determine when information gathering actions should be executed, that is, when the agent needs to reduce uncertainty about the current state before deciding on how to act. One possible solution that has been proposed in past research is to manually code rules for execution of information gathering actions in the policy using heuristic (and likely faulty) knowledge. However, such a solution requires explicit expert knowledge about actions which are information gathering. In this paper a flexible solution is proposed which automatically learns when to execute information gathering actions and furthermore to automatically discover which actions gather information. We present an evaluation in the Robo{C}up Keep{A}way domain that empirically shows the robustness of the proposed approach and its success in learning under varying degrees of partial observability. Hence, it eliminates the need for hand-coded rules, is flexible in different situations and does not require knowledge about information gathering actions.

Knowledge Engineering Review | 2016

Overcoming incorrect knowledge in plan-based reward shaping

Kyriakos Efthymiadis; Sam Devlin; Daniel Kudenko

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. Plan-based reward shaping is a successful approach in which a STRIPS plan is used in order to guide the agent to the optimal behaviour. However, if the provided knowledge is wrong, it has been shown the agent will take longer to learn the optimal policy. Previously, in some cases, it was better to ignore all prior knowledge despite it only being partially incorrect. This paper introduces a novel use of knowledge revision to overcome incorrect domain knowledge when provided to an agent receiving plan-based reward shaping. Empirical results show that an agent using this method can outperform the previous agent receiving plan-based reward shaping without knowledge revision.

Explore More