Peter Vamplew

Federation University Australia

Publication

Featured research published by Peter Vamplew.

Neurocomputing | 2017

Special issue on multi-objective reinforcement learning

Madalina Drugan; Marco Wiering; Peter Vamplew; Madhu Chetty

Many real-life problems involve dealing with multiple objectives. For example, in network routing the criteria may consist of energy consumption, latency, and channel capacity, which are in essence conflicting objectives. Because the objectives may conflict, there is usually no single optimal solution. In those cases, it is desirable to obtain a set of trade-off solutions between the objectives. Over the last decade this problem has also gained the attention of many researchers in the field of reinforcement learning (RL). RL addresses sequential decision problems in initially (possibly) unknown stochastic environments. The goal is the maximization of the agent's reward in an environment that is not always completely observable. The purpose of this special issue is to obtain a broader picture of the algorithmic techniques at the confluence of multi-objective optimization and reinforcement learning. The growing interest in multi-objective reinforcement learning (MORL) was reflected in the quantity and quality of submissions received for this special issue. After a rigorous review process, seven papers were accepted for publication, and they reflect the diversity of research being carried out within this emerging field. The accepted papers consider many different aspects of algorithmic design and evaluation, and this editorial places them in a unified framework.
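As an illustration of what a "set of trade-off solutions" means, the sketch below filters a list of candidate outcomes down to its Pareto front, i.e. the candidates that no other candidate beats on every objective. It is not taken from the special issue; the function names and the routing numbers are hypothetical, and every objective is assumed to be expressed so that larger is better (costs such as energy and latency are negated).

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Vector]) -> List[Vector]:
    """Keep only the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical routing options as (-energy, -latency, capacity).
routes = [
    (-5.0, -20.0, 100.0),
    (-3.0, -35.0, 80.0),
    (-6.0, -25.0, 90.0),
    (-3.0, -35.0, 120.0),
]
front = pareto_front(routes)
print(front)
```

With these made-up numbers two routes survive: one favours low latency and the other favours low energy use and high capacity, so neither can be called the single best.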

Neurocomputing | 2017

Softmax exploration strategies for multiobjective reinforcement learning

Peter Vamplew; Richard Dazeley; Cameron Foale

Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.

Neurocomputing | 2017

Steering approaches to Pareto-optimal multiobjective reinforcement learning

Peter Vamplew; Rustam Issabekov; Richard Dazeley; Cameron Foale; Adam Berry; Tim Moore; Douglas C. Creighton

For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.

computer games | 2014

Griefers versus the Griefed — what motivates them to play Massively Multiplayer Online Role-Playing Games?

Leigh Achterbosch; Charlynn Miller; Christopher Turville; Peter Vamplew

‘Griefing’ is a term used to describe when a player within a multiplayer online environment intentionally disrupts another player’s game experience for his or her own personal enjoyment or gain. Every day a certain percentage of users of Massively Multiplayer Online Role-Playing Games (MMORPGs) experience some form of griefing. Past studies have attempted to ascertain the factors that motivate users to play MMORPGs. A limited number of studies specifically examined the motivations of users who perform griefing (also known as ‘griefers’). However, those studies did not examine the motivations of users subjected to griefing. Therefore, the aim of this paper is to examine the factors that motivate the subjects of griefing to play MMORPGs, as well as the factors motivating the griefers. The authors conducted an online survey to discover the motivations for playing MMORPGs among those who identified themselves as (i) performing griefing, and (ii) having been subjected to griefing. A previously devised motivational model by Nick Yee that incorporated ten factors was used to determine the respondents’ motivational trends. In general, players who identified themselves as griefers were more likely to be motivated by all three ‘achievement’ sub-factors (advancement, game mechanics and competition) to the detriment of all other factors. The subjects of griefing were highly motivated by ‘advancement’ and ‘mechanics’, but they ranked ‘competition’ significantly lower (compared to the griefers). In addition, ‘immersion’ factors were rated highly by the respondents who were subjected to griefing, with a significantly higher rating of the ‘escapism’ factor (compared with rankings by griefers). In comparison to the griefers, the respondents subjected to griefing who had many years’ experience in the genre of MMORPGs also placed a greater emphasis on the ‘socializing’ and ‘relationship’ factors.
Overall, the griefers in this survey considered ‘achievement’ to be a prime motivating factor, whereas the griefed players tended to be motivated by all ten factors to a similar degree.

computer games | 2018

SoniFight: Software to Provide Additional Sonification Cues to Video Games for Visually Impaired Players

Alastair Lansley; Peter Vamplew; Cameron Foale; Philip Smith

SoniFight is utility software designed to provide additional sonification cues to video games, especially those in the fighting game genre, in order to enhance their accessibility for players who are blind or visually impaired. While the software is distributed with configuration files that add sonification to a number of popular video games, configuration files may also be created or modified to provide user-customisable sonification to a wide variety of other games that run on the Windows platform through its built-in user interface. SoniFight is released under the MIT software license and the source code is freely available for use and modification at: https://github.com/feduni/sonifight.

Neurocomputing | 2018

Non-functional regression: A new challenge for neural networks

Peter Vamplew; Richard Dazeley; Cameron Foale; T. A. Choudhury

This work identifies an important, previously unaddressed issue for regression based on neural networks – learning to accurately approximate problems where the output is not a function of the input (i.e. where the number of outputs required varies across input space). Such non-functional regression problems arise in a number of applications, and cannot be adequately handled by existing neural network algorithms. To demonstrate the benefits possible from directly addressing non-functional regression, this paper proposes the first neural algorithm to do so – an extension of the Resource Allocating Network (RAN) which adds output neurons to the network structure during training. This new algorithm, called the Resource Allocating Network with Varying Output Cardinality (RANVOC), is demonstrated to be capable of learning to perform non-functional regression, on both artificially constructed data and the real-world task of specifying parameter settings for a plasma-spray process. Importantly, RANVOC is shown to outperform not just the original RAN algorithm, but also the best possible error rates achievable by any functional form of regression.

Ethics and Information Technology | 2018

Human-aligned artificial intelligence is a multiobjective problem

Peter Vamplew; Richard Dazeley; Cameron Foale; Sally Firmin; Jane Mummery

As the capabilities of artificial intelligence (AI) systems improve, it becomes important to constrain their actions to ensure their behaviour remains beneficial to humanity. A variety of ethical, legal and safety-based frameworks have been proposed as a basis for designing these constraints. Despite their variations, these frameworks share the common characteristic that decision-making must consider multiple potentially conflicting factors. We demonstrate that these alignment frameworks can be represented as utility functions, but that the widely used Maximum Expected Utility (MEU) paradigm provides insufficient support for such multiobjective decision-making. We show that a Multiobjective Maximum Expected Utility paradigm based on the combination of vector utilities and non-linear action–selection can overcome many of the issues which limit MEU’s effectiveness in implementing aligned AI. We examine existing approaches to multiobjective AI, and identify how these can contribute to the development of human-aligned intelligent agents.
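The contrast between scalar MEU and a vector-utility approach with non-linear action selection can be sketched with a toy example. This is an illustration only and not the paper's formulation: the action names, the two-component utility vector (task performance, safety), the weights, and the thresholded selection rule are all assumptions chosen to show one possible non-linear rule.

```python
from typing import Dict, Tuple

Utility = Tuple[float, float]  # (expected task performance, expected safety), hypothetical


def linear_meu_choice(utilities: Dict[str, Utility], weights: Utility) -> str:
    """Scalar MEU: collapse each utility vector with a weighted sum, pick the maximum."""
    return max(
        utilities,
        key=lambda a: sum(w * u for w, u in zip(weights, utilities[a])),
    )


def thresholded_choice(utilities: Dict[str, Utility], safety_threshold: float) -> str:
    """One non-linear rule: among actions whose safety meets the threshold,
    maximise performance; if none qualify, fall back to the safest action."""
    safe = {a: u for a, u in utilities.items() if u[1] >= safety_threshold}
    if safe:
        return max(safe, key=lambda a: safe[a][0])
    return max(utilities, key=lambda a: utilities[a][1])


# Made-up expected utility vectors for three actions.
expected = {
    "fast": (10.0, 0.2),
    "balanced": (7.0, 0.8),
    "cautious": (3.0, 0.95),
}

scalar_pick = linear_meu_choice(expected, weights=(1.0, 1.0))
vector_pick = thresholded_choice(expected, safety_threshold=0.7)
print(scalar_pick, vector_pick)
```

In this toy setting the weighted sum lets a large performance gain outweigh a poor safety score, while the thresholded rule treats safety as a constraint that must be met before performance is considered.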

international conference on neural information processing | 2017

Evaluating Accuracy in Prudence Analysis for Cyber Security

Omaru Maruatona; Peter Vamplew; Richard Dazeley; Paul A. Watters

Conventional Knowledge-Based Systems (KBS) have no way of detecting or signalling when their knowledge is insufficient to handle a case. Consequently, when presented with a case beyond its current knowledge, a KBS may produce an uninformed and incorrect conclusion, a weakness known as brittleness. Prudence Analysis (PA) has been shown to be a viable remedy for brittleness in Ripple Down Rules (RDR) knowledge bases. To date, there have been two approaches to Prudence: attribute-based and structural-based. This paper introduces Integrated Prudence Analysis (IPA), a novel Prudence method formed by combining these two approaches.

computer games | 2017

Correction to: Griefers Versus the Griefed - What Motivates Them to Play Massively Multiplayer Online Role-Playing Games?

Leigh Achterbosch; Charlynn Miller; Christopher Turville; Peter Vamplew

The original version of this article unfortunately contained a mistake. The family name and the e-mail address of the first author have been incorrectly updated as Leigh Achternbosch (l.achternbosch@federation.edu.au) instead of Leigh Achterbosch (l.achterbosch@federation.edu.au).

Proceedings of the International Conference on Compute and Data Analysis | 2017

An Agile Group Aware Process beyond CRISP-DM: A Hospital Data Mining Case Study

Vishakha Sharma; Andrew Stranieri; Julien Ugon; Peter Vamplew; Laura Martin

The CRISP-DM methodology is commonly used in data analytics exercises within an organisation to provide system and structure to data mining processes. However, in providing a rigorous framework, CRISP-DM overlooks two facets of data analytics in organisational contexts: data mining exercises are far more agile and subject to change than CRISP-DM presumes, and central decisions regarding the interpretation of discovered patterns and the direction of analytics exercises are typically made not by individuals but by committees or groups within an organisation. The current study presents a case study of data mining in a hospital setting and suggests how the agile nature of an analytics exercise and the group reasoning inherent in key decisions can be accommodated within a CRISP-DM methodology.
