Tom Everitt
Australian National University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Tom Everitt.
congress on evolutionary computation | 2014
Tom Everitt; Tor Lattimore; Marcus Hutter
Function optimisation is a major challenge in computer science. The No Free Lunch theorems state that if all functions with the same histogram are assumed to be equally probable then no algorithm outperforms any other in expectation. We argue against the uniform assumption and suggest a universal prior exists for which there is a free lunch, but where no particular class of functions is favoured over another. We also prove upper and lower bounds on the size of the free lunch.
artificial general intelligence | 2016
Tom Everitt; Daniel Filan; Mayank Daswani; Marcus Hutter
Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify – for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby ‘escaping’ the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and use the current utility function when evaluating the future.
australasian joint conference on artificial intelligence | 2015
Tom Everitt; Marcus Hutter
The algorithm selection problem asks to select the best algorithm for a given problem. In the companion paper (Everitt and Hutter 2015b), expected runtime was approximated as a function of search depth and probabilistic goal distribution for tree search versions of breadth-first search (BFS) and depth-first search (DFS). Here we provide an analogous analysis of BFS and DFS graph search, deriving expected runtime as a function of graph structure and goal distribution. The applicability of the method is demonstrated through analysis of two different grammar problems. The approximations come surprisingly close to empirical reality.
artificial general intelligence | 2016
Tom Everitt; Marcus Hutter
How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.
international joint conference on artificial intelligence | 2017
Marcus Hutter; Jarryd Martin; S Suraj Narayanan; Tom Everitt
We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our \phi-pseudocount achieves generalisation by exploiting same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The \phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
artificial general intelligence | 2016
Jarryd Martin; Tom Everitt; Marcus Hutter
Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent’s estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically self-preserving), and that the agent’s posterior belief that it will survive increases over time.
algorithmic decision theory | 2015
Tom Everitt; Jan Leike; Marcus Hutter
Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one.
information theory workshop | 2014
Tansu Alpcan; Tom Everitt; Marcus Hutter
Can we measure the difficulty of an optimization problem? Although optimization plays a crucial role in modern science and technology, a formal framework that puts problems and solution algorithms into a broader context has not been established. This paper presents a conceptual approach which gives a positive answer to the question for a broad class of optimization problems. Adopting an information and computational perspective, the proposed framework builds upon Shannon and algorithmic information theories. As a starting point, a concrete model and definition of optimization problems is provided. Then, a formal definition of optimization difficulty is introduced which builds upon algorithmic information theory. Following an initial analysis, lower and upper bounds on optimization difficulty are established. One of the upper-bounds is closely related to Shannon information theory and black-box optimization. Finally, various computational issues and future research directions are discussed.
international joint conference on artificial intelligence | 2017
Tom Everitt; Victoria Krakovna; Laurent Orseau; Shane Legg
No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agents optimisation, reward corruption can be partially managed under some assumptions.
international joint conference on artificial intelligence | 2018
Tom Everitt; Gary Lea; Marcus Hutter
The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.