Is this you? Create Your Porfile

Tom Everitt

Australian National University

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Tom Everitt is active.

Explore More

Publication

Featured researches published by Tom Everitt.

congress on evolutionary computation | 2014

Free Lunch for Optimisation under the Universal Distribution

Tom Everitt; Tor Lattimore; Marcus Hutter

Function optimisation is a major challenge in computer science. The No Free Lunch theorems state that if all functions with the same histogram are assumed to be equally probable then no algorithm outperforms any other in expectation. We argue against the uniform assumption and suggest a universal prior exists for which there is a free lunch, but where no particular class of functions is favoured over another. We also prove upper and lower bounds on the size of the free lunch.

artificial general intelligence | 2016

Self-Modification of Policy and Utility Function in Rational Agents

Tom Everitt; Daniel Filan; Mayank Daswani; Marcus Hutter

Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify – for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby ‘escaping’ the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and use the current utility function when evaluating the future.

australasian joint conference on artificial intelligence | 2015

Analytical Results on the BFS vs. DFS Algorithm Selection Problem. Part I: Tree Search

Tom Everitt; Marcus Hutter

The algorithm selection problem asks to select the best algorithm for a given problem. In the companion paper (Everitt and Hutter 2015b), expected runtime was approximated as a function of search depth and probabilistic goal distribution for tree search versions of breadth-first search (BFS) and depth-first search (DFS). Here we provide an analogous analysis of BFS and DFS graph search, deriving expected runtime as a function of graph structure and goal distribution. The applicability of the method is demonstrated through analysis of two different grammar problems. The approximations come surprisingly close to empirical reality.

artificial general intelligence | 2016

Avoiding Wireheading with Value Reinforcement Learning

Tom Everitt; Marcus Hutter

How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward – the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent’s belief distributions, and does not require an explicit specification of which actions constitute wireheading.

international joint conference on artificial intelligence | 2017

Count-Based Exploration in Feature Space for Reinforcement Learning

Marcus Hutter; Jarryd Martin; S Suraj Narayanan; Tom Everitt

We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our \phi-pseudocount achieves generalisation by exploiting same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The \phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.

artificial general intelligence | 2016

Death and Suicide in Universal Artificial Intelligence

Jarryd Martin; Tom Everitt; Marcus Hutter

Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent’s estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically self-preserving), and that the agent’s posterior belief that it will survive increases over time.

algorithmic decision theory | 2015

Sequential Extensions of Causal and Evidential Decision Theory

Tom Everitt; Jan Leike; Marcus Hutter

Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one.

information theory workshop | 2014

Can we measure the difficulty of an optimization problem

Tansu Alpcan; Tom Everitt; Marcus Hutter

Can we measure the difficulty of an optimization problem? Although optimization plays a crucial role in modern science and technology, a formal framework that puts problems and solution algorithms into a broader context has not been established. This paper presents a conceptual approach which gives a positive answer to the question for a broad class of optimization problems. Adopting an information and computational perspective, the proposed framework builds upon Shannon and algorithmic information theories. As a starting point, a concrete model and definition of optimization problems is provided. Then, a formal definition of optimization difficulty is introduced which builds upon algorithmic information theory. Following an initial analysis, lower and upper bounds on optimization difficulty are established. One of the upper-bounds is closely related to Shannon information theory and black-box optimization. Finally, various computational issues and future research directions are discussed.

international joint conference on artificial intelligence | 2017

Reinforcement Learning with a Corrupted Reward Channel

Tom Everitt; Victoria Krakovna; Laurent Orseau; Shane Legg

No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agents optimisation, reward corruption can be partially managed under some assumptions.

international joint conference on artificial intelligence | 2018

AGI Safety Literature Review.

Tom Everitt; Gary Lea; Marcus Hutter

The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.

Explore More