List of Publications
[EOBL19] 
Tom Everitt, Pedro A Ortega, Elizabeth Barnes, and Shane Legg.
Understanding agent incentives using causal influence diagrams. Part
I: Single action settings.
2019. [ bib  arXiv ] 
[LKE^{+}18] 
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane
Legg.
Scalable agent alignment via reward modeling: a research direction.
2018. [ bib  arXiv ] 
[Eve18] 
Tom Everitt.
Towards Safe Artificial General Intelligence.
PhD thesis, Australian National University, May 2018. [ bib  .pdf ] The field of artificial intelligence has recently experienced a number of breakthroughs thanks to progress in deep learning and reinforcement learning. Computer algorithms now outperform humans at Go, Jeopardy, image classification, and lip reading, and are becoming very competent at driving cars and interpreting natural language. The rapid development has led many to conjecture that artificial intelligence with greater-than-human ability on a wide range of tasks may not be far off. This in turn raises concerns about whether we would know how to control such systems, should we succeed in building them.

[EH18a] 
Tom Everitt and Marcus Hutter.
The alignment problem for Bayesian history-based reinforcement
learners.
Submitted, 2018.
Winner of the AI Alignment Prize. [ bib  .pdf ] Future artificial intelligences may be many times smarter than humans (Bostrom, 2014). If humans should have any chance of controlling such systems, their goals better be aligned with our human goals. Unfortunately, the goals of RL agents as designed today are heavily misaligned with human values for a number of reasons. In this paper, we categorize sources of misalignment, and give examples for each type. We also describe a range of tools for managing misalignment. Combined, the tools yield a number of aligned AI designs, though much future work remains for assessing their practical feasibility.

[ELH18] 
Tom Everitt, Gary Lea, and Marcus Hutter.
AGI safety literature review.
In International Joint Conference on AI (IJCAI), 2018. [ bib  arXiv  http ] The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.

[EH18b] 
Tom Everitt and Marcus Hutter.
Universal artificial intelligence: Practical agents and fundamental
challenges.
In Hussein A. Abbass, Jason Scholz, and Darryn J. Reid, editors,
Foundations of Trusted Autonomy, Studies in Systems, Decision and Control,
chapter 2, pages 15–46. Springer, 2018. [ bib  DOI  http ] Foundational theories have contributed greatly to scientific progress in many fields. Examples include Zermelo–Fraenkel set theory in mathematics, and universal Turing machines in computer science. Universal Artificial Intelligence (UAI) is an increasingly well-studied foundational theory for artificial intelligence, based on ancient principles in the philosophy of science and modern developments in information and probability theory. Importantly, it refrains from making unrealistic Markov, ergodicity, or stationarity assumptions on the environment. UAI provides a theoretically optimal agent AIXI and principled ideas for constructing practical autonomous agents. The theory also makes it possible to establish formal results on the motivations of AI systems. Such results may greatly enhance the trustability of autonomous agents, and guide design choices towards more robust agent architectures and incentive schemes. Finally, UAI offers a deeper appreciation of fundamental problems such as the induction problem and the exploration–exploitation dilemma.

[LMK^{+}17] 
Jan Leike, Miljan Martic, Victoria Krakovna, Pedro Ortega, Tom
Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg.
AI Safety Gridworlds.
arXiv e-prints, November 2017. [ bib  arXiv ] We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.

[EKO^{+}17] 
Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg.
Reinforcement learning with a corrupted reward signal.
In Proceedings of the Twenty-Sixth International Joint
Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia,
August 19–26, 2017, pages 4705–4713, 2017. [ bib  DOI  arXiv  www: ] No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.

[MNSEH17] 
Jarryd Martin, Suraj Narayanan S, Tom Everitt, and Marcus Hutter.
Count-based exploration in feature space for reinforcement learning.
In Proceedings of the Twenty-Sixth International Joint
Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia,
August 19–26, 2017, pages 2471–2478, 2017. [ bib  DOI  arXiv  www: ] We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our phi-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
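As a rough illustration of the "explore in feature space" idea, here is a minimal sketch of a feature-based exploration bonus. The simple counting scheme and all names below are my own assumptions; the paper's actual phi-pseudocount is derived from a density model over feature statistics, not from this averaging shortcut.

```python
import numpy as np

def feature_pseudocount(past_features, state_features):
    """Crude generalised visit-count: score a state by how often its
    active features have appeared in past observations.

    past_features:  (n_visited, d) binary matrix of previously seen features
    state_features: (d,) binary feature vector of the state to score
    """
    feature_counts = past_features.sum(axis=0)   # per-feature observation counts
    active = state_features.astype(bool)
    if not active.any():
        return 0.0
    # States whose active features are rare get a small pseudo-count
    return float(feature_counts[active].mean())

def exploration_bonus(pseudo_count, beta=1.0):
    """Count-based optimism: the bonus shrinks as the pseudo-count grows."""
    return beta / np.sqrt(pseudo_count + 1.0)
```

With past observations `[[1,0],[1,0],[1,1]]`, a state with the rarely seen second feature receives a smaller pseudo-count, and hence a larger bonus, than a state with the common first feature: novelty in feature space drives exploration even for never-visited states.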

[WBC^{+}17] 
Tobias Wängberg, Mikael Böörs, Elliot Catt, Tom Everitt, and Marcus
Hutter.
A game-theoretic analysis of the off-switch game.
In Tom Everitt, Ben Goertzel, and Alexey Potapov, editors,
Artificial General Intelligence: 10th International Conference, AGI 2017,
Melbourne, VIC, Australia, August 15–18, 2017, Proceedings, pages 167–177,
Cham, 2017. Springer International Publishing. [ bib  DOI  arXiv  http ] The off-switch game is a game-theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic as the human is modelled as an irrational player, and the robot's best action is only calculated under unrealistic normality and softmax assumptions. In this paper, we make the analysis fully game-theoretic, by modelling the human as a rational player with a random utility function. As a consequence, we are able to easily calculate the robot's best action for arbitrary belief and irrationality assumptions.

[EGP17] 
Tom Everitt, Ben Goertzel, and Alexey Potapov, editors.
Artificial General Intelligence: 10th International Conference,
AGI 2017, Melbourne, VIC, Australia, August 15–18, 2017, Proceedings.
Springer International Publishing, Cham, 2017. [ bib  DOI  http ] 
[EFDH16] 
Tom Everitt, Daniel Filan, Mayank Daswani, and Marcus Hutter.
Self-modification of policy and utility function in rational agents.
In Bas Steunebrink, Pei Wang, and Ben Goertzel, editors,
Artificial General Intelligence: 9th International Conference, AGI 2016, New
York, NY, USA, July 16–19, 2016, Proceedings, pages 1–11, Cham, 2016.
Springer International Publishing. [ bib  DOI  arXiv ] Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify, for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby “escaping” the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and uses the current utility function when evaluating the future.

[EH16] 
Tom Everitt and Marcus Hutter.
Avoiding wireheading with value reinforcement learning.
In Bas Steunebrink, Pei Wang, and Ben Goertzel, editors,
Artificial General Intelligence: 9th International Conference, AGI 2016, New
York, NY, USA, July 16–19, 2016, Proceedings, pages 12–22, Cham, 2016.
Springer International Publishing.
Source
code. [ bib  DOI  arXiv ] How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward, the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent's actions. The constraint is defined in terms of the agent's belief distributions, and does not require an explicit specification of which actions constitute wireheading.

[MEH16] 
Jarryd Martin, Tom Everitt, and Marcus Hutter.
Death and suicide in universal artificial intelligence.
In Bas Steunebrink, Pei Wang, and Ben Goertzel, editors,
Artificial General Intelligence: 9th International Conference, AGI 2016, New
York, NY, USA, July 16–19, 2016, Proceedings, pages 23–32, Cham, 2016.
Springer International Publishing. [ bib  DOI  arXiv ] Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent's estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically selfpreserving), and that the agent's posterior belief that it will survive increases over time.

[ELH15] 
Tom Everitt, Jan Leike, and Marcus Hutter.
Sequential extensions of causal and evidential decision theory.
In Toby Walsh, editor, Algorithmic Decision Theory: 4th
International Conference, ADT 2015, Lexington, KY, USA, September 27–30,
2015, Proceedings, pages 205–221, Cham, 2015. Springer International
Publishing.
Source
code. [ bib  DOI  arXiv ] Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one.

[EH15a] 
Tom Everitt and Marcus Hutter.
Analytical results on the BFS vs. DFS algorithm selection
problem. Part I: Tree search.
In Bernhard Pfahringer and Jochen Renz, editors, AI 2015:
Advances in Artificial Intelligence: 28th Australasian Joint Conference,
Canberra, ACT, Australia, November 30 – December 4, 2015, Proceedings,
pages 157–165, Cham, 2015. Springer International Publishing.
Source
code. [ bib  DOI ] Breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental search algorithms. We derive approximations of their expected runtimes in complete trees, as a function of tree depth and probabilistic goal distribution. We also demonstrate that the analytical approximations are close to the empirical averages for most parameter settings, and that the results can be used to predict the best algorithm given the relevant problem features.
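The algorithm-selection question can be made concrete with a small simulation. This is a minimal sketch comparing node-expansion counts of BFS and left-first DFS in a complete binary tree; the heap-style node numbering (root = 0, children of n at 2n+1 and 2n+2) is my own convention for illustration, not the paper's analytical model.

```python
from collections import deque

def bfs_cost(depth, goal):
    """Nodes expanded by BFS before finding `goal` in a complete binary
    tree of the given depth, using heap-style numbering (root = 0)."""
    frontier = deque([0])
    visited = 0
    n_nodes = 2 ** (depth + 1) - 1
    while frontier:
        node = frontier.popleft()
        visited += 1
        if node == goal:
            return visited
        left, right = 2 * node + 1, 2 * node + 2
        if left < n_nodes:
            frontier.extend([left, right])

def dfs_cost(depth, goal):
    """Nodes expanded by left-first DFS before finding `goal` (same numbering)."""
    n_nodes = 2 ** (depth + 1) - 1
    stack = [0]
    visited = 0
    while stack:
        node = stack.pop()
        visited += 1
        if node == goal:
            return visited
        left, right = 2 * node + 1, 2 * node + 2
        if left < n_nodes:
            stack.append(right)  # pushed first so the left child is explored first
            stack.append(left)
```

In a depth-3 tree, a shallow goal in the right subtree (node 2) is found by BFS after 3 expansions while DFS must first exhaust the left subtree; for a deep leftmost goal the ordering reverses. Predicting which side of this trade-off a given goal distribution falls on is exactly what the paper's runtime approximations are for.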

[EH15b] 
Tom Everitt and Marcus Hutter.
Analytical results on the BFS vs. DFS algorithm selection
problem. Part II: Graph search.
In Bernhard Pfahringer and Jochen Renz, editors, AI 2015:
Advances in Artificial Intelligence: 28th Australasian Joint Conference,
Canberra, ACT, Australia, November 30 – December 4, 2015, Proceedings,
pages 166–178, Cham, 2015. Springer International Publishing.
Source
code. [ bib  DOI ] The algorithm selection problem asks to select the best algorithm for a given problem. In the companion paper (Everitt and Hutter 2015b), expected runtime was approximated as a function of search depth and probabilistic goal distribution for tree search versions of breadth-first search (BFS) and depth-first search (DFS). Here we provide an analogous analysis of BFS and DFS graph search, deriving expected runtime as a function of graph structure and goal distribution. The applicability of the method is demonstrated through analysis of two different grammar problems. The approximations come surprisingly close to empirical reality.

[EH15c] 
Tom Everitt and Marcus Hutter.
A topological approach to metaheuristics: Analytical results on the
BFS vs. DFS algorithm selection problem.
Technical report, Australian National University, 2015. [ bib  arXiv ] Search is a central problem in artificial intelligence, and BFS and DFS the two most fundamental ways to search. In this report we derive results for average BFS and DFS runtime: For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis depends on an additional statistic of path redundancy and average branching factor. As an application, we use the results on two concrete grammar problems. The runtime estimates can be used to select the faster out of BFS and DFS for a given problem, and may form the basis for further analysis of more advanced search methods. Finally, we verify our results experimentally; the analytical approximations come surprisingly close to empirical reality.

[ELH14] 
T. Everitt, T. Lattimore, and M. Hutter.
Free lunch for optimisation under the universal distribution.
In 2014 IEEE Congress on Evolutionary Computation (CEC), pages
167–174, July 2014. [ bib  DOI  arXiv ] Function optimisation is a major challenge in computer science. The No Free Lunch theorems state that if all functions with the same histogram are assumed to be equally probable then no algorithm outperforms any other in expectation. We argue against the uniform assumption and suggest a universal prior exists for which there is a free lunch, but where no particular class of functions is favoured over another. We also prove upper and lower bounds on the size of the free lunch.

[AEH14] 
T. Alpcan, T. Everitt, and M. Hutter.
Can we measure the difficulty of an optimization problem?
In Information Theory Workshop (ITW), 2014 IEEE, pages
356–360, Nov 2014. [ bib  DOI  .pdf ] Can we measure the difficulty of an optimization problem? Although optimization plays a crucial role in modern science and technology, a formal framework that puts problems and solution algorithms into a broader context has not been established. This paper presents a conceptual approach which gives a positive answer to the question for a broad class of optimization problems. Adopting an information and computational perspective, the proposed framework builds upon Shannon and algorithmic information theories. As a starting point, a concrete model and definition of optimization problems is provided. Then, a formal definition of optimization difficulty is introduced which builds upon algorithmic information theory. Following an initial analysis, lower and upper bounds on optimization difficulty are established. One of the upper bounds is closely related to Shannon information theory and black-box optimization. Finally, various computational issues and future research directions are discussed.

[Eve13] 
Tom Everitt.
Universal induction and optimisation: No free lunch?
MSc thesis, Stockholm University, 2013. [ bib  .pdf ] Inductive reasoning is the process of making uncertain but justified inferences; often the goal is to infer a general theory from particular observations. Despite being a central problem in both science and philosophy, a formal understanding of induction was long missing. In 1964, substantial progress was made with Solomonoff's universal induction. Solomonoff formalized Occam's razor by means of algorithmic information theory, and used this to construct a universal Bayesian prior for sequence prediction. The first part of this thesis gives a comprehensive overview of Solomonoff's theory of induction. The optimisation problem of finding the argmax of an unknown function can be approached as an induction problem. However, optimisation differs in important respects from sequence prediction. We adapt universal induction to optimisation, and investigate its performance by putting it against the so-called No Free Lunch (NFL) theorems. The NFL theorems show that under certain conditions, effective optimisation is impossible. We conclude that while universal induction avoids the classical NFL theorems, it does not work nearly as well in optimisation as in sequence prediction.

[Eve10] 
Tom Everitt.
Automated Theorem Proving.
BSc thesis, Stockholm University, 2010. [ bib  .pdf ] The calculator was a great invention for the mathematician. No longer was it necessary to spend the main part of the time doing tedious but trivial arithmetic computations. A machine could do it both faster and more accurately. A similar revolution might be just around the corner for proof searching, perhaps the most time-consuming part of the modern mathematician's work. In this essay we present the Resolution procedure, an algorithm that finds proofs for statements in propositional and first-order logic. This means that any true statement (expressible in either of these logics) can, in principle, be proven by a computer. In fact, there are already practically usable implementations available; here we will illustrate the usage of one such program, Prover9, by some examples. Just as many other theorem provers, Prover9 is able to prove many nontrivial true statements surprisingly fast.

This file was generated by bibtex2html 1.98.