List of Publications

[EOBL19] Tom Everitt, Pedro A Ortega, Elizabeth Barnes, and Shane Legg. Understanding agent incentives using causal influence diagrams. Part I: Single action settings. 2019.
bib | arXiv ]
[LKE+18] Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. Scalable agent alignment via reward modeling: a research direction. 2018.
bib | arXiv ]
[Eve18] Tom Everitt. Towards Safe Artificial General Intelligence. PhD thesis, Australian National University, May 2018.
bib | .pdf ]
The field of artificial intelligence has recently experienced a number of breakthroughs thanks to progress in deep learning and reinforcement learning. Computer algorithms now outperform humans at Go, Jeopardy, image classification, and lip reading, and are becoming very competent at driving cars and interpreting natural language. The rapid development has led many to conjecture that artificial intelligence with greater-than-human ability on a wide range of tasks may not be far. This in turn raises concerns whether we know how to control such systems, in case we were to successfully build them.

Indeed, if humanity would find itself in conflict with a system of much greater intelligence than itself, then human society would likely lose. One way to make sure we avoid such a conflict is to ensure that any future AI system with potentially greater-than-human-intelligence has goals that are aligned with the goals of the rest of humanity. For example, it should not wish to kill humans or steal their resources.

The main focus of this thesis will therefore be goal alignment, i.e. how to design artificially intelligent agents with goals coinciding with the goals of their designers. Focus will mainly be directed towards variants of reinforcement learning, as reinforcement learning currently seems to be the most promising path towards powerful artificial intelligence. We identify and categorize goal misalignment problems in reinforcement learning agents as designed today, and give examples of how these agents may cause catastrophes in the future. We also suggest a number of reasonably modest modifications that can be used to avoid or mitigate each identified misalignment problem. Finally, we also study various choices of decision algorithms, and conditions for when a powerful reinforcement learning system will permit us to shut it down.

The central conclusion is that while reinforcement learning systems as designed today are inherently unsafe to scale to human levels of intelligence, there are ways to potentially address many of these issues without straying too far from the currently so successful reinforcement learning paradigm. Much work remains in turning the high-level proposals suggested in this thesis into practical algorithms, however.

[EH18a] Tom Everitt and Marcus Hutter. The alignment problem for bayesian history-based reinforcement learners. Submitted, 2018. Winner of the AI Alignment Prize.
bib | .pdf ]
Future artificial intelligences may be many times smarter than humans (Bostrom, 2014). If humans should have any chance of controlling such systems, their goals better be aligned with our human goals. Unfortunately, the goals of RL agents as designed today are heavily misaligned with human values for a number of reasons. In this paper, we categorize sources of misalignment, and give examples for each type. We also describe a range of tools for managing misalignment. Combined, the tools yield a number of aligned AI designs, though much future work remains for assessing their practical feasibility.

[ELH18] Tom Everitt, Gary Lea, and Marcus Hutter. AGI safety literature review. In International Joint Conference on AI (IJCAI), 2018.
bib | arXiv | http ]
The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.

[EH18b] Tom Everitt and Marcus Hutter. Universal artificial intelligence: Practical agents and fundamental challengs. In Hussein A. Abbass, Jason Scholz, and Darryn J. Reid, editors, Foundations of Trusted Autonomy, Studies in Systems, Decision and Control, chapter 2, pages 15--46. Springer, 2018.
bib | DOI | http ]
Foundational theories have contributed greatly to scientific progress in many fields. Examples include Zermelo-Fraenkel set theory in mathematics, and universal Turing machines in computer science. Universal Artificial Intelligence (UAI) is an increasingly well-studied foundational theory for artificial intelligence, based on ancient principles in the philosophy of science and modern developments in information and probability theory. Importantly, it refrains from making unrealistic Markov, ergodicity, or stationarity assumptions on the environment. UAI provides a theoretically optimal agent AIXI and principled ideas for constructing practical autonomous agents. The theory also makes it possible to establish formal results on the motivations of AI systems. Such results may greatly enhance the trustability of autonomous agents, and guide design choices towards more robust agent architectures and incentive schemes. Finally, UAI offers a deeper appreciation of fundamental problems such as the induction problem and the exploration-exploitation dilemma.

[LMK+17] Jan Leike, Miljan Martic, Victoria Krakovna, Pedro Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, and Shane Legg. AI Safety Gridworlds. ArXiv e-prints, November 2017.
bib | arXiv ]
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.

[EKO+17] Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg. Reinforcement learning with a corrupted reward signal. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-26, 2017, pages 4705--4713, 2017.
bib | DOI | arXiv | www: ]
No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.

[MNSEH17] Jarryd Martin, Suraj Narayanan S, Tom Everitt, and Marcus Hutter. Count-based exploration in feature space for reinforcement learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-26, 2017, pages 2471--2478, 2017.
bib | DOI | arXiv | www: ]
We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our phi-pseudocount achieves generalisation by exploiting same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The phi-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.

[WBC+17] Tobias Wängberg, Mikael Böörs, Elliot Catt, Tom Everitt, and Marcus Hutter. A game-theoretic analysis of the off-switch game. In Tom Everitt, Ben Goertzel, and Alexey Potapov, editors, Artificial General Intelligence: 10th International Conference, AGI 2017, Melbourne, VIC, Australia, August 15-18, 2017, Proceedings, pages 167--177, Cham, 2017. Springer International Publishing.
bib | DOI | arXiv | http ]
The off-switch game is a game theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic as the human is modelled as an irrational player, and the robot's best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game theoretic, by modelling the human as a rational player with a random utility function. As a consequence, we are able to easily calculate the robot's best action for arbitrary belief and irrationality assumptions.

[EGP17] Tom Everitt, Ben Goertzel, and Alexey Potapov, editors. Artificial General Intelligence: 10th International Conference, AGI 2017, Melbourne, VIC, Australia, August 15-18, 2017, Proceedings. Springer International Publishing, Cham, 2017.
bib | DOI | http ]
[EFDH16] Tom Everitt, Daniel Filan, Mayank Daswani, and Marcus Hutter. Self-modification of policy and utility function in rational agents. In Bas Steunebrink, Pei Wang, and Ben Goertzel, editors, Artificial General Intelligence: 9th International Conference, AGI 2016, New York, NY, USA, July 16-19, 2016, Proceedings, pages 1--11, Cham, 2016. Springer International Publishing.
bib | DOI | arXiv ]
Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby “escaping” the control of their creators. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and use the current utility function when evaluating the future.

[EH16] Tom Everitt and Marcus Hutter. Avoiding wireheading with value reinforcement learning. In Bas Steunebrink, Pei Wang, and Ben Goertzel, editors, Artificial General Intelligence: 9th International Conference, AGI 2016, New York, NY, USA, July 16-19, 2016, Proceedings, pages 12--22, Cham, 2016. Springer International Publishing. Source code.
bib | DOI | arXiv ]
How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) may seem like a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent’s actions. The constraint is defined in terms of the agent's belief distributions, and does not require an explicit specification of which actions constitute wireheading.

[MEH16] Jarryd Martin, Tom Everitt, and Marcus Hutter. Death and suicide in universal artificial intelligence. In Bas Steunebrink, Pei Wang, and Ben Goertzel, editors, Artificial General Intelligence: 9th International Conference, AGI 2016, New York, NY, USA, July 16-19, 2016, Proceedings, pages 23--32, Cham, 2016. Springer International Publishing.
bib | DOI | arXiv ]
Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent's estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically self-preserving), and that the agent's posterior belief that it will survive increases over time.

[ELH15] Tom Everitt, Jan Leike, and Marcus Hutter. Sequential extensions of causal and evidential decision theory. In Toby Walsh, editor, Algorithmic Decision Theory: 4th International Conference, ADT 2015, Lexington, KY, USA, September 27-30, 2015, Proceedings, pages 205--221, Cham, 2015. Springer International Publishing. Source code.
bib | DOI | arXiv ]
Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one.

[EH15a] Tom Everitt and Marcus Hutter. Analytical results on the BFS vs. DFS algorithm selection problem. Part I: Tree search. In Bernhard Pfahringer and Jochen Renz, editors, AI 2015: Advances in Artificial Intelligence: 28th Australasian Joint Conference, Canberra, ACT, Australia, November 30 -- December 4, 2015, Proceedings, pages 157--165, Cham, 2015. Springer International Publishing. Source code.
bib | DOI ]
Breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental search algorithms. We derive approximations of their expected runtimes in complete trees, as a function of tree depth and probabilistic goal distribution. We also demonstrate that the analytical approximations are close to the empirical averages for most parameter settings, and that the results can be used to predict the best algorithm given the relevant problem features.

[EH15b] Tom Everitt and Marcus Hutter. Analytical results on the BFS vs. DFS algorithm selection problem. Part II: Graph search. In Bernhard Pfahringer and Jochen Renz, editors, AI 2015: Advances in Artificial Intelligence: 28th Australasian Joint Conference, Canberra, ACT, Australia, November 30 -- December 4, 2015, Proceedings, pages 166--178, Cham, 2015. Springer International Publishing. Source code.
bib | DOI ]
The algorithm selection problem asks to select the best algorithm for a given problem. In the companion paper (Everitt and Hutter 2015b), expected runtime was approximated as a function of search depth and probabilistic goal distribution for tree search versions of breadth-first search (BFS) and depth-first search (DFS). Here we provide an analogous analysis of BFS and DFS graph search, deriving expected runtime as a function of graph structure and goal distribution. The applicability of the method is demonstrated through analysis of two different grammar problems. The approximations come surprisingly close to empirical reality.

[EH15c] Tom Everitt and Marcus Hutter. A topological approach to meta-heuristics: Analytical results on the BFS vs. DFS algorithm selection problem. Technical report, Australian National University, 2015.
bib | arXiv ]
Search is a central problem in artificial intelligence, and BFS and DFS the two most fundamental ways to search. In this report we derive results for average BFS and DFS runtime: For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis depends on an additional statistic of path redundancy and average branching factor. As an application, we use the results on two concrete grammar problems. The runtime estimates can be used to select the faster out of BFS and DFS for a given problem, and may form the basis for further analysis of more advanced search methods. Finally, we verify our results experimentally; the analytical approximations come surprisingly close to empirical reality.

[ELH14] T. Everitt, T. Lattimore, and M. Hutter. Free lunch for optimisation under the universal distribution. In 2014 IEEE Congress on Evolutionary Computation (CEC), pages 167--174, July 2014.
bib | DOI | arXiv ]
Function optimisation is a major challenge in computer science. The No Free Lunch theorems state that if all functions with the same histogram are assumed to be equally probable then no algorithm outperforms any other in expectation. We argue against the uniform assumption and suggest a universal prior exists for which there is a free lunch, but where no particular class of functions is favoured over another. We also prove upper and lower bounds on the size of the free lunch.

[AEH14] T. Alpcan, T. Everitt, and M. Hutter. Can we measure the difficulty of an optimization problem? In Information Theory Workshop (ITW), 2014 IEEE, pages 356--360, Nov 2014.
bib | DOI | .pdf ]
Can we measure the difficulty of an optimization problem? Although optimization plays a crucial role in modern science and technology, a formal framework that puts problems and solution algorithms into a broader context has not been established. This paper presents a conceptual approach which gives a positive answer to the question for a broad class of optimization problems. Adopting an information and computational perspective, the proposed framework builds upon Shannon and algorithmic information theories. As a starting point, a concrete model and definition of optimization problems is provided. Then, a formal definition of optimization difficulty is introduced which builds upon algorithmic information theory. Following an initial analysis, lower and upper bounds on optimization difficulty are established. One of the upper-bounds is closely related to Shannon information theory and black-box optimization. Finally, various computational issues and future research directions are discussed.

[Eve13] Tom Everitt. Universal induction and optimisation: No free lunch? MSc thesis, Stockholm University, 2013.
bib | .pdf ]
Inductive reasoning is the process of making uncertain but justied inferences; often the goal is to infer a general theory from particular observations. Despite being a central problem in both science and philosophy, a formal understanding of induction was long missing. In 1964, substantial progress was made with Solomononoff's universal induction. Solomonoff formalized Occam's razor by means of algorithmic information theory, and used this to construct a universal Bayesian prior for sequence prediction. The first part of this thesis gives a comprehensive overview of Solomonoff's theory of induction. The optimisation problem of finding the argmax of an unknown function can be approached as an induction problem. However, optimisation differs in important respects from sequence prediction. We adapt universal induction to optimisation, and investigate its performance by putting it against the so-called No Free Lunch (NFL) theorems. The NFL theorems show that under certain conditions, effective optimisation is impossible. We conclude that while universal induction avoids the classical NFL theorems, it does not work nearly as well in optimisation as in sequence prediction.

[Eve10] Tom Everitt. Automated Theorem Proving. BSc thesis, Stockholm University, 2010.
bib | .pdf ]
The calculator was a great invention for the mathematician. No longer was it necessary to spend the main part of the time doing tedious but trivial arithmetic computations. A machine could do it both faster and more accurately. A similar revolution might be just around the corner for proof searching, the perhaps most time consuming part of the modern mathematician's work. In this essay we present the Resolution procedure, an algorithm that finds proofs for statements in propositional and first-order logic. This means that any true statement (expressible in either of these logics), in principle can be proven by a computer. In fact, there are already practically usable implementations available; here we will illustrate the usage of one such program, Prover9, by some examples. Just as many other theorem provers, Prover9 is able to prove many non-trivial true statements surprisingly fast.

This file was generated by bibtex2html 1.98.