Tom Everitt, PhD Student
Australian National University (ANU)
Email: tom.everitt at anu.edu.au
I'm a PhD student in computer science/artificial intelligence (AI) at the Australian National University. My supervisor is Prof. Marcus Hutter.
I'm working on AI Safety, i.e. how we can safely use AI with greater-than-human intelligence.
Reinforcement Learning with Corrupted Reward Channel Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg. In IJCAI-17 and arXiv, 2017. Slides.
I'm PC-chair for Artificial General Intelligence (AGI-17) in Melbourne.
Background. An easy way to get an AI to do what we want it to do is to build it to optimise a reward signal. We give the AI reward when it does something good, and no reward if it does something bad (carrot and stick principle, aka reinforcement learning). By construction, all the AI wants is to optimise the reward signal, which means that the AI will want to do things that we think are good so that we give it reward.
My draft book chapter and AGI tutorial offer gentle introductions to the formal model I use in my other works:
Self-modification. One risk is that the AI modifies itself to get an easier goal than optimising reward given by capricious humans. It turns out that depending on subtle details in how the AI is constructed, self-modification may be a risk or not:
Self-Modification of Policy and Utility Function in Rational Agents Tom Everitt, Daniel Filan, Mayank Daswani, and Marcus Hutter. In AGI-16 and arXiv, 2016. Slides, video. Winner of the Kurzweil prize for best AGI paper.
Wireheading. There is also a risk that the AI finds a way to counterfeit reward by modifying its reward sensor to always report maximum reward. This is called wireheading.
Self-preservation and death. Another risk with AIs is the self-preservation drive. Much like humans and animals, AIs may have a strong desire not to be shut off or terminated. This self-preservation drive may lead the AI to hide its true powers (so that we don't become scared of it and shut it off).
There is a natural mathematical definition of death in the Universal Artificial Intelligence framework. It turns out that there are both self-preserving and suicidal RL agents:
For cooperative inverse reinforcement learning (IRL) agents, the self-preservation drive is more balanced and often follows the desires of the human supervisor:
Decision theory. Strangely, robots and other agents that are part of their environment may be able to infer properties of themselves from their own actions. For example, my having petted a lot of cats in the past may be evidence that I have toxoplasmosis, a disease which makes you fond of cats. Now, if I see a cat, should I avoid petting it to reduce the risk that I have the disease? (Note that petting cats never causes toxoplasmosis.) The two standard answers for how to reason in this situation are called causal decision theory (CDT) and evidential decision theory (EDT). We show that CDT and EDT turn into three possibilities for how to reason in sequential settings where multiple actions are interleaved with observations:
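The one-shot cat example can be made concrete with a small calculation. The sketch below is only an illustration: every probability and utility in it is an invented assumption, the function names are hypothetical, and it covers the single-decision case rather than the sequential setting studied in the paper. EDT treats the action as evidence about the disease, while CDT holds the probability of the disease fixed because petting has no causal effect on it.

```python
# All numbers are invented for illustration; none come from the paper.
P_TOXO = 0.1                # assumed prior probability of having the disease
P_PET_GIVEN_TOXO = 0.9      # assumed: infected people pet cats more often
P_PET_GIVEN_HEALTHY = 0.3   # assumed: healthy people pet cats less often
U_PET = 1.0                 # assumed enjoyment from petting the cat
U_TOXO = -10.0              # assumed cost of having the disease


def posterior_toxo(pet: bool) -> float:
    """P(toxo | action) by Bayes' rule, treating the action as evidence."""
    like_toxo = P_PET_GIVEN_TOXO if pet else 1 - P_PET_GIVEN_TOXO
    like_healthy = P_PET_GIVEN_HEALTHY if pet else 1 - P_PET_GIVEN_HEALTHY
    joint_toxo = like_toxo * P_TOXO
    return joint_toxo / (joint_toxo + like_healthy * (1 - P_TOXO))


def edt_value(pet: bool) -> float:
    """Evidential value: condition the disease probability on the action."""
    return (U_PET if pet else 0.0) + U_TOXO * posterior_toxo(pet)


def cdt_value(pet: bool) -> float:
    """Causal value: the action cannot change the disease, so use the prior."""
    return (U_PET if pet else 0.0) + U_TOXO * P_TOXO


def choose(value) -> str:
    """Pick the action with the higher value under the given decision theory."""
    return "pet" if value(True) > value(False) else "avoid"


if __name__ == "__main__":
    print("EDT:", choose(edt_value))
    print("CDT:", choose(cdt_value))
```

With these assumed numbers the EDT agent avoids the cat and the CDT agent pets it. Different numbers can make the two agree, so the disagreement is a property of this particular parameter choice.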
Background. Search and optimisation are fundamental aspects of AI and of intelligence in general. Intelligence can actually be defined as optimisation ability (Legg and Hutter, Universal Intelligence: A Definition of Machine Intelligence, 2007).
(No) Free Lunch. The No Free Lunch theorems state that intelligent optimisation is impossible without knowledge about what you're trying to optimise. I argue against these theorems, and show that under a natural definition of complete uncertainty, intelligent (better-than-random) optimisation is possible. Unfortunately, I was also able to show that there are pretty strong limits on how much better intelligent search can be compared to random search.
Universal Induction and Optimisation: No Free Lunch? Tom Everitt. Supervised by Tor Lattimore, Peter Sunehag, and Marcus Hutter at ANU. Master's thesis, Department of Mathematics, Stockholm University, 2013.
Optimisation difficulty. In a related paper, we give a formal definition of how hard a function is to optimise:
How to search. Two of the most fundamental search strategies are depth-first search (DFS) and breadth-first search (BFS). In DFS, you search depth-first; that is, you follow one path to its very end before trying something else. In BFS, you instead try to search as broadly as possible, focusing on breadth rather than depth. I calculate the expected search times for both methods, and derive some results on which method is preferable in which situations:
Analytical Results on the BFS vs. DFS Algorithm Selection Problem. Part I, Tree Search. Tom Everitt and Marcus Hutter. In 28th Australasian Joint Conference on AI and arXiv, 2015. Slides, Source Code.
Analytical Results on the BFS vs. DFS Algorithm Selection Problem. Part II, Graph Search. Tom Everitt and Marcus Hutter. In 28th Australasian Joint Conference on AI and arXiv, 2015. Slides, Source Code.
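To illustrate why the choice between the two strategies matters, here is a minimal sketch that counts how many nodes each one visits before reaching a goal in a complete binary tree. It is not the papers' analysis or their source code: the heap-style node numbering (root 1, children 2i and 2i+1) and the function names are assumptions made for this example, and the papers derive expected search times analytically rather than by running searches.

```python
from collections import deque


def children(node: int, max_depth: int) -> list[int]:
    """Children of `node` in a complete binary tree numbered heap-style (root = 1)."""
    depth = node.bit_length() - 1
    return [2 * node, 2 * node + 1] if depth < max_depth else []


def bfs_cost(goal: int, max_depth: int) -> int:
    """Nodes visited by breadth-first search, up to and including the goal."""
    queue = deque([1])
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        if node == goal:
            return visited
        queue.extend(children(node, max_depth))
    raise ValueError("goal not in tree")


def dfs_cost(goal: int, max_depth: int) -> int:
    """Nodes visited by depth-first search, up to and including the goal."""
    stack = [1]
    visited = 0
    while stack:
        node = stack.pop()
        visited += 1
        if node == goal:
            return visited
        # Push the right child first so the left child is explored first.
        stack.extend(reversed(children(node, max_depth)))
    raise ValueError("goal not in tree")


if __name__ == "__main__":
    # Deep goal on the leftmost path: DFS reaches it sooner.
    print(bfs_cost(8, 3), dfs_cost(8, 3))
    # Shallow goal in the right subtree: BFS reaches it sooner.
    print(bfs_cost(3, 3), dfs_cost(3, 3))
```

In this depth-3 tree, the leftmost leaf (node 8) costs BFS 8 visits and DFS 4, while the root's right child (node 3) costs BFS 3 visits and DFS 9. Which method wins therefore depends on where goals tend to lie.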
Automated Theorem Proving. Tom Everitt, Supervised by Rikard Bøgvad. Bachelor thesis, Department of Mathematics, Stockholm University, 2010.
AGI Safety and Understanding Invited talk, AGI-17. Slides
AI and the future -- Introduction to AI Safety. ANU Regnet, ANU Learning Communities, ANU XSA (XSAC talk), LessWrong Canberra, 2016 and 2017 Slides
AI Safety -- Overview of recent models and results. Effective Altruism Sydney retreat 2016 Slides
Jarryd Martin, Master of Computing (ANU) Thesis: Optimism and Death in Reinforcement Learning Papers: Count-Based Exploration in Feature Space (IJCAI-17), and Death and Suicide in Universal Artificial Intelligence (AGI-16)
Suraj Narayanan S, Master of Computing (ANU) Thesis: Exploration in Feature Space Paper: Count-Based Exploration in Feature Space (IJCAI-17)
Tobias Wängberg and Mikael Böörs, Bachelor of Mathematics (LiU) Thesis: Classification by Decomposition: A Partitioning of the Space of 2x2 Symmetric Games Paper: A Game-Theoretic Analysis of the Off-Switch Game (AGI-17)
My CV (updated 2017-01-04).
Formal models of reasoning, decision-making, and (artificial) intelligence and their implications for the AI Safety problem. The AI Safety problem is well described in Bostrom's book Superintelligence. See also intelligence.org.
Universal induction (aka Solomonoff induction), which is a general, mathematical solution to the induction problem based on Algorithmic Information Theory and Bayes' rule (see Universal Artificial Intelligence and A Philosophical Treatise of Universal Induction).
Cooperation theory. Robert Axelrod's brilliant computer tournaments and subsequent book The Evolution of Cooperation sparked this field, which investigates when self-interested individuals end up cooperating. See also Nowak's Supercooperators.
Inference of causal relationships from observational data.
Co-founder of Kaus.se, a webservice that lets you see how different aspects of your life affect each other. The analysis is based on automatically and/or manually recorded life data, and is presented as a directed graph.